量子位
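Papers turned into comic-style PPTs automatically! Generate Nano Banana-style slides for free with 秘塔 (Mita), plus one-on-one voice explanations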
量子位· 2025-12-09 05:39
Core Viewpoint
- The article covers the AI tool "秘塔" (Mita), which generates comic-style presentations similar to those of "Nano Banana 2" and improves the learning experience by turning complex texts into engaging visual formats [1][4][12].

Group 1: Product Features
- Mita offers more than 20 presentation styles, so users can pick the visual format they prefer [5][18].
- The tool converts academic papers and reports into clear, illustrated PowerPoint presentations with accompanying voice explanations, making learning more efficient [6][10][11].
- Users can upload their own materials or search for online resources, and the AI automatically builds a presentation from the provided content [15][20].

Group 2: Accessibility and User Experience
- Mita takes a zero-threshold approach: access is free, with no complicated applications or waiting lists [8][48].
- The platform grants 100 points that refresh daily, equivalent to 100 PPT pages, which is enough for most users' learning needs [49][51].
- The tool is designed for self-learning, turning presentation-making from a burden into a shortcut for acquiring knowledge [56][57].

Group 3: Market Positioning
- Unlike many competitors that focus on aesthetic templates and animations, Mita prioritizes internal input and user-driven learning scenarios [54][55].
- The product's evolution reflects a commitment to lowering barriers to information access, moving from simple search toward deeper understanding [58][59].
- Mita's approach is a significant opportunity for users, as it aims to democratize knowledge acquisition through technology [60][61].
量子位 (QbitAI) is hiring editors and writers
量子位· 2025-12-09 05:39
Core Viewpoint
- The article points to the ongoing AI boom and invites readers to join 量子位 (QbitAI), which focuses on tracking AI advancements and has established itself as a leading content platform in the industry [1].

Group 1: Job Opportunities
- The company is hiring in three main directions: AI Industry, AI Finance, and AI Product, with positions open to both experienced professionals and fresh graduates [2][4].
- Positions are full-time and based in Beijing, with roles at various levels open for application [2][4].

Group 2: Job Responsibilities
- **AI Industry Direction**: Focuses on innovation at the infrastructure layer, including chips, AI infrastructure, and cloud computing [6].
- **AI Finance Direction**: Tracks venture capital and financial reports in the AI sector, monitoring capital movements across the industry chain [6].
- **AI Product Direction**: Covers AI progress in applications and hardware devices, including software applications and product evaluations [6].

Group 3: Benefits and Growth Opportunities
- Employees get early exposure to the latest AI technologies, can use new AI tools to work more efficiently, and can build personal influence by creating original content [6].
- The company offers competitive salaries and comprehensive benefits, including social insurance, meal allowances, and performance bonuses [6].

Group 4: Company Achievements
- As of 2025, 量子位 has over 2.4 million subscribers on WeChat and more than 7 million users across platforms, with daily reading volume exceeding 2 million [12].
- Third-party data platforms rank the company as the top new media outlet in AI and frontier technology [12].
Liang Wenfeng named one of Nature's top ten people in science worldwide this year!
量子位· 2025-12-09 01:21
Core Points
- Nature has named Liang Wenfeng one of its top ten people in science for 2025 for his contributions to the AI field through the DeepSeek model [1][3].
- DeepSeek's model disrupted the AI industry with its cost-effectiveness and raised the global visibility of Chinese large models [9][10].
- The recently released DeepSeek V3.2 model set a new benchmark in Agent evaluation, marking a significant advance for open-source models [11][12].

Group 1: Recognition and Impact
- Nature describes Liang Wenfeng as a "Tech disruptor," highlighting his dual identity as a financial expert and an AI pioneer [4][5].
- DeepSeek has been a game-changer for the AI sector, showing that high-performance models can be developed without excessive data or resources [10][21].
- The model's cost efficiency has made it a competitive player in the global AI landscape [9].

Group 2: Background of Liang Wenfeng
- Liang Wenfeng was born in 1985 in Guangdong and excelled academically, earning a place at Zhejiang University [14][15].
- He moved into quantitative investment in 2008, capitalizing on the emerging trend of quantitative trading in China [17][18].
- In 2021, his firm became one of the largest quantitative private fund managers in China, prompting him to explore opportunities in large models [19][20].

Group 3: Other Recognized Scientists
- Mengran Du, another Chinese researcher, was also recognized, for her groundbreaking work in deep-sea ecology [6][22].
- Du's research led to the discovery of the deepest known animal ecosystems, challenging existing models of extreme life and carbon cycling [25][26].
- Her academic career includes significant contributions to deep-sea science and technology, with multiple publications in prestigious journals [33].
Accuracy cut in half! Large models' visual abilities "fail" as soon as they leave everyday life
量子位· 2025-12-09 01:21
Core Insights
- The article discusses the limitations of existing Multimodal Large Language Models (MLLMs) in specialized fields such as surgery, industry, extreme sports, and animal perspectives, and the resulting need for a new evaluation benchmark called EgoCross [1][3][9].

Group 1: EgoCross Benchmark
- EgoCross is the first cross-domain egocentric video question-answering benchmark, covering four high-value professional fields and providing nearly a thousand high-quality QA pairs [3][9].
- The benchmark includes both closed-ended (CloseQA) and open-ended (OpenQA) evaluation formats, filling a significant gap in the assessment of MLLMs [3][9].

Group 2: Model Evaluation and Findings
- The research team tested eight mainstream MLLMs and found significant cross-domain shortcomings: in cross-domain scenarios the best models scored below 55% accuracy on CloseQA and below 35% on OpenQA [4][12].
- Models performed well on everyday activities but dropped sharply in specialized fields, with accuracy falling from 73.58% on daily activities to 43.14% in cross-domain scenarios [12][18].

Group 3: Task Types and Challenges
- The benchmark assesses four core tasks: identification, localization, prediction, and counting, with 15 sub-tasks designed to evaluate model capabilities comprehensively [11][12].
- Prediction tasks, such as forecasting the next action, declined more sharply than basic identification tasks [18].

Group 4: Improvement Strategies
- The research explored three improvement methods: prompt learning, supervised fine-tuning (SFT), and reinforcement learning (RL). RL gave the largest gain, raising CloseQA accuracy by 22% on average [15][14].
- SFT delivered a performance boost of nearly 20% in the industrial domain, indicating the potential of targeted model training [15].

Group 5: Future Directions
- The findings clarify the current capabilities and limitations of large models and suggest directions for building more generalized multimodal systems [16][17].
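To make the headline's "cut in half" claim concrete, the following minimal sketch works out the absolute and relative accuracy drop from the two figures quoted in the summary (73.58% and 43.14%). The helper name `accuracy_drop` is hypothetical and chosen only for illustration; it is not part of the EgoCross benchmark.

```python
def accuracy_drop(baseline: float, shifted: float) -> tuple[float, float]:
    """Return (absolute drop in percentage points, relative drop as a fraction)."""
    absolute = baseline - shifted
    relative = absolute / baseline
    return absolute, relative


# Figures quoted in the summary: daily activities vs. cross-domain scenarios.
daily_accuracy = 73.58
cross_domain_accuracy = 43.14

abs_drop, rel_drop = accuracy_drop(daily_accuracy, cross_domain_accuracy)
print(f"absolute drop: {abs_drop:.2f} percentage points")
print(f"relative drop: {rel_drop:.1%}")
```

On these numbers the model loses roughly 30 percentage points, a bit over 40% of its everyday-scenario accuracy, so "halved" is a rounded description rather than an exact one.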
After reading a WeChat official-account article written by the latest Chinese-made AI, I panicked!
量子位· 2025-12-08 12:00
Core Insights
- The article discusses the newly upgraded AI model GLM-4.6V, highlighting its ability to generate comprehensive content, including articles and reports, from minimal input [8][10][27].

Group 1: AI Capabilities
- GLM-4.6V can interpret academic papers and produce structured articles, dividing the content into logical sections such as introduction, core issues, and conclusions [4].
- The model can process images and tables and place them in articles with appropriate captions, showing its proficiency at integrating visual content [5][7].
- It lets users compare research papers or financial reports by quickly generating visual and textual analyses [16][22].

Group 2: Financial Analysis
- The article includes a comparative analysis of Q3 2025 financial results for Alphabet, Amazon, Meta, and Apple, covering their revenue and profit growth rates [19].
- Alphabet reported Q3 2025 revenue of $102.346 billion, up 16% year over year, while Amazon's revenue was $180.169 billion, up 13% [19].
- Meta had the highest growth rate at 26%, with Q3 2025 revenue of $51.242 billion, while Apple reported revenue of $94.036 billion, up 10% [19].

Group 3: Cost Efficiency
- GLM-4.6V is priced 50% lower than its predecessor, with input costs as low as 1 yuan per million tokens and output costs of 3 yuan per million tokens [39].
- The lower price makes the model more accessible for applications such as document analysis and coding tasks [38][39].

Group 4: Technical Advancements
- GLM-4.6V has a context window of 128K tokens and has achieved state-of-the-art results on multiple multimodal benchmarks [67].
- The model builds function calling into its architecture, enabling a seamless path from visual perception to actionable tasks, which is crucial for real-world applications [69].
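The per-token prices quoted above translate into request costs with simple arithmetic. The sketch below assumes the lowest quoted tier (1 yuan per million input tokens, 3 yuan per million output tokens); the function name `estimate_cost_yuan` and the example token counts are hypothetical and do not come from any official SDK or price list.

```python
# Prices quoted in the summary (lowest tier), in yuan per million tokens.
INPUT_YUAN_PER_MILLION = 1.0
OUTPUT_YUAN_PER_MILLION = 3.0


def estimate_cost_yuan(input_tokens: int, output_tokens: int) -> float:
    """Estimate the cost of one request in yuan from its token counts."""
    input_cost = input_tokens / 1_000_000 * INPUT_YUAN_PER_MILLION
    output_cost = output_tokens / 1_000_000 * OUTPUT_YUAN_PER_MILLION
    return input_cost + output_cost


# Hypothetical request: a long document in, a short summary out.
example_cost = estimate_cost_yuan(input_tokens=100_000, output_tokens=5_000)
print(f"estimated cost: {example_cost:.3f} yuan")
```

Because output tokens cost three times as much as input tokens at this tier, long-input, short-output workloads such as document analysis are the cheapest to run.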
100 trillion tokens reveal this year's AI trends! This Silicon Valley report has gone viral
量子位· 2025-12-08 11:36
Core Insights
- The report, titled "State of AI: An Empirical 100 Trillion Token Study with OpenRouter," analyzes the usage of more than 300 models on the OpenRouter platform from November 2024 to November 2025, focusing on real token consumption rather than benchmark scores [3][6][8].

Group 1: Open Source vs. Closed Source Models
- Open source models (OSS) have gone from being seen as alternatives to closed source models to finding their own positioning, becoming the preferred choice in specific scenarios [9].
- Open source and closed source models are now more complementary, with developers often using both types at once [10].
- Open source models are expected to account for about one-third of usage by the end of 2025, with Chinese models growing from 1.2% to 30% of weekly usage share [12][13].

Group 2: Market Dynamics and Model Diversity
- DeepSeek's dominance as the largest contributor to open source model usage is diminishing as more models enter the market, leading to a more diversified landscape [16].
- By the end of 2025, no single model is expected to hold more than 25% of token usage, with the market likely to be shared among 5 to 7 models [17][18].
- The report indicates a shift toward medium-sized models, which are gaining market favor, while small models are losing traction [20][21].

Group 3: Evolution of Model Functionality
- Language models are moving from dialogue systems to reasoning and execution systems, with reasoning tokens now exceeding 50% of usage [22].
- Tool calling by models is increasing, and the ecosystem is becoming more competitive and diverse [29][31].
- AI models are evolving into "intelligent agents" that complete tasks independently rather than only responding to queries [43].

Group 4: Usage Patterns and User Retention
- The tasks assigned to AI have become more complex, with users now asking models to analyze extensive documents or codebases [35].
- The average input to models has quadrupled, reflecting a growing reliance on contextual information [36].
- The "glass slipper effect" describes how some users become strongly attached to a model that fits their needs perfectly at release, leading to high retention rates [67][70].

Group 5: Regional Insights and Market Trends
- Asia's share of paid usage has more than doubled, from 13% to 31%, indicating a shift in the global AI landscape [71].
- North America's AI market share has fallen below 50%; English remains dominant at 82%, with Simplified Chinese at nearly 5% [80].
- Pricing affects usage less than expected: a 10% price drop yields only a 0.5%-0.7% increase in usage [80].
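The pricing finding can be restated as a price elasticity of demand, that is, the percentage change in usage divided by the percentage change in price. The sketch below applies that standard definition to the figures quoted in the summary (a 10% price drop and a 0.5% to 0.7% usage increase); the function name `price_elasticity` is hypothetical, and the report itself may use a different estimation method.

```python
def price_elasticity(pct_change_usage: float, pct_change_price: float) -> float:
    """Elasticity of usage with respect to price, both given in percent."""
    return pct_change_usage / pct_change_price


# A 10% price drop is a -10% price change; usage rises by 0.5% to 0.7%.
elasticity_low = price_elasticity(0.5, -10.0)
elasticity_high = price_elasticity(0.7, -10.0)

print(f"elasticity range: {elasticity_low:.2f} to {elasticity_high:.2f}")
```

An elasticity this close to zero means demand is highly inelastic: on these figures, cutting prices alone does little to move usage.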
Li Di, the father of Xiaobing (小冰), launches an agent startup named Nextie! Lu Qi is a shareholder
量子位· 2025-12-08 10:53
Core Viewpoint
- The article covers Nextie, a new startup founded by Li Di, who previously created the AI chatbot Xiaobing. The company aims to use "collective intelligence" to improve AI cognition and decision-making, moving beyond traditional models.

Group 1: Company Overview
- Li Di, known for developing Xiaobing, has launched a new company named Nextie, which means "next journey" [4][7].
- Nextie's core team consists of key members of the Xiaobing project, including co-founder Zeng Min and algorithm head Wang Wenlan [4][45].
- Nextie plans to raise tens of millions of dollars in funding, with Qiji as one of the investors [5][8].

Group 2: Technology and Innovation
- Nextie aims to teach AI "cognition" through a collective-intelligence framework, in which multiple AI agents collaborate and debate to reach better conclusions [11][12].
- The company has compiled a comprehensive database of human papers from 1800 to 2020 to support its technology development [18].
- Nextie's internal product, "Tuanzi," operates in two modes: a sister group for personal issues and a research group for academic inquiries [22][24].

Group 3: Product Features
- Tuanzi differs from traditional AI by showing the interactions and debates among AI agents rather than relying on a single reasoning chain [24][30].
- The product achieved state-of-the-art (SOTA) results in internal testing, outperforming existing single large models [31][32].
- Nextie plans to price by task outcome rather than token usage, reflecting the varying value of tasks [33][35].

Group 4: Future Prospects
- Nextie's technology testing is nearing completion, with a public launch expected on January 7 of the following year [36].
- Li Di's move to Nextie follows his departure from Xiaobing, where he remains a significant shareholder [41][42].
- The article draws a parallel between Li Di's new venture and Steve Jobs' NeXT, suggesting the potential for significant impact on the AI industry [62][63].
Robots head to Hong Kong en masse for an outdoor extreme challenge, and the dogs beat the humanoids
量子位· 2025-12-08 06:07
Group 1
- The core aim of the ATEC2025 offline challenge was to have robots complete tasks autonomously, without remote control, and show what they can do in real-world environments [1][8].
- The competition featured challenges including garbage sorting, autonomous watering, orienteering, and bridge crossing, emphasizing autonomous operation and limiting human intervention [10][16].
- The event was organized by the Chinese University of Hong Kong and drew participants from several prestigious institutions, with a panel of renowned robotics experts serving as judges [8][9].

Group 2
- Quadruped robots (robot dogs) significantly outperformed bipedal (humanoid) robots in all tasks, particularly outdoor orienteering, where bipedal robots struggled because of their high center of gravity and fewer contact points [26][27][29].
- The winning team, Zhejiang University Wongtsai, performed exceptionally in fully autonomous tasks and earned a prize of $150,000 [25][33].
- The event highlighted the difficulties robots face outdoors, such as changes in lighting and wind, which can disrupt perception and task execution [37][39][42].

Group 3
- The competition exposed several shortcomings in current robotic capabilities, particularly in multi-step reasoning and environmental adaptation: robots often struggled to plan their next actions after completing a single task [46][56].
- Many teams decoupled upper-body operations from lower-body movements, which led to inefficient task execution [50][51].
- The event served as a testing ground for the future of robotics, pushing the industry to rethink how robotic capabilities are measured and improved in real-world scenarios [58][61].
量子位 (QbitAI) is hiring editors and writers
量子位· 2025-12-08 06:07
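Editorial Department, from Aofeisi
量子位 | WeChat official account QbitAI
The AI boom is still surging, but if you don't yet know how to take part... why not come to 量子位?
We are a content platform centered on tracking new AI progress. After eight years of accumulation, we now have top-tier influence, broad and well-recognized industry resources, and the best position for observing and learning from the trends of the era.
We are currently hiring in three main directions, and we hope you are (or can become) a content expert in one of them:
All positions are full-time and based in Zhongguancun, Beijing.
Positions are open to:
By joining us, you can gain:
Requirements:
Position details follow:
Roles at every level are open for all positions; you are welcome to apply based on your own background and experience.
AI Industry direction
Job responsibilities:
- AI Industry direction: follows innovation at the infrastructure layer, including chips, AI Infra, and cloud computing;
- AI Finance direction: follows venture capital and financial reports in the AI field, tracking capital movements along the industry chain;
- AI Product direction: follows AI progress in applications and hardware devices.
- Experienced hires: editor, lead writer, and editor-in-chief levels are all covered, with roles matched to ability;
- Campus hires: fresh graduates; internships are accepted and can convert to full-time positions.
- Stand at the crest of the AI wave: be among the first to encounter and understand the latest technologies and products in AI, and build a complete understanding of the field.
- Master new AI tools: apply new AI technologies and tools to your work to boost efficiency and creativity.
- Build personal influence: by writing ...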
Hassabis: DeepMind is the real discoverer of the Scaling Law, and it still sees no bottleneck
量子位· 2025-12-08 06:07
Core Insights
- The article emphasizes the importance of Scaling Laws in achieving Artificial General Intelligence (AGI) and presents Google's success with its Gemini 3 model as a validation of this approach [5][19][21].

Group 1: Scaling Laws and AGI
- According to Hassabis, Scaling Laws were first discovered by DeepMind, not OpenAI, and have been pivotal in guiding AI research directions [12][14][18].
- Google DeepMind believes Scaling Laws are essential to developing AGI, suggesting that large amounts of data and computational resources are necessary for achieving human-like intelligence [23][24].
- Whether Scaling Laws will remain relevant for the next 500 years is debated, with some experts skeptical about their long-term viability [10][11].

Group 2: Future AI Developments
- AI is expected to advance significantly in the next 12 months, particularly in complete multimodal integration, which allows seamless processing of various data types [27][28][30].
- Breakthroughs in visual intelligence are anticipated, exemplified by Google's Nano Banana Pro, which demonstrates advanced visual understanding [31][32].
- The spread of world models is a key focus, with notable projects such as Genie 3 enabling interactive video generation [35][36].
- Agent systems are expected to become more reliable and more capable of completing assigned tasks [38][39].

Group 3: Gemini 3 and Its Capabilities
- Gemini 3 aims to be a universal assistant, offering deeply personalized responses and the ability to generate commercial-grade games quickly [41][44][45].
- Gemini 3's architecture lets it understand high-level instructions and produce detailed outputs, indicating a significant leap in intelligence and practicality [46].
- Gemini use is projected to become as frequent as smartphone use, integrating seamlessly into daily life [47].