量子位
MiniMax releases its in-house "intern"!
量子位· 2026-01-20 13:04
Core Insights
- The article traces the evolution of AI agents, arguing that to become effective long-term partners they must integrate deeply into work environments and understand professional context [3][29].
Group 1: AI Agent Evolution
- Traditional workflows that separate requirements, design, and code are rapidly dissolving [1].
- MiniMax's new AI-native workspace, Agent 2.0, is designed to act as a reliable partner by directly accessing local resources and following established workflows [4][8].
- The update centers on two components: a Desktop App for execution and Expert Agents for understanding business context [5][24].
Group 2: Desktop App Functionality
- The Desktop App connects cloud capabilities directly to the local computer, letting the agent read files and carry out a range of tasks [6][7].
- It retrieves local resources autonomously, so users no longer have to enter information manually [8].
- In a demanding test, the Desktop App had to gather detailed information on 20 products and produce a comprehensive report and presentation [12][22].
Group 3: Expert Agents
- Expert Agents let users inject private knowledge and experience into the system so that it understands specific business logic [26].
- This approach addresses the limitations of general-purpose models on highly specialized tasks [25].
Group 4: Long-term Partnership with Agents
- The ultimate goal is for agents to become long-term partners that deliver results by embedding fully in the work environment [29].
- Key capabilities include persistent memory, the ability to internalize tacit experience, and a keen awareness of the business environment [31][33][35].
Group 5: Real-world Applications
- The article shows Agent 2.0 at work in several departments: generating customized emails, modifying website code, and analyzing system alerts [36][37][39].
- The release of Agent 2.0 standardizes a high-efficiency way of working that MiniMax has already implemented successfully in-house [40][41].
Doubao's new role revealed: serving as an "AI guide" at an international art exhibition
量子位· 2026-01-20 10:04
Core Viewpoint
- The article describes the Doubao model working as an art exhibition guide, showcasing its real-time visual understanding and interaction with users [1][38].
Group 1: AI Capabilities
- In a dense exhibition space, Doubao identified and recommended key artworks, effectively picking out the most important pieces for the user [10][11].
- Its real-time visual perception lets it keep interpreting the camera feed during a video call and explain artworks without further user input [14][15].
- Doubao can search for additional information on its own during the conversation, adding deeper insight into the artworks under discussion [20][22].
Group 2: Model Performance
- Doubao 1.8 shows markedly stronger multimodal processing, with substantial gains on visual understanding tasks over previous versions [24][25].
- Across benchmarks, Doubao 1.8 outperformed other leading models in reasoning, visual comprehension, and real-time interaction, placing it in the top tier [26][34].
- Its ability to follow complex instructions and stay logically coherent in dynamic interactions highlights its strength in practical applications [36][37].
Group 3: User Experience
- Interacting with Doubao feels natural and human-like, improving the exhibition experience through a continuous flow of information and engagement [36][40].
- Using AI as a guide in real-life settings such as exhibitions signals a shift toward AI applications that are more integrated into everyday life [39][41].
Quantum Bit is hiring editors and writers
量子位· 2026-01-20 04:17
Core Viewpoint
- Amid the ongoing AI boom, the post invites readers to join Quantum Bit, which tracks AI advances and has established itself as a leading content platform in the industry [1].
Group 1: Job Opportunities
- Hiring covers three directions (AI Industry, AI Finance, and AI Product), with positions for both experienced professionals and new graduates [2][4].
- Roles span several levels, including editor, lead writer, and editor-in-chief, matched to each candidate's ability [6].
Group 2: Job Responsibilities
- **AI Industry Direction**: Tracking infrastructure innovation, including chips, AI infrastructure, and cloud computing, and writing accessible reports on technical conferences and papers [6][7].
- **AI Finance Direction**: Covering venture capital, earnings reports, and capital flows in the AI industry, including interviews with investors and entrepreneurs [11].
- **AI Product Direction**: Tracking AI applications and hardware, writing in-depth product reviews, and engaging with product experts [11].
Group 3: Benefits and Work Environment
- Employees can expect a lively team, the chance to build personal influence through original content, and mentorship from senior editors [6][11].
- The company offers competitive pay and comprehensive benefits, including social insurance, meal allowances, and performance bonuses [6].
Group 4: Company Growth and Reach
- As of 2025, Quantum Bit reports more than 2.4 million WeChat subscribers, over 7 million users across platforms, and more than 2 million daily reads [12].
- Third-party data platforms rank it as the top new-media outlet covering AI and frontier technology [12].
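From "works" to "works well": three dimensions of data visualization. Are you still stuck on the first? Renmin University of China proposes a new way to create charts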
量子位· 2026-01-20 04:17
Core Insights
- The article traces how data visualization has moved beyond simply making charts toward deeper challenges such as visual appeal and storytelling through dynamic data representation [2][9].
- It argues for tools that streamline the creation of visually engaging, interactive data presentations, replacing traditional methods that are labor-intensive and hard to reuse [10][12].
Group 1: Challenges in Data Visualization
- The first challenge is making data representations visually appealing without excessive manual effort; today this often means time-consuming work in design software [2][3][4].
- The second is animating visualizations: coding complexity and rigid templates deter users from adding dynamic features [5][6].
- The third is the repetitive work of implementing interactive features across visualization types, which often means starting from scratch on every project [7][8].
Group 2: Proposed Solutions
- The IDEAS Lab team has developed three projects: PiCCL for static chart creation, CAST for simplifying animation, and Libra for interaction [11][12][13].
- PiCCL reframes static chart creation around graphic operations and constraints, making designs more efficient to build and easier to reuse [20][21][23].
- CAST introduces a declarative animation model built around data-driven timing structures, so complex animations can be created without extensive coding [28][35][36].
Group 3: Enhancements in Interactivity
- Libra treats interactivity as a first-class citizen by breaking it into reusable components, so complex interactions need not be built from scratch [39][45].
- The system supports features such as undo/redo and provides a structured way to manage interactions, making them easier to implement and maintain [42][43].
- Building on PiCCL, CAST, and Libra, future visualization tools are expected to become more efficient and user-friendly, possibly using large models to generate visualizations [44].
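The summary describes Libra's reusable interaction components and undo/redo support only at a high level. As a rough illustration of the general idea (a command-pattern sketch, not Libra's actual API), each interaction below is an object that can apply and revert itself, and a history object keeps undo/redo stacks. `Command`, `SelectRange`, and `InteractionHistory` are hypothetical names.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Protocol, Tuple


class Command(Protocol):
    """A reusable interaction step that knows how to apply and revert itself."""

    def apply(self, state: dict) -> None: ...

    def revert(self, state: dict) -> None: ...


@dataclass
class SelectRange:
    """Hypothetical brushing interaction that selects an x-range on a chart."""

    lo: float
    hi: float
    _previous: Optional[Tuple[float, float]] = field(default=None, init=False, repr=False)

    def apply(self, state: dict) -> None:
        self._previous = state.get("selection")  # remember what we overwrite
        state["selection"] = (self.lo, self.hi)

    def revert(self, state: dict) -> None:
        state["selection"] = self._previous


@dataclass
class InteractionHistory:
    """Runs commands against shared chart state and keeps undo/redo stacks."""

    state: dict = field(default_factory=dict)
    _undo: List[Command] = field(default_factory=list, init=False, repr=False)
    _redo: List[Command] = field(default_factory=list, init=False, repr=False)

    def run(self, cmd: Command) -> None:
        cmd.apply(self.state)
        self._undo.append(cmd)
        self._redo.clear()  # a fresh action discards the redo branch

    def undo(self) -> None:
        if self._undo:
            cmd = self._undo.pop()
            cmd.revert(self.state)
            self._redo.append(cmd)

    def redo(self) -> None:
        if self._redo:
            cmd = self._redo.pop()
            cmd.apply(self.state)
            self._undo.append(cmd)
```

Because each interaction carries its own `apply`/`revert`, the same history object works for any chart type, which is the kind of reuse the article attributes to Libra.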
The first truly "usable" LLM game agent arrives! Real-time, high-frequency decisions, with its chain of thought visible throughout
量子位· 2026-01-20 04:17
Core Viewpoint
- The article covers AI's growing role in the gaming industry, highlighting COTA, a new AI agent from Chao Can Shu Technology that shows advanced decision-making and execution in games [1][6][55].
Group 1: AI in Gaming
- A mysterious account named "快递员" ("Courier") drew wide attention for its strong play in League of Legends, raising questions about AI's role in gaming [2][4].
- The gaming industry is paying increasing attention to AI, with many companies exploring it to enhance the player experience [6][7].
- Chao Can Shu Technology has already commercialized AI agents across multiple game genres [8][9].
Group 2: COTA's Features and Performance
- COTA is described as a versatile, large-model-powered game agent capable of cognitive reasoning, operational execution, tactical planning, and assistance [9][10].
- It performed at a professional level in a first-person shooter (FPS) demo that demands rapid decisions under high stakes [12][13].
- COTA executes complex actions fluidly, playing in a human-like way while sustaining high-level strategy and decision-making [28][34].
Group 3: Technical Innovations
- COTA uses a dual-system architecture that separates fast action execution from deep analysis, mimicking human cognition [40][41].
- Its base model, Qwen3-VL-8B-Thinking, balances capability and efficiency to meet the demands of real-time play [39].
- The training pipeline has stages for supervised fine-tuning, self-play for strategy optimization, and alignment with human preferences, making gameplay more realistic [50][51][52].
Group 4: Industry Implications
- COTA marks a shift in AI gaming from experimental models to practical applications [55][56].
- Its success points to a broader trend of AI agents becoming integral to player experience and game design [57][59].
- Beyond gaming, COTA's architecture may offer insights into solving complex real-world problems [72][76].
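The summary says only that COTA separates fast action execution from deep analysis. A minimal sketch of that general fast/slow pattern, under loudly stated assumptions, is below: `slow_planner` stands in for an expensive model call that refreshes a high-level tactic every `plan_every` ticks, and `fast_policy` is a cheap per-tick executor. All names, the tactic strings, and the replanning schedule are hypothetical; COTA's real interfaces and scheduling are not described in this summary.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Observation:
    """Hypothetical per-tick game observation."""

    enemy_visible: bool
    health: int


def slow_planner(obs: Observation) -> str:
    """Stand-in for the expensive 'deep analysis' system (e.g. an LLM call).

    Returns a high-level tactic that the fast system follows until the next replan.
    """
    return "retreat" if obs.health < 30 else "push"


def fast_policy(obs: Observation, tactic: str) -> str:
    """Stand-in for the cheap action executor that runs on every tick."""
    if obs.enemy_visible:
        return "shoot" if tactic == "push" else "take_cover"
    return "advance" if tactic == "push" else "fall_back"


def run_episode(observations: List[Observation], plan_every: int = 4) -> List[str]:
    """Acts every tick with the fast policy and refreshes the tactic every `plan_every` ticks."""
    tactic = "push"
    actions = []
    for tick, obs in enumerate(observations):
        if tick % plan_every == 0:
            tactic = slow_planner(obs)  # slow path: runs rarely
        actions.append(fast_policy(obs, tactic))  # fast path: never waits on the planner
    return actions
```

The trade-off shows up when health drops between replans: the fast path keeps acting on the stale "push" tactic until the next planning tick, which is the latency-versus-depth balance any dual-system design has to tune.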
Google finds DeepSeek's reasoning splits into multiple personas, growing smarter as they argue with each other
量子位· 2026-01-20 04:17
Core Insights
- The article reports that advanced reasoning models such as DeepSeek-R1 internally "split" into distinct virtual personas while solving problems, in a process resembling a debate among different characters [1][7][13].
- This internal dialogue improves the model's handling of complex tasks, because conflicting perspectives force a more thorough examination of candidate solutions [4][11].
Group 1: AI Internal Dynamics
- The models develop distinct virtual roles, such as creative, critical, and execution-oriented personas, which bring diverse approaches to a problem [8][9].
- The internal discussion intensifies markedly on difficult tasks and subsides on simple ones [4][5].
Group 2: Research Methodology
- Researchers used sparse autoencoders (SAEs) to decode the reasoning process, identifying the internal dialogues from the activation patterns of hidden-layer neurons [14][17].
- They extracted and categorized the model's thought processes during complex reasoning tasks, identifying various logical entities within the model [15][18].
Group 3: Performance Insights
- Dialogue-like behavior appears more often in reasoning models such as DeepSeek-R1 than in standard instruction-tuned models, suggesting a link between conversational dynamics and reasoning accuracy [19].
- Amplifying dialogue features, such as expressions of surprise, roughly doubled accuracy on an arithmetic reasoning task, from 27.1% to 54.8% [21].
Group 4: Training Implications
- Models can learn dialogue-style thinking without explicit training signals, and reinforcement learning improves faster when multi-agent dialogue data is used [24].
- Early in training, models fine-tuned on dialogue data outperformed those fine-tuned on monologue data by more than 10%, a gap that widened to 22% in later stages [24].
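The summary mentions two SAE operations: reading interpretable features out of hidden states, and amplifying a feature (such as "surprise") to change behavior. The sketch below shows the textbook form of both, assuming a standard ReLU sparse autoencoder and steering by adding a scaled, normalized decoder direction to the hidden state. The weights, `feature_idx`, and `alpha` are hypothetical; the study's actual SAE architecture, layer, and steering strength are not given here.

```python
import numpy as np


def sae_encode(h, W_enc, b_enc, b_dec):
    """ReLU SAE encoder: sparse feature activations for hidden state h.

    Shapes: h (d_model,), W_enc (d_model, n_features), b_enc (n_features,), b_dec (d_model,).
    """
    return np.maximum(0.0, (h - b_dec) @ W_enc + b_enc)


def sae_decode(f, W_dec, b_dec):
    """Reconstructs the hidden state from feature activations; W_dec is (n_features, d_model)."""
    return f @ W_dec + b_dec


def steer(h, W_dec, feature_idx, alpha):
    """Pushes h along one feature's (unit-normalized) decoder direction by alpha."""
    direction = W_dec[feature_idx]
    return h + alpha * direction / np.linalg.norm(direction)
```

With tied weights (`W_dec = W_enc.T`), steering feature k by `alpha` raises that feature's pre-activation by `alpha` times the norm of its encoder column, so a positive `alpha` can only increase its activation. Real SAEs are usually trained with untied weights, where the effect is typically similar but not guaranteed.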
Zhipu's new model also uses DeepSeek's MLA and runs on an Apple M5
量子位· 2026-01-20 04:17
Core Viewpoint
- Zhipu AI has launched GLM-4.7-Flash, a lightweight language model that replaces its predecessor GLM-4.5-Flash and is available through a free API.
Group 1: Model Specifications
- GLM-4.7-Flash has 30 billion parameters in total, of which only 3 billion are activated during inference, cutting compute cost while preserving performance [4][10].
- It uses a mixture-of-experts (MoE) architecture and is positioned for local programming and intelligent-assistant tasks [4][9].
- It scored 59.2 on the SWE-bench Verified code-repair benchmark, ahead of comparable models such as Qwen3-30B and GPT-OSS-20B [4].
Group 2: Performance and Applications
- The model is optimized for efficiency while keeping the GLM-4 series' core coding and reasoning capabilities [7].
- Beyond programming, it is recommended for creative writing, translation, long-context tasks, and role-play [8].
- Early tests on an Apple laptop with 32 GB of unified memory reached 43 tokens per second [17].
Group 3: Technical Innovations
- Adopting MLA (Multi-head Latent Attention), an architecture previously validated in DeepSeek-V2, is a notable step for the series [12].
- Its depth is similar to GLM-4.5-Air and Qwen3-30B-A3B, but it uses 64 experts, of which only 5 are activated during inference [13].
Group 4: Market Position and Pricing
- GLM-4.7-Flash is free on the official API platform, with a high-speed version available at low cost [19].
- Compared with similar models, it has advantages in supported context length and output-token pricing, although its latency and throughput still need optimization [19].
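To make "64 experts, only 5 active" concrete, the sketch below shows generic top-k routing for a single token: a linear router scores every expert, only the k highest-scoring experts run, and their outputs are mixed with softmax weights computed over the selected experts. This is an illustrative assumption, not GLM-4.7-Flash's actual router; details such as shared experts, gating normalization, and load balancing are not described in the summary.

```python
import numpy as np

NUM_EXPERTS = 64  # expert count reported in the article
TOP_K = 5         # experts active during inference, per the article


def route(x, W_router, k=TOP_K):
    """Scores all experts for one token and keeps the top k.

    Returns (expert indices ordered by score, mixing weights that sum to 1).
    """
    logits = x @ W_router                        # shape (NUM_EXPERTS,)
    top = np.argsort(logits)[-k:][::-1]          # indices of the k highest scores
    w = np.exp(logits[top] - logits[top].max())  # numerically stable softmax over selected experts
    return top, w / w.sum()


def moe_forward(x, W_router, experts, k=TOP_K):
    """Runs only the selected experts and returns their weighted sum."""
    idx, w = route(x, W_router, k)
    return sum(wi * experts[i](x) for i, wi in zip(idx, w))
```

Only the routed experts' weights are used for each token, which is why a model can hold 30 billion parameters yet activate about 3 billion (3 / 30 = 10%) per forward pass.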
More compute, more revenue! OpenAI is the first to validate a commercial scaling law for AI, with its latest revenue at $20 billion
量子位· 2026-01-20 01:34
Core Viewpoint
- OpenAI's annual recurring revenue (ARR) has grown from $2 billion to $20 billion in two years, a strong growth trajectory despite high operating costs [2][12].
Revenue and Growth
- ARR has reached $20 billion: revenue grew tenfold from 2023 to 2025 while computing power grew 9.5-fold [2][13].
- The article stresses a feedback loop between compute and revenue: more compute drives research and model capability, which raises revenue, which in turn funds further investment [9][12].
Comparison with Competitors
- Compared with Anthropic (Claude's developer), OpenAI's computing power and ARR are far larger; OpenAI's own figures rise from 0.2 GW and $2 billion in 2023 to 1.9 GW and more than $20 billion in 2025 [14][17].
Operational Costs
- Operating costs are substantial: OpenAI spent an estimated $7 billion on compute in 2024, mostly on cloud services from Microsoft [21][22].
- The company is also investing heavily in its own AI data centers, a long-term strategy to control costs and expand capabilities [18][19].
Business Model and Future Plans
- The business model is evolving: alongside subscriptions and API usage, OpenAI is introducing advertising aimed at providing decision support in commercial scenarios [27][30].
- It plans to launch its first hardware product in the second half of 2026, which is expected to feed further into its revenue-compute cycle [33][34].
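A quick back-of-the-envelope check using only the figures reported above shows what the "commercial scaling law" claim amounts to: revenue per gigawatt of compute stays roughly constant at about $10 billion of ARR per GW. This is simple arithmetic on two data points, not a fitted model.

```python
# Figures as reported in the article: compute in GW, ARR in billions of USD.
compute_gw = {2023: 0.2, 2025: 1.9}
arr_busd = {2023: 2.0, 2025: 20.0}

compute_growth = compute_gw[2025] / compute_gw[2023]
revenue_growth = arr_busd[2025] / arr_busd[2023]

# ARR earned per GW of compute in each year.
revenue_per_gw = {year: arr_busd[year] / compute_gw[year] for year in compute_gw}

print(f"compute grew {compute_growth:.1f}x, revenue grew {revenue_growth:.1f}x")
print({year: round(value, 1) for year, value in revenue_per_gw.items()})
```

Two points can only show that the ratio held over this period; they say nothing about whether it will keep holding as compute grows.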
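Pinpointing the "cheating" neural circuit in large models! New research is the first to show how spurious rewards precisely activate memory in layers 18-20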
量子位· 2026-01-20 01:34
Core Insights
- The article examines "spurious rewards" in large language models (LLMs): training with false reward signals can still raise accuracy [1][2].
- It describes a "Perplexity Paradox": perplexity falls on answers but rises on questions, indicating a trade-off between general understanding and memorization of specific answers [3][6].
Group 1: Key Findings
- The team found that RLVR (reinforcement learning with verifiable rewards) driven by spurious rewards activates the model's internal memory shortcuts, so the gains come from more efficient retrieval of contaminated knowledge rather than genuine learning [1][6].
- The critical memory nodes sit in layers 18-20, which act as functional anchors for retrieving memorized answers [10][20].
- Methods including path patching and Jensen-Shannon divergence (JSD) were used to locate the layers responsible for memory retrieval and structural adaptation [9][15].
Group 2: Mechanisms and Dynamics
- Layers 18-20 are where the model decides between a reasoning path and a memory shortcut [23].
- Modeling the continuous evolution of hidden states with Neural ODEs confirmed that the separation force peaks at these critical layers [21].
- By scaling the activation of specific neurons, the team manipulated memory retrieval directly and observed a dose-dependent relationship with retrieval accuracy [25][30].
Group 3: Implications and Future Directions
- The findings provide new tools for evaluating RLVR: apparent improvements may be illusory if they stem from memory-activation circuits [36].
- Internal neural activation patterns offer a new way to detect data contamination, going beyond traditional statistical methods [38].
- The work proposes controllable ways to reduce reliance on contaminated knowledge without retraining the model, opening the way to new reasoning and decontamination techniques [39].
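The "Perplexity Paradox" compares perplexity on the question span and the answer span separately. The sketch below uses the standard definition (exp of the mean negative log-likelihood) and flags the paradox pattern: answer perplexity falls after training while question perplexity rises. The per-token log-probabilities are assumed to come from scoring the same prompt with the model before and after RLVR; `split_perplexity`, `shows_paradox`, and `question_len` are hypothetical names, and the paper's exact measurement setup is not described in the summary.

```python
import math
from typing import Sequence, Tuple


def perplexity(token_logprobs: Sequence[float]) -> float:
    """exp of the mean negative log-likelihood over a span of tokens."""
    if not token_logprobs:
        raise ValueError("empty span")
    return math.exp(-sum(token_logprobs) / len(token_logprobs))


def split_perplexity(token_logprobs: Sequence[float], question_len: int) -> Tuple[float, float]:
    """Perplexity of the question span (first question_len tokens) and of the answer span."""
    q, a = token_logprobs[:question_len], token_logprobs[question_len:]
    return perplexity(q), perplexity(a)


def shows_paradox(before: Sequence[float], after: Sequence[float], question_len: int) -> bool:
    """True if answer perplexity dropped while question perplexity rose after training."""
    q0, a0 = split_perplexity(before, question_len)
    q1, a1 = split_perplexity(after, question_len)
    return a1 < a0 and q1 > q0
```

A model that has memorized answers becomes more confident on the answer tokens without getting any better at modeling the question text, which is why the two spans are scored separately.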
ChatGPT pushes ads through because OpenAI is burning so much cash
量子位· 2026-01-19 09:30
Core Viewpoint
- Facing a financial crisis, OpenAI is introducing advertising in ChatGPT as a way to generate revenue and avoid bankruptcy [7][15][51].
Financial Situation
- Reports suggest OpenAI could run out of funds within 18 months and might be acquired by a larger company such as Microsoft or Amazon [7][15].
- The company raised $40 billion last year, but its spending is climbing steeply: annual cash burn is projected to exceed $8 billion in 2025 and reach $40 billion by 2028 [10][13].
- OpenAI's revenue last year was only $20 billion, leaving a substantial gap relative to its expenditures [15].
- The AI industry as a whole faces an estimated funding shortfall of at least $800 billion, compounding OpenAI's challenges [15][16].
Advertising Strategy
- OpenAI plans to test ads in the free version of ChatGPT, moving from a subscription-based revenue model to one that also includes advertising income [26][28].
- Ads will be labeled as "sponsored content" and, according to OpenAI, will not affect the objectivity of ChatGPT's responses [27][29].
- OpenAI expects advertising to bring in "low billions" of dollars by 2026 and plans to scale this revenue over time [22][28].
Business Model Expansion
- Advertising is part of a broader commercial strategy that also includes subscriptions and usage-based API billing [25][41].
- OpenAI's CFO said the business model should expand in line with the value its intelligence creates [36].
- Future revenue growth is expected from several sources, including subscriptions, API usage, and possible new pricing models as AI technology evolves [41].
User Engagement and Growth
- OpenAI's weekly and daily active users are at all-time highs, driven by a cycle of investment in compute, research, and product development [43][44].
- Computing power grew 9.5-fold from 2023 to 2025, and revenue growth is projected to keep pace [46][55].