Musk in a Rare Climbdown: X's Recommendation Algorithm Open-Sourced, Self-Described as "Dumb," With Monthly Updates Promised
量子位· 2026-01-21 04:09
Core Viewpoint
- Elon Musk has open-sourced X's recommendation algorithm system on GitHub, making it fully visible; the system is primarily driven by AI models [1][2]

Group 1: Algorithm Transparency and Community Reaction
- The open-source release generated significant excitement in the community, with many praising the system's transparency [2]
- Musk acknowledged the algorithm's shortcomings, calling it "dumb" and in need of substantial improvement, but emphasized the importance of transparency in the improvement process [4][5]
- Musk has consistently criticized the platform's previous lack of openness and has now followed through on his promise, made since the acquisition, to publish Twitter's core recommendation algorithm [6][7]

Group 2: Algorithm Mechanism
- The recommendation system is built on a Transformer architecture similar to Grok-1, which learns from users' historical interactions (likes, replies, retweets) to recommend content [9]
- The system begins by identifying the user and their recent activity, aiming to build a "real-time user profile" without pre-set assumptions [12][13]
- Two types of user information are collected: action sequences (direct interest signals) and features (long-term attributes) [14]

Group 3: Content Filtering and Scoring
- The algorithm narrows a vast pool of tweets down to a few thousand potentially relevant candidates, drawing on both familiar and external sources [16][17]
- A Hydration module completes each candidate tweet's information, and a Filtering module eliminates unwanted content [21][22]
- Final scoring is done by the Phoenix ranking model, which predicts a range of user interactions and combines those predictions into a weighted score [25][26]

Group 4: Key Features of the System
- The system is purely data-driven, rejecting hand-written rules and letting AI models learn directly from raw user data [33]
- A candidate-isolation mechanism ensures each piece of content is scored independently [34]
- The algorithm predicts multiple user behaviors rather than producing a single recommendation score [36]
- The modular design supports rapid iteration and development [37]

Group 5: Acknowledgment of Limitations
- Despite praise for the transparency, the algorithm has been criticized for flaws such as the lack of a time-decay mechanism on "block" signals, which can negatively impact account recommendations indefinitely [39][41]
- Musk acknowledged these deficiencies, indicating a commitment to ongoing improvements and updates every four weeks [42][44]
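The weighted-combination scoring step described for the Phoenix ranking model can be sketched in a few lines. This is an illustrative toy, not the actual open-sourced code: the action names, probabilities, and weight values below are all assumed for demonstration.

```python
# Toy sketch of weighted-combination ranking: each candidate tweet gets
# per-action engagement probabilities from prediction heads, and the final
# score is a weighted sum of those predictions. All values are illustrative.

def score_candidate(predictions: dict, weights: dict) -> float:
    """Combine predicted interaction probabilities into one ranking score."""
    return sum(weights[action] * p for action, p in predictions.items())

# Hypothetical prediction-head outputs for two candidate tweets.
candidates = {
    "tweet_a": {"like": 0.30, "reply": 0.05, "retweet": 0.10},
    "tweet_b": {"like": 0.10, "reply": 0.20, "retweet": 0.02},
}
weights = {"like": 1.0, "reply": 2.0, "retweet": 1.5}  # assumed weights

ranked = sorted(candidates,
                key=lambda t: score_candidate(candidates[t], weights),
                reverse=True)
```

Ranking by a weighted sum over several predicted behaviors, rather than a single score, is what makes the multi-behavior prediction noted in Group 4 directly tunable.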
World Model + Reinforcement Learning = Doubled Embodied-AI Performance! New Open-Source Release from Tsinghua & UC Berkeley
量子位· 2026-01-21 04:09
Contributed by the BOOM team. QbitAI | WeChat official account QbitAI

Amid the rapid development of embodied AI, sample efficiency has become the bottleneck preventing agents from moving out of laboratory environments into the complex open world. Unlike purely digital dialogue tasks, embodied tasks typically involve extremely complex perception of physical environments and high-dimensional continuous control outputs. This means agents face an enormous state-action search space, making learning inefficient and hard to converge. Traditional model-free reinforcement learning, lacking any understanding of the underlying physics, relies entirely on massive blind trial-and-error for its learning signal. In the real physical world, however, every interaction carries non-negligible time costs, expensive hardware maintenance, and potential safety risks, which makes demands of hundreds of millions of interactions utterly impractical. Online planning lets an agent optimize its actions by simulating future trajectories before interacting with the environment, significantly improving the sample efficiency of reinforcement learning. To meet this challenge, World Model RL research has emerged. Its core paradigm is to additionally learn a predictive model that captures the environment's intrinsic transition dynamics, giving the agent the ability to self-improve in an imagined space. This mechanism allows the agent to perform large-scale, low-cost trajectory rollouts and policy optimization in latent space, greatly reducing its dependence on environment interaction and accelerating the real-world deployment of embodied robots. In world-model reinforcement ...
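The imagination-based planning loop described above can be illustrated with a deliberately tiny sketch. Everything here is a stand-in: the hand-coded `toy_model` replaces a learned world model, and repeating a single action over a short horizon replaces real latent-space trajectory optimization.

```python
# Minimal world-model planning sketch: before acting in the real environment,
# the agent "imagines" each candidate action's outcome using a transition
# model, and commits only to the most promising one. The dynamics here are
# a hand-coded toy, not the BOOM team's actual learned model.

def imagined_rollout(state, action, model, horizon=3):
    """Accumulate imagined reward by repeating one action for `horizon` steps."""
    total_reward = 0.0
    for _ in range(horizon):
        state, reward = model(state, action)
        total_reward += reward
    return total_reward

def toy_model(state, action):
    # Stand-in for a learned transition model: 1-D world, goal at 0.
    next_state = state + action
    reward = -abs(next_state)      # closer to the goal is better
    return next_state, reward

def plan(state, actions, model):
    # Pick the action whose imagined trajectory accrues the most reward.
    return max(actions, key=lambda a: imagined_rollout(state, a, model))

best = plan(5.0, [-1.0, 0.0, 1.0], toy_model)
```

Because all three rollouts happen inside the model, the agent spends zero real-environment interactions evaluating them, which is exactly the sample-efficiency argument made above.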
The 3 Directions OpenAI Is Most Bullish on for 2026
量子位· 2026-01-21 04:09
Core Insights
- The podcast featuring OpenAI CFO Sarah Friar and investor Vinod Khosla discusses AI trends for 2026, emphasizing the emergence of multi-agent systems and the transformative impact of AI on industries including healthcare and embodied intelligence [1][3][5]

Group 1: AI Trends and Predictions
- 2026 is identified as the year of multi-agent systems, which will mature and have a significant impact on both enterprise and consumer applications [9][10]
- The correlation between computing power and revenue is highlighted: increased investment in computing power leads to enhanced model capabilities and revenue growth [6][20]
- The true measure of an AI bubble is API call volume, not stock-price fluctuations, suggesting the sector is not in a bubble but experiencing genuine productivity gains [7][32][33]

Group 2: Technological Advancements
- Significant improvements in large-model capabilities are expected, including memory, continuous learning, and hallucination suppression [14]
- The gap between technical capabilities and user experience is anticipated to narrow, allowing AI to evolve from simple chatbots into effective task executors [16][17]
- The healthcare sector is projected to undergo revolutionary change, with AI improving doctors' access to research and patient interactions [40][41]

Group 3: Economic Implications
- A large-scale deflationary economic era is predicted within the next decade as AI integration reduces labor costs and the cost of professional knowledge [8][49][50]
- Robots could surpass the automotive industry in market size, particularly in addressing human loneliness and providing companionship [45][46]

Group 4: Business Strategies and Models
- OpenAI is pursuing a multi-faceted transformation: infrastructure diversification, product expansion, and innovative business models such as tiered subscriptions and advertising [27][30][31]
- The company treats computing power as foundational AI infrastructure, citing a strong positive correlation between compute investment and revenue generation [19][21][24]
MiniMax Has Released Its In-House "Intern"!
量子位· 2026-01-20 13:04
Core Insights
- The article discusses the evolution of AI agents, emphasizing that they must deeply integrate into work environments and understand professional contexts in order to become effective long-term partners [3][29]

Group 1: AI Agent Evolution
- Traditional workflows that separate requirements, design, and code are rapidly dissolving [1]
- The new MiniMax AI-native workspace, Agent 2.0, is designed to act as a reliable partner by directly accessing local resources and adhering to established workflows [4][8]
- The update focuses on two core components: a Desktop App for execution and Expert Agents for understanding business context [5][24]

Group 2: Desktop App Functionality
- The Desktop App connects cloud capabilities directly to the local computer, enabling it to read files and perform a wide range of tasks seamlessly [6][7]
- It can autonomously retrieve local resources, eliminating the need for users to manually input information [8]
- A complex task was designed to test the Desktop App's capabilities, requiring it to gather detailed information on 20 products and generate a comprehensive report and presentation [12][22]

Group 3: Expert Agents
- Expert Agents allow private knowledge and experience to be injected into the AI system, enabling it to understand specific business logic [26]
- This approach addresses the limitations of general models in handling highly specialized tasks [25]

Group 4: Long-Term Partnership with Agents
- The ultimate goal is for agents to evolve into long-term partners that deliver results by fully embedding themselves in the work environment [29]
- Key capabilities include continuous memory, the ability to internalize implicit experience, and a keen awareness of the business environment [31][33][35]

Group 5: Real-World Applications
- The article illustrates practical applications of Agent 2.0 across departments, showcasing its ability to generate customized emails, modify website code, and analyze system alerts [36][37][39]
- The release of Agent 2.0 standardizes a high-efficiency production model that has already been implemented successfully within MiniMax [40][41]
Doubao's New Role Revealed: Serving as an "AI Docent" at an International Art Exhibition
量子位· 2026-01-20 10:04
Core Viewpoint
- The article discusses the innovative use of AI, specifically the Doubao model, as an art-exhibition guide, showcasing its advanced capabilities in real-time visual understanding and user interaction [1][38]

Group 1: AI Capabilities
- Doubao, acting as guide, demonstrated the ability to identify and recommend key artworks in a high-density exhibition environment, effectively filtering the important pieces for the user [10][11]
- Its real-time visual perception allows it to continuously understand the images presented during video calls, providing seamless explanations of artworks without additional user input [14][15]
- Doubao can autonomously search for additional information during the interaction, enriching the conversation with deeper insights about the artworks being discussed [20][22]

Group 2: Model Performance
- The Doubao 1.8 model exhibits superior multi-modal processing, significantly improving on previous versions in visual-understanding tasks [24][25]
- In various benchmark tests, Doubao 1.8 outperformed other leading models in reasoning, visual comprehension, and real-time interaction, establishing itself in the top tier of AI models [26][34]
- Its ability to handle complex instructions and maintain logical coherence during dynamic interactions highlights its advanced practical capabilities [36][37]

Group 3: User Experience
- Interaction with Doubao feels natural and human-like, enhancing the exhibition experience with a continuous flow of information and engagement [36][40]
- The AI's role in real-life scenarios, such as guiding users through exhibitions, signals a shift toward more integrated and useful everyday AI applications [39][41]
QbitAI Is Hiring Editors and Writers
量子位· 2026-01-20 04:17
Core Viewpoint
- The article highlights the ongoing AI boom and invites readers to join QbitAI (量子位), which tracks AI advancements and has established itself as a leading content platform in the industry [1]

Group 1: Job Opportunities
- The company is hiring in three main directions: AI Industry, AI Finance, and AI Product, with positions open to both experienced professionals and fresh graduates [2][4]
- Roles span various levels, including editors, lead writers, and chief editors, with a focus on matching roles to individual capabilities [6]

Group 2: Job Responsibilities
- AI Industry: tracking infrastructure innovations such as chips, AI infrastructure, and cloud computing, and producing accessible reports on technical conferences and papers [6][7]
- AI Finance: covering venture capital and financial reports, analyzing capital movements within the AI industry, and interviewing investors and entrepreneurs [11]
- AI Product: monitoring AI applications and hardware developments, writing in-depth product evaluations, and engaging with product experts [11]

Group 3: Benefits and Work Environment
- Employees can expect a vibrant team atmosphere, opportunities to build personal influence through original content creation, and professional mentorship from senior editors [6][11]
- The company offers competitive salaries and comprehensive benefits, including social insurance, meal allowances, and performance bonuses [6]

Group 4: Company Growth and Reach
- By 2025, QbitAI aims to have over 2.4 million WeChat subscribers and more than 7 million users across platforms, with daily readership exceeding 2 million [12]
- Third-party data platforms rank it as the top new-media outlet in the AI and frontier-technology sectors [12]
From "Usable" to "Delightful": Three Dimensions of Data Visualization — Are You Still on the First Level? Renmin University Proposes a New Way to Create Charts
量子位· 2026-01-20 04:17
Core Insights
- The article discusses the evolution of data visualization from merely creating charts to addressing deeper challenges such as enhancing visual appeal and storytelling through dynamic data representation [2][9]
- It highlights the need for tools that streamline the creation of visually engaging, interactive data presentations, moving beyond traditional methods that are labor-intensive and hard to reuse [10][12]

Group 1: Challenges in Data Visualization
- The first challenge is creating visually appealing data representations without excessive manual effort, which often means time-consuming work in design software [2][3][4]
- The second challenge is animating data visualizations, where coding complexity and the limited flexibility of templates deter users from implementing dynamic features [5][6]
- The third challenge is the repetitive work of implementing interactive features across different visualization types, which often requires starting from scratch with each new project [7][8]

Group 2: Proposed Solutions
- The IDEAS Lab team has developed three key projects: PiCCL for enhancing static chart creation, CAST for simplifying animation, and Libra for improving interactive capabilities [11][12][13]
- PiCCL redefines static chart creation around graphic operations and constraints, allowing for more efficient and reusable designs [20][21][23]
- CAST introduces a declarative animation model that emphasizes data-driven timing structures, making complex animations possible without extensive coding [28][35][36]

Group 3: Enhancements in Interactivity
- Libra treats interactivity as a first-class citizen, decomposing it into reusable components so that complex interactions need not be built from scratch [39][45]
- The system supports features like undo/redo and provides a structured approach to managing interactions, making them easier to implement and maintain [42][43]
- By combining PiCCL, CAST, and Libra, future data visualization tools are expected to become more efficient and user-friendly, potentially using large models for enhanced visualization generation [44]
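As a rough illustration of packaging an interaction capability as a reusable component, here is a minimal undo/redo history manager of the kind a visualization toolkit could attach to any chart. This sketches the general idea only; it does not reproduce Libra's actual API, and every name in it is hypothetical.

```python
# Hypothetical reusable undo/redo component: chart states are immutable
# values, and each interaction is a function from one state to the next.
# Any visualization could reuse this without reimplementing history logic.

class InteractionHistory:
    def __init__(self):
        self._undo, self._redo = [], []

    def apply(self, state, action):
        """Run an interaction, remembering the prior state for undo."""
        self._undo.append(state)
        self._redo.clear()          # a new action invalidates the redo stack
        return action(state)

    def undo(self, state):
        if not self._undo:
            return state
        self._redo.append(state)
        return self._undo.pop()

    def redo(self, state):
        if not self._redo:
            return state
        self._undo.append(state)
        return self._redo.pop()

history = InteractionHistory()
s0 = ()                                          # nothing selected
s1 = history.apply(s0, lambda s: s + ("bar_3",))  # hypothetical select action
s2 = history.undo(s1)                             # back to empty selection
s3 = history.redo(s2)                             # selection restored
```

Keeping states immutable makes undo a pure stack pop, which is one way a toolkit can offer undo/redo "for free" across otherwise unrelated interaction types.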
The First Truly "Usable" LLM Game Agent Is Here: Real-Time High-Frequency Decisions, With Its Chain of Thought Visible Throughout
量子位· 2026-01-20 04:17
Core Viewpoint
- The article discusses the emergence of AI in the gaming industry, highlighting the capabilities of COTA, a new AI agent developed by Chao Can Shu Technology that demonstrates advanced decision-making and operational skill in game environments [1][6][55]

Group 1: AI in Gaming
- A mysterious gaming account named "快递员" has gained significant attention for its impressive performance in League of Legends, raising questions about the role of AI in gaming [2][4]
- The gaming industry is increasingly focused on AI, with various companies exploring the technology to enhance gaming experiences [6][7]
- Chao Can Shu Technology has successfully commercialized AI agents across multiple game genres, showcasing its expertise in the field [8][9]

Group 2: COTA's Features and Performance
- COTA is a versatile gaming agent capable of cognitive reasoning, operational execution, tactical planning, and assistance, all powered by a large model [9][10]
- In a first-person-shooter (FPS) demo, the agent demonstrated professional-level performance, making rapid decisions in high-stakes situations [12][13]
- COTA's design allows it to perform complex actions fluidly, simulating human-like gameplay while maintaining a high level of strategy and decision-making [28][34]

Group 3: Technical Innovations
- COTA employs a dual-system architecture that separates fast action execution from deep analysis, mimicking human cognitive processes [40][41]
- It is built on the Qwen3-VL-8B-Thinking base model, balancing performance and efficiency to meet the demands of real-time gaming [39]
- Its training pipeline includes supervised fine-tuning, self-play for strategy optimization, and alignment with human preferences, enhancing gameplay realism [50][51][52]

Group 4: Industry Implications
- COTA represents a significant advance in AI gaming technology, indicating a shift from experimental models to practical applications in the gaming industry [55][56]
- Its success suggests a broader trend of AI agents becoming integral to player experience and game design [57][59]
- COTA's potential applications extend beyond gaming, offering insights into solving complex real-world problems through its innovative architecture [72][76]
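The fast/slow split in a dual-system architecture like COTA's can be caricatured as a reactive loop that runs every frame while a slower deliberation pass refreshes the plan only occasionally. This sketch is entirely hypothetical: the observation fields, plans, and action names are invented for illustration, and the `deliberate` function stands in for the slow LLM reasoning pass.

```python
# Illustrative dual-system control loop: a cheap fast path maps the current
# plan to a concrete action every frame, while an expensive slow path
# (standing in for LLM deliberation) revises the plan only every N frames.

def deliberate(observation):
    # Slow path stand-in: pick a high-level goal from the situation.
    return "retreat" if observation["health"] < 30 else "push"

def react(observation, plan):
    # Fast path: translate the standing plan into this frame's action.
    if plan == "retreat":
        return "move_to_cover"
    return "fire" if observation["enemy_visible"] else "advance"

def run(frames, deliberate_every=10):
    plan, actions = "push", []
    for t, obs in enumerate(frames):
        if t % deliberate_every == 0:
            plan = deliberate(obs)        # slow, run sparsely
        actions.append(react(obs, plan))  # fast, run every frame
    return actions

frames = [{"health": 100, "enemy_visible": True},
          {"health": 20, "enemy_visible": False}]
actions = run(frames, deliberate_every=1)
```

Decoupling the two loops is what lets the fast path meet real-time frame budgets even when the deliberation pass is orders of magnitude slower.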
New Google Finding: DeepSeek's Reasoning Splits Into Multiple Personas, Getting Smarter as Its "Left and Right Brains" Spar
量子位· 2026-01-20 04:17
Core Insights
- The article discusses how advanced AI models like DeepSeek-R1 internally "split" into different virtual personas during problem-solving, resembling a debate or discussion among various character types [1][7][13]
- This internal dialogue enhances the model's ability to tackle complex tasks, as the conflict of perspectives leads to a more comprehensive examination of solutions [4][11]

Group 1: AI Internal Dynamics
- AI models develop distinct virtual roles, such as creative, critical, and execution-oriented personas, which contribute diverse problem-solving approaches [8][9]
- Internal discussion intensifies significantly on challenging tasks and recedes on simpler ones [4][5]

Group 2: Research Methodology
- Researchers used sparse autoencoders (SAEs) to decode the AI's reasoning process, successfully identifying internal dialogues by analyzing the activation patterns of hidden-layer neurons [14][17]
- The study extracted and categorized the AI's thought processes during complex reasoning tasks, leading to the identification of various logical entities within the model [15][18]

Group 3: Performance Insights
- Dialogue-driven behavior occurs far more frequently in reasoning models like DeepSeek-R1 than in standard instruction models, indicating a correlation between conversational dynamics and reasoning accuracy [19]
- Enhancing dialogue features, such as emphasizing expressions of surprise, significantly improved accuracy on arithmetic reasoning tasks, doubling the success rate from 27.1% to 54.8% [21]

Group 4: Training Implications
- Models can learn to adopt dialogue-based thinking without explicit training signals, and reinforcement learning improves faster when using multi-agent dialogue data [24]
- In early training stages, models fine-tuned on dialogue data outperformed those trained on monologue data by over 10%, with the gap widening to 22% in later stages [24]
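A sparse autoencoder of the kind used in this line of work encodes a hidden-state vector into a much wider, mostly-zero feature vector whose few active entries can then be inspected and labeled. The pure-Python sketch below uses random stand-in weights and toy dimensions; a real SAE's encoder is trained to reconstruct activations under a sparsity penalty, and its feature count far exceeds 16.

```python
import random

# Toy SAE encoding step: project a hidden activation into an overcomplete
# feature basis, apply ReLU, and read off the strongest features. Weights
# here are random placeholders, not a trained autoencoder.

random.seed(0)
D_MODEL, D_FEAT = 6, 16   # toy sizes; real SAEs are vastly larger
W_ENC = [[random.gauss(0, 1) for _ in range(D_FEAT)] for _ in range(D_MODEL)]
B_ENC = [random.gauss(0, 1) for _ in range(D_FEAT)]

def sae_features(activation, top_k=3):
    """Encode one hidden-state vector into sparse, nonnegative features."""
    feats = []
    for j in range(D_FEAT):
        pre = B_ENC[j] + sum(activation[i] * W_ENC[i][j] for i in range(D_MODEL))
        feats.append(max(pre, 0.0))   # ReLU: most features stay exactly zero
    ranked = sorted(range(D_FEAT), key=lambda j: feats[j], reverse=True)
    return [(j, feats[j]) for j in ranked[:top_k] if feats[j] > 0]

top = sae_features([random.gauss(0, 1) for _ in range(D_MODEL)])
```

In the study's setting, the interpretable step is mapping indices like `j` back to human-readable descriptions (e.g. a "critical persona" feature) by looking at which inputs make them fire.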
Zhipu's New Model Also Uses DeepSeek's MLA, and Runs on an Apple M5
量子位· 2026-01-20 04:17
Core Viewpoint
- The article discusses the launch of GLM-4.7-Flash, Zhipu AI's new lightweight language model, which aims to replace its predecessor GLM-4.5-Flash and is available via free API access.

Group 1: Model Specifications
- GLM-4.7-Flash has 30 billion total parameters, with only 3 billion activated during inference, significantly reducing computational cost while maintaining performance [4][10]
- The model uses a mixture-of-experts (MoE) architecture, positioned specifically for local programming and intelligent-assistant tasks [4][9]
- It scored 59.2 on the SWE-bench Verified code-repair test, outperforming similar models such as Qwen3-30B and GPT-OSS-20B [4]

Group 2: Performance and Applications
- The model is optimized for efficiency and retains the GLM-4 series' core coding and reasoning capabilities [7]
- Beyond programming, GLM-4.7-Flash is recommended for creative writing, translation, long-context tasks, and role-playing scenarios [8]
- Initial tests on an Apple laptop with 32 GB of unified memory reached 43 tokens per second [17]

Group 3: Technical Innovations
- The adoption of the MLA (Multi-head Latent Attention) architecture, previously validated by DeepSeek-V2, marks a significant advance [12]
- The model's depth is similar to GLM-4.5 Air and Qwen3-30B-A3B, but it uses 64 experts and activates only 5 during inference [13]

Group 4: Market Position and Pricing
- GLM-4.7-Flash is offered free on the official API platform, with a high-speed version available at low cost [19]
- Compared with similar models, it has advantages in context-length support and output-token pricing, although latency and throughput require further optimization [19]
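The 64-experts-activate-5 design described above is an instance of sparse top-k MoE routing, which can be sketched in isolation. The router scores below are random placeholders, not GLM-4.7-Flash's actual gating network, and a real router would also apply a softmax and per-expert load balancing.

```python
import random

# Minimal sparse MoE routing sketch: per token, score all experts, keep the
# top-k, renormalize their gates, and run only those k experts' FFNs. This
# is why a 30B-parameter model can activate only ~3B parameters per token.

def route(router_scores, top_k=5):
    """Select the top-k experts and renormalize their gate weights to sum to 1."""
    ranked = sorted(range(len(router_scores)),
                    key=router_scores.__getitem__, reverse=True)
    chosen = ranked[:top_k]
    total = sum(router_scores[i] for i in chosen)
    return {i: router_scores[i] / total for i in chosen}

random.seed(0)
scores = [random.random() for _ in range(64)]  # placeholder router outputs
gates = route(scores)                          # 5 of 64 experts selected
```

The token's output is then the gate-weighted sum of the five chosen experts' outputs; the other 59 experts contribute no compute for that token.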