The open-source coding model crown has changed hands, and who would have guessed the new SOTA comes from Kuaishou
量子位· 2025-10-11 06:04
Core Insights
- Kuaishou's KAT-Dev-72B-Exp has emerged as the leading open-source programming model, scoring 74.6% on the SWE-Bench Verified leaderboard [1][4].

Group 1: Model Performance
- KAT-Dev-72B-Exp is the experimental reinforcement-learning version of the KAT-Coder model. KAT-Coder itself has also outperformed GPT-5 (non-Codex mode) and Claude 4 Sonnet on SWE-Bench Verified [3][4].
- KAT-Coder can recreate a complete version of the game "Fruit Ninja" in a web environment, including the scoring and life systems [6].

Group 2: Visualization and Interaction
- The model excels at visualizing physical laws through code. Examples include a cyberpunk clock that triggers explosion effects and a solar system simulation built with three.js [10][13].
- KAT-Coder can generate interactive effects and animations that follow real physical principles, such as a simulation of a 60-story building collapsing [15].

Group 3: Key Technologies
- KAT-Coder is trained in multiple phases, including mid-training, supervised fine-tuning (SFT), and reinforcement fine-tuning (RFT), which leads to emergent behaviors in the model [17][25].
- After reinforcement learning, the number of interactions the model needs to complete a task fell by 32%, indicating improved efficiency [26].

Group 4: Industrial-Grade Framework
- SeamlessFlow, Kuaishou's self-developed industrial-grade reinforcement learning framework, supports complex scenarios such as multi-agent and online reinforcement learning [28][29].
- Compared with the mainstream VERL framework, SeamlessFlow shows a 100% throughput improvement on single-round RL tasks and a 62% reduction in overall training time [35].

Group 5: Training Optimization
- The Trie Packing mechanism, together with a restructured training engine, lets KAT-Dev-72B-Exp train efficiently on trajectories that share prefixes, for an average 2.5x speedup [37].
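The shared-prefix idea behind Trie Packing can be illustrated with a small counting sketch. This is not Kuaishou's implementation, which the summary does not describe. It only assumes that trajectories are token-ID lists and that tokens on a shared prefix need to be processed once. The function names `build_trie`, `count_nodes`, and `packing_savings` are hypothetical.

```python
def build_trie(trajectories):
    """Merge token sequences into a nested-dict trie keyed by token ID."""
    root = {}
    for tokens in trajectories:
        node = root
        for tok in tokens:
            # Reuse the existing child when the prefix is already present.
            node = node.setdefault(tok, {})
    return root


def count_nodes(node):
    """Count trie nodes, i.e. tokens that must be processed after merging."""
    return sum(1 + count_nodes(child) for child in node.values())


def packing_savings(trajectories):
    """Return (tokens processed without sharing, tokens processed with sharing)."""
    naive = sum(len(t) for t in trajectories)
    packed = count_nodes(build_trie(trajectories))
    return naive, packed


if __name__ == "__main__":
    # Three rollouts that branch from a common prefix [1, 2].
    rollouts = [[1, 2, 3, 4], [1, 2, 3, 5], [1, 2, 6]]
    print(packing_savings(rollouts))
```

The more the sampled rollouts overlap, the larger the gap between the two counts, which is the kind of saving a prefix-sharing training engine can exploit.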
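Tencent open-sources a new reinforcement learning algorithm! Agents "teach themselves" without expert demonstrations, with plug-and-play, zero-cost integration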
量子位· 2025-10-11 06:04
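Submitted by the Youtu-Agent team | QbitAI (official account: QbitAI)

Let agents explore new methods on their own while also imitating their own successful experience.

Tencent Youtu Lab has open-sourced a reinforcement learning algorithm, SPEAR (Self-imitation with Progressive Exploration for Agentic Reinforcement Learning). The pitch: let AI teach itself!

For the first time, the algorithm lets agents driven by large language models (LLMs) achieve an entropy-stable learning process through "self-imitation + progressive exploration", without needing large amounts of expert demonstrations.

On benchmarks such as ALFWorld, WebShop, and AIME24/25, it improves results by more than 16% on average, setting new state-of-the-art results and offering a plug-and-play paradigm for training agents in long-horizon, sparse-reward settings.

△ Diagram of the core concepts of the SPEAR algorithm

Put simply, SPEAR can both boldly try new methods and reliably use strategies that have already proven effective, without going to either extreme. Details below.

What is traditional self-imitation learning?

Imagine a novice chef: Self-Imitation Learning (SIL) brings this "only copy your own best homework" approach into reinforcement learning:

Self-imitation 2.0: learning from the "brilliant moves" you produce yourself

Entropy control: putting an end to collapse ...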
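As a rough illustration of the "only copy your own best homework" idea, here is a minimal sketch of a classic self-imitation buffer. It is not SPEAR itself, whose details are cut off above. It assumes the standard SIL recipe: keep the agent's highest-return trajectories and imitate them only when the return exceeds the current value estimate. The names `SelfImitationBuffer` and `sil_weight` are hypothetical.

```python
import heapq


class SelfImitationBuffer:
    """Keep only the `capacity` highest-return trajectories seen so far."""

    def __init__(self, capacity):
        self.capacity = capacity
        self._heap = []    # min-heap of (return, insertion_id, trajectory)
        self._next_id = 0  # tie-breaker so trajectories are never compared

    def add(self, trajectory, episode_return):
        item = (episode_return, self._next_id, trajectory)
        self._next_id += 1
        if len(self._heap) < self.capacity:
            heapq.heappush(self._heap, item)
        elif episode_return > self._heap[0][0]:
            # Evict the current worst trajectory in favour of a better one.
            heapq.heapreplace(self._heap, item)

    def best(self):
        """Return (return, trajectory) pairs, best first."""
        return [(ret, traj) for ret, _, traj in sorted(self._heap, reverse=True)]


def sil_weight(episode_return, value_estimate):
    """Clipped advantage: imitate only when the outcome beat expectations."""
    return max(episode_return - value_estimate, 0.0)


if __name__ == "__main__":
    buf = SelfImitationBuffer(capacity=2)
    for name, ret in [("a", 1.0), ("b", 3.0), ("c", 2.0), ("d", 0.5)]:
        buf.add(name, ret)
    print(buf.best())
    print(sil_weight(3.0, 1.0), sil_weight(0.5, 1.0))
```

The clipping in `sil_weight` is what makes the imitation one-sided: trajectories that did worse than expected contribute nothing to the imitation term.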
Lossless compression that beats ZIP is here! University of Washington turns large models into lossless text compressors
量子位· 2025-10-11 04:09
Core Insights
- The article discusses the data storage challenge created by the massive volume of data that large language models (LLMs) generate, and introduces LLMc, a solution that uses LLMs themselves for lossless text compression [1][2].

Group 1: LLMc Overview
- LLMc achieves better compression rates than traditional tools such as ZIP and LZMA across several datasets, including Wikipedia, novels, and scientific abstracts [2].
- The project is open source. Its main author is Yi Pan, an undergraduate at Shanghai Jiao Tong University who is currently interning at the University of Washington [4].

Group 2: Compression Mechanism
- LLMc grew out of a challenge: LLM inference is non-deterministic, which makes precise, reproducible compression and decompression difficult [5].
- The link between LLMs and data compression comes from Shannon's source coding theorem, under which the optimal code length for a symbol equals its negative log-probability (about −log2 p bits for a symbol of probability p) [6].
- LLMs are powerful probability predictors. When a model assigns high probability to the token that actually comes next, that token can be encoded in very few bits, so the model's high-dimensional predictive distribution becomes structured probability information that a compressor can exploit [7].

Group 3: Rank-Based Encoding
- LLMc uses "rank-based encoding": instead of storing the token itself, it stores the token's rank in the model's predicted probability distribution [8][10].
- During decompression, the same LLM and the same context reproduce the same probability distribution, so the system can select the correct token from its stored rank [10][11].

Group 4: Challenges and Limitations
- The research team identified several challenges in the current version of LLMc, including computational complexity that scales quadratically with sequence length and memory bandwidth limits on long sequences [12].
- LLMc is currently much slower than traditional compression algorithms because it relies on large-scale model inference [13].
- The implementation focuses mainly on natural language. Extending it to other modalities such as images, video, or binary data is left for future work [14].
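Rank-based encoding can be sketched with a toy stand-in for the LLM. The example below assumes a deterministic character-level bigram model instead of a real language model, and it is not LLMc's code. `ALPHABET`, `train_bigram`, `ranked_candidates`, `encode`, and `decode` are hypothetical names. The point it shows is that encoder and decoder must rank candidates identically, which is why non-deterministic inference is a problem.

```python
from collections import Counter, defaultdict

# Fixed symbol set shared by encoder and decoder (an assumption of this toy).
ALPHABET = sorted(set("abcdefghijklmnopqrstuvwxyz ."))


def train_bigram(corpus):
    """Count how often each character follows each other character."""
    counts = defaultdict(Counter)
    for prev, cur in zip(corpus, corpus[1:]):
        counts[prev][cur] += 1
    return counts


def ranked_candidates(model, prev):
    """Most likely next character first; ties broken alphabetically."""
    seen = model.get(prev, Counter())
    return sorted(ALPHABET, key=lambda ch: (-seen[ch], ch))


def encode(model, text):
    """Replace each character by its rank under the model's prediction."""
    ranks, prev = [], ""
    for ch in text:
        ranks.append(ranked_candidates(model, prev).index(ch))
        prev = ch
    return ranks


def decode(model, ranks):
    """Rebuild the text by replaying the same predictions and picking by rank."""
    out, prev = [], ""
    for r in ranks:
        ch = ranked_candidates(model, prev)[r]
        out.append(ch)
        prev = ch
    return "".join(out)


if __name__ == "__main__":
    model = train_bigram("the cat sat on the mat. the hat sat on the cat.")
    ranks = encode(model, "the cat sat.")
    print(ranks)
    print(decode(model, ranks))
```

When the model predicts well, most ranks are small integers, and a stream dominated by small integers is easy for a conventional entropy coder to compress further.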
The 2025 AI annual awards are open! 3 dimensions, 5 award categories, in search of the leaders of the AI+ era
量子位· 2025-10-11 04:09
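Let us witness the stars of the year together and light the way forward.

From the organizing committee, Aofeisi | QbitAI (official account: QbitAI)

To let more practitioners feel the leap of the intelligence wave, and to offer applause and encouragement to more peers on the same road, we are officially opening applications for the "2025 AI Annual Rankings".

This is the 8th year of QbitAI's annual AI rankings. Over these eight years we have witnessed technological breakthroughs and real-world deployment, the convergence and reshaping of industries, and wave after wave of companies, people, and products that have driven the era forward.

In an era in which AI is redefining everything, intelligent technology is no longer a single tool but a driving force for the co-evolution of industry and society. Through this annual selection, we hope to find and honor the explorers and practitioners who truly lead change and push boundaries.

The selection covers three dimensions (companies, products, and people) with five award categories. Companies are welcome to apply!

Company list, product list, people list

Aimed at China's AI sector, this award selects the companies with the strongest overall capabilities. Eligibility:

2025 AI Focus Person of the Year

Detailed selection criteria and application instructions follow.

2025 AI Leading Company of the Year

- 2025 AI Leading Company of the Year
- 2025 AI Promising Startup of the Year
- 2025 AI Outstanding Product of the Year
- 2025 AI Outstanding Solution of the Year

Selection criteria:

2025 AI Promising Startup of the Year

Focused on China's ...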
Terence Tao takes on a cross-field challenge with GPT-5 Pro! A problem unsolved for 3 years gets a complete proof in 11 minutes
量子位· 2025-10-11 04:09
Core Insights
- Working with GPT-5 Pro, Terence Tao made progress on a differential geometry problem that had gone unsolved for three years, showcasing the potential of AI in academic research [1][4][22].

Group 1: Problem Solving Process
- The original problem asks whether a smooth topological sphere in three-dimensional space whose principal curvatures are bounded in absolute value by 1 must enclose a volume at least as large as that of the unit ball [8].
- Tao's initial approach was to restrict the problem to star-shaped regions and use integral inequalities, but he turned to AI for help with the complex calculations [9].
- GPT-5 Pro finished the calculations in 11 minutes and 18 seconds and produced a complete proof of the star-shaped case, using various inequalities and identities, some of which Tao already knew [10].

Group 2: AI's Role and Limitations
- The AI made minor errors when estimating a nonlinear perturbation term, but it also identified a special case that reduces to the star-shaped result [17].
- The AI was effective on small-scale problems and contributed useful ideas from the literature that Tao had not known about [23].
- On medium-scale strategy, however, the AI reinforced Tao's incorrect intuition instead of questioning it, which points to a limitation in critical analysis [26][27].

Group 3: Insights on AI in Research
- Tao reflected on the value of AI tools at multiple scales and stressed that human oversight is needed to stay aware of how a task is structured at each scale [36].
- He suggested that the optimal level of automation lies between 0% and 100%: enough human involvement to handle local issues, and enough automation to remove repetitive work [36].
- The experience reinforced Tao's earlier view that a tool's effectiveness has to be evaluated at multiple scales [33].

Group 4: Historical Context of AI Collaboration
- Tao began exploring AI's potential in mathematics three years ago, when ChatGPT was released. The initial interactions were disappointing [41].
- The turning point came with GPT-4, which handled statistical data and familiar mathematical problems efficiently and raised expectations for integrating AI into research tools [43].
- By July, following OpenAI's achievements, Tao began tackling more complex mathematical problems with AI. He found it particularly useful for numerical searches, which saved considerable time [52].

Group 5: Future Implications
- Tao concluded that AI is reshaping the scientific paradigm and serves as a "co-pilot" for mathematicians, not a replacement for human creativity and intuition [54].
- Collaboration with AI is expected to make mathematics more experimental, moving beyond purely theoretical work [55].
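For reference, the problem described in Group 1 can be written out as follows. This is a paraphrase of the summary's wording, not Tao's exact formulation. The symbols $\Sigma$, $\Omega$, $\kappa_1$, and $\kappa_2$ are notation introduced here for illustration.

```latex
Let $\Sigma \subset \mathbb{R}^3$ be a smooth embedded topological sphere
bounding a region $\Omega$, with principal curvatures $\kappa_1, \kappa_2$
satisfying
\[
  |\kappa_1| \le 1, \qquad |\kappa_2| \le 1
  \quad \text{everywhere on } \Sigma .
\]
Question: does it follow that
\[
  \operatorname{vol}(\Omega) \;\ge\; \frac{4\pi}{3},
\]
the volume of the unit ball?
```

The unit sphere itself has $\kappa_1 = \kappa_2 = 1$ and encloses exactly $4\pi/3$, so it is the natural candidate for the extremal case.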
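Dexterous hands can now unscrew bottle caps for your girlfriend! New results from Tongji, Tsinghua, Shanghai Jiao Tong, and others | CoRL 2025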
量子位· 2025-10-11 04:09
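CoRL 2025 submission | QbitAI (official account: QbitAI)

Dexterous hands have gained a new skill: unscrewing bottle caps for your girlfriend! They can also squeeze toothpaste and plug in a charger.

As a result, the 星动纪元 dexterous hand 星动XHAND 1 has unlocked a range of complex, fine-grained manipulation skills.

Across nine complex tasks, including tightening bottle caps, squeezing toothpaste, and pressing a syringe, KineDex reaches an average success rate of 74.4%, and its data collection is more than twice as efficient as teleoperation.

The paper has been accepted to CoRL 2025.

Truly hands-on guidance for learning dexterous manipulation

Today, the main obstacle to robots learning fine manipulation (especially tasks that require precise force control) is the lack of high-quality demonstration data.

There have been two mainstream approaches: teleoperation and learning from video. In the former, the operator has no real sense of touch, so collection is inefficient and prone to failure. The latter imitates videos of humans, but human hands and dexterous hands differ, the motions do not match, and there is again no tactile information.

Research teams from Tongji University, Tsinghua, Shanghai Jiao Tong University, the University of Hong Kong, and other institutions propose the KineDex framework, a new demonstration and policy-learning method for dexterous manipulation tasks:

Truly hands-on guidance transfers human motion directly to the dexterous hand while collecting high-fidelity tactile information in sync.

Next comes data processing. The collected data cannot be used directly for visuomotor policy learning, because the camera inevitably captures the operator's hand, which interferes with the robot's learning, and when the robot later operates on its own there is no human hand ...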
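Cook snatches a prize from the tiger's mouth: Apple takes the AI company founded by Peking University alumni that Musk had his eye on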
量子位· 2025-10-11 04:09
Core Insights
- Apple is in advanced negotiations to acquire the core team and technology assets of Prompt AI, a startup focused on computer vision (CV) [3][21].
- The deal does not cover the entire company. It is a strategic move to strengthen Apple's capabilities in smart home technology and visual recognition [17][21].

Company Overview
- Prompt AI was founded in 2023 and specializes in visual intelligence for consumer applications [2][7].
- The company is based in San Francisco and has a small team of approximately 11 members [10][8].
- Its flagship product, Seemour, is an AI system that makes home cameras smarter through spatial understanding and ambient AI capabilities [12][11].

Product Details
- Seemour can recognize people, pets, and objects, alert users to unusual behavior, and describe scenes in natural language [12][13].
- The company initially raised about $5 million in seed funding [15].
- Prompt AI operates on a "freemium" model, offering basic features for free and charging for advanced functionality [16].

Acquisition Context
- The acquisition reflects a tech industry trend toward "acqui-hire" deals, in which companies quickly acquire talent and technology instead of taking over entire businesses [17][19].
- Apple has a history of making small, discreet acquisitions in preference to large-scale ones [20].

Implications for Employees and Investors
- Employees who do not join Apple will receive lower compensation, and investors may receive partial returns but not their full investment back [24][26].
- The Seemour service will be discontinued because its business model was unsuccessful, and user data will be deleted with privacy protections in place [27][28].
"At this stage, all that's missing is data": Figure 03 lands the cover of TIME's Best Inventions list, and the CEO speaks out
量子位· 2025-10-11 04:09
Core Viewpoint
- Figure CEO Brett Adcock emphasizes that data is crucial to advancing humanoid robots and says it can solve almost all of the technology's current problems [2][9][10].

Group 1: Company Developments
- Figure recently launched its third-generation robot, Figure 03, which has drawn significant attention but reportedly has major issues that keep it from being suitable for daily tasks [1].
- The company aims to design humanoid robots that can perform a wide range of everyday tasks, such as household chores [7][12].
- Figure is focusing on the safety of its robots, addressing both physical and cybersecurity concerns as it plans to bring them into homes [13][14].

Group 2: Market Potential
- Adcock believes global demand for low-cost humanoid robots could reach nearly 10 billion units, and he envisions a future in which humanoid robots outnumber humans in certain areas [15][16].
- The company has attracted significant investment, including a recent $1 billion funding round with participation from Salesforce, a sign of strong market interest and growth potential [23].

Group 3: Technological Challenges
- The current limitations of Figure's robots are attributed to a lack of data, which affects their performance on complex tasks [6][10].
- Adcock acknowledges that the robots still make occasional errors even as more data improves them, but says the error rate is falling significantly [10].
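Cracking the MoE "the larger the scale, the lower the efficiency" dilemma! The Institute of Automation, Chinese Academy of Sciences proposes a new framework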
量子位· 2025-10-11 01:15
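Submitted by the team at the Institute of Automation, Chinese Academy of Sciences | QbitAI (official account: QbitAI)

Large-model parameter counts are soaring to the hundreds of billions and trillions, only to run into the "the larger the scale, the lower the efficiency" dilemma?

New research from the Institute of Automation, Chinese Academy of Sciences offers a way out: for the first time, MoE experts leave "static isolation" behind and begin dynamic "team learning".

Specifically, MoE is the core route by which large language models (LLMs) expand their parameter count while compute cost grows only linearly. Yet it has long been stuck in a "trilemma" of load imbalance, parameter redundancy, and communication overhead, which has become the main bottleneck in deploying large models.

By dynamically regrouping expert clusters, the research team cuts the model's total parameter count by 80%, reduces load variance to one third of its original value, and brings memory consumption close to that of lightweight conventional dense models. This optimizes communication latency, load balance, and memory footprint at the same time, and offers a new path to low-cost deployment of large-parameter LLMs.

Details below.

A unified framework that targets how MoE operates at its core

As LLM parameter counts keep growing, a core challenge has emerged: model scale and compute efficiency are hard to advance together. Mixture-of-experts (MoE), a sparsely activated architecture, offers a theoretically very attractive technical route for continued scaling.

For example, a load-balancing loss function is a passive compensation mechanism. Parameter compression techniques (such as MoE-Lite) reduce the number of parameters but treat experts as independent entities, overlooking their ...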
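To make "load variance" and "load-balancing loss" concrete, here is a small pure-Python sketch. It does not reproduce the framework proposed in the paper. It assumes top-1 routing and uses the widely used Switch Transformer style auxiliary loss (number of experts times the sum over experts of token fraction times mean router probability) as an example of the passive compensation the text mentions. The function names `softmax` and `routing_stats` are hypothetical.

```python
import math
from statistics import pvariance


def softmax(logits):
    """Numerically stable softmax over one token's router logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def routing_stats(router_logits):
    """router_logits: one list of per-expert logits per token (top-1 routing).

    Returns (load fractions, variance of the load, auxiliary balance loss).
    """
    n_tokens = len(router_logits)
    n_experts = len(router_logits[0])
    probs = [softmax(row) for row in router_logits]

    # load[i]: fraction of tokens whose top-1 expert is i.
    load = [0.0] * n_experts
    for p in probs:
        load[p.index(max(p))] += 1.0 / n_tokens

    # mean_prob[i]: average router probability given to expert i.
    mean_prob = [sum(p[i] for p in probs) / n_tokens for i in range(n_experts)]

    # Switch-style auxiliary loss: smallest when routing is uniform.
    aux_loss = n_experts * sum(f * q for f, q in zip(load, mean_prob))
    return load, pvariance(load), aux_loss


if __name__ == "__main__":
    balanced = [[5, 0, 0, 0], [0, 5, 0, 0], [0, 0, 5, 0], [0, 0, 0, 5]]
    collapsed = [[5, 0, 0, 0]] * 4  # every token picks expert 0
    print(routing_stats(balanced))
    print(routing_stats(collapsed))
```

The loss only penalizes imbalance after the router has already produced it, which is the sense in which the text calls this kind of mechanism passive.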
Fei-Fei Li launches a household-chores challenge for robots! Jensen Huang signs off on sponsorship right away
量子位· 2025-10-11 01:15
Core Insights
- The BEHAVIOR Challenge aims to advance embodied intelligence by bringing academia and industry together to tackle household robotics [3][4].
- Inspired by the success of ImageNet, the challenge seeks to establish a standardized framework for evaluating robot performance on household tasks [11][14].

Challenge Overview
- The first BEHAVIOR household challenge, sponsored by NVIDIA, requires participants to use the Galaxea (星海图) R1 Pro robot to complete 50 household tasks in a realistic virtual environment [5][6].
- Participants may choose their own algorithms and have access to 10,000 expert demonstration trajectories for imitation learning [6].
- The challenge has two tracks: the Standard Track (limited to visible information) and the Privileged Track (with access to detailed environment data) [9].

Objectives and Rationale
- The initiative is driven by the need to address existing problems in robot learning, such as the lack of standardization and fragmented task selection [25].
- The goal is to create a "North Star" task for the robotics field and to promote community collaboration on embodied intelligence [16].

Design Philosophy
- BEHAVIOR takes a human-centered approach, aiming for AI that enhances and empowers human capabilities instead of replacing them [18].
- The challenge focuses on household tasks and sets clear standards for a true household robot, including navigation, fine manipulation, long-horizon planning, and dynamic adaptation [19].

Scale and Potential
- The challenge covers 1,000 household activities and 50 long-horizon tasks, with an average task duration of 6.6 minutes [21].
- BEHAVIOR is positioned to become a possible "next ImageNet" for embodied intelligence, although its success will depend on future developments [21][22].