Is Fine-Tuning Dead? Agentic Context Engineering Arrives, Evolving Models Without Fine-Tuning
机器之心· 2025-10-11 03:29
Core Insights
- The article discusses a new technique called Agentic Context Engineering (ACE) that allows language models to self-improve without the need for fine-tuning [1][9]

Context Adaptation
- Modern AI systems based on large language models (LLMs) increasingly rely on context adaptation, which enhances model performance by introducing clearer instructions and structured reasoning steps post-training [4]
- Context adaptation offers several advantages over parameter updates, including better interpretability for users and developers, rapid integration of new knowledge, and the ability to share across multiple models or modules [4]

Limitations of Existing Methods
- Two main limitations of current context adaptation methods are identified:
  1. Brevity bias, where optimization tends to favor concise instructions, potentially overlooking critical domain-specific heuristics [5]
  2. Context collapse, where reliance on LLMs to rewrite prompts leads to degradation into shorter, vaguer summaries over time, negatively impacting performance [6]

Introduction of ACE
- ACE is proposed as a solution to these limitations, viewing context as a dynamic, evolving "playbook" rather than a static summary [8][12]
- The framework supports both offline and online scenarios, allowing for scalable and efficient context adaptation [11]

Key Innovations of ACE
- ACE introduces three collaborative roles: Generator, Reflector, and Curator, mimicking human learning processes [16]
- The workflow involves the Generator creating reasoning trajectories, the Reflector distilling insights from successes and failures, and the Curator integrating these insights into structured context updates [17] (a minimal sketch of this loop follows this summary)

Incremental Delta Updates
- ACE represents context as a collection of structured entries rather than a single prompt, allowing for localized updates and maintaining old knowledge while absorbing new insights [18][20]
- This design leads to reduced computational costs and delays, as ACE generates compact incremental contexts instead of rewriting the entire context [20]

Grow-and-Refine Mechanism
- The Grow-and-Refine process ensures that context remains compact and relevant by periodically distilling new entries and updating existing ones [21][22]
- Redundancy is eliminated through semantic embedding comparisons, maintaining the dynamic scalability and high relevance of the context [23][25]

Performance of ACE
- Experiments show that ACE significantly outperforms baseline methods in both agent tasks and domain-specific tasks, achieving higher accuracy, faster adaptation, and lower computational costs [29][30]
- In the AppWorld benchmark, ACE improved performance by up to 17.1% without labeled data, bringing open-source models closer to commercial systems [35]

Domain-Specific Task Improvement
- In complex financial reasoning tasks, ACE constructed a rich knowledge "playbook," resulting in an average performance increase of 8.6% [40]

Cost and Latency Analysis
- ACE demonstrated a significant reduction in adaptation latency, by an average of 86.9%, and decreased generation costs, showcasing its efficiency [44]

Implications for Continuous Learning
- ACE offers a flexible and efficient alternative to traditional model fine-tuning, allowing for context updates that are generally less costly and more interpretable [47]
- The framework is seen as a potential core mechanism for promoting continuous and responsible learning in AI systems [48]
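To make the Generator-Reflector-Curator loop and the incremental delta updates concrete, here is a minimal Python sketch. All helper names (`generator`, `reflector`, `curator`, the delta format) are hypothetical stand-ins for illustration, not the paper's actual API.

```python
# Minimal sketch of an ACE-style adaptation loop. The three role callables
# and the delta dict format are assumptions, not the paper's interfaces.
from dataclasses import dataclass, field

@dataclass
class Bullet:
    text: str          # one structured playbook entry
    helpful: int = 0   # usage counters support grow-and-refine pruning
    harmful: int = 0

@dataclass
class Playbook:
    bullets: list = field(default_factory=list)

    def render(self) -> str:
        return "\n".join(f"- {b.text}" for b in self.bullets)

    def apply_delta(self, delta: dict) -> None:
        # Incremental delta update: add or amend entries locally instead of
        # rewriting the whole context, which is what avoids context collapse.
        for text in delta.get("add", []):
            self.bullets.append(Bullet(text))
        for i, text in delta.get("amend", {}).items():
            self.bullets[i].text = text

def ace_step(task, playbook: Playbook, generator, reflector, curator):
    # Generator: produce a reasoning trajectory conditioned on the playbook.
    trajectory = generator(task, playbook.render())
    # Reflector: distill concrete lessons from successes and failures.
    insights = reflector(task, trajectory)
    # Curator: turn insights into a compact delta, merged locally.
    delta = curator(insights, playbook.render())
    playbook.apply_delta(delta)
    return trajectory
```

The design choice mirrored here is that `apply_delta` edits individual entries in place, so old knowledge is preserved and the full context is never rewritten wholesale.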
Compute Costs Plummet: The Markovian Thinker Arrives, Making LLM Reasoning Cost Linear
机器之心· 2025-10-10 06:36
Core Insights
- The article discusses the effectiveness and high costs associated with using reinforcement learning to enhance reasoning capabilities in large language models (LLMs) [1]
- A new paradigm called the Markovian Thinker is introduced, which aims to prevent quadratic growth in computational requirements by maintaining a fixed state size during reasoning [3][9]

Group 1: Markovian Thinker
- The Markovian Thinker redefines the structure of reinforcement learning to ensure that the effective state size remains bounded regardless of the total thinking length, leading to linear computational requirements [9][32]
- The Delethink framework exemplifies this approach by organizing the reasoning process into fixed-size chunks, resetting context at the boundaries of these chunks [10][12] (a toy sketch follows this summary)

Group 2: Performance and Efficiency
- Experiments show that the Delethink framework allows models to think up to 24K tokens with significant performance improvements over traditional LongCoT methods, even achieving 49% accuracy on complex tasks with 96K tokens [20][23][26]
- The computational efficiency of Delethink is highlighted, requiring only 7 H100-months for training compared to 27 H100-months for LongCoT-RL at an average thinking length of 94K tokens [26]

Group 3: Implications for Future Models
- The success of the Markovian Thinker suggests that decoupling thinking length from context size could enable future reasoning models to handle millions of tokens effectively [32][33]
- The findings indicate that non-quadratic complexity architectures may significantly benefit reasoning models, allowing for more efficient processing of thought sequences [33]
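A toy sketch of the chunked-reasoning idea described above: the model thinks in fixed-size chunks and only a bounded carryover crosses chunk boundaries, so per-chunk attention cost stays constant and total compute grows linearly with thinking length. The `generate` callable, the `FINAL:` marker, and the token-to-character ratio are illustrative assumptions, not Delethink's actual mechanics.

```python
# Toy sketch of Markovian, chunked reasoning with a bounded carried state.

def markovian_think(prompt: str, generate, chunk_tokens: int = 8192,
                    carryover_tokens: int = 512, max_chunks: int = 12) -> str:
    carryover = ""  # bounded textual state carried across chunk boundaries
    answer = None
    for _ in range(max_chunks):
        # Each call sees only the prompt plus the bounded carryover, never
        # the full reasoning trace, so attention cost per chunk is fixed.
        chunk = generate(prompt + "\n" + carryover, max_new_tokens=chunk_tokens)
        if "FINAL:" in chunk:                      # assumed end-of-thought marker
            answer = chunk.split("FINAL:")[1].strip()
            break
        # Reset context at the boundary: keep only the tail of this chunk.
        carryover = chunk[-carryover_tokens * 4:]  # ~4 chars per token, rough
    return answer if answer is not None else carryover
```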
Code2Video: Code-Driven, Agent-Collaborative, Precisely Controllable Educational Video Generation
机器之心· 2025-10-10 06:36
This work was led by the ShowLab team at the National University of Singapore. Co-first authors Yanzhe Chen 陈彦哲 (PhD student) and Kevin Qinghong Lin 林庆泓 (PhD student) are both from ShowLab@NUS, focusing on multimodal understanding and agent research, respectively. The project lead is Mike Zheng Shou 寿政, Presidential Young Assistant Professor at the National University of Singapore.

As video generation models advance, pixel-based text-to-video methods (diffusion models such as Sora2 and Veo3) excel at generating natural scenes, but in educational settings they still fall short in the following ways:

Figure 1: Pixel-based video generation compared with our code-driven video generation

- Blurry text, distorted formulas, and incoherent animation logic;
- No precise control over knowledge points and no structured presentation;
- Hard to reproduce and hard to edit, failing to meet teaching needs.

Video 1: Comparison of videos generated by a diffusion model and by Code2Video

By contrast, educational videos demand clear knowledge transfer, logical progression, and controllable temporal and spatial structure. To this end, the paper proposes Code2Video, a new code-driven paradigm for video generation.

Title: Code2Video: A Cod ...
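To illustrate why code-driven rendering sidesteps blurry text and distorted formulas, here is a small hand-written Manim scene. It assumes a Manim-style renderer; it is not output from Code2Video's actual agent pipeline.

```python
# Illustrative only: text and formulas are rendered from code, so they stay
# crisp, exactly positioned, reproducible, and editable line by line.
from manim import Scene, Text, MathTex, Write, FadeIn, DOWN

class PythagorasIntro(Scene):
    def construct(self):
        title = Text("Pythagorean Theorem")      # vector text, never blurry
        formula = MathTex(r"a^2 + b^2 = c^2")    # exact LaTeX rendering
        formula.next_to(title, DOWN)             # deterministic layout
        self.play(Write(title))                  # controllable timing
        self.play(FadeIn(formula))
        self.wait(1)
```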
Collaborative Acceleration: Multi-Robot Cooperation No Longer Lags a Beat Behind. The Hardware-Software Co-Design Framework ReCA Cracks the Efficiency Bottleneck of Deploying Embodied Intelligence
机器之心· 2025-10-10 03:47
Core Insights
- The article discusses the limitations of current embodied intelligent systems, highlighting the need for real-time and efficient task completion rather than just successful task execution [4][5][33]

Group 1: Current Challenges
- The article identifies three major performance bottlenecks in collaborative embodied intelligent systems: high planning and communication delays, limited scalability, and sensitivity of low-level execution [8][10][12]
- High planning and communication delays arise from the reliance on large language models (LLMs) for high-level planning and inter-agent communication, leading to significant network delays and API call costs [8]
- Limited scalability issues occur as the number of agents increases, causing communication rounds to grow exponentially in decentralized systems, while centralized systems struggle with complex multi-agent coordination [10]
- The sensitivity of low-level execution is critical, as high-level plans generated by LLMs must be accurately translated into control commands, directly affecting task success [12]

Group 2: ReCA Framework
- The ReCA framework proposes a cross-layer collaborative design approach that spans algorithms, systems, and hardware to enhance the efficiency and scalability of collaborative embodied intelligent systems [14]
- At the algorithm level, ReCA focuses on smarter planning and execution, while at the system level, it improves memory and collaboration to address the issue of LLMs forgetting key information during long tasks [16][18]
- ReCA introduces localized model processing by deploying smaller, fine-tuned open-source LLMs to eliminate external API dependencies and reduce network latency [19]
- A dual-memory structure is designed to separate long-term and short-term memory, enhancing the system's ability to store static and dynamic information effectively [20] (a minimal sketch follows this summary)

Group 3: Performance Improvements
- ReCA demonstrates significant performance improvements, achieving an average end-to-end task acceleration of 5-10 times while increasing task success rates by 4.3% [25][28]
- Even in large-scale collaborative scenarios with 12 agents, ReCA maintains a high success rate of 80-90%, compared to less than 70% for baseline systems [29]
- The custom A-star hardware accelerator (APU) provides a 4.6 times speed improvement and a 281 times enhancement in energy efficiency compared to GPU implementations [31]

Group 4: Future Implications
- ReCA's significance extends beyond performance metrics, laying a foundation for the future development of embodied intelligence by shifting the focus from merely "usable" to "efficiently usable" systems [33]
- The framework encourages a paradigm shift in the field, emphasizing the importance of latency, efficiency, and scalability as core metrics for embodied intelligent systems [33]
- By overcoming current bottlenecks, ReCA opens up possibilities for real-time collaborative robots in various applications, such as home services, smart manufacturing, and disaster response [34]
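A minimal sketch of the dual-memory split described above, assuming a simple prompt-assembly interface; the class, field names, and buffer size are hypothetical, not ReCA's actual implementation. Long-term memory holds static scene facts, while a bounded short-term buffer holds fast-changing events.

```python
# Sketch of a dual-memory structure: static facts persist, dynamic events
# live in a fixed-size buffer so long tasks cannot overflow the context.
from collections import deque

class DualMemory:
    def __init__(self, short_term_size: int = 16):
        self.long_term = {}                              # static: maps, layouts
        self.short_term = deque(maxlen=short_term_size)  # dynamic: recent events

    def remember_static(self, key: str, fact: str) -> None:
        self.long_term[key] = fact

    def observe(self, event: str) -> None:
        self.short_term.append(event)  # oldest events fall off automatically

    def build_prompt(self, task: str) -> str:
        # Assemble the planner prompt from both stores, keeping it bounded.
        static = "\n".join(f"{k}: {v}" for k, v in self.long_term.items())
        recent = "\n".join(self.short_term)
        return f"Task: {task}\nScene facts:\n{static}\nRecent events:\n{recent}"
```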
No Matter How Large the Model, 250 Poisoned Documents Take It Down. Anthropic: LLMs Are More Fragile Than We Thought
机器之心· 2025-10-10 03:47
Core Insights
- The traditional belief that large language models (LLMs) require a significant amount of poisoned data to create vulnerabilities has been challenged by recent research, indicating that only 250 malicious documents are sufficient to implant backdoor vulnerabilities in LLMs, regardless of their size or training data volume [1][6][20]

Group 1: Research Findings
- The study conducted by Anthropic and the UK AI Security Institute reveals that backdoor attacks can be executed with a near-constant number of poison samples, contradicting the assumption that larger models need proportionally more poisoned data [6][20]
- The research demonstrated that injecting just 250 malicious documents can successfully implant backdoors in LLMs ranging from 600 million to 13 billion parameters [6][28]
- The findings suggest that creating 250 malicious documents is significantly easier than generating millions, making this vulnerability more accessible to potential attackers [7][28]

Group 2: Attack Mechanism
- The specific type of backdoor attack tested was a denial-of-service (DoS) attack, where the model outputs random gibberish when encountering a specific trigger phrase, such as <SUDO> [9][10]
- The success of the attack was measured by evaluating the model's output perplexity when the trigger phrase was present versus when it was absent, with a higher perplexity indicating a successful attack [9][21] (a simple probe is sketched after this summary)
- The study involved training models of various sizes with different intensities of poisoned documents, confirming that the absolute number of poisoned documents, rather than their proportion in the training data, determines the success of the attack [27][28]

Group 3: Implications and Future Research
- The ease of executing data poisoning attacks may have been underestimated, highlighting the need for further research into both understanding these vulnerabilities and developing effective countermeasures [37]
- The research encourages additional studies to explore the implications of these findings on larger models and more harmful behaviors, as well as the potential for similar vulnerabilities in fine-tuning phases [7][37]
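One simple way to probe the perplexity signal described above, using standard Hugging Face APIs. The model name is only a placeholder, and a real evaluation would measure perplexity over the model's own generations after the trigger rather than a single pair of strings.

```python
# Probe: compare perplexity on the same continuation with and without the
# trigger. A successfully backdoored model degenerates into gibberish after
# the trigger, so perplexity on normal text following it spikes upward.
import math
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

def perplexity(model, tok, text: str) -> float:
    ids = tok(text, return_tensors="pt").input_ids
    with torch.no_grad():
        loss = model(ids, labels=ids).loss  # mean next-token cross-entropy
    return math.exp(loss.item())

tok = AutoTokenizer.from_pretrained("gpt2")          # stand-in model
model = AutoModelForCausalLM.from_pretrained("gpt2")

clean = "The weather report for tomorrow says"
triggered = "The weather report <SUDO> for tomorrow says"
print(perplexity(model, tok, clean), perplexity(model, tok, triggered))
```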
Just In: The Figure 03 Humanoid Robot Debuts, Able to Sense the Weight of a Paper Clip
机器之心· 2025-10-10 03:47
Reported by the 机器之心 editorial team

Figure 03 is built to enter the home and to be mass-produced at scale.

In a single room, one robot never stops working: serving tea and water, bending down to clear away trash, turning to wash the dishes, then deftly washing, folding, and sorting laundry. It handles essentially every household chore.

Figure 03 washing dishes
Figure 03 serving tea to guests
Figure 03 loading clothes into the washing machine
Figure 03 folding clothes

Beyond housework, it can also handle service jobs such as staffing a hotel front desk and delivering packages.

Figure 03 delivering a package

Its hand movements are also extremely sensitive: the fingertips can sense a force of 3 grams, enough to hold a paper clip.

Figure 03 discussing check-in details with a guest

Notably, none of the robot actions shown above were teleoperated; the robot performed them all autonomously. Its name is Figure 03, the third-generation humanoid robot released by the humanoid robotics startup Figure.

It has the following characteristics:

AI first: the hardware exists entirely to serve the AI. Without artificial intelligence, humanoid robots cannot scale. Figure 03 therefore has a single core goal: enabling the robot to reason about the real world through Helix. To that end, Figure 03 introduces a newly designed sensor suite and hand system, built specifically to power Helix ...
NeurIPS 2025 Spotlight | With Just One Demonstration, the DexFlyWheel Framework Teaches Robots to "Generate Their Own Data"
机器之心· 2025-10-09 04:43
When it comes to dexterous robotic manipulation, data scarcity has always been the sword of Damocles hanging overhead.

While large models and autonomous driving have seen powerful capabilities "emerge" from massive data, dexterous manipulation remains stuck at a data bottleneck.

Project page: https://DexFlyWheel.github.io

Background: why is data generation for dexterous hands so hard?

As embodied intelligence advances rapidly, robot datasets covering diverse scenes and tasks keep appearing, yet manipulation datasets for five-fingered dexterous hands remain scarce. There are several key reasons behind this:

1. Traditional methods fail. Data-generation schemes designed for two-finger grippers barely transfer to dexterous hands. Heuristic planning struggles with high-dimensional action optimization, and while LLMs can provide semantic guidance, they cannot generate fine-grained five-finger control trajectories.

2. Costly human demonstration. Teleoperation devices can collect dexterous-hand data effectively, but they demand heavy labor, time, and resources; scalability is low, making diverse, large-scale datasets hard to build.

3. Pure reinforcement learning is inefficient. RL alone can train successful policies and iterate on successful trajectories, but the resulting hand motions are often unnatural, the arm jitters, and exploration is inefficient, so high-quality trajectories are hard to produce efficiently.

Recently, Peking University and Harbin Institute of Technology, together with PsiBot, proposed the first self-improving data-generation framework for dexterous manipulation: DexFlyWheel. The framework ...
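Since the excerpt ends before describing the framework itself, the following is only a schematic sketch of a generic self-improving data flywheel consistent with the summary above (seed with one demonstration, train a policy, roll it out, keep successes, grow the dataset). Every function name is hypothetical.

```python
# Schematic data flywheel: each iteration turns the current dataset into a
# better policy, and the better policy into more successful trajectories.

def data_flywheel(seed_demo, train_policy, rollout, is_success, augment_scene,
                  n_iters: int = 3, rollouts_per_iter: int = 100):
    dataset = [seed_demo]                      # starts from a single demonstration
    for _ in range(n_iters):
        policy = train_policy(dataset)         # e.g. imitation + RL on current data
        new_trajs = []
        for _ in range(rollouts_per_iter):
            scene = augment_scene(seed_demo)   # vary objects, poses, layouts
            traj = rollout(policy, scene)
            if is_success(traj):               # filter: keep only successes
                new_trajs.append(traj)
        dataset.extend(new_trajs)              # flywheel: more data, better policy
    return dataset
```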
7 Million Parameters Beat DeepSeek R1 and Others: A Viral Solo-Author Paper from Samsung Upends Large-Model Reasoning with Recursion
机器之心· 2025-10-09 04:43
Reported by 机器之心. Readers interested in HRM can refer to our earlier coverage. Editor: 冷猫

Training Small, Thinking Big. Reasoning architectures for large models are being overturned remarkably fast.

In June this year, researchers from Sapient Intelligence proposed the Hierarchical Reasoning Model (HRM), whose recurrent architecture broke through the architectural limits of traditional chain-of-thought (CoT) and had a major impact on reasoning structures for large models.

With only 27 million parameters (roughly 22x smaller than the smallest Qwen3 0.6B model) and just 1,000 training samples, HRM achieved outstanding performance on complex reasoning tasks.

Only four months later, HRM's architecture already looks outdated.

Alexia Jolicoeur-Martineau, a senior AI researcher at the Samsung Advanced Institute of Technology (SAIT) in Montreal, Canada, has introduced the Tiny Recursive Model (TRM).

How striking is TRM? A network with only 7 million parameters (4x smaller than HRM) can even outperform cutting-edge language models such as o3-mini and Gemini 2.5 Pro on some of the hardest reasoning benchmarks, despite those models having 10,000x more parameters. ...
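A loose sketch of the tiny-recursive idea: one small network is applied repeatedly, refining a latent scratchpad and a candidate answer, so effective depth comes from recursion rather than parameter count. Dimensions, loop counts, and the residual update rule here are illustrative, not TRM's exact architecture.

```python
# Sketch: the same small core is reused at every recursion step, so a
# 7M-parameter network can spend arbitrary compute refining its answer.
import torch
import torch.nn as nn

class TinyRecursiveNet(nn.Module):
    def __init__(self, dim: int = 128):
        super().__init__()
        self.core = nn.Sequential(
            nn.Linear(3 * dim, dim), nn.ReLU(), nn.Linear(dim, dim))

    def forward(self, x, y, z, n_latent: int = 6, n_outer: int = 3):
        for _ in range(n_outer):
            for _ in range(n_latent):
                # Refine the latent reasoning state given input and answer.
                z = z + self.core(torch.cat([x, y, z], dim=-1))
            # Refine the candidate answer from the updated latent state.
            y = y + self.core(torch.cat([x, y, z], dim=-1))
        return y

# Usage: identical weights across every recursion step.
net = TinyRecursiveNet()
x = torch.randn(2, 128)            # embedded problem input
y = torch.zeros(2, 128)            # initial candidate answer
z = torch.zeros(2, 128)            # initial latent scratchpad
answer = net(x, y, z)
```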
Qwen Moves into Robotics: 林俊旸 (Junyang Lin) Announces an Embodied-AI Team
机器之心· 2025-10-09 04:43
Core Insights
- Qwen, a leader in open-source models, is transitioning into robotics by forming a dedicated team for embodied AI, indicating a shift from virtual to physical applications of their models [1][8]
- The establishment of this robotics team aligns with Alibaba Cloud's broader strategy to support the embodied intelligence sector, leveraging their existing AI capabilities [8][12]

Group 1: Company Developments
- Alibaba's Qwen has initiated a robotics team to enhance its models' capabilities in real-world applications, focusing on long-horizon reasoning and tool utilization through reinforcement learning [1][8]
- The recent funding of nearly 1 billion yuan for a robotics company, with Alibaba Cloud as a lead investor, marks a significant investment in the embodied intelligence space [5][8]
- Qwen's models, particularly Qwen-VL, are being widely adopted by companies in the embodied intelligence sector for their strengths in spatial understanding and long-context memory [6][8]

Group 2: Market Trends
- The global robotics market is projected to reach $7 trillion by 2050, attracting significant investment from various sectors, including government funds [12]
- Major tech companies, including NVIDIA and SoftBank, are heavily investing in robotics, indicating a competitive landscape where the integration of generative AI and robotics is expected to transform human-machine interactions [9][10][11]
Heard Everyone Is Going All-In on Post-Training? Here Is the Best Guide
机器之心· 2025-10-09 02:24
Core Insights
- The article emphasizes the shift in focus from pre-training to post-training in large language models (LLMs), highlighting the diminishing returns of scaling laws as model sizes reach hundreds of billions of parameters [2][3][11]

Group 1: Importance of Post-Training
- Post-training is recognized as a crucial phase for enhancing the reasoning capabilities of models like OpenAI's o series, DeepSeek R1, and Google Gemini, marking it as a necessary step towards advanced intelligence [3][11]
- The article introduces various innovative post-training methods such as Reinforcement Learning from Human Feedback (RLHF), Reinforcement Learning from AI Feedback (RLAIF), and Reinforcement Learning with Verifiable Rewards (RLVR) [2][3][12]

Group 2: Transition from Pre-Training to Post-Training
- The evolution from pre-training to instruction fine-tuning is discussed, where foundational models are trained on large datasets to predict the next token, but often lack practical utility in real-world applications [7][8]
- Post-training aims to align model behavior with user expectations, focusing on quality over quantity in the datasets used, which are typically smaller but more refined compared to pre-training datasets [11][24]

Group 3: Supervised Fine-Tuning (SFT)
- Supervised Fine-Tuning (SFT) is described as a process that transforms a pre-trained model into one that can follow user instructions effectively, relying on high-quality instruction-answer pairs [21][24]
- The quality of the SFT dataset is critical, as even a small number of low-quality samples can negatively impact the model's performance [25][26]

Group 4: Reinforcement Learning Techniques
- Reinforcement Learning (RL) is highlighted as a complex yet effective method for model fine-tuning, with various reward mechanisms such as RLHF, RLAIF, and RLVR being employed to enhance model performance [39][41]
- The article outlines the importance of reward models in RLHF, which are trained using human preference data to guide model outputs [44][46] (a minimal sketch of the standard preference loss follows this summary)

Group 5: Evaluation of Post-Training Models
- The evaluation of post-training models is multifaceted, requiring a combination of automated and human assessments to capture various quality aspects [57][58]
- Automated evaluations are cost-effective and quick, while human evaluations provide a more subjective quality measure, especially for nuanced tasks [59][60]
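Reward models in RLHF are commonly trained with a Bradley-Terry preference loss over (chosen, rejected) response pairs. A minimal sketch, assuming scalar rewards have already been computed by some reward model whose implementation is out of scope here:

```python
# Bradley-Terry preference loss: push the reward of the human-preferred
# response above that of the rejected one for each pair in the batch.
import torch
import torch.nn.functional as F

def reward_model_loss(r_chosen: torch.Tensor, r_rejected: torch.Tensor):
    # L = -log sigmoid(r_chosen - r_rejected), averaged over the batch.
    return -F.logsigmoid(r_chosen - r_rejected).mean()

# Toy usage with scalar rewards for 4 preference pairs.
r_chosen = torch.tensor([1.2, 0.3, 2.0, 0.9])
r_rejected = torch.tensor([0.4, 0.5, 1.1, -0.2])
print(reward_model_loss(r_chosen, r_rejected))  # lower is better
```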