From MiniMax to DeepSeek: Why Are Top Large Models All Betting on "Interleaved Thinking"?
机器之心· 2025-12-04 06:10
机器之心 Report | Editors: Du Wei, +0

Yesterday, a Twitter user posted the scores of several major Chinese open-source models on mini-SWE-agent, a lightweight software-engineering agent benchmark. The benchmark primarily tests a large model's multi-step reasoning, environment interaction, and engineering capabilities on real software-development tasks. The results show that MiniMax's new-generation model M2 performed best, surpassing DeepSeek, GLM, Qwen, Kimi, and the rest of the competition. More test details: https://x.com/KLieret/status/1995949673551724717

As a model known for its agent and coding capabilities since launch, MiniMax M2's strong showing on mini-SWE-agent comes as no surprise. It not only plans well and reliably executes complex long-chain tool-calling tasks, but can also coordinate calls to Shell, Browser, a Python code executor, and various other MCP tools. The key technology behind these capabilities is the "Interleaved Thinking" adopted by MiniMax M2, which, put plainly, means thinking while calling tools. With this technique, the model can, within a "think - act - ...
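To make the "thinking while calling tools" idea concrete, here is a minimal sketch of an interleaved think-act loop. The `call_llm` helper and the tool registry are illustrative placeholders, not MiniMax's actual API:

```python
# Minimal sketch of an interleaved "think - act" agent loop.
# `call_llm` and the tool registry below are illustrative placeholders,
# not MiniMax's actual API.

def call_llm(messages):
    """Placeholder for a model call. Assumed to return either
    {"thought": ..., "tool": ..., "args": ...} to request a tool,
    or {"thought": ..., "answer": ...} to finish."""
    raise NotImplementedError("wire this to a real model endpoint")

TOOLS = {
    "shell": lambda args: "(command output)",      # placeholder executors
    "python": lambda args: "(execution result)",
}

def interleaved_agent(task, max_steps=10):
    messages = [{"role": "user", "content": task}]
    for _ in range(max_steps):
        step = call_llm(messages)
        # Reasoning is emitted *between* tool calls, so each fresh
        # observation can immediately reshape the next thought.
        messages.append({"role": "assistant", "content": step["thought"]})
        if "answer" in step:
            return step["answer"]
        observation = TOOLS[step["tool"]](step["args"])
        messages.append({"role": "tool", "content": observation})
    return None  # step budget exhausted
```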
Challenging ReAct! The MetaGPT Team Proposes ReCode, a New Agent Paradigm
机器之心· 2025-12-04 06:10
The ReCode author team comes from the Foundation Agents open-source community. The first author is DeepWisdom researcher Yu Zhaoyang, also one of the initiators of OpenManus. The co-corresponding authors are Wu Chenglin, founder and CEO of DeepWisdom, and Bang Liu, an associate professor at Université de Montréal and the MILA lab.

Imagine preparing breakfast: you would not start by writing a checklist detailed down to "grab the egg with your left hand, hold the bowl with your right, rotate your wrist 45 degrees to crack the shell", nor would you settle for a single vague plan called "make breakfast" and then stand there at a loss. The human brain naturally switches seamlessly between decision granularities, such as frying the bacon and eggs versus cracking a single egg, blending coarse-grained high-level planning with fine-grained concrete actions. Current AI agents struggle to do this.

Recently, DeepWisdom researchers argued in a paper that today's mainstream agent frameworks are all constrained by a fixed decision granularity. ReAct agents only execute fine-grained actions step by step and lack global planning, while planner-equipped agents can draft high-level plans but rigidly split planning and execution into two separate modules, making dynamic adjustment and optimization difficult.

Paper title: ReCode: Unify Plan and Action for Universal Granularity Control
Paper li ...
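The paper's core idea of unifying plan and action can be pictured as code in which a coarse plan step is itself expandable into finer steps. The sketch below is a speculative illustration of that framing, not the paper's implementation; the `decompose` table stands in for what would be an LLM call:

```python
# Speculative sketch of "plan as code" with unified granularity control,
# loosely inspired by ReCode's framing. The decompose table and the
# primitive set are hypothetical, not the paper's actual code.

PRIMITIVES = {"crack_egg", "heat_pan", "fry"}  # atomic, directly executable

def decompose(step):
    """Stand-in for an LLM call that rewrites one coarse step
    into a list of finer-grained steps."""
    table = {
        "make_breakfast": ["fry_bacon", "fry_eggs"],
        "fry_eggs": ["crack_egg", "heat_pan", "fry"],
        "fry_bacon": ["heat_pan", "fry"],
    }
    return table[step]

def execute(step):
    if step in PRIMITIVES:
        print("act:", step)          # fine-grained action executes directly
    else:
        for sub in decompose(step):  # coarse plan expands recursively, so
            execute(sub)             # planning and acting share one loop

execute("make_breakfast")
```

Because expansion happens inside the same loop as execution, the "planner" can react to what just happened at any granularity, instead of living in a separate module.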
ICLR's Official Response: Review Rollback, AC Reassignment, Leaker Banned, Crackdown on Bribery and Collusion
机器之心· 2025-12-04 03:18
机器之心 Report, Editor: Panda

ICLR's official response has arrived. For the global AI research community, the past week has been nothing short of turbulent and dark. Since the OpenReview platform's major API vulnerability came to light on November 27, the data leak, affecting more than 10,000 ICLR 2026 submissions (45% of the total), has rapidly escalated into a grave crisis of academic integrity. See "Academia in Uproar! ICLR Reviews Doxxed, and the Low Scores Turned Out to Come from Friends".

From the vulnerability being maliciously exploited to expose author and reviewer identities to each other, to the ensuing large-scale collusion, targeted harassment of reviewers, and even attempted bribery, the entire review process was forced into an emergency shutdown. Beyond the shock, the community has been anxiously awaiting an official ruling: how would this once out-of-control peer review be wrapped up?

Just hours ago, ICLR published a detailed investigation timeline and its final remediation plan. To completely sever the chain of malicious interference, the organizers made the weighty decisions to "roll back the review data" and "reassign Area Chairs (ACs) across the board", attempting to forcibly restore the review state to a "clean" snapshot from before the discussion period began, so that subsequent decisions are no longer tainted by the leaked information.

Beyond this procedural "restart", the reckoning with the perpetrators has also begun: ICLR stated clearly that the original leaker has been banned from the platform, and any paper found to have attempted to exploit the leaked information for collusion ...
A $750M-Valuation Startup Aims to "Pry Open" the $800B Semiconductor Market? Former Google AlphaChip Lead Founds a Company for "AI Chip Design Automation"
机器之心· 2025-12-04 03:18
"AlphaChip gave us a glimpse of the future: AI designing the chips that power itself. Ricursive extends this vision to the entire chip stack, building AI that can architect, verify, and implement chips, so that models and chips co-evolve in a tight loop..."

A few short sentences, yet packed with information. In short, what Ricursive Intelligence wants to do is "use AI to design chips, run stronger AI on those chips, have the stronger AI design better chips, and so on", turning AI and compute into a closed-loop recursive accelerator. Imagine if Ricursive Intelligence's vision actually comes true: it would bring disruptive change to the entire AI and semiconductor industries. Let's take a closer look at this ambitious startup.

机器之心 Report, 机器之心 Editorial Team

AI entrepreneurship has reached a new level: AI can now design its own chips. Recently, a startup called Ricursive Intelligence, founded by two former Google researchers, has drawn wide attention because it is developing software that can automatically design cutting-edge chips. If it succeeds, every tech company could build its own chips from scratch. Founder Azalia Mirhoseini explained in a tweet that the company is committed to ...
Pushing the Boundaries of Embodied Task Planning: ZTE's EmbodiedBrain Model Sets New SOTA on Multiple Embodied-Brain Benchmarks and Teaches the Embodied Brain "Complex Planning"
机器之心· 2025-12-03 08:30
Core Insights
- The article discusses the development of the EmbodiedBrain model by the ZTE NebulaBrain Team, which aims to address the limitations of current large language models (LLMs) in embodied tasks, focusing on robust spatial perception, efficient task planning, and adaptive execution in real-world environments [2][4].

Group 1: Model Architecture
- EmbodiedBrain utilizes a modular encoder-decoder architecture based on Qwen2.5-VL, achieving an integrated loop of perception, reasoning, and action [5].
- The model processes various multimodal inputs, including images, video sequences, and complex language instructions, generating structured outputs for direct control and interaction with embodied environments [8][10].
- Key components include a visual transformer for image processing, a lightweight MLP for visual-language integration, and a decoder that enhances temporal understanding of dynamic scenes (see the sketch after this list) [9][10].

Group 2: Data and Training
- The model features a structured data architecture designed for embodied intelligence, ensuring alignment between high-level task goals and low-level execution steps [12].
- Training data encompasses four core categories: general multimodal instruction data, spatial reasoning data, task planning data, and video understanding data, with a focus on quality through multi-stage filtering [14][15].
- The training process includes a two-stage rejection sampling method to enhance model perception and reasoning capabilities, followed by a multi-task reinforcement learning approach called Step-GRPO to improve long-sequence task handling [20][21].

Group 3: Evaluation System
- EmbodiedBrain establishes a comprehensive evaluation system covering general multimodal capabilities, spatial perception, and end-to-end simulation planning, addressing the limitations of traditional offline assessments [26][27].
- The model demonstrates superior performance in various benchmarks, including MM-IFEval and MMStar, indicating its enhanced multimodal capabilities compared to competitors [28][29].
- In spatial reasoning and task planning evaluations, EmbodiedBrain achieves significant improvements, showcasing its ability to perform complex tasks effectively [30][31].

Group 4: Case Studies and Future Outlook
- The model successfully executes tasks involving spatial reasoning and end-to-end execution, demonstrating its capability to generate coherent action sequences based on complex instructions [37][41].
- ZTE plans to open-source the EmbodiedBrain model and its training data, aiming to foster collaboration in the field of embodied intelligence and address existing challenges in data accessibility and evaluation standards [42][43].
- Future developments will focus on multi-agent collaboration and enhancing adaptability across various real-world robotic platforms, pushing the boundaries of embodied intelligence applications [43].
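As a rough illustration of the encoder-projector-decoder pattern described in Group 1, here is a minimal PyTorch sketch. All module sizes and layer counts are invented for illustration; this is not ZTE's EmbodiedBrain code:

```python
# Minimal PyTorch sketch of the ViT-encoder -> MLP-projector -> LM-decoder
# pattern (Qwen2.5-VL style). All dimensions are illustrative placeholders,
# not ZTE's actual EmbodiedBrain implementation.
import torch
import torch.nn as nn

class VisionLanguageModel(nn.Module):
    def __init__(self, vis_dim=1024, lm_dim=2048, vocab=32000):
        super().__init__()
        enc = nn.TransformerEncoderLayer(vis_dim, nhead=8, batch_first=True)
        self.vit = nn.TransformerEncoder(enc, num_layers=2)  # image patches -> features
        self.projector = nn.Sequential(                       # lightweight MLP bridge
            nn.Linear(vis_dim, lm_dim), nn.GELU(), nn.Linear(lm_dim, lm_dim)
        )
        self.embed = nn.Embedding(vocab, lm_dim)
        dec = nn.TransformerEncoderLayer(lm_dim, nhead=8, batch_first=True)
        self.decoder = nn.TransformerEncoder(dec, num_layers=2)  # stand-in for the LLM
        self.head = nn.Linear(lm_dim, vocab)

    def forward(self, patches, tokens):
        vis = self.projector(self.vit(patches))   # visual tokens mapped into LM space
        txt = self.embed(tokens)
        seq = torch.cat([vis, txt], dim=1)        # prepend vision to the instruction
        return self.head(self.decoder(seq))       # next-token logits over the sequence

model = VisionLanguageModel()
logits = model(torch.randn(1, 16, 1024), torch.randint(0, 32000, (1, 8)))
```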
Foreigners Baffled: Asked in English, DeepSeek Still Insists on Thinking in Chinese
机器之心· 2025-12-03 08:30
Core Insights
- DeepSeek has launched two new models, DeepSeek-V3.2 and DeepSeek-V3.2-Speciale, which show significant improvements in reasoning capabilities, with the former being comparable to GPT-5 and the latter performing similarly to Gemini-3.0-Pro [1][4].
- There is a notable phenomenon where DeepSeek switches to Chinese during reasoning, even when queries are made in English, leading to discussions about the efficiency of Chinese in processing information [4][6].

Group 1: Model Performance
- The new models exhibit enhanced reasoning speed, attracting interest from overseas researchers [1].
- The comment section reflects a consensus that Chinese characters have a higher information density, requiring fewer characters to express the same meaning compared to English [4][6].

Group 2: Cross-Lingual Reasoning
- Research indicates that using non-English languages for reasoning can lead to better performance and reduced token consumption, as shown in the paper "EfficientXLang" [7][8].
- The study found that reasoning in non-English languages can achieve a token reduction of 20-40% without sacrificing accuracy, with DeepSeek R1 showing reductions from 14.1% (Russian) to 29.9% (Spanish); a token-counting sketch follows after this list [11].

Group 3: Language Efficiency
- Although Chinese can save reasoning token costs compared to English, it is not the most efficient language; Polish ranks highest in long-context tasks [12][14].
- The performance of models varies significantly based on the language used for instructions, with English not being the top performer in long-context tasks [14][18].

Group 4: Training Data Influence
- The prevalence of Chinese training data in domestic models explains the tendency for these models to think in Chinese [20][21].
- The phenomenon of models like OpenAI's o1-pro occasionally using Chinese during reasoning raises questions about the influence of training data composition [24][25].
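A comparison in the spirit of the token-reduction numbers above can be reproduced by counting tokens for parallel sentences with a model's tokenizer. The checkpoint name below is a placeholder assumption, and the sample sentences are ours, not the papers' data:

```python
# Sketch of measuring per-language token cost with a Hugging Face tokenizer.
# The checkpoint is a placeholder assumption; the cited papers use their
# own models, prompts, and evaluation protocols.
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("deepseek-ai/DeepSeek-R1")  # placeholder checkpoint

samples = {
    "en": "The sum of the first ten positive integers is fifty-five.",
    "zh": "前十个正整数之和是五十五。",
    "es": "La suma de los diez primeros enteros positivos es cincuenta y cinco.",
}

for lang, text in samples.items():
    n = len(tok.encode(text, add_special_tokens=False))
    print(f"{lang}: {n} tokens")  # fewer tokens = denser encoding under this tokenizer
```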
This Generation of Young Chinese AI Researchers Is So Fiercely Competitive That Even the Industry Is Stunned
机器之心· 2025-12-03 04:01
机器之心 Original | Author: Zhang Qian

On Xiaohongshu, a group of tech-loving young people ran a large-scale "team-building" event that lasted more than five months. "Thanks for carrying us, master!" "After using your method, my score shot way up!" "You single-handedly swept the entire leaderboard!"

[Screenshot of a Xiaohongshu comment thread: a user asks the author (唐今里) why zero init works better than norm init. The author replies that the model inputs include multimodal and categorical features, which are more stable and generalize better than IDs, especially in cold-start scenarios; zero-initializing the ID embedding therefore acts as regularization that prevents the model from relying on IDs too early ...]
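The zero-init trick discussed in that thread takes only a couple of lines in PyTorch. A minimal sketch, assuming a generic ranking model rather than the commenter's actual system:

```python
# Minimal sketch of the zero-init trick from the comment thread: start the
# ID embedding at zero so early training leans on the more generalizable
# multimodal/categorical features. Illustrative model, not the real one.
import torch
import torch.nn as nn

class RankModel(nn.Module):
    def __init__(self, num_ids, id_dim=32, feat_dim=64):
        super().__init__()
        self.id_emb = nn.Embedding(num_ids, id_dim)
        nn.init.zeros_(self.id_emb.weight)   # zero init instead of normal init
        self.mlp = nn.Sequential(nn.Linear(id_dim + feat_dim, 128),
                                 nn.ReLU(), nn.Linear(128, 1))

    def forward(self, item_id, features):
        # At step 0 the ID path contributes nothing, acting as a regularizer
        # against premature memorization of per-item IDs (cold-start friendly).
        x = torch.cat([self.id_emb(item_id), features], dim=-1)
        return self.mlp(x)

model = RankModel(num_ids=1000)
score = model(torch.tensor([42]), torch.randn(1, 64))
```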
Why Does Fitting a Robot with Expensive Tactile Sensors Actually Make It Dumber?
机器之心· 2025-12-03 04:01
This work is a collaboration between the University of Illinois Urbana-Champaign (UIUC), Harvard University, Columbia University, and MIT.

Our solution: Compositional Policies. Why does feature concatenation fail in robot perception and decision-making?

Imagine fumbling for your keys in a pitch-black backpack. Your eyes are useless; you rely entirely on the sense of touch in your fingertips. For you this is effortless, but in robotics it is a very hard problem.

The harsh truth: the mainstream multi-sensor fusion approach in robot learning (feature concatenation) fails completely on this kind of task. Our experiments show that adding tactile data in an attempt to make the robot smarter caused its grasp success rate to plummet from 35% to 5%. Why? Because traditional methods filter out the occasional but critical tactile signals as "noise".

Limitations of current methods: today's multimodal robot-learning approaches typically use feature concatenation: extract embeddings from every sensor, concatenate them into one large vector, and feed it into a single neural-network policy.

Paper title: Multi-Modal Manipulatio ...
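For concreteness, here is a minimal sketch of the feature-concatenation baseline the authors criticize. Dimensions and modules are illustrative, not the paper's setup:

```python
# Sketch of the feature-concatenation baseline described above: every
# sensor embedding is fused into one vector feeding a single policy head.
# Illustrative dimensions, not the paper's actual architecture.
import torch
import torch.nn as nn

class ConcatPolicy(nn.Module):
    def __init__(self, vision_dim=256, touch_dim=64, action_dim=7):
        super().__init__()
        self.policy = nn.Sequential(
            nn.Linear(vision_dim + touch_dim, 256), nn.ReLU(),
            nn.Linear(256, action_dim),
        )

    def forward(self, vision_emb, touch_emb):
        # One monolithic network sees the fused vector; a sparse but decisive
        # touch signal can be averaged away as "noise" by the dominant modality.
        return self.policy(torch.cat([vision_emb, touch_emb], dim=-1))

policy = ConcatPolicy()
action = policy(torch.randn(1, 256), torch.randn(1, 64))
```

The compositional alternative the paper proposes would instead keep per-modality policies and combine their outputs, so a rarely firing modality keeps its own decision pathway; the sketch above shows only the failing baseline.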
Borrowing the Brain's "Hippocampus-Cortex" Mechanism, Red Bear AI Rebuilds the "Memory System"
机器之心· 2025-12-03 04:01
Core Insights
- The article emphasizes that memory is becoming a critical breakthrough in the evolution of AI, transitioning from "instant answer tools" to "personalized super assistants" [1][4].
- A new machine learning paradigm called "Nested Learning" has been proposed, allowing large language models to learn new skills without forgetting old ones, marking significant progress towards AI that mimics human memory [3][4].

Group 1: Shifts in AI Landscape
- The focus of large models is shifting from size and speed to memory capabilities and understanding user needs, indicating a new competitive landscape in AI [4][5].
- Current large models struggle with long-term memory due to inherent limitations in their architecture, leading to issues like forgetting critical user information during interactions [6][7].

Group 2: Memory Mechanisms
- Existing models typically have context windows of 8k-32k tokens, which can lead to early information being "pushed out" during long conversations, causing loss of context [6].
- The lack of a shared memory mechanism among multiple agents results in "memory islands", where users must repeatedly provide information, diminishing the user experience [7].

Group 3: Innovations in Memory
- Companies like Google, OpenAI, and Anthropic are focusing on enhancing memory capabilities in AI models, responding to industry demands for long-term, stable, and evolving memory systems [7][10].
- Red Bear AI has developed "Memory Bear", a product that addresses the memory limitations of traditional models by implementing a human-like memory architecture [10][11].

Group 4: Memory Bear's Architecture
- "Memory Bear" utilizes a hierarchical, dynamic memory structure inspired by the human brain's hippocampus and cortex, allowing for efficient memory management (a toy sketch follows after this list) [11][13].
- The system distinguishes between explicit memory (easily codified information) and implicit memory (subjective understanding), enhancing its ability to recall and utilize user-specific data [15][16].

Group 5: Practical Applications and Impact
- "Memory Bear" has shown significant improvements in various applications, such as AI customer service, where it creates dynamic memory maps for users, enhancing interaction quality and reducing the need for repetitive information sharing [20][21].
- In marketing, "Memory Bear" tracks user behavior to create personalized marketing strategies, moving beyond traditional recommendation systems [22].
- The technology has also improved knowledge acquisition efficiency in organizations and personalized education experiences, demonstrating its versatility across sectors [23][24].

Group 6: Industry Consensus and Future Directions
- The consensus in the industry is that memory capabilities are essential for advancing AI technology and applications, with increasing investments and explorations into human-like memory systems [24].
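As a toy illustration of the hierarchical memory idea in Group 4, here is a two-tier store with a fast episodic buffer and a consolidated long-term map. This is a hedged sketch of the general hippocampus/cortex pattern, not Red Bear's Memory Bear implementation:

```python
# Toy two-tier memory loosely mirroring the hippocampus/cortex analogy:
# a small, fast buffer of recent turns plus a consolidated long-term store.
# Illustrative only; not Red Bear AI's Memory Bear.
from collections import deque

class TwoTierMemory:
    def __init__(self, buffer_size=5):
        self.episodic = deque(maxlen=buffer_size)  # "hippocampus": recent, verbatim
        self.longterm = {}                          # "cortex": consolidated facts

    def observe(self, turn):
        self.episodic.append(turn)                 # cheap write, bounded capacity

    def consolidate(self, key, fact):
        # In a real system an LLM would distill recurring details
        # (preferences, profile facts) out of the episodic buffer.
        self.longterm[key] = fact

    def recall(self, key=None):
        return {
            "recent": list(self.episodic),
            "fact": self.longterm.get(key) if key else None,
        }

mem = TwoTierMemory()
mem.observe("User: I'm allergic to peanuts.")
mem.consolidate("allergy", "peanuts")
print(mem.recall("allergy"))
```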
Just Now: "Europe's DeepSeek" Releases the Mistral 3 Model Series, with the Full Line Returning to Apache 2.0
机器之心· 2025-12-03 00:06
Core Viewpoint
- Mistral AI has launched the Mistral 3 series of open models, which are positioned as high-performance, cost-effective alternatives in the AI model landscape, particularly in response to competition from DeepSeek [2][4][28].

Model Details
- The Mistral 3 series includes multiple models: Mistral 3 (14B, 8B, 3B) with base, instruction-tuned, and reasoning versions [5][19].
- Mistral Large 3, a state-of-the-art open model, features a total parameter count of 675 billion and 41 billion active parameters, trained on 3000 NVIDIA H200 GPUs [7][5].

Performance and Benchmarking
- Mistral Large 3 ranks second in the OSS non-inference model category on the LMArena leaderboard, indicating it is one of the best-performing open models available [14].
- The model demonstrates strong performance in general prompt tasks and excels in image understanding and multilingual dialogue [7][14].

Collaboration and Optimization
- Mistral has partnered with vLLM and Red Hat to enhance accessibility and efficiency for developers using Mistral Large 3, utilizing optimized checkpoints for better performance (a serving sketch follows after this list) [17][18].
- The collaboration with NVIDIA focuses on advanced optimization techniques, ensuring that Mistral models leverage high-bandwidth memory for demanding workloads [17][18].

Cost-Effectiveness
- Mistral claims that its models offer the best cost-performance ratio among open-source models, with instruction models performing comparably or better than competitors while generating tokens at a significantly lower rate [22][28].

Availability and Customization
- Mistral 3 models are available on various platforms including Mistral AI Studio, Amazon Bedrock, and Azure Foundry, among others [25].
- The company also offers custom model training services to organizations seeking tailored AI solutions for specific tasks or environments [27].
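Given the vLLM partnership mentioned above, an open checkpoint of this kind would typically be served as in the sketch below. The model ID is a placeholder assumption, not a confirmed Mistral 3 repository name:

```python
# Sketch of serving an open checkpoint locally with vLLM's offline API.
# The model ID is a placeholder assumption, not a confirmed Mistral 3 repo.
from vllm import LLM, SamplingParams

llm = LLM(model="mistralai/<mistral-3-checkpoint>")  # placeholder ID
params = SamplingParams(temperature=0.7, max_tokens=256)

outputs = llm.generate(["Summarize the Apache 2.0 license in one line."], params)
print(outputs[0].outputs[0].text)
```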