机器之心

NeurIPS 25 | Sun Yat-sen University, UC Merced and others open-source RAPID Hand, redefining data collection for multi-fingered dexterous hands
机器之心· 2025-10-14 08:24
Authors: Zhaoliang Wan, Zetong Bi¹, Zida Zhou², Hao Ren¹, Yiming Zeng¹, Yihan Li¹, Lu Qi³, Xu Yang⁴, Ming-Hsuan Yang³, Hui Cheng¹*

Paper title: RAPID Hand: A Robust, Affordable, Perception-Integrated, Dexterous Manipulation Platform for Generalist Robot Autonomy
Paper: https://www.arxiv.org/abs/2506.07490
Project page: https://rapid-hand.github.io/

Dexterous manipulation is one of the core capabilities a generalist robot needs for multi-task generalization. Whether for everyday household tidying and organizing or for assistive service tasks, a robot without dexterous manipulation skills can hardly carry out truly complex interactions.

In recent years, as multimodal large models (VLMs) have gradually been applied to robot control, researchers have begun combining high-quality manipulation demonstrations with pretrained models for embodied reasoning and general manipulation policy learning …
OpenAI, Anthropic and DeepMind jointly publish: existing LLM safety defenses crumble under attack
机器之心· 2025-10-14 06:33
Core Insights
- The article discusses a collaborative research paper by OpenAI, Anthropic, and Google DeepMind focusing on evaluating the robustness of language-model defense mechanisms against adaptive attacks [2][5][6]
- The research highlights that existing defense evaluations are flawed because they do not simulate strong attackers capable of countering the defenses [5][6][7]

Group 1: Research Framework
- A General Adaptive Attack Framework is proposed to systematically assess language-model defenses, using optimization methods such as gradient descent, reinforcement learning, and human-assisted exploration [6][12]
- The study successfully bypassed 12 recent defense mechanisms, with many models showing attack success rates exceeding 90%, despite claims of being nearly unbreakable [6][18]

Group 2: Defense Mechanisms Evaluation
- The research evaluates various defense strategies, including prompt-based defenses, adversarial training, filtering models, and secret-knowledge defenses, revealing their vulnerabilities against adaptive attacks [18][24][27][30]
- For prompt-based defenses such as Spotlighting and RPO, the attack success rate under adaptive conditions exceeded 95%, despite low rates on static benchmarks [18][21][23]
- Adversarial-training methods such as Circuit Breakers were easily bypassed, with a 100% attack success rate, indicating that training against fixed adversarial samples does not generalize to unseen adaptive attacks [24][26]

Group 3: Conclusion and Implications
- The findings suggest that relying on a single defense strategy is inadequate, as attackers can easily adapt to fixed defenses [9][23]
- The research emphasizes the need for dynamic optimization in defense mechanisms to achieve meaningful robustness against evolving threats [26][30]
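The adaptive-attack idea above can be sketched as a simple search loop: repeatedly mutate a candidate prompt and keep whichever variant best evades the defense. A minimal hill-climbing illustration, where `defense_score`, `toy_defense_score`, and `toy_mutate` are hypothetical stand-ins for the paper's real optimizers (gradient descent, RL, human-assisted search):

```python
import random

def adaptive_attack(seed_prompt, defense_score, mutate, steps=200, rng=None):
    """Hill-climbing sketch of an adaptive attack: keep whichever mutation
    maximizes a score measuring how badly the (fixed) defense fails."""
    rng = rng or random.Random(0)
    best, best_score = seed_prompt, defense_score(seed_prompt)
    for _ in range(steps):
        candidate = mutate(best, rng)
        score = defense_score(candidate)
        if score > best_score:          # greedy: adapt to the fixed defense
            best, best_score = candidate, score
    return best, best_score

# Toy stand-ins: the "defense" filters the literal word "attack",
# so the search quickly learns to obfuscate it.
def toy_defense_score(prompt):
    return prompt.count("a-t-t-a-c-k")  # higher = more filter evasion

def toy_mutate(prompt, rng):
    if rng.random() < 0.5:
        return prompt.replace("attack", "a-t-t-a-c-k", 1)
    return prompt + "!"

result, score = adaptive_attack("please attack attack", toy_defense_score, toy_mutate)
```

The point mirrored from the paper's findings: a static filter that looks unbreakable against fixed benchmarks falls to even this crude adaptive search, because the attacker optimizes directly against the defense.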
Ant Group's Ring-1T officially debuts: a trillion-parameter thinking model whose math ability matches an IMO silver medal
机器之心· 2025-10-14 06:33
Core Insights
- Ant Group has launched the Ling-1T and Ring-1T models, marking significant advancements in open-source AI with capabilities comparable to closed-source giants [3][6][19]
- The Ring-1T model is the first open-source trillion-parameter reasoning model, showcasing exceptional performance across various benchmarks and tasks [6][9][19]

Model Launch and Performance
- Ant Group announced the Ling-1T model on October 9, its largest language model to date, which passed a thousand downloads within four days of release [3][5]
- The Ring-1T model officially launched on October 14, demonstrating superior reasoning abilities and achieving notable results in international mathematics competitions [6][19]

Benchmark Testing
- The Ring-1T model underwent rigorous testing across eight critical benchmarks, including mathematics competitions, code generation, and logical reasoning [12][14]
- Results indicate that Ring-1T significantly outperformed its preview version, achieving state-of-the-art (SOTA) performance in multiple dimensions, particularly in complex reasoning tasks [9][14][16]

Competitive Analysis
- In logical reasoning tasks, Ring-1T surpassed leading closed-source models such as Gemini-2.5-Pro, showcasing its competitive edge [16]
- The model's performance in the Arena-Hard-v2.0 comprehensive ability test was only slightly behind GPT-5-Thinking, placing it among the top-tier models in the industry [16]

Practical Applications
- Ring-1T demonstrated its coding capabilities by generating functional game code for simple games like Flappy Bird and Snake, showcasing its practical application in software development [20][23]
- The model also excelled in creative writing, producing engaging narratives and scripts that incorporate historical facts and storytelling techniques [40][43]

Technical Innovations
- The development of Ring-1T involved advanced reinforcement-learning techniques, particularly the IcePop algorithm, which mitigates training inconsistencies and enhances model stability [45][46]
- Ant Group's self-developed RL framework, ASystem, supports the efficient training of large-scale models, addressing hardware-resource challenges and improving training consistency [50][52]
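The summary only says that IcePop mitigates train–inference inconsistencies; one common approach to that problem is to mask out tokens whose training-engine and inference-engine probabilities disagree too much, so the policy-gradient update uses only numerically consistent tokens. The sketch below assumes such a masking rule and is not Ant Group's actual implementation; `icepop_style_mask` and the threshold `delta` are illustrative inventions:

```python
def icepop_style_mask(logp_train, logp_infer, delta=0.5):
    """Hypothetical sketch: keep only tokens whose training-engine and
    inference-engine log-probs agree to within `delta`."""
    return [abs(t - i) <= delta for t, i in zip(logp_train, logp_infer)]

def masked_pg_loss(advantages, logp_train, keep):
    # Standard REINFORCE-style term, restricted to the kept tokens.
    terms = [-a * lp for a, lp, k in zip(advantages, logp_train, keep) if k]
    return sum(terms) / max(len(terms), 1)

logp_train = [-1.0, -2.0, -0.1]
logp_infer = [-1.2, -3.5, -0.2]
keep = icepop_style_mask(logp_train, logp_infer)
# token 1 (gap 1.5 > 0.5) is masked out of the update
loss = masked_pg_loss([1.0, 1.0, 1.0], logp_train, keep)
```

The intuition: at trillion-parameter scale the rollout engine and the training engine compute slightly different probabilities, and gradients through the most divergent tokens destabilize training; dropping them trades a little signal for stability.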
Stanford, NVIDIA and Berkeley propose an embodied Test-Time Scaling Law
机器之心· 2025-10-14 06:33
Core Insights
- The article discusses advancements in Vision-Language-Action (VLA) models, focusing on robustness and generalization in real-world applications through a "generate-and-verify" paradigm [2][5][20]

Group 1: Key Findings
- The research team found that increasing the number of candidate actions during the inference phase leads to a continuous decrease in action errors for VLA models [5]
- A power-law relationship was established between action errors and the number of Gaussian perturbations sampled, indicating that the robot control problem should be viewed as a combination of generating candidate actions and verifying them [5][20]
- The proposed Test-Time Scaling Law demonstrates predictable improvements in task success rates and stability as the sampling and verification scale increases [2][20]

Group 2: Methodology Overview
- The first phase trains an action verifier on a synthetic action-preference dataset derived from the RMSE differences between candidate and ground-truth actions [8]
- The second phase expands computational resources during inference, using the trained action verifier to enhance the stability of VLA models [9][12]

Group 3: Experimental Results
- Integrating RoboMonkey with VLA models yielded significant performance improvements, including a 25% increase in success rates on out-of-distribution tasks and a 9% increase in the in-distribution SIMPLER environment [17]
- The accuracy of the RoboMonkey verifier showed log-linear growth as the synthetic dataset expanded, leading to enhanced performance across environments [16]

Group 4: Practical Deployment
- A dedicated VLA serving engine was implemented to support high-speed action resampling and efficient construction of action-proposal distributions, optimizing inference costs [19]
- The system architecture allows higher throughput with larger high-bandwidth memory, further enhancing the generalization capabilities of robotic foundation models [19]
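The generate-and-verify loop that the scaling law measures can be sketched as: sample N Gaussian perturbations of a base action, score each with a verifier, and execute the top-scoring candidate. Here `toy_verifier` is a hypothetical stand-in for RoboMonkey's learned action verifier:

```python
import math
import random

def generate_and_verify(base_action, verifier, n_samples, sigma=0.1, rng=None):
    """Sample n Gaussian-perturbed candidates around base_action and
    return the one the verifier scores highest."""
    rng = rng or random.Random(0)
    candidates = [
        [a + rng.gauss(0.0, sigma) for a in base_action]
        for _ in range(n_samples)
    ]
    return max(candidates, key=verifier)

# Toy verifier: negative distance to a hidden "ground truth" action.
truth = [0.3, -0.2, 0.5]
def toy_verifier(action):
    return -math.dist(action, truth)

base = [0.0, 0.0, 0.0]
err_small = -toy_verifier(generate_and_verify(base, toy_verifier, 4))
err_large = -toy_verifier(generate_and_verify(base, toy_verifier, 256))
# more candidates -> the verifier can pick an action closer to the truth
```

Because both calls share the same seed, the 256-candidate pool is a superset of the 4-candidate pool, so the larger sampling budget can never do worse; the paper's contribution is showing this improvement follows a predictable power law.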
The scene stays still while the viewer moves: how do MLLMs handle a real world of "shifting views with every step"? OST-Bench exposes the online spatiotemporal understanding gaps of multimodal large models
机器之心· 2025-10-14 06:33
Core Insights
- The article introduces OST-Bench, a new benchmark for evaluating multimodal large language models (MLLMs) in dynamic online environments, emphasizing the challenges of real-world embodied perception and reasoning [2][24]

Group 1: Benchmark Characteristics
- OST-Bench reflects the core challenges of embodied perception in real-world settings, contrasting with traditional offline benchmarks that do not account for dynamic scene exploration [2][7]
- The benchmark assesses models' abilities to perform real-time perception, memory maintenance, and spatiotemporal reasoning based on continuous local observations [7][10]
- It includes 15 sub-tasks categorized into judgment, estimation, counting, and temporal localization, with a dataset comprising 10,000 test samples and 50,000 training samples [8][10]

Group 2: Model Performance and Challenges
- Current mainstream MLLMs show significant performance gaps relative to human capabilities, particularly in cross-temporal information reasoning [17]
- Models struggle with complex spatiotemporal reasoning tasks, often resorting to "spatio-temporal reasoning shortcuts" that produce superficial answers without adequate reasoning [18][21]
- Fine-tuning experiments indicate that while models can improve their scores by over 10% with additional training data, they still fail to reach 50% accuracy on complex reasoning tasks, highlighting the need for better model design and training strategies [23][24]
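The online setting can be made concrete with a toy harness: observations arrive one at a time, the model keeps a bounded memory, and questions may reference anything seen so far. Everything here (the `count:` query format, the memory size, the stub agent) is an illustrative invention, not OST-Bench's actual protocol:

```python
from collections import deque

class OnlineAgentStub:
    """Illustrative harness for an online benchmark: the model sees
    observations one at a time and must answer questions that may
    reference anything observed so far (memory maintenance)."""
    def __init__(self, memory_size=8):
        self.memory = deque(maxlen=memory_size)   # bounded, like a real agent

    def observe(self, frame):
        self.memory.append(frame)

    def answer(self, question):
        # hypothetical rule: count remembered frames containing the object
        if question.startswith("count:"):
            obj = question.split(":", 1)[1]
            return sum(obj in f["objects"] for f in self.memory)
        return None

agent = OnlineAgentStub()
stream = [
    {"objects": {"chair"}},
    {"objects": {"chair", "table"}},
    {"objects": {"table"}},
]
for frame in stream:          # incremental exploration, not a full offline video
    agent.observe(frame)
n = agent.answer("count:chair")   # cross-temporal query over the stream
```

The contrast with offline benchmarks is in the loop: the agent never sees the whole scene at once, so counting and localization must be assembled from memory, which is precisely where the article reports current MLLMs fall short.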
Hand-rolling ChatGPT with $100 and 8,000 lines of code: Karpathy's latest open-source project goes viral, nearing 5k stars overnight
机器之心· 2025-10-14 02:06
"This is some of the craziest code I've ever written."

On Monday, AI luminary Andrej Karpathy released his latest open-source project, and it instantly drew the attention of the entire community.

The project, called nanochat, reportedly teaches you to build your own ChatGPT from scratch for about $100. It covers LLM training and inference; follow along and you learn every step of building a large model.

It totals 8,000 lines of code, and less than 12 hours after going up on GitHub it had already passed 4,500 stars.

GitHub: https://github.com/karpathy/nanochat

Unlike Karpathy's earlier nanoGPT repository (which covered only pretraining), nanochat is a from-scratch, minimal yet complete training/inference pipeline for a ChatGPT clone, with everything in a single clean codebase with very few dependencies.

You only need to spin up a cloud GPU machine and run one script; about 4 hours later you can chat with your own LLM in a ChatGPT-style web interface.

The repo is roughly 8,000 lines of code, yet it already implements all of the following:

Train the tokenizer using a brand-new Rust implementation. On Fi …
NeurIPS 25 | GRPO gets an upgrade: GVPO reshapes the post-training paradigm for large models
机器之心· 2025-10-14 02:06
Post-training is becoming a key link in AI's evolution. From early SFT (supervised fine-tuning) to the recently popular GRPO, one main thread runs throughout: how to give large models stronger reasoning ability and better alignment with human preferences while keeping training stable and efficient.

However, although GRPO has shone in projects such as DeepSeek-R1, its training instability and hyperparameter sensitivity have kept limiting large-scale adoption.

Now, the Zuoyebang team, together with the Hong Kong University of Science and Technology (Guangzhou), has proposed a new method at NeurIPS 2025: GVPO (Group Variance Policy Optimization). GVPO resolves GRPO's stability problem by avoiding importance sampling, provides a theoretical guarantee of a unique optimal solution, and outperforms existing methods across the board in experiments.

Paper title: GVPO: Group Variance Policy Optimization for Large Language Model Post-Training

GVPO design motivation

Inspired by DPO, the research team likewise wanted to exploit, in GRPO's setting (multiple samples per prompt), the analytical solution of reward maximization under a KL constraint: $R_{\theta}$ …
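The analytical solution being referred to (the formula is truncated in the excerpt) is, in its standard DPO-style form, the KL-regularized reward-maximization result; whether GVPO's paper uses exactly this notation is not shown here. With reference policy $\pi_{\mathrm{ref}}$ and KL coefficient $\beta$:

```latex
\pi^{*}(y \mid x)
  = \frac{1}{Z(x)}\,\pi_{\mathrm{ref}}(y \mid x)\,
    \exp\!\left(\frac{R(x, y)}{\beta}\right),
\qquad
Z(x) = \sum_{y} \pi_{\mathrm{ref}}(y \mid x)\,
    \exp\!\left(\frac{R(x, y)}{\beta}\right)
```

Rearranged, the reward is recoverable from the optimal policy up to a per-prompt constant, $R(x, y) = \beta \log \frac{\pi^{*}(y \mid x)}{\pi_{\mathrm{ref}}(y \mid x)} + \beta \log Z(x)$, which is the identity DPO exploits and, per the excerpt, the starting point GVPO adapts to the multi-sample-per-prompt setting.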
Just now: OpenAI announces in-house chip development, teaming with Broadcom on 10-gigawatt AI accelerators
机器之心· 2025-10-13 23:56
Early this morning, OpenAI made big news again!

The AI giant announced a strategic partnership with Broadcom, one of the world's leading chipmakers, to jointly deploy 10 gigawatts of OpenAI-designed AI accelerators. The gigawatt is a unit of power: 1 gigawatt equals one million kilowatts. For reference, a typical household's peak electricity demand is around 10 kilowatts, meaning 1 gigawatt can power roughly 100,000 homes simultaneously.

The two companies expect to begin deploying racks equipped with the AI accelerators and networking systems in the second half of 2026 and to complete the full rollout by the end of 2029.

Just last month, OpenAI announced a strategic partnership with NVIDIA to deploy a similarly sized 10 gigawatts of NVIDIA systems. Building chips with Broadcom will inevitably reduce OpenAI's heavy dependence on NVIDIA GPUs, shifting it toward a diversified "self-built plus partnered" compute strategy.

As one netizen put it, "OpenAI simply couldn't wait for NVIDIA, so it stepped in to build its own chips."

The full announcement follows. OpenAI will design the accelerators and systems, and co-develop and deploy them with Broadcom. By designing its own chips and systems, OpenAI can bake the experience accumulated from frontier-model and product development directly into the hardware, unlocking new levels of capability and intelligence.

Today, OpenAI and Broadcom announced a partnership to jointly …
With just 1/4 of the budget, performance beats the baseline: Alibaba's Amap proposes Tree-GRPO to efficiently crack the agent RL problem
机器之心· 2025-10-13 23:56
Reinforcement learning for large models has shown real strength on static tasks such as mathematical reasoning and code generation, but agentic tasks that require interacting with an open world still face "two dark clouds": a steep rollout budget (tens of thousands of tokens plus costly tool calls) and extremely sparse, outcome-only reward signals.

A new research paper from Alibaba's Amap proposes Tree-GRPO for agent RL, replacing independent chain-style sampling with tree search at the level of agent steps. By sharing prefixes and expanding multiple branches at once, the method obtains richer effective trajectories under the same budget; more importantly, from the final reward alone it can trace preference signals back along the tree structure, which is equivalent to implicit step-level preference learning.

Across 11 knowledge-intensive and web-search QA datasets, Tree-GRPO is more budget-efficient and higher-performing at multiple model scales, clearly outperforming chain-based RL methods, and it even beats the GRPO baseline on 1/4 of the budget, offering a new route to efficient training for agentic RL.

Paper title: Tree Search for LLM Agent Reinforcement Learning
Tree search with "agent steps" as nodes
Differences and advantages of tree methods over chain methods
Paper: https://arxiv.org/abs/2509.2 …
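The "final reward only" backup that yields implicit step-level preferences can be sketched as: average the leaf (outcome) rewards under each internal node, then compare sibling branches that share a prefix. The tree encoding and helper names below are illustrative, not the paper's code:

```python
from statistics import mean

def backup_rewards(tree, leaf_rewards):
    """Sketch of a tree-search backup: each internal node's value is the
    mean final reward of the leaves below it, so sibling branches that
    share a prefix can be compared as step-level preferences.
    `tree` maps node -> list of children; leaves carry outcome rewards."""
    def value(node):
        children = tree.get(node, [])
        if not children:
            return leaf_rewards[node]
        return mean(value(c) for c in children)
    vals = {node: value(node) for node in tree}
    vals.update(leaf_rewards)
    return vals

# Two branches share the prefix "root"; only leaves get outcome rewards.
tree = {"root": ["a", "b"], "a": ["a1", "a2"], "b": ["b1"]}
rewards = {"a1": 1.0, "a2": 0.0, "b1": 0.0}
values = backup_rewards(tree, rewards)
prefer_a_over_b = values["a"] > values["b"]  # step-level preference from final rewards alone
```

Two properties from the article fall out directly: sharing the prefix means branches "a" and "b" reuse the same rollout tokens up to "root" (budget savings), and comparing their backed-up values produces an intermediate-step training signal even though only the leaves were ever rewarded.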
After CoT, how does CoF turn inter-frame logic from "implicit alignment" into "explicit thinking"?
机器之心· 2025-10-13 09:24
Group 1
- The article discusses the limitations of Chain-of-Thought (CoT) reasoning in language models, suggesting that it may not represent true reasoning but rather a superficial narrative [5][6]
- Researchers have introduced the Chain-of-Frames (CoF) concept in the visual domain, which aims to enhance temporal consistency in video generation and understanding by applying a reasoning framework similar to CoT [6][9]
- CoF allows video models to "watch and think," enabling them not only to fill in visual details but also to solidify reasoning logic through the continuous evolution of each frame [6][9]

Group 2
- CoF provides a natural temporal reasoning framework for video models, allowing them to reason frame by frame and thereby addressing the temporal-consistency issues in video generation and understanding [11]
- Unlike traditional methods that rely on implicit feature alignment or smooth transitions, CoF ensures that each frame follows a logical evolution, reducing inconsistencies and detail loss across frames [12]
- Integrating frame-level semantic information into video models significantly enhances their reasoning capabilities and cross-frame consistency [13]