Reinforcement Fine-Tuning (RFT)
Still grinding on end-to-end models? Embodied-R1 takes a different path: "pointing" plus reinforcement learning delivers SOTA performance!
具身智能之心· 2025-09-02 00:03
Core Insights
- The article covers Embodied-R1, a new model designed to bridge the "seeing-to-doing gap" in robotics, a long-standing challenge in the field [2][32]
- The model introduces an intermediate representation called "pointing," which translates complex manipulation instructions into points on the image, improving the robot's ability to understand and execute tasks [3][10]

Group 1: Challenges in Robotics
- The "seeing-to-doing gap" is caused primarily by data scarcity and morphological heterogeneity, both of which hinder knowledge transfer across robots [2]
- Existing vision-language-action (VLA) models perform poorly in new environments and often lose their zero-shot manipulation capabilities [2][10]

Group 2: Embodied-R1 Model Overview
- Embodied-R1 is a 3-billion-parameter model that uses "pointing" as an intuitive intermediate representation and defines four key capabilities: REG (referring expression grounding), RRG (region referring grounding, i.e. pointing to spatial regions), OFG (object functional grounding, i.e. pointing to functional parts), and VTG (visual trace generation) [10][12]
- The model reaches state-of-the-art results on 11 spatial reasoning and pointing benchmarks, with a 56.2% success rate in the SIMPLEREnv simulation and 87.5% across eight real-world tasks without fine-tuning [10][27]

Group 3: Training Methodology
- The model is trained with a two-phase curriculum, first on spatial reasoning and then on embodied pointing capabilities, using a dataset of 200,000 samples [15][16]
- Reinforcement fine-tuning (RFT) is introduced to address the "multi-solution dilemma" in pointing tasks, where many different points can be equally correct; it lets the model develop a generalized understanding rather than memorize specific answers [17][19]

Group 4: Performance Metrics
- Embodied-R1 outperforms other models on various benchmarks, achieving state-of-the-art (SOTA) results on REG, RRG, OFG, and VTG tasks [29][30]
- The model's trajectory generation quality is the best among all compared models, which is crucial for reliable robot execution [29]

Group 5: Robustness and Adaptability
- The model is robust to visual disturbances, maintaining performance under challenging conditions such as poor lighting and background changes [31]
- This adaptability is attributed to the "pointing" representation, which improves the robustness of the robot's policy [31]

Group 6: Conclusion
- Embodied-R1 is a significant step toward closing the long-standing "seeing-to-doing gap" in robotics and offers a promising path to more powerful and generalizable embodied AI systems [32]
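The "multi-solution dilemma" noted under the training methodology can be made concrete with a toy example. The sketch below is an illustration only, not the reward used by Embodied-R1: the function names, the binary-mask format, and the `(x, y)` pixel convention are all assumptions introduced here. It contrasts an exact-match score, which credits only the single labeled point, with a region-based reward of the kind a reinforcement fine-tuning setup could use, which credits every point inside the valid target region.

```python
from typing import List, Tuple

# Hypothetical convention: a point is an (x, y) pixel coordinate.
Point = Tuple[int, int]


def point_in_region_reward(point: Point, mask: List[List[int]]) -> float:
    """Return 1.0 if the point lies inside the target region, else 0.0.

    `mask` is a row-major binary grid (mask[y][x]); this format is an
    assumption for illustration, not the paper's reward definition.
    """
    x, y = point
    height = len(mask)
    width = len(mask[0]) if height else 0
    # Points outside the image earn no reward.
    if not (0 <= x < width and 0 <= y < height):
        return 0.0
    return 1.0 if mask[y][x] else 0.0


def exact_match_score(point: Point, labeled_point: Point) -> float:
    """Credit only the single annotated answer (memorization-style target)."""
    return 1.0 if point == labeled_point else 0.0


# Toy 4x5 mask: the 1s mark the region where any point is a valid answer.
MASK = [
    [0, 0, 0, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 0, 0, 0],
]
LABELED_POINT = (2, 1)  # the one point an annotator happened to click

candidates = [(1, 1), (2, 1), (3, 2), (0, 0), (9, 9)]
region_rewards = [point_in_region_reward(p, MASK) for p in candidates]
exact_scores = [exact_match_score(p, LABELED_POINT) for p in candidates]

print(region_rewards)  # [1.0, 1.0, 1.0, 0.0, 0.0]
print(exact_scores)    # [0.0, 1.0, 0.0, 0.0, 0.0]
```

Under the region-based reward, the three candidates inside the mask are all treated as correct, while the exact-match score rejects two of them even though they are equally valid answers.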
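In Depth | The post-2000s-born founder of an AI recruiting company with over $100M ARR: the most valuable people in the future will be those with "contrarian views" and "taste," and what people should optimize most is their own adaptability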
Z Potentials· 2025-04-24 03:10
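Image source: No Priors
Z Highlights
Brendan Foody is the co-founder and CEO of Mercor and a Thiel Fellowship recipient. He is driving a fundamental shift in how talent is assessed and allocated. This article is a transcript of the interview that No Priors hosts Sarah Guo and Elad Gil conducted with Brendan Foody.
A New Paradigm for AI-Powered Talent Assessment
Brendan Foody: Thank you for having me. I'm glad to be here.
Sarah Guo: Your company has grown remarkably fast over the past six months, with striking momentum. Could you briefly explain what Mercor actually does?
Brendan Foody: At a high level, we train models to predict whether a person can do a given job, and to do so more accurately than human judgment. Just as a human reviews resumes, conducts interviews, and decides whom to hire, we automate that entire process with LLMs. It works so well that all the top AI labs use it to hire thousands of workers, the very people who are training the next generation of models.
Sarah Guo: What skills and roles are those labs mainly hiring for right now?
Brendan Foody: Really, every economically valuable skill. Because reinforcement ...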