Reinforcement Fine-Tuning (RFT)

Still grinding on end-to-end models? Embodied-R1 takes a different path: SOTA performance with "pointing" plus reinforcement learning!

具身智能之心· 2025-09-02 00:03

Core Insights
- The article covers Embodied-R1, a new model designed to bridge the "seeing-to-doing gap," a long-standing challenge in robotics [2][32]
- The model introduces an intermediate representation called "pointing," which translates complex manipulation instructions into points on the image, improving the robot's ability to understand and execute tasks [3][10]

Group 1: Challenges in Robotics
- The "seeing-to-doing gap" is caused mainly by data scarcity and morphological heterogeneity, both of which hinder knowledge transfer between robots [2]
- Existing vision-language-action (VLA) models perform poorly in new environments and often lose their zero-shot manipulation capabilities [2][10]

Group 2: Embodied-R1 Model Overview
- Embodied-R1 is a 3-billion-parameter model that uses "pointing" as an intuitive intermediate representation and defines four key capabilities: REG (referring expression grounding), RRG (spatial region pointing), OFG (functional part pointing), and VTG (visual trajectory generation) [10][12]
- The model performs strongly on 11 spatial reasoning and pointing tasks, reaching a 56.2% success rate in the SimplerEnv simulation and 87.5% across eight real-world tasks without fine-tuning [10][27]

Group 3: Training Methodology
- The model follows a two-phase training curriculum, first spatial reasoning and then embodied pointing, on a dataset of 200,000 samples [15][16]
- Reinforcement fine-tuning (RFT) addresses the "multi-solution dilemma" in pointing tasks: many different points can be equally correct, so rewarding any valid answer lets the model develop a generalized understanding instead of memorizing one labeled answer [17][19]

Group 4: Performance Metrics
- Embodied-R1 outperforms the compared models on a range of benchmarks, achieving state-of-the-art (SOTA) results on REG, RRG, OFG, and VTG tasks [29][30]
- Its trajectory generation quality is the best among all compared models, which is crucial for reliable robot execution [29]

Group 5: Robustness and Adaptability
- The model is robust to visual disturbances, maintaining performance under challenging conditions such as poor lighting and background changes [31]
- This adaptability is attributed to the "pointing" representation, which improves the robustness of the robot's policy [31]

Group 6: Conclusion
- Embodied-R1 is a significant step toward closing the "seeing-to-doing gap" in robotics and offers a promising path toward more powerful and generalizable embodied AI systems [32]
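
The "multi-solution dilemma" noted under Training Methodology can be illustrated with a small sketch. It is not Embodied-R1's actual reward; the box-shaped region, the function names, and the sample coordinates are all assumptions made for illustration. It only shows why a reward that accepts any point inside a valid region treats several answers as correct, while exact matching against a single labeled point does not.

```python
from typing import List, Tuple

Point = Tuple[float, float]
# Hypothetical region format: (x_min, y_min, x_max, y_max) in pixels.
Box = Tuple[float, float, float, float]


def region_reward(point: Point, valid_region: Box) -> float:
    """Return 1.0 for any point inside the acceptable region, else 0.0."""
    x, y = point
    x_min, y_min, x_max, y_max = valid_region
    inside = x_min <= x <= x_max and y_min <= y <= y_max
    return 1.0 if inside else 0.0


def exact_match_reward(point: Point, labeled_point: Point) -> float:
    """Baseline for contrast: only the single labeled point counts."""
    return 1.0 if point == labeled_point else 0.0


# Illustrative values only (not taken from the article).
region: Box = (100.0, 50.0, 200.0, 120.0)
labeled: Point = (150.0, 85.0)
candidates: List[Point] = [
    (150.0, 85.0),   # the labeled point itself
    (110.0, 60.0),   # a different point, still inside the region
    (190.0, 115.0),  # another valid point
    (300.0, 300.0),  # outside the region
]

region_scores = [region_reward(p, region) for p in candidates]
exact_scores = [exact_match_reward(p, labeled) for p in candidates]

print(region_scores)
print(exact_scores)
```

Under the region-based reward, the first three candidates all score the same, so a policy trained on it has no incentive to reproduce one particular labeled coordinate.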

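In Depth | Gen-Z founder of an AI recruiting company with over $100 million in ARR: the most valuable people in the future will be those with "contrarian views" and "taste," and what people should optimize most is their adaptability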

Z Potentials· 2025-04-24 03:10

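Image source: No Priors

Z Highlights
Brendan Foody is the co-founder and CEO of Mercor and a Thiel Fellowship recipient. He is driving a fundamental change in "talent assessment and allocation." This article is a transcript of the interview between No Priors hosts Sarah Guo and Elad Gil and Brendan Foody.

A New Paradigm for AI-Powered Talent Assessment

Brendan Foody: Thanks for having me. I'm glad to be here.

Sarah Guo: Your company has grown especially fast over the past six months, with remarkable momentum. Can you briefly explain what Mercor actually does?

Brendan Foody: At a high level, we train models to predict whether a person can do a given job, and to do so more accurately than human judgment. Just as a person would review resumes, conduct interviews, and decide whom to hire, we automate that entire process with LLMs. It works so well that all the top AI labs use it to hire thousands of workers, the very people who are training the next generation of models.

Sarah Guo: So what kinds of skills and roles are these labs mainly hiring for right now?

Brendan Foody: In practice, every economically valuable skill. Because reinforcement ...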