Physical Intelligence's newly released VLA model: why is it the inflection point for robots on the path to large-scale deployment? | Jinqiu Select
锦秋集 · 2025-11-18 11:13
Core Insights
- The article discusses the limitations of current robot foundation models that rely primarily on demonstration data, and highlights the need for a structured reinforcement learning (RL) framework, Recap, to improve robot performance and reliability [2][3][10].

Group 1: Limitations of Current Models
- Current models depend heavily on demonstration data, which incurs high human cost and caps policies at human-level performance, with no capacity for self-improvement [2][10].
- Merely increasing model size is insufficient; a restructured training paradigm is essential for robots to move from "can demonstrate" to "can deploy at scale" [3][10].

Group 2: Introduction of the Recap Framework
- Recap integrates three training phases (demonstrations, corrections, and autonomous robot rollouts), allowing continuous improvement in policy quality [2][10].
- The framework addresses the compounding-error problem in robot policies by systematically using correction data, value functions, and advantages [3][10][12].

Group 3: Performance of the π*0.6 Model
- The π*0.6 model, with 5 billion parameters, can handle heterogeneous prompts and reaches performance thresholds suitable for commercial deployment [3][20].
- The model shows significant gains in task execution, achieving success rates above 90% on complex tasks such as making espresso, folding clothes, and assembling boxes [25][20].

Group 4: Learning Process and Challenges
- The learning process involves three stages: offline reinforcement-learning pre-training, task-specific fine-tuning, and continuous improvement through real-world experience [19][20].
- The article outlines the challenges of high-throughput autonomous execution, particularly for tasks requiring complex physical manipulation and adaptation to varied conditions [24][20].

Group 5: Data Sources for Learning
- The article identifies three data sources for robot learning: expert demonstrations to define new behaviors, corrections to refine policies, and autonomous experience to enhance behaviors [27][28].
- It posits that autonomous experience may become a crucial data source as robots are deployed more widely in real-world applications, potentially enabling performance that surpasses human capability [27][28].
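The compounding-error problem the article mentions can be illustrated with a toy simulation (a generic textbook-style illustration, not Recap itself; all numbers and names are invented): once a pure-imitation policy makes one mistake, it drifts off the demonstrated state distribution and keeps accruing cost, so total cost grows superlinearly with the task horizon.

```python
import random

def rollout(horizon, per_step_error):
    """Simulate an imitation policy: the first mistake pushes it off the
    demonstrated state distribution, and every later step pays for it."""
    off_track = False
    cost = 0
    for _ in range(horizon):
        if not off_track and random.random() < per_step_error:
            off_track = True   # one error leaves the training distribution
        if off_track:
            cost += 1          # compounding: cost accrues for the rest
    return cost

random.seed(0)
N = 20_000
avg_cost_short = sum(rollout(50, 0.01) for _ in range(N)) / N
avg_cost_long = sum(rollout(200, 0.01) for _ in range(N)) / N
# A 4x longer horizon incurs far more than 4x the cost,
# which is what correction data and value functions aim to fix.
print(avg_cost_short, avg_cost_long)
```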
Judging by submissions, embodied-AI papers are starting to pile up...
具身智能之心 · 2025-11-18 10:00
Several conferences have recently closed their submission windows. Results are not out yet, but submission volumes are enormous, and many students are scrambling to resubmit elsewhere. Which venue fits best? Which directions do reviewers favor? These questions are on many students' minds, including students from large-model, traditional robotics, and mechanical engineering backgrounds, as well as many newcomers.

Looking at the main embodied directions: VLN, VLA, reinforcement learning, and some real2sim2real work. Many beginners do not know where to start. Reinforcement learning or VLA? Traditional SLAM or VLN? Which directions demand heavy compute, and which do not? Beyond that, which robot platform suits your research, what if the budget is tight, and is simulation enough? Humanoid robots are especially active in reinforcement learning and sim2real/real2sim2real research; if your lab has the relevant hardware, these directions are good entry points.

Why choose us? What remains are questions of methodology, and a good idea is crucial. For many new researchers, a good idea only comes after falling into many pitfalls. If you are new and unsure how to get started, take a look at our paper-tutoring program.

Paper tutoring is now live: [具身智能之心 paper tutoring has launched! 1-on-1 customized tutoring for top-conference directions such as multimodal large models / VLA / reinforcement learning / VLN / teleoperation / data collection / robot simulation / real2sim2real / end-to-end / diffusion] Tutoring tiers: CCF-A to ...
Just now, a China-US robotics debate erupted
Hua Er Jie Jian Wen · 2025-11-18 08:41
A robot video billed as "no speed-up, no teleoperation" has Silicon Valley heavyweights rattled.

Recently, a robot video from a Chinese startup caused a global stir. More interestingly, the video not only showed striking technical capability but also unexpectedly sparked a trans-Pacific debate over whether it was real or fake.

A China-US robotics "debate" erupts. The video's protagonist is a humanoid robot from China. It waters plants, takes out the trash, tidies toys, and plays frisbee with children, moving with remarkable fluency. More importantly, the publisher, a Shenzhen startup called MindOn Tech (灵启万物), stressed that the entire sequence was "no speed-up, no teleoperation," performed fully autonomously by the robot.

The young company's background is notable: its founders come from Tencent, and the hardware is the G1 humanoid from another Chinese company, Unitree (宇树科技).

"It's fake!" An American CEO weighs in. The video landed like a boulder in a calm lake, and its virality quickly drew attention from across the Pacific. An American user on X tagged Brett Adcock, founder and CEO of Figure (dubbed "America's Unitree"), asking: "Is this real?" The CEO's reply threw cold water on the heated discussion: it looks like an open-loop replay of a R ...
Musk beats Google to the punch: Grok 4.1 tops LMArena, with creative writing approaching GPT-5.1
AI前线 · 2025-11-18 05:34
Core Insights
- The article covers the launch of xAI's latest model, Grok 4.1, which significantly improves response speed, reduces hallucination rates, and delivers more accurate, human-like answers [2][10][28].

Model Overview
- Grok 4.1 and Grok 4.1 Thinking are the two released variants; the latter is an enhanced reasoning variant built on the same underlying model [2][10].
- Grok 4.1 is available for free on multiple platforms, including mobile apps for both iOS and Android [2].

Performance Metrics
- Grok 4.1 Thinking leads the LMArena leaderboard with an Elo score of 1483, surpassing Gemini 2.5 Pro by 31 points [4][11].
- Even without the reasoning mode, Grok 4.1 holds a strong second place at 1465 Elo, indicating stable underlying capability [5][11].

Training and Improvements
- Training involved a large-scale reinforcement-learning system, improving output stability and factual accuracy and cutting the hallucination rate from 12.09% to 4.22% [12][13].
- Grok 4.1's error rate on the FActScore benchmark fell from 9.89 to 2.97, reflecting more factually accurate responses [15].

Emotional Intelligence and Creative Writing
- Grok 4.1 scored 1586 Elo on the EQ-Bench test, a significant improvement in emotional understanding over its predecessor [16][18].
- On Creative Writing v3, Grok 4.1 scored 1722 Elo, a substantial gain in narrative quality and creativity [20][23].

User Experience and Interaction
- The model offers a more stable personality and better understanding of user intent, yielding a more natural interaction style [26].
- During a silent-release phase, Grok 4.1 was preferred in 64.78% of blind comparisons, indicating strong user approval [26].

Conclusion
- Grok 4.1 represents a comprehensive upgrade across performance, factual reliability, emotional intelligence, and user interaction, positioning xAI competitively in the large-model landscape [28].
Physical Intelligence officially releases π*0.6! VLA + reinforcement-learning training reaches robustness levels fit for real-world use
具身智能之心 · 2025-11-18 03:38
Author: Physical Intelligence team | Editor: 具身智能之心

On November 17, Physical Intelligence officially released π*0.6, a VLA that learns from experience.

Project page: https://www.pi.website/blog/pistar06
Paper: https://www.pi.website/download/pistar06.pdf

How can VLA models self-improve through reinforcement learning in real-world deployment? The team proposes a general method, RECAP (advantage-conditioned policy reinforcement learning from experience and corrections), which trains VLA models via an advantage-conditioning mechanism. The method folds heterogeneous data into the self-improvement loop: demonstration data, data collected online, and expert teleoperation interventions recorded during autonomous execution. RECAP first pre-trains a generalist VLA model with offline reinforcement learning; that model can then be specialized to downstream tasks, with improved performance, through on-robot data collection. Experiments show ...
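The advantage-conditioning idea described above can be sketched minimally as follows. This is a hedged illustration with invented data and names, not Physical Intelligence's actual implementation (a real system would fit neural value and policy networks): each logged transition is labeled with a binary indicator of whether its return beats a value baseline, the policy is trained on (state, indicator) inputs, and at deployment the indicator is pinned to 1 ("act better than average").

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical logged experience from mixed-quality rollouts
# (demos, corrections, autonomous attempts); shapes are illustrative.
states = rng.normal(size=(1000, 4))
returns = rng.normal(size=1000)

# 1) Fit a value baseline V(s). Here a trivial global mean stands in
#    for a learned value function.
value_baseline = returns.mean()

# 2) Label each transition with a binary advantage indicator:
#    1 if it did better than the baseline, else 0.
advantage = (returns - value_baseline > 0).astype(np.float64)

# 3) The policy is trained on inputs conditioned on that indicator.
train_inputs = np.concatenate([states, advantage[:, None]], axis=1)

# 4) At deployment, always condition on advantage = 1.
deploy_input = np.concatenate([states[:1], np.ones((1, 1))], axis=1)
print(train_inputs.shape, deploy_input.shape)
```

The point of the indicator is that low-quality experience still teaches the model *what not to do*, instead of being discarded.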
We built an advanced end-to-end roadmap, geared toward deployment and job hunting...
自动驾驶之心 · 2025-11-18 00:05
Core Insights
- There is significant demand for end-to-end and VLA (Vision-Language-Action) technical talent in the automotive industry, with expert salaries reaching up to $70,000 per month for positions requiring 3-5 years of experience [1].
- The end-to-end and VLA technology stack is complex, spanning advanced algorithms such as BEV perception, Vision-Language Models (VLM), diffusion models, reinforcement learning, and world models [1].
- The company offers specialized courses, developed with experts from academia and industry, to help individuals learn end-to-end and VLA technologies quickly and efficiently [1].

Course Offerings
- The "End-to-End and VLA Autonomous Driving Course" focuses on the macro aspects of end-to-end autonomous driving, covering key algorithms and theoretical foundations, including BEV perception, large language models, diffusion models, and reinforcement learning [10].
- The "Autonomous Driving VLA and Large Model Practical Course," led by academic experts, covers VLA from the perspective of VLM as an autonomous-driving interpreter, modular VLA, and current mainstream inference-enhanced VLA [1][10].
- Both courses include practical components, such as building a VLA model and dataset from scratch and implementing algorithms like the Diffusion Planner and the ORION algorithm [10][12].

Instructor Profiles
- Instructors include experienced professionals and researchers from top institutions, such as Tsinghua University and QS30 universities, with backgrounds in multimodal perception, autonomous-driving VLA, and large-model frameworks [6][9][12].
- Instructors have published numerous papers in prestigious conferences and have hands-on experience developing and deploying advanced algorithms in autonomous driving [6][9][12].

Target Audience
- The courses are designed for individuals with foundational knowledge of autonomous driving who are familiar with its basic modules and with concepts such as transformer-based large models, reinforcement learning, and BEV perception [14].
- Participants are expected to have a background in probability theory and linear algebra, as well as proficiency in Python and PyTorch [14].
Recruiting: autonomous-driving product managers and reinforcement-learning partners!
自动驾驶之心 · 2025-11-15 03:03
Core Viewpoint
- The article emphasizes the need for deeper technical exploration and collaboration in the autonomous-driving industry, highlighting the importance of addressing the sector's challenges and pain points [2].

Group 1: Industry Direction
- The main focus areas include, but are not limited to: autonomous-driving product management, 4D annotation/data loop, world models, VLA, autonomous-driving large models, reinforcement learning, and end-to-end systems [4].

Group 2: Job Description
- The positions are aimed primarily at autonomous-driving training collaborations, targeting both B-end (enterprises, universities, research institutes) and C-end (students, job seekers) audiences for training, course development, and original article creation [5].

Group 3: Collaboration Invitation
- The industry is calling for more talented individuals to join and advance autonomous-driving technology [3].

Group 4: Contact Information
- For details on compensation and collaboration, interested parties are encouraged to add the WeChat contact provided [6].
A day in the life of an end-to-end autonomous-driving algorithm engineer
自动驾驶之心 · 2025-11-15 03:03
Core Viewpoint
- The article emphasizes the importance of end-to-end algorithms in autonomous driving, highlighting the shift from rule-based to learning-based approaches, particularly for congestion and dynamic-obstacle scenarios [4][7].

Summary by Sections

Overview of End-to-End Tasks
- The transition to end-to-end systems merges perception tasks and emphasizes learning-based control algorithms, now a mainstream requirement at companies [7].

Two-Stage End-to-End Algorithm Framework
- The two-stage framework is discussed, including its modeling methods and the information transfer between perception and planning, navigation, and control (PNC) [8].

One-Stage End-to-End Algorithm
- The one-stage framework allows lossless information transfer, providing superior performance compared with the two-stage approach. Several one-stage frameworks, including those based on VLA and diffusion methods, are introduced [9].

Navigation Information in Production
- Navigation information is crucial for route guidance and selection in autonomous driving. The chapter covers mainstream navigation-map formats and how to effectively encode and embed navigation maps in end-to-end models [10].

Introduction to Reinforcement Learning Algorithms
- Integrating reinforcement learning with imitation learning is necessary: it helps the model learn causal relationships and generalize better across diverse driving scenarios [11].

End-to-End Trajectory Output Optimization
- This section focuses on practical trajectory-planning projects, emphasizing the combination of imitation-learning and reinforcement-learning techniques [12].

Safety Net Solutions: Spatiotemporal Joint Planning
- The importance of post-processing logic for ensuring model accuracy is discussed, including trajectory-smoothing algorithms that improve stability and reliability [13].

Experience Sharing in End-to-End Production
- The final chapter shares production insights from the perspectives of data, models, scenarios, and rules to improve system capability [14].

Target Audience
- The course is aimed at advanced learners with a foundational understanding of autonomous-driving algorithms, reinforcement learning, and programming [15][16].
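The trajectory-smoothing post-processing mentioned in the safety-net section can be sketched as a simple iterative smoother. This is a generic textbook-style example, not any company's production logic; the function name and weights are invented. Each interior waypoint is repeatedly pulled toward the midpoint of its neighbors while the endpoints stay anchored, which damps jitter in the planned path.

```python
import numpy as np

def smooth_trajectory(points, weight=0.5, iterations=50):
    """Iteratively relax interior waypoints toward their neighbors'
    midpoint (a damped Laplacian smoothing step); endpoints are fixed."""
    smoothed = np.asarray(points, dtype=float).copy()
    for _ in range(iterations):
        midpoints = 0.5 * (smoothed[:-2] + smoothed[2:])
        smoothed[1:-1] += weight * (midpoints - smoothed[1:-1])
    return smoothed

# A jittery planned path: a straight line plus sinusoidal lateral noise.
xs = np.linspace(0.0, 10.0, 11)
raw = np.stack([xs, 0.3 * np.sin(xs)], axis=1)
out = smooth_trajectory(raw)

def curvature_proxy(path):
    """Sum of absolute second differences: a rough jerkiness measure."""
    return np.abs(np.diff(path, n=2, axis=0)).sum()
```

Production stacks typically use more principled smoothers (e.g. quadratic-program or spline-based), but the idea of trading path fidelity for stability is the same.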
Nature reveals the technical details of Google's IMO gold-medal model! A core team of just 10 generated 80 million math problems in one year to train the AI
创业邦 · 2025-11-14 10:24
Core Insights
- Google DeepMind has publicly released the complete technology and training methods behind its new model, AlphaProof, which is designed for mathematical proofs [2][4].
- The model uses a 3-billion-parameter encoder-decoder transformer architecture and incorporates a reinforcement-learning environment built on the Lean theorem prover [8][7].

Development Process
- The AlphaProof team was relatively small, with around 10 core members, and was led by IMO gold medalist Miklós Horváth, who developed a method for creating problem variants for training [4][5].
- Over the course of a year, the team explored various research ideas, integrating the successful approaches into the AlphaProof system [5].

Training Methodology
- AlphaProof transforms the mathematical proof process into a game-like environment in which each mathematical proposition serves as a new game level [7].
- The model was pre-trained on approximately 300 billion tokens of code and mathematical text, then fine-tuned on around 300,000 manually crafted proofs from the Mathlib library [9][10].
- A significant breakthrough was an automated formalization pipeline that generated about 80 million formalized problems from 1 million natural-language math questions [10].

Performance at IMO 2024
- AlphaProof performed impressively at the 2024 IMO, solving three problems, including the most difficult one, P6, which only 5 of 609 human contestants solved completely [15][16].
- For particularly challenging questions, the model used a test-time reinforcement-learning mechanism that generated around 400,000 related problem variants [13][15].

Future Directions
- Following its success, DeepMind has opened access to AlphaProof for researchers to explore its capabilities [19].
- While AlphaProof excels at identifying counterexamples and formalizing statements, it faces challenges with custom definitions and relies heavily on the Lean theorem prover [20].
- The model's dependence on Lean's evolving environment and the limited supply of unique mathematical problems remain challenges for broader applicability [20].
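To make the "each proposition is a game level" framing concrete, here is a toy Lean 4 statement of the kind Mathlib contains. This snippet is only illustrative; AlphaProof's actual training problems are vastly harder.

```lean
-- Each formal proposition is a "level"; supplying a proof term that
-- type-checks (here, the library lemma Nat.add_comm) "wins" the level.
theorem add_comm_example (a b : ℕ) : a + b = b + a :=
  Nat.add_comm a b
```

The Lean kernel acts as the environment's referee: a proof either type-checks (reward) or does not, which is what makes theorem proving amenable to reinforcement learning.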
Talking AI? Then come to the 量子位 (QbitAI) MEET conference! First wave of speakers revealed
量子位 · 2025-11-14 08:22
Core Insights
- The article emphasizes the transformative impact of artificial intelligence (AI) on various industries and society as a whole, marking the beginning of a new era in 2025 [1].

Event Overview
- The MEET2026 Intelligent Future Conference will focus on cutting-edge technologies and industry advancements, particularly in AI [2].
- The conference theme, "Symbiosis Without Boundaries, Intelligence to Ignite the Future," highlights AI as a core driving force of societal evolution [3].

Key Topics
- The conference will cover hot topics across the tech world, including reinforcement learning, multimodal AI, chip computing power, AI in various industries, and AI's global expansion [4].
- It will showcase the latest collisions between academic frontiers and commercial applications, featuring leading technological achievements across infrastructure, models, and products [5].

Reports and Awards
- The conference will also feature the authoritative release of the annual AI rankings and the Annual AI Trends Report, which will analyze significant trends in AI [6].

Notable Speakers
- The event will host prominent figures from academia and industry, including:
  - Zhang Yaqin, a renowned scientist and entrepreneur in digital video and AI [13].
  - Sun Maosong, Executive Vice President of the Tsinghua University AI Research Institute [17].
  - Wang Zhongyuan, Director of the Beijing Academy of Artificial Intelligence [21].
  - Other experts from leading tech companies and research institutions [30][35][40][44][48][53][57].

AI Annual Rankings
- The "AI Annual Rankings," initiated by Quantum Bit (量子位), has become one of the most influential rankings in the AI industry, evaluating companies, products, and individuals across three dimensions [60].
- Submissions for the rankings are open until November 17, 2025 [61].

AI Trends Report
- The "2025 Annual AI Top Ten Trends Report" will focus on the main themes of AI technology development, analyzing the maturity, current status, and potential value of major trends [65].
- Case collection for the report is open until November 20, 2025 [66].

Conference Logistics
- The MEET2026 Intelligent Future Conference will take place at the Beijing Jinmao Renaissance Hotel, with registration now open [70].
- The conference aims to attract thousands of tech professionals and millions of online viewers, establishing itself as an annual benchmark for the intelligent-technology industry [72].