Reinforcement Learning
Deep Dive into Physical Intelligence's π*0.6 Iterative Reinforcement Learning: VLA + Online RL for Self-Improvement
具身智能之心· 2025-12-07 03:03
Source: 具身纪元 (reposted by 具身智能之心).
In the π*0.6 paper, Physical Intelligence's latest result, the team traces where the iterative reinforcement-learning idea behind π*0.6 came from: familiar work by Yuke Zhu, some of their own earlier research (Chelsea Finn, Sergey Levine) that we have been tracking and covering, and work from embodied-intelligence teams in China, such as Tsinghua University and Robot Era (星动纪元). With the release of π*0.6, VLA + online RL has become an industry-consensus, highly promising research direction, and the SFT-to-RL trajectory seen in large language models is likewise becoming clear in embodied research.
Related reading: a deep dive into the π*0.6 paper shows it goes beyond real-world reinforcement learning; NVIDIA is also building methods for real-world self-improvement of VLA models.
1. Why VLA + RL Matters
(Figure caption: VLA models' reliance on supervised fine-tuning)
In the field of embodied AI, scientists ...
NVIDIA Deftly Uses an 8B Model to Beat GPT-5, and Open-Sources It
量子位· 2025-12-06 05:40
By 闻乐, QbitAI (量子位).
NVIDIA props up a small 8B model and tells GPT-5: sorry, you still need more practice (just kidding). On what grounds? Orchestrator-8B, open-sourced by NVIDIA together with the University of Hong Kong, scores higher on Humanity's Last Exam (HLE), costs less to run, and is faster. It has also been showered with praise on HuggingFace, climbing into the top five trending models.
[HuggingFace trending-model screenshot: entries include Tongyi-MAI/Z-Image-Turbo, deepseek-ai/DeepSeek-V3.2, and deepseek-ai/DeepSeek-V3.2-Speciale] ...
Yann LeCun's First Paper After Leaving Meta? The Research Used a Unitree Robot
机器之心· 2025-12-06 04:08
Core Insights
- The article discusses a research paper introducing GenMimic, a method that enables humanoid robots to perform actions generated by AI video models without prior examples [1][3][4].

Research Contributions
- The research presents a universal framework for humanoid robots to execute actions generated by video models [4].
- GenMimic employs a new reinforcement learning strategy that uses symmetric regularization and selectively weighted 3D keypoint rewards during training, allowing generalization to noisy synthetic videos [4].
- The team created a synthetic human-action dataset named GenMimicBench, which serves as a scalable benchmark for evaluating zero-shot generalization and policy robustness [4][8].

GenMimicBench Dataset
- GenMimicBench consists of 428 generated videos created with the video-generation models Wan2.1 and Cosmos-Predict2 [9][11].
- The dataset spans a wide range of subjects, environments, and action types, from simple gestures to complex object interactions [11][13].
- It is designed to stress-test the robustness of humanoid-robot control policies under varying visual and action distributions [13].

Methodology Overview
- The proposed method executes humanoid-robot actions from generated videos in two stages [15][17].
- The first stage reconstructs a 4D model of the humanoid from the input RGB video; the second stage translates this model into executable actions [17][18].
- The policy gains robustness to variation and noise in the input by tracking 3D keypoints instead of joint angles [19][20].

Experimental Results
- The team ran extensive experiments on both GenMimicBench and a real 23-DoF humanoid robot, demonstrating significant improvements over strong baselines [29][30].
- In simulation, GenMimic achieved a success rate (SR) of 29.78% and outperformed existing models on various metrics [31].
- Real-world experiments showed the policy replicating a wide range of upper-body actions, though lower-body movements remained challenging [34][35].
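The digest does not reproduce GenMimic's reward formulation, but the idea of a selectively weighted 3D keypoint reward can be illustrated with a minimal sketch. Everything below (the Gaussian kernel, the per-keypoint weights, all names) is an assumption for illustration, not the paper's actual definition:

```python
import numpy as np

def keypoint_reward(robot_kps, ref_kps, weights, sigma=0.1):
    """Hypothetical weighted 3D-keypoint tracking reward (not GenMimic's).

    robot_kps, ref_kps: (K, 3) arrays of 3D keypoint positions for the
    robot and for the reference motion reconstructed from video.
    weights: (K,) per-keypoint weights, e.g. emphasizing end-effectors
    over torso points so noisy video keypoints matter less.
    """
    # Squared 3D tracking error for each keypoint
    err = np.sum((robot_kps - ref_kps) ** 2, axis=-1)  # shape (K,)
    # Exponentiated error maps each keypoint's error to a reward in (0, 1]
    per_kp = np.exp(-err / (2.0 * sigma ** 2))
    # The weighted average implements the "selective" part of the reward
    return float(np.average(per_kp, weights=weights))
```

Tracking rewards of this Gaussian form are common in motion-imitation RL; selective weighting is one plausible way a policy can tolerate keypoints that the video reconstruction estimates poorly.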
Crushing π0.5: Fudan Team Debuts a Closed-Loop "World Model + Embodied Training + Reinforcement Learning" Framework
机器之心· 2025-12-04 08:18
Core Viewpoint
- The Vision–Language–Action (VLA) paradigm is becoming a crucial technological pathway for robots to achieve general manipulation intelligence, enabling simultaneous processing of visual perception and language instructions and the generation of continuous control signals [2].

Group 1: Challenges in Current VLA Approaches
- Most current VLA methods rely heavily on imitation learning, which leads to error accumulation and task failure under distribution shift or changed task forms [3][11].
- Online reinforcement learning (RL) on real robots is costly and requires extensive human intervention and monitoring, making large-scale deployment impractical [12].
- Traditional physics engines struggle to balance realism, scene diversity, and engineering usability, complicating the use of RL in simulated environments [13].

Group 2: ProphRL Framework
- The research team proposed the ProphRL framework, which uses a large-scale pre-trained world model, Prophet, as a video-level simulator for optimizing VLA policies with online RL algorithms [4].
- This approach sharply reduces real-world interaction costs while maintaining physical credibility, helping large-model VLA policies reach practical deployment [4].

Group 3: Experimental Results
- ProphRL improved success rates by 5–17% across various VLA models on public benchmarks, with real-robot experiments showing substantial gains of 24–30% [8].
- The Prophet model achieved leading visual fidelity and action consistency across multiple datasets, generalizing to new scenes and tasks with minimal fine-tuning [31].

Group 4: Innovations in RL Algorithms
- The research introduced FA-GRPO and FlowScale, RL algorithms tailored to flow-based action heads, which improve training stability and performance by reorganizing gradient signals and balancing contributions from different flow steps [26][27].
- A video-language reward model evaluates task success over the entire trajectory, replacing manually designed geometric-distance rewards [26].

Group 5: Real-World Validation
- ProphRL was validated on real robots, achieving significant improvements in task success rates across complex tasks and indicating the effectiveness of combining a world model with RL in practice [38].
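The digest names FA-GRPO without spelling out its update rule. As context, GRPO-family methods replace a learned critic with group-relative advantages computed over several rollouts of the same task; a minimal sketch of that step, with all names and the reward source assumed for illustration, might look like this:

```python
import numpy as np

def group_relative_advantages(rewards, eps=1e-8):
    """GRPO-style advantages for a group of rollouts of one task.

    rewards: (G,) scalar returns, e.g. scores from a video-language
    reward model judging each trajectory rolled out in the world model.
    """
    r = np.asarray(rewards, dtype=np.float64)
    # Normalizing within the group removes the need for a value network
    return (r - r.mean()) / (r.std() + eps)

# Example: four rollouts of the same task inside a video-level simulator
advantages = group_relative_advantages([0.2, 0.9, 0.4, 0.6])
```

Per the summary, FA-GRPO's contribution is adapting this kind of group-relative signal to flow-based action heads, with FlowScale rebalancing gradient contributions across flow steps; those details are in the paper and not sketched here.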
Update: MEET2026 Speaker Lineup Refreshed Again; Audience Registration Closing Soon
量子位· 2025-12-04 05:57
Core Insights
- The MEET2026 Smart Future Conference will focus on cutting-edge technologies and industry developments that have drawn significant attention throughout the year [1][2]
- The theme "Symbiosis Without Boundaries, Intelligence to Ignite the Future" emphasizes how AI and smart technologies are penetrating industries, disciplines, and scenarios, becoming a core driving force of societal evolution [2]

Group 1: Conference Highlights
- The conference will cover this year's hot topics in tech, including reinforcement learning, multimodal AI, chip computing power, AI across industries, and AI going global [3]
- It will stage a collision of academic frontiers and commercial applications, showcasing leading technological achievements across infrastructure, models, and products [4]
- The event will feature the authoritative release of the annual AI rankings and the annual AI trends report [5][116]

Group 2: Notable Speakers
- Zhang Yaqin, President of Tsinghua University's Intelligent Industry Research Institute and an academician of the Chinese Academy of Engineering, will be a key speaker [11][12]
- Sun Maosong, Executive Vice President of Tsinghua University's AI Research Institute, has led numerous national projects and is a prominent figure in AI research [15]
- Wang Zhongyuan, Director of the Beijing Academy of Artificial Intelligence, has extensive experience in core AI technology development and has published over 100 papers [19]
- Wang Ying, Vice President of Baidu Group, oversees several key business units including Baidu Wenku and Baidu Netdisk [24]
- Han Xu, Founder and CEO of WeRide, has led the company to a leading position in autonomous-driving technology [28]

Group 3: Industry Impact
- The annual AI rankings initiated by QbitAI (量子位) have become among the most influential lists in the AI industry, evaluating companies, products, and individuals across three dimensions [117]
- The annual AI trends report will analyze ten significant AI trends based on technological maturity, deployment status, and potential value, highlighting representative organizations and best cases [118]
- The conference is scheduled for December 10, 2025, at the Beijing Jinmao Renaissance Hotel, with registration now open; it aims to attract thousands of tech professionals and millions of online viewers, serving as an annual barometer for the smart-technology industry [119][122]
AI Industry Express: New Shifts in Reinforcement Learning as Seen from DeepSeek V3.2
2025-12-03 02:12
AI Industry Express: New Shifts in Reinforcement Learning as Seen from DeepSeek V3.2 — 20251202
Summary
- DeepSeek V3.2 optimizes inference efficiency through the DSA mechanism, cutting redundant computation, with particularly strong gains on complex tasks; it replaces the earlier MLA mechanism.
- The C9 build of DeepSeek V3.2 invests roughly 10% of pre-training compute into the post-training stage, markedly strengthening reinforcement learning on complex tasks such as code debugging and reaching a globally leading level.
- V3.2 adopts an efficient context-management strategy that intelligently handles users frequently opening new tasks, multi-turn dialogue, and ambiguous inputs, effectively lowering inference cost.
- V3.2 uses a large volume of high-difficulty synthetic data, written by human experts and expanded through incremental training; its proportion has more than doubled, it is critical for the subsequent reinforcement-learning stage, and it consumed substantial compute.
- DeepSeek's post-training innovations, including open-sourcing post-training results and supporting agent tool-calling, put open-source models on a functional par with closed-source ones and may set a new trend for open-source projects.
- DeepMind's new framework, combined with a "Rubik's"-rule prompting mechanism, improves reinforcement-learning efficiency, pushing large tech companies to accelerate exploration of multimodal video and image applications and driving related model development in 2025.
- Sparsification lowers the compute required for training and raises the training ceiling; by 2026 ...
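The call notes credit DSA with cutting redundant attention computation on long contexts. As a rough illustration of the general idea behind sparse attention, here is a naive top-k variant; note that it still computes the full score matrix, so it shows only the selection step, not the compute savings an indexer-based production design would achieve. All names and shapes below are assumptions, not DeepSeek's actual implementation:

```python
import torch
import torch.nn.functional as F

def topk_sparse_attention(q, k, v, top_k=64):
    """Naive top-k sparse attention for illustration (not DeepSeek's DSA).

    q: (T, d) queries; k, v: (T, d) keys and values. Each query attends
    only to its top_k highest-scoring keys; the rest are masked out.
    """
    scores = q @ k.t() / k.shape[-1] ** 0.5        # (T, T) full scores
    # Threshold = smallest score among each row's top_k entries
    kth = scores.topk(min(top_k, scores.shape[-1]), dim=-1).values[:, -1:]
    masked = scores.masked_fill(scores < kth, float("-inf"))
    return F.softmax(masked, dim=-1) @ v
```

The efficiency argument in the notes rests on not materializing the full (T, T) score matrix at all for long T; a deployed design would use a lightweight scoring pass to pick the top_k keys before attending.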
DeepSeek-V3.2 Technical Report: It's Still the Overseas Readers Who Study It Most Closely
量子位· 2025-12-03 00:11
Core Insights
- The article discusses the launch of two open-source models, DeepSeek-V3.2 and DeepSeek-V3.2-Speciale, which have drawn significant attention in Silicon Valley and signal a shift in the competitive landscape of AI models [2][6].

Group 1: Model Performance
- DeepSeek-V3.2 reaches the highest level among current open-source models, significantly narrowing the gap with top closed-source models [6].
- The standard DeepSeek-V3.2 matches GPT-5 on performance, while the Speciale version surpasses GPT-5 and competes closely with Gemini-3.0-Pro on mainstream reasoning tasks [7][8].
- DeepSeek-V3.2-Speciale won gold medals in various competitions, demonstrating its advanced capabilities [9].

Group 2: Technical Innovations
- The model uses DSA sparse attention to address long-context efficiency, laying the groundwork for subsequent long-sequence reinforcement learning [14].
- By introducing scalable reinforcement learning and allocating over 10% of pre-training compute to post-training, the model significantly strengthens general reasoning and agent capabilities [15].
- The Speciale version allows extended reasoning chains, enabling deeper self-correction and exploration and unlocking stronger reasoning without increasing pre-training scale [16][17].

Group 3: Economic Implications
- In output-token cost, DeepSeek-V3.2 is roughly 24 times cheaper than GPT-5 and 29 times cheaper than Gemini 3 Pro [29][30].
- Generating extensive content with DeepSeek-V3.2 costs far less, making it economically attractive relative to its competitors [31][32].
- Deploying the model on domestic compute (e.g., Huawei, Cambricon) could further cut inference costs, challenging established players such as Google and OpenAI [36].

Group 4: Market Impact
- DeepSeek-V3.2's success challenges the notion that open-source models lag closed-source ones, pointing to a potential shift in market dynamics [10][26].
- The article argues that the gap between DeepSeek and top models is now more economic than technical: with sufficient resources, open-source models can compete effectively [26].
AI Startup Runway Launches Video-Generation Model Gen 4.5; ByteDance Seed Releases GR-RL, the First Real-Robot Reinforcement Learning to Thread Shoelaces | AIGC Daily
创业邦· 2025-12-03 00:08
Group 1
- Keling AI officially launched its new product "Keling O1," which integrates multimodal inputs such as text, video, images, and subjects into a single comprehensive engine, addressing consistency issues in AI video generation for film, self-media, and e-commerce applications [2]
- OpenAI is reportedly considering embedding advertisements in ChatGPT; recent Android test builds contain code labeled "featured ads," indicating a shift toward personalized advertising based on user interactions [2]
- ByteDance's Seed team released GR-RL, raising the success rate of a shoe-lacing task from 45.7% to 83.3%, a notable advance in reinforcement learning for fine manipulation [2]

Group 2
- AI startup Runway introduced its latest video-generation model, Gen 4.5, which outperformed Google and OpenAI offerings in third-party evaluations and generates high-quality videos from textual instructions [3]
Recently, Autonomous-Driving Job Postings Have Shown Some New Changes...
自动驾驶之心· 2025-12-03 00:04
Core Viewpoint
- The article discusses evolving recruitment demands in the autonomous-driving sector, highlighting a shift from perception roles toward end-to-end, VLA, and world-model positions, which implies a broader technical skill set for candidates [1][2].

Group 1: Course Overview
- The course, "End-to-End Practical Class for Mass Production," focuses on practical applications in autonomous driving, covering a range of algorithms and real-world production experience [2][3].
- Enrollment is capped at 25 participants, emphasizing a targeted approach to training [2][3].

Group 2: Course Structure
- Chapter 1 gives an overview of end-to-end tasks, discussing the integration of perception tasks and the learning-based control algorithms now becoming mainstream [6].
- Chapter 2 covers the two-stage end-to-end algorithm framework, explaining the modeling approach and the flow of information between perception and planning [7].
- Chapter 3 focuses on the one-stage end-to-end framework, highlighting its advantages in information transmission and introducing several one-stage solutions [8].
- Chapter 4 discusses the use of navigation information in autonomous driving, detailing the formats and encoding of navigation maps [9].
- Chapter 5 introduces reinforcement-learning algorithms, emphasizing that these methods must complement imitation learning in autonomous driving [10].
- Chapter 6 is a hands-on project on trajectory-output optimization, combining imitation-learning and reinforcement-learning techniques [11].
- Chapter 7 presents fallback solutions via spatiotemporal planning, focusing on trajectory-smoothing algorithms to make the output trajectory more reliable (see the sketch after this list) [12].
- Chapter 8 shares mass-production experience, analyzing how to use tools and strategies effectively to improve system capability [13].

Group 3: Target Audience and Requirements
- The course targets advanced learners with a foundation in autonomous-driving algorithms, though those with weaker backgrounds can still participate [14][15].
- Participants need access to a GPU meeting the recommended specifications and familiarity with the relevant algorithms and programming languages [15].
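The article does not say which smoothing formulation Chapter 7 teaches. A toy gradient-based smoother that trades waypoint fidelity against a second-difference (acceleration) penalty conveys the basic idea; all names and parameters here are illustrative, and production stacks typically solve this as a QP or with splines instead:

```python
import numpy as np

def smooth_trajectory(points, fidelity=1.0, smoothness=2.0,
                      iters=300, lr=0.02):
    """Toy trajectory smoother: minimize
    fidelity * ||x - points||^2 + smoothness * ||second difference||^2
    by gradient descent, keeping the endpoints fixed.

    points: (T, 2) array of raw planned waypoints.
    """
    x = points.astype(np.float64)
    for _ in range(iters):
        # Second difference approximates acceleration along the path
        acc = x[:-2] - 2.0 * x[1:-1] + x[2:]
        grad = fidelity * (x - points)
        # Chain rule: each acc term touches three consecutive waypoints
        grad[:-2] += smoothness * acc
        grad[1:-1] += smoothness * (-2.0 * acc)
        grad[2:] += smoothness * acc
        grad[0] = grad[-1] = 0.0   # endpoints stay fixed
        x -= lr * grad
    return x
```

Raising the smoothness weight flattens the path at the cost of tracking the raw waypoints less closely, which is the same trade-off any production trajectory smoother has to tune.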