Reinforcement Learning
SOTA even with scarce data? Tsinghua & Shanghai AI Lab crack two major bottlenecks in robot RL
量子位· 2025-09-26 02:08
Core Viewpoint
- The article discusses SimpleVLA-RL, an end-to-end online training framework for Vision-Language-Action (VLA) models, aimed at enhancing the flexibility and performance of robots in complex environments while addressing existing training bottlenecks [3][12].

Group 1: Key Challenges in Existing Training Paradigms
- Current training paradigms face significant challenges, including high data-collection costs and insufficient generalization [2][8].
- Reliance on large-scale, high-quality robot operation trajectories limits scalability and drives up costs, making data acquisition a major hurdle [8].
- Models generalize poorly, particularly on out-of-distribution tasks and in new environments, with performance dropping on long-horizon dependencies and compositional tasks [8][9].

Group 2: SimpleVLA-RL Framework
- SimpleVLA-RL combines interactive trajectory sampling, result-based rewards, and enhanced exploration to tackle the core challenges of VLA model training (a toy sketch of this reward scheme follows this summary) [5][6].
- The framework reaches state-of-the-art (SoTA) performance on standard benchmarks such as LIBERO and RoboTwin, with significant gains even under limited data [5][21].
- With only a single demonstration per task, the average success rate on LIBERO rose from 48.9% to 96.9% after applying SimpleVLA-RL [5].

Group 3: Performance Metrics and Results
- SimpleVLA-RL achieved an average success rate of 99.1% on LIBERO, with long-horizon tasks improving by 12.0 percentage points [21].
- On RoboTwin 1.0, the average success rate rose from 39.8% to 70.4%, with specific tasks such as "Blocks Stack" improving by 33.1 percentage points [23].
- On RoboTwin 2.0, average success rates improved from 38.3% to 68.8% [25].

Group 4: Innovations and Discoveries
- Training gave rise to new manipulation strategies, such as the "Pushcut" phenomenon, in which the model autonomously discovers methods more efficient than the human demonstrations [10][31].
- This indicates that reinforcement learning can let VLA models surpass the limits of human demonstration patterns, paving the way for future adaptive VLA models [31].
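The mechanics of "result-based rewards" are easiest to see in code. Below is a minimal Python sketch, not the authors' implementation: the `policy.sample`, `env.reset`, `env.step`, and `env.task_succeeded` interfaces are hypothetical stand-ins, and the group-mean baseline is one common variance-reduction choice for sparse outcome rewards.

```python
import torch

def train_step(policy, env, optimizer, group_size=8, max_steps=200):
    """One outcome-reward update: roll out a group of episodes, score each
    only by final task success, and take a policy-gradient step using the
    group mean reward as a baseline."""
    episode_log_probs, rewards = [], []
    for _ in range(group_size):
        obs, instruction = env.reset()            # image observation + language goal
        log_probs, done = [], False
        for _ in range(max_steps):
            action, log_prob = policy.sample(obs, instruction)
            obs, done = env.step(action)
            log_probs.append(log_prob)
            if done:
                break
        episode_log_probs.append(torch.stack(log_probs).sum())
        rewards.append(1.0 if env.task_succeeded() else 0.0)  # result-based only
    rewards = torch.tensor(rewards)
    advantages = rewards - rewards.mean()         # group baseline cuts variance
    loss = -(advantages * torch.stack(episode_log_probs)).mean()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```

Note that if every rollout in a group succeeds or fails together, all advantages are zero and no learning signal remains, which is one reason the enhanced-exploration component matters in practice.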
The evolution of RL infra architecture, seen through today's mainstream RL libraries
自动驾驶之心· 2025-09-25 23:33
Core Viewpoint
- Reinforcement learning (RL) is transitioning from a supporting technology to a core driver of model capability, with the focus shifting to multi-step, interactive agent training on the road to artificial general intelligence (AGI) [2][6].

Group 1: Modern RL Infrastructure Architecture
- Modern RL infrastructure has two core components: a Generator, which interacts with the environment to produce trajectories and compute rewards, and a Trainer, which updates model parameters from that trajectory data [6][4].
- This generator-trainer split, combined with a distributed coordination layer such as Ray, forms the "gold standard" for RL systems (a toy sketch of the pattern follows this summary) [6][4].

Group 2: Primary Development
- Primary-development frameworks are the foundations on which RL training pipelines are built, providing core algorithm implementations and integration with the underlying training/inference engines [8][7].
- TRL (Transformer Reinforcement Learning), a user-friendly RL framework from Hugging Face, supports a wide range of algorithms [9][10].
- OpenRLHF, developed by a collaborative team including ByteDance and NetEase, aims to provide an efficient, scalable RLHF and agentic-RL framework [11][14].
- veRL, from ByteDance's Seed team, is one of the most feature-complete frameworks, with extensive algorithm support [16][19].
- AReaL (Asynchronous Reinforcement Learning) targets large-scale, high-throughput RL training with a fully asynchronous architecture [20][21].
- NeMo-RL, from NVIDIA, integrates into the broader NeMo ecosystem and focuses on production-grade RL [24][28].
- ROLL, an Alibaba open-source framework, emphasizes asynchronous and agentic capabilities for large-scale LLM RL [30][33].
- slime, developed by Tsinghua and Zhipu, is a lightweight framework focused on seamless integration of SGLang with Megatron [34][36].

Group 3: Secondary Development
- Secondary-development frameworks build on the primary ones and target specific downstream scenarios such as multimodal, multi-agent, and GUI automation [44][3].
- Agentic-RL frameworks such as verl-agent optimize asynchronous rollout and training, addressing the core challenge of multi-round interaction with external environments [46][47].
- Multimodal-RL frameworks such as VLM-R1 and EasyR1 focus on training vision-language reasoning models, tackling data processing and loss-function design [53][54].
- Multi-agent RL frameworks such as MARTI integrate multi-agent reasoning with reinforcement learning for complex collaborative tasks [59][60].

Group 4: Summary and Trends
- RL infrastructure is evolving from a "workshop" model toward a "standardized pipeline," with framework design growing increasingly modular [65].
- Asynchronous architectures are becoming essential to absorb the computational asymmetry between rollout and training [66].
- High-performance inference engines such as vLLM and SGLang significantly accelerate the rollout phase [66].
- The shift from RLHF to agentic RL reflects the growing complexity of the tasks new frameworks must support [66].
- The choice of distributed training backend, such as Megatron-LM or DeepSpeed, is critical for large-scale model training [66].
- Scenario-driven secondary frameworks are addressing the unique challenges of vertical domains [66].
- The importance of orchestrators for managing the distributed components of RL systems is becoming widely recognized [66].
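The generator-trainer split described above is simple enough to show end to end. Here is a self-contained toy sketch of the pattern using Ray actors. The "policy" is a single scalar and the update is a crude hill-climb, stand-ins chosen only so the loop runs; no real framework's API is being reproduced.

```python
import random
import ray

ray.init()

@ray.remote
class Generator:
    """Interacts with the environment: samples trajectories and scores them."""
    def __init__(self):
        self.weights = 0.0

    def set_weights(self, weights):
        self.weights = weights                    # weight sync pushed by the driver

    def rollout(self, num_episodes):
        samples = []
        for _ in range(num_episodes):
            w = self.weights + random.gauss(0.0, 0.3)  # exploration noise
            reward = -(w - 1.0) ** 2                   # toy env: optimum at 1.0
            samples.append((w, reward))
        return samples

@ray.remote
class Trainer:
    """Consumes trajectory data and updates the parameters."""
    def __init__(self):
        self.weights = 0.0

    def update(self, samples):
        best_w, _ = max(samples, key=lambda s: s[1])   # hill-climbing "update"
        self.weights += 0.5 * (best_w - self.weights)
        return self.weights

generator, trainer = Generator.remote(), Trainer.remote()
weights = 0.0
for step in range(30):
    ray.get(generator.set_weights.remote(weights))     # sync latest "policy"
    samples = ray.get(generator.rollout.remote(16))    # generation phase
    weights = ray.get(trainer.update.remote(samples))  # training phase
print(f"learned weight {weights:.2f} (optimum 1.0)")
```

The `ray.get` calls make the two phases alternate synchronously, the "workshop" version; the asynchronous frameworks above (AReaL, ROLL) decouple the two loops so generation never idles while training runs.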
AI is stealing white-collar jobs: OpenAI pours $1 billion into teaching AI to do office work, and your perfect successor is about to start
36Kr· 2025-09-25 09:32
Core Insights
- Major AI companies such as Anthropic and OpenAI plan to invest $1 billion annually to train AI to work like humans, using reinforcement learning environments and expert knowledge [1][4][21].
- There are concerns that AI could eliminate a significant share of entry-level white-collar jobs within the next 1-5 years, potentially pushing the U.S. unemployment rate to 10-20% [1][2].

Investment and Development
- Anthropic and OpenAI are each allocating $1 billion per year for AI training, and OpenAI projects this spending will reach $8 billion by 2030 [4][10].
- The funding aims to overcome the limits of traditional training methods and to explore new monetization avenues, such as workplace software and AI agents [4][10].

AI Training Methodology
- AI is being trained to handle complex tasks in real applications, including Salesforce and Zendesk, with a focus on real-world task execution [3][5].
- Turing has developed over 1,000 reinforcement learning environments that simulate real-world applications for AI training (a toy sketch of such an environment follows this summary) [12][13].

Expert Involvement
- The trend is shifting toward hiring experienced professionals from various fields to provide real-world task examples for AI to learn from [15][20].
- The cost of hiring experts is rising, with some contracts already exceeding $120 per hour and projected rates of $150-$250 per hour within the next 18 months [11][10].

Future Implications
- As AI learns from expert knowledge and workplace applications, it is expected to gradually take over human jobs across industries [24][21].
- This integration could ultimately transform the economy into a system that operates like one large reinforcement learning machine [21][1].
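A "reinforcement learning environment" here means a simulated copy of a workplace application in which an agent is scored on task completion. The following is a minimal, entirely hypothetical Python sketch of such an environment, a toy ticket-routing task loosely in the style of a helpdesk app; it uses no real product API, and real environments would expose the full application UI or API rather than three canned tickets.

```python
import random

class TicketTriageEnv:
    """Toy RL environment: the agent must route each support ticket to the
    correct queue. One episode is one ticket; reward is outcome-based."""
    QUEUES = ["billing", "tech", "refunds"]

    def reset(self):
        # The hidden label is the queue the ticket should land in.
        self.label = random.choice(self.QUEUES)
        text = {"billing": "I was charged twice this month.",
                "tech": "The app crashes when I log in.",
                "refunds": "I want my money back for my last order."}[self.label]
        return {"ticket_text": text, "queues": self.QUEUES}

    def step(self, action):
        # Sparse, outcome-based reward: 1 if routed correctly, else 0.
        reward = 1.0 if action == self.label else 0.0
        return None, reward, True, {}   # obs, reward, done, info

env = TicketTriageEnv()
obs = env.reset()
_, reward, done, _ = env.step("billing")   # an agent would choose from obs
print(obs["ticket_text"], "->", reward)
```

Production environments presumably differ enormously in scale and fidelity, but the contract (reset, step, outcome reward) is the standard shape an agent trains against.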
WeChat-YATT arrives out of nowhere: where is Tencent's reinforcement learning strategy aimed?
Sohu Finance· 2025-09-24 09:56
Core Insights
- Tencent's open-sourcing of the WeChat-YATT training library is a strategic move in the competitive AI model-training landscape, particularly as OpenAI's GPT-5 approaches release [1][2].
- WeChat-YATT is designed around reinforcement learning and multimodal models, differentiating it from mainstream frameworks such as TensorFlow and PyTorch [2].

Group 1: WeChat-YATT's Innovations
- WeChat-YATT claims breakthroughs in three areas: more efficient parameter updates for reinforcement learning, flexible interfaces for multimodal data fusion, and a modular design that lowers the barrier to distributed training [2][4].
- Its emphasis on extensibility reflects Tencent's recognition that large-model training demands rapid iteration [4].

Group 2: Competitive Positioning
- Compared with Meta's PyTorch, WeChat-YATT stands out in reinforcement learning support; against Google's JAX, it claims advantages in Chinese-language scenarios and multimodal processing [4].
- Its deep integration with the WeChat ecosystem sets it apart from similar RL frameworks such as Ray RLlib [4].

Group 3: Strategic Implications
- The release aligns with Tencent's broader AI strategy, which includes trademark applications for a "WeChat AI Service Platform" and deployment of the Hunyuan model in business scenarios [7].
- Tencent aims to build a closed-loop AI ecosystem through foundational technology breakthroughs and application deployment, with WeChat-YATT as a critical component [7].
- The focus on reinforcement learning signals Tencent's commitment to key areas such as gaming, recommendation systems, and autonomous driving, positioning it for future AI applications [7].

Group 4: Long-term Vision
- The name WeChat-YATT, "Yet Another Transformer Trainer," carries both a wink and a signal of Tencent's long-term investment in AI infrastructure [6].
- Competition in the large-model era is fundamentally a competition over infrastructure, and WeChat-YATT is one piece of Tencent's broader AI blueprint [7].
Find your like-minded AI companion | New 「锦秋小饭桌」 events announced
锦秋集· 2025-09-23 09:44
How to register: scan the QR code on the event poster, follow the "锦秋集" official account, and reply "锦秋小饭桌" to sign up.

Early autumn is the perfect season to find a companion, put on some autumn weight, and talk shop: what goes in is good food, what comes out is inspiration. Three events for September and October are now open; if one matches your interests or the direction you are working on, you are welcome to join:

- 「锦秋小饭桌」Vol.32, Shenzhen AI Agent session: 2025.09.26, 18:30, Shenzhen
- 「锦秋小饭桌」Vol.33, Embodied Intelligence session: 2025.10.10, 18:30, Beijing (Sanyuanqiao)
- 「锦秋小饭桌」Vol.34, Shenzhen Robot Party: 2025.10.17, 18:00, Shenzhen

Once we're full, let's go change the world together! Feel free to browse past event recaps to find the topics that interest you ...
Charging into the first tier of new-energy vehicles: the Buick Zhijing L7, a "new benchmark for extended-range luxury sedans," makes its national debut
Core Insights
- The Buick Zhijing L7, an extended-range luxury sedan, has been unveiled as the flagship model of Buick's high-end electric sub-brand, showcasing advanced technology and luxury features [1][3][21].

Group 1: Product Features
- The Zhijing L7 is built on Buick's new "Xiaoyao" super-fusion architecture, integrating top technologies across driving, assisted driving, and luxury comfort [3][5].
- Its "Zhenlong" range-extender system delivers a maximum output of 252 kW, equivalent to a 3.0T V6 engine, with 0-100 km/h acceleration in just 5.9 seconds [5][8].
- The vehicle offers 302 km of pure-electric range and 1,420 km of total range, addressing common concerns about electric vehicle range [5][8].
- A high-performance battery supports a 640,000 km lifespan with low degradation, ensuring safety and longevity [8].

Group 2: Intelligent Features
- The "Xiaoyao Zhixing" assisted-driving system features Momenta's R6 flywheel model, based on end-to-end reinforcement learning, providing comprehensive driving assistance [9][11].
- A 50-inch panoramic AR-HUD head-up display and a 15.6-inch smart central control screen enhance user interaction and information display [11][16].
- The intelligent cockpit runs on Qualcomm's latest SA8775P chip, delivering high computational power for a range of smart-driving scenarios [13][11].

Group 3: Luxury and Comfort
- A spacious interior measuring 5032 mm x 1952 mm x 1500 mm with a 3000 mm wheelbase reflects its luxury-sedan positioning [14][19].
- High-quality materials and advanced sound insulation create a serene, luxurious cabin [15][19].
- Unique seating configurations include the industry's first dual 120° zero-gravity seats for enhanced comfort [19][21].

Group 4: Market Positioning
- The Zhijing L7 aims to redefine luxury standards in the electric vehicle market by combining advanced range-extender technology, top-tier intelligent features, and luxury experience [21].
- It is positioned to compete in the high-end electric vehicle segment, leveraging Buick's heritage and innovative capabilities to attract consumers [21].
Nvidia pours $100 billion into OpenAI while Musk races to build the world's largest AI cluster | Jinqiu Select
锦秋集· 2025-09-23 04:44
When base-model capability keeps improving, the key for startups is to find new application scenarios and differentiated paths: perhaps a high-frequency step in some industry workflow, perhaps a brand-new interaction pattern, perhaps the combination of models with hardware, or of people with people. Startups, too, must find a distinctive, sharply executed play within their own constraints.

Today the AI world received earth-shaking news: Nvidia announced a strategic investment of up to $100 billion in OpenAI, with the two jointly building at least 10 gigawatts of data-center infrastructure to support the training and deployment of next-generation models.

This move marks the point where the AI war among model-layer players has truly advanced from algorithms and products into the hard contest of infrastructure plus compute.

Meanwhile, Elon Musk is laying out his compute map at an almost surreal pace: xAI is racing to build its Colossus series of AI clusters in Memphis, Mississippi, and beyond, aiming to reach hundreds of megawatts, approaching gigawatt scale, in the shortest possible time. Power plants, turbines, cross-state power supply: the foundations that support compute are being laid down at high intensity.

The big model-layer players are still betting firmly on models, and capital, compute, and speed have become a moat for the top players that is hard to shake.

For the many AI founders outside the model layer, all of this is good news. Whether it is OpenAI's ultra-large-scale training or xAI's clusters ...
具身智能之心's nearly 20 discussion groups are here! Everyone is welcome to join
具身智能之心· 2025-09-23 04:00
Group 1
- A technical exchange community focused on embodied intelligence has been established, inviting participation from across its subfields [1].
- The groups cover nearly 20 sub-directions, including humanoid robots, quadrupeds, and robotic arms, and topics such as VLA, large models, VLN, reinforcement learning, mobile manipulation, multimodal perception, simulation, and data collection [1].
- Participants are encouraged to collaborate and discuss both the technology and industry developments [1].
Dexterous-hand makers are caught in the squeeze
投资界· 2025-09-23 02:32
The following article is from AI科技评论 (ID: aitechtalk), an AI media outlet under 雷峰网 (Leiphone) focused on frontier AI research and engineering deployment. Author: 丁莉; Editor: 陈彩娴.

The price war has escalated too early.

"As for dexterous hands, you can assume every demo is fake. Everything is the result of overfitting; the ability to complete tasks autonomously basically does not exist. The gap between how practitioners and outsiders perceive the state of the technology is too wide, and something visual is needed to bridge it," one industry insider told AI科技评论.

That assessment was later echoed by multiple sources. Across the recent WAIC and WRC conferences, pre-programmed demos remained the mainstream.

(Companies that have released dexterous hand products to date, compiled by AI科技评论)

Squeezed from upstream and downstream, betting on three directions

The spotlight on embodied intelligence still burns bright, and dexterous hands have been pushed to center stage.

This is now consensus: as robotic manipulation becomes the focal point, dexterous hands have risen up the agenda. The track went from deserted to overcrowded in barely half a year, with large numbers of new entrants still pouring in. AI科技评论 has compiled ...

Since the start of this year, the focus of embodied intelligence has suddenly extended from robot bodies to dexterous hands: upstream component makers and downstream body makers have both entered the arena, leaving dexterous-hand startups squeezed from both sides.

Investors are also spreading their bets, chiefly on three traits: the most AI-native, the most human-hand-like, and the earliest to mass production.

But insufficient intelligence remains the most ...
Results are out! A NeurIPS 2025 paper roundup (autonomous driving / large models / embodied AI / RL, and more)
自动驾驶之心· 2025-09-22 23:34
Core Insights
- The article rounds up announcements from NeurIPS 2025, covering advances in autonomous driving, visual perception and reasoning, large-model training, embodied intelligence, reinforcement learning, video understanding, and code generation [1].

Autonomous Driving
- Highlighted papers include "FutureSightDrive" and "AutoVLA," which explore visual reasoning and end-to-end driving models [2][4].
- A collection of papers and code from institutions including Alibaba, UCLA, and Tsinghua University showcases the latest developments in the field [6][7][13].

Visual Perception Reasoning
- "SURDS" benchmarks spatial understanding and reasoning in driving scenarios using vision-language models [11].
- "OmniSegmentor" offers a flexible multimodal learning framework for semantic segmentation [16].

Large Model Training
- Papers cover scaling offline reinforcement learning and fine-tuning techniques [40][42].
- Adaptive methods are emphasized for improving model performance across applications [44].

Embodied Intelligence
- Highlights include "Self-Improving Embodied Foundation Models" and "ForceVLA," which strengthens models for contact-rich manipulation [46][48].

Video Understanding
- "PixFoundation 2.0" investigates the use of motion for visual grounding [28][29].

Code Generation
- Developments include "Fast and Fluent Diffusion Language Models" and "Step-By-Step Coding for Improving Mathematical Olympiad Performance" [60].