Reinforcement Learning
Register now! MEET2026's latest speaker lineup announced - come talk AI
量子位· 2025-11-24 03:39
Core Viewpoint
- The article emphasizes the transformative impact of artificial intelligence (AI) across industries and society, highlighting the upcoming MEET2026 conference as a platform to explore these advancements and trends in AI technology [1][3]

Group 1: Conference Overview
- The MEET2026 Intelligent Future Conference will focus on cutting-edge technologies and industry developments, particularly in AI [2]
- The theme of the conference is "Symbiosis Without Boundaries, Intelligence to Ignite the Future," exploring how AI transcends industry, discipline, and scenario boundaries [3]
- Key discussion topics include reinforcement learning, multimodal AI, chip computing power, AI applications across industries, and AI's global expansion [4]

Group 2: Notable Speakers
- The conference will feature prominent figures such as Zhang Yaqin, a leading scientist in digital video and AI and former president of Baidu [12][13]
- Sun Maosong, Executive Vice President of the Tsinghua University AI Research Institute, will also be a key speaker, known for leading national research projects [17]
- Other notable speakers include Wang Zhongyuan, Director of the Beijing Academy of Artificial Intelligence, and He Xiaodong, Senior Vice President of JD Group, who has extensive experience in multimodal intelligence [21][30]

Group 3: AI Trends and Reports
- The conference will unveil the "Artificial Intelligence Annual List" and the "Annual AI Trend Report," which are expected to identify the most influential companies, products, and individuals in the AI sector [6][102]
- The 2025 AI Annual List will evaluate candidates across three dimensions (companies, products, and individuals), with results announced at the conference [103]
- The 2025 Annual AI Top Ten Trends Report will analyze significant AI trends based on technological maturity, current applications, and potential value, highlighting representative organizations and best cases [104]

Group 4: Event Details
- The MEET2026 conference is scheduled for December 10, 2025, at the Beijing Jinmao Renaissance Hotel, with registration now open [105]
- The event is a significant technology business summit, attracting thousands of industry professionals and millions of online viewers each year [107]
The "little matter" of end-to-end mass production: only those who've done it know how painful it is
自动驾驶之心· 2025-11-24 00:03
Core Insights
- The article emphasizes the growing demand for end-to-end production talent in the automotive industry, highlighting a paradox: job seekers are abundant, yet companies struggle to find qualified candidates [1][3]

Course Overview
- A newly designed end-to-end production course aims to close this skills gap, focusing on practical applications and real-world scenarios over three months [3][5]
- The course covers essential algorithms such as one-stage and two-stage end-to-end frameworks, reinforcement learning applications, and trajectory optimization techniques [5][10]

Course Content
- **Chapter 1: Overview of End-to-End Tasks** - discusses the integration of perception tasks and the learning-based control algorithms becoming mainstream in autonomous driving [10]
- **Chapter 2: Two-Stage End-to-End Algorithms** - introduces the two-stage framework, its modeling methods, and the flow of information between perception and planning [11]
- **Chapter 3: One-Stage End-to-End Algorithms** - focuses on one-stage frameworks that allow lossless information transfer, improving performance over two-stage methods [12]
- **Chapter 4: Application of Navigation Information** - explains the critical role of navigation data in autonomous driving and how it can be effectively integrated into end-to-end models [13]
- **Chapter 5: Introduction to Reinforcement Learning Algorithms** - highlights why reinforcement learning is needed to complement imitation learning, enabling better generalization [14]
- **Chapter 6: Trajectory Output Optimization** - covers practical projects applying imitation learning and reinforcement learning to trajectory planning [15]
- **Chapter 7: Contingency Planning - Spatiotemporal Joint Planning** - discusses post-processing logic, including smoothing algorithms, to ensure reliable trajectory outputs (a minimal smoothing sketch follows this summary) [16]
- **Chapter 8: Experience Sharing in End-to-End Production** - provides practical strategies and tools for enhancing system capabilities in real-world applications [17]

Target Audience
- The course is designed for advanced learners with a foundational understanding of autonomous driving algorithms, reinforcement learning, and programming [18][19]

Course Schedule
- The course begins on November 30, with a structured timeline for unlocking chapters and support through offline videos and online Q&A sessions [20]
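Chapter 7's smoothing step is easy to picture in code. Below is a minimal sketch of one common post-processing approach: a gradient-descent smoother that pulls a noisy planned path toward local straightness while staying close to the raw model output. The weights, step size, and iteration count are illustrative assumptions, not the course's actual algorithm.

```python
# Illustrative trajectory post-processing sketch (assumed weights/step size),
# not the course's production smoother.
import numpy as np

def smooth_trajectory(path, w_data=0.5, w_smooth=0.3, iters=200, lr=0.1):
    """Smooth a planned 2D path while keeping it close to the raw output.

    path: (N, 2) array of waypoints from the end-to-end model.
    Interior points are pulled toward the raw trajectory (data term) and
    toward the midpoint of their neighbors (smoothness term); endpoints
    stay fixed so the start and goal are preserved.
    """
    raw = np.asarray(path, dtype=float)
    p = raw.copy()
    for _ in range(iters):
        grad = w_data * (p[1:-1] - raw[1:-1])              # data term
        grad += w_smooth * (2 * p[1:-1] - p[:-2] - p[2:])  # smoothness term
        p[1:-1] -= lr * grad
    return p

if __name__ == "__main__":
    noisy = np.cumsum(np.random.randn(50, 2) * 0.2, axis=0)
    print(smooth_trajectory(noisy)[:5])
```

The two weights encode the usual production trade-off: fidelity to the planner's output versus a trajectory the controller can track without jerk.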
Li Auto proposes the first world model covering both ego-vehicle and other-vehicle trajectories
理想TOP2· 2025-11-23 11:56
Li Auto's world model includes the trajectories of both the ego vehicle and other vehicles, a first proposed by Li Auto. The goal is to let Li Auto's VLA perform reinforcement learning in a simulated environment, where the same scene can be replayed repeatedly to test better trajectory choices, something real-world data simply cannot provide. A visualization is available in the accompanying video.

Li Auto's VLA training process:

The pre-training stage trains a 32B VL base model in the cloud, combining 3D vision, high-definition 2D vision with 3-5x the clarity of open-source models, driving-related language corpora, and, crucially, joint VL corpora (e.g., synchronized records of navigation information and human judgments). To fit on-vehicle compute and guarantee inference speed, the cloud model is distilled into a 3.2B MoE model.

The post-training stage introduces action into the model, turning it into a VLA with nearly 4B parameters. It uses a short chain-of-thought (CoT), limited to 2-3 steps, then applies diffusion to predict the trajectory and environment 4-8 seconds into the future.

The reinforcement learning stage has two parts: reinforcement learning from human feedback, and pure reinforcement learning that relies on no human feedback, using data generated by the world model. The model self-evolves against three metrics - comfort (G value), collision avoidance, and traffic-rule compliance - with the goal of exceeding human driving ability (a reward sketch follows below).

On March 12, 2025, Li Auto published Other Vehicle Trajectories Are Also Needed: A Driving World Model Un ...
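The three self-evolution metrics named above map naturally onto a scalar reward for the pure-RL stage. Here is a hedged sketch of how such a reward might be composed; the weights, the 0.3g comfort threshold, and the field names are illustrative assumptions, not Li Auto's actual design.

```python
# Illustrative reward combining the three criteria the article names
# (comfort via G value, collision avoidance, traffic-rule compliance).
# All weights and thresholds are assumptions for the sketch.
from dataclasses import dataclass

@dataclass
class StepOutcome:
    peak_g: float         # peak acceleration in g over the rollout step
    collided: bool        # did the rollout collide in the world model?
    rule_violations: int  # e.g. red-light or lane-marking violations

def reward(o: StepOutcome,
           g_comfort=0.3, w_comfort=1.0, w_collision=100.0, w_rules=10.0):
    """Higher is better; each penalty term maps one named criterion."""
    comfort_penalty = w_comfort * max(0.0, o.peak_g - g_comfort)
    collision_penalty = w_collision if o.collided else 0.0
    rule_penalty = w_rules * o.rule_violations
    return -(comfort_penalty + collision_penalty + rule_penalty)

print(reward(StepOutcome(peak_g=0.45, collided=False, rule_violations=0)))
```

Weighting collisions far above comfort reflects the article's ordering: the model should never trade safety for a smoother ride.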
Lei Jun: Assisted driving is not autonomous driving; drivers must stay focused at all times
Sou Hu Cai Jing· 2025-11-23 08:56
On November 23, Lei Jun posted a summary of the upgrades in Xiaomi's end-to-end assisted driving HAD Enhanced Version. Longitudinal acceleration and braking are more comfortable: when a neighboring car cuts in, the system decelerates pre-emptively and then promptly speeds back up to follow, making driving more comfortable and safe. Lateral lane changes are smoother, with more natural, fluid behavior when merging or detouring around obstacles. Road-condition understanding has improved: at complex multi-lane intersections, the system reads navigation information in advance, improving its ability to take the right road and pick the right lane.

Lei Jun also stressed that assisted driving is not autonomous driving, and drivers must stay focused at all times. Earlier, on November 21, the opening day of the 2025 Guangzhou Auto Show, Xiaomi officially released its end-to-end assisted driving "Xiaomi HAD Enhanced Version," which builds on the 10-million-Clips version by introducing reinforcement learning and a world model, upgrading AEB collision-avoidance assistance, and adding emergency steering assistance. ...
Lei Jun reminds: assisted driving is not autonomous driving; drivers must stay focused at all times
Sou Hu Cai Jing· 2025-11-23 06:25
IT之家 reported on November 23 that Xiaomi founder, chairman, and CEO Lei Jun posted today summarizing the upgrades in Xiaomi's end-to-end assisted driving HAD Enhanced Version.

- Longitudinal acceleration and braking are more comfortable: the system pre-emptively decelerates when a neighboring car cuts in and promptly speeds back up to follow, making driving more comfortable and safe.
- Lateral lane changes are smoother: merging and detouring around obstacles are more fluid and less hesitant.
- Road-condition understanding is fuller: at complex multi-lane intersections, the system reads navigation information in advance, improving its ability to take the right road and pick the right lane.

Lei Jun again reminded users that assisted driving is not autonomous driving, and drivers must stay focused at all times.

As IT之家 previously reported, on November 21, the opening day of the 2025 Guangzhou Auto Show, Xiaomi officially released its end-to-end assisted driving "Xiaomi HAD Enhanced Version," which builds on the 10-million-Clips version by introducing reinforcement learning and a world model, upgrading AEB collision-avoidance assistance, and adding emergency steering assistance.

Safety features listed in the release: Lane Keep Assist (warning), Lane Keep Assist (correction), Emergency Lane Keeping, Blind Spot Monitoring, Door Open Warning, Lane Change Assist Warning, plus other safety capabilities (overspeed alert, traffic-light reminder, adaptive anti-glare matrix) and lateral safety capabilities. ...
Li Auto at the 2025 Guangzhou Auto Show: video version / condensed text-and-image version
理想TOP2· 2025-11-21 04:22
Core Insights
- The article opens with Li Auto's brand ideal of living authentically and aligning with personal values, framed here around driving standards and experiences [1]

Group 1: Performance Metrics
- In two months, VLA logged 312 million kilometers of mileage, with penetration up 2.2x and daily active users up 3x, including over 5,000 users driving 1,000 kilometers in a single day and 520,000 AD Max users [3]

Group 2: Technological Advancements
- The article contrasts capability before reinforcement learning (blue) and after reinforcement learning (green), noting that new capabilities and features are in closed internal testing, with a rollout expected soon [6]
- The company plans to automate every step of charging at its stations except plugging in the vehicle, with 1,400, 2,400, and 2,900 stations expected to reach this capability in January, February, and March of 2026, respectively [6]

Group 3: Safety Features
- The system has avoided potential collision incidents 11.32 million times and cumulatively prevented 14,034 extreme accidents, with 2.08 million proactive nighttime risk-avoidance actions [9]
- New features include defensive acceleration maneuvers and full 360-degree AES capability, enhancing safety against various driving threats [9]

Group 4: Future Developments
- Upcoming OTA content is expected to enhance user experience and vehicle functionality [11]
- A new city NOA feature will soon be pushed to users of the revamped AD Pro [13]
Xiaomi HAD Enhanced Version assisted driving released: reinforcement learning and world model on board, AES emergency steering arrives
Feng Huang Wang· 2025-11-21 02:33
Core Insights
- Xiaomi Auto officially launched the Xiaomi HAD Enhanced Version at the Guangzhou Auto Show, showcasing advancements in smart driving technology and talent acquisition in the AI field [1]
- The company plans to invest over 7 billion yuan in AI research and development by 2025, with a current team of 1,800 experts, including 108 PhDs [1]

Technical Developments
- The new Xiaomi HAD Enhanced Version is built on a foundation of 10 million clips and incorporates reinforcement learning algorithms and world models to enhance driving performance [1]
- The world model technology allows the system to simulate various scenarios, including extreme weather and complex road conditions, marking a transition from a "rule-driven" to a "learning-driven" approach [1]

User Experience Enhancements
- The updated version focuses on optimizing longitudinal and lateral control, particularly in scenarios like lane merging, reducing unnecessary deceleration and hard braking [2]
- Active safety upgrades include the new AES emergency steering assist function, which can automatically change lanes to avoid collisions at speeds between 80 km/h and 135 km/h [2]

Safety Features Expansion
- The forward AEB (Automatic Emergency Braking) operating range has been expanded to cover 1 km/h to 135 km/h, with new capabilities to recognize various obstacles [2]
- The backward AEB covers reversing scenarios from 1 km/h to 30 km/h, balancing sensitivity to ensure accurate stopping while minimizing false triggers [2]

Software Updates
- The driving updates will ship in Xiaomi HyperOS version 1.11.0, with rollout timing varying by model due to review progress [2]
Led by industry algorithm experts! A small-group course on production-focused end-to-end autonomous driving
自动驾驶之心· 2025-11-21 00:04
Core Insights
- The article emphasizes the importance of end-to-end production in the automotive industry, highlighting the scarcity of qualified talent in this area [1][3]
- A newly designed advanced course on end-to-end production has been developed to address the industry's needs, focusing on practical applications and real-world scenarios [3][5]

Course Overview
- The course covers essential algorithms such as one-stage and two-stage end-to-end frameworks, reinforcement learning applications, and trajectory optimization techniques [5][10]
- It aims to provide hands-on experience and insight into production challenges, making it suitable for those looking to advance or change careers [5][18]

Course Structure
- Chapter 1 introduces end-to-end tasks, focusing on the integration of perception and control algorithms [10]
- Chapter 2 discusses the two-stage end-to-end algorithm framework, including its modeling and information-transfer methods [11]
- Chapter 3 covers the one-stage end-to-end algorithm framework, emphasizing its advantages in information transmission [12]
- Chapter 4 focuses on the application of navigation information in autonomous driving, detailing map formats and encoding methods (a minimal encoding sketch follows this summary) [13]
- Chapter 5 introduces reinforcement learning algorithms, highlighting why they are needed alongside imitation learning [14]
- Chapter 6 provides practical experience in trajectory output optimization, combining imitation and reinforcement learning [15]
- Chapter 7 discusses fallback strategies for trajectory smoothing and reliability in production [16]
- Chapter 8 shares production experience from several perspectives, including data and model optimization [17]

Target Audience
- The course is designed for advanced learners with a foundational understanding of autonomous driving algorithms, reinforcement learning, and programming [18][19]

Course Logistics
- The course starts on November 30 and spans three months, featuring offline video lectures and online Q&A sessions [20]
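To make Chapter 4's topic concrete, here is a minimal sketch of one common way navigation information is fed to an end-to-end model: transform the route polyline into the ego frame and resample it by arc length into a fixed-length feature vector. The point count and the ego-frame transform are illustrative assumptions, not the course's specific encoding.

```python
# Illustrative navigation-route encoding (assumed resampling scheme),
# not the course's actual map format.
import numpy as np

def encode_route(route_xy, ego_xy, ego_yaw, num_points=20):
    """route_xy: (N, 2) map-frame polyline; returns a (num_points*2,) feature."""
    route = np.asarray(route_xy, dtype=float)
    # Express the route in the ego vehicle's frame: rotate by -yaw.
    c, s = np.cos(-ego_yaw), np.sin(-ego_yaw)
    rot = np.array([[c, -s], [s, c]])
    local = (route - ego_xy) @ rot.T
    # Resample at uniform arc length so the feature size is fixed.
    seg = np.linalg.norm(np.diff(local, axis=0), axis=1)
    arc = np.concatenate([[0.0], np.cumsum(seg)])
    targets = np.linspace(0.0, arc[-1], num_points)
    xs = np.interp(targets, arc, local[:, 0])
    ys = np.interp(targets, arc, local[:, 1])
    return np.stack([xs, ys], axis=1).reshape(-1)

feat = encode_route([[0, 0], [10, 0], [20, 5], [40, 5]],
                    ego_xy=np.array([1.0, 0.0]), ego_yaw=0.05)
print(feat.shape)  # (40,)
```

Fixed-length resampling is what lets a planner consume routes of any length through an ordinary MLP or transformer input head.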
Comparing Xpeng and Li Auto's VLA using accurate primary sources
理想TOP2· 2025-11-20 10:42
Core Viewpoint
- The article discusses advances in autonomous driving technology, focusing on the VLA (Vision-Language-Action) architecture developed by Li Auto and insights shared by Xpeng's autonomous driving head, Liu Xianming, on a podcast. Liu emphasizes removing the intermediate language component (L) to improve scalability and data efficiency [1][4][5]

Summary by Sections

VLA Architecture and Training Process
- The VLA architecture begins with a pre-training phase using a 32-billion-parameter (32B) vision-language model that incorporates 3D vision and high-definition 2D vision with 3-5x the clarity of open-source models, along with driving-related language data and key VL joint data [10][11]
- The model is distilled into a 3.2-billion-parameter (3.2B) MoE model to ensure fast inference on vehicle hardware (a distillation-loss sketch follows this summary), followed by a post-training phase that integrates action to form the VLA, raising the parameter count to nearly 4 billion [13][12]
- The reinforcement learning phase has two parts: reinforcement learning from human feedback (RLHF) and pure reinforcement learning on world-model-generated data, optimizing for comfort, collision avoidance, and adherence to traffic regulations [15][16]

Data Utilization and Efficiency
- Liu argues that using language as a supervisory signal can introduce human biases, reducing data efficiency and scalability. The hardest data to collect are corner cases, which are crucial for training [4][6]
- The architecture aims for a high level of generalization, with plans to launch L4 robotaxi services in Guangzhou on the current framework [4][5]

Future Directions and Challenges
- Liu acknowledges the uncertainties in scaling the technology and ensuring safety, asking how to maintain safety standards and align the model with human behavior [5][18]
- The conversation highlights that VLA, VLM, and world models are fundamentally end-to-end architectures, with various companies pursuing similar concepts in Physical AI [5][18]

Human-Agent Interaction
- The driver agent processes short commands directly, while complex instructions are sent to the cloud for processing before execution. This approach lets the system understand and interact with the physical world like a human driver [17][18]
- The article concludes that the traffic domain suits VLA deployment because its rules are well defined and human driving behavior can be modeled effectively [19][20]
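The cloud-to-vehicle distillation step described above follows a standard recipe in outline, even though the article gives only the parameter counts (32B teacher, 3.2B MoE student), not the training details. Below is a minimal PyTorch sketch of a temperature-scaled distillation loss of the kind typically used; the temperature, loss mix, and toy sizes are assumptions.

```python
# Illustrative knowledge-distillation loss (assumed temperature and mix);
# the article does not disclose Li Auto's actual distillation recipe.
import torch
import torch.nn.functional as F

def distill_loss(student_logits, teacher_logits, labels, T=2.0, alpha=0.5):
    """Blend soft-label KL against the teacher with hard-label cross-entropy."""
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)  # standard temperature scaling keeps gradient magnitudes stable
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1 - alpha) * hard

# Toy usage: vocabulary of 100 tokens, batch of 8.
s = torch.randn(8, 100, requires_grad=True)   # student logits
t = torch.randn(8, 100)                       # frozen teacher logits
y = torch.randint(0, 100, (8,))               # ground-truth labels
loss = distill_loss(s, t, y)
loss.backward()
print(float(loss))
```

The soft-label term is what transfers the large model's behavior; the hard-label term keeps the small model anchored to the original task.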
What exactly makes "the strongest embodied VLA model" so strong?
36Kr· 2025-11-20 07:38
Core Insights
- The core contribution of the π*0.6 model is a more intuitive learning method called RECAP, which lets robots learn from their mistakes rather than merely imitating correct actions [3][8][24]
- The model achieves a success rate above 90% on tasks such as making espresso, folding clothes, and assembling packaging boxes, demonstrating its practical capabilities [1][20]

Group 1: RECAP Methodology
- RECAP consists of three phases: offline reinforcement learning (RL) on diverse demonstration data, fine-tuning with human guidance, and online execution in which robots learn from sparse rewards and expert corrections [10][20]
- The method uses a value function to evaluate actions and an advantage-conditioned policy update, allowing efficient learning from both successful and unsuccessful experience (a minimal sketch of advantage conditioning follows this summary) [13][16][42]

Group 2: Model Architecture and Performance
- The π*0.6 model builds on previous versions, expanding its backbone from Gemma (2.6 billion parameters) to Gemma3 (4 billion parameters) and increasing the Action Expert to 860 million parameters [20]
- On challenging tasks, RECAP doubled throughput (successful task completions per hour) and cut failure rates by roughly 50% compared with models trained only via supervised fine-tuning [20]

Group 3: Learning from Mistakes
- RECAP emphasizes learning from errors, enabling robots to recover from mistakes through expert intervention and self-correction, which is crucial for real-world deployment [24][28]
- By using a value function to assess the quality of actions, the model can identify key steps and sources of error, improving its ability to adapt in complex environments [39][41]
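The advantage-conditioned update attributed to RECAP can be sketched compactly: label each logged transition by whether its return beats the value-function baseline, train the policy with that label as an extra input, and condition on the "positive advantage" label at execution time. The network sizes, binary labeling, and MSE behavior-cloning loss below are illustrative assumptions, not Physical Intelligence's implementation.

```python
# Illustrative advantage-conditioned policy sketch (assumed architecture
# and loss), showing the conditioning mechanic rather than the real pi*0.6.
import torch
import torch.nn as nn

class AdvantageConditionedPolicy(nn.Module):
    def __init__(self, obs_dim=32, act_dim=8):
        super().__init__()
        # +1 input for the advantage indicator appended to the observation.
        self.net = nn.Sequential(
            nn.Linear(obs_dim + 1, 128), nn.ReLU(), nn.Linear(128, act_dim)
        )

    def forward(self, obs, adv_flag):
        return self.net(torch.cat([obs, adv_flag], dim=-1))

policy, value_fn = AdvantageConditionedPolicy(), nn.Linear(32, 1)
obs = torch.randn(16, 32)
actions = torch.randn(16, 8)
returns = torch.randn(16, 1)  # returns from logged rollouts

# Advantage label: did this transition beat the value-function baseline?
adv_flag = (returns - value_fn(obs) > 0).float()
# Behavior-cloning loss, conditioned on the advantage label, so the policy
# learns what both good and bad behavior look like.
loss = ((policy(obs, adv_flag) - actions) ** 2).mean()
loss.backward()

# At deployment, always request "better than baseline" behavior.
best_action = policy(obs[:1], torch.ones(1, 1))
```

Conditioning on the label (rather than discarding low-advantage data) is what lets the method learn from failures instead of filtering them out.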