Momenta's Cao Xudong on the "R6 Reinforcement Learning Large Model": It Will Surpass Human Driving Levels
Xin Lang Cai Jing (Sina Finance) · 2025-12-24 09:46
Core Insights
- Momenta CEO Cao Xudong announced that the company's technology has evolved to its sixth generation, which he calls the "Reinforcement Learning Large Model" (R6) [1][4].

Group 1: Technology Evolution
- The fifth generation is based on imitation learning: it mimics human behavior, so its performance ceiling sits close to human level. Cao likens this to a student who progresses through school under teachers' guidance and therefore finds it hard to surpass them [3][6].
- Reinforcement learning instead explores through practice, with success rewarded and failure penalized. This lets the system discover better driving behaviors, potentially matching or exceeding human capability [3][6].

Group 2: Production and Experience
- Momenta has already reached mass production on more than 500,000 vehicles. At a scale of 10 million vehicles, the fleet could collectively drive 100 billion kilometers a year, about 100,000 times the lifetime driving experience of one human driver [3][6].
- In a cloud-based training environment, the system can encounter the same scenario up to 100,000 times. It may struggle at first, but after 1,000 to 10,000 encounters it becomes highly proficient, and by 100,000 encounters it develops intuitive driving skill, identifying the optimal strategy for challenging scenarios so that it drives as safely and efficiently as possible [3][6].
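
The fleet-mileage figures quoted above can be checked with a few lines of arithmetic. The sketch below takes only the three numbers given in the article (10 million vehicles, 100 billion kilometers per year, and the 100,000x ratio) and works out what they imply. The variable names are illustrative, and the per-vehicle and per-driver values are derived from the quoted figures rather than stated by Momenta.

```python
# Figures quoted in the article.
FLEET_SIZE = 10_000_000                 # vehicles in the projected fleet
FLEET_KM_PER_YEAR = 100_000_000_000     # 100 billion km driven per year
RATIO_TO_HUMAN_LIFETIME = 100_000       # fleet-year vs. one driver's lifetime

# Derived values: implied by the quoted figures, not stated in the source.
km_per_vehicle_per_year = FLEET_KM_PER_YEAR // FLEET_SIZE
human_lifetime_km = FLEET_KM_PER_YEAR // RATIO_TO_HUMAN_LIFETIME

print(f"Implied km per vehicle per year: {km_per_vehicle_per_year:,}")
print(f"Implied lifetime km per human driver: {human_lifetime_km:,}")
```

Read this way, the claim assumes roughly 10,000 km per vehicle per year and about 1 million km of driving over one person's lifetime.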