A New Stage of AI Intelligence Emergence: The Contest Between VLA and World Models in Intelligent Driving

Summary of Conference Call Records

Industry Overview
- The call discusses the evolution of intelligent driving paradigms from "rules + maps" to "VLA (Vision-Language-Action) + world models". Significant advances are expected after 2025, helped by the arrival of cost-effective reasoning models such as DeepSeek [1][3][4].

Key Points and Arguments

Technological Advancements
- Model parameter counts are rising: vehicle-side models are reaching tens of billions of parameters, and cloud-side models are approaching hundreds of billions. XPeng's second-generation VLA has cut prediction error by 33% through a 32-fold ultra-dense visual reasoning chain [1][12].
- The training paradigm is shifting from imitation learning to a combination of "pre-training + SFT (Supervised Fine-Tuning) + reinforcement learning", which strengthens reasoning capabilities and addresses risk asymmetry in emergency scenarios [1][8].

Industry Dynamics
- The competitive landscape is split between two technical paths. Huawei and NIO focus on "cloud-based world engines + vehicle-side action models", while XPeng and Li Auto follow the VLA route, integrating LLMs (Large Language Models) into their algorithms to improve generalization in long-tail scenarios [1][2][12].
- Mandatory L2 standards (the "strong standards") are expected to be introduced in Q2 2026. External catalysts such as mass production of Tesla's Cybercab and the entry of FSD (Full Self-Driving) into China point to an approaching commercial breakthrough for L3/L4 [1][13].

Model Development and Training
- General AI models have passed several milestones since 2017, including the introduction of the Transformer architecture and the integration of multimodal capabilities, which together have led to stronger reasoning abilities [4][5].
- The scaling law, under which capability improves with model size, data, and computational power, also applies to intelligent driving models [4][6].

Future Projections
- By 2026, key players are expected to focus on VLA-type large models, with significant progress in integrating the visual, language, and action components within a unified framework [9][10][12].
- The role of the world model is to simulate and predict future states of the physical environment, improving the vehicle's ability to anticipate and respond to complex scenarios [11][12].

Additional Important Insights
- The transition from traditional end-to-end systems to VLA and world models is driven by the need for a better understanding of physical laws and for better decision-making in complex environments [7][10].
- The industry is shifting toward more integrated models that combine perception, reasoning, and action generation, with a focus on making outputs more interpretable and robust [10][11].
- Key players are diversifying their strategies: XPeng is focusing on improving the driving experience through its second-generation VLA, while Huawei and NIO are leaning toward world model approaches [12][13].

Investment Focus
- Investment opportunities are concentrated in LiDAR technology (e.g., Hesai), localization of high-level autonomous driving chips (e.g., Horizon Robotics), and the commercialization of Robotaxi services (e.g., Pony.ai, WeRide) [2][13].
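
Illustrative sketch: the world model described under Future Projections simulates future states of the environment so that the vehicle can anticipate how a scenario will unfold. The Python snippet below is a minimal toy illustration of that idea and is not any vendor's implementation. The names State, predict, rollout_min_gap, and choose_accel, the constant-speed lead vehicle, and all numeric values (time step, candidate accelerations, safe gap) are assumptions introduced here for illustration only. A production world model would be a learned neural network rather than a hand-written kinematic rule.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class State:
    gap_m: float       # distance to the lead vehicle, in metres
    ego_speed: float   # ego vehicle speed, in m/s
    lead_speed: float  # lead vehicle speed, in m/s (assumed constant)


def predict(state: State, accel: float, dt: float = 0.5) -> State:
    """Toy stand-in for a world model: advance the scene by dt seconds."""
    new_speed = max(0.0, state.ego_speed + accel * dt)
    # Average ego speed over the step (approximate if the speed clamps at 0).
    avg_ego = 0.5 * (state.ego_speed + new_speed)
    new_gap = state.gap_m + (state.lead_speed - avg_ego) * dt
    return State(new_gap, new_speed, state.lead_speed)


def rollout_min_gap(state: State, accel: float, steps: int = 6) -> float:
    """Imagine the future under one candidate action; return the smallest gap seen."""
    min_gap = state.gap_m
    for _ in range(steps):
        state = predict(state, accel)
        min_gap = min(min_gap, state.gap_m)
    return min_gap


def choose_accel(state: State,
                 candidates=(1.0, 0.0, -2.0, -4.0),
                 safe_gap_m: float = 10.0) -> float:
    """Pick the gentlest candidate whose imagined rollout stays above safe_gap_m."""
    for accel in sorted(candidates, reverse=True):
        if rollout_min_gap(state, accel) >= safe_gap_m:
            return accel
    # No candidate is safe in imagination: fall back to the hardest braking.
    return min(candidates)
```

With the hypothetical input State(gap_m=30.0, ego_speed=20.0, lead_speed=10.0), only the hardest braking candidate keeps the imagined gap above 10 m over the 3-second horizon, so choose_accel returns -4.0. The point of the sketch is the division of labour: the predictor imagines futures, and a separate selection step acts on them.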