Vision-Language-Action (VLA) Models

Comparison of Intelligent Driving Solutions from Chinese and International Automakers
2025-06-23 02:09
Summary of Key Points from Conference Call Records

Industry Overview
- The records focus on intelligent driving technology development across automotive companies, particularly L3 autonomous driving solutions. [1][2]

Core Insights and Arguments
- Multiple automakers are accelerating development of L3 intelligent driving solutions. Tesla has deployed a fully integrated end-to-end solution in North America, while domestic companies still use a modular approach. [1]
- Huawei plans to launch an end-to-end solution in the second half of the year, using a multi-sensor fusion approach that is more complex than Tesla's. [1]
- The next-generation Vision-Language-Action (VLA) model is a key focus. It is expected to stay within 10 billion parameters, output actions directly from image data, and incorporate large language models to interpret complex scenarios. [1][2]
- Tesla relies on a pure vision approach using eight cameras for intelligent assisted driving, while Huawei, Momenta, and Xpeng adopt multi-sensor fusion, which may face challenges because long-term vibration can degrade LiDAR accuracy. [1][2]
- Li Auto combines VLA with an end-to-end model running on two Orin-X chips for scene understanding and feedback in complex situations, although the VLM's inference speed is relatively slow. [1][3]
- Most companies have abandoned high-definition maps in favor of purchased maps whose precision falls between high-definition maps and traditional navigation maps. Tesla leads in world generation technology, simulating multi-view cameras and actively annotating semantic information for subsequent training. [1][7]

Additional Important Content
- The competitive landscape shows Huawei, Xpeng, Li Auto, and Momenta in the leading tier, all of which have made significant advances in response to Tesla's innovations. [2]
- Li Auto faces increased competition in the extended-range vehicle market but maintains a competitive base. Total new energy vehicle sales are projected to reach 1.48 million units in 2025, with extended-range vehicles expected to hold a 14% market share. Li Auto's annual sales are anticipated to exceed 500,000 units. [2][12]
- Xpeng plans to integrate self-developed Turing chips in its top G7 models to reduce costs and reliance on Nvidia. [2][10]
- The VLA model's overall parameter count is expected to be within 10 billion, significantly advancing vehicle control unit development. [6]
- World generation technology is currently led by Tesla, which can simulate seven camera perspectives and actively annotate semantic information, aiding subsequent training. [11]

Conclusion
- The automotive industry is rapidly evolving toward more sophisticated intelligent driving solutions, with intense competition among leading companies. Advances in sensor fusion and model development are crucial for maintaining market competitiveness.
End-to-End VLA for Autonomous Driving Is Reaching Deployment: How Should the Algorithms Be Designed?
自动驾驶之心· 2025-06-22 14:09
Core Insights
- The article discusses rapid advances in end-to-end autonomous driving, focusing on Vision-Language-Action (VLA) models and their applications in the industry [2][3].

Group 1: VLA Model Developments
- AutoVLA, a new VLA model that integrates reasoning and action generation for end-to-end autonomous driving, shows promising results in semantic reasoning and trajectory planning [3][4].
- ReCogDrive, another VLA model, addresses performance issues in rare and long-tail scenarios with a three-stage training framework that combines visual language models with diffusion planners [7][9].
- Impromptu VLA introduces a dataset aimed at improving VLA models' performance in unstructured extreme conditions, demonstrating significant improvements on established benchmarks [14][24].

Group 2: Experimental Results
- AutoVLA achieved competitive performance in various scenarios, with the best-of-N method reaching a PDMS score of 92.12, indicating its effectiveness in planning and execution [5].
- ReCogDrive set a new state-of-the-art PDMS score of 89.6 on the NAVSIM benchmark, showcasing the robustness and safety of its driving trajectories [9][10].
- The OpenDriveVLA model demonstrated superior results in open-loop trajectory planning and driving-related question-answering tasks, outperforming previous methods on the nuScenes dataset [28][32].

Group 3: Industry Trends
- Major automotive manufacturers such as Li Auto, Xiaomi, and XPeng are investing heavily in VLA model research and development, indicating a competitive landscape in autonomous driving technology [2][3].
- Integrating large language models (LLMs) with VLA frameworks is becoming a focal point for enhancing decision-making in autonomous vehicles, as seen in models like ORION and VLM-RL [33][39].
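
For readers unfamiliar with the "best-of-N" method cited in the AutoVLA result: in general it means sampling N candidate outputs and keeping the one an evaluator scores highest. The summary does not describe how AutoVLA generates or scores its candidates, so the sketch below is only a generic illustration. The trajectory format (lists of `(x, y)` waypoints) and the `toy_score` function are hypothetical stand-ins, not AutoVLA's planner or the PDMS metric.

```python
from typing import Callable, List, Sequence, Tuple, TypeVar

T = TypeVar("T")
Trajectory = List[Tuple[float, float]]  # hypothetical format: (x, y) waypoints


def best_of_n(candidates: Sequence[T], score: Callable[[T], float]) -> Tuple[T, float]:
    """Return the highest-scoring candidate together with its score."""
    if not candidates:
        raise ValueError("need at least one candidate")
    best = max(candidates, key=score)
    return best, score(best)


def toy_score(traj: Trajectory) -> float:
    """Hypothetical scorer: reward forward progress, penalize lateral drift."""
    progress = traj[-1][0] - traj[0][0]
    max_lateral = max(abs(y) for _, y in traj)
    return progress - 2.0 * max_lateral


# Three made-up candidate trajectories standing in for N sampled plans.
candidates: List[Trajectory] = [
    [(0.0, 0.0), (5.0, 0.5), (10.0, 1.0)],
    [(0.0, 0.0), (4.0, 0.0), (8.0, 0.0)],
    [(0.0, 0.0), (6.0, 0.2), (12.0, 0.4)],
]

best, best_score = best_of_n(candidates, toy_score)
print(best, best_score)
```

A best-of-N score depends on both N and the scorer, so it is usually higher than the score of a single sampled plan and the two should not be compared directly.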