Tesla's World Simulator Debuts at ICCV; VP Personally Explains the End-to-End Autonomous Driving Technology Route
36Kr· 2025-10-27 08:11
Core Insights
- Tesla has unveiled a world simulator for generating realistic driving scenarios, presented by Ashok Elluswamy at the ICCV conference, emphasizing that the future of intelligent driving lies in end-to-end AI [1][5][24]

Group 1: World Simulator Features
- The world simulator can create new challenging scenarios for autonomous driving tasks, such as vehicles suddenly changing lanes or AI navigating around pedestrians and obstacles [2]
- The generated scenario videos serve dual purposes: training autonomous driving models and providing a gaming experience for human users [2][4]

Group 2: End-to-End AI Approach
- Elluswamy highlighted that end-to-end AI is the future of autonomous driving, using data from various sensors to generate control commands for vehicles [5][8]
- The end-to-end approach is contrasted with modular systems, which are easier to develop initially but lack the optimization and scalability of end-to-end systems [8][10]

Group 3: Challenges and Solutions
- One major challenge for end-to-end autonomous driving is evaluation, which the world simulator addresses by using a vast dataset to synthesize future states based on current conditions [11]
- The complexity of real-world data, such as high frame rates and multiple sensor inputs, leads to a "curse of dimensionality," which Tesla mitigates by collecting extensive driving data to enhance model generalization [13][15]

Group 4: Industry Perspectives
- The industry is divided between two main approaches to end-to-end autonomous driving: VLA (Vision-Language-Action) and world models, with various companies adopting different strategies [24]
- Tesla's choice of the end-to-end approach has drawn attention due to its historical success in autonomous driving, raising questions about the future direction of the technology [24]
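The summary describes the simulator as synthesizing future states from current conditions so that driving policies can be evaluated in closed loop without road testing. A minimal sketch of that loop in Python (the kinematics, field names, and policy are illustrative assumptions, not Tesla's system):

```python
# Toy stand-in for a learned world simulator: it maps (state, action)
# to the next state, so candidate driving behaviors can be evaluated
# in closed loop before real deployment. The dynamics here are an
# illustrative kinematic placeholder, not a learned neural model.

def simulate_step(state, action, dt=0.1):
    """Advance a minimal vehicle state by one tick."""
    speed = max(0.0, state["speed"] + action["accel"] * dt)
    position = state["position"] + speed * dt
    return {"position": position, "speed": speed}

def rollout(state, policy, steps):
    """Roll a policy forward inside the simulator, logging every state."""
    trajectory = [state]
    for _ in range(steps):
        state = simulate_step(state, policy(state))
        trajectory.append(state)
    return trajectory

# A trivial policy: accelerate until roughly 10 m/s, then coast.
def cruise_policy(state):
    return {"accel": 2.0 if state["speed"] < 10.0 else 0.0}

traj = rollout({"position": 0.0, "speed": 0.0}, cruise_policy, steps=60)
print(round(traj[-1]["speed"], 1), round(traj[-1]["position"], 1))
```

A real simulator would replace `simulate_step` with a learned video/state model, but the evaluation loop, roll the policy forward and inspect the resulting trajectory, keeps this shape.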
Tesla's World Simulator Debuts at ICCV! VP Personally Explains the End-to-End Autonomous Driving Technology Route
量子位· 2025-10-27 05:37
Core Viewpoint
- Tesla has unveiled a world simulator for autonomous driving, showcasing its potential to generate realistic driving scenarios and enhance the training of AI models for self-driving technology [1][4][12].

Group 1: World Simulator Features
- The simulator can create new challenging scenarios for autonomous driving tasks, such as unexpected lane changes by other vehicles [4][5].
- It allows AI to perform driving tasks in existing scenarios, avoiding pedestrians and obstacles [7][9].
- The generated scenario videos can also serve as a gaming experience for human users [9].

Group 2: End-to-End AI Approach
- Tesla's VP Ashok Elluswamy emphasized that end-to-end AI is the future of autonomous driving, applicable not only to driving but also to other intelligent scenarios such as the Tesla Optimus robot [12][13][14].
- The end-to-end neural network uses data from various sensors to generate control commands for the vehicle, in contrast to modular systems, which are easier to develop initially but less effective in the long run [17].
- The end-to-end approach allows for better optimization and handling of complex driving situations, such as navigating around obstacles [18][21].

Group 3: Challenges and Solutions
- One major challenge for end-to-end autonomous driving is evaluation, which Tesla addresses with its world simulator, trained on a vast dataset [22][24].
- The simulator can also facilitate large-scale reinforcement learning, potentially surpassing human performance [24].
- Other challenges include the "curse of dimensionality," interpretability, and safety guarantees, which require processing vast amounts of data [26][27][28].

Group 4: Data Utilization
- Tesla collects data equivalent to 500 years of driving every day, using a complex data engine to filter high-quality samples for training [29][30].
- This extensive data collection enhances the model's generalization capabilities to handle extreme situations [30].
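The point about large-scale reinforcement learning inside the simulator can be illustrated with the simplest possible policy search: score candidate policies entirely in simulation and keep the best. Random search over a lane-keeping gain stands in for full RL here; everything below is a toy assumption, not Tesla's training setup:

```python
import random

# Sketch of large-scale policy search inside a learned simulator:
# every candidate policy is scored purely in simulation, and the
# best one is kept. Random search over a steering gain stands in
# for reinforcement learning; the lane-keeping environment is a toy.

random.seed(0)

def simulate_episode(gain, steps=100):
    """Reward for a proportional lane-keeping controller with `gain`."""
    offset, cost = 1.0, 0.0          # start 1 m off the lane center
    for _ in range(steps):
        steer = -gain * offset       # steer back toward the center
        offset += steer * 0.1 + random.gauss(0.0, 0.01)
        cost += offset ** 2          # penalize remaining lateral offset
    return -cost                     # higher reward = tighter lane keeping

candidates = [random.uniform(0.0, 5.0) for _ in range(20)]
best_gain = max(candidates, key=simulate_episode)
print(round(best_gain, 2))
```

The appeal of a learned simulator is exactly this: every `simulate_episode` call is cheap and safe, so the search can be scaled to millions of rollouts.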
Group 5: Technical Approaches in the Industry
- The industry is divided between two main approaches, VLA (Vision-Language-Action) and world models, with companies such as Huawei and NIO representing the latter [38][39].
- VLA proponents argue that it leverages existing internet data for better understanding, while world-model advocates believe it addresses the core issues of autonomous driving [41][42].
- Tesla's approach is closely watched due to its historical success in selecting effective strategies in autonomous driving development [43][44].
Officially Concluded! An Industry-Veteran-Led Course on Mastering End-to-End Autonomous Driving in Three Months
自动驾驶之心· 2025-10-27 00:03
Core Viewpoint
- 2023 marked the start of end-to-end deployment, and 2024 is expected to be a significant year for end-to-end production in the automotive industry, as leading new forces and manufacturers have already achieved end-to-end mass production [1][3].

Group 1: End-to-End Production Development
- The automotive industry is witnessing rapid development in end-to-end methods, particularly the one-stage approach exemplified by UniAD, which directly models vehicle trajectories from sensor inputs [1][3].
- There are two main paradigms in the industry, one-stage and two-stage methods, with the one-stage approach gaining traction and spawning derivatives based on perception, world models, diffusion models, and VLA [3][5].

Group 2: Course Overview
- A course titled "End-to-End and VLA Autonomous Driving" has been launched, focusing on cutting-edge algorithms in both one-stage and two-stage end-to-end methods and aimed at bridging academic and industrial advancements [5][15].
- The course is structured into several chapters covering the history and evolution of end-to-end methods, background knowledge on VLA, and detailed discussions of both one-stage and two-stage approaches [9][10][12].

Group 3: Key Technologies
- The course emphasizes critical technologies such as BEV perception, vision-language models (VLM), diffusion models, and reinforcement learning, which are essential for mastering the latest advancements in autonomous driving [5][11][19].
- The second chapter of the course is highlighted as containing the technical keywords most frequently asked about in job interviews over the next two years [10].

Group 4: Practical Applications
- The course includes practical assignments, such as RLHF fine-tuning, allowing participants to apply their knowledge in real-world scenarios and understand how to build and experiment with pre-trained and reinforcement-learning modules [13][19].
- The curriculum also covers various subfields of one-stage end-to-end methods, including those based on perception, world models, diffusion models, and VLA, providing a comprehensive understanding of the current landscape in autonomous driving technology [14][19].
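The one-stage versus two-stage split described above reduces to two different interfaces: sensor features mapped straight to a trajectory, or perception first and planning second. A sketch with placeholder callables (names, shapes, and the avoidance rule are illustrative assumptions, not any specific system):

```python
# Contrast of the two end-to-end paradigms as function signatures.
# The "models" are placeholder callables; real systems would be
# neural networks consuming camera/LiDAR tensors.

def one_stage_planner(sensor_features):
    """One-stage (UniAD-style): sensor features -> ego trajectory directly."""
    return [(i * 1.0, 0.0) for i in range(1, 6)]  # (x, y) waypoints, meters

def perception(sensor_features):
    """Stage 1 of a two-stage method: detect obstacles as (x, y) points."""
    return [(3.0, 0.5)]  # placeholder detection

def planner(obstacles):
    """Stage 2: plan waypoints that shift away from detected obstacles."""
    trajectory = []
    for i in range(1, 6):
        x = i * 1.0
        # Nudge laterally away from any obstacle within 1 m longitudinally.
        y = -0.5 if any(abs(ox - x) < 1.0 for ox, oy in obstacles) else 0.0
        trajectory.append((x, y))
    return trajectory

features = object()  # stands in for real sensor input
direct = one_stage_planner(features)
staged = planner(perception(features))
print(direct)
print(staged)
```

The one-stage path removes the hand-designed interface between the two functions, which is exactly what makes it jointly optimizable and what makes its intermediate behavior harder to inspect.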
VLA / World Models / WA / End-to-End: A Divergence in Marketing, Not in Technical Route
理想TOP2· 2025-10-25 05:21
Core Viewpoints
- Many people are unaware that there is no universally accepted definition of VLA, world model, or end-to-end [1]
- Leading autonomous driving companies share more commonalities in their exploration of autonomous driving than the differences portrayed online; the core divergence is promotional rather than technical [1][2]
- Language plays a significant role in autonomous driving, particularly in long reasoning, user-interaction value alignment, and understanding the world [1]
- Those who believe that predicting the next token is more than just a probability distribution are more likely to accept that language can understand the world [1]

Group 1: VLA/World Model/End-to-End
- VLA, world models, and end-to-end all require the ability to generate road video data that appears real, focusing on visual information input and ultimately controlling vehicle actions [2]
- The distinction lies in whether language is involved, its depth of participation, and the architectural form it takes, with future language-related tokens potentially being the LLM's text tokens or photon tokens [2]
- The narrative that VLA and world models represent different technical routes is misleading, as both need to generate a world model and understand the physical world [4]

Group 2: End-to-End Definitions
- The definition of end-to-end is often debated, with some believing it requires a core framework where input and output are clearly defined [5]
- Tesla's approach, which takes visual input and outputs a trajectory rather than direct control signals, raises questions about the true nature of its end-to-end definition [5][6]
- Outputting precise trajectories is preferred over direct control signals, suggesting a more effective design approach [6]

Group 3: Tesla's Approach and Future Directions
- Tesla's historical context and style suggest that its definition of end-to-end may not have universally accepted exclusivity [7]
- Long-term predictions indicate that AI model inputs and outputs may predominantly involve photons, which could significantly reduce computational loads [10]
- The ideal VLA model is defined as having visual or multimodal input, language participation, and ultimately directing actions in a broad sense [11]

Group 4: Understanding Language and AI Potential
- There are fundamental differences in views regarding LLMs, particularly concerning what predicting the next token means [12]
- Those who see predicting the next token as more than mere statistics are more inclined to recognize the potential of LLMs and AI [12][19]
- The ability to predict the next token effectively implies an understanding of the underlying reality that generates the token, which is a deeper question than it appears [18]
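The article's core question is what "predicting the next token" entails. Mechanically it is simple: a model emits one logit per vocabulary item, softmax converts the logits into a probability distribution, and the next token is sampled from it. A minimal sketch (the vocabulary and logits are toy illustrations, not any real model's output):

```python
import math
import random

# Next-token prediction, mechanically: logits -> softmax -> sample.
# Whether assigning good probabilities implies "understanding" the
# process that generates the tokens is the debate; the mechanics
# themselves are this simple.

def softmax(logits):
    m = max(logits)                        # subtract max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def sample_next_token(vocab, logits, rng):
    """Draw one token from the distribution induced by the logits."""
    return rng.choices(vocab, weights=softmax(logits), k=1)[0]

vocab = ["lane", "brake", "merge", "stop"]
logits = [2.0, 0.5, 1.0, -1.0]             # toy model output
probs = softmax(logits)
print([round(p, 3) for p in probs])
print(sample_next_token(vocab, logits, random.Random(0)))
```

The disagreement the article describes is not about this mechanism but about what a model must internally represent for those probabilities to be consistently good.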
自动驾驶之心 Is Recruiting Partners!
自动驾驶之心· 2025-10-24 16:03
Group 1
- The article announces the recruitment of 10 outstanding partners for the autonomous driving sector, focusing on course development, paper guidance, and hardware research [2]
- The main areas of expertise sought include large models, multimodal models, diffusion models, end-to-end systems, embodied interaction, joint prediction, SLAM, 3D object detection, world models, closed-loop simulation, and model deployment and quantization [3]
- Candidates are preferred from QS200 universities with a master's degree or higher, especially those with significant contributions to top conferences [4]

Group 2
- The compensation package includes resource sharing for job seeking, doctoral studies, and overseas study recommendations, along with substantial cash incentives and opportunities for entrepreneurial project collaboration [5]
- Interested parties are encouraged to add WeChat for consultation, specifying "organization/company + autonomous driving cooperation inquiry" [6]
There Is Now Strong Evidence That Li Auto's Intelligent Driving References Tesla Rather Than Follows Tesla
理想TOP2· 2025-10-24 04:48
Core Viewpoint
- The article discusses the evolution of Li Auto's autonomous driving technology from following Tesla to referencing Tesla, highlighting original innovations by Li Auto that Tesla has not publicly addressed [2][3].

Group 1: Development Line of Li Auto's Autonomous Driving
- Initially, Li Auto's autonomous driving was considered to be following Tesla, but after the introduction of VLM it transitioned to a reference model, showcasing original innovations not mentioned by Tesla [2].
- The core innovation of Li Auto's VLA is at the DeepSeek MoE level, which is lower than the DeepSeek MLA innovation level [2].
- During the V10-11 period it was fair to say Li Auto was following Tesla, but from V12 onwards the extent of following has significantly decreased [2].

Group 2: Ashok's Presentation at ICCV 2025
- Ashok Elluswamy discussed Tesla's shift to a single, large end-to-end neural network that directly generates control actions from sensor data, eliminating explicit perception modules [4].
- The reasons for this shift include the difficulty of encoding human values into code, poor interface definitions between traditional perception, prediction, and planning, and the need for scalability to handle real-world complexity [5].
- Key challenges in learning from pixels to control include the curse of dimensionality, interpretability and safety guarantees, and evaluation [6].

Group 3: Solutions to Challenges
- To address the curse of dimensionality, Tesla uses extensive fleet data and complex data-collection methods to extract valuable corner-case data [7].
- For interpretability, end-to-end models can be prompted to predict auxiliary outputs for debugging and safety assurance, with the main focus remaining on control actions [8].
- The evaluation challenge is addressed through a neural-network closed-loop simulator that allows comprehensive testing and performance assessment [10].
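The data-engine idea mentioned for the curse of dimensionality, filtering a huge stream of fleet logs down to the rare corner cases worth training on, can be sketched as a simple predicate over logged clips (the fields and thresholds are illustrative assumptions, not Tesla's pipeline):

```python
# Sketch of a data-engine filter: from a large stream of logged
# driving clips, keep only the rare, information-rich samples such
# as driver interventions or hard braking. Fields and thresholds
# are illustrative placeholders, not a real fleet pipeline.

def is_corner_case(clip):
    """Flag clips with a driver intervention or unusually hard braking."""
    return clip["intervention"] or clip["max_decel"] > 6.0  # m/s^2

def mine_corner_cases(clips):
    return [c for c in clips if is_corner_case(c)]

logged = [
    {"id": 1, "intervention": False, "max_decel": 1.2},  # routine driving
    {"id": 2, "intervention": True,  "max_decel": 2.0},  # driver took over
    {"id": 3, "intervention": False, "max_decel": 7.5},  # hard brake event
    {"id": 4, "intervention": False, "max_decel": 0.8},  # routine driving
]
kept = mine_corner_cases(logged)
print([c["id"] for c in kept])  # only the intervention and hard-brake clips
```

At fleet scale the predicate would itself be learned (trigger classifiers, shadow-mode disagreement), but the principle is the same: most miles are uninformative, so training value concentrates in the filtered tail.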
Group 4: Comparison with Li Auto
- The article argues that Li Auto's prior announcements on natural language processing and 3D Gaussian representation predate Ashok's presentation, indicating that Li Auto is not merely following Tesla [13].
- It highlights that Ashok's concepts lack groundbreaking ideas, suggesting that Li Auto's innovations are leading rather than following [13].
- The article also notes that Tesla's potential adoption of a VLA-based solution aligns with Li Auto's previously published architecture [16].
Foreseeing the Future: "Preliminary Visions and Exploratory Practice of the AI Car" White Paper Released
Zhong Guo Qi Che Bao Wang· 2025-10-23 08:15
Core Insights
- The article discusses the release of the automotive industry's first white paper themed "AI Car," which outlines the product definition and key technological foresight for the transition to the AI Car era [3][4].

Group 1: AI Car Definition and Key Technologies
- The white paper defines the AI Car as a super intelligent entity composed of multiple sub-agents, including driving, cabin, chassis, and power agents [3][4].
- It emphasizes that AI technology will fundamentally reshape the development paradigm and user experience of smart terminals [3].
- The paper identifies ten key judgments about the future of AI Cars, including the transformation of autonomous driving system design logic and capabilities through VLA [3][4].

Group 2: Future Directions and Strategic Implications
- AI will enable a larger end-to-end system combining intelligent driving and chassis, thereby redefining the driving experience [8][9].
- The transition of power batteries toward intelligent battery systems capable of real-time perception and autonomous decision-making is highlighted [9].
- The white paper suggests that AI-driven product transformation will alter the survival and development logic of enterprises, shifting their strategic goals from "making good cars" to "operating intelligent entities" [10][11].

Group 3: Recommendations for Enterprises
- Companies are advised to define the unique personality and value proposition of their intelligent entities to rejuvenate brand identity [10].
- Enterprises are recommended to enhance data value across the entire process and establish a cross-functional AI development team to ensure systematic research and development of AI Cars [11].
- The white paper proposes that automakers accelerate the construction of comprehensive ecological resource-integration capabilities to strengthen user engagement and create competitive barriers in the AI era [11].
Tesla's Latest Technology Share: FSD Core Architecture Revealed
36Kr· 2025-10-22 08:00
Core Insights
- Tesla has publicly shared its FSD (Full Self-Driving) core architecture at the ICCV conference, marking a significant disclosure about its autonomous driving technology [1][4]
- The presentation by Ashok Elluswamy has sparked discussion about Tesla's potential use of VLA (Vision-Language-Action) in its systems, amid an ongoing industry debate between VLA and world models [1][7]

Technical Developments
- The FSD architecture integrates a large neural network capable of processing multimodal inputs, including camera video, navigation data, vehicle motion status, and sound, with outputs that include panoramic segmentation, 3D occupancy networks, and language [6][10]
- The architecture's ability to output language suggests a shift toward a more advanced model capable of understanding and reasoning over long-horizon data [7][10]

Industry Context
- The debate between VLA and world models is prominent: VLA proponents argue for its ability to leverage vast internet data for knowledge accumulation and reasoning, while world-model advocates claim it addresses the core challenges of autonomous driving more effectively [7][10]
- The industry is moving toward larger model parameters, with Tesla's upcoming smart-driving chip expected to reach 2000 TOPS, indicating a significant increase in computational power and model capability [10][12]

Recent Updates
- The latest FSD update (V14.1.3) includes enhancements for safety and personalization, improving obstacle avoidance and navigation capabilities [12]
- Tesla has reintroduced "Mad Max Mode," which allows a more aggressive driving style, showcasing the system's adaptability across driving scenarios [11][14]
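The multimodal-in, multi-head-out shape attributed to the FSD architecture can be expressed as a plain interface: several input streams enter, and a dictionary of output heads comes back. The mock below only fixes that interface; all names, shapes, and values are illustrative placeholders, not Tesla's network:

```python
# Mock of a multimodal, multi-head driving-model interface: camera
# video, navigation, vehicle motion, and sound go in; panoramic
# segmentation, a 3D occupancy grid, language, and a control action
# come out. Shapes and contents are illustrative placeholders only.

def fsd_style_forward(camera, navigation, kinematics, audio):
    segmentation = [[0] * 4 for _ in range(3)]  # per-pixel class ids (3x4 mock)
    occupancy = [[[0.0] * 2 for _ in range(2)] for _ in range(2)]  # 2x2x2 probs
    return {
        "segmentation": segmentation,
        "occupancy": occupancy,
        "language": "yielding to pedestrian on the right",  # auxiliary head
        "control": {"steer": 0.0, "accel": -1.0},           # the main output
    }

out = fsd_style_forward(
    camera=[[0.1]],                   # mock video frame
    navigation="turn left in 200 m",
    kinematics={"speed": 8.3},
    audio=[0.0],
)
print(sorted(out))
```

The auxiliary heads (segmentation, occupancy, language) matter for the interpretability point above: they let engineers inspect what the model "saw" even though only the control head drives the car.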
Stop Reinventing the Wheel! 原力灵机 Open-Sources Dexbotic: A One-Stop VLA Toolbox for Embodied Intelligence
具身智能之心· 2025-10-22 06:02
Core Insights
- The article discusses the rapid development of embodied VLA (vision-language-action) models and the challenges individual developers and small research teams face in creating and maintaining a unified open-source framework for them [4][7][29].

Group 1: VLA Development Challenges
- The current VLA development landscape is fragmented, with teams using different deep-learning frameworks and model architectures, leading to inefficiencies in model comparison and performance evaluation [4][7].
- Existing VLA models often do not leverage the capabilities of the latest LLMs (large language models), which limits the potential of the "embodied brain" [4][7].
- There is a pressing need for a mature, unified open-source VLA framework to address these challenges, which led to the creation of Dexbotic [4][7].

Group 2: Dexbotic Framework Features
- Dexbotic integrates mainstream pre-trained models for manipulation and navigation policies, supports both cloud and local training, and is user-friendly and ready to use [2][4].
- The framework introduces the Dexdata format to unify data from different sources, significantly reducing storage costs and simplifying data preparation for developers [9][10].
- Dexbotic's architecture consists of three layers (data, model, and experimental), improving the efficiency of algorithm comparison and model iteration by over 50% [11][24].

Group 3: Performance Improvements
- Dexbotic's pre-trained models have shown significant performance improvements across tasks, with DB-CogACT achieving an 18.2% increase in average success rate over the original CogACT model [21][22].
- The framework has also demonstrated strong performance on real-world tasks, with a UR5e achieving a 100% success rate on specific tasks [29].
Group 4: Open Source and Community Engagement
- Dexbotic aims to facilitate collaboration and innovation in embodied intelligence by providing an open-source platform through which developers can contribute and share their work [30][32].
- The initiative encourages participation from both academic and industrial partners to advance the development of embodied-intelligence technologies [30][32].
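The Dexdata idea of unifying heterogeneous robot datasets can be sketched as one canonical per-step record that each source is converted into, so a single training loop can consume them all. The field names below are illustrative assumptions, not the actual Dexdata schema:

```python
from dataclasses import dataclass

# Sketch of a canonical episode-step record that heterogeneous robot
# datasets are converted into, so one training loop reads them all.
# Field names are illustrative, not the real Dexdata schema.

@dataclass
class Step:
    image_path: str    # reference to the frame on disk, not raw pixels
    instruction: str   # language goal for the episode
    action: list       # normalized action vector
    source: str        # originating dataset, kept for filtering/weighting

def from_manipulation_log(entry):
    """Convert one record of a hypothetical manipulation dataset."""
    return Step(
        image_path=entry["rgb"],
        instruction=entry["task"],
        action=entry["ee_delta"],         # end-effector delta as the action
        source="manipulation",
    )

def from_navigation_log(entry):
    """Convert one record of a hypothetical navigation dataset."""
    return Step(
        image_path=entry["frame"],
        instruction=entry["goal"],
        action=[entry["v"], entry["w"]],  # linear and angular velocity
        source="navigation",
    )

steps = [
    from_manipulation_log({"rgb": "ep0/0.png", "task": "pick cup",
                           "ee_delta": [0.1, 0.0, 0.0]}),
    from_navigation_log({"frame": "run3/7.jpg", "goal": "go to door",
                         "v": 0.5, "w": 0.1}),
]
print([s.source for s in steps])
```

Storing references to frames rather than raw pixels is one plausible way such a format could cut storage costs, since many source datasets duplicate imagery across annotations.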
The Autonomous Driving Industry's Infrastructure Is Now Complete, Making It Well Worth Exploring for Graduating Students!
自动驾驶之心· 2025-10-17 00:03
Core Viewpoint
- The autonomous driving industry is maturing in terms of infrastructure and investment, making it a suitable field for students and professionals to explore and develop their skills [1][16].

Group 1: Industry Insights
- The technology landscape in autonomous driving is consolidating, but there are still many product forms to refine, indicating ongoing opportunities for innovation [1].
- The industry is currently debating the technical routes of world models and VLA, suggesting that while the theory may be solidifying, practical implementation remains a challenge [1].
- The focus on L2 functionality and the regulatory progress toward L3 indicate a gradual evolution to more advanced levels of automation, with L4 still facing unresolved issues [1].

Group 2: Community and Learning Resources
- A community called "Autonomous Driving Heart Knowledge Sphere" has been established, integrating videos, articles, learning paths, and job exchange to foster collaboration and knowledge sharing [4][5].
- The community has grown to over 4,000 members, with a goal of nearly 10,000 in the next two years, providing a platform for both beginners and advanced learners [5].
- The community offers practical guidance on topics including entry points for end-to-end learning, multimodal large models, and data-annotation practice [7][8].

Group 3: Career Opportunities
- The community actively shares job openings and connects members with companies in the autonomous driving sector, enhancing employment opportunities [12][21].
- There is a focus on comprehensive learning paths for newcomers, ensuring access to a well-rounded education in autonomous driving technologies [17][38].
Group 4: Technical Development
- The community has compiled over 40 technical routes and resources related to autonomous driving, covering perception, simulation, planning, and control [17][34].
- Regular discussions and live sessions with industry experts explore trends, technical directions, and production challenges in autonomous driving [8][90].