3D ViT
Li Xiang: Autonomous driving used to be like watching 100,000 hours of dashcam footage and then going straight onto the road
理想TOP2· 2026-03-18 13:52
Core Viewpoint - The article discusses a breakthrough in autonomous driving technology: the MindVLA-o1 model, which uses a native 3D Vision Transformer (ViT) to understand the three-dimensional world and so addresses a limitation of current AI systems, which learn primarily from 2D video data [1][2].

Group 1: Technology Breakthrough
- The MindVLA-o1 model advances autonomous driving by implementing a native 3D ViT, which lets it model 3D spatial geometry and semantics from the outset rather than inferring them from 2D inputs [1][2].
- The model integrates spatial structure, positional relationships, and semantic information in a unified representation, so it not only perceives the environment but also understands its context [2].

Group 2: Role of LiDAR
- In this new framework, LiDAR shifts from being the core of perception to serving as a high-precision tool for geometric calibration and near-field spatial constraints [2].
- Perception capability is determined more by the model's representational ability than by the physical specifications of the sensors [2].

Group 3: Computational Requirements
- A 3D ViT requires high computational power, which the company addresses with its self-developed Mach chip, offering three times the effective computing power of the previous generation [2].

Group 4: Versatility of the Model
- The VLA base model is not limited to autonomous driving; it can also control robots, evolving into a general-purpose agent for physical-world intelligence [3].
- Autonomous driving is positioned as only the starting point for the broader application of physical AI [4].
Li Xiang and Zhan Kun in conversation on the next steps for autonomous driving: full text-and-image version / video version
理想TOP2· 2026-03-18 13:25
Core Viewpoint - The article discusses the challenges and advancements in autonomous driving, emphasizing the transition from rule-based systems to end-to-end AI systems and the importance of 3D understanding in developing AI models that work in real-world applications [1][3][5].

Group 1: Autonomous Driving Development
- Autonomous driving has developed slowly because of its reliance on rule-based systems, which require extensive manual tuning and experience [1][5].
- The shift to end-to-end AI systems is a significant improvement, allowing faster iteration on autonomous driving technology [1][5].
- Current AI systems still fall short of human-level intelligence; further advances in multi-modal inputs and outputs are needed for a more complete understanding of the physical world [3][5].

Group 2: Importance of Pre-training
- Pre-training is identified as a crucial foundation for AI development, because it compresses extensive training into more efficient models [7][8].
- The lack of effective pre-training for understanding 3D environments is a significant barrier to building robust AI systems for real-world applications [8][20].
- The article highlights the need for a 3D visual encoder and decoder to improve the AI's understanding of spatial relationships and its performance in physical environments [9][10].

Group 3: Technological Challenges
- The transition to a 3D Vision Transformer (3D ViT) requires substantial computational power, with estimates suggesting a tenfold increase in computational requirements compared to 2D learning [21][22].
- The development of 3D ViT depends on advances in chip technology and on the ability to run large-scale pre-training that extracts meaningful 3D features [15][19].
- Key challenges include constructing a multi-modal thinking framework that integrates physical-world understanding with action-oriented reasoning [33][36].

Group 4: Future Applications and Market Potential
- The company aims to create an autonomous driving experience that feels natural and intuitive, akin to having a personal driver [37].
- The potential market for autonomous driving and related technologies is vast, with estimates suggesting a total addressable market in the hundreds of trillions [50].
- The company is focused on using AI to enhance productivity and capabilities across its workforce, aiming for significant revenue growth through innovative applications of AI technology [51][52].
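
As a rough illustration of why the summaries above describe 3D ViT as compute-hungry, the sketch below counts the tokens a ViT-style patch tokenizer would produce for a 2D image versus a 3D volume and compares the number of token pairs that self-attention has to score, which grows with the square of the token count. The input sizes, patch sizes, and function names are hypothetical and chosen only for illustration; they are not taken from MindVLA-o1, and the resulting ratio is not meant to reproduce the article's tenfold estimate.

```python
def patch_tokens(dims, patch):
    """Count non-overlapping patches when each axis is split evenly.

    dims and patch are tuples of equal length (2 entries for an image,
    3 for a volume). All sizes used below are illustrative assumptions.
    """
    if len(dims) != len(patch):
        raise ValueError("dims and patch must have the same number of axes")
    count = 1
    for size, p in zip(dims, patch):
        if size % p != 0:
            raise ValueError(f"axis of size {size} is not divisible by patch {p}")
        count *= size // p
    return count


def attention_pair_count(n_tokens):
    """Self-attention scores every token against every token: n^2 pairs."""
    return n_tokens * n_tokens


# Hypothetical 2D input: a 224x224 image cut into 16x16 patches.
tokens_2d = patch_tokens((224, 224), (16, 16))

# Hypothetical 3D input: the same footprint with a depth axis of 64,
# cut into 16x16x16 volumetric patches.
tokens_3d = patch_tokens((224, 224, 64), (16, 16, 16))

ratio = attention_pair_count(tokens_3d) / attention_pair_count(tokens_2d)

print(f"2D tokens: {tokens_2d}")
print(f"3D tokens: {tokens_3d}")
print(f"attention pair ratio (3D / 2D): {ratio}")
```

Under these assumed sizes the depth axis multiplies the token count by four, and the quadratic attention term multiplies the pairwise work by sixteen. The actual overhead of any real 3D ViT depends on its resolution, patching scheme, and attention design, none of which the summaries specify.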