Li Xiang: Autonomous driving used to mean watching 100,000 hours of dashcam footage and then going straight onto the road

Core Viewpoint
- The article describes a breakthrough in autonomous driving: the MindVLA-o1 model uses a native 3D Vision Transformer (ViT) to understand the three-dimensional world, addressing a limitation of current AI systems, which learn primarily from 2D video data [1][2].

Group 1: Technology Breakthrough
- MindVLA-o1 implements a native 3D ViT, so the model works with 3D spatial geometry and semantics from the outset instead of inferring them from 2D footage [1][2].
- The model represents spatial structure, positional relationships, and semantic information in a unified way, so it both perceives the environment and understands its context [2].

Group 2: Role of LiDAR
- In the new framework, LiDAR is no longer the core of perception. It serves as a high-precision tool for geometric calibration and near-field spatial constraints [2].
- Perception capability depends more on the model's representational ability than on the physical specifications of the sensors [2].

Group 3: Computational Requirements
- A 3D ViT is computationally demanding. The company addresses this with its self-developed Mach chip, which offers three times the effective computing power of the previous generation [2].

Group 4: Versatility of the Model
- The VLA base model is not limited to autonomous driving. It can also control robots, evolving into a general-purpose intelligent agent for the physical world [3].
- Autonomous driving is positioned as only the starting point for broader applications of physical AI [4].