MindVLA - filings, earnings calls, financial reports, news

MindVLA

自动驾驶之心· 2025-12-28 03:30

Core Viewpoint - The article emphasizes the transition of autonomous driving from "perception-driven" to "spatial intelligence" by 2025, highlighting the importance of understanding and interacting with the three-dimensional physical world [3]. Group 1: Spatial Intelligence Definition - Spatial intelligence is defined as the ability to perceive, represent, reason, decide, and interact with spatial information, which is crucial for the interaction between intelligent agents and the physical world [3]. - Current spatial intelligence is primarily focused on perception and representation, with significant room for improvement in reasoning, decision-making, and interaction capabilities [3]. Group 2: World Models and Simulation - GAIA-2 is a multi-view generative world model for autonomous driving that generates driving videos based on physical laws and conditions, addressing edge cases in driving scenarios [5]. - GAIA-3 enhances GAIA-2 by increasing the scale fivefold and capturing fine-grained spatiotemporal contexts, representing the physical causal structure of the real world [9]. - ReSim combines expert trajectories from the real world with simulated dangerous behaviors to achieve high-fidelity simulations of extreme driving scenarios [11]. Group 3: Multimodal Reasoning - The SIG framework introduces a structured graph scheme that encodes scene layouts and object relationships, aiming to enhance geometric reasoning in autonomous driving [16]. - OmniDrive generates a large-scale 3D question-answer dataset to align visual language models with 3D spatial understanding and planning [19]. - SimLingo addresses the alignment of driving behavior with semantic instructions through an action dreaming task, demonstrating the potential of general models in real-time decision-making [21]. 
Group 4: Real-time Digital Twins - DrivingRecon is a 4D Gaussian reconstruction model that predicts parameters from surround-view videos, enabling efficient dynamic scene reconstruction for autonomous driving [26]. - VR-Drive enhances robustness in driving systems by allowing real-time prediction of new viewpoints without scene optimization [29]. Group 5: Embodied Fusion - MiMo-Embodied is the first open-source cross-embodied model that integrates autonomous driving with embodied intelligence, showcasing significant transfer effects in spatial reasoning capabilities [31]. - DriveGPT4-V2 is a closed-loop end-to-end autonomous driving framework that outputs low-level control signals, evolving from visual understanding to closed-loop control [36]. Group 6: Industry Trends - By 2025, the industry is moving towards an end-to-end VLA architecture, leveraging large language models for driving decision-making [40]. - Waymo's EMMA model integrates multimodal inputs and outputs in a unified language space, enhancing complex reasoning in driving tasks [41]. - DeepRoute.ai's DeepRoute IO 2.0 architecture introduces chain-of-thought reasoning to address the "black box" issue in end-to-end models, improving user trust in autonomous systems [44].

The first year of mass-produced L3 autonomous driving: one step closer to the L4 dream?

Xin Lang Cai Jing· 2025-12-17 06:30

Group 1 - The Ministry of Industry and Information Technology has approved the commercial operation of L3 autonomous driving for the first time in China, allowing vehicles to operate under specific conditions with the system taking over driving tasks [1] - The two models approved for L3 autonomous driving are Changan Deep Blue SL03 and Arcfox Alpha S6, marking a significant step towards the commercialization of L3 technology [1] - The year 2026 is anticipated to be the "mass production year" for L3 autonomous driving, with several companies aiming to launch L3 vehicles by then [3][4] Group 2 - The approval clarifies the responsibility division for L3 autonomous driving, indicating that if an accident occurs while the system is activated, the car manufacturer may bear primary responsibility [1] - The L3 level is seen as a crucial transition from "assisted driving" to "fully autonomous driving," with L4 expected to achieve greater breakthroughs [1][4] - Major automotive companies, including XPeng, Chery, and GAC, have set timelines for the mass production of L3 vehicles, with GAC planning to launch its first L3 model in Q4 of this year [3][4] Group 3 - The automotive industry is experiencing intensified competition in intelligent driving technologies, with companies like BYD, Geely, and Chery developing their own autonomous driving systems [9] - The integration of AI and data-driven technologies is becoming essential for enhancing autonomous driving capabilities, moving beyond traditional rule-based systems [9][12] - The VLA model is emerging as a key technology in the transition from L2 to L4 autonomous driving, offering improved scene reasoning and generalization capabilities [9][14] Group 4 - The shift towards L3 autonomous driving represents a new beginning for human-machine coexistence, with ongoing exploration in technology iteration and regulatory improvement [17] - Companies are increasingly focusing on in-house development of core technologies, such as 
battery technology and autonomous driving algorithms, to enhance brand competitiveness [16] - The balance between self-research and collaboration is crucial for companies to maintain technological leadership while managing costs [16][17]

Taking Li Auto as an example: tracing the evolution of the autonomous driving "brain" - VLA architecture explained

自动驾驶之心· 2025-12-07 02:05

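Author | 我要吃鸡腿 Editor | 自动驾驶之心 Original link: https://zhuanlan.zhihu.com/p/1965839552158623077 In the fast-iterating field of autonomous driving, technical paradigms turn over dizzyingly fast. Two years ago the industry talked of nothing but BEV (bird's-eye view); last year, "end-to-end" became the new technical high ground. Yet each paradigm, while solving old problems, seems to breed new challenges. Traditional "end-to-end" autonomous driving, that is, the VA (Vision-Action) model, exposes a deep contradiction: it is like a highly skilled but taciturn veteran driver. Relying on the "intuition" trained from massive amounts of data, it can pull off astonishingly smooth maneuvers in complex road conditions. But when you sit in the passenger seat and, after your heart skips a beat, ask it "Why did you suddenly slow down just now?", it cannot answer. This is the "black box" problem: the system can "do the right thing", but we do not know "why it did the right thing". This inability to explain or communicate creates a serious crisis of trust. The evolution of the three major paradigms of autonomous driving. (a) ...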

自动驾驶之心· 2025-10-19 23:32

Core Insights - The article discusses the five stages of artificial intelligence (AI) as defined by OpenAI, emphasizing the importance of each stage in the development and application of AI technologies [17][18]. Group 1: Stages of AI Development - The first stage is Chatbots, which serve as a foundational model that compresses human knowledge, akin to a person completing their education [19][4]. - The second stage is Reasoners, which utilize supervised fine-tuning (SFT) and reinforcement learning from human feedback (RLHF) to perform continuous reasoning tasks, similar to advanced academic training [20][21]. - The third stage is Agents, where AI begins to perform tasks autonomously, requiring a high level of professionalism and reliability, comparable to a person in a specialized job [22][23]. - The fourth stage is Innovators, focusing on the ability to generate and solve problems through real-world training and feedback, which is essential for enhancing the capabilities of AI [25][26]. - The fifth stage is Organizations, which manage multiple agents and innovations to prevent chaos, similar to how businesses manage human resources [27][28]. Group 2: Computational Needs - Demand for inference compute is expected to increase 100-fold over the next five years, while demand for training compute may grow 10-fold [10][29]. - The article highlights the necessity for both edge computing and cloud-based processing to support the various stages of AI development [28][29]. Group 3: Li Auto Applications - Li Auto is developing its own reasoning models (MindVLA/MindGPT) and agents (Driver Agent/Lixiang Tongxue (理想同学) Agent) to enhance its autonomous driving capabilities [31][33]. - By 2026, Li Auto plans to equip its autonomous vehicles with self-developed advanced edge chips for deeper integration with AI [12][33]. 
Group 4: Training and Skill Development - Effective training for AI involves enhancing three key abilities: information processing, problem formulation and solving, and resource allocation [39][40][41]. - The article emphasizes that successful AI applications require extensive training, akin to the 10,000 hours of practice needed for mastery in a profession [36][42].
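To put the five-year compute figures in perspective, the short sketch below converts a total multiple into an implied yearly growth factor. It assumes growth compounds at a constant annual rate, which the summary does not state; the function name `annual_growth` is illustrative only.

```python
def annual_growth(total_multiple: float, years: int) -> float:
    """Constant yearly growth factor that compounds to total_multiple over `years`.

    Assumption: smooth compounding; the source only gives the five-year totals.
    """
    return total_multiple ** (1.0 / years)


inference_rate = annual_growth(100, 5)  # 100-fold over five years
training_rate = annual_growth(10, 5)    # 10-fold over five years

print(f"inference compute: about {inference_rate:.2f}x per year")  # about 2.51x
print(f"training compute:  about {training_rate:.2f}x per year")   # about 1.58x
```

Under that assumption, a 100-fold increase corresponds to inference demand growing roughly two and a half times every year, against roughly 1.6 times per year for training.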

Li Xiang: Tesla V14 also uses the same technology as VLA | October 18, 2025, Bilibili illustrated, condensed version

理想TOP2· 2025-10-18 16:03

Core Viewpoint - The article discusses the five stages of artificial intelligence (AI) as defined by OpenAI, emphasizing the importance of each stage in the development and application of AI technologies [10][11]. Group 1: Stages of AI - The first stage is Chatbots, which serve as a foundational model that compresses human knowledge, akin to a person completing their education [2][14]. - The second stage is Reasoners, which utilize supervised fine-tuning (SFT) and reinforcement learning from human feedback (RLHF) to perform continuous reasoning tasks, similar to advanced academic training [3][16]. - The third stage is Agents, where AI begins to perform tasks autonomously, requiring a high level of reliability and professionalism, comparable to a person in a specialized job [4][17]. - The fourth stage is Innovators, focusing on generating and solving problems through reinforcement training, necessitating a world model for effective training [5][19]. - The fifth stage is Organizations, which manage multiple agents and innovations to prevent chaos, similar to corporate management [4][21]. Group 2: Computational Needs - Demand for inference compute is expected to increase 100-fold, while demand for training compute may grow 10-fold, over the next five years [7][23]. - The article highlights the necessity for both edge and cloud computing to support the various stages of AI development, particularly in the Agent and Innovator phases [6][22]. Group 3: Li Auto's Self-Developed Technologies - Li Auto is developing its own reasoning models (MindVLA/MindGPT), agents (Driver Agent/Lixiang Tongxue (理想同学) Agent), and world models to enhance its AI capabilities [8][24]. - By 2026, Li Auto plans to equip its autonomous driving technology with self-developed advanced edge chips for deeper integration with AI [9][26]. 
Group 4: Training and Skill Development - The article emphasizes the importance of training in three key areas: information processing ability, problem formulation and solving ability, and resource allocation ability [33][36]. - It suggests that effective training requires real-world experience and feedback, akin to the 10,000-hour rule for mastering a profession [29][30].

A recent work the head of Li Auto's foundation model team is very pleased with: RuscaRL

理想TOP2· 2025-10-03 09:55

Core Viewpoint - The article discusses the importance of reinforcement learning (RL) in enhancing the intelligence of large models, emphasizing the need for effective interaction between models and their environments to obtain high-quality feedback [1][2]. Summary by Sections Section 1: Importance of Reinforcement Learning - The article highlights that RL is crucial for the advancement of large model intelligence, with a focus on how to enable models to interact with broader environments to achieve capability generalization [1][8]. - It mentions various RL techniques such as RLHF (Reinforcement Learning from Human Feedback), RLAIF (Reinforcement Learning from AI Feedback), and RLVR (Reinforcement Learning with Verifiable Rewards) as key areas of exploration [1][8]. Section 2: RuscaRL Framework - The RuscaRL framework is introduced as a solution to the exploration bottleneck in RL, utilizing educational psychology's scaffolding theory to enhance the reasoning capabilities of large language models (LLMs) [12][13]. - The framework employs explicit scaffolding and verifiable rewards to guide model training and improve response quality [13][15]. Section 3: Mechanisms of RuscaRL - **Explicit Scaffolding**: This mechanism provides structured guidance through rubrics, helping models generate diverse and high-quality responses while gradually reducing external support as the model's capabilities improve [14]. - **Verifiable Rewards**: RuscaRL designs rewards based on rubrics, allowing for stable and reliable feedback during training, which enhances exploration diversity and ensures knowledge consistency across tasks [15][16]. Section 4: Future Implications - The article suggests that both MindGPT and MindVLA, which target digital and physical worlds respectively, could benefit from the advancements made through RuscaRL, indicating a promising future for self-evolving models [9][10]. 
- It emphasizes that the current challenges in RL are not just algorithmic but also involve systemic integration of algorithms and infrastructure, highlighting the need for innovative approaches in building capabilities [9].
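The two RuscaRL mechanisms summarized above, rubric-based rewards and scaffolding that is withdrawn as training progresses, can be illustrated with a minimal sketch. Everything here is hypothetical: the `Criterion` type, the weighted-fraction reward, and the linear decay schedule are assumptions chosen for clarity, and the summary does not say how RuscaRL actually scores rubrics or schedules the decay. How each criterion is judged as satisfied (for example by a grader model) is outside the sketch.

```python
from dataclasses import dataclass


@dataclass
class Criterion:
    """One rubric item (hypothetical structure)."""
    description: str
    weight: float


def rubric_reward(satisfied: list[bool], rubric: list[Criterion]) -> float:
    """Weighted fraction of rubric criteria judged as satisfied, in [0, 1]."""
    total = sum(c.weight for c in rubric)
    earned = sum(c.weight for c, ok in zip(rubric, satisfied) if ok)
    return earned / total if total > 0 else 0.0


def scaffold_ratio(step: int, total_steps: int, start: float = 1.0) -> float:
    """Share of rubric criteria shown in the prompt; linear decay is an assumption."""
    progress = min(max(step / total_steps, 0.0), 1.0)
    return start * (1.0 - progress)


def visible_criteria(rubric: list[Criterion], ratio: float) -> list[Criterion]:
    """Criteria exposed to the model as guidance at the current ratio."""
    keep = round(len(rubric) * ratio)
    return rubric[:keep]


rubric = [
    Criterion("states the final answer", 2.0),
    Criterion("shows intermediate steps", 1.0),
    Criterion("cites the given data", 1.0),
]
print(rubric_reward([True, False, True], rubric))                 # 0.75
print(len(visible_criteria(rubric, scaffold_ratio(0, 100))))      # 3
print(len(visible_criteria(rubric, scaffold_ratio(100, 100))))    # 0
```

The point of the sketch is the separation of roles: the rubric always drives the reward, while the portion of it revealed to the model as a hint shrinks toward zero.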

An analysis of Li Auto's efficient MoE + Sparse Attention structure

自动驾驶之心· 2025-08-26 23:32

Core Viewpoint - The article discusses the advanced technologies used in Li Auto's autonomous driving solutions, specifically focusing on the "MoE + Sparse Attention" efficient structure that enhances the performance and efficiency of large models in 3D spatial understanding and reasoning [3][6]. Group 1: Introduction to Technologies - The article introduces a series of posts that delve deeper into the advanced technologies involved in Li Auto's VLM and VLA solutions, which were only briefly discussed in previous articles [3]. - The focus is on the "MoE + Sparse Attention" structure, which is crucial for improving the efficiency and performance of large models [3][6]. Group 2: Sparse Attention - Sparse Attention limits the complexity of the attention mechanism by focusing only on key input parts, rather than computing globally, which is particularly beneficial in 3D scenarios [6][10]. - The structure combines local attention and strided attention to create a sparse yet effective attention mechanism, ensuring that each token can quickly propagate information while maintaining local modeling capabilities [10][11]. Group 3: MoE (Mixture of Experts) - MoE architecture divides computations into multiple expert sub-networks, allowing only a subset of experts to be activated for each input, thus enhancing computational efficiency without significantly increasing inference costs [22][24]. - The article outlines the core components of MoE, including the Gate module for selecting experts, the Experts module as independent networks, and the Dispatcher for optimizing computation [24][25]. Group 4: Implementation and Communication - The article provides insights into the implementation of MoE using DeepSpeed, highlighting its flexibility and efficiency in handling large models [27][29]. 
- It discusses the communication mechanisms required for efficient data distribution across multiple GPUs, emphasizing the importance of the all-to-all communication strategy in distributed training [34][37].
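The two ideas in this summary, a local-plus-strided sparse attention pattern and a gated Mixture of Experts, can be sketched in a few lines of NumPy. This is a toy illustration under stated assumptions, not Li Auto's or DeepSpeed's implementation: the causal masking, the window and stride values, top-2 routing, and linear experts are all choices made here, and the per-expert loop merely stands in for the dispatcher (in distributed training the summary describes an all-to-all exchange instead).

```python
import numpy as np


def sparse_attention_mask(seq_len: int, window: int, stride: int) -> np.ndarray:
    """Boolean mask, True where query i may attend to key j.

    Assumed pattern: causal, with a local window plus every `stride`-th earlier token.
    """
    i = np.arange(seq_len)[:, None]   # query positions
    j = np.arange(seq_len)[None, :]   # key positions
    causal = j <= i
    local = (i - j) < window          # the most recent `window` tokens
    strided = ((i - j) % stride) == 0  # long-range hops at a fixed stride
    return causal & (local | strided)


def softmax(x: np.ndarray, axis: int = -1) -> np.ndarray:
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def moe_forward(tokens, gate_w, expert_ws, top_k=2):
    """tokens: (n, d); gate_w: (d, n_experts); expert_ws: (n_experts, d, d)."""
    logits = tokens @ gate_w                                 # gate scores, (n, E)
    top_idx = np.argsort(-logits, axis=-1)[:, :top_k]        # chosen experts, (n, k)
    top_logits = np.take_along_axis(logits, top_idx, axis=-1)
    weights = softmax(top_logits, axis=-1)                   # renormalize over chosen experts
    out = np.zeros_like(tokens)
    for e in range(expert_ws.shape[0]):
        token_ids, slot = np.nonzero(top_idx == e)           # tokens routed to expert e
        if token_ids.size == 0:
            continue                                         # inactive expert costs nothing
        expert_out = tokens[token_ids] @ expert_ws[e]        # one batched call per expert
        out[token_ids] += weights[token_ids, slot][:, None] * expert_out
    return out, top_idx


mask = sparse_attention_mask(seq_len=8, window=2, stride=4)
print(np.nonzero(mask[7])[0])  # [3 6 7]: two local keys plus one strided key

rng = np.random.default_rng(0)
x = rng.standard_normal((5, 4))
y, routed = moe_forward(x, rng.standard_normal((4, 3)), rng.standard_normal((3, 4, 4)))
print(y.shape, routed.shape)   # (5, 4) (5, 2)
```

With this mask each query touches on the order of `window + seq_len / stride` keys instead of all `seq_len`, and each token runs through only `top_k` of the experts, which is where the efficiency gain in both mechanisms comes from.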

Xin Lang Cai Jing· 2025-08-02 01:34

Core Viewpoint - The launch of Li Auto's second pure electric model, the Li i8, represents a significant step in the company's pursuit of its "pure electric dream," with a focus on enhanced performance and advanced technology [1][3]. Group 1: Product Launch and Features - The Li i8 is officially on sale as of July 29, with three versions priced between 321,800 yuan and 369,800 yuan, approximately 30,000 yuan lower than the previous pre-sale price [1][3]. - The i8 features longer pure electric range, lower drag coefficient, and the introduction of the MindVLA autonomous driving architecture, which has been in development for years [3][14]. - The i8's dimensions are 5085mm in length, 1960mm in width, and 1740mm in height, with a wheelbase of 3050mm, providing spacious interior comfort [9][11]. Group 2: Competitive Landscape - The i8 enters a competitive market segment for six-seat pure electric SUVs, facing rivals such as the AITO M8, Onvo L90, and Tesla Model Y L [4][29]. - The pricing strategy of the i8 is not aggressive, which means it must rely on its overall strength to attract consumers [4][30]. - The market for pure electric models priced above 300,000 yuan is limited, with less than 80,000 units sold in the first four months of 2025, indicating a challenging environment for the i8 [29][30]. Group 3: Strategic Adjustments and Organizational Changes - Following the underperformance of the MEGA model, Li Auto made significant organizational adjustments, merging sales and service teams into a new smart vehicle group to enhance product development [3][21]. - The company has invested approximately 2 billion yuan in design changes for the i8, emphasizing low drag and brand recognition [25][27]. - Li Auto's internal discussions led to a clearer product line strategy, distinguishing the i series from the MEGA brand and focusing on the pure electric SUV market [21][25]. 
Group 4: Technological Innovations - The i8 is equipped with a self-developed silicon carbide drive motor, achieving a noise level of just 3.5 decibels at high speeds [14]. - The vehicle's dual motor system delivers a combined power of 400 kW (approximately 544 horsepower) and a maximum torque of 660 Nm, with a 0-100 km/h acceleration time of 4.5 seconds [14][15]. - The MindVLA system, a new vision-language-action (VLA) model, allows the i8 to adapt to driving conditions in real-time, enhancing the driving experience [16][18].

Zheng Quan Shi Bao Wang· 2025-07-23 03:29

Core Insights - The competition among six-seat pure electric SUVs is intensifying, with models like AITO M8, Tesla Model Y L, and Li Auto i8 showcasing unique selling points to capture market share [1][2][3] Group 1: Technology and Features - AITO M8 features the latest HUAWEI ADS4 intelligent driving system, equipped with advanced sensors including a 192-line LiDAR and multiple radar systems, enhancing safety and driving assistance [1] - Tesla Model Y L is recognized for its Autopilot system, which offers extensive driving assistance features, although it faces challenges in fully utilizing its hardware in the domestic market [1][2] - Li Auto i8 is expected to incorporate the next-generation MindVLA driving architecture and NVIDIA's Drive AGX Thor-U chip for advanced data processing and decision-making [2] Group 2: Space and Comfort - AITO M8 offers a spacious design with dimensions of 5190/1999/1795mm and a wheelbase of 3105mm, providing both five-seat and six-seat configurations, along with a 110L front trunk for added convenience [3] - Tesla Model Y L emphasizes minimalist design with a large storage compartment in the center console, facilitating organized storage [3] - Li Auto i8 optimizes space through chassis layout and a multi-layer trunk design, ensuring ample legroom and storage options [3] Group 3: Performance and Range - AITO M8 is built on Huawei's 800V high-voltage battery platform, featuring a 100 kWh battery from CATL, with a maximum CLTC range of 705 km and efficient charging capabilities [3] - Tesla Model Y L offers various range options across different versions, supported by an extensive charging network for both urban commuting and long-distance travel [4]

What exactly is the Action in VLA? A look at Diffusion: from image generation to end-to-end trajectory planning

自动驾驶之心· 2025-07-19 10:19

Core Viewpoint - The article discusses the principles and applications of diffusion models in the context of autonomous driving, highlighting their advantages over generative adversarial networks (GANs) and detailing specific use cases in the industry. Group 1: Diffusion Model Principles - Diffusion models are generative models that focus on denoising, learning and simulating data distributions through a forward diffusion process and a reverse generation process [2][4]. - The forward diffusion process adds noise to the initial data distribution, while the reverse generation process aims to remove noise to recover the original data [5][6]. - The models typically utilize a Markov chain to describe the state transitions during the noise addition and removal processes [8]. Group 2: Comparison with Generative Adversarial Networks - Both diffusion models and GANs generate samples starting from random noise, but they differ in their core mechanisms: diffusion models rely on probabilistic modeling of a gradual noising and denoising process, while GANs use adversarial training between a generator and a discriminator [20][27]. - Diffusion models are generally more stable during training and produce higher quality samples, especially at high resolutions, compared to GANs, which can suffer from mode collapse and require training multiple networks [27][28]. Group 3: Applications in Autonomous Driving - Diffusion models are applied in various areas of autonomous driving, including synthetic data generation, scene prediction, perception enhancement, and path planning [29]. - They can generate realistic driving scene data to address the challenges of data scarcity and high annotation costs, particularly for rare scenarios like extreme weather [30][31]. - In scene prediction, diffusion models can forecast dynamic changes in driving environments and generate potential behaviors of traffic participants [33]. 
- For perception tasks, diffusion models enhance data quality by denoising bird's-eye view (BEV) images and improving sensor data consistency [34][35]. - In path planning, diffusion models support multimodal path generation, enhancing safety and adaptability in complex driving conditions [36]. Group 4: Notable Industry Implementations - Companies like Haomo Technology and Horizon Robotics are developing advanced algorithms based on diffusion models for real-world applications, achieving state-of-the-art performance in various driving scenarios [47][48]. - The integration of diffusion models with large language models (LLMs) and other technologies is expected to drive further innovations in the autonomous driving sector [46].
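The forward (noising) and reverse (denoising) processes described in this summary can be made concrete with a small NumPy sketch applied to a toy trajectory of 2D waypoints. It follows the standard DDPM-style closed form, x_t = sqrt(ᾱ_t)·x_0 + sqrt(1 − ᾱ_t)·ε; the linear noise schedule, the 1000 steps, and the toy trajectory are assumptions made here, and the true noise stands in for the output of a trained denoising network, which is omitted.

```python
import numpy as np


def make_schedule(num_steps=1000, beta_start=1e-4, beta_end=0.02):
    """Linear noise schedule (assumed values) and cumulative signal retention."""
    betas = np.linspace(beta_start, beta_end, num_steps)
    alpha_bars = np.cumprod(1.0 - betas)   # fraction of signal variance left at step t
    return betas, alpha_bars


def forward_diffuse(x0, t, alpha_bars, rng):
    """Sample x_t from q(x_t | x_0) in one shot instead of iterating the Markov chain."""
    noise = rng.standard_normal(x0.shape)
    xt = np.sqrt(alpha_bars[t]) * x0 + np.sqrt(1.0 - alpha_bars[t]) * noise
    return xt, noise


def predict_x0(xt, t, predicted_noise, alpha_bars):
    """Invert the closed form given a noise estimate (from a denoiser in practice)."""
    return (xt - np.sqrt(1.0 - alpha_bars[t]) * predicted_noise) / np.sqrt(alpha_bars[t])


rng = np.random.default_rng(0)
_, alpha_bars = make_schedule()

# Toy "trajectory": 20 waypoints (x, y) along a gentle curve.
s = np.linspace(0.0, 1.0, 20)
trajectory = np.stack([s, 0.2 * s ** 2], axis=1)

noisy, eps = forward_diffuse(trajectory, t=500, alpha_bars=alpha_bars, rng=rng)
recovered = predict_x0(noisy, 500, eps, alpha_bars)   # perfect noise estimate for illustration
print(np.allclose(recovered, trajectory))             # True
```

In a planner, the denoiser would be conditioned on the scene and run over many steps from pure noise; sampling it several times yields several candidate trajectories, which is the multimodal path generation the summary refers to.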