Li Auto's Li Xiang: The All-New Li L9 Will Be the Pioneering Work of Embodied Intelligent Robots
Mei Ri Jing Ji Xin Wen· 2026-02-05 09:50
Core Viewpoint
- The company envisions the ultimate form of the automobile as a robot, focusing on integrating advanced AI and intelligent systems into vehicles to enhance user experience and functionality [3][4].

Group 1: Technological Advancements
- In 2018, the company launched the Li ONE, introducing sensory capabilities with microphones, radars, and cameras [3].
- By 2021, the self-developed perception system was integrated into the 2021 Li ONE model, marking a significant step in vehicle intelligence [3].
- The company initiated development of its own operating system in October 2021, further enhancing vehicle capabilities [3].
- The Li L9 was released in June 2022, featuring the first mass production of the self-developed central domain controller XCU and transforming the vehicle into a living space rather than just a tool [3].

Group 2: Subsequent Developments
- The company delivered its first end-to-end AD large model in October 2024, indicating a strong commitment to AI integration [4].
- In March 2025, the company open-sourced its Star Ring OS, enhancing software accessibility and innovation [4].
- The VLA driver model was integrated into the Li i8 in August 2025, showcasing ongoing advancements in AI capabilities [4].

Group 3: Strategic Focus
- The company emphasizes that 70% of its focus remains on automotive development, ensuring that intelligent systems are built on high-quality vehicles to create real value for users [4].
- The upcoming Li L9 is positioned as a groundbreaking intelligent robot vehicle, equipped with a complete technology stack to transition from a passive tool to an active partner for users [4].
- The company is preparing for its second decade in 2026, with the new Li L9 set to be a pioneering product in the realm of embodied intelligent robots [4][5].
Li Xiang: The All-New Li L9 Is the Pioneering Work of Embodied Intelligent Robots. We Have Prepared for Ten Years, Waiting for This Very Moment.
理想TOP2· 2026-02-05 08:25
Core Viewpoint
- The company envisions the ultimate form of a car as a robot, not just a faster or smarter transportation tool, and has been working progressively toward this goal over the past decade [1].

Group 1: Development Timeline
- In October 2018, the company launched the Li ONE, introducing sensory capabilities with microphones, radars, and cameras [1].
- In May 2021, the self-developed perception system was integrated into the 2021 Li ONE model [1].
- In October 2021, the project for a self-developed operating system was initiated [2].
- In June 2022, the Li L9 was released, featuring the first mass production of the self-developed central domain controller XCU [2].
- In November 2022, the self-developed chip project was initiated [3].
- In March 2023, the self-developed large model project was initiated [4].
- In December 2023, the self-developed large model MindGPT was launched with OTA 5.0 [5].
- In April 2024, the Li L6 was released, marking the first mass production of the self-developed operating system Star Ring OS [6].
- In October 2024, the first end-to-end AD large model was delivered [7].
- In March 2025, the Star Ring OS was open-sourced [8].
- In August 2025, the VLA driver large model was integrated into the Li i8 [9].

Group 2: Future Vision
- The upcoming Li L9 is positioned as a smart entity, equipped with a complete technology stack that transforms the car from a passive tool into an active partner [9].
- The company emphasizes that embodied intelligence must be integrated into a good car to create real value for users, and it dedicates 70% of its focus to automotive development [9].
- The company is preparing for its second decade in 2026, with the new Li L9 being a pioneering product in the realm of embodied intelligent robots [9].
Chen Wei at GTC 2024 on MindGPT (Condensed / Video / Illustrated Versions)
理想TOP2· 2025-12-15 12:02
Core Viewpoint
- The article discusses the advancements in the development of MindGPT, a multimodal cognitive model designed to enhance human-machine interaction in smart vehicles, emphasizing its capabilities in perception, understanding, and interaction [2][20][39].

Group 1: Technology and Model Architecture
- MindGPT is built on a self-developed TaskFormer structure, which has been recognized for its performance in industry evaluations [2][35].
- The model incorporates multimodal perception capabilities, allowing it to process audio and visual data simultaneously and enhancing user interaction through features like voice recognition and gesture control [29][30].
- The architecture supports complete agent capability, integrating perception, planning, memory, tools, and action [35][36].

Group 2: Training and Performance
- The training strategy focuses on 15 key areas relevant to in-car scenarios, using self-supervised learning and reinforcement learning from human feedback (RLHF) to cover over 110 domains and 1,000 specialized capabilities [3][35].
- The training platform, Li-PTM, achieves training speeds significantly faster than industry standards, with SFT-phase speeds more than three times those of the best open-source alternatives [46][47].
- The model's inference engine, LisaRT-LLM, has been optimized for performance, achieving a throughput increase of over 1.3 times compared to previous models under high concurrency [5][53].

Group 3: User Interaction and Experience
- MindGPT aims to create a natural interaction experience by letting users communicate with the vehicle through simple commands and gestures, reducing the complexity of user input [10][32].
- The system is designed to understand and remember user preferences, providing personalized interactions based on historical conversations [36][39].
- The integration of advanced AI technologies aims to enhance emotional connections between users and their vehicles, creating a more immersive experience [14][18].
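The agent loop the summary attributes to MindGPT (perception, planning, memory, tools, action) can be sketched in miniature. Everything below, including the `MiniAgent` class, the keyword "planner" standing in for the LLM, and the two toy tools, is a hypothetical illustration of the loop's shape, not Li Auto's actual API.

```python
# Minimal perceive -> plan -> act loop with memory and tools.
# All names here are illustrative assumptions.

class MiniAgent:
    def __init__(self, tools):
        self.memory = []   # past utterances, the basis for personalization
        self.tools = tools # tool name -> callable

    def perceive(self, utterance):
        # Record the raw input, then normalize it for planning.
        self.memory.append(utterance)
        return utterance.lower()

    def plan(self, observation):
        # Trivial keyword matcher standing in for the LLM planner.
        for name in self.tools:
            if name in observation:
                return name
        return None

    def act(self, utterance):
        obs = self.perceive(utterance)
        tool = self.plan(obs)
        if tool is None:
            return "no matching tool"
        return self.tools[tool](obs)

agent = MiniAgent({"navigate": lambda obs: "route planned",
                   "music": lambda obs: "playing music"})
print(agent.act("Navigate to the office"))  # prints "route planned"
```

The point of the sketch is the division of labor: perception normalizes input and feeds memory, planning selects a tool, and action executes it, which is the perception/planning/memory/tools/action decomposition the summary describes.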
Li Xiang: Tesla V14 Also Uses the Same Technology as VLA
自动驾驶之心· 2025-10-19 23:32
Core Insights
- The article discusses the five stages of artificial intelligence (AI) as defined by OpenAI, emphasizing the importance of each stage in the development and application of AI technologies [17][18].

Group 1: Stages of AI Development
- The first stage is Chatbots, which serve as a foundational model that compresses human knowledge, akin to a person completing their education [19][4].
- The second stage is Reasoners, which use supervised fine-tuning (SFT) and reinforcement learning from human feedback (RLHF) to perform continuous reasoning tasks, similar to advanced academic training [20][21].
- The third stage is Agents, where AI begins to perform tasks autonomously, requiring a high level of professionalism and reliability, comparable to a person in a specialized job [22][23].
- The fourth stage is Innovators, focusing on the ability to generate and solve problems through real-world training and feedback, which is essential for enhancing the capabilities of AI [25][26].
- The fifth stage is Organizations, which manage multiple agents and innovations to prevent chaos, similar to how businesses manage human resources [27][28].

Group 2: Computational Needs
- The demand for reasoning computational power is expected to increase by 100 times in the next five years, while training computational needs may expand by 10 times [10][29].
- The article highlights the necessity of both edge computing and cloud-based processing to support the various stages of AI development [28][29].

Group 3: Li Auto Applications
- The company is developing its own reasoning models (MindVLA/MindGPT) and agents (Driver Agent/Ideal Classmate Agent) to enhance its autonomous driving capabilities [31][33].
- By 2026, the company plans to equip its autonomous vehicles with self-developed advanced edge chips for deeper integration with AI [12][33].

Group 4: Training and Skill Development
- Effective training for AI involves enhancing three key abilities: information processing, problem formulation and solving, and resource allocation [39][40][41].
- The article emphasizes that successful AI applications require extensive training, akin to the 10,000 hours of practice needed for mastery in a profession [36][42].
Li Xiang: Tesla V14 Also Uses the Same Technology as VLA | Oct 18, 2025 Bilibili Illustrated Condensed Version
理想TOP2· 2025-10-18 16:03
Core Viewpoint
- The article discusses the five stages of artificial intelligence (AI) as defined by OpenAI, emphasizing the importance of each stage in the development and application of AI technologies [10][11].

Group 1: Stages of AI
- The first stage is Chatbots, which serve as a foundational model that compresses human knowledge, akin to a person completing their education [2][14].
- The second stage is Reasoners, which use supervised fine-tuning (SFT) and reinforcement learning from human feedback (RLHF) to perform continuous reasoning tasks, similar to advanced academic training [3][16].
- The third stage is Agents, where AI begins to perform tasks autonomously, requiring a high level of reliability and professionalism, comparable to a person in a specialized job [4][17].
- The fourth stage is Innovators, focusing on generating and solving problems through reinforcement training, necessitating a world model for effective training [5][19].
- The fifth stage is Organizations, which manage multiple agents and innovations to prevent chaos, similar to corporate management [4][21].

Group 2: Computational Needs
- The demand for reasoning computational power is expected to increase by 100 times, while training computational needs may expand by 10 times over the next five years [7][23].
- The article highlights the necessity of both edge and cloud computing to support the various stages of AI development, particularly in the Agent and Innovator phases [6][22].

Group 3: Li Auto's Self-Developed Technologies
- The company is developing its own reasoning models (MindVLA/MindGPT), agents (Driver Agent/Ideal Classmate Agent), and world models to enhance its AI capabilities [8][24].
- By 2026, the company plans to equip its autonomous driving technology with self-developed advanced edge chips for deeper integration with AI [9][26].

Group 4: Training and Skill Development
- The article emphasizes the importance of training in three key areas: information processing ability, problem formulation and solving ability, and resource allocation ability [33][36].
- It suggests that effective training requires real-world experience and feedback, akin to the 10,000-hour rule for mastering a profession [29][30].
Recent Work That Li Auto's Foundation Model Lead Is Very Pleased With: RuscaRL
理想TOP2· 2025-10-03 09:55
Core Viewpoint
- The article discusses the importance of reinforcement learning (RL) in enhancing the intelligence of large models, emphasizing the need for effective interaction between models and their environments to obtain high-quality feedback [1][2].

Summary by Sections

Section 1: Importance of Reinforcement Learning
- RL is crucial for the advancement of large-model intelligence, with a focus on enabling models to interact with broader environments to achieve capability generalization [1][8].
- Various RL techniques, such as RLHF (reinforcement learning from human feedback), RLAIF (reinforcement learning from AI feedback), and RLVR (reinforcement learning with verifiable rewards), are key areas of exploration [1][8].

Section 2: RuscaRL Framework
- The RuscaRL framework is introduced as a solution to the exploration bottleneck in RL, drawing on scaffolding theory from educational psychology to enhance the reasoning capabilities of large language models (LLMs) [12][13].
- The framework employs explicit scaffolding and verifiable rewards to guide model training and improve response quality [13][15].

Section 3: Mechanisms of RuscaRL
- Explicit scaffolding: structured guidance through rubrics helps models generate diverse, high-quality responses, with external support gradually reduced as the model's capabilities improve [14].
- Verifiable rewards: rewards designed from rubrics provide stable, reliable feedback during training, enhancing exploration diversity and ensuring knowledge consistency across tasks [15][16].

Section 4: Future Implications
- Both MindGPT and MindVLA, which target the digital and physical worlds respectively, could benefit from the advances made through RuscaRL, pointing to a promising future for self-evolving models [9][10].
- The current challenges in RL are not just algorithmic but also involve the systemic integration of algorithms and infrastructure, highlighting the need for innovative approaches to capability building [9].
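The rubric-as-scaffold idea described above can be sketched as follows: grade each response against an explicit checklist, then fade the checklist's weight toward a sparse verifiable reward as training progresses. The rubric checks, the linear decay schedule, and the stand-in final-answer verifier are all illustrative assumptions, not RuscaRL's exact formulation.

```python
# Sketch of rubric-scored rewards with a decaying scaffold,
# in the spirit of RuscaRL. Details are illustrative assumptions.

def rubric_reward(response, rubric):
    """Fraction of rubric checks the response satisfies."""
    hits = sum(1 for check in rubric if check(response))
    return hits / len(rubric)

def scaffolded_reward(response, rubric, step, decay_steps=1000):
    """Blend rubric guidance with a sparse verifiable reward,
    reducing the scaffold's weight as training progresses."""
    w = max(0.0, 1.0 - step / decay_steps)    # linear scaffold decay
    final = 1.0 if "42" in response else 0.0  # stand-in verifiable check
    return w * rubric_reward(response, rubric) + (1 - w) * final

rubric = [
    lambda r: "because" in r,        # gives a justification
    lambda r: len(r.split()) >= 5,   # non-trivial length
]

answer = "the answer is forty-two because reasons"
early = scaffolded_reward(answer, rubric, step=0)     # rubric dominates
late = scaffolded_reward(answer, rubric, step=1000)   # verifier dominates
```

Early in training the well-structured but wrong answer still earns reward (`early` is 1.0), guiding exploration; once the scaffold has decayed, only the verifiable check pays out (`late` is 0.0), which mirrors the "gradually reducing external support" mechanism in the summary.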
Li Auto's MoE + Sparse Attention: An Analysis of the Efficient Architecture
自动驾驶之心· 2025-08-26 23:32
Core Viewpoint
- The article discusses the advanced technologies used in Li Auto's autonomous driving solutions, focusing on the "MoE + Sparse Attention" efficient structure that enhances the performance and efficiency of large models in 3D spatial understanding and reasoning [3][6].

Group 1: Introduction to Technologies
- The article introduces a series of posts that delve deeper into the advanced technologies involved in Li Auto's VLM and VLA solutions, which were only briefly discussed in previous articles [3].
- The focus is on the "MoE + Sparse Attention" structure, which is crucial for improving the efficiency and performance of large models [3][6].

Group 2: Sparse Attention
- Sparse Attention limits the complexity of the attention mechanism by focusing only on key parts of the input rather than computing globally, which is particularly beneficial in 3D scenarios [6][10].
- The structure combines local attention and strided attention to create a sparse yet effective mechanism, ensuring that each token can quickly propagate information while maintaining local modeling capability [10][11].

Group 3: MoE (Mixture of Experts)
- The MoE architecture divides computation across multiple expert sub-networks, activating only a subset of experts for each input, thus enhancing computational efficiency without significantly increasing inference cost [22][24].
- The core components of MoE are the Gate module for selecting experts, the Experts module of independent sub-networks, and the Dispatcher for optimizing computation [24][25].

Group 4: Implementation and Communication
- The article provides insights into the implementation of MoE using DeepSpeed, highlighting its flexibility and efficiency in handling large models [27][29].
- It discusses the communication mechanisms required for efficient data distribution across multiple GPUs, emphasizing the importance of the all-to-all communication strategy in distributed training [34][37].
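The two mechanisms can be sketched in a few lines of NumPy: a causal attention mask that combines a local window with strided links (so every token keeps nearby context yet reaches distant anchor positions in one hop), and a top-2 gate that routes each token to its two highest-scoring experts. The window size, stride, expert count, and logits are made-up values; this is a sketch of the technique, not Li Auto's or DeepSpeed's implementation.

```python
import numpy as np

def sparse_mask(n, window=2, stride=4):
    """True where query i may attend to key j (causal):
    either j is within the local window, or j is a strided anchor."""
    mask = np.zeros((n, n), dtype=bool)
    for i in range(n):
        for j in range(i + 1):
            if i - j <= window or j % stride == 0:
                mask[i, j] = True
    return mask

def top2_gate(logits):
    """Route each token to its 2 highest-scoring experts,
    with softmax weights renormalized over the selected pair."""
    top2 = np.argsort(logits, axis=-1)[:, -2:]
    w = np.take_along_axis(logits, top2, axis=-1)
    w = np.exp(w - w.max(axis=-1, keepdims=True))
    w = w / w.sum(axis=-1, keepdims=True)
    return top2, w

mask = sparse_mask(8)
# Every query reaches position 0 through the strided anchors,
# so information propagates despite the sparsity.
assert mask[:, 0].all()

# One token with 4 hypothetical expert logits: the gate picks the
# two largest (experts 1 and 3) and weights them.
experts, weights = top2_gate(np.array([[0.1, 2.0, 0.5, 1.5]]))
```

In a full MoE layer, the `experts` indices drive the Dispatcher: tokens are packed per expert and exchanged across GPUs with an all-to-all, each expert runs its feed-forward on its share, and a second all-to-all returns the outputs to be mixed with `weights`.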