MindGPT
Chen Wei's GTC 2024 Talk on MindGPT: Condensed / Video / Illustrated Versions
理想TOP2· 2025-12-15 12:02
On December 15, 2025, a reader asked for the PDF of Chen Wei's March 2024 talk, Building LLM-Powered Space Interaction Experience with MindGPT, because a teacher had assigned them a survey of in-vehicle large models. TOP2 spent a few hours transcribing the audio and proofreading the transcript. Going forward, TOP2 will also periodically curate and structure Li Auto's historical records, so that anyone who wants to study Li Auto in depth one, five, or ten years from now can conveniently look up what the company actually did.

PDF: https://pan.baidu.com/s/1k1Dm5rAWPRHm6KdVvK2pIA?pwd=xxsb extraction code: xxsb

Condensed version: Human-machine interaction in three-dimensional space is shifting from humans adapting to machines to machines adapting to humans. MindGPT was released in June 2023; with MindGPT at the core, a complete Agent capability of perception-planning-memory-tools-action is being built. MindGPT-MP is trained through self-supervised learning on massive audio-visual data plus multi-task fine-tuning, and uses all in-car microphones and cameras for synchronized perception. Full-dimensional perception: through signal separation and fusion, it achieves precise user localization and voice separation, with multilingual, multi-dialect, and emotion-aware "listen while watching" capability. Free-form commands: supports an unlimited number of consecutive commands. Free-form dialects: supports multiple dialects for free wake ...
Li Xiang: Tesla's V14 Also Uses the Same Technology as VLA
自动驾驶之心· 2025-10-19 23:32
Core Insights
- The article discusses the five stages of artificial intelligence (AI) as defined by OpenAI, emphasizing the importance of each stage in the development and application of AI technologies [17][18].

Group 1: Stages of AI Development
- The first stage is Chatbots, which serve as a foundational model that compresses human knowledge, akin to a person completing their education [19][4].
- The second stage is Reasoners, which use supervised fine-tuning (SFT) and reinforcement learning from human feedback (RLHF) to perform sustained reasoning tasks, similar to advanced academic training [20][21].
- The third stage is Agents, where AI begins to perform tasks autonomously, requiring a high level of professionalism and reliability, comparable to a person in a specialized job [22][23].
- The fourth stage is Innovators, focusing on the ability to pose and solve problems through real-world training and feedback, which is essential for advancing AI capabilities [25][26].
- The fifth stage is Organizations, which manage multiple agents and innovations to prevent chaos, similar to how businesses manage human resources [27][28].

Group 2: Computational Needs
- Demand for inference compute is expected to grow 100-fold over the next five years, while training compute may expand 10-fold [10][29].
- The article highlights that both edge computing and cloud-based processing are needed to support the various stages of AI development [28][29].

Group 3: Li Auto Applications
- The company is developing its own reasoning models (MindVLA/MindGPT) and agents (Driver Agent / Ideal Classmate Agent) to strengthen its autonomous driving capabilities [31][33].
- By 2026, the company plans to equip its autonomous vehicles with self-developed advanced edge chips for deeper integration with AI [12][33].

Group 4: Training and Skill Development
- Effective training for AI involves strengthening three key abilities: information processing, problem formulation and solving, and resource allocation [39][40][41].
- The article emphasizes that successful AI applications require extensive training, akin to the 10,000 hours of practice needed for mastery of a profession [36][42].
Li Xiang: Tesla's V14 Also Uses the Same Technology as VLA | Oct 18, 2025 Bilibili Illustrated Condensed Version
理想TOP2· 2025-10-18 16:03
Core Viewpoint
- The article discusses the five stages of artificial intelligence (AI) as defined by OpenAI, emphasizing the importance of each stage in the development and application of AI technologies [10][11].

Group 1: Stages of AI
- The first stage is Chatbots, which serve as a foundational model that compresses human knowledge, akin to a person completing their education [2][14].
- The second stage is Reasoners, which use supervised fine-tuning (SFT) and reinforcement learning from human feedback (RLHF) to perform sustained reasoning tasks, similar to advanced academic training [3][16].
- The third stage is Agents, where AI begins to perform tasks autonomously, requiring a high level of reliability and professionalism, comparable to a person in a specialized job [4][17].
- The fourth stage is Innovators, focusing on posing and solving problems through reinforcement training, which requires a world model for effective training [5][19].
- The fifth stage is Organizations, which manage multiple agents and innovations to prevent chaos, similar to corporate management [4][21].

Group 2: Computational Needs
- Demand for inference compute is expected to grow 100-fold, while training compute may expand 10-fold over the next five years [7][23].
- The article highlights that both edge and cloud computing are needed to support the various stages of AI development, particularly the Agent and Innovator phases [6][22].

Group 3: Li Auto's Self-Developed Technologies
- The company is developing its own reasoning models (MindVLA/MindGPT), agents (Driver Agent / Ideal Classmate Agent), and world models to strengthen its AI capabilities [8][24].
- By 2026, the company plans to equip its autonomous driving stack with self-developed advanced edge chips for deeper integration with AI [9][26].

Group 4: Training and Skill Development
- The article emphasizes the importance of training in three key areas: information processing ability, problem formulation and solving ability, and resource allocation ability [33][36].
- It suggests that effective training requires real-world experience and feedback, akin to the 10,000-hour rule for mastering a profession [29][30].
Recent Work That Li Auto's Foundation-Model Lead Is Very Pleased With: RuscaRL
理想TOP2· 2025-10-03 09:55
Core Viewpoint
- The article discusses the importance of reinforcement learning (RL) in advancing the intelligence of large models, emphasizing that models need effective interaction with their environments to obtain high-quality feedback [1][2].

Summary by Sections

Section 1: Importance of Reinforcement Learning
- RL is crucial for the advancement of large-model intelligence, with a focus on enabling models to interact with broader environments to achieve capability generalization [1][8].
- Key directions of exploration include RLHF (reinforcement learning from human feedback), RLAIF (reinforcement learning from AI feedback), and RLVR (reinforcement learning with verifiable rewards) [1][8].

Section 2: RuscaRL Framework
- The RuscaRL framework is introduced as a solution to the exploration bottleneck in RL, drawing on scaffolding theory from educational psychology to strengthen the reasoning capabilities of large language models (LLMs) [12][13].
- The framework uses explicit scaffolding and verifiable rewards to guide model training and improve response quality [13][15].

Section 3: Mechanisms of RuscaRL (see the sketch after this summary)
- Explicit scaffolding: structured guidance through rubrics helps the model generate diverse, high-quality responses, with external support gradually withdrawn as the model's own capability improves [14].
- Verifiable rewards: RuscaRL designs rewards based on the rubrics, providing stable and reliable feedback during training, which enhances exploration diversity and keeps knowledge consistent across tasks [15][16].

Section 4: Future Implications
- Both MindGPT and MindVLA, which target the digital and physical worlds respectively, could benefit from RuscaRL's advances, pointing to a promising future for self-evolving models [9][10].
- The current challenges in RL are not purely algorithmic; they also involve the systemic integration of algorithms and infrastructure, which calls for innovative approaches to capability building [9].
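The summary above describes RuscaRL's two mechanisms only at a high level, so the following is a minimal Python sketch of that idea under stated assumptions: a rubric-based verifiable reward plus a scaffolding weight that is linearly withdrawn over training. The names (`rubric_reward`, `scaffold_weight`, `training_reward`) and the linear decay schedule are hypothetical illustrations, not the RuscaRL implementation.

```python
from typing import Callable, List

def rubric_reward(response: str, rubric: List[Callable[[str], bool]]) -> float:
    """Verifiable reward (assumed form): fraction of rubric criteria satisfied.
    Each criterion is a checkable predicate, so feedback stays stable and auditable."""
    if not rubric:
        return 0.0
    return sum(check(response) for check in rubric) / len(rubric)

def scaffold_weight(step: int, total_steps: int) -> float:
    """Explicit scaffolding: external guidance starts at full strength and is
    gradually withdrawn as training progresses (linear decay assumed here)."""
    return max(0.0, 1.0 - step / total_steps)

def training_reward(response: str,
                    rubric: List[Callable[[str], bool]],
                    guided_score: float,
                    step: int,
                    total_steps: int) -> float:
    """Blend scaffolded guidance with the rubric-verified reward, shifting
    weight toward the model's unaided behavior over the course of training."""
    w = scaffold_weight(step, total_steps)
    return w * guided_score + (1.0 - w) * rubric_reward(response, rubric)

# Toy usage: two checkable criteria for a short answer.
rubric = [lambda r: "because" in r, lambda r: len(r.split()) >= 10]
r = training_reward("It works because the reward is checkable against fixed criteria.",
                    rubric, guided_score=0.9, step=800, total_steps=1000)
print(f"{r:.2f}")
```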
A Breakdown of Li Auto's Efficient MoE + Sparse Attention Architecture
自动驾驶之心· 2025-08-26 23:32
Core Viewpoint
- The article discusses the advanced technologies in Li Auto's autonomous driving stack, focusing on the "MoE + Sparse Attention" efficient structure that improves the performance and efficiency of large models in 3D spatial understanding and reasoning [3][6].

Group 1: Introduction to the Technologies
- The article opens a series of posts that examine in depth the advanced technologies behind Li Auto's VLM and VLA solutions, which earlier articles only touched on briefly [3].
- The focus is the "MoE + Sparse Attention" structure, which is central to improving the efficiency and performance of large models [3][6].

Group 2: Sparse Attention
- Sparse Attention bounds the complexity of the attention mechanism by attending only to key parts of the input rather than computing globally, which is particularly beneficial in 3D scenarios [6][10].
- The structure combines local attention and strided attention into a sparse yet effective mechanism, so that each token can propagate information quickly over long ranges while retaining local modeling capability (see the mask sketch after this summary) [10][11].

Group 3: MoE (Mixture of Experts)
- The MoE architecture splits computation across multiple expert sub-networks and activates only a subset of experts per input, raising capacity without a proportional increase in inference cost [22][24].
- Its core components are the Gate module that selects experts, the Experts module of independent networks, and the Dispatcher that organizes the computation (see the layer sketch after this summary) [24][25].

Group 4: Implementation and Communication
- The article describes implementing MoE with DeepSpeed, highlighting its flexibility and efficiency for large models [27][29].
- It discusses the communication needed to distribute data efficiently across multiple GPUs, emphasizing the role of the all-to-all communication strategy in distributed training [34][37].
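The sparse pattern in Group 2 is described only qualitatively (local windows plus strided long-range links), so here is a minimal NumPy sketch of what such a combined mask can look like; the function name `local_strided_mask` and the `window`/`stride` values are illustrative assumptions, not Li Auto's configuration.

```python
import numpy as np

def local_strided_mask(seq_len: int, window: int = 4, stride: int = 8) -> np.ndarray:
    """Boolean attention mask combining local and strided attention.
    Token i attends to (a) its recent neighbors within `window` (local
    modeling) and (b) every `stride`-th earlier token in its residue class
    (fast long-range information propagation), instead of all O(n^2) pairs."""
    mask = np.zeros((seq_len, seq_len), dtype=bool)
    for i in range(seq_len):
        mask[i, max(0, i - window): i + 1] = True             # causal local band
        mask[i, np.arange(i % stride, i + 1, stride)] = True  # strided columns
    return mask

m = local_strided_mask(16)
print(m.sum(), "attended pairs instead of", 16 * 16)
```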
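Likewise, the Gate / Experts / Dispatcher decomposition in Group 3 maps onto a generic top-k routed MoE layer. The PyTorch sketch below is a minimal single-device illustration under that assumption, not Li Auto's or DeepSpeed's implementation; in DeepSpeed's multi-GPU setting, the dispatch loop would be replaced by an all-to-all exchange that ships each token to the rank hosting its selected expert.

```python
import torch
import torch.nn as nn

class MoELayer(nn.Module):
    """Top-k mixture of experts: only k of num_experts sub-networks run per token."""
    def __init__(self, d_model: int, num_experts: int = 8, k: int = 2):
        super().__init__()
        self.gate = nn.Linear(d_model, num_experts)   # Gate: scores each expert
        self.experts = nn.ModuleList(                 # Experts: independent FFNs
            nn.Sequential(nn.Linear(d_model, 4 * d_model), nn.GELU(),
                          nn.Linear(4 * d_model, d_model))
            for _ in range(num_experts))
        self.k = k

    def forward(self, x: torch.Tensor) -> torch.Tensor:  # x: (tokens, d_model)
        # Routing weights are the top-k softmax probabilities
        # (left unnormalized across the k picks for simplicity).
        weights, idx = self.gate(x).softmax(-1).topk(self.k, dim=-1)
        out = torch.zeros_like(x)
        # Dispatcher: group tokens by chosen expert so each expert runs one batch.
        for e, expert in enumerate(self.experts):
            for slot in range(self.k):
                sel = idx[:, slot] == e
                if sel.any():
                    out[sel] += weights[sel, slot:slot + 1] * expert(x[sel])
        return out

layer = MoELayer(d_model=64)
print(layer(torch.randn(10, 64)).shape)  # torch.Size([10, 64])
```

With k=2 of 8 experts active, each token touches only about a quarter of the layer's parameters, which is the efficiency property the article attributes to MoE.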