Li Auto VLA
An AI Application Company Leader's Understanding of Li Auto's VLA
理想TOP2· 2025-09-13 11:50
Core Viewpoint - The core value of VLA (Vision-Language-Action) lies in its ability to use data effectively to train both the foundation model and a personal memory, so the user experience improves through self-evolution without the need for OTA updates [2][5][6].
Group 1: VLA Functionality
- VLA's memory function captures driving habits and preferences, allowing a personalized driving experience that evolves over time [2][12].
- The system tokenizes and summarizes the collected data, then uses the result to improve the driving experience [10][13].
- Users are encouraged to drive with VLA frequently so that it can improve and adapt [8].
Group 2: Strategic Insights
- The strategy involves a decentralized approach to personal memory data, AI infrastructure, and hardware integration, positioning the company to use user data effectively [6][20].
- The focus is on a unified experience across devices, similar to Apple's ecosystem, which increases user reliance on the brand [20][25].
- The summary emphasizes foundation model capability and the need for proprietary chip development to support advanced AI functions [22][23].
Group 3: Market Positioning
- The company currently leads in developing VLA and its memory capabilities, while competitors such as Huawei and Horizon are still in the early stages [15][19].
- "Persistent memory" is highlighted as a key investment theme, letting AI evolve from a one-time tool into a reliable long-term partner [16][25].
- Integrating personalized memory with AI models is seen as a significant challenge, but one that is essential for customized driving experiences [25].
36 New Q&As on Li Auto's VLA
理想TOP2· 2025-08-13 05:10
Core Viewpoint - The article discusses the advances and challenges in developing the VLA (Vision-Language-Action) model for autonomous driving, emphasizing the importance of reinforcement learning and of combining 3D spatial understanding with global semantic comprehension.
Group 1: VLA Model Development
- The VLA model incorporates reinforcement learning, which is crucial to its development and performance [1]
- Combining 3D spatial understanding with global semantic comprehension makes the model more capable than previous versions [7]
- The transition from VLM (Vision-Language Model) to VLA involves a shift from a parallel architecture to a more integrated one, allowing deeper cognitive processing [3][4]
Group 2: Technical Challenges
- Deploying the VLA model faces challenges such as multi-modal alignment, training data difficulties, and the complexity of running on a single chip [8][9]
- The model's performance is expected to improve significantly with advances in chip technology and optimization techniques [9][10]
- The need for extensive data labeling and the risk of overfitting to simulation data are highlighted as ongoing concerns [23][32]
Group 3: Industry Comparisons
- The article compares the company's gradual progression from L2 to L4 autonomous driving with the rapid expansion strategies of competitors such as Tesla [11]
- The company aims to provide a more complete driving experience by focusing on user needs and safety rather than on technical capability alone [11][22]
Group 4: Future Directions
- The company plans to improve the VLA model through continuous iteration and user feedback, aiming for a more personalized driving experience [35]
- The article emphasizes regulatory compliance and collaboration with government bodies in advancing autonomous driving technology [17][18]
Li Auto VLA Hands-On Report, August 8, 2025 (Including a Group Member Who Has Tried Tesla's FSD in North America)
理想TOP2· 2025-08-12 13:50
Core Insights - The article compares the performance and user experience of Li Auto's VLA (Vision-Language-Action) system with Tesla's FSD (Full Self-Driving) system, noting that while VLA shows promise, it still falls short of the seamless experience FSD provides in certain scenarios [1][2][3].
Experience Evaluation
- The experience was divided into three parts: a ride in a controlled environment with no driver present, a one-hour public road test, and a two-hour test on a self-selected route [1].
- User feedback indicates that VLA is comfortable and efficient, particularly in controlled environments, but its performance in more complex road scenarios remains to be fully evaluated [2][3].
User Feedback
- Users noted a clear difference in VLA's braking, describing it as smooth and seamless compared with traditional driving, which improves the perception of safety and comfort [3][4].
- The article argues that the initial goal for autonomous driving systems should be to outperform 80% of average drivers before aiming for higher benchmarks [4][5].
Iteration Potential
- VLA is believed to have substantially more room for improvement than its predecessor, VLM, in four key areas: simulation data efficiency, making full use of existing hardware, improving model performance through reinforcement learning, and improving the voice control experience [6][7].
- The shift to reinforcement learning allows targeted optimization for specific driving problems, which earlier models could not do [8][9].
User Experience and Product Development
- The article stresses user experience, asserting that in the AI era product experience can matter as much as technical capability [10].
- VLA's voice control is seen as a significant improvement because it lets users tailor the driving style to their preferences, which could raise overall satisfaction [10].
The Essence of Li Auto's VLA | Reinforcement-Learning-Dominated Next Action Token Prediction
自动驾驶之心· 2025-08-11 23:33
Core Insights - The article discusses how to understand AI and its potential, focusing on the concept of "predicting the next token" and its implications for AI capabilities and consciousness [2][3][18].
Group 1: Understanding AI and Token Prediction
- Different interpretations of "predicting the next token" reflect different understandings of the potential and essence of LLMs (large language models) and AI [2].
- Those who view "predicting the next token" as more than a statistical distribution are more likely to recognize the significant potential of LLMs and AI [2][18].
- The article argues that Li Auto's contributions to AI development are often underestimated because of a lack of deep understanding of AI's capabilities [2][19].
Group 2: Ilya's Contributions and Perspectives
- Ilya, a prominent figure in AI, has been instrumental in several key advances in the field, including deep learning and reinforcement learning [4][5][6].
- His views on "predicting the next token" challenge the notion that it cannot surpass human performance: a sufficiently advanced neural network could extrapolate the behavior of hypothetical individuals with superior capabilities [8][9][18].
Group 3: Li Auto's VLA and AI Integration
- Li Auto's VLA (Vision-Language-Action) model works by continuously predicting the next action token from sensor inputs, which the article treats as a real understanding of the physical world rather than mere statistical analysis [19][20].
- The reasoning process of Li Auto's VLA is likened to consciousness. Unlike a traditional chatbot, it operates in real time and ceases when the system is turned off [21][22].
- The article posits that Li Auto integrates AI software and hardware at a high level, which people in the industry often overlook [29].
Group 4: Reinforcement Learning in AI Applications
- The article asserts that assisted driving is better suited to reinforcement learning than chatbots are, because the reward functions in driving are clearer and better defined [24][26].
- AI software and hardware development require significantly different underlying capabilities: software allows rapid iteration and testing, whereas hardware does not [28].
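The idea in Group 3 of continuously predicting the next action token from sensor input can be sketched as a simple loop. This is a toy illustration only: the action vocabulary, the `SensorFrame` fields, and `toy_policy` are hypothetical stand-ins for a learned model, not Li Auto's architecture or API.

```python
from dataclasses import dataclass
from typing import List

# Hypothetical action vocabulary; a real VLA would use a learned token set.
ACTION_TOKENS = ["keep_lane", "slow_down", "brake", "accelerate"]


@dataclass
class SensorFrame:
    gap_m: float      # distance to the lead vehicle in metres (assumed field)
    speed_mps: float  # ego speed in metres per second (assumed field)


def toy_policy(frame: SensorFrame, history: List[str]) -> str:
    """Stand-in for the model: choose the next action token from the
    current sensor frame and the tokens emitted so far."""
    time_gap_s = frame.gap_m / max(frame.speed_mps, 0.1)
    if time_gap_s < 1.0:
        return "brake"
    if time_gap_s < 2.0:
        return "slow_down"
    if history and history[-1] in ("brake", "slow_down"):
        return "keep_lane"  # settle for one step before speeding up again
    return "accelerate" if frame.speed_mps < 25.0 else "keep_lane"


def drive(frames: List[SensorFrame]) -> List[str]:
    """Emit one action token per incoming frame; the loop simply ends
    when the sensor stream ends."""
    history: List[str] = []
    for frame in frames:
        history.append(toy_policy(frame, history))
    return history


frames = [SensorFrame(100, 20), SensorFrame(30, 20),
          SensorFrame(15, 20), SensorFrame(80, 20)]
tokens = drive(frames)
```

Each token depends on both the newest frame and the tokens already emitted, which is the autoregressive structure the summary describes. The rule-based policy is there only to keep the sketch runnable.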
The Essence of Li Auto's VLA Is Reinforcement-Learning-Dominated Continuous Prediction of the Next Action Token
理想TOP2· 2025-08-11 09:35
Core Viewpoints
- The article presents four logical chains about how to understand "predict the next token"; differing interpretations reflect differing views of the potential and essence of LLMs and AI [1]
- Those who believe that predicting the next token is more than a probability distribution are more likely to recognize the significant potential of LLMs and AI [1]
- Thinking insufficiently deeply about AI and about Li Auto makes it easy to underestimate the value of what Li Auto has accomplished [1]
- Li Auto's VLA is, in essence, reinforcement-learning-dominated continuous prediction of the next action token, similar to OpenAI's o1 and o3, and assisted driving is better suited to reinforcement learning than chatbots are [1]
Summary by Sections
Introduction
- The article emphasizes the importance of Ilya's viewpoints, highlighting his significant contributions to the AI field over the past decade [2][3]
- Ilya's background includes pivotal roles in major AI advances, such as the development of AlexNet, AlphaGo, and TensorFlow [3]
Q&A Insights
- Ilya challenges the notion that next token prediction cannot surpass human performance, suggesting that a sufficiently advanced neural network could extrapolate the behavior of an idealized person [4][5]
- He argues that predicting the next token well requires understanding the underlying reality that produced that token, which goes beyond mere statistics [6][7]
Li Auto's VLA and Reinforcement Learning
- Li Auto's VLA works by continuously predicting the next action token from sensor information, indicating a real understanding of the physical world rather than just statistical probabilities [10]
- Ilya posits that the reasoning process in Li Auto's VLA can be seen as a form of consciousness that differs from human consciousness in significant ways [11]
Comparisons and Controversial Points
- The article asserts that assisted driving is better suited to reinforcement learning than chatbots are because its reward functions are clearer [12][13]
- It highlights the fundamental differences in the skills required to develop AI software versus hardware, emphasizing the distinct challenges and innovations in AI software development [13]
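The claim that driving offers clearer reward functions than chat can be made concrete with a toy per-step reward. The three terms (collision, jerk, forward progress) and their weights are illustrative assumptions chosen for this sketch, not values taken from the article or from Li Auto.

```python
def driving_reward(collided: bool, jerk: float, progress_m: float,
                   w_safety: float = 100.0, w_comfort: float = 1.0,
                   w_progress: float = 0.1) -> float:
    """Toy per-step reward: reward forward progress, penalise jerk
    (a proxy for discomfort), and penalise a collision heavily."""
    reward = w_progress * progress_m - w_comfort * abs(jerk)
    if collided:
        reward -= w_safety
    return reward


smooth = driving_reward(collided=False, jerk=0.0, progress_m=10.0)
harsh = driving_reward(collided=False, jerk=2.0, progress_m=10.0)
crash = driving_reward(collided=True, jerk=2.0, progress_m=10.0)
```

Every term here is measurable from the vehicle's own state, which is the sense in which a driving reward is well defined. A chatbot reward for "a good answer" has no comparably direct measurement.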
Analysis of the Real Value of Li Auto's VLA and Predictions of Its Key Iteration Directions
理想TOP2· 2025-08-09 06:18
Core Viewpoint - The article emphasizes the innovative capabilities of Li Auto's VLA (Vision-Language-Action) model and its potential to significantly advance autonomous driving technology through integrated AI software and hardware, led by the company's founder, Li Xiang [2][3][4].
Group 1: Innovation and Technology
- Li Auto's VLA is presented as an innovation on the level of DeepSeek's MoE (Mixture of Experts), with original architecture and execution that build on contributions from across the AI community [2].
- The integration of AI software with hardware has reached an industry-leading level, with a clear distinction between the rapid iteration possible in software and the slower evolution of hardware [3].
- The core of Li Auto's VLA is reinforcement learning, which learns more effectively than traditional imitation learning and improves the vehicle's decision-making [9][10].
Group 2: Leadership and Vision
- Li Xiang plays a crucial role in developing Li Auto's autonomous driving technology, similar to Elon Musk's influence at Tesla, keeping the company adaptable to industry changes and to shifts in resource allocation [4][5].
- Li Xiang's ability to make key judgments about resource distribution and AI learning is vital to the company's long-term success and efficient use of resources [4].
Group 3: Future Directions and Predictions
- Key iteration directions for Li Auto's VLA include improving the speed, quality, and cost-effectiveness of simulation data, which is essential for reinforcement learning [8][12].
- The company aims to get the most out of existing vehicle hardware for autonomous driving while also exploring new chip technologies to increase compute [13].
- Future advances may involve online learning architectures that update weights in real time, significantly improving the model's adaptability and understanding of the physical world [13].
Li Auto's Assisted Driving Is Roughly 6-7 Times Safer Than Human Driving by Accident Rate
理想TOP2· 2025-08-04 13:12
Core Viewpoint - The article discusses the challenges and advancements in the development of Li Auto's VLA, focusing on the balance between efficiency, comfort, and safety in smart driving technology [1][2].
Group 1: Safety Metrics
- The MPA (miles per accident) metric currently stands at approximately 3 million miles, with Li Auto aiming to enhance this to 10 times safer than human driving, targeting 6 million miles per accident under assisted driving conditions [1].
- The current accident rate for Li Auto's drivers is 1 accident per 60,000 miles, while under assisted driving, it is 1 accident per 350,000 to 400,000 miles [1].
Group 2: Comfort and Efficiency
- The company emphasizes improving driving comfort alongside safety, noting significant enhancements in the comfort of the assisted driving experience compared to previous versions [2].
- Efficiency is considered after safety and comfort, with the company prioritizing safe and comfortable driving over immediate efficiency corrections, even if it means taking longer routes [2].
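The "6-7 times safer" figure in the title follows directly from the two per-accident distances quoted in Group 1. The sketch below works through that arithmetic using only those two figures, taken as given. The 3 million and 6 million mile figures are not used, because they do not share the same baseline. The variable and function names are illustrative.

```python
HUMAN_MILES_PER_ACCIDENT = 60_000              # human driving, as quoted above
ASSISTED_MILES_PER_ACCIDENT = (350_000, 400_000)  # assisted driving range, as quoted above


def safety_multiple(assisted_miles: float, human_miles: float) -> float:
    """How many times farther a vehicle travels per accident under
    assisted driving than under human driving."""
    return assisted_miles / human_miles


low, high = (safety_multiple(miles, HUMAN_MILES_PER_ACCIDENT)
             for miles in ASSISTED_MILES_PER_ACCIDENT)

# Distance per accident that a "10 times safer" goal would imply
# if it were measured against the same human baseline.
ten_x_target = 10 * HUMAN_MILES_PER_ACCIDENT
```

With these inputs the multiple runs from about 5.8 to about 6.7, which matches the title. Against the same 60,000-mile baseline, a tenfold goal would correspond to 600,000 miles per accident.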
No Need to Over-Praise Li Auto's ICCV Acceptances: What Is Impressive Is Li Auto's Work, Not ICCV
理想TOP2· 2025-06-29 15:06
Core Viewpoint - The article discusses the unique characteristics of the AI academic community compared to other disciplines, highlighting the rapid growth and the implications for the quality and significance of research papers submitted to top conferences [5][7][8].
Group 1: Characteristics of AI Academic Community
- AI conferences are more important than journals due to the fast-paced development of AI, which makes the lengthy journal review process inadequate [5].
- The number of submissions and acceptances to top AI conferences has significantly increased over the past decade, with acceptance rates declining, indicating a surge in competition [5][7].
- The rapid increase in submissions has led to a shortage of qualified reviewers, resulting in a decline in the quality of accepted papers [8].
Group 2: Implications for Research Quality
- The increase in accepted papers does not guarantee high-quality research, as many accepted papers may lack substantial contributions [8].
- The job market for AI researchers is becoming increasingly competitive, with the demand for high-quality publications rising faster than the availability of quality positions [8].
Group 3: Company-Specific Insights
- Li Auto's recent achievement of having multiple papers accepted at ICCV is used as a promotional tool to showcase its advancements in assisted driving technology [9].
- The original innovation level of Li Auto's VLA is compared to DeepSeek's MoE level, indicating that few Chinese companies can achieve such a high level of innovation [11][12].
- Li Auto's approach to autonomous driving has evolved from following Tesla to developing its unique systems, particularly in the integration of fast and slow systems in its VLM [12][13].
Auto Industry Weekly (20250616-20250622): Demand Expected to Recover in Late June; Xiaomi YU7 to Launch at Month-End-20250622
Huachuang Securities· 2025-06-22 08:34
Investment Rating - The report maintains a positive outlook on the automotive sector, suggesting stock selection to emphasize alpha over beta, with a focus on distinct individual stock characteristics [2].
Core Insights
- The automotive sector experienced a slight decline in investment sentiment, with expectations for a rebound in demand towards the end of June due to increased marketing efforts. The industry is anticipated to enter a seasonal lull in July and August, followed by a surge in new product launches and seasonal sales towards the end of the year [2].
- The report highlights the importance of monitoring the impact of policies such as trade-in programs and changes in new energy vehicle purchase taxes on the industry [2].
Data Tracking
- In April, wholesale passenger car sales reached 2.22 million units, a year-on-year increase of 11% but a month-on-month decrease of 10%. Retail sales for the same month were 1.59 million units, up 6% year-on-year but down 14% month-on-month [4].
- New energy vehicle deliveries from leading companies showed significant growth in May, with BYD delivering 380,000 units (up 15% year-on-year), and Li Auto and Xpeng also reporting substantial increases [4][19].
- The average discount rate in early June rose to 10.6%, reflecting a 0.4 percentage point increase from the previous period and a 2.9 percentage point increase year-on-year [4].
Market Performance
- The automotive sector index fell by 2.57% this week, ranking 23rd out of 29 sectors. The overall market indices also showed declines, with the Shanghai Composite Index down 0.51% [7][28].
- The report notes that the automotive sector's price-to-earnings (PE) ratio stands at 31, indicating a relatively high valuation compared to historical averages [28][34].
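The Data Tracking figures mix two kinds of change: percentage points (the discount rate) and relative percent (sales volumes). The sketch below backs out the implied earlier levels from the numbers quoted above. The helper names are illustrative, and the results are only as precise as the rounded inputs.

```python
def earlier_rate_from_pp(current_pct: float, change_pp: float) -> float:
    """Back out an earlier rate from a change stated in percentage points."""
    return round(current_pct - change_pp, 1)


def earlier_level_from_pct(current: float, change_pct: float) -> float:
    """Back out an earlier level from a relative change stated in percent."""
    return round(current / (1 + change_pct / 100), 2)


# Discount rate: 10.6% now, up 0.4 pp on the previous period and 2.9 pp on the year.
discount_previous = earlier_rate_from_pp(10.6, 0.4)
discount_year_ago = earlier_rate_from_pp(10.6, 2.9)

# Wholesale sales: 2.22 million units, up 11% year-on-year, down 10% month-on-month.
wholesale_year_ago = earlier_level_from_pct(2.22, 11)
wholesale_prev_month = earlier_level_from_pct(2.22, -10)
```

On these inputs the discount rate was about 10.2% in the previous period and about 7.7% a year earlier. Wholesale volume was about 2.0 million units a year earlier and about 2.47 million units the month before.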
Li Auto's VLA Can Be Compared to DeepSeek's MoE
理想TOP2· 2025-06-08 04:24
Core Viewpoint - The article discusses the advances and innovations in Li Auto's VLA (Vision-Language-Action) model and compares it with DeepSeek's MoE (Mixture of Experts), highlighting the distinct approaches and improvements in model architecture and training.
Group 1: VLA and MoE Comparison
- Both VLA and MoE were proposed earlier as concepts but are now being fully realized in new domains, with significant innovations and positive results [2]
- DeepSeek's MoE improves on traditional models by increasing the number of specialized experts and raising parameter utilization through Fine-Grained Expert Segmentation and Shared Expert Isolation [2]
Group 2: Key Technical Challenges for VLA
- VLA needs to address six critical technical points, including the design and training processes, 3D spatial understanding, and real-time inference [4]
- The VLA base model must be designed for sparsity so that parameter capacity can grow without significantly increasing inference load [6]
Group 3: Model Training and Efficiency
- Training incorporates a large amount of 3D data and driving-related information while reducing the proportion of historical data [7]
- The model is designed to learn human thought processes, using both fast and slow reasoning to balance parameter scale against real-time performance [8]
Group 4: Diffusion and Trajectory Generation
- Diffusion is used to decode action tokens into driving trajectories, improving the model's ability to predict complex traffic scenarios [9]
- An ODE sampler accelerates diffusion generation, producing stable trajectories in just 2-3 steps [11]
Group 5: Reinforcement Learning and Model Training
- The system aims to surpass human driving through reinforcement learning, addressing earlier limitations in training environments and information transfer [12]
- The model is now end-to-end trainable, and the ability to generate realistic 3D environments for training has improved [12]
Group 6: Positioning Against Competitors
- The company is no longer seen as merely following Tesla in autonomous driving, especially since the introduction of V12, which marks a shift in its approach [13]
- The VLM (Vision-Language Model) consists of a fast system and a slow system; the fast system is comparable to Tesla's capabilities, while the slow system is a distinct approach adopted because of resource constraints [14]
Group 7: Evolution of VLM to VLA
- The move from VLM to VLA is viewed as a natural evolution, indicating that the company is innovating from its own insights rather than imitating competitors [15]
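The shared-expert and fine-grained routing idea mentioned in Group 1 can be sketched in a few lines. This is a simplified, scalar illustration under stated assumptions: the router scores are passed in rather than learned, experts are plain functions, and the gate renormalisation over the selected experts is one common choice rather than a confirmed detail of DeepSeek's or Li Auto's models.

```python
import math
from typing import Callable, List, Sequence, Tuple

Expert = Callable[[float], float]


def softmax(scores: Sequence[float]) -> List[float]:
    """Numerically stable softmax over router scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def moe_forward(x: float, router_scores: Sequence[float],
                routed_experts: Sequence[Expert], shared_expert: Expert,
                top_k: int = 2) -> Tuple[float, List[int]]:
    """Sparse MoE step: the shared expert always runs, and only the top-k
    routed experts (by gate value) contribute, weighted by renormalised gates."""
    gates = softmax(router_scores)
    top = sorted(range(len(gates)), key=lambda i: gates[i], reverse=True)[:top_k]
    norm = sum(gates[i] for i in top)
    routed = sum(gates[i] / norm * routed_experts[i](x) for i in top)
    return shared_expert(x) + routed, top


# Four small routed experts plus one shared expert (all hypothetical).
experts: List[Expert] = [lambda v: 1 * v, lambda v: 2 * v,
                         lambda v: 3 * v, lambda v: 4 * v]
shared: Expert = lambda v: 10 * v

output, chosen = moe_forward(1.0, [0.0, 2.0, 0.0, 2.0], experts, shared, top_k=2)
```

Only two of the four routed experts run for this input, so total parameter capacity can grow with the number of experts while the per-token compute stays fixed by `top_k`. This is the sparsity property Group 2 describes for the VLA base model.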