VLM - filings, earnings calls, financial reports, news

VLM

自动驾驶之心· 2025-12-11 00:05

Core Insights
- The article discusses the challenges and advancements in Vision-Language Models (VLM) for autonomous driving, highlighting issues such as hallucination, 3D spatial understanding, and processing speed [3].

Group 1: Challenges in VLM
- Hallucination issues manifest as generating non-existent information and failing to perceive relevant data, which can be mitigated through dynamic perception techniques [3].
- Insufficient 3D spatial understanding is attributed to pre-training tasks being predominantly 2D, suggesting the incorporation of spatial localization tasks during training [3].
- Processing speed is a concern, with potential solutions including KV Cache, visual token compression, and mixed data training to enhance model efficiency [3].

Group 2: Learning Paradigms and Model Improvements
- The learning paradigm should shift from imitation learning (SFT) to preference learning (DPO, GRPO), with simultaneous multi-task training yielding better results than sequential single-task training [3].
- To prevent catastrophic forgetting in foundation models, adding pre-training data is a simple and effective method [3].
- Enhanced supervisory signals can lead to better model representations, achieved by adding auxiliary task heads to the VLM model [3].

Group 3: Interaction and Evaluation
- Current VLMs exhibit insufficient interaction between vision and language, limiting their effectiveness as base models; improving this interaction is crucial [3].
- The output method for trajectories is flexible, with various approaches yielding satisfactory results, though diffusion heads are preferred in industry for speed [3].
- Evaluation remains challenging due to inconsistencies between training and testing conditions, necessitating better alignment of objectives and data distributions [3].
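The summary names DPO and GRPO as preference-learning alternatives to SFT without showing what they optimize. The following is a minimal sketch of the two core formulas, not the article's implementation: the function names, the default `beta`, and the use of a population standard deviation in the GRPO normalization are all assumptions made for illustration.

```python
import math
from statistics import mean, pstdev


def dpo_loss(policy_chosen, policy_rejected, ref_chosen, ref_rejected, beta=0.1):
    """DPO loss for one preference pair.

    Inputs are sequence log-probabilities of the chosen and rejected
    responses under the trained policy and a frozen reference model.
    `beta` is a hypothetical default, not a value from the article.
    """
    # How much more the policy favors the chosen response than the
    # reference model does.
    margin = (policy_chosen - ref_chosen) - (policy_rejected - ref_rejected)
    x = beta * margin
    # -log(sigmoid(x)), written in a numerically stable form.
    if x >= 0:
        return math.log1p(math.exp(-x))
    return -x + math.log1p(math.exp(x))


def grpo_advantages(rewards, eps=1e-8):
    """Group-relative advantages: normalize rewards within one sampled group.

    Assumption: population standard deviation; implementations differ here.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


if __name__ == "__main__":
    # A policy that already prefers the chosen answer gets a lower loss.
    print(dpo_loss(-10.0, -14.0, -12.0, -12.0))
    # Responses scored above the group mean receive positive advantages.
    print(grpo_advantages([0.0, 1.0, 1.0, 0.0]))
```

Both functions compare responses against each other rather than against a single ground-truth target, which is the practical difference from SFT that the summary points to.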

Autonomous Driving

Vision-Language Model (VLM)

Chain of Thought (CoT)

Preference Learning

An in-depth discussion on autonomous driving VLA and world models! See you next Monday~

自动驾驶之心· 2025-11-11 00:00

Core Insights
- The article discusses advancements in autonomous driving technology, particularly focusing on the development of the Vision-Language-Action (VLA) framework and world models, highlighting the contributions of various experts in the field [1][2][3][4][5].

Group 1: Key Contributors
- Jian Kun, a senior director at Li Auto, has built the autonomous driving technology stack from scratch since 2021, achieving milestones such as Highway NoA in 2022 and City NoA in 2023 [1].
- Xu Lingyun, a PhD from the Chinese Academy of Sciences, leads the parking team at Changan Automobile, focusing on autonomous driving perception and end-to-end system research [2].
- Jiang Anqing, a senior algorithm scientist at Bosch, leads research on VLA and closed-loop algorithms [3].

Group 2: Technological Focus
- The discussion includes the potential integration of world models and VLA, questioning whether a unified approach is feasible [8].
- The high demand for data and computing power is making it increasingly difficult for academia to participate in intelligent driving, raising questions about future opportunities in the academic sector [8].

Group 3: Event Highlights
- The live discussion covers the future of autonomous driving technologies, including insights on Tesla's FSD v14 and its implications for domestic technology [4][5].
- The event includes a deep dive into the reliability of VLM in autonomous driving, with expert opinions on data closed-loop engineering [12].

理想TOP2· 2025-10-18 08:44

Core Insights
- The article discusses the differences between VLM (Vision-Language Model) and VLA (Vision-Language-Action) in the context of autonomous driving, particularly focusing on scenarios like blind spot deceleration [1][2].

Group 1: VLM and VLA Differences
- VLM operates by perceiving scenarios such as uncontrolled intersections and outputs a deceleration request to the E2E (End-to-End) model, which then reduces speed to 8-12 km/h, creating a sense of disconnection in the response [2].
- VLA, on the other hand, utilizes a self-developed base model to understand the scene directly, allowing for a more nuanced approach to blind spot deceleration, resulting in a smoother and more contextually appropriate response based on various road conditions [2].

Group 2: Action Mechanism
- The action generated by VLA is described as a more native deceleration action rather than a dual-system command, indicating a more integrated approach to scene understanding and response [3].
- There are concerns raised in the comments regarding VLM's reliability as an external module, questioning its ability to accurately interpret 3D space and the stability of its triggering mechanisms [3].

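[Automotive Intelligence: October Investment Strategy] First movers hold their lead, latecomers push to catch up; still bullish on the intelligence theme!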

东吴汽车黄细里团队· 2025-10-17 09:20

Core Viewpoint
- The market is expected to refocus on investment opportunities in smart technology in Q4, driven by the ongoing AI trend and advancements in autonomous driving capabilities, particularly in Robotaxi applications [2][8].

Group 1: Q4 Smart Technology Outlook
- The Q4 market will see a renewed emphasis on smart technology investment opportunities, as AI applications in the physical world are anticipated to exceed expectations in the next 3-5 years [2][8].
- Key catalysts for smart technology in Q4 include the release of Tesla's V14 version, XPeng's upcoming technology day, and the introduction of new autonomous vehicles by various companies [2][8].

Group 2: Comparison with Last Year
- Similarities with last year's Q4 include the expansion of AI applications, but this year emphasizes the evolution of AI logic rather than the resonance between automotive and AI logic [3][9].
- The focus has shifted from hardware opportunities and consumer sales to software opportunities and breakthroughs in B2B applications [3][9].

Group 3: Investment Strategy
- The preferred investment strategy favors Hong Kong stocks over A-shares, prioritizing software over hardware and B2B over B2C applications, with recommended stocks including XPeng Motors, Horizon Robotics, and Cao Cao Mobility [4][9].
- Key investment targets include integrated models for Robotaxi, technology providers, and the transformation of ride-hailing services [4][9].

Group 4: Smart Technology Market Dynamics
- The price war among passenger car manufacturers is more intense than expected, which could significantly impact profitability across the supply chain [5].
- The recovery of terminal demand is below expectations, which may affect sales growth for car manufacturers [5].

Group 5: Smart Technology Development Review
- In August, the penetration rate of smart technology reached 23.3%, with significant advancements in autonomous driving capabilities among leading players [10].
- By October, the focus will be on the iterative development of next-generation driving architectures and the sales performance of key smart vehicles [10].

Group 6: Consumer Willingness to Pay
- The consumer willingness to pay for smart technology is expected to evolve in two phases, with the first phase focusing on helping car manufacturers sell vehicles and the second phase aiming for software monetization [20][18].

Group 7: Future Projections
- By 2025-2027, the core task of automotive smart technology will be to achieve a penetration rate of 50%-80% for new energy vehicles, while the period from 2028-2030 is expected to see the large-scale commercialization of Robotaxi services [20][19].

Group 8: Smart Technology Supply Chain Tracking
- The supply chain for smart technology is being closely monitored, with various companies contributing to different aspects of the technology, including perception, decision-making, and execution [14][13].

Group 9: Key Metrics and Trends
- The penetration rates for smart driving capabilities among different brands show significant variation, with XPeng at 76.1% and Wey at 95.6% [25][26].
- The overall market dynamics indicate a competitive landscape with rapid advancements in technology and varying consumer adoption rates [24][23].

Waymo's latest explorations in autonomous driving: world models, the long-tail problem, and what matters most

自动驾驶之心· 2025-10-10 23:32

Core Insights
- Waymo has developed a large-scale AI model called the Waymo Foundation Model, which supports vehicle perception, behavior prediction, scene simulation, and driving decision-making [5][11]
- The model integrates data from multiple sensors to understand the environment, similar to how large language models operate [5][11]
- The focus on data quality and selection is crucial for ensuring that the model addresses the right problems effectively [25][30]

Group 1: World Model Development
- Waymo's world model encodes all sensor data and incorporates world knowledge, enabling it to decode driving-related tasks [11]
- The model allows for real-time perception and decision-making on the vehicle while simulating real driving environments in the cloud for testing [7][11]
- The long-tail problem in autonomous driving, which includes complex scenarios like adverse weather and construction, remains a significant challenge [11][12]

Group 2: Addressing Long-Tail Problems
- Weather conditions such as rain and snow present unique challenges for autonomous driving, requiring high precision in judgment [12][14]
- Low visibility scenarios necessitate the use of multi-modal sensors to detect objects effectively [15]
- Occlusion reasoning is critical for understanding hidden objects and ensuring driving safety [18][21]

Group 3: Complex Scene Understanding
- Understanding complex scenes like construction zones and dynamic environments requires advanced reasoning capabilities [24]
- Real-time responses to dynamic signals, such as traffic officer gestures, are essential for safe navigation [24]
- The use of large language models is being explored to enhance scene understanding and decision-making [24]

Group 4: Importance of Data, Algorithms, and Computing Power
- The three critical components for successful autonomous driving are data, algorithms, and computing power, with a strong emphasis on data quality [25][30]
- Efficient data mining from vast video datasets is vital for understanding driving events [30]
- Quick decision-making is essential for safety and smooth operation, with a focus on reducing response times across the algorithmic chain [30][31]

Group 5: Operational Infrastructure
- Waymo's operational facilities, including depots and modification workshops, are crucial for the efficient deployment of Level 4 autonomous vehicles [33]
- Vehicles can autonomously navigate to charging stations and begin operations after sensor installation [33]
- The engineering challenges of scaling autonomous driving technology require collaboration with traditional automotive engineers [34]

Group 6: Sensor and Algorithm Response
- The responsiveness of sensors, such as camera frame rates, is critical for effective autonomous driving [36]
- Algorithms must process data at high frequencies to ensure timely execution of driving commands [36]
- The evolution of vehicle control systems is moving towards higher frequency responses, particularly in electric and electronically controlled systems [36]
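The points on camera frame rates and response times across the algorithmic chain come down to simple arithmetic: every millisecond of latency is distance covered before the vehicle reacts. The sketch below illustrates that relationship only. The stage names and all numeric values are hypothetical and do not come from Waymo or the article.

```python
def frame_interval_ms(frame_rate_hz):
    """Worst-case wait for the next sensor frame, in milliseconds."""
    return 1000.0 / frame_rate_hz


def distance_travelled_m(speed_kmh, latency_ms):
    """Distance covered at constant speed while the system is still reacting."""
    speed_m_per_s = speed_kmh / 3.6
    return speed_m_per_s * (latency_ms / 1000.0)


# Hypothetical latency budget for one pass through the chain.
stages_ms = {
    "camera_frame_wait": frame_interval_ms(10.0),  # assumed 10 Hz camera
    "perception": 50.0,
    "planning": 30.0,
    "actuation": 20.0,
}

total_ms = sum(stages_ms.values())
reaction_distance_m = distance_travelled_m(60.0, total_ms)  # assumed 60 km/h

if __name__ == "__main__":
    print(f"end-to-end latency: {total_ms:.0f} ms")
    print(f"distance before reacting: {reaction_distance_m:.2f} m")
```

Under these assumed numbers the frame wait alone is half of the 200 ms budget, and the vehicle covers roughly 3.3 m before it can respond, which is why a higher sensor frame rate matters as much as faster algorithms.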

Li Xiang is currently far more interested in AI than in polishing the product details of automotive hardware

理想TOP2· 2025-09-01 07:50

Core Viewpoints
- Li Xiang's personal interest in AI currently outweighs the focus on the incremental details of automotive hardware products [1][4]
- Discussing the short-term market, Li Xiang's preference for AI over hardware may pose a potential risk to short-term sales, as many consumers prefer hardware-defined products [1]
- The foundational anchor for both short-term and long-term commercial value is the product's utility, supported by varying levels of emotional value; in the AI era, models are products [1]
- Within a three-month timeframe, AI-related product utility is unlikely to reach early mainstream adoption, remaining in the early adopter phase, with low emotional value among the general public [1]

Detailed Analysis
- The head of the first product line, Lao Tang, actively shares the product development process online, while the heads of the second and third product lines, Zhang Xiao and Li Xinyang, are less inclined to do so [3]
- The MEGA Home was developed based on user feedback regarding accessibility for the elderly, with differing opinions between Li Xiang and Lao Tang on design solutions [3]
- Li Xiang has been the primary decision-maker for many product details in the Li ONE, while there is speculation that the i8 may shift to a configuration with fewer options, likely influenced by Li Xiang [3]
- There is no evidence from public information that Li Xiang has strongly insisted on hardware dimension enhancements for the new product lines [3]
- Li Xiang's strong insistence on running VLA on dual Orin chips led to significant technical challenges being overcome, showcasing his first-principles thinking [5]
- All vehicles equipped with the Thor chip are expected to be able to switch to Li Auto's own autonomous driving chip in the future, although it is uncertain if the Orin chip will also be replaceable [5]

He Xiaopeng responds: Is a 50x market-cap gap with Tesla reasonable? Was urging Lei Jun to build cars "harming" him?

36Kr· 2025-08-28 09:43

Core Insights
- He Xiaopeng emphasized that urging Lei Jun to enter the automotive industry was not intended to harm him, highlighting the challenges of the sector [1]
- The company claims to be the only one in China to have truly developed a Vision-Language-Action (VLA) system, showcasing confidence in its technological advancements [1][18]
- He Xiaopeng expressed optimism about the company's valuation changing significantly within six months as Robotaxi services are expected to launch [1][26]

Product and Market Strategy
- The new P7 model sold 10,000 units in just 7 minutes, indicating strong market demand [3]
- The P7 is considered a crucial product for the company, with a focus on simplicity and cutting-edge technology [3][4]
- The company aims to achieve top-three production capacity in its class for the P7, leveraging modular manufacturing improvements [4]

Financial Performance and Cost Structure
- He Xiaopeng discussed the challenges of profitability in the automotive sector, noting that traditional cost structures do not apply to electric vehicles, where battery costs account for 40%-50% of total expenses [7]
- The company anticipates recovering previous losses within one to two years, thanks to its unique integration of software and hardware in smart vehicles [10]
- The company plans to invest approximately 5 billion yuan annually in VLA development, emphasizing the need for substantial investment to achieve meaningful advancements [16]

Technological Development
- The company is transitioning from single-unit intelligence to group intelligence in its AI systems, with a long-term vision extending to 2027-2028 [10]
- He Xiaopeng highlighted the importance of integrating VLA and VLM (Vision-Language Model), suggesting that the latter will enhance task execution capabilities [20][22]
- The company is utilizing both NVIDIA and domestic GPUs, with over 30,000 units deployed, to enhance AI capabilities [24]

Competitive Landscape
- He Xiaopeng acknowledged the significant valuation gap between the company and Tesla, attributing it to broader market trends and the company's potential yet to be fully realized [26]
- The company believes that the automotive industry will see a differentiation in capabilities among competitors in the near future, driven by substantial investments in AI [22]

Industry Insights
- He Xiaopeng shared insights on the challenges of the automotive industry, likening it to a marathon where resilience is crucial for success [30][32]
- The company recognizes the complexities of merging hardware and software effectively, which poses a greater challenge compared to purely digital companies [27]

理想TOP2· 2025-07-21 14:36

Core Viewpoints
- The current development of cutting-edge technologies in autonomous driving is not yet fully mature for mass production, with significant challenges remaining to be addressed [1][27][31]
- Emerging technologies such as VLA/VLM, diffusion models, closed-loop simulation, and reinforcement learning are seen as potential key directions for future exploration in autonomous driving [6][7][28]
- The choice between deepening expertise in autonomous driving or transitioning to embodied intelligence depends on individual circumstances and market dynamics [19][34]

Group 1: Current Technology Maturity
- The BEV (Bird's Eye View) perception model has reached a level of maturity suitable for mass production, while other models like E2E (End-to-End) are still in the experimental phase [16][31]
- There is a consensus that the existing models struggle with corner cases, particularly in complex driving scenarios, indicating that while basic functionalities are in place, advanced capabilities are still lacking [16][24][31]
- The industry is witnessing a shift towards utilizing larger models and advanced techniques to enhance scene understanding and decision-making processes in autonomous vehicles [26][28]

Group 2: Emerging Technologies
- VLA/VLM is viewed as a promising direction for the next generation of autonomous driving, with the potential to improve reasoning capabilities and safety [2][28]
- The application of reinforcement learning is recognized as having significant potential, particularly when combined with effective simulation environments [6][32]
- Diffusion models are being explored for their ability to generate multi-modal trajectories, which could be beneficial in uncertain driving conditions [7][26]

Group 3: Future Directions
- Future advancements in autonomous driving technology are expected to focus on enhancing safety, improving passenger experience, and achieving comprehensive scene coverage [20][28]
- The integration of closed-loop simulations and data-driven approaches is essential for refining autonomous driving systems and ensuring their reliability [20][30]
- The industry is moving towards a data-driven model where the efficiency of data collection, cleaning, labeling, training, and validation will determine competitive advantage [20][22]

Group 4: Career Choices
- The decision to specialize in autonomous driving or shift to embodied intelligence should consider personal interests, market trends, and the maturity of each field [19][34]
- The autonomous driving sector is perceived as having more immediate opportunities for impactful work compared to the still-developing field of embodied intelligence [19][34]

A senior labmate published an autonomous driving large-model paper on his own and got into a TOP2 PhD program...

自动驾驶之心· 2025-07-09 12:56

Core Viewpoint
- The article discusses the advancements in large models (LLMs) for autonomous driving, highlighting the need for optimization in efficiency, knowledge expansion, and reasoning capabilities as the technology matures [2][3].

Group 1: Development of Large Models
- Companies like Li Auto and Huawei are implementing their own VLA and VLM solutions, indicating a trend towards the practical application of large models in autonomous driving [2].
- The focus for the next generation of large models includes lightweight design, hardware adaptation, knowledge distillation, quantization acceleration, and efficient fine-tuning [2][3].

Group 2: Course Introduction
- A course is being offered to explore cutting-edge optimization methods for large models, focusing on parameter-efficient computation, dynamic knowledge expansion, and complex reasoning [3].
- The course aims to address core challenges in model optimization, including pruning, quantization, retrieval-augmented generation (RAG), and advanced reasoning paradigms like Chain-of-Thought (CoT) and reinforcement learning [3][4].

Group 3: Enrollment and Requirements
- The course will accept a maximum of 8 students per session, targeting individuals with a background in deep learning or machine learning who are familiar with Python and PyTorch [5][10].
- Participants will gain a systematic understanding of large model optimization, practical coding skills, and insights into academic writing and publication processes [8][10].

Group 4: Course Outcomes
- Students will learn to combine theoretical knowledge with practical coding, develop their own research ideas, and produce a draft of a research paper [8][9].
- The course includes a structured timeline with specific topics each week, covering model pruning, quantization, efficient fine-tuning, and advanced reasoning techniques [20].
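Quantization is listed among the optimization topics without any indication of what it does to a weight tensor. As a rough illustration, here is a minimal sketch of symmetric per-tensor int8 quantization in plain Python. The function names and example weights are hypothetical, and the course may teach a different scheme (per-channel, asymmetric, or quantization-aware training).

```python
def quantize_int8(weights):
    """Map floats to integers in [-127, 127] using one shared scale."""
    max_abs = max(abs(w) for w in weights)
    # Guard against an all-zero tensor, where any scale works.
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    quantized = [max(-127, min(127, round(w / scale))) for w in weights]
    return quantized, scale


def dequantize(quantized, scale):
    """Recover approximate float weights from the stored integers."""
    return [q * scale for q in quantized]


if __name__ == "__main__":
    weights = [0.42, -1.30, 0.07, 0.0, 0.95]  # hypothetical layer weights
    q, scale = quantize_int8(weights)
    restored = dequantize(q, scale)
    # Each weight is recovered to within about half a quantization step.
    for original, approx in zip(weights, restored):
        print(f"{original:+.4f} -> {approx:+.4f}")
```

The storage saving comes from keeping one 8-bit integer per weight plus a single float scale; the cost is the rounding error visible in the printed pairs.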

What are the deployment and research directions for large models in the later stages of autonomous driving?

自动驾驶之心· 2025-07-07 23:31

Core Insights
- The article discusses the evolving landscape of large models in autonomous driving, highlighting the focus on lightweight solutions, hardware compatibility, knowledge distillation, and efficient fine-tuning of large models [1]
- It emphasizes the importance of advanced reasoning paradigms such as Chain-of-Thought (CoT) and VLA combined with reinforcement learning in enhancing spatial perception capabilities [1]

Group 1: Course Overview
- The course aims to explore cutting-edge optimization methods for large models, focusing on parameter-efficient computation, dynamic knowledge expansion, and complex reasoning [2]
- Key challenges in model optimization include parameter compression through pruning and quantization, dynamic knowledge injection techniques, and advanced reasoning paradigms [2][3]

Group 2: Enrollment and Requirements
- The course is limited to 6-8 participants per session, targeting individuals with a foundational understanding of deep learning and machine learning [4][8]
- Participants are expected to have basic programming skills in Python and familiarity with PyTorch, along with a genuine interest in research [8]

Group 3: Course Outcomes
- The course aims to provide a systematic understanding of large model optimization, helping participants develop their own research ideas and enhance their coding skills [6][7]
- Participants will receive guidance on writing and submitting academic papers, including methodologies for drafting and revising manuscripts [6][7]

Group 4: Course Structure
- The course spans 12 weeks of online group research followed by 2 weeks of paper guidance, covering topics such as model pruning, quantization, and dynamic knowledge expansion [7][18]
- Each week focuses on specific themes, including advanced reasoning techniques and collaborative multi-agent systems [18][20]

Group 5: Additional Information
- The course will utilize publicly available datasets and baseline codes tailored to specific applications, ensuring practical relevance [15][16]
- Participants will engage in discussions and hands-on experiments using mainstream large models like LLaMA and GPT [2][18]
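Parameter compression through pruning is mentioned above only by name. One common baseline is unstructured magnitude pruning, which zeroes the weights with the smallest absolute values. The sketch below is an illustrative assumption about what such a step looks like, not the course's baseline code; `magnitude_prune` and the sample weights are hypothetical.

```python
def magnitude_prune(weights, sparsity):
    """Zero out the smallest-magnitude fraction of the weights.

    `sparsity` is the fraction to remove, between 0.0 and 1.0.
    """
    if not 0.0 <= sparsity <= 1.0:
        raise ValueError("sparsity must be between 0.0 and 1.0")
    num_to_drop = int(len(weights) * sparsity)
    if num_to_drop == 0:
        return list(weights)
    # Indices ordered from smallest to largest absolute value.
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    dropped = set(order[:num_to_drop])
    return [0.0 if i in dropped else w for i, w in enumerate(weights)]


if __name__ == "__main__":
    weights = [0.9, -0.05, 0.3, 0.01]  # hypothetical layer weights
    print(magnitude_prune(weights, 0.5))
```

In practice a pruned model is usually fine-tuned afterwards to recover accuracy, and the zeros only save compute when the runtime or hardware can exploit sparsity.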