VLA
Li Auto (LI): Tracking Report: 3Q25 Results Under Pressure; Awaiting the Next Leap After the Management-Model Transition
EBSCN· 2025-11-28 12:47
Investment Rating
- The report maintains a "Buy"-side rating for the company, specifically "Increase", which indicates a projected investment return exceeding the market benchmark by 5% to 15% over the next 6-12 months [4].
Core Views
- The company's performance in Q3 2025 was under pressure: total revenue fell 36.2% year-on-year and 9.5% quarter-on-quarter to 27.36 billion yuan, and gross margin fell 5.2 percentage points year-on-year to 16.3%. The Non-GAAP net loss attributable to shareholders was 360 million yuan, the first quarterly Non-GAAP loss since 2023 [1][2].
- Automotive business revenue fell 37.4% year-on-year, with sales volume down 39.0% year-on-year to 93,000 units. The average selling price (ASP) rose 2.6% year-on-year to 278,000 yuan. The gross margin of the automotive business was 15.5% [2].
- Management indicated that the i6 battery supply will move to a dual-supplier model starting in November, with production capacity expected to reach 20,000 units by early 2026. The company is also working to improve product capabilities and operational efficiency through internal adjustments [3].
Summary by Sections
Q3 2025 Performance
- Total revenue for Q3 2025 was 27.36 billion yuan, down 36.2% year-on-year and 9.5% quarter-on-quarter. Gross margin decreased to 16.3%, with a Non-GAAP net loss of 360 million yuan [1].
Automotive Business
- Revenue from the automotive segment was 25.87 billion yuan, a decline of 37.4% year-on-year. Sales volume dropped to 93,000 units, with an ASP of 278,000 yuan. The gross margin for this segment was 15.5% [2].
Future Outlook
- The company expects continued pressure on fundamentals in Q4 2025 and Q1 2026 due to policy fluctuations and intensified competition. However, management's return to a startup-style management model and progress in self-developed technologies are expected to improve product capabilities and operational efficiency [3][4].
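The automotive figures quoted above can be cross-checked with simple arithmetic. Revenue is roughly volume times ASP, so the year-on-year revenue change should be close to the product of the volume and ASP growth factors. The sketch below uses only the rounded numbers from the summary; the variable names are illustrative, and it assumes automotive revenue is dominated by vehicle sales.

```python
# Rounded figures quoted in the summary (yuan and units).
auto_revenue = 25.87e9   # automotive revenue, Q3 2025
units_sold = 93_000      # sales volume, Q3 2025
volume_yoy = -0.390      # sales volume change, year-on-year
asp_yoy = 0.026          # ASP change, year-on-year

# ASP implied by dividing revenue by volume.
implied_asp = auto_revenue / units_sold

# Revenue = volume x ASP, so the growth factors multiply.
implied_revenue_yoy = (1 + volume_yoy) * (1 + asp_yoy) - 1

print(f"implied ASP: {implied_asp:,.0f} yuan")                              # 278,172 yuan
print(f"implied automotive revenue change: {implied_revenue_yoy:.1%}")      # -37.4%
```

Both results agree with the reported ASP of about 278,000 yuan and the reported 37.4% revenue decline, so the three headline figures are mutually consistent.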
On End-to-End and VLA Roles: Some Recent Shifts
自动驾驶之心· 2025-11-28 00:49
Core Insights
- The article discusses the difficulty of recruiting talent in the autonomous driving sector, highlighting a shortage of experienced professionals for advanced roles [2].
- It emphasizes the importance of education and training in cutting-edge technologies related to end-to-end and VLA (Vision-Language-Action) autonomous driving [2].
Course Offerings
- A course titled "End-to-End and VLA Autonomous Driving" is being offered, covering the latest technologies in the field, including BEV perception, VLM, diffusion models, and reinforcement learning [2][12].
- The course is designed for people with foundational knowledge of autonomous driving and related technologies, and it includes practical assignments on building VLA models and datasets [12][16].
Instructor Profiles
- The course is taught by a team of instructors with strong academic backgrounds and practical experience in autonomous driving and large models, including researchers from top universities [8][11][14].
- The instructors have published numerous papers at prestigious conferences and have industry experience developing and deploying advanced algorithms [8][11][14].
Target Audience
- The course is aimed at people who have a basic understanding of autonomous driving modules and are familiar with concepts such as transformer models, reinforcement learning, and BEV perception [16].
- Participants need access to a GPU; a 4090 or higher is recommended [15][16].
The 具身智能之心 Technical Exchange Group Has Been Launched!
具身智能之心· 2025-11-26 10:00
Group 1
- A technical exchange group focused on embodied intelligence has been established, covering VLA, VLN, teleoperation, Diffusion Policy, reinforcement learning, VLA+RL, sim2real, multimodal large models, simulation, motion control, target navigation, mapping and localization, and navigation [1].
- Interested individuals can add the assistant's WeChat, AIDriver005, to join the community [2].
- To speed up the joining process, include a note with your institution/school, name, and research direction [3].
Recruiting Partners for the VLA+RL Direction
具身智能之心· 2025-11-24 10:02
Group 1
- The article announces the recruitment of instructors for courses and projects related to VLA (Vision-Language-Action) and RL (Reinforcement Learning) within the community [1].
- The community seeks candidates whose research focuses on VLA and RL, preferably holding a PhD or currently enrolled in a doctoral program, and with publications at top academic conferences [2].
- For industry candidates, practical experience and hands-on debugging experience with real machines are desired [2].
Group 2
- The organization, "Embodied Intelligence," is described as the first comprehensive technical exchange community in China focusing on VLA and RL, and it has gathered a large number of students in these fields [3].
- It offers recruited instructors compensation above the industry average along with abundant industry resources [4].
- For further details, interested individuals are encouraged to add a specified WeChat contact [5].
Cognition-Driven Xiaomi Intelligent Driving: From End-to-End and World Models to VLA...
自动驾驶之心· 2025-11-24 00:03
Core Viewpoint
- Xiaomi is investing heavily in intelligent driving technology, focusing on safety, comfort, and efficiency, with safety as the top priority in its development strategy [4][7].
Development Progress
- Xiaomi's intelligent driving has progressed through several versions: from highway NOA based on high-precision maps (version 24.3) to urban NOA (version 24.5), and then towards map-light and mapless versions (version 24.10) [7].
- The company describes three stages of intelligent driving: 1.0 (rule-driven), 2.0 (data-driven), and 3.0 (cognition-driven), with VLA (Vision-Language-Action) as the focus of the next production phase [7][10].
World Model Features
- The world model introduced by Xiaomi has three essential characteristics: diversity in generated scenarios, multimodal input and output, and interactive capabilities that influence vehicle behavior [8][9].
- The world model is meant to improve model performance through cloud-based data generation, closed-loop simulation, and reinforcement learning, rather than by directly outputting actions on the vehicle [10].
VLA and Learning Models
- VLA is described as an enhancement over end-to-end learning that integrates high-level human knowledge (traffic rules, values) into the driving model [13].
- Xiaomi's development roadmap includes several model training stages, from LLM pre-training to embodied pre-training, with recent advances in the MiMo and MiMo-VL models [13].
Community and Knowledge Sharing
- The "Automated Driving Heart Knowledge Sphere" community aims to provide a comprehensive platform for learning and sharing knowledge about autonomous driving, with over 4,000 members and plans to expand [15][26].
- The community offers resources such as technical roadmaps, video tutorials, and Q&A sessions for both beginners and advanced learners in the autonomous driving sector [27][30].
Calling Everyone Working on VLA+RL
具身智能之心· 2025-11-21 00:04
Group 1
- The article announces the recruitment of instructors for courses and projects related to VLA (Vision-Language-Action) and RL (Reinforcement Learning) within the community [1].
- The community seeks candidates whose research focuses on VLA and RL, preferably holding a PhD or currently enrolled in a doctoral program, and with publications at top academic conferences [2].
- For industry candidates, practical experience and hands-on debugging experience with real machines are desired [2].
Group 2
- The organization, referred to as "Embodied Intelligence," is described as the first comprehensive technical exchange community in China, gathering a large number of people focused on VLA and RL [3].
- It offers recruited instructors compensation above the industry average along with abundant industry resources [4].
- For more detailed information, interested individuals are encouraged to add a specified WeChat contact [5].
Three Major Technical Routes for Autonomous Driving: End-to-End, VLA, and World Models
自动驾驶之心· 2025-11-21 00:04
Overview
- The article discusses the ongoing technological competition in the autonomous driving industry, focusing on different approaches to solving corner cases and improving the safety and efficiency of driving systems [1][3].
Technological Approaches
- The debate centers on two main technical routes: VLA (Vision-Language-Action) and VLM (Vision-Language Model) [1].
- Major companies like Waymo use VLM, in which AI handles environmental understanding and reasoning while traditional modules retain decision-making control for safety [1].
- Companies such as Tesla, Geely, and XPeng are exploring VLA, aiming for AI to learn all driving skills through extensive data training and make decisions end to end [1].
Sensor and Algorithm Developments
- The article highlights the evolution of perception technologies, with BEV (Bird's Eye View) perception becoming mainstream by 2022 and OCC (Occupancy) perception gaining traction in 2023 [3][5].
- BEV integrates data from multiple sensors into a unified spatial representation, which helps path planning and dynamic information fusion [8][14].
- OCC perception provides detailed occupancy data, estimating the probability that a region of space is occupied over time, which improves dynamic interaction modeling [6][14].
Modular and End-to-End Systems
- Before multimodal large models and end-to-end autonomous driving, perception and prediction tasks were typically handled by separate modules [5].
- The article outlines a phased, modular design in which perception, prediction, decision-making, and control are distinct yet interconnected [4][31].
- End-to-end systems aim to streamline this process by mapping raw sensor inputs directly to actionable outputs, improving efficiency and reducing bottlenecks [20][25].
VLA and VLM Frameworks
- The VLA and VLM frameworks are discussed, with VLA focusing on understanding complex scenes and making autonomous decisions based on visual and language inputs [32][39].
- The article emphasizes the role of language models in improving the interpretability and safety of autonomous driving systems, enabling better cross-scenario knowledge transfer and decision-making [57].
Future Directions
- The competition between VLA and WA (World Action) architectures is highlighted, with WA emphasizing direct visual-to-action mapping without language mediation [55][56].
- The article suggests that the future of autonomous driving will involve integrating world models that understand physical laws and temporal dynamics, addressing the limitations of current language models [34][54].
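To make the idea of "probability of space being occupied over time" concrete, here is a minimal sketch of the classical log-odds occupancy-grid update. This is a swapped-in textbook technique used for illustration, not the learned occupancy networks the article refers to; the sensor-model values `P_HIT` and `P_MISS`, the grid size, and the observations are all hypothetical.

```python
import math

def logit(p: float) -> float:
    """Convert a probability to log-odds."""
    return math.log(p / (1.0 - p))

def sigmoid(log_odds: float) -> float:
    """Convert log-odds back to a probability."""
    return 1.0 / (1.0 + math.exp(-log_odds))

# Hypothetical inverse sensor model: how strongly one observation
# shifts belief towards "occupied" or "free".
P_HIT = 0.7
P_MISS = 0.4

def update_cell(log_odds: float, observed_occupied: bool) -> float:
    """Bayesian update of a single cell, done additively in log-odds space."""
    return log_odds + logit(P_HIT if observed_occupied else P_MISS)

# A 1x5 strip of cells with prior probability 0.5 (log-odds 0).
grid = [0.0] * 5

# Three consecutive frames of (made-up) per-cell observations.
observations = [
    [False, False, True, False, False],
    [False, False, True, True, False],
    [False, False, True, False, False],
]

for frame in observations:
    grid = [update_cell(l, occ) for l, occ in zip(grid, frame)]

probs = [sigmoid(l) for l in grid]
print([round(p, 3) for p in probs])
```

The cell seen as occupied in every frame ends with a high occupancy probability, the cell seen occupied once stays near 0.5, and cells always seen free drop well below the prior. Accumulating evidence per cell over time is the part of this sketch that carries over to occupancy-style perception.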
Comparing XPeng's and Li Auto's VLA Based on Accurate Source Material
理想TOP2· 2025-11-20 10:42
Core Viewpoint
- The article discusses advances in autonomous driving technology, focusing on the VLA (Vision-Language-Action) architecture developed by Li Auto and on the views shared by XPeng's head of autonomous driving, Liu Xianming, during a podcast. Liu emphasizes removing the intermediate language component (L) to improve scalability and data efficiency [1][4][5].
Summary by Sections
VLA Architecture and Training Process
- The VLA architecture starts with a pre-training phase using a 32 billion parameter (32B) vision-language model that incorporates 3D vision and high-definition 2D vision, improving clarity by 3-5 times compared to open-source models. It also includes driving-related language data and key VL joint data [10][11].
- The model is distilled into a 3.2 billion parameter (3.2B) MoE model to ensure fast inference on vehicle hardware. A post-training phase then integrates action to form the VLA, increasing the parameter count to nearly 4 billion [13][12].
- The reinforcement learning phase has two parts: reinforcement learning from human feedback (RLHF) and pure reinforcement learning on data generated by the world model, focusing on comfort, collision avoidance, and adherence to traffic regulations [15][16].
Data Utilization and Efficiency
- Liu argues that using language as a supervisory signal can introduce human biases, reducing data efficiency and scalability. The hardest data to collect are corner cases, which are crucial for training [4][6].
- The architecture aims for a high level of generalization, with plans to launch L4 robotaxi services in Guangzhou based on the current framework [4][5].
Future Directions and Challenges
- Liu acknowledges uncertainties in scaling the technology, asking how to maintain safety standards and align the model with human behavior [5][18].
- The conversation highlights that VLA, VLM, and world models are all fundamentally end-to-end architectures, with various companies working on similar concepts in Physical AI [5][18].
Human-Agent Interaction
- The driver agent is designed to process short commands directly, while complex instructions are sent to the cloud for processing before execution. This lets the system understand and interact with the physical world like a human driver [17][18].
- The article concludes that traffic is a suitable domain for VLA because it has well-defined rules and human driving behavior can be modeled effectively [19][20].
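The reinforcement learning phase described above optimizes for comfort, collision avoidance, and adherence to traffic regulations. One common way to express such objectives is a weighted penalty per time step, summed into a discounted return. The sketch below is purely illustrative: the `StepInfo` fields, the weights, and the discount factor are hypothetical and are not taken from Li Auto's or XPeng's systems.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class StepInfo:
    """Hypothetical per-step signals from a simulator or world model."""
    jerk: float            # rate of change of acceleration, a comfort proxy
    collided: bool         # whether a collision occurred in this step
    rule_violations: int   # number of traffic-rule violations in this step

# Hypothetical weights: collisions dominate, then rule violations, then comfort.
W_COMFORT = 0.1
W_COLLISION = 100.0
W_RULE = 10.0

def step_reward(info: StepInfo) -> float:
    """Negative weighted sum of the three penalties; 0 is the best possible value."""
    comfort_penalty = W_COMFORT * info.jerk ** 2
    collision_penalty = W_COLLISION if info.collided else 0.0
    rule_penalty = W_RULE * info.rule_violations
    return -(comfort_penalty + collision_penalty + rule_penalty)

def episode_return(steps: List[StepInfo], gamma: float = 0.99) -> float:
    """Discounted return, accumulated backwards from the last step."""
    total = 0.0
    for info in reversed(steps):
        total = step_reward(info) + gamma * total
    return total

smooth = [StepInfo(jerk=0.5, collided=False, rule_violations=0)] * 3
harsh = [StepInfo(jerk=3.0, collided=False, rule_violations=1)] * 3
print(episode_return(smooth) > episode_return(harsh))
```

In a sketch like this the relative weights encode priorities, so a policy is never rewarded for trading a collision for a smoother ride. How the actual systems balance these terms is not stated in the source.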
The Level-Up Path from Complete Beginner to Embodied-AI Algorithm Engineer
具身智能之心· 2025-11-20 04:02
Core Insights
- The article discusses the evolution and research directions of Vision-Language-Action (VLA), Vision-and-Language Navigation (VLN), and reinforcement learning in robotics, highlighting how these technologies improve robot capabilities and performance [1][2][5][9].
VLA Direction
- VLA systems consist of visual perception processing, language instruction understanding, and action policy networks, and fall into three paradigms: explicit end-to-end VLA, implicit end-to-end VLA, and hierarchical end-to-end VLA [1][2].
- Explicit end-to-end VLA compresses visual and language information into a joint representation, which is then mapped to the action space, drawing on a range of architectures and models to achieve good performance [1].
- Implicit end-to-end VLA focuses on interpretability by predicting future states with video diffusion models, which also improves the prospects for scaling VLA models [2].
- Hierarchical end-to-end VLA aims to use the strengths of large models to improve generalization while keeping downstream execution efficient [2].
VLN Direction
- VLN systems are composed of visual-language encoders, a representation of environmental history, and action policies, and they require effective compression of visual and language inputs [5][6].
- The choice of encoder, and whether to project visual and language representations into a common space, are key design questions. Current trends favor models pre-trained on large datasets and the use of large language models (LLMs) for instruction decomposition [6].
- VLN is a sequential decision-making task in which the robot accumulates historical information to inform future actions; implicit methods represent past information as latent variables [6].
- Object Navigation within VLN emphasizes identifying target objects from category information, reducing the need for detailed instructions and improving exploration capabilities [7].
Reinforcement Learning & Legged Robots
- Reinforcement learning is crucial for legged robots, covering kinematics, dynamics, multi-modal sensor fusion, and advanced algorithms for task adaptation [9][10].
- Key areas include gait planning, balance control for bipedal robots, and the use of deep reinforcement learning and imitation learning for multi-task training [10].
- Techniques such as domain randomization and safety mechanisms are essential for successful real-world deployment of robotic systems [10].
Diffusion Policy
- The introduction of diffusion models into robotics has led to significant advances, with Diffusion Policy achieving an average performance improvement of 46.9% across various simulation environments [21][22].
- The Robotic Diffusion Transformer (RDT), with 1.2 billion parameters, shows strong zero-shot generalization and can learn new skills from a small number of examples [22].
- Diffusion-based policies are expanding beyond robotic manipulation to areas such as autonomous navigation and dexterous grasping, improving task success rates through real-time environmental adaptation [22][23].
- Recent developments include 3D applications and the integration of safety and online reinforcement learning, opening new research avenues [23].
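The diffusion-policy idea summarized above can be illustrated with a DDPM-style reverse denoising loop: start from Gaussian noise and repeatedly denoise it into an action, conditioned on the observation. The sketch below is a toy under stated assumptions. `toy_noise_predictor` is a hypothetical stand-in for a trained network (it simply pretends the ideal action is half the observation), and the step count and noise schedule are illustrative rather than taken from any published configuration.

```python
import math
import random

T = 50  # number of diffusion steps (illustrative)

# Linear beta schedule and the derived alpha / cumulative-alpha terms.
betas = [1e-4 + (0.02 - 1e-4) * t / (T - 1) for t in range(T)]
alphas = [1.0 - b for b in betas]
alpha_bars = []
running = 1.0
for a in alphas:
    running *= a
    alpha_bars.append(running)

def toy_noise_predictor(x, t, obs):
    """Hypothetical stand-in for a trained eps_theta(x_t, t, obs).

    It assumes the ideal action is 0.5 * obs and returns the noise that
    would turn that action into x at step t.
    """
    target = [0.5 * o for o in obs]
    signal = math.sqrt(alpha_bars[t])
    noise = math.sqrt(1.0 - alpha_bars[t])
    return [(xi - signal * ti) / noise for xi, ti in zip(x, target)]

def sample_action(obs, rng):
    """DDPM reverse process: denoise pure noise into an action vector."""
    x = [rng.gauss(0.0, 1.0) for _ in obs]
    for t in reversed(range(T)):
        eps = toy_noise_predictor(x, t, obs)
        coef = betas[t] / math.sqrt(1.0 - alpha_bars[t])
        mean = [(xi - coef * ei) / math.sqrt(alphas[t]) for xi, ei in zip(x, eps)]
        if t > 0:
            # Add fresh noise at every step except the last.
            sigma = math.sqrt(betas[t])
            x = [m + sigma * rng.gauss(0.0, 1.0) for m in mean]
        else:
            x = mean
    return x

action = sample_action([0.2, -0.4], random.Random(0))
print([round(a, 4) for a in action])
```

Because the toy predictor is exact, the loop recovers the assumed target action of roughly [0.1, -0.2]. A real diffusion policy would replace the predictor with a learned network and would typically denoise a short sequence of future actions rather than a single vector.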
From Technical Routes to Personnel Changes: Why Is Intelligent Driving Coining New Terms Again?
36Kr· 2025-11-19 12:19
Core Insights
- The automotive and intelligent driving industry is going through rapid technological iteration, producing new terms and concepts that challenge user understanding and acceptance [1].
- The transition from rule-based systems to end-to-end and world model architectures is reshaping the autonomous driving landscape, with significant implications for company strategies and personnel [2][4][10].
Industry Trends
- The shift towards end-to-end systems, exemplified by Tesla's FSD V12, has prompted other companies such as Huawei, XPeng, and NIO to explore similar approaches, indicating a trend towards more integrated solutions [2][4].
- The industry sees the period from Q4 2025 to mid-2026 as critical for deploying advanced driver assistance technologies, as companies race to adopt and refine them [1].
Technical Developments
- Current autonomous driving systems, whether rule-based or end-to-end, rely mainly on imitating human driving through extensive data collection and learning, which limits efficiency and adaptability [4][5].
- VLA (vision-language-action) models aim to improve understanding of the physical world, moving beyond imitation towards a more human-like comprehension of driving scenarios [7][11].
Company Strategies
- Companies such as XPeng and Li Auto are pivoting towards VLA models, with XPeng's second-generation VLA eliminating the language translation step to improve efficiency and data utilization [8][11].
- The restructuring of R&D departments at companies such as Li Auto and NIO reflects a strategic shift towards prioritizing VLA and world model approaches, part of a broader trend of adapting organizational structures to new technological demands [15][17].
Competitive Landscape
- Competition between self-developed autonomous driving technologies and third-party solutions is intensifying, with companies increasingly opting for partnerships with specialized suppliers to strengthen their capabilities [18][21].
- The financial burden of self-development is prompting companies to reconsider their strategies, as seen in XPeng's significant investment in computing resources and its need to reach profitability in Q4 2025 [19][22].