Huawei's Jin Yuzhi: We Are Not Taking the VLA Route; WA Is the Ultimate Solution for Autonomous Driving
36Kr · 2025-08-28 03:19
Core Insights
- Huawei's automotive business has reached significant milestones: 1 million vehicles equipped with Huawei's QianKun intelligent driving system and over 1 million lidar units shipped as of July this year [1]
- The company emphasizes a long-term strategic vision, having invested in the automotive sector since 2014, which has led to current profitability without setting explicit commercialization goals [1][4]
- Jin Yuzhi, CEO of Huawei's Intelligent Automotive Solution BU, believes that focusing solely on commercialization can be counterproductive, advocating instead for a commitment to technology development and user needs [1]

Automotive Business Performance
- As of August, 28 models have been launched in collaboration with Huawei, including brands such as Audi and Avatr [1]
- Cumulative assisted-driving mileage has reached 4 billion kilometers [1]
- The company has adopted a full-lifecycle management approach for its products, ensuring continuous upgrades and maintenance for users [5][16]

Technology Strategy
- Huawei prefers the World Action (WA) model over the Vision-Language-Action (VLA) model for autonomous driving, believing WA is the ultimate solution for achieving true autonomous driving [3][10]
- The WA model processes information directly from vision inputs, eliminating the need to convert sensor data into language, which is seen as a shortcut [3][11]
- Huawei has developed the WEWA model based on the WA architecture, which will be deployed in ADS 4.0 [4]

Future Plans
- Huawei aims to achieve Level 3 (L3) autonomous driving on highways and Level 4 (L4) pilot capabilities in urban areas by 2026, with plans for large-scale commercial use of L4 by 2028 [9]
- The company is also working to transform smart cockpits into "digital nannies" by integrating AI as an AI Agent [9]

Pricing and Business Model
- Jin Yuzhi asserts that there is no such thing as a free service in the automotive industry, as costs are simply transferred in different forms [4][15]
- The pricing of assisted-driving systems is justified by the ongoing costs of iteration, maintenance, and over-the-air updates [5][15]
- Users who purchase the first version of ADS benefit from continuous upgrades, making the long-term cost of ownership more favorable [16]

Safety and Sensor Technology
- Huawei's richer sensor configurations, such as additional lidars, are driven by a commitment to safety rather than a desire to raise product prices [17][18]
- The company aims to enhance safety across driving scenarios, including parking and urban driving, by improving system precision through advanced sensor technology [17][18]
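The structural difference between the two paradigms can be sketched schematically. The toy Python sketch below is purely illustrative (every function name and the pooling/quantization logic are my own stand-ins, not Huawei's actual architecture): a VLA-style pipeline routes perception through a discrete, language-like representation before acting, while a WA-style pipeline maps a continuous world representation directly to action.

```python
import numpy as np

def encode_vision(frame: np.ndarray) -> np.ndarray:
    """Stand-in visual encoder: pool a camera frame into a feature vector."""
    return frame.mean(axis=(0, 1))  # toy pooling, shape (channels,)

def vla_pipeline(frame: np.ndarray) -> np.ndarray:
    """VLA sketch: vision -> discrete language-like tokens -> action.

    The detour through a discrete description is the extra step
    Huawei argues against in the article.
    """
    features = encode_vision(frame)
    # Quantize features into discrete 'tokens' (a crude stand-in for
    # describing the scene in language).
    tokens = np.round(features).astype(int)
    # Action decoded from the lossy discrete description.
    return tokens.astype(float) * 0.1

def wa_pipeline(frame: np.ndarray) -> np.ndarray:
    """WA sketch: vision -> continuous representation -> action.

    No discretization step; the continuous features feed the
    action head directly.
    """
    features = encode_vision(frame)
    return features * 0.1

frame = np.random.default_rng(0).uniform(0.0, 1.0, size=(8, 8, 3))
a_vla = vla_pipeline(frame)
a_wa = wa_pipeline(frame)
# The WA action preserves continuous detail that the VLA sketch's
# quantization step discarded, so the two actions generally differ.
```

The quantization here is only a caricature of "converting data into language", but it makes the article's information-loss argument concrete: whatever the discrete description rounds away is unavailable to the action head.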
具身智能之心 Is Recruiting Instructors for B-End and C-End Training
具身智能之心· 2025-08-28 01:20
Group 1
- The article announces the recruitment of instructors for embodied-intelligence training, targeting both B-end (business) and C-end (consumer) training services, with compensation above industry standards [1]
- The training covers advanced topics including VLA, VLN, remote operation, Diffusion Policy, reinforcement learning, sim2real, multimodal large models, simulation, motion control, and target navigation [2]
- B-end training is aimed at enterprises, universities, and research institutions, while C-end training focuses on students and job seekers; responsibilities include curriculum design and material preparation [3]

Group 2
- Candidates must hold a doctoral degree or higher (current doctoral students included), with preference for those who have published two papers in A-level or Q1 journals/conferences or have two years of industry experience [3]
- Interested individuals can add the specified WeChat contact for further inquiries [4]
Huawei Executive: Nothing in the World Is Truly Free
半导体芯闻· 2025-08-27 10:40
Core Viewpoint
- Huawei's automotive business is rapidly expanding, particularly in assisted driving, and is exploring various collaboration models with car manufacturers [2][3]

Group 1: Collaboration Models
- Huawei's automotive business unit (BU) collaborates with car manufacturers through multiple models: component supply, single intelligence (either smart cockpit or assisted driving), dual intelligence (both smart cockpit and assisted driving), and full-stack solutions [2]
- Huawei supports car manufacturers throughout the entire lifecycle, from product definition and design to manufacturing and marketing [2][7]

Group 2: Technology Approach
- Huawei's approach to assisted driving does not align with the Vision-Language-Action (VLA) model favored by some car manufacturers; it instead emphasizes the World Action (WA) model, which controls the vehicle directly from sensory inputs [3][9]
- Huawei considers the WA model the ultimate solution for achieving true autonomous driving, bypassing the language-processing step [9]

Group 3: Commercialization and Market Strategy
- Huawei has no specific short-term commercialization goal for its assisted-driving technology, focusing instead on long-term, user-centered strategies and sustainable investment [7]
- The company believes the market for assisted-driving features will evolve, and that pricing should reflect the ongoing development and maintenance costs of these technologies [12]

Group 4: Industry Trends and Future Outlook
- The number of players in autonomous driving is expected to shrink as the industry consolidates, with future success relying heavily on data-driven approaches [10]
- Differentiation in assisted-driving technology is minimal, as the primary goal remains zero accidents and fatalities; pricing is determined by perceived value to consumers [11]
Huawei Executive: Nothing in the World Is Truly Free
Di Yi Cai Jing Zi Xun· 2025-08-27 08:51
Core Insights
- Huawei's automotive business is rapidly expanding its assisted-driving solutions and collaborating with car manufacturers including Baojun, Leapmotor, and Hongqi, indicating a growing presence in the industry [2][3]
- The company emphasizes a diverse cooperation model with car manufacturers, ranging from component supply to full-stack solutions, enhancing their capabilities from product definition to marketing [2][9]
- Huawei's approach to assisted-driving technology diverges from the prevalent Vision-Language-Action (VLA) model, focusing instead on a World Action (WA) model that uses direct sensory inputs for vehicle control [3][10]

Cooperation Models
- Huawei's cooperation with car manufacturers spans multiple models: component supply, single intelligence (either smart cockpit or assisted driving), dual intelligence (both), and full-stack solutions [2][9]
- The collaboration is designed to deepen over time, with Huawei supporting car manufacturers throughout the entire product lifecycle, from design to marketing [2][9]

Technology Perspective
- Huawei does not endorse the VLA approach, believing it is not the ultimate solution for autonomous driving; it instead prioritizes the WA model, which aims for direct control from sensory inputs [3][10]
- The company acknowledges the rapid development of assisted-driving technology and anticipates a consolidation of market players, driven by data, computing power, and algorithms [11]

Commercial Strategy
- Huawei has no specific short-term profitability target for its automotive business, focusing instead on long-term, user-centered investment and sustainable growth [8]
- The company argues there is no such thing as a free service in the automotive sector, as costs are often hidden in vehicle pricing or future service fees [13]
Humanoid Robots Lack a Killer Consensus
创业邦· 2025-08-26 03:37
Core Viewpoint
- The article contrasts the approaches of two leading companies in the humanoid-robotics industry, Starry Era and Yuzhu Technology, highlighting their differing philosophies on how to enhance robot capabilities and their respective paths toward commercialization [8][10][49]

Group 1: Company Strategies
- Starry Era pursues "soft and hard integration," emphasizing the combination of hardware and software into a cohesive system for humanoid robots [30][32]
- Yuzhu Technology adopts a hardware-first strategy, prioritizing the development of hardware capabilities before integrating software solutions [31][32]
- The two companies also diverge on the viability of the VLA (Vision-Language-Action) paradigm: Starry Era sees it as a broad framework for integrating various modalities, while Yuzhu is skeptical of its practical application [12][16]

Group 2: Technical Development
- Starry Era has developed an end-to-end VLA model, ERA-42, which integrates reinforcement learning and world models, showcasing its commitment to advancing robot intelligence [15][39]
- Yuzhu Technology concentrates on building reusable data and model resources, focusing on the engineering aspects of distributed computing to enhance its robots' capabilities [22][27]
- Both companies recognize the need for a closed-loop system combining perception, decision-making, and execution to achieve effective humanoid-robot performance in complex environments [34][54]

Group 3: Market Positioning
- Starry Era is deploying its robots in B-end industrial scenarios, achieving over 70% efficiency in real-world applications, with plans to reach around 90% next year [23][36]
- Yuzhu Technology focuses primarily on entertainment and demonstration scenarios, acknowledging that its robots are not yet ready for complex tasks and adopting a gradual market-entry strategy [26][27]
- Both companies anticipate a significant shift in the humanoid-robotics market, predicting a "ChatGPT moment" within the next few years in which robots will understand and execute complex instructions in unfamiliar environments [50][56]

Group 4: Future Outlook
- The industry is expected to advance along parallel technical paths, including end-to-end VLA and world models, with leading companies validating commercial viability in specific industrial applications [56]
- In the mid-term, a unified technical standard may emerge, expanding applications from industry to logistics, healthcare, and retail [56]
- Long-term aspirations include humanoid robots as household companions, which will require advances in safety, reliability, and natural interaction [56]
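The closed loop both companies describe, perception feeding decision-making feeding execution, with the result observed again, can be sketched as a minimal control loop. Everything below is an illustrative stand-in (a 1-D toy world and a proportional controller), not either company's stack.

```python
from dataclasses import dataclass

@dataclass
class State:
    """Toy 1-D world: a gripper's position along a rail."""
    position: float

def perceive(world: State) -> float:
    """Perception: observe the current position (noise-free for simplicity)."""
    return world.position

def decide(observation: float, goal: float) -> float:
    """Decision-making: a proportional controller toward the goal."""
    return 0.5 * (goal - observation)

def execute(world: State, action: float) -> State:
    """Execution: apply the commanded displacement to the world."""
    return State(world.position + action)

def run_closed_loop(start: float, goal: float, steps: int = 20) -> State:
    """Perception -> decision -> execution, repeated; each iteration
    re-observes the world so errors are corrected rather than accumulated."""
    world = State(start)
    for _ in range(steps):
        obs = perceive(world)
        action = decide(obs, goal)
        world = execute(world, action)
    return world

final = run_closed_loop(start=0.0, goal=1.0)
# Each step halves the remaining error, so after 20 steps the
# position is within about 1e-6 of the goal.
```

The point of the loop structure, as opposed to planning a full action sequence once and replaying it open-loop, is that re-perceiving after every execution step is what lets a robot cope with the complex, changing environments the article mentions.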
Not Sure How to Start a VLA Paper? Some Students Already Have CCF-A Publications...
自动驾驶之心· 2025-08-22 12:00
Core Insights
- The article discusses advances in the Li Auto VLA driver model, highlighting its improved capabilities in semantic understanding, reasoning, and trajectory planning, all crucial for autonomous driving [1][3][5]

Group 1: VLA Model Capabilities
- The VLA model demonstrates enhanced semantic understanding through multimodal input, improved reasoning via chains of thought, and trajectory planning that more closely approximates human driving intuition [1]
- Four core abilities are showcased: spatial understanding, reasoning, communication and memory, and behavioral ability [1][3]

Group 2: Research and Development Trends
- The VLA model evolved from VLM+E2E, integrating cutting-edge technologies such as end-to-end learning, trajectory prediction, visual language models, and reinforcement learning [5]
- While industry continues to optimize traditional perception and planning tasks, academia is increasingly shifting toward large models and VLA, leaving many subfields open for exploration [5]

Group 3: VLA Research Guidance Program
- A VLA research-paper guidance program has been launched to positive feedback, aimed at helping participants systematically master key theoretical knowledge and develop their own research ideas [6]
- The program runs a structured 14-week curriculum, from traditional end-to-end autonomous driving through research-paper writing methodology [9][11][30]

Group 4: Course Structure and Requirements
- Each session is capped at 8 participants and targets individuals with a background in VLA and autonomous driving at various academic levels [12][15]
- Participants are expected to have a foundational understanding of deep learning, Python programming, and PyTorch, with specific hardware recommended for optimal performance [21][22]

Group 5: Expected Outcomes
- Participants will study classic and cutting-edge research papers, build coding skills, and learn paper-writing and submission methodology, culminating in the production of a draft paper [20][34]
- The program aims to deepen participants' understanding of algorithms and their trade-offs and to stimulate research ideas through structured guidance [20][34]
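Of the capabilities listed above, trajectory planning is the easiest to make concrete: the planner ultimately emits a time-indexed sequence of positions, and "human-like" driving means smooth acceleration and braking. A minimal sketch (purely illustrative, unrelated to Li Auto's actual planner) using a smoothstep ease-in/ease-out profile between the current position and a target waypoint:

```python
def plan_trajectory(start: float, goal: float, n_points: int = 5) -> list[float]:
    """Cubic ease-in/ease-out interpolation: zero velocity at both ends,
    like a human driver pulling away and braking gently."""
    traj = []
    for i in range(n_points):
        t = i / (n_points - 1)       # normalized time in [0, 1]
        s = 3 * t**2 - 2 * t**3      # smoothstep profile
        traj.append(start + s * (goal - start))
    return traj

waypoints = plan_trajectory(0.0, 10.0)
# Endpoints are hit exactly; intermediate points bunch near the ends,
# reflecting the gentle acceleration/deceleration profile.
```

A real VLA planner produces 2-D or 3-D poses conditioned on perception and reasoning outputs, but the same property matters: the emitted trajectory must be smooth enough for a downstream controller to track comfortably.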
Traditional Perception Is Falling Out of Favor, and VLA Is Becoming the Rising Star...
自动驾驶之心· 2025-08-20 09:15
Core Viewpoint
- The article discusses advances in the VLA (Vision-Language-Action) driver model by Li Auto, highlighting its four core capabilities: spatial understanding, reasoning, communication and memory, and behavioral capability. It emphasizes the significance of VLA for autonomous driving, noting a shift in focus from traditional perception and planning tasks to large models and VLA technologies [2][4]

Summary by Sections

VLA Model Capabilities
- The VLA model integrates dynamic targets, static elements, navigation maps, and spatial understanding, showing more human-like reasoning; this positions VLA as a leading focus in both academia and industry for autonomous driving [2]

Shift in Research Focus
- Traditional perception and planning tasks are becoming less prominent at top conferences as academia shifts toward large models and VLA; industry continues to optimize traditional methods, so opportunities remain in both areas [4]

Educational Program
- An educational program is introduced to help students systematically master key VLA theory, strengthen practical coding skills, and develop their own research ideas. It comprises a structured 12-week online group-research course, 2 weeks of paper guidance, and a 10-week maintenance period [5][34]

Course Structure
- The course spans 14 weeks, from introductory lessons to advanced VLA models and paper-writing methodology. Each week covers a different aspect of VLA and autonomous driving, culminating in a final project report and submission guidance [8][10][35]

Target Audience
- The program targets master's and doctoral students in VLA and autonomous driving, individuals strengthening their resumes for study abroad, and AI and autonomous-driving professionals seeking to deepen their algorithmic knowledge [14][24]

Course Requirements
- Participants should have a foundational understanding of deep learning, basic Python programming skills, and familiarity with PyTorch; access to high-performance computing resources is recommended for optimal learning [20][21]

Course Highlights
- The program uses a "2+1" teaching model with experienced instructors, providing comprehensive support throughout the learning process, emphasizing academic integrity, and offering a structured evaluation system [22][23]
The Starting Point of End-to-End VLA: A Look at Large Language Models and CLIP
自动驾驶之心· 2025-08-19 07:20
Core Viewpoint
- The article discusses the development and significance of end-to-end (E2E) algorithms in autonomous driving, emphasizing the integration of advanced technologies such as large language models (LLMs), diffusion models, and reinforcement learning (RL) in enhancing the capabilities of autonomous systems [21][31]

Summary by Sections

Section 1: Overview of End-to-End Autonomous Driving
- The first chapter surveys the evolution of end-to-end algorithms, explaining the transition from modular approaches to end-to-end solutions and discussing the advantages and challenges of each paradigm [40]

Section 2: Background Knowledge
- The second chapter covers the technical stack behind end-to-end systems, detailing LLMs, diffusion models, and reinforcement learning, all crucial for understanding the future job market in this field [41][42]

Section 3: Two-Stage End-to-End Systems
- The third chapter examines two-stage end-to-end systems: their emergence, advantages, and disadvantages, reviewing notable works such as PLUTO and CarPlanner [42][43]

Section 4: One-Stage End-to-End and VLA
- The fourth chapter highlights one-stage end-to-end systems, covering subfields including perception-based methods and the latest advances in VLA (Vision-Language-Action), which are pivotal for the ultimate goals of autonomous driving [44][50]

Section 5: Practical Application and RLHF Fine-Tuning
- The fifth chapter includes a major project on RLHF (Reinforcement Learning from Human Feedback) fine-tuning, with practical guidance on building pre-training and reinforcement-learning modules applicable to VLA-related algorithms [52]

Course Structure and Learning Outcomes
- The course aims to give participants a solid grounding in end-to-end autonomous-driving technologies, covering essential frameworks and methodologies and preparing them for roles in the industry [56][57]
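The RLHF fine-tuning project in Section 5 centers on one idea: instead of imitating every demonstration equally (pre-training), weight updates by a reward signal. The sketch below is a deliberate simplification using reward-weighted regression on a 1-D linear policy; real RLHF pipelines use PPO-style objectives over model tokens, and every name here is my own illustration.

```python
import math

def reward_weighted_update(w, data, reward_fn, lr=0.1):
    """One step of reward-weighted regression on a 1-D linear policy a = w*x.

    Each (x, a) example pulls w toward reproducing its action, with a pull
    strength proportional to exp(reward): high-reward behavior is imitated
    strongly, low-reward behavior is nearly ignored.
    """
    weights = [math.exp(reward_fn(x, a)) for x, a in data]
    total = sum(weights)
    grad = sum(wt * (a - w * x) * x
               for wt, (x, a) in zip(weights, data)) / total
    return w + lr * grad

# Toy setup: the 'good' action for input x is 2*x; the reward penalizes
# deviation from it. One high-reward and one low-reward demonstration.
data = [(1.0, 2.0), (1.0, 0.0)]
reward_fn = lambda x, a: -5.0 * abs(a - 2.0 * x)

w = 0.0
for _ in range(50):
    w = reward_weighted_update(w, data, reward_fn, lr=0.5)
# w converges near 2.0: the policy has learned from the high-reward
# demonstration while effectively discarding the low-reward one.
```

Plain imitation on the same data would average the two demonstrations and land near w = 1.0; the exponential reward weighting is what separates RLHF-style fine-tuning from supervised pre-training.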
Reinforcement Learning, VLA, Flow Matching, and Robot Control Algorithms: A View from Method Paradigms and Application Scenarios
具身智能之心· 2025-08-19 01:54
Core Viewpoint
- The article reviews recent advances in reinforcement learning (RL) and its applications in robotics, focusing on VLA (Vision-Language-Action) models and diffusion policies and their potential to handle complex tasks that traditional RL struggles with [2][4][35]

Method Paradigms
- Traditional RL and imitation learning combined with Sim2Real techniques remain foundational approaches in robotics [3]
- VLA models differ fundamentally from traditional RL: they use training-data distributions to describe task processes and goals, enabling the execution of more complex tasks [4][35]
- Diffusion Policy is a novel approach that uses diffusion models to generate continuous action sequences, demonstrating superior performance on complex tasks compared with traditional RL methods [4][5]

Application Scenarios
- Applications fall into two main categories: basic motion control for humanoid and quadruped robots, and complex, long-horizon manipulation tasks [22][23]
- Basic motion control relies primarily on RL and Sim2Real, and current implementations still struggle to achieve motion as fluid as that of humans or animals [22]
- For complex tasks, architectures typically pair a pre-trained Vision Transformer (ViT) encoder with a large language model (LLM), using diffusion or flow matching for action output [23][25]

Challenges and Future Directions
- Key challenges include the need for better simulation environments, effective domain randomization, and the integration of external goal conditions [35]
- Human intention remains central to task definition, and current models struggle to learn complex tasks without extensive human demonstration data [35][40]
- Future advances may involve multimodal input prediction of task goals and the potential integration of brain-machine interfaces to enhance human-robot interaction [35]
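Diffusion Policy's core mechanic, recovering a clean action sequence from noise by iterated denoising, can be illustrated with a deliberately simplified sketch. Here the learned component is replaced by an oracle that knows the target sequence, so the example only demonstrates the shape of the reverse process, not a trained model.

```python
import numpy as np

def denoise_step(actions, target, alpha=0.3):
    """One reverse-diffusion step: move the noisy action sequence a
    fraction of the way toward the clean sequence.

    In a real Diffusion Policy the direction comes from a trained
    noise-prediction network conditioned on observations, not an oracle.
    """
    return actions + alpha * (target - actions)

rng = np.random.default_rng(0)
target = np.linspace(0.0, 1.0, 8)        # desired 8-step action sequence
actions = rng.normal(0.0, 1.0, size=8)   # start from pure Gaussian noise

for _ in range(30):
    actions = denoise_step(actions, target)
# After 30 steps the sequence has converged to the target trajectory.
```

Note what this buys over predicting a single action: the whole multi-step sequence is refined jointly, which is why diffusion policies handle multimodal, long-horizon behaviors that a one-step regression policy averages away.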
An Autonomous Driving Autumn-Recruitment Discussion Group Has Been Launched!
自动驾驶之心· 2025-08-18 23:32
Core Viewpoint
- The article emphasizes the convergence of autonomous-driving technology: a shift from numerous diverse approaches to unified models, which raises the industry's technical barriers [1]

Group 1
- Directions that once each required dedicated algorithm engineers are consolidating into unified models such as one-model, VLM, and VLA approaches [1]
- The article calls for a large community to support people in the industry, noting the limits of individual effort [1]
- A new job- and industry-focused community is being launched to facilitate discussion of industry trends, company developments, product research, and job opportunities [1]