End-to-End Models
Li Auto: How VLM and VLA Differ on Blind-Spot Deceleration
理想TOP2 · 2025-10-18 08:44
Core Insights
- The article examines how VLM (Vision-Language Model) and VLA (Vision-Language-Action) approaches differ in autonomous driving, focusing on scenarios such as blind-spot deceleration [1][2].

Group 1: VLM and VLA Differences
- VLM perceives a scenario such as an uncontrolled intersection and outputs a deceleration request to the E2E (end-to-end) model, which then slows the vehicle to 8-12 km/h; the handoff produces a noticeably disconnected response [2].
- VLA instead uses a self-developed base model to understand the scene directly, enabling more nuanced blind-spot deceleration and a smoother, more contextually appropriate response across varied road conditions [2].

Group 2: Action Mechanism
- The action generated by VLA is a native deceleration action rather than a dual-system command, reflecting a tighter integration of scene understanding and response (a minimal sketch contrasting the two wiring patterns follows this entry) [3].
- Comments question VLM's reliability as an external module, including its ability to interpret 3D space accurately and the stability of its triggering mechanism [3].
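To make the contrast concrete, here is a minimal Python sketch of the two wiring patterns described above. All names, thresholds, and the handling of the 8-12 km/h band are illustrative assumptions, not Li Auto's actual implementation.

```python
from dataclasses import dataclass

# Hypothetical illustration of the two wiring patterns described above.

@dataclass
class Scene:
    is_uncontrolled_intersection: bool
    occlusion_ratio: float  # fraction of the relevant view blocked, 0..1

# Pattern 1: dual-system. A VLM watches the scene and, when triggered,
# sends a discrete deceleration *request* to a separate E2E planner.
def vlm_advisor(scene: Scene) -> dict | None:
    if scene.is_uncontrolled_intersection and scene.occlusion_ratio > 0.5:
        return {"type": "decelerate", "target_kph": (8, 12)}  # fixed band
    return None

def e2e_planner(scene: Scene, request: dict | None) -> float:
    cruise_kph = 40.0
    if request and request["type"] == "decelerate":
        lo, hi = request["target_kph"]
        return (lo + hi) / 2  # planner obeys the external command
    return cruise_kph

# Pattern 2: unified VLA. One model maps the scene directly to a
# continuous action, so speed degrades smoothly with occlusion instead
# of snapping to a fixed band when an external trigger fires.
def vla_policy(scene: Scene) -> float:
    cruise_kph, min_kph = 40.0, 8.0
    risk = scene.occlusion_ratio if scene.is_uncontrolled_intersection else 0.0
    return cruise_kph - risk * (cruise_kph - min_kph)

scene = Scene(is_uncontrolled_intersection=True, occlusion_ratio=0.7)
print(e2e_planner(scene, vlm_advisor(scene)))  # 10.0 (snaps into the 8-12 band)
print(vla_policy(scene))                       # ~17.6 (scales with occlusion)
```

The toy makes the article's "disconnection" point visible: the dual-system path jumps to a fixed band whenever the trigger fires, while the unified policy's output varies continuously with the scene.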
Heavy FSD Use "Makes It Dumber": Severe Wrong-Way and Red-Light Hallucinations; Tesla Under Investigation After 50-Plus Incidents
36Kr · 2025-10-10 07:57
Core Viewpoint
- The article covers a new National Highway Traffic Safety Administration (NHTSA) investigation into Tesla's Full Self-Driving (FSD) system, raising concerns that the system can cause traffic-safety violations and accidents, including a risk of "diminished intelligence" with prolonged use [1][6].

Investigation Details
- NHTSA opened the investigation in response to user complaints and media reports, focusing on traffic-safety violations that occur while FSD is engaged [2].
- The investigation covers approximately 2,882,566 Tesla vehicles equipped with FSD and could lead to a recall if the issues are confirmed [2][10].

Types of Violations
- The investigation highlights two main violation types:
  1. Running red lights, with 18 confirmed complaints, including 4 incidents resulting in injuries [2][3].
  2. Incorrect lane usage, such as entering oncoming-traffic lanes or ignoring road signs, with another 18 complaints reported [3][10].

Incident Reports
- A total of 58 reports of FSD-related traffic-safety violations have been documented, resulting in 23 injuries [3][8].
- Notably, a testing agency found that FSD performed well in initial tests but developed issues after extended use, leading to dangerous situations [6][11].

System Evaluation
- NHTSA's review will assess FSD's ability to warn users of upcoming actions, its response times, and its recognition of traffic signals and lane markings [10].
- The investigation will also evaluate whether over-the-air updates affect FSD's compliance with traffic laws [10].

Historical Context
- This inquiry joins a series of ongoing investigations into Tesla's FSD covering various incidents and compliance issues [13].
- Such investigations typically run at least 18 months, underscoring how slowly regulation moves relative to rapidly evolving AI technology [15].

Future Implications
- The outcome may not significantly affect Tesla, which has historically navigated regulatory challenges effectively [11][15].
- The evolving nature of AI poses challenges for traditional regulatory frameworks, which may struggle to keep pace with systems like FSD [15].
Autonomous Driving Ask Me Anything: Q&A Roundup! The VLA vs. WA Route Debate?
自动驾驶之心 · 2025-10-08 23:33
Core Insights
- The article surveys the current state and future prospects of autonomous driving technology, emphasizing the role of AI and competing modeling approaches in reaching higher levels of automation [4][6][9].

Group 1: Industry Development
- The autonomous driving industry is evolving rapidly, with significant advances expected in the next few years, particularly in AI and related fields [4].
- Companies like Waymo and Tesla lead the push toward Level 4 (L4) automation, while Level 5 (L5) may take at least five more years to realize [4][6].
- Integrating Vision-Language-Action (VLA) models is seen as key to enhancing decision-making in autonomous vehicles, addressing long-tail problems that pure end-to-end models may struggle with [6][9].

Group 2: Technical Approaches
- The article outlines the main modeling approaches in autonomous driving, including end-to-end models and the emerging VLA paradigm, which combines language processing with visual data to improve reasoning and decision-making [5][9].
- Current autonomous driving systems remain limited, with many challenges outstanding in fully complying with traffic regulations and safety standards [10][14].
- Data and cloud-computing capability are highlighted as the levers for narrowing the performance gap between domestic companies and leaders like Tesla [14][15].

Group 3: Talent and Education
- There is a recognized talent gap in the autonomous driving sector, with a strong recommendation that students pursue AI and computer science to prepare for future opportunities in the industry [4][6].
- Practical experience at larger autonomous driving companies may offer better training and growth opportunities than smaller robotics firms [16][20].
A History of Autonomous Driving's Factional Disputes
36Kr · 2025-09-28 02:50
Core Insights
- The commercialization of autonomous driving is accelerating globally, with companies like Waymo and Baidu Apollo significantly expanding their fleets and service offerings [1][2].
- Despite the apparent maturity of the technology, unresolved debates over sensor suites and system architectures will continue to shape the field [3][4].

Sensor Solutions
- The sensor debate has two main camps, pure vision and multi-sensor fusion, each with its own advantages and challenges [4][9].
- The pure-vision approach, championed by Tesla, relies on cameras and deep-learning algorithms; it offers lower cost and easier scaling but struggles in adverse weather [7][9].
- Multi-sensor fusion, favored by companies like Waymo and NIO, pursues safety through redundancy, combining sensor types to improve reliability [9][10].

Sensor Types
- LiDAR produces high-precision 3D point clouds but remains expensive, limiting mass commercialization [11][13].
- 4D millimeter-wave radar performs well in adverse weather but lacks LiDAR's resolution, making the two technologies complementary [13][15].

Algorithmic Approaches
- The industry is split between modular and end-to-end algorithm designs, with end-to-end gaining traction for its potential to optimize performance without inter-stage information loss (a minimal sketch contrasting the two follows this entry) [16][18].
- End-to-end models, while promising, face challenges of traceability and safety, prompting hybrid approaches that balance performance against explainability [18][22].

AI Models
- The debate continues between Vision-Language Models (VLM) and Vision-Language-Action (VLA) models, with VLM prioritizing interpretability and VLA prioritizing performance [19][21].
- VLM is currently more widely adopted by major companies owing to its maturity and lower training cost, while VLA is being explored by companies like Tesla and Geely for its stronger reasoning capabilities [25][26].

Industry Trends
- The ongoing debates are converging: sensor technologies and algorithmic approaches are increasingly being combined to extend the capabilities of autonomous driving systems [25][26].
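Where the entry contrasts modular and end-to-end designs, a minimal sketch may help. Everything here is a hypothetical toy: the "stages" are stand-in arithmetic, chosen only to show where hand-defined interfaces sit in a modular stack and why the end-to-end map has none.

```python
import numpy as np

def modular_stack(frame: np.ndarray) -> np.ndarray:
    """Perception -> prediction -> planning, each a hand-defined stage.
    Each interface compresses the world into a fixed representation,
    which is where inter-stage information loss can creep in."""
    detections = frame.mean()                   # stand-in for a detector
    predicted_tracks = detections * 0.9         # stand-in for a predictor
    trajectory = np.full(10, predicted_tracks)  # stand-in for a planner
    return trajectory

def end_to_end(frame: np.ndarray, w: np.ndarray) -> np.ndarray:
    """One learned map from raw pixels to a trajectory: no hand-defined
    intermediate representation, so nothing is discarded between stages,
    but also no intermediate output to inspect when it misbehaves."""
    return frame.reshape(-1) @ w                # single differentiable step

frame = np.random.rand(8, 8)
w = np.random.rand(64, 10)
print(modular_stack(frame).shape, end_to_end(frame, w).shape)  # (10,) (10,)
```

The trade-off the entry names falls out of the structure: the modular stack's intermediate outputs are what make it traceable, and removing them is exactly what makes the end-to-end map both lossless and opaque.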
Why Is Embodied Intelligence Becoming the Next Battleground for Smart-Driving Companies?
雷峰网 · 2025-09-26 04:17
Core Viewpoint
- Embodied intelligence is emerging as the next battleground for smart-driving entrepreneurs, drawing significant investment and development activity [2][4].

Market Overview
- The global embodied-intelligence market is on the verge of explosion; China's market is expected to reach 5.295 billion yuan by 2025, approximately 27% of the global total [3][21].
- The humanoid-robot market is projected to reach 8.239 billion yuan, about 50% of the global market [3].

Industry Trends
- Several smart-driving companies, including Horizon Robotics and Zhixing Technology, are investing in embodied intelligence through mergers, acquisitions, and new subsidiaries to seize the opportunity [4].
- Talent has flowed from smart driving into embodied intelligence since 2022, with many professionals making the transition in 2023 [13].

Technological Integration
- The convergence rests on the concept of "embodied cognition": intelligent behavior forms through continuous interaction with the physical environment [6].
- The technical pathways of the two fields are closely aligned; a smart-driving vehicle already functions as an embodied intelligent agent through multi-sensor perception, algorithmic decision-making, and control systems [6].

Technical Framework
- The technical layers of smart driving migrate directly to embodied intelligence (a minimal sketch of this shared loop follows the entry):
  - Perception layer: multi-sensor fusion for environmental modeling and object recognition [7].
  - Decision layer: path planning and behavior prediction, reused for task planning and interaction strategies [7].
  - Control layer: vehicle dynamics control, reused for motion control and execution [7].
  - Simulation layer: virtual-scene testing, reused for skill learning and adaptive training [7].

Investment and Growth Potential
- The embodied-intelligence market is expected to grow more than 40% annually, offering a valuable outlet for smart-driving companies facing growth bottlenecks [21].
- The dual pattern of humanoid and specialized robots lets smart-driving companies leverage their technological strengths for market entry [22].

Profitability Insights
- Gross margins for embodied-intelligence products are generally higher than for smart-driving solutions: professional service robots exceed 50%, versus 15-25% for autonomous-driving kits [23][25].
- The difference stems from stronger product differentiation and lower marginal costs, enabling rapid market entry and reduced development costs [25].

Future Outlook
- The boundary between smart driving and embodied intelligence is blurring; Tesla, for example, treats autonomous vehicles as "wheeled robots" and builds humanoid robots on similar AI architectures [26].
- Early movers in this transition are likely to secure advantageous positions in the future intelligent-machine ecosystem [26].
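Here is a minimal sketch of the shared agent loop the Technical Framework section describes. The classes and trigger rules are hypothetical; the point is only that the perceive/decide/act layering is domain-agnostic, which is why a driving stack transfers to other embodied agents.

```python
from abc import ABC, abstractmethod

# Hypothetical sketch: the perception/decision/control layering of a
# driving stack reused, unchanged, by a different embodied agent.

class EmbodiedAgent(ABC):
    @abstractmethod
    def perceive(self, raw_sensors: dict) -> dict: ...
    @abstractmethod
    def decide(self, world_state: dict) -> str: ...
    @abstractmethod
    def act(self, plan: str) -> None: ...

    def step(self, raw_sensors: dict) -> None:
        # The loop itself is domain-agnostic.
        self.act(self.decide(self.perceive(raw_sensors)))

class Vehicle(EmbodiedAgent):
    def perceive(self, raw): return {"obstacle_ahead": raw["lidar_min_m"] < 5}
    def decide(self, state): return "brake" if state["obstacle_ahead"] else "cruise"
    def act(self, plan): print(f"vehicle: {plan}")

class WarehouseRobot(EmbodiedAgent):
    def perceive(self, raw): return {"shelf_reached": raw["range_m"] < 0.3}
    def decide(self, state): return "grasp" if state["shelf_reached"] else "advance"
    def act(self, plan): print(f"robot: {plan}")

Vehicle().step({"lidar_min_m": 3.2})     # vehicle: brake
WarehouseRobot().step({"range_m": 0.2})  # robot: grasp
```

Only the layer implementations change between domains; the interfaces and the loop carry over, which is the structural basis of the migration the article describes.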
Zebra Zhixing's Si Luo: Smart Cockpits Are Undergoing a Paradigm Reconstruction, with End-to-End Plus Active Perception as the Key Breakthrough
China Economic Net · 2025-09-22 09:07
Core Insights
- The core argument from Zebra Zhixing's CTO is that the smart cockpit is becoming a crucial entry point for user experience and the Internet-AI ecosystem in smart vehicles, a golden track combining technological depth with commercial value [3][4].

Industry Overview
- Smart cars are a major proving ground for Physical AI, and the potential value of AI in physical spaces may exceed its value in digital realms [3].
- The smart cockpit has three core characteristics: high complexity, high safety requirements, and high commercial value; Zebra Zhixing's collaborations spanning more than 8 million vehicles validate the feasibility of large-scale deployment [3].

Technical Architecture
- The smart cockpit's five-layer integration architecture comprises:
  1. Chip and computing-power layer, centered on companies like NVIDIA and Qualcomm.
  2. System layer, led by companies such as Zebra Zhixing and Huawei, providing efficient system-level services.
  3. Large-model layer, integrating general and vehicle-specific models to handle multi-modal processing and data privacy.
  4. Intelligent-agent layer, responsible for central decision-making and coordinating service modules.
  5. Platform-service layer, enabling AI-native services through natural-language interaction [4].

Development Phases
- The development of smart cockpits falls into three phases:
  1. "Verification period" (2024 to early 2025), testing whether large models can be integrated into vehicles.
  2. "Application period" (2025), emphasizing intelligent-agent systems that deliver practical services.
  3. "Reconstruction period" (now through 2026), in which the industry shifts from traditional assembly-line architectures to end-to-end models [4][5].

Interaction Experience
- The cockpit is transitioning from "passive response" to "active perception": intelligent assistants proactively identify user needs through sensory inputs, evolving from mere tools into supportive partners (a minimal sketch of the contrast follows this entry) [5].
- Zebra Zhixing aims to drive the smart cockpit toward a trillion-yuan commercial market, positioning it as a core hub of the Physical AI ecosystem [5].
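A minimal sketch of the passive-to-active shift described under Interaction Experience. The signal names and trigger rules here are invented for illustration; they are not Zebra Zhixing's product logic.

```python
# Hypothetical sketch of "passive response" vs. "active perception"
# in a cockpit assistant.

def passive_assistant(user_command: str | None) -> str | None:
    # Does nothing until the user explicitly asks.
    return f"executing: {user_command}" if user_command else None

def active_assistant(cabin_signals: dict) -> str | None:
    # Continuously watches multimodal cabin signals and volunteers help.
    if cabin_signals.get("driver_yawns_per_min", 0) >= 3:
        return "suggest: rest stop in 2 km, lowering cabin temperature"
    if cabin_signals.get("child_seat_occupied") and cabin_signals.get("speed_kph", 0) > 0:
        return "suggest: enable rear-seat entertainment, lock rear windows"
    return None

print(passive_assistant(None))                        # None: waits for a command
print(active_assistant({"driver_yawns_per_min": 4}))  # proactive suggestion
```

The structural difference is the input: the passive assistant is driven by commands, while the active one is driven by a sensor stream, so it can act when the user has said nothing at all.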
Jensen Huang Accompanies Trump on a UK Visit: A $2.6 Billion Bet on British AI, with Autonomous-Driving Company Wayve Potentially Getting Another $500 Million
Sohu Finance · 2025-09-20 09:57
Core Insights
- NVIDIA CEO Jensen Huang announced a £2 billion (approximately $2.6 billion) investment in the UK to catalyze the AI startup ecosystem and accelerate the creation of new companies and jobs in the AI sector [1].
- Wayve, a UK-based autonomous driving startup, is expected to receive one-fifth of this investment; NVIDIA is evaluating a $500 million investment in Wayve's upcoming funding round [1][2].
- Wayve's upcoming Gen 3 hardware platform will be built on NVIDIA's DRIVE AGX Thor in-vehicle computing platform [1].

Company Overview
- Wayve was founded in 2017 with the mission of reimagining autonomous mobility through embodied AI [3].
- The company has pursued a distinctive technology path centered on embodied AI and end-to-end deep learning, setting it apart from mainstream autonomous driving companies [3][8].
- Wayve was the first company in the world to deploy an end-to-end deep-learning driving system on public roads [3].

Technology and Innovation
- Embodied AI lets a system learn tasks through direct interaction with the physical environment, in contrast to traditional systems built on manually coded rules [8].
- Wayve's end-to-end model, referred to as AV2.0, integrates deep neural networks with reinforcement learning, mapping raw sensor data directly to vehicle control commands (a toy sketch of this input-to-control shape follows the entry) [8][10].
- To address the explainability challenges of end-to-end models, Wayve developed LINGO-2, which uses visual and language inputs to predict driving behavior and explain its actions [10][12].

Data and Training
- Wayve built GAIA-2, a video-generation world model designed for autonomous driving that produces realistic driving scenarios from structured inputs [14][15].
- GAIA-2 is trained on a large dataset spanning varied geographies and driving conditions, enabling effective training without exhaustive real-world driving data [16][17].
- Its ability to simulate edge cases improves training efficiency and scalability [18].

Strategic Partnerships
- Wayve's technology does not depend on high-definition maps and is hardware-agnostic, making it compatible with various sensor suites and vehicle platforms [20].
- The company has partnerships with Nissan and Uber to test its autonomous driving technology [20].

Leadership and Team
- Wayve's leadership includes experienced professionals from leading companies in the autonomous driving sector, strengthening its strategic direction and technical capability [25][26].
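As a toy illustration of the AV2.0 idea of one differentiable map from raw pixels to control, here is a minimal PyTorch sketch. The architecture, sizes, and output convention are assumptions made for illustration and bear no relation to Wayve's actual networks.

```python
import torch
import torch.nn as nn

# Hypothetical toy in the AV2.0 spirit: raw camera pixels in,
# steering/throttle out, one differentiable network end to end.

class TinyDrivingPolicy(nn.Module):
    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(        # raw pixels -> features
            nn.Conv2d(3, 16, 5, stride=4), nn.ReLU(),
            nn.Conv2d(16, 32, 5, stride=4), nn.ReLU(),
            nn.Flatten(),
        )
        self.head = nn.Sequential(           # features -> control
            nn.LazyLinear(64), nn.ReLU(),
            nn.Linear(64, 2), nn.Tanh(),     # [steering, throttle] in [-1, 1]
        )

    def forward(self, frames: torch.Tensor) -> torch.Tensor:
        return self.head(self.encoder(frames))

policy = TinyDrivingPolicy()
batch = torch.rand(4, 3, 128, 256)           # four fake camera frames
controls = policy(batch)
print(controls.shape)                        # torch.Size([4, 2])
```

Because there is no hand-defined perception output between pixels and control, training signals (imitation or reinforcement) can shape every layer at once; the explainability gap this creates is what the entry says LINGO-2 is meant to address.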
Robots Crossing the "Three Gates": Realities and Trends as Witnessed by Embodied-Intelligence Innovators
Xinhuanet · 2025-09-15 08:08
Core Insights
- The humanoid-robot industry shows a dichotomy: rapid advances in capability alongside serious challenges in commercial viability, with a notable gap between technical achievements and actual orders received [1][5][41].
- Investment in humanoid robotics has surged, with more than 20 companies in the sector moving toward IPOs, marking a pivotal year for mass production [1][12].
- The development of embodied intelligence is at a crossroads, requiring a balance between technological innovation and practical application in real-world scenarios [1][18].

Group 1: Industry Developments
- The first city-level operational humanoid-robot demonstration zone opened in Beijing, featuring a robot-operated unmanned supermarket, a significant step toward integrating humanoid robots into daily life [5].
- Companies like Beijing Galaxy General Robotics are leading deployment across industrial and retail settings, with plans to open 100 smart pharmacies nationwide [12][41].
- The industry is shifting from merely showcasing capabilities to pursuing applications that can generate revenue and sustain growth [1][41].

Group 2: Technological Challenges
- The primary challenge is operating autonomously without remote control, which depends on advanced models that generalize across scenarios [7][13].
- Data quality and diversity are critical to improving capability, with a focus on using high-quality synthetic data to train models effectively [15][33].
- Current models are not fully mature, and the industry still lacks a unified model architecture able to handle the complexity of the physical world [27][34].

Group 3: Market Dynamics
- The market faces a chicken-and-egg dilemma: a lack of orders hampers technological iteration, while immature technology prevents winning orders [41].
- Costs remain high, with individual units exceeding 100,000 yuan, making humanoid robots less competitive than traditional labor in industrial settings [46][47].
- Focus is shifting toward household applications as the ultimate goal, on the belief that humanoid robots' true value lies in versatility and the ability to create new ecosystems [47].
Before π0.5 Went Open Source, a Powerful End-to-End Unified Foundation Model Was Also Open-Sourced in China, with Strong Generalization and Long-Horizon Manipulation
具身智能之心 · 2025-09-11 02:07
Core Viewpoint
- The article covers the releases of π0.5 and WALL-OSS, highlighting their advances in embodied intelligence and their significance for the robotics industry, particularly for task execution in complex environments [1][3][5].

Group 1: Model Capabilities
- π0.5 demonstrates enhanced generalization through heterogeneous-task collaborative training, enabling robots to perform long-horizon, fine-grained operations in new household environments [3][5].
- WALL-OSS achieves embodied perception through large-scale multimodal pre-training, integrating instruction reasoning, sub-goal decomposition, and fine-grained action synthesis within a single differentiable framework (a toy sketch of this coupling follows the entry) [8][18].
- The model achieves high success rates on complex long-horizon manipulation tasks, with robust instruction-following and understanding of complex scenes, surpassing existing baseline models [8][18][28].

Group 2: Training and Data
- WALL-OSS training proceeds through discrete, continuous, and joint phases, and requires only RTX 4090-level compute for training and inference deployment [14][15].
- A multi-source dataset centered on embodied tasks was constructed to address the lack of large-scale, aligned VLA supervision and the spatial-understanding gaps of current vision-language models [20][22].
- The dataset spans thousands of hours, covering both short-horizon manipulation tasks and long-horizon reasoning tasks, ensuring comprehensive training coverage [20][22][24].

Group 3: Experimental Analysis
- Experiments on embodied visual question answering and six robotic manipulation tasks evaluated language-instruction understanding, reasoning and generalization, and the planning and execution of long-horizon, multi-stage tasks [25][31].
- WALL-OSS significantly outperformed its original baseline on object grounding, scene captioning, and action planning, demonstrating stronger scene understanding [27][28].
- It followed novel instructions without task-specific fine-tuning, achieving 85% average task progress on known-object instructions and 61% on novel-object instructions [29][31].

Group 4: Industry Impact
- WALL-OSS and π0.5 target known limitations in vision-language models and embodied understanding, paving the way for more capable and versatile robotic systems [5][8][20].
- The company behind WALL-OSS, founded in December 2023, is building a general embodied-intelligence model from real-world data, aiming to create robots with fine manipulation capabilities [39].
- A recently completed A+ financing round of nearly 1 billion yuan signals strong investor confidence in the company's direction and potential industry impact [39].
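Here is a toy PyTorch sketch of what "a single differentiable framework" coupling instruction reasoning, sub-goal decomposition, and action synthesis can look like. All module shapes, names, and dimensions are invented for illustration; this is not the released WALL-OSS code.

```python
import torch
import torch.nn as nn

# Hypothetical toy: every stage is a tensor op, so gradients from the
# final action loss flow back through sub-goal decomposition into the
# fused vision-language state.

class UnifiedVLAPipeline(nn.Module):
    def __init__(self, d=64, n_subgoals=4, action_dim=7):
        super().__init__()
        self.fuse = nn.Linear(2 * d, d)                   # vision + language -> shared state
        self.subgoal_head = nn.Linear(d, n_subgoals * d)  # decompose into sub-goal embeddings
        self.action_head = nn.Linear(d, action_dim)       # per-sub-goal action chunk
        self.n_subgoals, self.d = n_subgoals, d

    def forward(self, vision: torch.Tensor, language: torch.Tensor) -> torch.Tensor:
        state = torch.relu(self.fuse(torch.cat([vision, language], dim=-1)))
        subgoals = self.subgoal_head(state).view(-1, self.n_subgoals, self.d)
        return self.action_head(subgoals)                 # (batch, n_subgoals, action_dim)

model = UnifiedVLAPipeline()
vision, language = torch.rand(2, 64), torch.rand(2, 64)
actions = model(vision, language)
print(actions.shape)  # torch.Size([2, 4, 7])
```

The contrast with a two-stage design is that nothing here is a discrete handoff: a planner that emitted text sub-goals for a separate controller would break the gradient path that this single differentiable pipeline preserves.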
Dissecting Huawei Qiankun ADS 4: Amid the World-Model Melee, How Does the Top Student Clear the Level?
Core Viewpoint
- The article traces the evolution of autonomous driving technology, emphasizing the shift from traditional end-to-end models to world models that let vehicles understand and predict their environment more effectively [2][4][8].

Group 1: World Model Development
- A world model gives the vehicle predictive capability, moving beyond purely reactive responses to real-time stimuli [2][3].
- Huawei's ADS 4 system, launched in April 2025, marks a significant advance in high-level driving assistance, built on the self-developed WEWA architecture [3][4].
- By 2025, several tech companies, including Xiaopeng and SenseTime, are expected to adopt world models as a crucial step toward fully autonomous driving [4][8].

Group 2: Challenges in Autonomous Driving
- The industry has recognized that traditional end-to-end models, which lean heavily on human driving data, often yield suboptimal decisions and do not truly understand physical laws [6][7].
- Research indicates that low-precision training can limit model effectiveness, underscoring the need for better generalization in real-world scenarios [7].

Group 3: Competitive Landscape
- Huawei's share of the domestic pre-installed assisted-driving market is reported at 79.0%, keeping it the leading supplier [9].
- The company differentiates itself with a more fundamental approach to driving, emphasizing spatial reasoning over trend-following [9][10].

Group 4: Technological Innovations
- Huawei's world-model architecture pairs a cloud-based world engine with a vehicle-side behavior model, strengthening real-time reasoning and decision-making (a minimal sketch of this division of labor follows the entry) [12][14].
- The company generates training scenarios with a focus on extreme cases that are difficult to capture in real-world data [13][14].

Group 5: Implementation and Future Prospects
- Huawei's intelligent driving system is deployed in over 1 million vehicles across multiple manufacturers, enabling rapid feedback and continuous improvement [15].
- This large real-vehicle fleet supports the system's evolution and paves the way toward higher levels of autonomous driving capability [15].
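A minimal sketch of the cloud/vehicle division of labor described for the WEWA-style architecture. The scenario fields, sampling rule, and policy are invented for illustration; Huawei's actual world engine and behavior model are far more complex.

```python
import random

def world_engine(n: int, p_extreme: float = 0.3) -> list[dict]:
    """Cloud side (hypothetical): synthesize scenarios, deliberately
    oversampling extreme cases that real fleet data rarely captures."""
    scenarios = []
    for _ in range(n):
        extreme = random.random() < p_extreme
        scenarios.append({
            "cut_in_gap_m": random.uniform(2, 8) if extreme else random.uniform(15, 40),
            "extreme": extreme,
        })
    return scenarios

def behavior_model(scenario: dict) -> str:
    """Vehicle side (hypothetical): real-time decision over a scene,
    whether it came from real sensors or the synthetic engine."""
    return "hard_brake" if scenario["cut_in_gap_m"] < 10 else "keep_lane"

random.seed(0)
for s in world_engine(5):
    print(s["extreme"], behavior_model(s))
```

The design point is the split itself: scenario generation is expensive and offline, so it lives in the cloud, while the behavior model stays small enough to run in real time on the vehicle against both real and synthesized scenes.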