自动驾驶之心
Some Information on the Future Development of Li Auto's VLA
自动驾驶之心· 2025-11-10 03:36
Core Viewpoint
- The article discusses the future of Li Auto's VLA (Vision-Language-Action) model, emphasizing the plan to build a reinforcement learning closed loop by the end of 2025, which is expected to significantly improve user experience and vehicle performance [2][3].
Short-term Outlook
- Li Auto aims to establish a reinforcement learning closed loop by the end of 2025, and expects noticeable improvements in vehicle performance and user perception by early 2026 [2].
Mid-term Outlook
- Once the reinforcement learning closed loop is strengthened, Li Auto anticipates surpassing Tesla in the Chinese market, citing its advantages in closed-loop iteration [3].
- The shift brought by VLA's reinforcement learning is seen as a significant business change that creates a true competitive moat for the company; it will take 1-2 years to implement fully [3].
Long-term Outlook
- VLA is projected to achieve Level 4 autonomy, but new technologies are expected to emerge beyond it [4].
- Safety restrictions are currently in place to mitigate risk, and the system is designed to identify and address issues autonomously through data collection and training [4].
Key Insights on VLA
- Li Auto's leadership believes that the intelligence required for driving is relatively low and that, once business processes are reformed, on-vehicle compute requirements will not be excessively high [5][6].
- The company is targeting a balanced compute budget of roughly 1,000 to 2,000 TOPS on the vehicle side and 32 billion on the cloud side [6].
Organizational Adjustments
- Li Auto's autonomous driving department is being restructured to strengthen its business system rather than rely on individual talent, with a focus on an AI-oriented organization [12].
- The restructuring includes splitting existing teams into specialized departments to improve efficiency and innovation [12].
Competitive Landscape
- Li Auto's approach to VLA has faced skepticism from competitors, but the company views this as validation of its strategy [14].
- The article highlights the importance of data quality and distribution in achieving effective autonomous driving, and stresses that systems need human-like reasoning capabilities [18].
Strategic Focus
- The company is committed to delivering substantial functional upgrades and user experience improvements every quarter [18].
- Li Auto's leadership emphasizes communicating company strategy clearly in order to engage younger employees [18].
We Partnered on a Highly Cost-Effective 3D Scanner!
自动驾驶之心· 2025-11-10 03:36
Core Viewpoint
- The article introduces the GeoScan S1, a highly cost-effective handheld 3D laser scanner for industrial and research applications, emphasizing its capabilities for real-time 3D scene reconstruction and mapping.
Group 1: Product Features
- The GeoScan S1 has a lightweight design with one-button start, and achieves centimeter-level accuracy in real-time scene reconstruction [2][10].
- The device generates point clouds at 200,000 points per second, with a maximum measurement distance of 70 meters and 360° coverage, and supports large-scale scanning of areas over 200,000 square meters [2][30].
- It integrates multiple sensors, including a high-precision IMU and RTK, enabling high-accuracy mapping and data synchronization [35][39].
Group 2: Technical Specifications
- The GeoScan S1 runs Ubuntu and supports several data export formats, including .pcd and .las, with relative accuracy better than 3 cm and absolute accuracy better than 5 cm [23][28].
- The device measures 14.2 cm x 9.5 cm x 45 cm, weighs 1.3 kg without the battery and 1.9 kg with it, and accepts a power input of 13.8 V to 24 V [23][24].
- Its battery capacity is 88.8 Wh, giving approximately 3 to 4 hours of operating time [23][27].
Group 3: Market Positioning
- The GeoScan S1 is positioned as the most cost-effective handheld 3D laser scanner on the market, with a starting price of 19,800 yuan for the basic version [10][58].
- The product is backed by research and validation from teams at Tongji University and Northwestern Polytechnical University, with over a hundred projects demonstrating its capabilities [10][39].
Group 4: Application Scenarios - The GeoScan S1 is suitable for various environments, including office buildings, parking lots, industrial parks, tunnels, forests, and mining sites, effectively constructing 3D scene maps in complex settings [39][47]. - It supports cross-platform integration, making it adaptable for use with drones, unmanned vehicles, and robotic systems for automated operations [45][47].
Beyond Imitation Learning, How Can End-to-End Trajectories Be Optimized? A Leaderboard-Topping Work from QCraft (轻舟)......
自动驾驶之心· 2025-11-10 03:36
Core Insights
- The article discusses CATG, a new trajectory generation framework based on flow matching that addresses limitations of existing end-to-end autonomous driving systems [1][4][22].
- CATG scored 51.31 in the NAVSIM V2 challenge, demonstrating its effectiveness in trajectory planning and its robustness to out-of-distribution data [4][22].
Background Review
- End-to-end multimodal planning has become a key method in autonomous driving, significantly improving robustness and adaptability compared with single-trajectory prediction [3].
- Current multimodal methods often rely on imitation learning, which limits behavioral diversity because real trajectories contain too little strategy diversity [3][6].
- Various alternative strategies have been proposed to capture a broader distribution of reasonable trajectories, but many still struggle to integrate safety constraints directly into the generation process [3][6].
Proposed Framework
- CATG abandons imitation learning entirely and supports flexible injection of explicit constraints during generation [4][22].
- The framework integrates feasibility and safety constraints into the generation process through a progressive mechanism that uses prior perception anchor points [7][22].
- CATG allows a controllable trade-off between aggressive and conservative driving styles by using environmental reward signals as conditional inputs [7][13].
Experimental Results
- CATG was evaluated extensively in the NAVSIM V2 challenge, showing superior planning accuracy and robust generalization [4][14].
- Training had two phases: the first trained the flow matching process, and the second fine-tuned the energy matching process [18][22].
- The results show high compliance across metrics, including 100% drivable area compliance and 98.21% no-at-fault collisions in stage one [19].
Limitations
- The computational cost of generating trajectories through 100-step sampling remains high, and accelerating the sampling process may compromise trajectory quality [21].
Conclusion
- The article concludes that CATG is a significant advance in end-to-end planning for autonomous driving, incorporating flexible conditional signals and explicit constraints during trajectory generation [22].
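To make the sampling-cost limitation concrete, here is a minimal sketch of generic flow matching sampling: a sample starts as Gaussian noise and is moved by Euler integration of a velocity field over a fixed number of steps. This is not CATG's implementation. The learned network and its constraint and reward conditioning are replaced by a hypothetical closed-form velocity field (`toy_velocity`) that points toward a fixed target, and the waypoint values are invented for illustration.

```python
import random


def toy_velocity(x, t, target):
    """Hypothetical stand-in for a learned velocity field v(x, t).

    On the straight-line path x_t = (1 - t) * x_0 + t * x_1, the velocity
    that carries a point x at time t to the endpoint x_1 is (x_1 - x) / (1 - t).
    """
    return [(x1 - xi) / (1.0 - t) for xi, x1 in zip(x, target)]


def sample(target, num_steps=100, seed=0):
    """Euler-integrate dx/dt = v(x, t) from t = 0 (noise) to t = 1."""
    rng = random.Random(seed)
    x = [rng.gauss(0.0, 1.0) for _ in target]  # start from Gaussian noise
    dt = 1.0 / num_steps
    for k in range(num_steps):
        t = k * dt  # t stays below 1, so the division above is safe
        v = toy_velocity(x, t, target)  # one "network" evaluation per step
        x = [xi + dt * vi for xi, vi in zip(x, v)]
    return x


# Hypothetical longitudinal waypoints in meters, used only for illustration.
target = [0.0, 2.5, 5.0, 7.5, 10.0]
trajectory = sample(target, num_steps=100)
print([round(p, 3) for p in trajectory])
```

Each Euler step costs one evaluation of the velocity model, so 100-step sampling means 100 forward passes per trajectory. With a real learned field, fewer steps means coarser integration, which is the speed versus quality trade-off the summary describes.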
World Models and VLA Are Gradually Converging Toward Unification
自动驾驶之心· 2025-11-10 03:36
Core Viewpoint
- The integration of Vision-Language-Action (VLA) and World Model (WM) technologies is becoming increasingly evident, suggesting a trend toward their unification in autonomous driving systems [2][4][6].
Summary by Sections
VLA and WM Integration
- Recent discussions stress that VLA and WM should be seen as complementary rather than opposing technologies, and recent academic work supports their combined use [2][3].
- The DriveVLA-W0 project demonstrates the feasibility of integrating VLA with WM, indicating a path toward more advanced artificial general intelligence (AGI) [3].
Language and World Models
- Language models focus on abstract reasoning and high-level logic, while world models emphasize physical laws and low-level capabilities such as speed perception [3].
- Combining the two is essential for stronger embodied intelligence, and various academic explorations are already underway in this area [3].
Industry Trends and Future Directions
- The ongoing industry debate over VLA and WA is largely a matter of promotional terminology, since both approaches draw on similar technological foundations [6].
- Future autonomous driving training pipelines are expected to incorporate VLA, reinforcement learning (RL), and WM, all of which are crucial components [4][6].
Community and Knowledge Sharing
- The "Autonomous Driving Heart Knowledge Planet" community aims to provide a comprehensive knowledge-sharing platform for industry professionals and academics, supporting discussion of technical advances and career opportunities [9][22].
- The community has more than 4,000 members and aims to grow to nearly 10,000, offering resources such as learning routes, Q&A sessions, and job referrals [9][22].
Educational Resources
- The community offers a range of educational materials, including video tutorials and detailed learning paths for newcomers and experienced professionals, covering topics from end-to-end autonomous driving to multi-sensor fusion [17][23].
- Members can access open-source projects, datasets, and industry insights to build their understanding and skills in autonomous driving [23][41].
Recruiting Partners in 4D Annotation and World Models!
自动驾驶之心· 2025-11-08 16:03
Group 1
- The article emphasizes the growing demand for corporate training and job counseling in the autonomous driving sector, and the need for training programs that range from technology updates to summaries of industry developments [2].
- There is notable interest from individuals seeking guidance, particularly those who need help strengthening their resumes and project experience [3].
- The company is actively seeking to collaborate with professionals in the autonomous driving field on training services, course development, and research guidance [4].
Group 2
- The company offers competitive compensation and access to extensive industry resources, focusing on areas such as autonomous driving product management, data annotation, world models, and reinforcement learning [5].
- Training collaborations primarily target enterprises, universities, and research institutions, as well as students and job seekers [6].
- Interested parties are encouraged to reach out via WeChat for further consultation [7].
UniSplat, the Latest Feed-Forward 3D Reconstruction Algorithm from DiDi and CUHK! With Shaoshuai Shi's Participation~
自动驾驶之心· 2025-11-08 16:03
Core Insights
- The article introduces UniSplat, a novel feed-forward framework for dynamic scene reconstruction in autonomous driving that addresses the difficulties existing methods face with sparse camera views and dynamic environments [6][44].
Group 1: Background and Challenges
- Reconstructing 3D scenes from urban driving scenarios is a core capability for autonomous driving systems, supporting tasks such as simulation and scene understanding [5].
- Recent advances in 3D Gaussian splatting have shown impressive rendering efficiency and fidelity, but existing methods often assume significant overlap between input images, which limits their use in real-time driving scenarios [5][6].
- The challenges include maintaining a unified latent representation over time, handling partial observations and occlusions, and efficiently generating high-fidelity Gaussian primitives from sparse inputs [5][6].
Group 2: UniSplat Framework
- UniSplat models dynamic scenes with a unified 3D scaffold that integrates multi-view spatial information and multi-frame temporal information [6][9].
- The framework operates in three stages: constructing a 3D scaffold from multi-view images, performing spatio-temporal fusion, and decoding the fused scaffold into Gaussian primitives [6][9].
- The dual-branch decoder improves detail retention and scene completeness by predicting Gaussian primitives from both sparse point locations and voxel centers [6][9].
Group 3: Experimental Results
- Evaluations on the Waymo Open and NuScenes datasets show that UniSplat achieves state-of-the-art performance in both input view reconstruction and novel view synthesis [7][34].
- Thanks to its temporal memory mechanism, the model remains robust and renders with high quality when synthesizing views outside the original camera coverage [7][34].
- Comparative results indicate that UniSplat consistently outperforms existing methods such as MVSplat and DepthSplat across all metrics [33][34].
Group 4: Conclusion and Future Directions
- UniSplat is a significant advance in dynamic scene reconstruction and novel view synthesis, providing a robust framework for integrating spatio-temporal information from multi-camera video [44].
- Its potential applications extend to dynamic scene understanding, interactive 4D content creation, and lifelong world modeling [44].
The 36 People at NVIDIA Who Report to Jensen Huang
自动驾驶之心· 2025-11-08 16:03
Core Viewpoint
- The article discusses NVIDIA's organizational structure and strategic positioning under CEO Jensen Huang, highlighting the importance of hardware and AI to the company's growth, as well as recent personnel changes that signal a shift toward a more structured management approach.
Group 1: Organizational Structure
- Jensen Huang has 36 direct reports, divided into seven functional areas: strategy, hardware, software, AI, public relations, networking, and an executive assistant [2][4].
- Nine of them focus on hardware, indicating that hardware remains a cornerstone of NVIDIA's business [8][9].
- Three public relations executives report directly to Huang, in sharp contrast to other tech leaders, which underlines NVIDIA's need for a systematic external communication strategy [13][16].
Group 2: Key Personnel
- Jonah Alben, a long-time NVIDIA leader, is regarded as the "soul of GPU architecture"; he has been with the company for 28 years and oversees GPU design and development [24][25][32].
- Dwight Diercks, another veteran with 31 years at NVIDIA, manages the software engineering team and has played a crucial role in the company's software development [33][38].
- Bill Dally, NVIDIA's chief scientist, moved from academia to NVIDIA and has contributed significantly to the evolution of GPUs and AI hardware architecture [43][48].
Group 3: New Additions
- Wu Xinzhou, the only Chinese executive reporting directly to Huang, is now Vice President of Automotive Business at NVIDIA, responsible for strategic planning and product layout [57][58].
- Wu's experience at Qualcomm and XPeng Motors positions him well to drive NVIDIA's automotive business, whose revenue grew from $281 million to $567 million between fiscal years 2024 and 2025 [72][73].
Group 4: Management Philosophy - Huang's management style emphasizes a flat organizational structure to enhance information flow and decision-making speed, which is increasingly challenged by the company's rapid growth [81][105]. - The company has seen a significant increase in employee count, from 29,600 to 36,000 in just one year, indicating a shift in management dynamics [101][115]. - Huang's approach to leadership is characterized by a high-pressure culture, focusing on task completion and performance, which has led to a demanding work environment [118][125].
Laid Off: Most Lose Out Because Their Salary Is Too High!
自动驾驶之心· 2025-11-08 16:03
Core Insights
- In the current job market, layoffs prioritize cost over capability, with companies focused on cutting expenses rather than retaining high-performing employees [3][6][7].
- Companies have developed strategies for handling product quality issues by silencing dissent rather than fixing the problems, often leading to the dismissal of higher-paid employees [5][6].
- Employees are advised to seek alternative opportunities proactively and prepare for potential layoffs, because the corporate environment has shifted to one where cost is the primary concern [6][7].
Group 1: Layoff Trends
- Companies continue to conduct layoffs, focusing on cost reduction rather than on evaluating employee performance [3].
- High-performing employees are often the first to be laid off if their salaries are deemed too high, regardless of their contributions [3][5].
Group 2: Corporate Strategies
- Some companies use legal tactics to shield themselves from accountability, prioritizing the silencing of employees who raise concerns over resolving the issues [5].
- The trend of "killing the donkey once the milling is done" describes companies cutting staff as soon as projects are completed and favoring lower-cost employees for maintenance roles [5][6].
Group 3: Employee Advice
- Employees are encouraged to recognize that the company treats them as a cost and to take proactive steps to secure their future employment [7].
- In the current environment, employees need to prepare for potential job loss in advance rather than wait until it happens to look for alternatives [6][7].
Laid Off: Most Lose Out Because Their Salary Is Too High!
自动驾驶之心· 2025-11-08 12:35
Group 1
- Layoffs are ongoing in the current job market, with companies prioritizing cost over employee capability when deciding whom to retain [3][6].
- High-performing employees are often the first to be laid off because of their higher salaries, as companies focus on reducing costs rather than keeping talent [3][5].
- Companies have developed strategies for handling product quality issues by silencing dissent rather than fixing the problems, a sign that workplace dynamics now put cost ahead of employee value [5][6].
Group 2
- The changed workplace logic holds that cost matters more than capability, which pushes employees to think proactively about their job security and possible alternatives [6][7].
- Companies that do not respect their employees may ultimately face consequences from customers, because service quality cannot be maintained through cost-cutting alone [7].
UniSplat, the Latest Feed-Forward 3D Reconstruction Algorithm from DiDi and CUHK! With Shaoshuai Shi's Participation~
自动驾驶之心· 2025-11-08 12:35
Core Viewpoint
- The article introduces UniSplat, a novel feed-forward framework for dynamic scene reconstruction in autonomous driving that integrates spatio-temporal information from multi-camera video inputs to improve the robustness and quality of 3D scene reconstruction [6][44].
Background Review
- Reconstructing 3D scenes from urban driving scenarios has become a core capability for autonomous driving systems, supporting critical tasks such as simulation, scene understanding, and long-horizon planning [5].
- Recent advances in 3D Gaussian splatting have shown impressive rendering efficiency and fidelity, but existing methods often assume significant overlap between input images and rely on per-scene optimization, which limits their use in real-time driving scenarios [5][6].
UniSplat Overview
- UniSplat addresses the challenge of robust reconstruction in dynamic driving scenes by constructing a unified 3D Scaffold that integrates multi-view spatial information and multi-frame temporal information [6][9].
- The framework operates in three stages: building a 3D Scaffold from multi-view images, performing spatio-temporal fusion, and decoding the fused Scaffold into Gaussian primitives [6][9].
Experimental Results
- Evaluations on the Waymo Open and NuScenes datasets show that UniSplat achieves state-of-the-art performance in both input view reconstruction and novel view synthesis, with strong robustness and high rendering quality even for views outside the original camera coverage [7][34].
- On the Waymo dataset, UniSplat outperforms existing methods such as MVSplat and DepthSplat across all metrics, achieving a PSNR of 25.37 and an SSIM of 0.765 [34].
- The model distinguishes dynamic from static elements in a scene and mitigates ghosting artifacts during scene completion [40].
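For readers unfamiliar with the PSNR figure quoted above, the sketch below shows the standard definition, PSNR = 10 * log10(MAX^2 / MSE), on flat lists of pixel intensities in [0, 1]. It is a generic illustration rather than the evaluation code used for UniSplat; the pixel values and the helper `mse_for_psnr` are hypothetical.

```python
import math


def mse(reference, rendered):
    """Mean squared error between two equally sized pixel lists."""
    if len(reference) != len(rendered):
        raise ValueError("images must have the same number of pixels")
    return sum((a - b) ** 2 for a, b in zip(reference, rendered)) / len(reference)


def psnr(reference, rendered, max_value=1.0):
    """Peak signal-to-noise ratio in dB; infinite for identical images."""
    err = mse(reference, rendered)
    if err == 0:
        return math.inf
    return 10.0 * math.log10(max_value ** 2 / err)


def mse_for_psnr(psnr_db, max_value=1.0):
    """Invert the PSNR formula: the MSE that corresponds to a given PSNR."""
    return max_value ** 2 / 10 ** (psnr_db / 10.0)


# Hypothetical pixels: every rendered value is off by 0.1, so MSE is 0.01.
reference = [0.2, 0.4, 0.6, 0.8]
rendered = [0.3, 0.5, 0.7, 0.9]
print(f"PSNR: {psnr(reference, rendered):.2f} dB")
print(f"MSE implied by 25.37 dB: {mse_for_psnr(25.37):.5f}")
```

Under this definition, and assuming intensities normalized to [0, 1], the reported 25.37 dB corresponds to a mean squared error of about 0.0029. Higher PSNR means lower pixel-wise error.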
Methodology Details
- The 3D Scaffold is constructed by inferring geometric structure with a geometric backbone model and supplementing it with semantic information from a visual backbone model [14][16].
- A dual-branch decoder generates dynamic-aware Gaussian primitives, improving detail retention and scene completeness [23][27].
- The framework includes a memory mechanism that accumulates static Gaussian representations over time, enabling long-term scene completion [29][31].
Conclusion
- UniSplat is a significant advance in dynamic scene understanding and interactive 4D content creation, providing a robust foundation for future research in lifelong world modeling and autonomous driving applications [44].