Autonomous Driving
Another wave of autonomous driving papers has been accepted to ICCV 2025, and we've spotted some shifting trends...
自动驾驶之心· 2025-08-16 00:03
Core Insights
- The article discusses the latest trends and research directions in the field of autonomous driving, highlighting the integration of multimodal large models and vision-language action generation as key areas of focus for both academia and industry [2][5].

Group 1: Research Directions
- The research community is concentrating on several key areas, including the combination of MoE (Mixture of Experts) with autonomous driving, benchmark development for autonomous driving, and trajectory generation using diffusion models [2].
- Closed-loop simulation and world models are emerging as critical needs in autonomous driving, driven by the limitations of real-world open-loop testing. This approach aims to reduce costs and improve model iteration efficiency [5].
- There is a notable emphasis on performance improvement in object detection and OCC (occupancy prediction), with many ongoing projects exploring specific pain points and challenges in these areas [5].

Group 2: Notable Projects and Publications
- "ORION: A Holistic End-to-End Autonomous Driving Framework by Vision-Language Instructed Action Generation" is a significant project from Huazhong University of Science and Technology and Xiaomi, focusing on integrating vision and language for action generation in autonomous driving [5].
- "All-in-One Large Multimodal Model for Autonomous Driving" is another important work, from Sun Yat-sen University and Meituan, contributing to the development of comprehensive models for autonomous driving [6].
- "MCAM: Multimodal Causal Analysis Model for Ego-Vehicle-Level Driving Video Understanding" from Chongqing University aims to enhance understanding of driving scenarios through multimodal analysis [8].

Group 3: Simulation and Reconstruction
- "Dream-to-Recon: Monocular 3D Reconstruction with Diffusion-Depth Distillation from Single Images" from TUM focuses on advanced reconstruction techniques for autonomous driving [14].
- "CoDa-4DGS: Dynamic Gaussian Splatting with Context and Deformation Awareness for Autonomous Driving" from Fraunhofer IVI and TU Munich is another notable work that addresses dynamic scene reconstruction [16].

Group 4: Trajectory Prediction and World Models
- "Foresight in Motion: Reinforcing Trajectory Prediction with Reward Heuristics" from the Hong Kong University of Science and Technology and DiDi emphasizes the importance of trajectory prediction in autonomous driving [29].
- "World4Drive: End-to-End Autonomous Driving via Intention-aware Physical Latent World Model" from the Chinese Academy of Sciences focuses on developing a comprehensive world model for autonomous driving [32].
The tech-obsessed "Huangpu Military Academy" of autonomous driving now has 4,000 members!
自动驾驶之心· 2025-08-15 14:23
Core Viewpoint
- The article emphasizes the establishment of a comprehensive community focused on autonomous driving, aiming to bridge the gap between academia and industry while providing valuable resources for learning and career opportunities in the field [2][16].

Group 1: Community and Resources
- The community has created a closed-loop system covering various fields such as industry, academia, job seeking, and Q&A exchanges, enhancing the learning experience for participants [2][3].
- The platform offers cutting-edge academic content, industry roundtables, open-source code solutions, and timely job information, significantly reducing the time needed for research [3][16].
- Members can access nearly 40 technical routes, including industry applications, VLA benchmarks, and entry-level learning paths, catering to both beginners and advanced researchers [3][16].

Group 2: Learning and Development
- The community provides a well-structured learning path for beginners, including foundational knowledge in mathematics, computer vision, deep learning, and programming [10][12].
- For those already engaged in research, valuable industry frameworks and project proposals are available to further their understanding and application of autonomous driving technologies [12][14].
- Continuous job sharing and career opportunities are promoted within the community, fostering a complete ecosystem for autonomous driving [14][16].

Group 3: Technical Focus Areas
- The community has compiled extensive resources on various technical aspects of autonomous driving, including perception, simulation, planning, and control [16][17].
- Specific learning routes are available for topics such as end-to-end learning, 3DGS principles, and multi-modal large models, ensuring comprehensive coverage of the field [16][17].
- The platform also features a collection of open-source projects and datasets relevant to autonomous driving, facilitating hands-on experience and practical application [32][34].
WeRide Secures Strategic Equity Investment from Grab, Partners to Deploy Robotaxis and Autonomous Shuttles in Southeast Asia
Globenewswire· 2025-08-15 09:18
Core Insights
- WeRide has announced a strategic equity investment from Grab to accelerate the deployment of Level 4 Robotaxis and shuttles in Southeast Asia, aiming to integrate WeRide's autonomous vehicles into Grab's network for improved service and safety [2][3][5]

Investment and Partnership
- Grab's investment is expected to be finalized by the first half of 2026, contingent on customary closing conditions and WeRide's preferred timing, supporting WeRide's growth strategy in Southeast Asia [3]
- This partnership builds on a prior Memorandum of Understanding signed in March 2025, focusing on the technical feasibility, commercial viability, and job creation potential of autonomous vehicles in the region [8]

Operational Integration
- The collaboration will establish a framework for deploying autonomous solutions across Grab's network, enhancing operational efficiency and scalability [4]
- WeRide will integrate its autonomous driving technology into Grab's fleet management, vehicle matching, and routing ecosystem [4][12]

Vision and Goals
- WeRide aims to deploy thousands of Robotaxis in Southeast Asia, aligning with local regulations and societal readiness, leveraging Grab's regional expertise in ride-hailing and digital services [5]
- Grab emphasizes the need for reliable transportation in Southeast Asia, particularly in areas with driver shortages, and plans to test WeRide's vehicles in diverse environments to adapt the technology for regional needs [6]

Technical Collaboration
- The partnership will focus on optimizing dispatch and routing, maximizing vehicle uptime, measuring safety performance, remote monitoring, customer support, and training for driver-partners and local communities [12]
Bulls and Bears Spar over Robotaxi: "Cathie Wood" Builds a Position as Institutions Diverge
Di Yi Cai Jing· 2025-08-15 03:45
Bullish and bearish voices intertwine, and together they push autonomous driving technology toward maturity.

Since the start of this year, Robotaxi (autonomous taxi) services have drawn broad attention from global capital markets, but skepticism has arrived right on schedule.

Recently, Cathie Wood's ARK funds spent roughly US$12.9 million buying shares of Pony.ai (NASDAQ: PONY), the first time her flagship funds have held a Chinese autonomous driving stock. Wood, whom Wall Street has dubbed the "female Warren Buffett," favors high-growth, high-risk positions held for the long term.

Another leading Chinese Robotaxi company, WeRide (NASDAQ: WRD), saw its Robotaxi business grow 836.7% year over year in the second quarter; as early as May this year, the company disclosed that Uber had committed an additional US$100 million investment in it.

When this reporter recently tried Baidu's Apollo Go (Luobo Kuaipao) Robotaxi in Guangzhou, peak-hour waits stretched to an hour with no car accepting the order. Asked how many vehicles were operating near the pickup point, Apollo Go customer service replied: "The number of serviceable vehicles in a city is not fixed; it is adjusted dynamically based on many factors." According to nearby residents and merchants, Apollo Go waits during the evening rush exceed 40 minutes.

Undeniably, Robotaxi dispatch and waiting times are currently longer than those of human-driven ride-hailing, a problem the industry still needs to solve.

Han Xu said that when an autonomous driving company expands into a new city ...
Horizon Robotics & Tsinghua's Epona: An Autoregressive End-to-End World Model
自动驾驶之心· 2025-08-12 23:33
Core Viewpoint
- The article discusses a unified framework for autonomous driving world models that can generate long-term, high-resolution video while providing real-time trajectory planning, addressing limitations of existing methods [5][12].

Group 1: Existing Methods and Limitations
- Current diffusion models, such as Vista, can only generate fixed-length videos (≤15 seconds) and struggle with flexible long-term prediction (>2 minutes) and multi-modal trajectory control [7].
- GPT-style autoregressive models, like GAIA-1, can extend indefinitely but require discretizing images into tokens, which degrades visual quality and lacks continuous action trajectory generation capabilities [7][13].

Group 2: Proposed Methodology
- The proposed world model uses a series of forward-camera observations and the corresponding driving trajectories to predict future driving dynamics [10].
- The framework decouples spatiotemporal modeling, using causal attention in a GPT-style transformer for temporal reasoning and a dual diffusion transformer for spatial rendering and trajectory generation [12].
- An asynchronous multimodal generation mechanism allows parallel generation of 3-second trajectories and the next frame image, achieving 20 Hz real-time planning with a 90% reduction in inference compute [12].

Group 3: Model Structure and Training
- The Multimodal Spatiotemporal Transformer (MST) encodes past driving scenes and action sequences, enhancing temporal position encoding for implicit representation [16].
- The Trajectory Planning Diffusion Transformer (TrajDiT) and Next-frame Prediction Diffusion Transformer (VisDiT) handle trajectory and image predictions, respectively, with a focus on action control [21].
- A chain-of-forward training strategy mitigates the "drift problem" in autoregressive inference by simulating prediction noise during training [24].

Group 4: Performance Evaluation
- The model demonstrates superior performance in video generation metrics, achieving an FID score of 7.5 and an FVD score of 82.8, outperforming several existing models [28].
- In trajectory control metrics, the proposed method achieves a high accuracy rate of 97.9% in comparison to other methods [34].

Group 5: Conclusion and Future Directions
- The framework integrates image generation and vehicle trajectory prediction with high quality, showing strong potential for applications in closed-loop simulation and reinforcement learning [36].
- However, the current model is limited to single-camera input, indicating a need to address multi-camera consistency and point-cloud generation challenges in the autonomous driving field [36].
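The chain-of-forward idea can be sketched in miniature: train a model not only on ground-truth inputs but also on its own rolled-out predictions, so it learns on the drifted inputs it will actually see during autoregressive inference. The one-dimensional linear "world model," dynamics, and hyperparameters below are invented for illustration and are unrelated to Epona's actual implementation:

```python
# Toy chain-of-forward training: a linear model x_next = w*x + b is trained
# not only on ground-truth states (teacher forcing) but on its OWN rolled-out
# predictions, so it is exposed to the drift it will face at inference time.
# All dynamics and hyperparameters here are invented for illustration.

def make_trajectory(steps=40):
    # Ground-truth world dynamics: x_{t+1} = 0.9 * x_t + 1.0, starting at 0
    xs = [0.0]
    for _ in range(steps):
        xs.append(0.9 * xs[-1] + 1.0)
    return xs

def train_chain_of_forward(chain_len=4, epochs=200, lr=0.005):
    traj = make_trajectory()
    w, b = 0.0, 0.0
    for _ in range(epochs):
        for t in range(len(traj) - chain_len):
            x = traj[t]  # start each chain from a ground-truth state
            for k in range(chain_len):
                pred = w * x + b
                err = pred - traj[t + k + 1]  # compare against ground truth
                w -= lr * err * x             # SGD step on squared error
                b -= lr * err
                x = pred  # feed the prediction back in: the "chain"
    return w, b

w, b = train_chain_of_forward()
print(w, b)  # approaches the true dynamics (0.9, 1.0)
```

Because the rollout starts from ground truth but continues on predictions, the exact dynamics are a fixed point of the update, while early in training the model sees and learns to correct its own compounding error.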
Pony Ai (PONY) - 2025 Q2 - Earnings Call Transcript
2025-08-12 13:02
Financial Data and Key Metrics Changes
- Total revenues for Q2 reached $21.5 million, a 76% increase year over year, driven by strong growth in robotaxi services and licensing applications [39][41]
- Robotaxi service revenues grew to $1.5 million, reflecting a 158% year-over-year increase, with fare-charging revenues expanding by over 300% [39][40]
- Gross margin improved to 16.1%, with gross profit of $3.5 million in Q2 [42]
- Net loss for Q2 was $53.3 million, up from $30.9 million in the same period last year [44]

Business Line Data and Key Metrics Changes
- Robotaxi service revenues surged by 158% year over year, with fare-charging revenues growing more than 300% [15][39]
- Licensing and application revenues reached $10.4 million, a 902% increase year over year [41]
- Global truck services revenue decreased by 10% year over year [41]

Market Data and Key Metrics Changes
- Registered users surged by 136% year over year in Q2, with a user satisfaction rate above 4.8 out of 5 [8][17]
- The company operates across 2,000 square kilometers in Tier-1 cities in China, significantly expanding its market reach [56]

Company Strategy and Development Direction
- The company aims for mass production of Gen seven robotaxis, targeting over 1,000 vehicles by year-end 2025 [7][23]
- A strategic partnership with Hehu Group aims to deploy over 1,000 robotaxis in Shenzhen [16]
- The focus is on scaling up operations and enhancing user experience to drive higher demand [23][56]

Management's Comments on Operating Environment and Future Outlook
- Management expressed confidence in achieving positive unit economics for Gen seven vehicles, citing significant cost reductions and operational efficiencies [51]
- The company is well positioned for large-scale commercialization, with a solid plan and execution strategy in place [45][36]

Other Important Information
- The company has secured Shanghai's first fully driverless commercial license, enabling operations in all four Tier-1 cities [18][32]
- The BOM (bill of materials) cost of Gen seven robotaxis has been reduced by 70% compared to previous generations [51]

Q&A Session Summary
- Question: Production plan throughout 2025. Management confirmed they are on track to exceed 1,000 robotaxi vehicles by year-end, with over 200 already produced [47][49]
- Question: Key drivers behind robotaxi revenue growth. Management highlighted user adoption, demand in Tier-1 cities, and an increased fleet as key drivers of revenue growth [55][56]
- Question: Impact of government comments on the L4 robotaxi industry. Management noted that recent comments clarify the distinction between L2 and L4 systems, which is beneficial for public understanding and safety standards [60][62]
- Question: Key technical requirements for new market expansion. Management emphasized the ability to handle corner cases and the robustness of their software system as critical for entering new geographies [66][68]
- Question: Timetable for a potential Hong Kong IPO. Management refrained from commenting on market speculation but stated they are monitoring market conditions closely [73][74]
- Question: Future plans for overseas market expansion. Management outlined a focus on markets with strong mobility demand and supportive regulatory environments, with ongoing operations in Dubai, South Korea, and Luxembourg [78][80]
Pony Ai (PONY) - 2025 Q2 - Earnings Call Presentation
2025-08-12 12:00
Key Highlights & Growth
- Pony AI produced over 200 Gen-7 vehicles as of August 11, 2025 [7]
- The company aims to produce over 1,000 vehicles by the end of 2025 [7, 22]
- Registered users grew by 136% year over year from 2Q24 to 2Q25 [7, 32]
- Total revenue grew by 76% in 2Q25 [7]
- Fare-charging revenue grew by over 300% in 2Q25 [7, 35, 70]

Commercialization & Operations
- Pony AI is the only company with fully driverless commercial licenses in all four Tier-1 cities in China (Beijing, Shanghai, Guangzhou, Shenzhen) [20, 31]
- Robotaxis receive approximately 15 orders per vehicle per day on average [20]
- Accumulated autonomous driving mileage reached 48.8 million+ kilometers as of June 30, 2025 [36]
- Accumulated fully driverless mileage reached 8.7 million+ kilometers as of June 30, 2025 [36]

Financial Performance
- Robotaxi services revenue increased by 157.8%, from $0.6 million in 2Q24 to $1.5 million in 2Q25 [65]
- Licensing and applications revenue increased by 901.8% year over year to $10.4 million in 2Q25, with total revenue growing from $12.2 million in 2Q24 to $21.5 million in 2Q25 [69]
With End-to-End Models Dominant, Is Trajectory Prediction Still Worth Researching?
自动驾驶之心· 2025-08-12 08:05
Core Viewpoint
- The article discusses the ongoing relevance of trajectory prediction in the era of end-to-end models, noting that many companies still use layered stacks in which trajectory prediction remains a key algorithmic focus. It emphasizes the significance of multi-agent trajectory prediction methods based on diffusion models, which are gaining traction in applications such as autonomous driving and intelligent monitoring [1][2].

Group 1: Trajectory Prediction Research
- Despite the rise of end-to-end models, trajectory prediction continues to be a hot research area, with significant output in conferences and journals [1].
- Multi-agent trajectory prediction aims to forecast future movements based on the historical trajectories of multiple interacting agents, which is crucial in fields like autonomous driving and robotics [1].
- Traditional methods often struggle with the uncertainty and multimodality of human behavior, while generative models like GANs and CVAEs, although capable of modeling multimodal distributions, lack efficiency [1].

Group 2: Diffusion Models
- Diffusion models have emerged as a class of models that generate complex distributions through gradual denoising, with significant breakthroughs in image generation and other fields [2].
- The Leapfrog Diffusion Model (LED) enables real-time prediction by reducing the number of denoising steps, achieving a 19-30x speedup while improving accuracy on various datasets [2].
- Mixed Gaussian Flow (MGF) and the pattern-memory-based diffusion model MPMNet are also highlighted for strong trajectory prediction performance, achieved by better matching multimodal distributions and by exploiting human motion patterns, respectively [2].

Group 3: Course Objectives and Structure
- The course aims to provide a systematic understanding of trajectory prediction and diffusion models, helping students integrate theoretical knowledge with practical coding skills [6].
- It addresses common challenges faced by students, such as lack of direction and difficulty reproducing research papers, by offering a structured approach to model development and academic writing [6].
- The curriculum covers classic and cutting-edge papers, coding implementations, and writing methodologies, ultimately guiding students to produce a draft of a research paper [6][9].

Group 4: Target Audience and Requirements
- The course is designed for graduate students and professionals in trajectory prediction and autonomous driving, aiming to enhance their research capabilities and resume value [8].
- Participants are expected to have a foundational understanding of deep learning and familiarity with Python and PyTorch [10].
- The course emphasizes academic integrity and active participation, with specific requirements for attendance and assignment completion [15].

Group 5: Course Highlights and Outcomes
- The program features a "2+1" teaching model, with experienced instructors providing comprehensive support throughout the learning process [16][17].
- Students will gain access to datasets, baseline code, and essential papers, facilitating a deeper understanding of the subject matter [20][21].
- Upon completion, students will have produced a research paper draft, a project completion certificate, and potentially a recommendation letter based on their performance [19].
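To make the denoising loop behind these diffusion-based predictors concrete, here is a self-contained toy sampler: a 1D "trajectory endpoint" distribution with an analytically known denoiser, where shrinking the number of reverse steps (the compute-versus-fidelity trade that LED's learned initializer is designed to win back) visibly costs sample quality. All quantities are illustrative and not taken from LED, MGF, or MPMNet:

```python
import numpy as np

# Toy 1D diffusion sampler with an analytically known denoiser. The "data"
# are scalar trajectory endpoints ~ N(MU, DATA_STD^2); a trained network
# would replace denoise() in a real model.

MU, DATA_STD = 3.0, 0.5

def denoise(x, sigma):
    # Exact posterior mean E[x0 | x_t] for Gaussian data under VE noising.
    return MU + (DATA_STD**2 / (DATA_STD**2 + sigma**2)) * (x - MU)

def sample(n, num_steps, sigma_max=10.0, sigma_min=0.05, seed=0):
    rng = np.random.default_rng(seed)
    sigmas = np.geomspace(sigma_max, sigma_min, num_steps)
    x = rng.normal(0.0, sigma_max, size=n)  # start from pure noise
    for i in range(num_steps):
        s = sigmas[i]
        s_next = sigmas[i + 1] if i + 1 < num_steps else 0.0
        x0_hat = denoise(x, s)                    # predict the clean sample
        x = x0_hat + (s_next / s) * (x - x0_hat)  # DDIM-style deterministic step
    return x

full = sample(2000, num_steps=50)  # long denoising chain
fast = sample(2000, num_steps=5)   # truncated chain: cheaper, lower fidelity
```

Starting both chains from the same initial noise, the 5-step samples land near the right mean but are under-dispersed relative to the 50-step samples, which is exactly the fidelity loss that step-reduction methods have to compensate for.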
Autonomous Driving Paper Roundup | End-to-End, Segmentation, Trajectory Planning, Simulation, and More
自动驾驶之心· 2025-08-09 13:26
Core Insights
- The article surveys recent advances in autonomous driving research, highlighting frameworks that improve safety, efficiency, and robustness in real-world scenarios.

Group 1: DRIVE Framework
- The DRIVE framework, proposed by Stanford University and Microsoft, integrates dynamic rule inference and verified evaluation for constraint-aware autonomous driving, achieving a 0.0% soft-constraint violation rate and improving trajectory smoothness and generalization [2][6].

Group 2: Hybrid Learning-Optimization Framework
- A hybrid learning-optimization trajectory planning framework developed by Beijing Jiaotong University and Hainan University achieves a 97% success rate with a 54-millisecond real-time planning latency in highway scenarios [11][12].

Group 3: RoboTron-Sim
- The RoboTron-Sim framework, developed by Meituan and Sun Yat-sen University, improves the robustness of autonomous driving in extreme scenarios, achieving a 51.3% reduction in collision rate and a 51.5% improvement in trajectory accuracy on the nuScenes benchmark [18][20].

Group 4: SAV Framework
- The SAV framework, proposed by Anhui University, achieves high-precision vehicle part segmentation with an 81.23% mean Intersection over Union (mIoU) on the VehicleSeg10K dataset, surpassing the previous best method by 4.33% [34][40].
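Since several of these segmentation results are reported as mIoU, a quick reference implementation of the metric may help; this is a generic sketch of the standard definition, not the SAV authors' evaluation code:

```python
import numpy as np

def mean_iou(pred, gt, num_classes):
    # mIoU: per-class intersection-over-union, |pred ∩ gt| / |pred ∪ gt|,
    # averaged over the classes that appear in either prediction or labels.
    ious = []
    for c in range(num_classes):
        p, g = pred == c, gt == c
        union = np.logical_or(p, g).sum()
        if union == 0:
            continue  # class absent from both masks; skip it
        inter = np.logical_and(p, g).sum()
        ious.append(inter / union)
    return float(np.mean(ious))
```

For example, `mean_iou(np.array([0, 0, 1, 1]), np.array([0, 1, 1, 1]), 2)` averages IoUs of 1/2 and 2/3, giving 7/12.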
Fine-Tuning an Autonomous Driving VLM with the Open-Source Qwen2.5-VL
自动驾驶之心· 2025-08-08 16:04
Core Viewpoint
- The article walks through fine-tuning a vision-language model for autonomous driving, focusing on the LLaMA Factory framework and the Qwen2.5-VL model, which together enable vision-language-action capabilities for driving applications [4][5].

Group 1: LLaMA Factory Overview
- LLaMA Factory is an open-source, low-code framework for fine-tuning large models, with over 40,000 stars on GitHub [3].
- The framework integrates widely used fine-tuning techniques, making it suitable for developing autonomous driving assistants that can interpret traffic conditions through natural language [3].

Group 2: Qwen2.5-VL Model
- The Qwen2.5-VL model serves as the project's foundation model, with strong results in visual recognition, object localization, document parsing, and long-video understanding [4].
- It comes in three sizes; the flagship Qwen2.5-VL-72B performs comparably to advanced models such as GPT-4o and Claude 3.5 Sonnet, while the smaller versions suit resource-constrained environments [4].

Group 3: CoVLA Dataset
- The CoVLA dataset, comprising 10,000 real driving scenes and over 80 hours of video, is used for training and evaluating vision-language-action models [5].
- It surpasses existing datasets in scale and annotation richness, providing a comprehensive platform for developing safer and more reliable autonomous driving systems [5].

Group 4: Model Training and Testing
- Instructions are provided for downloading and installing LLaMA Factory and the Qwen2.5-VL model, including commands for setting up the environment and smoke-testing the model [6][7].
- The article details fine-tuning the model with the SwanLab tool for visual tracking of the training process, emphasizing the importance of adjusting parameters to avoid out-of-memory errors [11][17].
- After training, the fine-tuned model gives higher-quality responses in dialogue about autonomous driving risks than the original model [19].
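For readers who want to reproduce this kind of setup: LLaMA Factory drives LoRA fine-tuning from a single YAML file passed to `llamafactory-cli train`. The sketch below follows the option names used in LLaMA Factory's published SFT examples, but the dataset name, paths, and hyperparameters are placeholder assumptions to adapt to the CoVLA setup, not values from the article:

```yaml
### model
model_name_or_path: Qwen/Qwen2.5-VL-7B-Instruct

### method
stage: sft
do_train: true
finetuning_type: lora
lora_target: all

### dataset (hypothetical name; register your CoVLA-derived data in dataset_info.json)
dataset: covla_driving_sft
template: qwen2_vl
cutoff_len: 2048

### output
output_dir: saves/qwen2_5vl-7b/lora/sft
logging_steps: 10
save_steps: 500

### train
per_device_train_batch_size: 1
gradient_accumulation_steps: 8
learning_rate: 1.0e-4
num_train_epochs: 3.0
lr_scheduler_type: cosine
bf16: true
```

Launched with `llamafactory-cli train <config>.yaml`; shrinking `cutoff_len` and `per_device_train_batch_size` is the usual first lever against the out-of-memory issues the article mentions.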