Autonomous Driving
VisionTrap: Using VLM+LLM to Teach Models to Exploit Visual Features for Better Trajectory Prediction
自动驾驶之心· 2025-08-20 23:33
Core Insights
- The article presents a novel method for trajectory prediction in autonomous driving, integrating visual inputs from surround cameras with textual descriptions to enhance prediction accuracy [3][4][5]
- The proposed approach addresses limitations of traditional methods that rely solely on HD maps and historical trajectories, which often lack real-time adaptability to changing environments [5][6]
- The introduction of a new dataset, nuScenes-Text, enriches existing datasets with textual annotations, demonstrating the positive impact of vision-language models (VLM) on trajectory prediction [4][6][37]

Group 1: Methodology
- The proposed model consists of four key components: a Per-agent State Encoder, a Visual Semantic Encoder, a Text-driven Guidance Module, and a Trajectory Decoder [7][10]
- The Per-agent State Encoder captures temporal features and spatial interactions among agents, using relative displacements and attention mechanisms [10][11]
- The Visual Semantic Encoder extracts image features from the environment and integrates them with agent features to improve prediction accuracy [14][16]

Group 2: Data and Training
- The nuScenes-Text dataset was created with a fine-tuned VLM and large language models (LLMs) to generate detailed textual descriptions for each agent across a variety of scenarios [37][39]
- Training employs multi-modal contrastive learning to align visual features with textual descriptions, improving the model's ability to extract relevant information from images [19][25]
- The training objective maximizes similarity between positive pairs (an agent's features and its corresponding text) while minimizing similarity between negative pairs (features from different agents) [19][20]

Group 3: Experimental Results
- Experimental results show significant improvements in trajectory prediction accuracy, with gains of over 20% attributed to the Visual Semantic Encoder and Text-driven Guidance Module [46][47]
- Performance was validated across the entire nuScenes dataset, demonstrating the contribution of each component to the prediction metrics [47][48]
- Integrating visual and textual information led to better clustering of agent state embeddings, indicating an improved understanding of agent behaviors [49][50]

Group 4: Conclusion
- The key innovation of the proposed method lies in using textual descriptions to guide the model in learning visual semantic features, thereby enhancing trajectory prediction accuracy [53][54]
- The article highlights the importance of image information in trajectory prediction and the effectiveness of the proposed approach in leveraging both visual and textual data [54]
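The positive/negative-pair training objective described in Group 2 can be sketched as an InfoNCE-style contrastive loss over matched agent and text features. This is a generic illustration under assumed shapes and a hypothetical temperature value, not VisionTrap's actual implementation.

```python
import numpy as np

def info_nce_loss(agent_feats, text_feats, temperature=0.07):
    """InfoNCE-style contrastive loss: row i of agent_feats and row i of
    text_feats form a positive pair; all other rows act as negatives.
    Feature dimensions and the temperature are illustrative assumptions."""
    # L2-normalize so dot products become cosine similarities
    a = agent_feats / np.linalg.norm(agent_feats, axis=1, keepdims=True)
    t = text_feats / np.linalg.norm(text_feats, axis=1, keepdims=True)
    logits = a @ t.T / temperature                     # (N, N) similarities
    # Softmax cross-entropy with the diagonal (matched pairs) as targets
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_probs))

rng = np.random.default_rng(0)
agent = rng.normal(size=(8, 32))
# Text features correlated with their matching agent features
text = agent + 0.1 * rng.normal(size=(8, 32))
mismatched = rng.normal(size=(8, 32))
print(info_nce_loss(agent, text) < info_nce_loss(agent, mismatched))  # True
```

Minimizing this loss pulls each agent's visual features toward its own description and pushes them away from other agents' descriptions, which is the alignment effect the summary attributes to the training objective.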
VLM or VLA? Development Trends in Multimodal Large Models for Autonomous Driving, Seen Through Existing Work
自动驾驶之心· 2025-08-20 23:33
Core Insights
- The article emphasizes the increasing importance of foundational models such as LLMs (Large Language Models), VLMs (Vision-Language Models), and VLAs (Vision-Language-Action Models) in autonomous driving decision-making, attracting significant attention from both academia and industry [2]

LLM-Based Approaches
- LLM-based methods leverage the reasoning capabilities of large models to describe autonomous driving, marking the early stage of integration between autonomous driving and large models [4]
- Notable research includes:
  - "Distilling Multi-modal Large Language Models for Autonomous Driving"
  - "LearningFlow: Automated Policy Learning Workflow for Urban Driving with Large Language Models"
  - "CoT-Drive: Efficient Motion Forecasting for Autonomous Driving with LLMs and Chain-of-Thought Prompting"
  - "PADriver: Towards Personalized Autonomous Driving" [4][5]

VLM-Based Approaches
- VLM and VLA algorithms are currently mainstream, given autonomous driving's reliance on visual sensors; the article summarizes the latest work in this area for reference and study [8]
- Key studies include:
  - "Drive-R1: Bridging Reasoning and Planning in VLMs for Autonomous Driving with Reinforcement Learning"
  - "FutureSightDrive: Visualizing Trajectory Planning with Spatio-Temporal CoT for Autonomous Driving" [8][9]

VLA-Based Approaches
- VLA methods focus on integrating vision, language, and action for end-to-end autonomous driving, emphasizing adaptive reasoning and reinforcement fine-tuning [17]
- Significant contributions include:
  - "AutoVLA: A Vision-Language-Action Model for End-to-End Autonomous Driving with Adaptive Reasoning and Reinforcement Fine-Tuning"
  - "DriveMoE: Mixture-of-Experts for Vision-Language-Action Model in End-to-End Autonomous Driving" [17][21]
New Answers from the Red Heartland | Jin-Cha-Ji Anti-Japanese Base Area · Yangquan, Shanxi: Digital Empowerment Turns the "Coal City" into a "Digital-Intelligence New City"
Yang Shi Wang· 2025-08-20 03:49
Group 1
- Yangquan City, in Shanxi Province, has transformed from a coal-centric economy into a digital and intelligent mining hub, with 95.84% of its coal production now coming from advanced capacity [2][3]
- The city has established 12 smart mines that use 5G technology to improve operations, achieving a 50% reduction in underground personnel and a 50% increase in efficiency [3]
- Yangquan has become the first city in China to fully open its roads to autonomous driving, and its smart traffic management systems have cut average vehicle delay rates by 45% and stop frequency by 70% [5]

Group 2
- The local government has prioritized development of the digital economy, establishing platforms such as the China Electric Digital Economy Industrial Park and "Jinchuan Valley·Yangquan" that have accelerated growth in industries such as smart terminals, data security, and big data [7]
- In 2024, core revenue from Yangquan's digital economy is projected to grow by 13.3%, and the city has been recognized among the "Top 100 New Smart Cities in China" for 2023-2024 [7]
This Week's Selected Autonomous Driving Papers! End-to-End, VLA, Perception, Decision-Making, and More
自动驾驶之心· 2025-08-20 03:28
Core Viewpoint
- The article emphasizes recent advancements in autonomous driving research, highlighting innovative approaches and frameworks that enhance the capabilities of autonomous systems in dynamic environments [2][4]

Group 1: End-to-End Autonomous Driving
- The article discusses several notable papers on end-to-end autonomous driving, including GMF-Drive, ME³-BEV, SpaRC-AD, IRL-VLA, and EvaDrive, which use advanced techniques such as gated fusion, deep reinforcement learning, and evolutionary adversarial strategies [8][10]

Group 2: Perception and VLM
- The VISTA paper introduces a vision-language model for predicting driver attention in dynamic environments, showcasing the integration of visual and language processing for improved situational awareness [7]
- The article also covers safety-critical perception work, such as a progressive BEV perception survey and the CBDES MoE model for functional module decoupling [10]

Group 3: Simulation Testing
- The ReconDreamer-RL framework enhances reinforcement learning through diffusion-based scene reconstruction, indicating a trend toward more sophisticated simulation testing methodologies [11]

Group 4: Datasets
- The STRIDE-QA dataset is introduced as a large-scale visual question answering resource for spatiotemporal reasoning in urban driving scenarios, reflecting the growing need for comprehensive datasets in autonomous driving research [12]
With Everyone Going End-to-End, Is There Still a Future for Trajectory Prediction?
自动驾驶之心· 2025-08-19 03:35
Core Viewpoint
- The article emphasizes the continued importance of trajectory prediction for autonomous driving, arguing that traditional two-stage and modular methods remain relevant despite the rise of end-to-end approaches. It also discusses integrating trajectory prediction models with perception models as a form of end-to-end training, a significant area of research and application in the industry [1][2]

Group 1: Trajectory Prediction Methods
- The article introduces multi-agent trajectory prediction, which forecasts future movements from the historical trajectories of multiple interacting agents; this is crucial for autonomous driving, intelligent monitoring, and robotic navigation [1]
- It discusses the difficulty of predicting human behavior, which is uncertain and multimodal, noting that traditional methods often rely on recurrent, convolutional, or graph neural networks for social interaction modeling [1]
- It highlights advances in diffusion models for trajectory prediction, with models such as the Leapfrog Diffusion Model (LED) and Mixed Gaussian Flow (MGF) significantly improving accuracy and efficiency across various datasets [2]

Group 2: Course Objectives and Structure
- The course aims to provide a systematic understanding of trajectory prediction and diffusion models, helping participants integrate theoretical knowledge with practical coding skills and ultimately develop new models and research papers [6][8]
- It is designed for people at various academic levels who are interested in trajectory prediction and autonomous driving, offering insight into cutting-edge research and algorithm design [8]
- Participants gain access to classic and cutting-edge papers, code implementations, and methodologies for writing and submitting research papers [8][9]

Group 3: Course Highlights and Requirements
- The course features a "2+1" teaching model with experienced instructors and dedicated support staff to enhance the learning experience [16][17]
- Participants need a foundational understanding of deep learning and proficiency in Python and PyTorch to engage with the course material effectively [10]
- The curriculum covers datasets, baseline code, and essential research papers, providing a thorough grounding in trajectory prediction techniques [20][21][23]
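The accuracy gains mentioned above for multimodal predictors like LED and MGF are usually measured with minADE/minFDE: the average and final displacement error of the best of K predicted trajectory modes. The sketch below is a generic illustration of these standard metrics, not code from the course or the cited papers.

```python
import numpy as np

def min_ade_fde(pred_modes, gt):
    """Standard multimodal trajectory-prediction metrics.

    pred_modes: (K, T, 2) array of K candidate future trajectories, T steps each
    gt:         (T, 2) array with the ground-truth future trajectory
    Returns (minADE, minFDE): displacement errors of the best-matching mode.
    """
    # Per-mode, per-step Euclidean displacement from the ground truth
    dists = np.linalg.norm(pred_modes - gt[None], axis=-1)  # (K, T)
    ade = dists.mean(axis=1)     # average displacement per mode
    fde = dists[:, -1]           # final-step displacement per mode
    return ade.min(), fde.min()

# Toy example: ground truth moves 1 m per step along x for 6 steps
gt = np.stack([np.arange(1, 7, dtype=float), np.zeros(6)], axis=1)  # (6, 2)
perfect = gt.copy()
offset = gt + np.array([0.0, 2.0])   # a second mode displaced 2 m in y
modes = np.stack([offset, perfect])  # (2, 6, 2)
min_ade, min_fde = min_ade_fde(modes, gt)
print(min_ade, min_fde)  # 0.0 0.0 -- the perfect mode wins
```

Because only the best mode is scored, a model can hedge against the multimodality of human behavior by spreading its K modes over distinct plausible futures without being penalized for the unused ones.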
An Autonomous Driving Autumn Recruitment Discussion Group Has Launched!
自动驾驶之心· 2025-08-18 23:32
Core Viewpoint
- The article emphasizes the convergence of autonomous driving technology, noting a shift from many diverse approaches to more unified models, which raises the industry's technical barriers [1]

Group 1
- The industry is witnessing a trend where directions that previously each required their own algorithm engineers are consolidating into unified models such as one model, VLM, and VLA [1]
- The article encourages building a large community to support people in the industry, highlighting the limits of individual effort [1]
- A new job- and industry-focused community is being launched to facilitate discussion of industry trends, company developments, product research, and job opportunities [1]
Performance Jumps 4%! CBDES MoE: MoE Gives BEV Perception a Second Spring, Straight to SOTA (Tsinghua & Imperial College)
自动驾驶之心· 2025-08-18 23:32
Core Viewpoint
- The article discusses the CBDES MoE framework, a modular mixture-of-experts architecture designed for BEV perception in autonomous driving, addressing challenges in adaptability, modeling capacity, and generalization in existing methods [2][5][48]

Group 1: Introduction and Background
- The rapid development of autonomous driving technology has made 3D perception essential for building safe and reliable driving systems [5]
- Existing solutions often use a fixed single-backbone feature extractor, limiting adaptability to diverse driving environments [5][6]
- The MoE paradigm offers a new solution by enabling dynamic expert selection through learned routing mechanisms, balancing computational efficiency and representational richness [6][9]

Group 2: CBDES MoE Framework
- CBDES MoE integrates multiple structurally heterogeneous expert networks and employs a lightweight self-attention router (SAR) for dynamic expert path selection [3][12]
- The framework includes a multi-stage heterogeneous backbone design pool, enhancing scene adaptability and feature representation [14][17]
- The architecture allows efficient, adaptive, and scalable 3D perception, outperforming strong single-backbone baselines in complex driving scenarios [12][14]

Group 3: Experimental Results
- On the nuScenes dataset, CBDES MoE achieved a mean Average Precision (mAP) of 65.6 and a nuScenes Detection Score (NDS) of 69.8, surpassing all single-expert baselines [37][39]
- The model converged faster and maintained lower loss throughout training, indicating higher optimization stability and learning efficiency [39][40]
- Load-balancing regularization significantly improved performance, raising mAP from 63.4 to 65.6 when applied [42][46]

Group 4: Future Work and Limitations
- Future research may explore patch-wise or region-aware routing for finer-grained adaptability, as well as extending the method to multi-task scenarios [48]
- The current routing mechanism operates at the image level, which may limit its effectiveness in more complex environments [48]
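The gist of MoE routing with load-balancing regularization can be sketched with a minimal softmax router and a Switch-Transformer-style auxiliary loss. Every name, shape, and the expert count below are illustrative assumptions; this is not CBDES's actual self-attention router.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def route(features, router_w, num_experts):
    """Softmax routing over a pool of backbone experts.

    Returns per-sample expert weights, the hard top-1 choice, and an
    auxiliary load-balancing loss that penalizes routing collapse
    (all samples piling onto one expert)."""
    logits = features @ router_w        # (B, E) routing scores
    gates = softmax(logits, axis=-1)    # soft expert weights per sample
    top1 = gates.argmax(axis=-1)        # hard expert assignment
    # Fraction of samples routed to each expert vs. mean gate probability;
    # their product is minimized when expert usage is uniform
    frac = np.bincount(top1, minlength=num_experts) / len(top1)
    prob = gates.mean(axis=0)
    lb_loss = num_experts * np.sum(frac * prob)
    return gates, top1, lb_loss

rng = np.random.default_rng(0)
feats = rng.normal(size=(64, 16))     # (batch, feature dim), both assumed
router_w = rng.normal(size=(16, 4))   # hypothetical pool of 4 experts
gates, choice, lb = route(feats, router_w, num_experts=4)
print(gates.shape)  # (64, 4)
```

Adding `lb_loss` to the task loss discourages the router from collapsing onto a single backbone, which is consistent with the mAP improvement the article attributes to load-balancing regularization.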
Pony.ai Attracts Premium Capital as Funds Chase the Next Tech Transformation
Prnewswire· 2025-08-18 13:53
Core Insights
- Leading investment management firms, including ARK Invest, have invested significantly in Pony.ai, marking notable interest in the Chinese autonomous driving sector [1][2]
- Pony.ai has reported substantial growth in robotaxi revenues and is on a clear path to profitability, attracting attention from major institutional investors [4][8]

Investment Activity
- ARK Invest invested approximately US$12.9 million in Pony.ai, its first investment in a Chinese firm focused on Level 4 autonomous driving technology [1]
- At least 14 major global institutional investors backed Pony.ai in Q2, including Baillie Gifford and Nikko Asset Management, despite a general trend of U.S. investors moving away from Chinese assets [2]

Market Potential
- ARK's "Big Ideas 2025" report projects the ride-hailing market could reach US$10 trillion by 2030, with global robotaxi fleets potentially hitting around 50 million vehicles [3]
- UBS analysts expect the robotaxi market to reach US$183 billion in China and US$394 billion internationally by the late 2030s [9]

Company Performance
- Pony.ai reported a 158% year-on-year increase in robotaxi revenues in Q2, driven by production of its seventh-generation robotaxi models [4]
- The company aims to scale its fleet to 1,000 robotaxis by year-end, which is expected to achieve positive unit economics [5]

Operational Efficiency
- The Gen-7 vehicle costs 70% less than its predecessor, with significant reductions in operational costs, including an 18% decrease in insurance costs [5]
- Pony.ai has received commercial permits for fare-charging services in Shanghai and operates 24/7 in Guangzhou and Shenzhen [6][7]

Analyst Sentiment
- Following the Q2 earnings release, major institutions including Goldman Sachs and UBS rated Pony.ai's stock a "buy", with Goldman setting a price target of US$24.5, implying 54.5% upside [8]
What will it take for robotaxis to go global? | FT
Financial Times· 2025-08-18 04:00
Robotaxis are proving popular in cities like San Francisco, moving from concept to reality, with the likes of Alphabet, Amazon, and Tesla all making significant investments in the space. Following the shuttering of General Motors' Cruise project, the US robotaxi market has fallen into the hands of just a few of the richest, most determined tech giants and a handful of startups bold enough to challenge them. Waymo, the autonomous driving technology company owned by Alphabet, Google's parent company, is now operati ...
WeRide Secures Tens of Millions of US Dollars from Grab to Accelerate Large-Scale Robotaxi Deployment in Southeast Asia
Sou Hu Cai Jing· 2025-08-18 01:40
Group 1
- WeRide, an autonomous driving technology company, announced a multi-million-dollar equity investment from Southeast Asian super-app platform Grab [1][3]
- The investment is part of a strategic partnership aimed at accelerating the large-scale deployment of Level 4 robotaxis and other autonomous vehicles in Southeast Asia [3]
- The investment is expected to close by the first half of 2026, with exact timing dependent on conditions chosen by WeRide [3]

Group 2
- Grab's investment will support WeRide's international growth strategy, expanding its commercial autonomous vehicle fleet in Southeast Asia and promoting AI-driven mobility solutions [3]
- WeRide's CEO, Han Xu, expressed a vision of gradually deploying thousands of robotaxis in Southeast Asia, taking local regulations and public acceptance into account [3]
- The partnership combines WeRide's autonomous driving technology and operational experience with Grab's platform advantages to provide safe and efficient robotaxi services [3]