自动驾驶之心
What Directions Can Autonomous Driving Still Target at This Year's CVPR?
自动驾驶之心· 2025-10-28 00:03
Core Viewpoint
- The article emphasizes the importance of targeted guidance and mentorship for students aiming to publish high-quality papers at top conferences like CVPR and ICLR, highlighting the need for strategic effort in the final stages of the submission process [1][2][4].

Group 1: Submission Guidance
- The majority of papers accepted at past conferences focus on localized breakthroughs and verifiable improvements, aligning closely with the main themes of their respective years [1].
- The main theme for CVPR 2026 is likely to be "world models," indicating a strategic direction for potential submissions [1].
- Students are encouraged to leverage the experience of predecessors to enhance submission quality, particularly in the final stages of preparation [2].

Group 2: Mentorship and Support
- The organization, "Automated Driving Heart," is described as the largest AI technology media platform in China, with extensive academic resources and a deep understanding of the challenges in interdisciplinary fields like autonomous driving and robotics [3].
- The mentorship program reports a 96% acceptance rate for students over the past three years, indicating the effectiveness of its guidance [5].
- Personalized support includes assistance with research thinking, familiarization with research processes, and practical application of theoretical models [7][13].

Group 3: Program Structure and Offerings
- Structured support includes personalized paper guidance, real-time interaction with mentors, and unlimited access to recorded sessions for review [13].
- The program caters to various academic levels and goals, from foundational courses for beginners to advanced mentorship for experienced researchers [17][19].
- Outstanding students may receive recommendations to prestigious institutions and direct referrals to leading tech companies [19].
The End of Autonomous Driving's Spring-and-Autumn Era
自动驾驶之心· 2025-10-28 00:03
Core Insights
- The autonomous driving industry is transitioning from a "Spring and Autumn" period to a "Warring States" phase, indicating a shift from competitive acknowledgment to a struggle for dominance in which only leading players will survive [2][3].

Technical Route Dispute
- Competition in autonomous driving has evolved from a ranking contest into a life-and-death battle, with losers cut off from the resources needed for continued R&D [3].
- Tesla's 2022 AI Day II significantly influenced the development direction of autonomous driving technology, leading to a divergence in technical paths among companies [4].
- Companies are exploring differentiated technical routes, with some abandoning LiDAR in favor of pure vision solutions while others experiment with various mapping and planning algorithms [4][5].

Supplier Model Counterattack
- As the technology experience reaches a plateau, the gap between leading autonomous driving teams is narrowing, fueling a price war in the automotive industry [6].
- Traditional automakers and smaller brands are increasingly opting for supplier solutions to reduce costs and enhance product capabilities, a trend of "handing over their soul" to survive [6].

Data Barrier as a Key to Reversal
- The current plateau in autonomous driving technology is attributed to the immaturity of data-driven solutions and a heavy reliance on rule-based algorithms [7][9].
- The release of Tesla's FSD V14 highlights the importance of real-world data in enhancing autonomous driving AI, even amid advances in generative AI [7][9].
TeraSim World: Rebuilding a "Tesla-Style" World Model with Open Source
自动驾驶之心· 2025-10-28 00:03
Core Viewpoint
- Tesla has showcased its internal World Model, a neural-network-driven virtual world generator that synthesizes high-resolution videos from eight camera perspectives based on vehicle states and control inputs, enabling real-time environmental prediction and closed-loop validation [2][6].

Group 1: Tesla's World Model
- Tesla's World Model allows historical problem scenarios to be replayed and new adversarial events to be injected in a virtual environment for testing and reinforcement learning [2].
- The model learns a general mapping of "perception-action-world change," making it applicable to other platforms such as robotics and forming a basis for general physical intelligence [2].

Group 2: TeraSim World Framework
- A research team from the University of Michigan, SaferDrive AI, the University of Hong Kong, and Tsinghua University has developed TeraSim World, an open-source framework that achieves generation and evaluation capabilities similar to Tesla's World Model without requiring real maps or sensor backgrounds [5][6].
- TeraSim World automatically generates city environments and traffic behaviors using AI, creating a fully data-driven, reproducible, and scalable world-model platform [5].

Group 3: System Features
- TeraSim World features a modular, fully automated data-synthesis pipeline for generating realistic and safety-critical data for end-to-end autonomous driving [7].
- The system retrieves real-world road maps and converts them into simulation-ready formats, allowing digital maps to be generated automatically from user input [10][11].
- It can simulate realistic traffic conditions by automatically obtaining real-time traffic data, reflecting local traffic patterns [13].

Group 4: Agent and Sensor Simulation
- The agent-simulation component enables virtual vehicles, pedestrians, and cyclists to behave like their real-world counterparts, incorporating human driving characteristics [16].
- TeraSim World introduces safety-critical scenarios based on real-world accident probabilities, ensuring the generated events are both risky and realistic [17].
- The sensor-simulation component generates realistic camera inputs and can be extended to other sensor types, using NVIDIA's open-source Cosmos models for high-resolution, time-synchronized multi-view video generation [19][22][25].

Group 5: Automated Stress Testing
- TeraSim World supports automated full-stack stress testing, generating and validating various risk scenarios to assess the stability and safety boundaries of autonomous driving systems [30].
- The framework can inject dynamic and static risks, such as sudden stops or environmental changes, to evaluate system responses under diverse conditions [30].

Group 6: Conclusion and Future Plans
- TeraSim World combines agent and sensor simulation to provide a complete data-generation process for training and testing autonomous driving systems without real-world data collection [31].
- The system aims to build a large-scale synthetic driving dataset and expand to multi-modal sensor simulation, establishing an open virtual testing ground for researchers and developers [32].
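The modular pipeline described above (map retrieval, traffic seeding, agent simulation, sensor rendering) can be sketched as a simple chain of stages. Every function name, signature, and return value below is a hypothetical stand-in for illustration, not TeraSim World's actual API:

```python
# Hypothetical stand-ins for the stages described in the article;
# none of these names belong to the real TeraSim World codebase.
def fetch_road_map(location):
    """Retrieve a real-world road network for the requested place."""
    return {"location": location, "roads": ["main_st", "elm_ave"]}

def to_sim_format(road_map):
    """Convert the raw map into a simulation-ready digital map."""
    return {**road_map, "format": "sim"}

def seed_traffic(sim_map):
    """Populate the map with locally realistic traffic patterns."""
    return {"density": "rush_hour"}

def simulate_agents(sim_map, traffic, risk_events):
    """Vehicles, pedestrians, and cyclists, plus injected risky events."""
    return {"agents": 42, "events": list(risk_events)}

def render_sensors(sim_map, agents):
    """Produce multi-view camera frames (Cosmos-based in the real system)."""
    return {"views": 8, "frames": 100}

def build_scenario(location, risk_events):
    """Map retrieval -> traffic seeding -> agent sim -> sensor rendering."""
    sim_map = to_sim_format(fetch_road_map(location))
    traffic = seed_traffic(sim_map)
    agents = simulate_agents(sim_map, traffic, risk_events)
    return render_sensors(sim_map, agents)

print(build_scenario("Ann Arbor, MI", ["sudden_stop"]))
# → {'views': 8, 'frames': 100}
```

The point of the sketch is the staged structure: each stage consumes only the previous stage's output, which is what makes the pipeline fully automated and swappable per module.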
A New 76-Page Survey of Agentic AI
自动驾驶之心· 2025-10-28 00:03
Core Insights
- The article discusses the evolution of Agentic AI from pipeline-based systems to model-native paradigms, emphasizing the internalization of reasoning, memory, and action capabilities within the models themselves [1][44].
- It highlights the role of reinforcement learning (RL) as a driving force in transforming static models into adaptive, goal-oriented entities capable of learning from interaction with their environment [1][44].

Background
- The rapid advancement of generative AI has primarily focused on reactive outputs, lacking long-term reasoning and environmental interaction. The shift toward Agentic AI emphasizes three core capabilities: planning, tool use, and memory [3].
- Early systems relied on pipeline paradigms in which these capabilities were externally orchestrated, leading to passive models that struggled in unexpected scenarios. The new model-native paradigm integrates these capabilities directly into model parameters, allowing proactive decision-making [3][6].

Reinforcement Learning for LLMs
- The scarcity of programmatic data and vulnerability to out-of-distribution scenarios necessitate result-driven RL to internalize planning and other capabilities, moving away from prompt-induced behaviors [6][7].
- RL offers advantages over supervised fine-tuning (SFT) by enabling dynamic exploration and relative value learning, transforming models from passive imitators into active explorers [8][9].

Unified Paradigm and Algorithm Evolution
- Early RLHF methods excelled at single-turn alignment but struggled with long-horizon, multi-turn, and sparse-reward settings. Newer result-driven RL methods such as GRPO and DAPO improve training stability and efficiency [12].
- Algorithm evolution leverages foundation models to provide priors while refining capabilities through interaction and rewards in task environments [12].

Core Capabilities: Planning
- The pipeline paradigm treats planning as automated reasoning and action-sequence search, which is limited in flexibility and stability under complex tasks [14][15].
- The model-native paradigm integrates planning capabilities directly into model parameters, enhancing flexibility and robustness in open environments [15][18].

Core Capabilities: Tool Usage
- Early systems embedded models in fixed nodes, lacking flexibility. The model-native transition internalizes decisions about tool use, forming a multi-objective decision problem [21][22].
- Challenges remain in credit assignment and environmental noise, which can destabilize training. Modular training approaches aim to isolate execution noise and improve sample efficiency [22].

Core Capabilities: Memory
- Memory capabilities have evolved from external modules to integral components of task execution, emphasizing action-oriented evidence governance [27][30].
- Short-term memory uses techniques such as sliding windows and retrieval-augmented generation (RAG), while long-term memory relies on external libraries and parameter-based internalization [30].

Future Directions
- The trajectory of Agentic AI points toward deeper integration between models and their environments, moving from systems designed to use intelligence to systems that grow intelligence through experience and collaboration [44].
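The "relative value learning" that GRPO-style methods rely on can be illustrated with a group-relative advantage: each sampled completion is normalized against the mean and spread of its own sampling group, removing the need for a separately trained value critic. A minimal sketch; the function name and reward values are illustrative:

```python
import statistics

def grpo_advantages(rewards):
    """Group-relative advantages, GRPO-style: score each completion
    against the statistics of its own sampling group rather than
    against a learned value function."""
    mean = statistics.mean(rewards)
    std = statistics.pstdev(rewards) or 1.0  # guard: all-equal rewards
    return [(r - mean) / std for r in rewards]

# Four completions of one prompt, scored by an outcome-only reward:
print(grpo_advantages([1.0, 0.0, 0.0, 1.0]))  # → [1.0, -1.0, -1.0, 1.0]
```

Because the baseline is the group mean, correct answers get a positive learning signal and incorrect ones a negative signal even when rewards are sparse and binary.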
Share Your Insights! Seeking Autonomous Driving Enthusiasts Everywhere (Product / 4D Annotation / World Models, etc.)
自动驾驶之心· 2025-10-27 09:14
Core Viewpoint
- The article emphasizes the need for collaboration in the autonomous driving industry, inviting professionals to participate in training, course development, and research support to drive industry progress [2].

Group 1: Collaboration and Opportunities
- The company is seeking partnerships with professionals in the autonomous driving field to enhance its training and job-assistance initiatives [2].
- High compensation and abundant industry resources will be provided to collaborators [3].
- The main focus areas for collaboration include autonomous driving product managers, 4D annotation/data loop, world models, VLA, autonomous driving large models, reinforcement learning, and end-to-end systems [4].

Group 2: Training and Development
- The training collaboration targets both B-end (enterprises, universities, research institutes) and C-end (students, job seekers) audiences [5].
- The company is also interested in course development and original article creation as part of its training initiatives [5].

Group 3: Contact Information
- Interested parties can reach out via WeChat for further consultation [6].
Hot Take of the Day: DeepSeek-OCR Upends Every Architecture
自动驾驶之心· 2025-10-27 00:03
Core Viewpoint
- DeepSeek has introduced a new model, DeepSeek-OCR, which significantly reduces the number of tokens required to store and process information by using images as memory carriers instead of relying solely on text tokens [3][6][12].

Group 1: Model Capabilities
- DeepSeek-OCR can store nearly the same amount of information using only one-tenth of the tokens required by traditional models [40][41].
- In tests, DeepSeek-OCR used only 100 visual tokens to surpass the 256 tokens required by GOT-OCR 2.0, and fewer than 800 visual tokens to outperform MinerU 2.0, which typically requires over 6,000 tokens [13][14].
- The model supports various resolutions and compression modes, adapting to different document complexities, such as using only 64 visual tokens for simple documents [18][21].

Group 2: Data Collection and Utilization
- DeepSeek-OCR can capture previously uncollected data from two-dimensional information, such as graphs and images in academic papers, which traditional models could not interpret [32][33].
- The model can generate over 200,000 pages of training data per day on an A100 GPU, indicating its efficiency in data collection [35].

Group 3: Resource Efficiency
- By using images for memory, DeepSeek-OCR reduces computational load, allowing a significant decrease in token usage without sacrificing performance [40][41].
- The model maintains 96.5% accuracy while using only one-tenth of the original token count, demonstrating effective resource management [41][42].

Group 4: Open Source and Community Contributions
- DeepSeek-OCR was developed collaboratively, drawing on various open-source resources, including Huawei's Wukong dataset and Meta's SAM for image feature extraction [51][53].
- The integration of multiple open-source models has enabled DeepSeek to create an AI capable of "thinking in images," showcasing the power of community-driven innovation [53].
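The reported compression can be made concrete with back-of-the-envelope arithmetic: using the article's per-page figures (under 800 visual tokens versus over 6,000 text tokens for a MinerU 2.0-style pipeline), far more pages fit in a fixed context window. The window size below is an assumed example for illustration, not a DeepSeek-OCR specification:

```python
def pages_in_context(context_window, tokens_per_page):
    """Whole document pages that fit in a fixed token budget."""
    return context_window // tokens_per_page

CONTEXT = 128_000    # assumed context-window size, illustrative only
TEXT_TOKENS = 6_000  # per-page cost of a text pipeline (article figure)
VISUAL_TOKENS = 800  # per-page visual-token cost quoted for DeepSeek-OCR

print(pages_in_context(CONTEXT, TEXT_TOKENS))    # → 21 pages as text
print(pages_in_context(CONTEXT, VISUAL_TOKENS))  # → 160 pages as images
```

That roughly 7.5x gap in pages-per-window is the practical payoff of treating the image, rather than the transcript, as the memory carrier.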
Peking University's World-in-World: A Closed-Loop Evaluation Framework for Embodied World Models!
自动驾驶之心· 2025-10-27 00:03
Core Insights
- The article argues for redefining the evaluation of world models in embodied intelligence, emphasizing that visual quality does not equate to task effectiveness [5][26].
- The "World-in-World" platform assesses world models through closed-loop interaction, focusing on their practical utility rather than visual fidelity alone [6][26].

Evaluation of World Models
- Current evaluation systems prioritize visual clarity and scene plausibility, neglecting whether models can assist agents in decision-making for real tasks [5][6].
- The platform introduces a closed loop that integrates observation, decision-making, execution, and re-observation, ensuring fair and practical assessment [6][7].

Model Compatibility and Decision-Making
- A unified action API standardizes input across different world models, allowing them to process the same tasks effectively [7].
- Decision-making is structured into three phases: proposal generation, outcome simulation, and selection of the optimal action based on task goals [8][13].

Experimental Findings
- Experiments with 12 mainstream world models revealed that visual realism does not guarantee task success; action alignment is crucial [18][20].
- Fine-tuning smaller models with task-specific data proved more effective than simply using larger pre-trained models, highlighting a cost-effective optimization strategy [21][23].
- Increasing the computational budget for simulation significantly improved task success rates, suggesting that more extensive predictive rollouts lead to better decision-making [23].

Limitations and Future Directions
- While the models excel at perception and navigation, they struggle with physical manipulation tasks due to the absence of physical modeling [25].
- Future developments should focus on enhancing controllability, using task data for fine-tuning, and incorporating physical modeling to improve the practical application of world models in robotics [26].
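The three-phase decision process described above (propose candidate actions, simulate their outcomes with the world model, select the best against the task goal) can be sketched as a rollout-and-score loop. The toy world model, scoring function, and 1-D navigation task below are illustrative assumptions, not the platform's actual API:

```python
def decide(world_model, score, state, candidates, horizon=4):
    """Propose -> simulate -> select: roll each candidate action forward
    with the world model and keep the action whose imagined outcome
    scores best against the task goal."""
    best_action, best_value = None, float("-inf")
    for action in candidates:          # phase 1: candidate proposals
        sim = state
        for _ in range(horizon):       # phase 2: world-model rollout
            sim = world_model(sim, action)
        value = score(sim)             # phase 3: score the imagined outcome
        if value > best_value:
            best_action, best_value = action, value
    return best_action

# Toy 1-D navigation task: state is a position, the goal sits at x = 10.
wm = lambda s, a: s + a                # stand-in one-step world model
goal = lambda s: -abs(10 - s)          # closer to the goal scores higher
print(decide(wm, goal, state=0, candidates=[-1, 0, 1]))  # → 1
```

The `horizon` parameter is where the "more simulation compute improves success rates" finding shows up: longer or more numerous rollouts give the scorer a better view of each action's consequences.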
Officially Wrapped! An Industry-Led Three-Month Course on End-to-End Autonomous Driving
自动驾驶之心· 2025-10-27 00:03
Core Viewpoint
- End-to-end production began in 2023, and 2024 is expected to be a landmark year for it in the automotive industry, as leading new forces and manufacturers have already achieved end-to-end mass production [1][3].

Group 1: End-to-End Production Development
- The automotive industry is witnessing rapid development of end-to-end methods, particularly the one-stage approach exemplified by UniAD, which models vehicle trajectories directly from sensor inputs [1][3].
- Two main paradigms exist in industry: one-stage and two-stage methods, with the one-stage approach gaining traction and spawning derivatives based on perception, world models, diffusion models, and VLA [3][5].

Group 2: Course Overview
- A course titled "End-to-End and VLA Autonomous Driving" has been launched, covering cutting-edge algorithms in both one-stage and two-stage end-to-end methods and aiming to bridge academic and industrial advances [5][15].
- The course is structured into several chapters covering the history and evolution of end-to-end methods, background knowledge on VLA, and detailed discussions of both one-stage and two-stage approaches [9][10][12].

Group 3: Key Technologies
- The course emphasizes critical technologies such as BEV perception, vision-language models (VLM), diffusion models, and reinforcement learning, which are essential for mastering the latest advances in autonomous driving [5][11][19].
- The second chapter is highlighted as containing the technical keywords most frequently asked about in job interviews over the next two years [10].

Group 4: Practical Applications
- The course includes practical assignments, such as RLHF fine-tuning, allowing participants to apply their knowledge in real-world scenarios and learn how to build and experiment with pre-training and reinforcement learning modules [13][19].
- The curriculum also covers the subfields of one-stage end-to-end methods, including those based on perception, world models, diffusion models, and VLA, providing a comprehensive view of the current landscape in autonomous driving technology [14][19].
Li Auto Kept Breaking Through in 2025: A Review of the Year's Results...
自动驾驶之心· 2025-10-27 00:03
Core Insights
- Since the mass production of its end-to-end + VLM dual system last year, Li Auto has entered the domestic smart-driving first tier, maintaining a leading position in both academic work and production solutions [3][4].
- The company is transitioning from a new-energy-vehicle brand to an AI enterprise, driven by advances in embodied intelligence and large models [3].
- The VLA driver model, featuring an innovative architecture, enhances capabilities in spatial understanding, reasoning, communication, memory, and behavior [3][4].

VLA & VLM
- ReflectDrive introduces discrete diffusion for reflective vision-language-action models in autonomous driving, targeting scalable and efficient trajectory generation [8][13].
- OmniReason establishes a temporally guided framework for VLA, emphasizing causal reasoning in diverse driving scenarios [11][16].
- LightVLA presents a differentiable token-pruning framework that improves VLA efficiency, achieving significant reductions in computational load while improving success rates [14][17].
- DriveAgent-R1 focuses on human-like driving decisions, introducing a hybrid thinking architecture that adapts to complex environments [19].

End-to-End Trajectory Generation
- World4Drive is an open-source VLA dataset covering diverse driving scenarios across 148 cities in China, ensuring high-quality and representative data [21][25].
- TransDiffuser enhances trajectory generation with a novel end-to-end framework that integrates multimodal driving intentions without relying on perception annotations [23][26].

World Models
- RLGF proposes a reinforcement-learning framework for generating driving videos, addressing geometric distortion in autonomous driving [29][34].
- GeoDrive incorporates 3D point-cloud rendering into the generation paradigm, improving spatial consistency and controllability [40].

Other Innovations
- TokenFLEX introduces a unified training framework for dynamic visual token inference, enhancing model robustness across varying token counts [50].
- RuscaRL addresses exploration bottlenecks in reinforcement learning, promoting independent learning through structured external support [56].
Rallying the Troops! Seeking Autonomous Driving Enthusiasts Everywhere (Product / 4D Annotation / World Models, etc.)
自动驾驶之心· 2025-10-25 16:03
Core Viewpoint
- The article emphasizes the need for collaboration in the autonomous driving industry, inviting professionals to participate in training, course development, and research support to drive industry progress [2].

Group 1: Collaboration and Opportunities
- The company is seeking partnerships with professionals in the autonomous driving field to enhance its training and job-guidance services [2].
- High compensation and abundant industry resources will be provided to collaborators [3].
- The main focus areas for collaboration include autonomous driving product managers, 4D annotation/data loop, world models, VLA, autonomous driving large models, reinforcement learning, and end-to-end systems [4].

Group 2: Training and Development
- The positions are aimed primarily at B-end training for enterprises, universities, and research institutions, as well as C-end training for students and job seekers [5].
- Interested individuals are encouraged to reach out via WeChat for further consultation [6].