自动驾驶之心
Dual SOTA! GenieDrive: A Physically Consistent World Model for Autonomous Driving (HKU & Huawei Noah's Ark)
自动驾驶之心· 2025-12-24 00:58
Core Insights
- The article presents GenieDrive, a new framework for autonomous driving that uses 4D occupancy as an intermediate representation, opening a novel research path of "first generate 4D occupancy, then generate video" [2][25].

Summary by Sections

Project Overview
- GenieDrive is a world-modeling framework for autonomous driving that achieves highly controllable, multi-view-consistent, and physically accurate video generation [7].
- It runs with only 3.47 million parameters, reaches an inference speed of 41 FPS, and improves mIoU on 4D occupancy prediction by 7.2% [5][7].

Research Background and Challenges
- Current autonomous driving world models face two main challenges: insufficient physical consistency and the difficulty of modeling high-dimensional representations [8].
- Existing methods rely on a single video diffusion model, which complicates learning and can produce results inconsistent with real physical laws [4][8].

Innovations of GenieDrive
- GenieDrive adopts a two-stage world-modeling and generation framework, inserting 4D occupancy as an intermediate state to inject explicit physical information into video generation [10][11].
- It employs a tri-plane VAE for efficient compression, using only 58% of the latent representation of existing methods while achieving state-of-the-art occupancy reconstruction performance [11].
- A Mutual Control Attention mechanism explicitly models the effect of driving controls on occupancy evolution, and end-to-end joint training further improves prediction accuracy [11].

Experimental Results and Analysis
- GenieDrive improves 4D occupancy prediction substantially, with a 7.2% gain in mIoU over the latest methods [13].
- The model reaches an inference speed of 41 FPS with a total parameter count of 3.47 million [13].
- In video generation, GenieDrive reduces FVD by 20.7%, outperforming existing occupancy-based methods [15].
Future Outlook
- By introducing 4D occupancy as an intermediate representation, GenieDrive aims to advance closed-loop evaluation and simulation, potentially opening new research directions and applications in the autonomous driving field [23].
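The Mutual Control Attention idea described above, letting driving controls and occupancy latents condition each other, can be sketched as a pair of plain cross-attention passes. This is a hypothetical NumPy illustration of the general mechanism, not the paper's actual implementation; every name, shape, and the residual structure here is invented.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(q, k, v):
    # Scaled dot-product attention: rows of q attend over k/v.
    scores = q @ k.T / np.sqrt(q.shape[-1])
    return softmax(scores, axis=-1) @ v

def mutual_control_attention(occ_tokens, ctrl_tokens):
    """Mutually condition occupancy latents and control embeddings.

    occ_tokens:  (N_occ, d) latent tokens of the 4D occupancy state
    ctrl_tokens: (N_ctrl, d) embeddings of driving controls (steer, speed, ...)
    """
    occ_updated = occ_tokens + cross_attention(occ_tokens, ctrl_tokens, ctrl_tokens)
    ctrl_updated = ctrl_tokens + cross_attention(ctrl_tokens, occ_tokens, occ_tokens)
    return occ_updated, ctrl_updated

occ = np.random.randn(16, 32)   # 16 occupancy tokens, dim 32 (toy sizes)
ctrl = np.random.randn(4, 32)   # 4 control tokens
occ2, ctrl2 = mutual_control_attention(occ, ctrl)
print(occ2.shape, ctrl2.shape)  # → (16, 32) (4, 32)
```

The point of the symmetric update is that controls shape how occupancy evolves while the predicted occupancy feeds back into the control embedding, which is what end-to-end joint training can then exploit.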
A Review of the Agent's First Year: Is the Architecture War Already Over?!
自动驾驶之心· 2025-12-24 00:58
Core Insights
- The article reviews the evolution of "Agent" technology, highlighting the emergence of "Deep Agent" and the "Claude Agent SDK" as the leading architectures in the field [3][57].
- It emphasizes that 2025 marks a pivotal year for agents: the technology is clearly ready, but it has not yet fully replaced traditional methods [5][6].

Technical Perspectives
- Agent architectures have converged toward a general form represented by Claude Code and Deep Agent, with attention shifting to their capabilities beyond programming [3][4].
- The core capabilities of Claude Code, such as planning and context management, apply to many tasks beyond coding, which led to its rebranding as the Claude Agent SDK [9].

Industry Recognition
- While agent products have generated significant revenue in sectors such as recruitment and marketing, the impact is less visible domestically because the business is concentrated in overseas markets [10].
- The focus is shifting from technical architecture to business restructuring, and industry professionals need to adapt traditional workflows to be agent-friendly [10].

Definition and Characteristics of Deep Agent
- A "Deep Agent" is characterized by industry-specific knowledge and long-running capability, ensuring stability and reliability in task execution [11][12].
- A Deep Agent must demonstrate a high level of specialization and the ability to carry out complex, multi-step tasks without failure [12].

Skills and Context Management
- "Agent Skills" provide a more dynamic and efficient way to integrate business knowledge into agents, extending their capabilities [22][30].
- Progressive disclosure is highlighted as a key design principle: agents load information as needed rather than all at once, improving context management [32][34].
Planning and Task Management
- Planning is identified as a crucial component for agents executing long-horizon tasks, decomposing them into manageable sub-tasks [47][50].
- Context isolation and parallel execution in sub-agents improve efficiency and reduce context confusion [50].

System Prompt and File Management
- Detailed system prompts are significant for guiding agent behavior and ensuring effective task execution [52].
- A well-structured file system is proposed as a way to manage context and facilitate collaboration among agents, providing long-term memory and efficient information retrieval [53][56].

Conclusion on Agent Technology
- The agent technology landscape has reached a point of convergence, with established architectures such as the Claude Agent SDK and Deep Agent leading the way [57][58].
- The future of agent technology will involve further specialization and adaptation to specific business needs, leveraging the strengths of existing frameworks [69][71].
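The progressive-disclosure principle above can be made concrete with a small sketch: keep only one-line skill summaries in the agent's context, and load a skill's full instructions only when the agent decides to use it. This is a hypothetical illustration, not the actual Claude Agent SDK API; the `Skill`/`SkillRegistry` names and contents are invented.

```python
from dataclasses import dataclass

@dataclass
class Skill:
    name: str
    summary: str   # always visible to the agent (cheap to keep in context)
    body: str      # full instructions, loaded only on demand

class SkillRegistry:
    """Progressive disclosure: the agent sees a cheap index of all skills,
    and pays the context cost of a full skill body only when it is needed."""
    def __init__(self, skills):
        self.skills = {s.name: s for s in skills}

    def index(self):
        # What goes into the system context up front: names and summaries only.
        return "\n".join(f"- {s.name}: {s.summary}" for s in self.skills.values())

    def load(self, name):
        # Pulled into context only when the agent selects this skill.
        return self.skills[name].body

reg = SkillRegistry([
    Skill("invoice_review", "Check invoices against policy", "Step 1: open the invoice..."),
    Skill("pr_triage", "Label and route pull requests", "Step 1: read the diff..."),
])
print(reg.index())
print(reg.load("pr_triage"))
```

The design choice is the same one the article attributes to Agent Skills: business knowledge lives outside the prompt and enters the context incrementally, keeping long-running sessions from drowning in upfront instructions.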
Toward a Unified Fusion of VLA and World Models...
自动驾驶之心· 2025-12-23 09:29
Core Viewpoint
- The article discusses the integration of two advanced directions in autonomous driving, Vision-Language-Action (VLA) and the World Model, highlighting their complementary nature and the trend toward fusing them for stronger decision-making in autonomous systems [2][51].

Summary by Sections

Introduction to VLA and World Model
- VLA (Vision-Language-Action) is a multimodal model that interprets visual input and human language to make driving decisions, aiming for natural human-vehicle interaction [8][10].
- A World Model is a generative spatiotemporal neural network that simulates future scenarios from high-dimensional sensor data, enabling vehicles to predict outcomes and make safer decisions [12][14].

Comparison of VLA and World Model
- VLA focuses on human interaction and interpretable end-to-end autonomous driving, while the World Model emphasizes future-state prediction and simulation for planning [15].
- VLA takes sensor data plus explicit language commands as input, whereas the World Model relies on sequential sensor data and vehicle state [13][15].
- VLA outputs direct action-control signals, while the World Model produces future scene states without direct driving actions [15].

Integration and Future Directions
- Both technologies arise from the limitations of traditional modular systems and aim to strengthen autonomous systems' cognitive and decision-making abilities [16][17].
- The ultimate goal for both is to let machines understand environments and make robust plans, with particular attention to corner cases in driving scenarios [18][19].
- The article suggests the future of autonomous driving may lie in a deep integration of VLA and the World Model, forming a comprehensive system that combines perception, reasoning, simulation, decision-making, and explanation [51].
Examples of Integration
- The article cites several research papers exploring the fusion of VLA and the World Model, such as 3D-VLA, which aims to enhance 3D perception and planning capabilities [24][26].
- WorldVLA combines action generation with environmental understanding, addressing the semantic and functional gaps between the two models [28][31].
- The IRL-VLA framework proposes a closed-loop reinforcement-learning approach for training VLA models without heavy reliance on simulation, improving their practical applicability [34][35].

Conclusion
- The integration of VLA and the World Model is a promising direction for the next generation of autonomous driving technologies, with ongoing development from various industry players [51].
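The division of labor described above (VLA proposes actions from perception and language; the world model imagines the outcomes) suggests one natural fusion pattern: propose-then-simulate. The sketch below is a toy illustration of that control loop only; both model calls are invented stand-ins, not any of the cited systems.

```python
import random

def vla_propose(observation, instruction, k=3):
    """Stand-in for a VLA policy: propose k candidate actions from a
    camera observation and a language command. Purely illustrative."""
    return [{"steer": random.uniform(-0.3, 0.3), "accel": random.uniform(0.0, 1.0)}
            for _ in range(k)]

def world_model_rollout(state, action, horizon=5):
    """Stand-in for a world model: imagine the scene evolving under an
    action and return a risk score for that future (lower is safer)."""
    return abs(action["steer"]) + 0.1 * action["accel"] * horizon

def plan(state, observation, instruction):
    # VLA proposes, the world model evaluates, and we commit to the
    # candidate whose imagined future is safest.
    candidates = vla_propose(observation, instruction)
    return min(candidates, key=lambda a: world_model_rollout(state, a))

best = plan(state={}, observation="front_cam_frame", instruction="turn left at the light")
print(best)
```

This loop is exactly the complementarity the article points at: the VLA side supplies language-conditioned, interpretable action proposals, while the world-model side supplies the physical foresight to rank them.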
Second Year of Grad School: How Far Do My Experiments Need to Go Before I Can Write a Paper?
自动驾驶之心· 2025-12-23 09:29
Core Viewpoint
- The article emphasizes the importance of timely submission of academic papers, particularly for graduate students, arguing that a complete research story is more valuable than novelty alone [1].

Group 1: Academic Guidance Services
- The company offers a paper-guidance service aimed at producing research results efficiently within a limited timeframe, helping students avoid common pitfalls of writing on their own [2].
- The guidance covers advanced topics such as reinforcement learning, 3D object detection, and multi-sensor fusion, among others, ensuring comprehensive support [3].
- The service is designed for students with unclear directions, difficulty generating ideas, trouble reproducing code, or writing challenges [5].

Group 2: Instructor Qualifications
- All instructors come from universities ranked in the global QS top 100, with multiple publications at A-level conferences and extensive project experience [6].

Group 3: Comprehensive Academic Support
- The company provides a full range of academic support services, including journal papers, conference papers, and thesis assistance [8].
- The service is results-oriented, offering continuous support until the paper is submitted, with a focus on improving coding skills along the way [8].

Group 4: FAQs and Additional Benefits
- Even students with no prior experience can publish by following structured courses, with the potential to produce a short paper within six months [11].
- Outstanding students may receive recommendation letters from prestigious institutions and internship opportunities at leading companies; publishing a paper is framed as just the beginning of the academic journey [11].
Now on Sale! A Full-Stack Autonomous Driving Vehicle for Research...
自动驾驶之心· 2025-12-23 03:43
Core Viewpoint
- The article introduces the "Black Warrior 001," a cost-effective, easy-to-use educational vehicle for autonomous driving research and teaching, priced at 36,999 yuan, including advanced features and training courses [2][4].

Group 1: Product Overview
- The Black Warrior 001 is a lightweight platform supporting perception, localization, fusion, navigation, and planning, built on an Ackermann chassis [4].
- It is suitable for undergraduate learning, graduate research, and as a teaching tool for educational institutions and training companies [4].

Group 2: Performance Demonstration
- The vehicle has been tested in indoor, outdoor, and parking-garage scenarios, demonstrating its perception, localization, fusion, navigation, and planning capabilities [6][8][12][14][16][18][20].

Group 3: Hardware Specifications
- Key sensors include a Mid-360 3D LiDAR, a 2D LiDAR, and a depth camera from Orbbec, with an Nvidia Orin NX main control chip with 16 GB of RAM [22][23].
- The vehicle weighs 30 kg, has a 50 W battery, operates at 24 V, and has a top speed of 2 m/s [25][26].

Group 4: Software and Functionality
- The software stack includes ROS, C++, and Python, with one-click startup and a complete development environment [28][36].
- Supported functionality includes 2D and 3D SLAM, point-cloud processing, vehicle navigation, and obstacle avoidance [29].

Group 5: After-Sales and Support
- The company offers one year of after-sales support for non-human damage, with free repairs during the warranty period for damage caused by user error [52].
Several EV Startups Are Stuck in the Fatigue of the "30,000 Club"...
自动驾驶之心· 2025-12-23 03:43
晚点LatePost · "A little later, a little better." Reprinted with authorization from LatePost (WeChat ID: postlate); author: Evan.

According to preliminary statistics from the China Passenger Car Association, national retail sales of new-energy passenger vehicles reached 1.354 million units in November 2025, up 7% year-on-year and up 6% from the previous month. According to figures disclosed by the automakers, Li Auto delivered 33,181 vehicles in November; Xpeng delivered 36,728 (including overseas); and NIO delivered 36,275, of which the NIO brand accounted for 18,393, the Onvo brand for 11,794, and the firefly brand for 6,088.

(A model-level delivery table for Xpeng, listing M03, P7+, and other models, was garbled in extraction and is omitted here.)
This Year Probably Produced n VLA+RL Papers?!
自动驾驶之心· 2025-12-23 03:43
Core Insights
- The article emphasizes the role of Reinforcement Learning (RL) in improving the generalization of Vision-Language-Action (VLA) models, with some experiments showing performance gains of up to 42.6% on out-of-distribution tasks [2].

Group 1: VLA and RL Integration
- VLA models currently rely on RL to overcome their limitations in real-world out-of-distribution scenarios, where imitation learning alone proves insufficient [2].
- Recent advances in VLA+RL frameworks have produced significant breakthroughs, with several notable papers published this year [2].
- Tooling for VLA+RL frameworks, such as RLinf, is becoming increasingly comprehensive, offering researchers a variety of methods [2].

Group 2: Notable Research Papers
- A summary of representative VLA+RL papers from the past two years is provided, indicating a growing body of work in this area [5].
- Papers mentioned include "NORA-1.5," "Balancing Signal and Variance," and "CO-RFT," which address different aspects of VLA and RL integration [5][10].
- The article encourages further research in these areas and offers assistance for those exploring VLA, real2sim2real, and RL [3].
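The core mechanism behind VLA+RL fine-tuning is policy-gradient optimization of the action head on task reward rather than on imitation targets. The toy below shows plain REINFORCE on a 3-action bandit as a minimal stand-in; the logits of a real VLA would come from the vision-language backbone, and the rewards here are invented, not from any cited paper.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

# Toy stand-in: a policy head over 3 discrete actions. In a real VLA+RL setup
# these logits come from the backbone and the reward from environment success.
logits = np.zeros(3)
reward_per_action = np.array([0.0, 1.0, 0.0])  # hypothetical sparse success reward

lr = 0.5
for _ in range(200):
    probs = softmax(logits)
    a = rng.choice(3, p=probs)
    r = reward_per_action[a]
    # REINFORCE: increase the log-probability of the sampled action,
    # scaled by the reward it obtained (zero reward => no update).
    grad = -probs
    grad[a] += 1.0
    logits += lr * r * grad

final_probs = softmax(logits)
print(int(np.argmax(final_probs)))  # → 1: the policy concentrates on the rewarded action
```

The same gradient, applied to trajectories instead of single actions and with variance-reduction tricks on top, is what lets RL push a VLA policy beyond the distribution of its imitation data.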
How Does SD Navigation Information Land in Autonomous Driving?
自动驾驶之心· 2025-12-23 00:53
Core Viewpoint
- The article discusses the application of navigation information in autonomous driving, emphasizing its role in providing lane guidance, waypoint information, and reference lines to support vehicle path planning and control [2][4][31].

Group 1: Navigation Information Application
- SD/SD Pro navigation information is already used in many production solutions, offering a rough global and local view for the driver [2].
- The navigation module's core responsibility is to provide reference lines; a predefined driving path significantly reduces planning pressure [4].
- Additional functions include providing planning constraints and priorities, as well as path monitoring and replanning [5].

Group 2: Path Planning and Behavior Guidance
- Lane-level global path planning searches for the optimal lane sequence to reach the target lane [6].
- Behavior planning benefits from clear semantic guidance, allowing the vehicle to prepare for lane changes, deceleration, and yielding in advance [6].

Group 3: Course Overview
- The course, "End-to-End Practical Class for Mass Production," focuses on practical applications in autonomous driving, covering one-stage and two-stage frameworks, trajectory optimization, and production experience sharing [23].
- The curriculum includes chapters on an end-to-end task overview, two-stage and one-stage algorithms, navigation-information applications, reinforcement learning in autonomous driving, trajectory-output optimization, fallback solutions, and mass-production experience [28][30][31][32][33][34][35].

Group 4: Target Audience and Course Details
- The course targets advanced learners with a background in autonomous driving algorithms, reinforcement learning, and programming [36][38].
- It begins on November 30, runs for three months, and combines offline video teaching with online Q&A sessions [36][39].
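Lane-level global path planning of the kind described above is, at its core, a shortest-path search over a lane graph whose edge costs encode distance plus penalties such as lane changes. The sketch below is a generic Dijkstra over an invented toy lane graph, not any production navigation stack; lane IDs and costs are made up.

```python
import heapq

# Hypothetical lane graph: node = lane id, edge = (successor lane, cost).
# The A1 -> B1 edge is a lane change and carries an extra penalty.
lane_graph = {
    "A1": [("A2", 1.0), ("B1", 1.5)],
    "A2": [("A3", 1.0)],
    "B1": [("B2", 1.0)],
    "B2": [("A3", 1.5)],
    "A3": [],
}

def lane_route(graph, start, goal):
    """Dijkstra over the lane graph; returns (lane sequence, total cost)."""
    pq = [(0.0, start, [start])]
    settled = {}
    while pq:
        cost, lane, path = heapq.heappop(pq)
        if lane == goal:
            return path, cost
        if lane in settled and settled[lane] <= cost:
            continue
        settled[lane] = cost
        for nxt, c in graph[lane]:
            heapq.heappush(pq, (cost + c, nxt, path + [nxt]))
    return None, float("inf")

path, cost = lane_route(lane_graph, "A1", "A3")
print(path, cost)  # → ['A1', 'A2', 'A3'] 2.0
```

The resulting lane sequence is exactly the kind of semantic guidance the summary mentions: downstream behavior planning can read off where a lane change must happen well before the maneuver point.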
AI Day Livestream! DGGT: A Pose-Free, Feedforward 4D World Model for Autonomous Driving
自动驾驶之心· 2025-12-23 00:53
Paper title: DGGT: Feedforward 4D Reconstruction of Dynamic Driving Scenes using Unposed Images

Training and evaluating autonomous driving systems requires fast, scalable 4D reconstruction and re-simulation, yet most existing methods for dynamic driving scenes still rely on per-scene optimization, known camera calibration, or short time windows, making them slow and of limited practical use.

This paper revisits the problem from a feedforward perspective and proposes the Driving Gaussian Grounded Transformer (DGGT), a unified, pose-free framework for dynamic scene reconstruction. The authors note that existing methods typically require camera poses as input, limiting flexibility and scalability. Instead, DGGT redefines pose as a model output, enabling reconstruction directly from sparse, unposed images and supporting an arbitrary number of views over long sequences. The method jointly predicts per-frame 3D Gaussian maps and camera parameters, decouples dynamic elements through a lightweight dynamic head, and uses a lifespan head to modulate time-varying visibility for temporal consistency. In addition, a diffusion-based rendering ...
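The "lifespan head" idea, modulating a dynamic element's visibility over time so that objects appear and disappear smoothly, can be illustrated with a soft temporal window on a Gaussian's opacity. This is a rough analogue for intuition only, not DGGT's actual head; the window form and sharpness parameter are invented.

```python
import numpy as np

def lifespan_opacity(t, t_start, t_end, sharpness=8.0):
    """Soft temporal visibility window for a dynamic 3D Gaussian:
    near 0 outside [t_start, t_end], near 1 inside, smooth at the edges."""
    sig = lambda x: 1.0 / (1.0 + np.exp(-x))
    return sig(sharpness * (t - t_start)) * sig(sharpness * (t_end - t))

t = np.linspace(0.0, 10.0, 5)          # sample times across the sequence
vals = lifespan_opacity(t, 2.0, 8.0)   # this Gaussian "lives" from t=2 to t=8
print(np.round(vals, 3))
```

Gating each dynamic Gaussian's opacity this way keeps renderings temporally consistent: an object that leaves the scene fades out instead of popping, which is the consistency property the summary attributes to the lifespan head.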
A Deep Dive into Tesla's ICCV Talk: Several Possible Industry Solutions We Found...
自动驾驶之心· 2025-12-23 00:53
Core Insights
- The article dissects Tesla's end-to-end autonomous driving approach, highlighting its challenges and the innovative solutions developed to address them [3].

Group 1: Challenges and Solutions
- Challenge 1: the curse of dimensionality, requiring breakthroughs at both the input and output layers to improve computational efficiency and decision accuracy [4].
- Solution: UniLION, a unified autonomous driving framework based on a linear group RNN, efficiently processes multimodal data and eliminates the need for intermediate perception and prediction results [4][7].
- UniLION's key features include a unified 3D backbone network and the ability to handle multiple tasks simultaneously, reaching 75.4% NDS and 73.2% mAP on detection tasks [11].

Group 2: Interpretability and Safety
- Challenge 2: the need for interpretability and safety guarantees in autonomous driving systems, which traditional models struggle to provide [12].
- Solution: DrivePI, a unified spatially aware 4D multimodal large language model (MLLM) framework that integrates visual and language inputs to improve system interpretability and safety [13][14].
- DrivePI demonstrates superior performance in 3D occupancy prediction and trajectory planning, significantly reducing collision rates compared with existing models [13][17].

Group 3: Evaluation
- Challenge 3: evaluating autonomous driving systems is complex because human driving behavior is unpredictable and interaction scenarios are diverse [18].
- Solution: GenieDrive, a world-model framework that uses a 4D occupancy representation to generate physically consistent multi-view video sequences, enriching the evaluation environment for autonomous systems [21][22].
- GenieDrive achieves a 7.2% improvement in mIoU for 4D occupancy prediction and reduces FVD by 20.7%, establishing new performance benchmarks [21][27].

Group 4: Integrated Ecosystem
- The three innovations, UniLION, DrivePI, and GenieDrive, form a synergistic ecosystem spanning perception, decision-making, and evaluation [30][31].
- This integrated approach addresses key industry challenges, paving the way for safer, more reliable, and more efficient autonomous driving, ultimately accelerating the transition to L4/L5 autonomy [31].