自动驾驶之心

Hardcore Night Talk: A Deep Dive into Autonomous Driving Data Closed-Loop Engineering with a Front-Line Mass-Production Expert
自动驾驶之心· 2025-08-01 16:03
Core Viewpoint
- The article emphasizes the importance of a complete data closed-loop system in autonomous driving, covering data collection, annotation, training, simulation validation, and OTA updates. As autonomous driving evolves from Level 2 to higher levels, data volume grows exponentially, making the breadth and depth of scenario coverage crucial for system safety [3].

Group 1: Data Closed-Loop Challenges
- Data closed-loop engineering faces three core challenges [3]:
  1. The "long-tail problem": rare but critical extreme scenarios (e.g., extreme weather, complex road conditions, sudden obstacles) are difficult to capture and incorporate into the training system (a minimal mining sketch follows this summary).
  2. Data processing efficiency: each vehicle generates terabytes of data daily as sensor quantity and precision increase, so this data must be filtered, annotated, and utilized effectively.
  3. Verification difficulty: traditional testing methods cannot cover all possible scenarios, so simulation testing and real-world validation must complement each other scientifically.

Group 2: Industry Transition
- The industry is transitioning from "function stacking" to "safety-centric" approaches. The challenges of data closed-loop engineering extend beyond technology to establishing scientific verification standards, improving data processing efficiency, and balancing iteration speed with system stability to maintain a positive feedback loop in data utilization [3].

Group 3: Expert Insights
- The article invites data expert Ethan to discuss the deep challenges of bringing autonomous driving to mass production, focusing on the essence of engineering rather than flashy technology [3].
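To make the long-tail and data-efficiency challenges above concrete, here is a minimal sketch of an event-triggered scenario miner that keeps only rare or safety-critical frames for upload and annotation. All class names, fields, and thresholds are hypothetical illustrations, not details from the article:

```python
from dataclasses import dataclass

@dataclass
class FrameStats:
    """Hypothetical per-frame statistics extracted from a drive log."""
    timestamp: float
    min_ttc_s: float          # minimum time-to-collision to any tracked object
    decel_mps2: float         # longitudinal deceleration (positive = braking)
    weather: str              # e.g., "clear", "heavy_rain", "snow", "fog"
    detection_entropy: float  # mean classifier entropy; high = model unsure

# Thresholds are illustrative, not calibrated values from the article.
RARE_WEATHER = {"snow", "fog", "heavy_rain"}

def is_long_tail(frame: FrameStats) -> bool:
    """Flag frames worth uploading for annotation: rare or safety-critical."""
    return (
        frame.min_ttc_s < 1.5             # near-collision event
        or frame.decel_mps2 > 6.0         # hard braking
        or frame.weather in RARE_WEATHER  # rare environmental condition
        or frame.detection_entropy > 2.0  # perception model uncertainty
    )

def select_for_upload(frames: list[FrameStats]) -> list[FrameStats]:
    """Keep only long-tail frames so terabytes/day shrink to a reviewable set."""
    return [f for f in frames if is_long_tail(f)]
```

A filter like this is the first stage of the closed loop: it decides which of the terabytes logged per vehicle per day are worth the annotation and training budget.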
Interview with Luo Jianlan of Zhiyuan Robotics: Data Collection, Simulation, Scenarios, and Engineering for Embodied Intelligence
自动驾驶之心· 2025-08-01 16:03
Embodied Intelligence Heart (具身智能之心) was invited to the WAIC 2025 Embodied Intelligence Forum, where it interviewed Dr. Luo Jianlan, Chief Scientist of Zhiyuan Robotics. Below are the key questions Dr. Luo discussed during the interview.

Discussion on Embodied Intelligence Data

1. Data is widely seen as the fuel for improving intelligence, and sensors are the key to collecting it. What are Zhiyuan's plans for sensor R&D and procurement, and how will you improve the usability of product data?

Luo Jianlan: We have already partnered with multiple sensor suppliers, focusing on the joint development of visual-tactile and high-density sensors. At the same time, we are building a cross-platform data collection API that maps task semantics into a unified representation, providing standardized, trainable data inputs for model training. (Editor's note: a hypothetical sketch of such an API appears after this excerpt.)

2. You said world models are quite useful: after adding a world model and some collected data, performance improves. How far is that step from real application, and what barriers remain between collecting the data and deploying it?

Luo Jianlan: Performance. A robot's performance has to be very high before it is genuinely useful. In your home, whether the robot sweeps the floor or loads the dishwasher, it needs a 95% success rate across a million households, and that is a very hard problem.

3. Sergey Levine recently published an article proposing the "Sporks of AGI" viewpoint: that simulation will hinder the scaling of embodied intelligence. I would like to know ...
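Zhiyuan's actual API is not public; purely as an illustration of the "unified task-semantics mapping" idea Dr. Luo describes, here is a minimal sketch in which every class, field, and function name is an assumption, not Zhiyuan's interface:

```python
from dataclasses import dataclass, field
from typing import Any

@dataclass
class TaskSemantics:
    """Platform-agnostic description of what a demonstration is about."""
    task_id: str              # e.g., "load_dishwasher"
    object_labels: list[str]  # canonical object vocabulary shared by all robots
    success: bool             # episode-level outcome label

@dataclass
class SensorFrame:
    """One synchronized multi-sensor sample, already in canonical units."""
    timestamp_us: int
    rgb: Any                  # HxWx3 image array
    tactile: Any              # high-density tactile array, robot-specific shape
    joint_positions: list[float]

@dataclass
class Episode:
    semantics: TaskSemantics
    frames: list[SensorFrame] = field(default_factory=list)

def to_training_sample(ep: Episode) -> dict:
    """Map an episode into a standardized, model-ready record.

    The point of the unified mapping: downstream training code never needs
    to know which robot platform or sensor vendor produced the episode.
    """
    return {
        "task": ep.semantics.task_id,
        "objects": ep.semantics.object_labels,
        "success": ep.semantics.success,
        "observations": [(f.timestamp_us, f.rgb, f.tactile) for f in ep.frames],
        "proprioception": [f.joint_positions for f in ep.frames],
    }
```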
ACM MM'25 | A New SOTA for 2D Object Detection in Autonomous Driving, Surpassing the Latest YOLO Series
自动驾驶之心· 2025-08-01 16:03
Core Viewpoint
- The article presents a new detection framework called Butter, designed to improve object detection in autonomous driving by addressing the challenge of multi-scale semantic information modeling while enhancing detection robustness and deployment efficiency [3][11].

Group 1: Framework Innovations
- Butter introduces two core innovations in the Neck layer: the Frequency Consistency Enhancement Module (FAFCE) and the Progressive Hierarchical Feature Fusion Network (PHFFNet) [3][15].
- FAFCE sharpens boundary resolution by combining high-frequency detail enhancement with low-frequency noise suppression, while PHFFNet progressively fuses semantic information to strengthen multi-scale feature representation [3][15].

Group 2: Performance Metrics
- Butter outperforms existing state-of-the-art (SOTA) methods in detection accuracy at significantly lower parameter counts, reaching 94.4% mAP@50 on the KITTI dataset, 1.2 percentage points above the previous best while using only about one third of the computation [32][34].
- On BDD100K and Cityscapes, Butter achieves mAP@50 scores of 53.7% and 53.2%, respectively, outperforming other lightweight models, including a 1.6 percentage point improvement on Cityscapes [32][34].

Group 3: Structural Challenges
- Existing Neck structures often suffer from frequency aliasing and rigid fusion processes, which degrade feature expression and detection accuracy, particularly for small targets in complex environments [9][10].
- Butter addresses these structural bottlenecks by decoupling frequency modeling from multi-scale fusion, balancing accuracy and efficiency [11][12].

Group 4: Methodology Overview
- The Butter pipeline starts from a 640×640 monocular image, extracts initial features with a lightweight Backbone, refines them through a series of lightweight blocks, and then passes them to the Neck module [16][17].
- The Head layer uses four output heads to produce the final detection results: class labels, confidence scores, and bounding boxes [16][17].

Group 5: Feature Fusion Techniques
- FAFCE improves the accuracy and robustness of feature fusion through high-frequency amplification and low-frequency damping mechanisms, which increase the consistency and precision of multi-scale feature integration [20][27] (a minimal sketch of this frequency-split idea follows this summary).
- PHFFNet applies a hierarchical fusion strategy that alleviates semantic discrepancies between non-adjacent layers, significantly improving detection accuracy and alignment in scenarios that require precise boundary detection [29][30].
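As an illustration only, here is a minimal PyTorch sketch of the generic high-frequency amplification / low-frequency damping idea described above. It is not the authors' FAFCE implementation; the module name, the pooling-based frequency split, and the gain values are all assumptions:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class FrequencySplitEnhance(nn.Module):
    """Split a feature map into low/high-frequency parts and reweight them.

    Low frequencies are approximated by average pooling + upsampling; the
    residual is treated as high frequency. Learnable scalars then amplify
    detail and damp low-frequency noise before recombining.
    """

    def __init__(self, hf_gain: float = 1.5, lf_gain: float = 0.8):
        super().__init__()
        self.hf_gain = nn.Parameter(torch.tensor(hf_gain))  # amplify edges/detail
        self.lf_gain = nn.Parameter(torch.tensor(lf_gain))  # damp smooth noise

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        low = F.avg_pool2d(x, kernel_size=2)
        low = F.interpolate(low, size=x.shape[-2:], mode="bilinear",
                            align_corners=False)
        high = x - low  # residual carries the high-frequency content
        return self.lf_gain * low + self.hf_gain * high

# Usage: enhance a 256-channel feature map before multi-scale fusion.
feat = torch.randn(1, 256, 80, 80)
enhanced = FrequencySplitEnhance()(feat)
```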
Farewell to Passive Perception! DriveAgent-R1: An Advanced Agent with Hybrid Thinking and Active Visual Exploration
自动驾驶之心· 2025-08-01 07:05
Today, 自动驾驶之心 shares the latest work from Shanghai Qi Zhi Institute, Li Auto, Tongji University, and Tsinghua University: DriveAgent-R1! The era of autonomous driving agents has arrived, advancing VLM-based autonomous driving with hybrid thinking and active perception. If you have related work to share, please contact us at the end of the article.

Paper authors | Weicheng Zheng et al.
Editor | 自动驾驶之心

Foreword & the Author's Perspective

DriveAgent-R1 is an advanced autonomous driving agent designed to tackle long-horizon, high-level behavioral decision-making. The potential of current VLMs in autonomous driving is limited by their short-sighted decision patterns and passive perception, which makes them unreliable in complex environments. To address these challenges, DriveAgent-R1 introduces two core innovations:

Our core task is therefore to empower the agent to make long-horizon, high-level behavioral decisions while, when facing uncertainty, actively seeking critical information from the environment like a human driver (a minimal sketch of this loop follows this excerpt).

The figure above vividly illustrates DriveAgent-R1's ...
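DriveAgent-R1's actual mechanism is described in the paper; purely as a generic illustration of an agent that actively requests more visual evidence when uncertain, here is a minimal sketch where every function, tool name, and threshold is a hypothetical placeholder:

```python
import random
from dataclasses import dataclass

@dataclass
class Decision:
    action: str        # high-level maneuver, e.g., "keep_lane", "overtake"
    confidence: float  # model's self-reported confidence in [0, 1]

def vlm_decide(observation: dict) -> Decision:
    """Placeholder for a VLM call returning a maneuver and its confidence."""
    return Decision(action="keep_lane", confidence=random.uniform(0.4, 1.0))

def active_perception(observation: dict, tool: str) -> dict:
    """Placeholder for a vision tool call (e.g., zoom into a distant region)."""
    enriched = dict(observation)
    enriched[f"extra_view_{tool}"] = "high-res crop"  # simulated new evidence
    return enriched

def decide_with_active_perception(observation: dict,
                                  threshold: float = 0.7,
                                  max_tool_calls: int = 2) -> Decision:
    """Decide; if uncertain, actively gather more visual evidence and retry."""
    decision = vlm_decide(observation)
    calls = 0
    while decision.confidence < threshold and calls < max_tool_calls:
        observation = active_perception(observation, tool=f"zoom_{calls}")
        decision = vlm_decide(observation)
        calls += 1
    return decision

print(decide_with_active_perception({"front_cam": "frame_0001"}))
```

The contrast with passive perception is the loop: instead of committing to a decision on whatever frame arrives, the agent spends bounded tool calls to reduce its own uncertainty first.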
Zhiyuan Research Institute Opens Embodied Intelligence Large-Model Researcher Positions: Social Recruitment, Campus Recruitment, and Internships All Welcome!
自动驾驶之心· 2025-08-01 07:05
Core Viewpoint
- The article announces that Zhiyuan Research Institute is recruiting researchers for embodied intelligence large models, with social recruitment, campus recruitment, and internship positions all available [1].

Group 1: Job Responsibilities
- Research and development of embodied intelligence large models (VLA models or hierarchical architectures) [4].
- Design and optimize model architectures; handle data processing, training, and deployment on real robots [4].
- Conduct in-depth research on cutting-edge technologies in embodied intelligence, track the latest developments in the large-model industry, and explore applications of new technologies in this field [4].

Group 2: Job Requirements
- Master's degree or above in computer science, artificial intelligence, robotics, automation, mathematics, or a related field [4].
- Proficiency in Python with a solid foundation in deep learning; familiarity with frameworks such as TensorFlow and PyTorch [4].
- Research experience with large models and a deep understanding of mainstream vision and language models, including pre-training, fine-tuning, and deployment [4].
- Experience in robot control and familiarity with mainstream embodied-model training and deployment is preferred [4].
- Strong learning ability, English proficiency, hands-on skills, and good team communication; publications at top conferences (RSS, ICRA, CVPR, CoRL, ICLR, NeurIPS, ACL, etc.) are a plus [4].

Group 3: Community and Resources
- The AutoRobo Knowledge Planet serves as a community for job seekers in autonomous driving, embodied intelligence, and robotics, currently with nearly 1,000 members from various companies [6].
- The community provides interview questions, industry reports, salary negotiation tips, and internal referrals [6][7].
- It also shares openings for algorithm, development, and product roles across campus recruitment, social recruitment, and internships [7].

Group 4: Industry Reports
- The community compiles industry reports to help members understand the current state, development trends, market opportunities, and landscape of the embodied intelligence industry [15].
- Reports include the World Robotics Report, China's Embodied Intelligence Venture Capital Report, and studies on the development of humanoid robots [16].
A 10,000-Word Long Read! The First Survey on Agent Self-Evolution: The Road to Artificial Superintelligence
自动驾驶之心· 2025-07-31 23:33
Core Insights
- The article discusses the transition from static large language models (LLMs) to self-evolving agents that adapt and learn continuously from interactions with their environment, with artificial superintelligence (ASI) as the long-term goal [3][5][52].
- It frames self-evolving agents around three fundamental questions: what to evolve, when to evolve, and how to evolve, providing a structured framework for understanding and designing these systems [6][52].

Group 1: What to Evolve
- Self-evolving agents can improve components such as models, memory, tools, and workflows to enhance performance and adaptability [14][22].
- Agent evolution is categorized into four pillars: the cognitive core (model), context (instructions and memory), external capabilities (tool creation), and system architecture [22][24].

Group 2: When to Evolve
- Self-evolution occurs in two main time modes: intra-test-time self-evolution, which happens during task execution, and inter-test-time self-evolution, which occurs between tasks [26][27] (a minimal sketch of the two modes follows this summary).
- The article outlines three learning paradigms relevant to self-evolution: in-context learning (ICL), supervised fine-tuning (SFT), and reinforcement learning (RL) [27][28].

Group 3: How to Evolve
- Methods for self-evolution include reward-based evolution, imitation and demonstration learning, and population-based approaches [32][36].
- Continuous learning from real-world interactions, seeking feedback, and adjusting strategies in dynamic environments are central to these methods [30][32].

Group 4: Evaluation of Self-Evolving Agents
- Evaluating self-evolving agents poses unique challenges: assessments must capture adaptability, knowledge retention, and long-term generalization [40].
- The article calls for dynamic evaluation methods that reflect the ongoing evolution and diverse contributions of agents in multi-agent systems [51][40].

Group 5: Future Directions
- Deploying personalized self-evolving agents is a critical goal, requiring accurate modeling of user behavior and preferences over time [43].
- Open challenges include preventing self-evolving agents from reinforcing existing biases and developing adaptive evaluation metrics that reflect their dynamic nature [44][45].
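To make the two time modes concrete, here is a minimal illustrative sketch (not taken from the survey; all names and the scoring rule are hypothetical). Intra-test-time evolution refines behavior within one task via self-critique; inter-test-time evolution distills finished episodes into persistent memory reused by later tasks:

```python
class SelfEvolvingAgent:
    """Toy agent contrasting intra- vs inter-test-time self-evolution."""

    def __init__(self):
        self.memory: list[str] = []  # persists across tasks (inter-test-time)

    def attempt(self, task: str, hints: list[str]) -> tuple[str, float]:
        """Placeholder solver: returns (answer, quality score in [0, 1])."""
        score = min(1.0, 0.4 + 0.2 * len(hints))  # hints improve the attempt
        return f"answer to {task!r} using {len(hints)} hints", score

    def solve(self, task: str) -> str:
        hints = list(self.memory)  # start from lessons of earlier tasks
        answer, score = self.attempt(task, hints)
        # Intra-test-time: refine within the same task until good enough.
        while score < 0.9:
            hints.append(f"self-critique of: {answer}")
            answer, score = self.attempt(task, hints)
        # Inter-test-time: distill the episode into persistent memory.
        self.memory.append(f"lesson learned from {task!r}")
        return answer

agent = SelfEvolvingAgent()
for t in ["task A", "task B"]:
    print(agent.solve(t))
```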
How Should You Prepare for Fall Algorithm Roles? My 2025 Autumn Recruitment Summary
自动驾驶之心· 2025-07-31 23:33
We recently invited several community guests to record job-hunting video courses, hoping to help those going through autumn or social recruitment. They cover interviews at small and large companies, how to prepare for campus recruitment, how to choose a company, and introductions and analyses of roles in large models, auto-labeling, and end-to-end autonomous driving.

Every year, students complain that the autumn algorithm-job market is brutal and ask us how to prepare. So this year we are producing practical teaching videos that analyze the industry, the roles, and the day-to-day work, to help you decide what suits you best.

For more, join our job-hunting community, built specifically for autonomous driving, robotics, and large-model job seekers.

AutoRobo Knowledge Planet

This is a community for job seekers in autonomous driving, embodied intelligence, and robotics, with nearly 1,000 members, including working professionals from companies such as Zhiyuan Robotics, Unitree Robotics, 地瓜机器人, Horizon Robotics, Li Auto, Huawei, Xiaomi Auto, Momenta, and 元戎启行, as well as participants in the 2024 and 2025 autumn recruitment cycles, covering most areas of autonomous driving and embodied intelligence.

What's inside? Building on our existing strengths, we have compiled interview questions, interview experience write-ups, industry research reports, salary negotiation tips, plus referral channels and resume-optimization services.

Recruitment Information

The community regularly shares algorithm, development, and product openings, most of them posted first-hand by companies ...
One Article to Cover It All! A Roundup of Diffusion Models in Autonomous Driving Foundation Models, with 30+ Works
自动驾驶之心· 2025-07-31 23:33
Core Insights
- The article reviews the significant role of diffusion models in autonomous driving, highlighting their ability to enrich data diversity, improve perception-system robustness, and support decision-making under uncertainty [2][3].

Group 1: Diffusion Models in Autonomous Driving
- Diffusion models show promising applications in autonomous driving, particularly in generating diverse, physically constrained results from complex data distributions [2].
- The Dual-Conditioned Temporal Diffusion Model (DcTDM) generates realistic long-duration driving videos, addressing limited data quality and high collection costs [3][4].
- In evaluations, DcTDM improves consistency and frame quality by over 25% compared to other video diffusion models [3].

Group 2: Applications in Perception and Decision-Making
- In perception, diffusion models significantly outperform traditional methods in 3D occupancy prediction, especially in occluded or low-visibility areas, supporting downstream planning tasks [4].
- Stable-diffusion-based models effectively predict vehicle trajectories, strengthening the predictive capabilities of autonomous driving systems [4].
- The DiffusionDrive framework uses diffusion models to represent multimodal action distributions, giving end-to-end autonomous driving a way to handle uncertainty in driving decisions [4] (a minimal sketch of diffusion-based trajectory sampling follows this summary).

Group 3: Data Generation and Quality Improvement
- Diffusion models are crucial for generating high-quality synthetic data, addressing the limited diversity and authenticity of natural driving datasets [4].
- Controllable generation techniques are particularly important for overcoming 3D data-annotation challenges; future work on video generation aims to raise data quality further [4].

Group 4: Advanced Frameworks and Innovations
- LD-Scene combines large language models with latent diffusion models to generate adversarial driving scenarios, improving the controllability and robustness of generated scenes [9].
- DualDiff introduces a dual-branch diffusion model for multi-view driving-scene generation, using occupancy ray sampling to inject rich semantic information [30].
- DiVE employs a diffusion-transformer framework to generate high-fidelity, temporally coherent multi-view videos, achieving state-of-the-art multi-view video generation [19][20].

Group 5: Safety and Critical Scenario Generation
- AVD2 improves understanding of accident scenarios by generating videos aligned with detailed natural-language descriptions, aiding accident analysis and prevention [36].
- AdvDiffuser generates adversarial safety-critical driving scenarios that transfer across systems while preserving authenticity and diversity [68][69].
- The Causal Composition Diffusion Model (CCDiff) improves controllability and realism when generating closed-loop traffic scenarios, significantly outperforming existing methods [41].
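The frameworks above share one underlying mechanism: ancestral sampling from a learned denoiser. Purely as an illustration (not any of the cited methods), here is a minimal DDPM-style sketch for generating 2D trajectory waypoints, where the denoiser is a stand-in for a trained, scene-conditioned network and the schedule values are made up:

```python
import numpy as np

T = 50                              # diffusion steps
betas = np.linspace(1e-4, 0.05, T)  # noise schedule (illustrative values)
alphas = 1.0 - betas
alpha_bars = np.cumprod(alphas)

def denoiser(x_t: np.ndarray, t: int) -> np.ndarray:
    """Stand-in for a trained noise-prediction network eps_theta(x_t, t).

    A real model would condition on the driving scene (map, agents, ego
    state); here it returns zeros so the sampler runs end to end.
    """
    return np.zeros_like(x_t)

def sample_trajectory(num_waypoints: int = 8) -> np.ndarray:
    """DDPM ancestral sampling: start from Gaussian noise, denoise stepwise."""
    x = np.random.randn(num_waypoints, 2)  # (x, y) waypoints as pure noise
    for t in reversed(range(T)):
        eps = denoiser(x, t)
        mean = (x - betas[t] / np.sqrt(1 - alpha_bars[t]) * eps) / np.sqrt(alphas[t])
        noise = np.random.randn(*x.shape) if t > 0 else 0.0
        x = mean + np.sqrt(betas[t]) * noise
    return x  # one sample from a multimodal trajectory distribution

print(sample_trajectory())
```

Because each call draws a fresh noise seed, repeated sampling naturally yields diverse candidate trajectories, which is exactly why diffusion suits multimodal action distributions.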
High-Fidelity Real-World Reconstruction! The Most Cost-Effective 3D Laser Scanner
自动驾驶之心· 2025-07-31 23:33
Core Viewpoint
- The GeoScan S1 is presented as the most cost-effective handheld 3D laser scanner in China, featuring a lightweight design, one-button operation, and efficient 3D scene reconstruction with centimeter-level accuracy [1][4].

Group 1: Product Features
- The GeoScan S1 generates point clouds at 200,000 points per second, with a maximum measurement range of 70 meters and 360° coverage, supporting large scenes of over 200,000 square meters [1][24].
- It integrates multiple sensors and supports cross-platform integration, offering flexibility for scientific research and development [1][39].
- The device runs a handheld Ubuntu system with various sensor devices attached, allowing simple power supply and operation [1][4].

Group 2: Performance and Specifications
- The system supports real-time 3D point-cloud mapping, color fusion, and live preview, with relative accuracy better than 3 cm and absolute accuracy better than 5 cm [17].
- The device measures 14.2 cm × 9.5 cm × 45 cm and weighs 1.3 kg without the battery (1.9 kg with it); the 88.8 Wh battery provides roughly 3 to 4 hours of operation [17][18].
- Multi-sensor data is synchronized at the microsecond level, ensuring high precision in complex indoor and outdoor environments [29][30] (a minimal sketch of timestamp-based sensor alignment follows this summary).

Group 3: Market Position and Pricing
- The launch price starts at 19,800 yuan, with basic, depth-camera, and 3DGS versions available to meet different user needs [4][53].
- The product is positioned as offering the best price-performance ratio in its class, integrating multiple sensors and advanced technology [2][53].

Group 4: Applications and Use Cases
- The GeoScan S1 suits applications such as urban planning, construction monitoring, and environmental surveying, and can accurately build 3D scene maps of office buildings, industrial parks, tunnels, and other diverse settings [33][42].
- An optional 3D Gaussian data-collection module enables high-fidelity real-world reconstruction, allowing complete digital replication of real environments [46].
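The scanner's actual synchronization is hardware-level and proprietary; as a generic software illustration of what "aligned multi-sensor streams" means in practice, here is a minimal sketch that pairs LiDAR and camera samples by nearest timestamp. The sample rates and tolerance are made-up values, not GeoScan S1 specifications:

```python
import bisect

def align_streams(lidar_ts_us: list[int], cam_ts_us: list[int],
                  tol_us: int = 500) -> list[tuple[int, int]]:
    """Pair each LiDAR timestamp with the nearest camera timestamp.

    Both lists must be sorted. Pairs farther apart than `tol_us`
    microseconds are discarded rather than force-matched.
    """
    pairs = []
    for t in lidar_ts_us:
        i = bisect.bisect_left(cam_ts_us, t)
        candidates = cam_ts_us[max(0, i - 1):i + 1]  # neighbors around t
        if not candidates:
            continue
        best = min(candidates, key=lambda c: abs(c - t))
        if abs(best - t) <= tol_us:
            pairs.append((t, best))
    return pairs

# Example: 10 Hz LiDAR vs 30 Hz camera, timestamps in microseconds.
lidar = [i * 100_000 for i in range(5)]
camera = [i * 33_333 for i in range(15)]
print(align_streams(lidar, camera))
```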
The Development Direction of Embodied Intelligence as Seen at This Year's WAIC 2025!
自动驾驶之心· 2025-07-31 10:00
Core Viewpoint
- The article surveys the development direction of embodied intelligence showcased at the World Artificial Intelligence Conference (WAIC) 2025, noting the growing diversity of products and companies in the field, particularly in embodied intelligence and autonomous driving [1].

Summary by Sections

Embodied Intelligence Showcase
- The event featured a large number of companies and diverse product forms related to embodied intelligence, including a notable demonstration by a robot named "Iron Fist King" that showcased agility and stability [1].
- Many service and industrial robots were on display, pointing to a growing trend toward mobile manipulation, though challenges in cognitive recognition under human intervention remain [3].

Technological Advancements
- Companies are moving beyond demos toward industrial closed-loop models, signaling progress in commercializing embodied intelligence technologies [8].
- A focus on integrating data, strategy, and system deployment into a cohesive pipeline is emerging, with many companies now prioritizing a unified-model approach [8].

Community and Knowledge Sharing
- The article introduces the "Embodied Intelligence Heart" knowledge community, which facilitates technical exchange among nearly 200 companies and institutions in the field [10].
- The community offers technical roadmaps, project solutions, and access to industry experts, expanding learning and collaboration opportunities [10][21].

Job Opportunities and Industry Insights
- The community shares job openings and recruitment opportunities, connecting members with potential employers in the embodied intelligence sector [20][25].
- It also compiles research reports, open-source projects, and datasets relevant to embodied intelligence to support members' professional development [30][41].