自动驾驶之心
The 自动驾驶之心 Job-Seeking and Industry Exchange Group Is Here~
自动驾驶之心· 2025-08-02 06:00
Scan the QR code on WeChat to add the assistant and get invited into the group; include the note "自驾 + nickname + job-seeking".

Recently we have been talking with many students preparing for campus recruitment, and found that the gap between what is learned in school and what the job actually requires keeps widening. Quite a few people with years of work experience are also looking at new opportunities: perception engineers moving toward large models and world models, traditional planning-and-control engineers hoping to move into embodied AI. But not knowing what the industry is actually working on leaves them with little advantage in autumn recruitment...

峰哥 has always encouraged everyone to persevere and keep talking with others, but in the end an individual's strength is limited. We hope to build a large community where we grow together, genuinely help those in need, and become a comprehensive platform gathering talent from across the industry: a real bridge linking schools and companies. So we have formally begun operating a community for job-seeking and industry topics. The community focuses on the industry itself, companies, product R&D, job-seeking, and job-changing. If you would like to meet more peers in the industry and be the first to learn what is happening, welcome to join us! ...
We Plan to Recruit More Autonomous Driving Experts to Co-Build the Platform!
自动驾驶之心· 2025-08-01 16:03
Core Viewpoint
- The intelligent driving industry is transitioning from Level 2 (L2) to Level 3 (L3), with significant technological advancements improving user experience [2].

Group 1: Industry Development
- The intelligent driving sector is gaining momentum, with companies like Xiaomi achieving impressive sales, such as the YU7 model reaching over 200,000 pre-orders in just three minutes [2].
- The industry is entering a more complex phase, requiring deeper engagement and collaboration among stakeholders to tackle challenges [2].

Group 2: Educational Initiatives
- The company is inviting experts in the autonomous driving field to contribute to the development of online courses and consulting services [3].
- There is a focus on advanced topics such as large models, reinforcement learning, and 3D simulation, encouraging participation from individuals with relevant expertise [3][4].

Group 3: Collaboration and Opportunities
- The company aims to create a platform for collaboration among global developers and researchers in the autonomous driving sector [2].
- It offers flexible working arrangements, including part-time and full-time opportunities, along with significant profit-sharing and resource sharing within the industry [6].
Hardcore Night Talk: A Deep Conversation on Autonomous Driving Data Closed-Loop Engineering with a Front-Line Mass-Production Expert
自动驾驶之心· 2025-08-01 16:03
Core Viewpoint
- The article emphasizes the importance of a complete data closed-loop system in autonomous driving, covering data collection, annotation, training, simulation validation, and OTA updates. As autonomous driving evolves from Level 2 to higher levels, data volume grows exponentially, making the breadth and depth of scenario coverage crucial for system safety [3].

Group 1: Data Closed-Loop Challenges
- Data closed-loop engineering faces three core challenges:
  1. The "long-tail problem": rare but critical extreme scenarios (e.g., extreme weather, complex road conditions, sudden obstacles) are difficult to capture and incorporate into the training system [3].
  2. Data processing efficiency: as sensor counts and precision increase, each vehicle generates terabytes of data daily, which must be effectively filtered, annotated, and utilized [3].
  3. Verification difficulty: traditional testing methods cannot cover all possible scenarios, so simulation testing and real-world validation must complement each other scientifically [3].

Group 2: Industry Transition
- The industry is shifting from "function stacking" to "safety-centric" approaches. The challenges of data closed-loop engineering extend beyond technology to establishing scientific verification standards, improving data processing efficiency, and balancing iteration speed with system stability to keep the data-utilization feedback loop positive [3].

Group 3: Expert Insights
- The article invites data expert Ethan to discuss the deep challenges of bringing autonomous driving to mass production, focusing on the essence of engineering rather than flashy technology [3].
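The collect → annotate → train → validate → OTA loop described above can be sketched as a minimal skeleton. Every class, method, and field name here is a hypothetical illustration for orientation, not the interface of any production data platform.

```python
from dataclasses import dataclass, field

@dataclass
class DataClosedLoop:
    """Minimal sketch of a collect -> annotate -> train -> validate -> OTA loop."""
    model_version: int = 0
    pool: list = field(default_factory=list)  # accumulated training clips

    def collect(self, clips):
        # Long-tail mining: keep only clips flagged as rare/critical scenarios.
        return [c for c in clips if c.get("rare", False)]

    def annotate(self, clips):
        # Stand-in for an auto-labeling stage.
        return [{**c, "label": "auto"} for c in clips]

    def train(self, clips):
        # Fold mined clips into the training pool and bump the model version.
        self.pool.extend(clips)
        self.model_version += 1

    def validate(self):
        # Simulation gate: release (OTA) only if the regression gate passes.
        return len(self.pool) > 0

    def iterate(self, raw_clips):
        mined = self.annotate(self.collect(raw_clips))
        self.train(mined)
        return self.validate()  # True -> push OTA update

loop = DataClosedLoop()
ok = loop.iterate([{"id": 1, "rare": True}, {"id": 2, "rare": False}])
```

The sketch keeps only the control flow: in practice each stage (mining, labeling, training, simulation) is its own large system, and the "rare" flag stands in for the long-tail mining criteria the article discusses.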
An Exclusive Interview with 罗剑岚 of 智元机器人! Data Collection, Simulation, Scenarios, and Engineering for Embodied Intelligence~
自动驾驶之心· 2025-08-01 16:03
具身智能之心 was invited to the WAIC 2025 "智启具身" forum, where we had the honor of interviewing Dr. Luo Jianlan (罗剑岚), Chief Scientist of 智元机器人. Below are the key questions Dr. Luo highlighted and discussed during the interview.

Discussion on embodied-intelligence data

1. Everyone knows that data is the fuel of intelligence, and sensors are the key to collecting it. What are 智元's plans for sensor R&D and procurement, and how will you improve the usability of product data?
Luo Jianlan: We have already begun cooperating with multiple sensor suppliers, focusing on the joint development of visual-tactile and high-density sensors. At the same time, we are building a cross-platform data-collection API that maps task semantics into a unified representation, providing standardized, trainable data inputs for model training.

2. You just said world models are quite useful; after adding a world model, adding some collected data improves things. How far is that step from real application, and what hurdles remain between collecting the data and deploying it?
Luo Jianlan: Performance. A robot's performance has to be very high before it becomes genuinely useful. In your home, a robot that sweeps the floor or loads the dishwasher needs a 95% success rate, across a million households; that is a very hard problem.

3. Sergey Levine recently published an article proposing the "Sporks of AGI" view, arguing that simulation will hinder the scaling of embodied intelligence. I would like to know ...
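A cross-platform collection API that "maps task semantics into a unified representation", as Dr. Luo describes, could look roughly like the schema below. All class, field, and task names here are hypothetical illustrations, not 智元's actual API.

```python
from dataclasses import dataclass

@dataclass
class Episode:
    """Hypothetical unified record: one demonstration in a platform-agnostic schema."""
    platform: str       # source platform, e.g. "teleop_arm", "sim", "humanoid"
    task_semantic: str  # shared task vocabulary, e.g. "pick_place"
    observations: list  # sensor frames (camera, tactile, ...)
    actions: list       # normalized action trajectory

# Per-platform task names mapped into one shared semantic vocabulary.
PLATFORM_TASK_MAP = {
    ("teleop_arm", "grasp_cup"): "pick_place",
    ("sim", "move_object"): "pick_place",
}

def normalize(platform, raw_task, observations, actions):
    """Map a platform-specific task name into the shared vocabulary."""
    semantic = PLATFORM_TASK_MAP.get((platform, raw_task), "unknown")
    return Episode(platform, semantic, observations, actions)
```

The point of such a layer is that downstream training code sees only `task_semantic`, so demonstrations from teleoperation, simulation, and different robot bodies can be pooled into one dataset.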
ACM MM'25 | A New SOTA for 2D Object Detection in Autonomous Driving! Surpassing the Latest YOLO Series~
自动驾驶之心· 2025-08-01 16:03
Core Viewpoint
- The article discusses a new detection framework called Butter, designed to improve object detection in autonomous driving scenarios by addressing the challenges of multi-scale semantic information modeling and enhancing detection robustness and deployment efficiency [3][11].

Group 1: Framework Innovations
- Butter introduces two core innovations in the Neck layer: the Frequency Consistency Enhancement Module (FAFCE) and the Progressive Hierarchical Feature Fusion Network (PHFFNet) [3][15].
- FAFCE enhances boundary resolution by integrating high-frequency detail enhancement with low-frequency noise suppression, while PHFFNet progressively fuses semantic information to strengthen multi-scale feature representation [3][15].

Group 2: Performance Metrics
- Butter outperforms existing state-of-the-art (SOTA) methods in detection accuracy with significantly lower parameter counts, achieving a mean Average Precision (mAP@50) of 94.4% on the KITTI dataset, surpassing the previous best by 1.2 percentage points while using only about one-third of the computational load [32][34].
- On the BDD100K and Cityscapes datasets, Butter achieved mAP@50 scores of 53.7% and 53.2%, respectively, demonstrating superior performance compared to other lightweight models, particularly with a 1.6 percentage point improvement on Cityscapes [32][34].

Group 3: Structural Challenges
- Existing Neck structures often face issues such as frequency aliasing and rigid fusion processes, which compromise feature expression and detection accuracy, particularly for small targets in complex environments [9][10].
- Butter's design addresses these structural bottlenecks by decoupling frequency modeling and multi-scale fusion, achieving a balance between accuracy and efficiency [11][12].

Group 4: Methodology Overview
- The Butter framework begins with a 640×640 monocular image, extracting initial features through a lightweight Backbone module, followed by refinement through various lightweight blocks before entering the Neck module [16][17].
- The model employs a four-output head in the Head layer to generate final detection results, including class labels, confidence scores, and bounding boxes [16][17].

Group 5: Feature Fusion Techniques
- FAFCE enhances feature-fusion accuracy and robustness by employing high-frequency amplification and low-frequency damping mechanisms, which improve the consistency and precision of multi-scale feature integration [20][27].
- PHFFNet implements a hierarchical fusion strategy that alleviates semantic discrepancies between non-adjacent layers, significantly enhancing detection accuracy and alignment in scenarios requiring precise boundary detection [29][30].
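The high-frequency amplification / low-frequency damping idea behind FAFCE can be illustrated with a toy frequency-domain reweighting of a feature map. This is a conceptual NumPy sketch, not the Butter authors' module; the function name, gain values, and cutoff radius are all invented here.

```python
import numpy as np

def frequency_reweight(feat, hf_gain=1.5, lf_damp=0.5, cutoff=0.25):
    """Toy frequency-aware reweighting of a 2D feature map:
    amplify high frequencies (edges/boundaries), damp low frequencies
    (smooth background), then transform back to the spatial domain."""
    F = np.fft.fft2(feat)
    h, w = feat.shape
    fy = np.fft.fftfreq(h)[:, None]        # per-row frequencies
    fx = np.fft.fftfreq(w)[None, :]        # per-column frequencies
    radius = np.sqrt(fy ** 2 + fx ** 2)    # radial frequency magnitude
    gain = np.where(radius >= cutoff, hf_gain, lf_damp)
    return np.real(np.fft.ifft2(F * gain))
```

On a constant (pure low-frequency) map the output is simply damped by `lf_damp`; on a checkerboard (pure Nyquist-frequency) map it is amplified by `hf_gain`, which is the behavior the Neck module exploits to sharpen boundaries while suppressing smooth noise.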
Farewell to Passive Perception! DriveAgent-R1: An Advanced Hybrid-Thinking Agent with Active Visual Exploration
自动驾驶之心· 2025-08-01 07:05
Core Insights
- DriveAgent-R1 is an advanced autonomous driving agent designed to tackle long-term, high-level behavioral decision-making challenges, leveraging a hybrid thinking framework and active perception to enhance decision-making capabilities in complex environments [3][4][32].

Innovation and Methodology
- DriveAgent-R1 introduces two core innovations: a novel three-stage progressive reinforcement learning strategy and a mode-grouping algorithm (MP-GRPO) that enhances the agent's dual-mode specificity, laying the groundwork for autonomous exploration [4][13].
- The agent's decision-making process is driven by active perception, allowing it to proactively seek information to reduce uncertainty, which is crucial for safe and reliable driving [5][6][32].

Performance Metrics
- DriveAgent-R1 achieved state-of-the-art (SOTA) performance on the challenging SUP-AD dataset, surpassing leading multimodal models such as Claude Sonnet 4 and Gemini 2.5 Flash [4][13][27].
- The model demonstrated significant improvements in accuracy metrics, with first-frame accuracy increasing by 14.2% and sequence-average accuracy by 15.9% when utilizing visual tools [27][28].

Training Strategy
- The training strategy consists of three phases: dual-mode supervised fine-tuning (DM-SFT), forced comparative mode reinforcement learning (FCM-RL), and adaptive mode selection reinforcement learning (AMS-RL), which collectively enhance the agent's ability to choose the optimal thinking mode based on context [24][30].
- The gradual training approach effectively transformed potential distractions from visual tools into performance amplifiers, significantly improving the agent's decision-making capabilities [28][30].

Active Perception and Visual Tools
- Active perception is integrated into DriveAgent-R1, equipping it with a robust visual toolkit that allows the agent to actively explore its environment, thereby enhancing its perceptual robustness [5][19].
- The visual toolkit includes features such as high-resolution view retrieval, region-of-interest inspection, depth estimation, and 3D object detection, which collectively improve the agent's ability to make informed decisions in uncertain conditions [19][20].

Experimental Results
- The experiments confirmed that reinforcement learning (RL) is critical for unlocking the agent's potential, with RL-trained variants significantly outperforming those trained solely through supervised fine-tuning [29][30].
- The results indicated that DriveAgent-R1's performance is heavily reliant on visual inputs, with a drastic drop in accuracy when visual information is removed, underscoring the importance of its active perception mechanism [31].
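The active-perception pattern described above (invoke visual tools only while uncertainty remains, then commit to a decision) can be sketched as a simple tool-calling loop. The tool names mirror the toolkit described in the article, but the loop, threshold, confidence values, and function signatures are all invented for illustration, not the authors' implementation.

```python
def drive_agent_step(observation, tools, confidence_threshold=0.8):
    """Toy active-perception step: call tools until confident enough, then act."""
    belief = dict(observation)
    for name, tool in tools.items():
        if belief["confidence"] >= confidence_threshold:
            break  # confident enough: stop paying the cost of more tool calls
        belief = tool(belief)  # each tool refines the belief state
    action = "proceed" if belief["confidence"] >= confidence_threshold else "slow_down"
    return action, belief

# Stand-ins for two of the visual tools; each just raises confidence here.
def roi_inspect(belief):
    return {**belief, "confidence": belief["confidence"] + 0.3}

def depth_estimate(belief):
    return {**belief, "confidence": belief["confidence"] + 0.2}

tools = {"roi_inspect": roi_inspect, "depth_estimate": depth_estimate}
action, belief = drive_agent_step({"confidence": 0.4}, tools)
```

A confident initial observation skips the tools entirely, which is the efficiency argument for active (on-demand) rather than passive (always-on) perception.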
智源研究院 Has Opened Researcher Positions in Embodied-Intelligence Large Models: Experienced Hires, Campus Recruits, and Interns Are All Welcome!
自动驾驶之心· 2025-08-01 07:05
Core Viewpoint
- The article announces the recruitment of researchers for embodied-intelligence large models at Zhiyuan Research Institute, offering various employment formats including social recruitment, campus recruitment, and internships [1].

Group 1: Job Responsibilities
- Research and development of embodied-intelligence large models (VLA models or hierarchical architectures) [4].
- Design and optimize model architectures; handle data processing, training, and deployment on real machines [4].
- Conduct in-depth research on cutting-edge technologies in embodied intelligence, track the latest developments in the large-model industry, and explore the application of new technologies in this field [4].

Group 2: Job Requirements
- Master's degree or above in relevant fields such as computer science, artificial intelligence, robotics, automation, or mathematics [4].
- Proficiency in Python with a solid foundation in deep learning; familiarity with deep learning frameworks such as TensorFlow and PyTorch [4].
- Research experience in the large-model field with a deep understanding of mainstream vision and language large models, including experience with pre-training, fine-tuning, and deployment [4].
- Experience in robot control and familiarity with mainstream embodied-model training and deployment is preferred [4].
- Excellent learning ability, English proficiency, hands-on skills, and good team communication and collaboration; publications in top conferences (RSS, ICRA, CVPR, CoRL, ICLR, NeurIPS, ACL, etc.) are preferred [4].

Group 3: Community and Resources
- AutoRobo Knowledge Planet serves as a community for job seekers in autonomous driving, embodied intelligence, and robotics, currently with nearly 1,000 members from various companies [6].
- The community provides resources such as interview questions, industry reports, salary-negotiation tips, and internal referrals [6][7].
- The platform also shares job openings in algorithms, development, and product roles, including campus recruitment, social recruitment, and internships [7].

Group 4: Industry Reports
- The community compiles various industry reports to help members understand the current state, development trends, market opportunities, and landscape of the embodied-intelligence industry [15].
- Reports include topics such as the World Robotics Report, China's Embodied Intelligence Venture Capital Report, and the development of humanoid robots [16].
A 10,000-Character Long Read! The First Survey on Self-Evolving Agents: The Road to Artificial Superintelligence~
自动驾驶之心· 2025-07-31 23:33
Core Insights
- The article discusses the transition from static large language models (LLMs) to self-evolving agents that can adapt and learn continuously from interactions with their environment, aiming for artificial superintelligence (ASI) [3][5][52].
- It emphasizes three fundamental questions regarding self-evolving agents: what to evolve, when to evolve, and how to evolve, providing a structured framework for understanding and designing these systems [6][52].

Group 1: What to Evolve
- Self-evolving agents can improve various components such as models, memory, tools, and workflows to enhance performance and adaptability [14][22].
- The evolution of agents is categorized into four pillars: cognitive core (model), context (instructions and memory), external capabilities (tool creation), and system architecture [22][24].

Group 2: When to Evolve
- Self-evolution occurs in two main time modes: intra-test-time self-evolution, which happens during task execution, and inter-test-time self-evolution, which occurs between tasks [26][27].
- The article outlines three basic learning paradigms relevant to self-evolution: in-context learning (ICL), supervised fine-tuning (SFT), and reinforcement learning (RL) [27][28].

Group 3: How to Evolve
- The article discusses various methods for self-evolution, including reward-based evolution, imitation and demonstration learning, and population-based approaches [32][36].
- It highlights the importance of continuous learning from real-world interactions, seeking feedback, and adjusting strategies based on dynamic environments [30][32].

Group 4: Evaluation of Self-Evolving Agents
- Evaluating self-evolving agents presents unique challenges, requiring assessments that capture adaptability, knowledge retention, and long-term generalization capabilities [40].
- The article calls for dynamic evaluation methods that reflect the ongoing evolution and diverse contributions of agents in multi-agent systems [40][51].

Group 5: Future Directions
- The deployment of personalized self-evolving agents is identified as a critical goal, focusing on accurately capturing user behavior and preferences over time [43].
- Challenges include ensuring that self-evolving agents do not reinforce existing biases and developing adaptive evaluation metrics that reflect their dynamic nature [44][45].
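The what/when/how framing above can be made concrete with a toy loop: the agent's memory is the component that evolves (what), the update happens between tasks (when, i.e. inter-test-time), and positive environment feedback drives the update (how). Everything here (class name, memory scheme, reward rule) is an invented illustration, not a system from the survey.

```python
class SelfEvolvingAgent:
    """Toy agent whose memory (the 'what') evolves between tasks
    (the 'when') from environment feedback (the 'how')."""

    def __init__(self):
        self.memory = {}  # evolving component: best-known strategy per task type

    def act(self, task):
        # Reuse a lesson learned from earlier tasks of the same type.
        return self.memory.get(task["type"], "default_strategy")

    def evolve(self, task, feedback):
        # Inter-test-time evolution: update memory only after the task ends,
        # keeping only strategies that earned positive reward.
        if feedback["reward"] > 0:
            self.memory[task["type"]] = feedback["strategy"]

agent = SelfEvolvingAgent()
before = agent.act({"type": "navigation"})
agent.evolve({"type": "navigation"}, {"reward": 1.0, "strategy": "shortest_path"})
after = agent.act({"type": "navigation"})
```

An intra-test-time variant would instead mutate `memory` inside `act`, mid-task; the survey's distinction is precisely where in the task lifecycle this update is allowed to happen.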
How Should You Prepare for Algorithm Roles in Autumn Recruitment? My 2025 Autumn Recruitment Summary~
自动驾驶之心· 2025-07-31 23:33
We recently invited several community guests to record job-seeking video courses, hoping to help those going through autumn or experienced-hire recruitment. They mainly cover interviews at small and large companies, how to prepare for campus recruitment, how to choose a company, plus introductions to and analysis of roles in large models, auto-labeling, and end-to-end driving.

Every year students complain that algorithm roles in autumn recruitment are brutally competitive and ask us how to prepare. So this year we decided to make some down-to-earth teaching videos that analyze things from the perspective of the industry, the roles, and the actual day-to-day work: how to choose, and what fits you best.

For more, join our job-seeking planet, a community built specifically for job hunting in autonomous driving, robotics, and large models.

AutoRobo Knowledge Planet

This is a place for students in autonomous driving, embodied intelligence, and robotics to exchange job-seeking experience, now with nearly 1,000 members. Members include working professionals from companies such as 智元机器人, 宇树科技, 地瓜机器人, 地平线, 理想汽车, Huawei, Xiaomi Auto, Momenta, and 元戎启行, as well as members of the 2024 and 2025 autumn recruitment cohorts, covering most areas of autonomous driving and embodied intelligence.

What is inside the planet? Building on our existing strengths, we have compiled interview questions, interview experience write-ups, industry research reports, salary-negotiation tips, plus referral channels for many companies and resume-review services.

Job postings

Inside the planet we regularly share algorithm, development, and product openings, almost all released by companies at the first opportunity ...
All in One Article! A Roundup of Diffusion Model Applications in Autonomous Driving Foundation Models: 30+ Works Collected Here~
自动驾驶之心· 2025-07-31 23:33
Core Insights
- The article discusses the significant role of diffusion models in the development of autonomous driving technologies, highlighting their ability to enhance data diversity, improve perception-system robustness, and assist decision-making under uncertainty [2][3].

Group 1: Diffusion Models in Autonomous Driving
- Diffusion models have shown promising applications in autonomous driving, particularly in generating diverse and physically constrained results from complex data distributions [2].
- The Dual-Conditioned Temporal Diffusion Model (DcTDM) allows for the generation of realistic long-duration driving videos, addressing challenges such as limited data quality and high costs [3][4].
- DcTDM has been evaluated to demonstrate over 25% improvement in consistency and frame quality compared to other video diffusion models [3].

Group 2: Applications in Perception and Decision-Making
- In perception, diffusion models significantly outperform traditional methods in 3D occupancy prediction, especially in occluded or low-visibility areas, thereby supporting downstream planning tasks [4].
- The Stable Diffusion model effectively predicts vehicle trajectories, enhancing the predictive capabilities of autonomous driving systems [4].
- The DiffusionDrive framework uses diffusion models to model multimodal action distributions, innovating end-to-end autonomous driving by addressing uncertainty in driving decisions [4].

Group 3: Data Generation and Quality Improvement
- Diffusion models are crucial for generating high-quality synthetic data, addressing the insufficient diversity and authenticity of natural driving datasets [4].
- Controllable generation techniques are particularly important for overcoming 3D data-annotation challenges, with future explorations into video generation aimed at further enhancing data quality [4].

Group 4: Advanced Frameworks and Innovations
- LD-Scene combines large language models with latent diffusion models to generate adversarial driving scenarios, enhancing the controllability and robustness of generated scenes [9].
- DualDiff introduces a dual-branch diffusion model designed to improve multi-view driving-scene generation, utilizing occupancy ray sampling for rich semantic information [30].
- DiVE employs a diffusion-transformer framework to generate high-fidelity, temporally coherent multi-view videos, achieving state-of-the-art performance in multi-view video generation [19][20].

Group 5: Safety and Critical Scenario Generation
- AVD2 enhances understanding of accident scenarios by generating videos aligned with detailed natural-language descriptions, contributing to accident analysis and prevention [36].
- AdvDiffuser generates adversarial safety-critical driving scenarios, improving transferability across different systems while maintaining authenticity and diversity [68][69].
- The Causal Composition Diffusion Model (CCDiff) enhances controllability and realism in generating closed-loop traffic scenarios, significantly outperforming existing methods [41].
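All of the works above share the same underlying generative mechanism: gradually noising data and learning to reverse that process. As a reference point, the standard DDPM forward-noising step under a linear beta schedule can be sketched as follows. This is generic textbook DDPM (schedule values are the commonly used defaults), not the implementation of any surveyed model.

```python
import numpy as np

def make_alpha_bar(T=1000, beta_start=1e-4, beta_end=0.02):
    """Cumulative signal-retention schedule: alpha_bar_t = prod_{s<=t}(1 - beta_s)
    under a linear beta (noise-variance) schedule."""
    betas = np.linspace(beta_start, beta_end, T)
    return np.cumprod(1.0 - betas)

def q_sample(x0, t, alpha_bar, rng):
    """Forward diffusion q(x_t | x_0): noise a clean sample x0 to timestep t,
    x_t = sqrt(alpha_bar_t) * x0 + sqrt(1 - alpha_bar_t) * eps."""
    eps = rng.standard_normal(x0.shape)
    return np.sqrt(alpha_bar[t]) * x0 + np.sqrt(1.0 - alpha_bar[t]) * eps

alpha_bar = make_alpha_bar()
# Early steps retain nearly all signal; by the final step it is almost pure noise.
noisy = q_sample(np.zeros(4), 10, alpha_bar, np.random.default_rng(0))
```

The models in this roundup differ in what `x0` is (video frames, occupancy grids, trajectories, actions) and in how the learned reverse process is conditioned (text, layout, occupancy rays), but the forward process they invert is this one.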