自动驾驶之心
End-to-end and VLA have become the mainstream of frontier autonomous driving research...
自动驾驶之心· 2025-10-13 04:00
Core Insights
- The article discusses the evolution of end-to-end algorithms in autonomous driving, highlighting the transition from modular production algorithms to end-to-end approaches and the recent focus on Vision-Language-Action (VLA) models [1][3].
Group 1: End-to-End Algorithms
- End-to-end algorithms are central to the current mass production of autonomous driving technology and involve a rich technology stack [1].
- The industry follows two main paradigms, single-stage and two-stage, with UniAD a representative of the single-stage paradigm [1].
- The single-stage approach can be further divided into several subfields, including perception-based, diffusion model-based, world model-based, and VLA-based end-to-end algorithms [1].
Group 2: VLA and Course Offerings
- The article notes a recent surge of interest in how to learn end-to-end and VLA technologies efficiently, which led to the creation of specialized courses [3].
- The "End-to-End and VLA Autonomous Driving Course" focuses on VLA, covering topics from VLM as an autonomous driving interpreter to modular and integrated VLA approaches [3].
- The course includes a detailed theoretical foundation and practical assignments to help participants build their own VLA models and datasets [3].
Group 3: Course Instructors
- The course features a team of instructors with significant academic and practical experience in multi-modal perception, autonomous driving VLA, and large model frameworks [7][9].
- The instructors have published numerous papers at top international conferences and have hands-on experience developing and implementing cutting-edge algorithms in the field [7][9][10].
Group 4: Target Audience and Requirements
- The courses are designed for individuals with a foundational understanding of autonomous driving and familiarity with key technologies such as transformer models, reinforcement learning, and BEV perception [13].
- Participants are expected to have basic knowledge of probability theory and linear algebra, and proficiency in Python and PyTorch [13].
30 keynote talks | Registration is open for the 3rd Autonomous Robotics Technology Seminar, with visits to two top companies!
自动驾驶之心· 2025-10-12 23:33
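ARTS 2025 is a major academic conference not to be missed this year, and it will take attendees to 宇树科技 (Unitree Robotics) and 微分智飞 for on-site visits and exchanges.
The 3rd Autonomous Robotics Technology Seminar (ARTS 2025) will be held on October 18-19, 2025 at Zhejiang University (Yuquan Campus), China. On top of the regular academic program, ARTS 2025 adds the ARTS Scholarship, an academic debate, an academic roast session (stand-up comedy), and company visits. The aim is to break the one-way format of traditional conferences, let the exchange of ideas go beyond paper presentations, and create a multi-dimensional setting for industry-academia integration and candid dialogue. The conference will take attendees to front-line companies, 宇树科技 and 微分智飞, among other activities. We sincerely invite academic colleagues, researchers, and industry engineers from China and abroad to register.
ARTS 2025 organizers
Host: Chinese Association of Automation
Organizers: College of Control Science and Engineering, Zhejiang University; School of Automation and Sensing, Shanghai Jiao Tong University
Co-organizer: 深蓝学院
ARTS 2025 agenda
| 09:00-09:10 | Opening ceremony |
| --- | --- |
| 09:10-09:20 | Corporate award presentation |
| 09:20-0 ...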
Waymo proposes Drive&Gen: evaluating end-to-end autonomous driving with generated video (IROS'25)
自动驾驶之心· 2025-10-12 23:33
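Author | Jiahao Wang  Source | 我爱计算机视觉
A traditional autonomous driving system resembles a large company with many departments: perception, prediction, planning, and other modules each do their own job. This is stable, but the pipeline is cumbersome, and an error in one stage can affect the whole system. An E2E model is more like an all-round startup team: it goes from raw inputs such as camera frames to driving decisions in a single step, which is simple, efficient, and full of potential.
But questions follow. Are AI-generated videos really "realistic" enough to fool an autonomous driving system and be used for serious evaluation? And how can we understand the "temperament" of an E2E driving model in depth, fix its weaknesses, and let it cope calmly with new scenarios it has never seen (such as a sudden rainstorm)?
To answer these questions, researchers from Johns Hopkins University, Waymo, and Google DeepMind, in a paper to be presented at IROS 2025, propose a new framework named Drive&Gen. The name is straightforward: it combines driving (Drive) and generation (Gen), aiming to connect E2E driving models with generative world models so that each can evaluate and improve the other.
Background: when E2E driving meets generative AI ...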
How innovative are AI Agents, really?
自动驾驶之心· 2025-10-12 23:33
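Author | sunnyzhao  Editor | 大模型之心Tech
1. The planning stage introduces substantial latency. As the number of tools grows, the accuracy of turbo-series models becomes a concern, so a flagship model has to be used, which increases latency further.
2. Planning quality is not high enough. The workflow the original task bot followed was decided by humans; now the model decides it on its own. In testing so far, the usability rate of complex workflows built by the model falls far short of human level. For simple workflows, a small discriminative model actually performs better.
3. Reflection is a strategy that trades time for accuracy, but it very easily degenerates into repeated self-defeating iterations and infinite loops.
These problems are indeed common to current AI Agent technology. If an Agent is treated as a simple combination of "LLM + tool calling" and the engineering details are not handled carefully, the actual results may well be no better than workflow orchestration. Drawing mainly on some papers I have read and a little practical experience, I will share my views on the three points the original poster raised.
The fundamental reason planning is slow
Original link: https://www.zhihu.com/question/657739588/ ...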
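The third point above, reflection sliding into repetition and infinite loops, is often handled in practice with a hard iteration budget plus a check for repeated answers. The sketch below is only an illustration of that idea, not the author's method: `critique` and `revise` are hypothetical callables standing in for LLM calls, and the stop reasons are names chosen for this example.

```python
from typing import Callable, Optional, Tuple


def reflect_with_budget(
    draft: str,
    critique: Callable[[str], Optional[str]],  # hypothetical: returns feedback, or None if satisfied
    revise: Callable[[str, str], str],         # hypothetical: returns a revised answer
    max_rounds: int = 3,
) -> Tuple[str, str]:
    """Run a bounded reflection loop.

    Returns (answer, stop_reason), where stop_reason is
    "accepted", "repeated", or "budget".
    """
    seen = {draft}
    answer = draft
    for _ in range(max_rounds):
        feedback = critique(answer)
        if feedback is None:
            # The critic has no objection, so stop early.
            return answer, "accepted"
        candidate = revise(answer, feedback)
        if candidate in seen:
            # The revision cycled back to an earlier answer: stop instead of looping.
            return answer, "repeated"
        seen.add(candidate)
        answer = candidate
    # The iteration budget is spent; return the latest answer.
    return answer, "budget"
```

The budget caps the extra latency that reflection adds, and the `seen` set catches the simplest form of self-repetition; detecting near-duplicate answers would need a similarity check that is out of scope here.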
Led by industry veterans! Master end-to-end autonomous driving in three months
自动驾驶之心· 2025-10-12 23:33
Core Viewpoint
- 2023 marked the start of end-to-end mass production, and 2024 is expected to be a major year for it in the automotive industry, as leading new-force automakers and established manufacturers have already put end-to-end systems into production [1][3].
Group 1: End-to-End Production Development
- End-to-end development in the automotive industry is progressing rapidly across both one-stage and two-stage paradigms, with UniAD a prominent one-stage method [1][3].
- Various one-stage methods have emerged, including perception-based, world model-based, diffusion model-based, and VLA-based approaches, reflecting a strong push by both autonomous driving companies and vehicle manufacturers toward in-house development and mass production of end-to-end autonomous driving [3][5].
Group 2: Course Overview
- A course titled "End-to-End and VLA Autonomous Driving" has been launched, focusing on cutting-edge algorithms in both one-stage and two-stage end-to-end methods and aiming to bridge academic and industrial advances [5][15].
- The course is structured into several chapters, covering the history and evolution of end-to-end algorithms, background knowledge on VLA, and detailed discussions of two-stage and one-stage end-to-end methods [9][10][12].
Group 3: Key Technologies and Techniques
- The course emphasizes key technologies such as BEV perception, vision-language models (VLM), diffusion models, and reinforcement learning, which are essential for mastering the latest advances in autonomous driving [5][11].
- The second chapter is highlighted as crucial because it covers the technical keywords expected to come up most often in job interviews over the next two years [10].
Group 4: Practical Applications and Outcomes
- The course includes practical assignments, such as RLHF fine-tuning, that let participants apply their knowledge in real-world scenarios and learn how to build and experiment with reinforcement learning modules [13][19].
- By completing the course, participants are expected to reach a level equivalent to one year of experience as an end-to-end autonomous driving algorithm engineer, with a comprehensive understanding of the various methodologies and their applications [19].
Ends tonight! Countdown for the 自动驾驶之心 National Day & Mid-Autumn Festival promotion
自动驾驶之心· 2025-10-12 06:58
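[Promotional poster; text recovered from the image]
Knowledge Planet benefits (具身智能之心, 自动驾驶之心): 30% off, 80 off; 30% off, 99 off. Prices will rise again after the holiday. 7 premium courses included as a gift.
Offers: 1. Large-model Planet: 99 yuan per year (technology + industry + job hunting). 2. 1-on-1 tutoring: up to 1000 counts as 5000. 3. 1-on-6 paper tutoring: 1000 off. 4. Super discount card: 299 yuan, 30% off autonomous driving courses (valid for one year).
Planet discount: 30% off for new members, 50% off renewals.
自动驾驶之心 Knowledge Planet at a glance, the most cutting-edge autonomous driving technology community:
- Technology: autonomous driving VLA, world models, closed-loop simulation, diffusion models, BEV perception, and nearly 40+ learning roadmaps.
- Exchange: face-to-face discussion with experts from academia and industry (the VLA vs. WA route debate, future directions of autonomous driving, what world models actually model, discussions on end-to-end).
- Livestreams: top-conference authors in person (Impromptu VLA, NavigScene, LangCoop, DriveBe ...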
What kind of map does mapless end-to-end intelligent driving actually use?
自动驾驶之心· 2025-10-11 16:03
Core Viewpoint
- The article discusses the various types of maps used in autonomous driving, highlighting the evolution from traditional navigation maps to more advanced SD maps and their implications for driving technology [1][6].
Group 1: Types of Maps
- Cockpit Navigation Map: Provides static information such as navigation trajectory link points and dynamic information like lane actions, originally designed for human drivers [1].
- SD Map: An upgraded version of cockpit navigation maps that includes road junction information, allowing for better navigation through complex intersections [1][3].
- SDPro Map: Incorporates lane topology, detailing connections between lanes, which aids in efficient lane changes and navigation [3].
- Light Map: A simplified version of HD maps, relying on visual crowdsourcing for updates, retaining only essential road features [4][5].
- Crowdsourced Maps: Developed by OEMs to keep pace with autonomous driving advancements, allowing for tailored map data that meets specific internal needs [5].
Group 2: Map Evolution and Usage
- The evolution of maps shows an increase in features but a decrease in applicability and freshness, indicating a shift in focus towards more specialized maps for autonomous driving [6].
- For end-to-end autonomous driving, SD maps are generally sufficient, but having SDPro or crowdsourced maps is beneficial for accurate prior information [6][7].
- The architecture of end-to-end autonomous driving systems relies heavily on maps as essential inputs for models and rule strategies [7].
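To make the SDPro idea of lane topology concrete, the sketch below represents lane connections as a small graph and searches it for a lane sequence. It is a minimal illustration under stated assumptions: the dictionary layout, the `"next"`/`"adjacent"` keys, and the lane IDs are all hypothetical and do not reflect any real SDPro map format.

```python
from collections import deque
from typing import Dict, List, Optional

# Hypothetical lane graph: for each lane, "next" lists lanes reached by
# driving on, and "adjacent" lists lanes reached by a lane change.
LaneGraph = Dict[str, Dict[str, List[str]]]


def plan_lanes(graph: LaneGraph, start: str, goal: str) -> Optional[List[str]]:
    """Breadth-first search over lane connections.

    Returns the lane sequence with the fewest hops, or None if the goal
    lane cannot be reached from the start lane.
    """
    queue = deque([[start]])
    visited = {start}
    while queue:
        path = queue.popleft()
        lane = path[-1]
        if lane == goal:
            return path
        links = graph.get(lane, {})
        for nxt in links.get("next", []) + links.get("adjacent", []):
            if nxt not in visited:
                visited.add(nxt)
                queue.append(path + [nxt])
    return None


# Made-up example: two parallel lanes, where only lane 2 leads to the exit.
EXAMPLE: LaneGraph = {
    "A1": {"next": ["B1"], "adjacent": ["A2"]},
    "A2": {"next": ["B2"], "adjacent": ["A1"]},
    "B1": {"next": [], "adjacent": ["B2"]},
    "B2": {"next": ["C_exit"], "adjacent": ["B1"]},
    "C_exit": {"next": [], "adjacent": []},
}

route = plan_lanes(EXAMPLE, "A1", "C_exit")
```

A map without lane topology could only say that the road leads to the exit; with lane-level connections, the search shows that a lane change into lane 2 is required before the exit.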
We are looking for partners in the autonomous driving field...
自动驾驶之心· 2025-10-11 16:03
Group 1
- The article announces the recruitment of 10 outstanding partners for the autonomous driving sector, focusing on course development, paper guidance, and hardware research [2].
- The main areas of expertise sought include large models, multimodal models, diffusion models, end-to-end systems, embodied interaction, joint prediction, SLAM, 3D object detection, world models, closed-loop simulation, and model deployment and quantization [3].
- Preferred candidates come from universities ranked in the QS top 200 and hold a master's degree or higher, especially those with significant contributions to top conferences [4].
Group 2
- The compensation package includes resource sharing for job seeking, doctoral studies, and overseas study recommendations, along with substantial cash incentives and opportunities for entrepreneurial project collaboration [5].
- Interested parties are encouraged to add WeChat for consultation, specifying "organization/company + autonomous driving cooperation inquiry" [6].
Ren Shaoqing's non-consensus views on intelligent driving: world models, long-horizon agents, and "extreme" engineering
自动驾驶之心· 2025-10-11 16:03
Core Viewpoint
- The article discusses NIO's approach to autonomous driving, emphasizing the importance of world models and reinforcement learning in achieving advanced AI capabilities for self-driving technology [5][11][13].
Group 1: Company Background and Leadership
- NIO's intelligent driving effort is led by Ren Shaoqing, a young technical leader with a strong background in AI and deep learning who co-founded the autonomous driving company Momenta before joining NIO [6][8].
- Ren Shaoqing has taken on the challenge of developing NIO's second-generation platform from scratch, focusing on building a robust data system to support autonomous driving capabilities [6][8].
Group 2: Technological Innovations
- NIO's approach combines high computing power, multiple sensors, and a new architecture based on world models and reinforcement learning, which is considered a more challenging but potentially more effective path [8][9].
- The world model aims to establish a high-bandwidth cognitive system that can understand and predict physical interactions in the real world, addressing the limitations of language models [20][25].
Group 3: Reinforcement Learning and Data Systems
- The company emphasizes the significance of reinforcement learning in developing long-term planning capabilities for autonomous driving, moving beyond traditional imitation learning [7][60].
- NIO has developed a three-tier data system to enhance data quality and training efficiency, which is crucial for building effective autonomous driving models [74][76].
Group 4: Market Position and Future Outlook
- NIO aims to lead the industry by integrating world models into its autonomous driving technology, positioning itself ahead of competitors who primarily rely on language models [66][67].
- The company is focused on achieving open-set interaction capabilities, allowing users to communicate with the vehicle in a more natural and flexible manner [36][39].
Inside Tesla FSD V14's "parking spot to parking spot" core algorithm: high-fidelity 3D Occ occupancy prediction
自动驾驶之心· 2025-10-11 16:03
Core Insights
- The article discusses Tesla's FSD V14 and its innovative "space occupancy detection" algorithm, which allows for high-precision 3D spatial reconstruction using only 2D image data from cameras, achieving accuracy within 10 cm [4][11][20].
Group 1: Overview of the High-Fidelity 3D Occupancy Algorithm
- The high-fidelity 3D occupancy algorithm utilizes AI to accurately perceive and make decisions in complex dynamic environments, focusing on the occupancy attributes of surrounding space [5][6].
- Key components of the algorithm include the occupancy grid algorithm, which predicts the occupancy status of voxels (3D pixels) around the vehicle [5][6].
Group 2: Technical Mechanisms
- The algorithm employs a Signed Distance Function (SDF) to predict the distance to the nearest occupied voxel, enhancing spatial perception and enabling more refined shape recognition [7][18].
- The system processes images from multiple cameras using convolutional neural networks (CNN) to extract meaningful features, which are then transformed into 3D spatial representations [12][20].
Group 3: Applications and Use Cases
- The high-fidelity occupancy network can be applied in advanced parking assistance systems, enabling the identification of available parking spaces and assessing their suitability based on various factors [23][24].
- The algorithm is also applicable in autonomous robots for indoor navigation, allowing them to distinguish between obstacles and navigable areas [29].
Group 4: Advantages and Innovations
- The SDF-based rendering approach provides richer detail and smoother visuals compared to traditional point cloud or binary voxel occupancy rendering methods [21].
- The algorithm's reliance solely on 2D visual data, without the need for depth cameras or LiDAR, represents a significant innovation in the field of autonomous driving [11][12].
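The relationship between a binary occupancy grid and a signed distance function, as described under Group 2, can be illustrated on a toy grid. The sketch below is not Tesla's method, which the article describes as a learned prediction from camera images. It is a brute-force computation on a small 2D grid, and the sign convention (positive in free space, negative inside obstacles) and the 0.1 m cell size are assumptions chosen for illustration.

```python
import math
from typing import List


def signed_distance_grid(occ: List[List[int]], cell_size: float = 1.0) -> List[List[float]]:
    """Brute-force signed distance field for a small, non-empty 2D occupancy grid.

    Free cells get +distance to the nearest occupied cell; occupied cells get
    -distance to the nearest free cell. Distances are measured between cell
    centers and scaled by cell_size.
    """
    rows, cols = len(occ), len(occ[0])
    cells = [(r, c) for r in range(rows) for c in range(cols)]
    occupied = [(r, c) for r, c in cells if occ[r][c]]
    free = [(r, c) for r, c in cells if not occ[r][c]]
    sdf = [[0.0] * cols for _ in range(rows)]
    for r, c in cells:
        targets = free if occ[r][c] else occupied
        if not targets:
            # The grid is entirely occupied or entirely free: no boundary exists.
            sdf[r][c] = -math.inf if occ[r][c] else math.inf
            continue
        nearest = min(math.hypot(r - tr, c - tc) for tr, tc in targets)
        sdf[r][c] = (-nearest if occ[r][c] else nearest) * cell_size
    return sdf


# Toy example: a single obstacle cell in the middle of a 3x3 grid of 0.1 m cells.
grid = [
    [0, 0, 0],
    [0, 1, 0],
    [0, 0, 0],
]
field = signed_distance_grid(grid, cell_size=0.1)
```

Unlike the binary grid, the distance field changes gradually as a cell gets closer to the obstacle, which is the property the article credits for finer shape recognition and smoother rendering.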