自动驾驶之心
Traditional perception is falling out of favor as VLA becomes the rising star...
自动驾驶之心· 2025-09-10 23:33
Core Viewpoint - The article discusses the evolution of autonomous driving technology, emphasizing the transition from traditional modular architectures to end-to-end models, and highlights the emergence of Vision-Language-Action (VLA) models as a new paradigm in the field [2][3].

Summary by Sections

VLA Research Paper Guidance
- The course aims to provide systematic knowledge on VLA, addressing gaps in understanding and practical application, and helping students develop their own research ideas and writing skills [4][5][6].

Course Objectives
- The program seeks to help students who lack a clear knowledge framework, have difficulty in practical implementation, and struggle with writing and submitting papers [4][5][6].

Course Structure
- The course consists of 12 weeks of online group research, followed by 2 weeks of paper guidance and a 10-week maintenance period, focusing on classic and cutting-edge papers, coding skills, and writing methodologies [5][10][12].

Enrollment Details
- The program is limited to 6-8 students per session, targeting individuals with a background in deep learning and basic knowledge of autonomous driving algorithms [9][11][14].

Course Highlights
- The curriculum includes foundational courses in Python and deep learning, with a focus on enhancing coding abilities and understanding key algorithms and their advantages [18][21][22].

Key Papers and Resources
- The course provides access to essential papers and datasets relevant to VLA and autonomous driving, facilitating a comprehensive understanding of the subject matter [23][24][30].
Third year of my master's, and my job hunt is stuck on papers...
自动驾驶之心· 2025-09-10 12:00
Core Viewpoint - The article emphasizes the importance of high-quality research papers for graduate students, especially those aiming for doctoral programs or competitive job positions in the tech industry. It highlights the challenges faced by students in producing quality research and offers professional guidance to help them succeed in their academic endeavors [1].

Group 1: Challenges Faced by Students
- Many students struggle to secure jobs due to average research outcomes and seek to pursue doctoral studies to alleviate employment pressure [1].
- Students often face difficulties in selecting research topics, structuring their papers, and providing strong arguments, leading to delays in producing satisfactory work [1][8].

Group 2: Services Offered
- The company provides specialized guidance for students in writing research papers, particularly in the fields of autonomous driving, embodied intelligence, and robotics [3][5].
- A structured 12-week program is outlined, which includes determining research direction, literature review, experimental design, drafting, and submission processes [4].

Group 3: Expertise and Success Rate
- The company boasts a team of over 300 dedicated instructors from top global universities, with a high acceptance rate of 96% for students they have guided in the past three years [5].
- The service aims to help students build research thinking, familiarize themselves with research processes, and enhance their practical abilities [8][10].

Group 4: Target Audience
- The services are tailored for graduate students in computer science, those with no guidance from advisors, and individuals seeking to improve their academic credentials for job applications or further studies [9][10].

Group 5: Additional Benefits
- Outstanding students may receive recommendation letters from prestigious institutions and direct referrals to leading tech companies for internships or job positions [15].
Alibaba releases AgentScope, upending the table for domestic Chinese agent frameworks
自动驾驶之心· 2025-09-09 23:33
Core Viewpoint - The article discusses the launch of Alibaba's AgentScope 1.0, a comprehensive enterprise-level intelligent agent development framework that integrates message-driven and hierarchical architecture, aimed at enhancing the development and deployment of intelligent agent applications [1][2].

Group 1: AgentScope 1.0 Features
- AgentScope 1.0 represents a significant improvement in supporting flexible and efficient interactions between agents and their environments, leveraging advancements in large language models (LLMs) [3].
- The framework includes core components and a unified interface that allows developers to easily utilize the latest technological advancements, such as new models and model context protocols (MCPs) [3].
- It is built on the ReAct paradigm, which combines reasoning and action, providing three core functionalities: Reply, Observe, and Handle Interrupt, along with features like real-time control and dynamic tool supply [9].

Group 2: Built-in Agents and Collaboration
- AgentScope includes three types of scenario-based agents: deep research agents, browsing agents, and meta-planning agents, each designed for specific tasks and capable of integrating various functionalities [9].
- The framework supports multi-agent collaboration through two core paradigms: "agents as tools" and "agent dialogue," ensuring seamless integration and context synchronization among multiple agents [9].

Group 3: Developer Experience and Tools
- The framework offers a developer-friendly experience with tools that cover the entire development process, including evaluation, visualization, and runtime systems [6][7].
- A visual studio interface is provided for managing long-trajectory agent applications, along with a runtime sandbox to ensure safe execution of agents in production environments [3][9].
- The evaluation module features a layered architecture that supports both single-process and distributed computing evaluations, allowing for comprehensive performance assessment [9].
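The ReAct pattern and the three functions named above (Reply, Observe, Handle Interrupt) can be illustrated with a toy loop. This is a minimal sketch under stated assumptions, not AgentScope's actual API: the class `ToyReActAgent`, its method names, and the keyword-matching `_reason` step (a stand-in for an LLM call) are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Set


@dataclass
class ToyReActAgent:
    """Hypothetical agent for illustration only; not AgentScope's API."""
    tools: Dict[str, Callable[[str], str]]
    memory: List[str] = field(default_factory=list)
    interrupted: bool = False

    def observe(self, message: str) -> None:
        # Observe: record an incoming message without producing a reply.
        self.memory.append(f"observed: {message}")

    def handle_interrupt(self) -> str:
        # Handle Interrupt: clear the flag, log it, and stop the current reply.
        self.interrupted = False
        self.memory.append("interrupted")
        return "Interrupted; partial result returned."

    def _reason(self, query: str, used: Set[str]) -> Optional[str]:
        # Stand-in for an LLM call: choose the next unused tool named in the query.
        for name in self.tools:
            if name in query and name not in used:
                return name
        return None

    def reply(self, query: str, max_steps: int = 3) -> str:
        # Reply: alternate reasoning (pick a tool) and acting (call it).
        self.memory.append(f"query: {query}")
        results: List[str] = []
        used: Set[str] = set()
        for _ in range(max_steps):
            if self.interrupted:
                return self.handle_interrupt()
            tool_name = self._reason(query, used)
            if tool_name is None:
                break  # nothing left to do, so finalize the answer
            used.add(tool_name)
            result = self.tools[tool_name](query)
            self.memory.append(f"{tool_name} -> {result}")
            results.append(result)
        return "; ".join(results) if results else "no tool needed"


agent = ToyReActAgent(tools={"search": lambda q: "found 3 docs",
                             "calc": lambda q: "42"})
print(agent.reply("search then calc"))
```

Checking the interrupt flag at the top of every step is one simple way to give a caller real-time control over a long-running reply; a production framework would likely do this asynchronously.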
Autonomous-driving VLA levels up again! Bosch's IRL-VLA: a new closed-loop reinforcement learning framework
自动驾驶之心· 2025-09-09 23:33
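Today 自动驾驶之心 shares the latest work from teams at Tsinghua, Bosch, and others: IRL-VLA, a closed-loop training framework for vision-language-action policies based on an inverse reinforcement learning reward world model.

Paper authors | Anqing Jiang et al. Editor | 自动驾驶之心

Since autonomous-driving VLA entered the industry's field of view, it has faced two key problems:
1. Existing VLA architectures are usually based on imitation learning in an open-loop setting and tend to capture the behavior recorded in the dataset, which limits their performance to some extent.
2. Closed-loop training relies heavily on high-fidelity sensor simulation, but the domain gap between simulated and real environments, together with computational cost, hinders VLA generalization.

To address these two problems, teams from Bosch, Shanghai University, Shanghai Jiao Tong University, and Tsinghua AIR propose IRL-VLA, a new closed-loop reinforcement learning approach that combines a reward world model learned through inverse reinforcement learning with a purpose-designed VLA method. IRL-VLA adopts a three-stage paradigm: in the first stage, a VLA architecture is proposed and, through imitation learning, the VL ...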
It took a long time to put together: autonomous driving learning roadmaps...
自动驾驶之心· 2025-09-09 23:33
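Autumn campus recruiting has recently opened at scale. Many companies in the industry have asked us to post openings, and they also lament that fewer and fewer candidates meet their requirements.

Because we have long run autonomous-driving media, we have also looked at who works in the field. They fall into a few main groups: people from mechanical or communications engineering who switched to software (with little exposure to algorithms), graduates in automation, computer science, or electronic information, and people from traditional robotics. The industry is moving so fast that what is taught at school cannot keep up with practice. Students who enrolled in 2022 were just getting to know BEV; by the time they graduate in 2025 it is all end-to-end and large models, so many newcomers have had to teach themselves by improvised routes. That is hardly their fault, since many university faculty cannot pivot that quickly either.

The root cause is the lack of a systematic training pipeline, which leaves a serious shortage of high-quality talent in this area. We have previously compiled learning roadmaps for many autonomous-driving subfields in the community; study them well, and they can help you become a practitioner who truly understands autonomous driving. If you are not yet a member, you are welcome to join us and exchange ideas with nearly 4,000 community members.

The 自动驾驶之心知识星球 currently combines video, illustrated articles, learning roadmaps, Q&A, and job-hunting exchange. It is a comprehensive autonomous-driving community that already has more than 4,000 members. We hope to reach nearly 10,000 within the next two years, building a hub for exchange and technical sharing that many beginners and advanced learners visit regularly.

The community also regularly answers practical questions: How do you get started with end-to-end? ...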
Surpassing GPT-4o! AgentThink: an autonomous driving framework from Tsinghua and Xiaomi that fuses reasoning and tool calling (EMNLP25)
自动驾驶之心· 2025-09-09 23:33
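In recent years, vision-language models (VLMs) have shown great potential in autonomous driving. With strong scene understanding and reasoning, VLMs could significantly simplify the hand-designed perception, prediction, and decision modules of traditional autonomous driving systems. Existing methods, however, still have clear limitations in uncertainty modeling, generalization, and interpretability. How can an autonomous-driving VLM not only "understand what it sees" but also "think" like a human, autonomously calling tools, reasoning, and making judgments in complex driving environments?

Recently, the AgentThink framework, jointly proposed by teams from Tsinghua University, Xiaomi, McGill University, and others, was accepted to EMNLP 2025 Findings, a top natural language processing venue. The work is the first to deeply integrate dynamic tool calling with chain-of-thought reasoning, greatly improving the reasoning reliability and generalization of VLMs on autonomous driving tasks. The code and project website are both open source.

Current status and challenges

As automakers such as Xiaomi iterate rapidly on autonomous driving, the industry's breakthrough points are converging from basic perception and control toward problems such as high-level semantic scene understanding and complex topological relationships, for example complicated traffic lights at large intersections and the semantics of complex road signs. In addition, while exploring and using VLMs, we found that they suffer from severe hallucination (the model's answer has the right format but the wrong content). It is like a seemingly clever navigator who always gives the wrong route, leaving you unsure whether to laugh or cry ...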
Autonomous driving paper digest | End-to-end, Diffusion, VLM, OCC, and more
自动驾驶之心· 2025-09-09 07:51
Core Insights - The article discusses advancements in autonomous driving technology, focusing on knowledge-driven diffusion strategies and multimodal frameworks for improved performance in various driving scenarios [3][5][16][19][27][29][37][38].

Group 1: Knowledge-Driven Diffusion Policy (KDP-AD)
- A new knowledge-driven diffusion policy (KDP) has been proposed, achieving success rates of 100%, 94%, and 90% in ramp merging, intersections, and roundabouts respectively, outperforming traditional methods [3][5].
- The KDP framework utilizes a mixture of experts (MoE) approach, allowing for modular and combinatorial strategy learning, enhancing knowledge reuse across different driving scenarios [5].
- Empirical validation shows that KDP significantly outperforms baseline methods in success rates, collision risk control, and generalization capabilities [5][9].

Group 2: SliceSemOcc Framework
- The SliceSemOcc framework introduces a vertical slice-based multimodal 3D semantic occupancy prediction, improving mean Intersection over Union (mIoU) from 24.7% to 28.2%, a relative increase of 14.2% [16][19].
- It employs a dual-scale vertical slicing strategy to enhance feature extraction for both global and local contexts, addressing the representation of small objects effectively [19].
- Experimental results demonstrate significant improvements in mIoU across various datasets, particularly for small object categories [19][22].

Group 3: LatticeWorld Framework
- The LatticeWorld framework integrates a lightweight multimodal large language model with Unreal Engine 5, achieving over 90 times efficiency improvement in industrial-grade scene generation [27][29].
- It supports multimodal inputs and dynamic multi-agent interactions, filling gaps in existing methods regarding interactivity and industrial efficiency [29].
- Performance validation indicates superior layout generation accuracy and visual fidelity compared to existing models, maintaining high creative quality [29][35].

Group 4: Ego3D-Bench and Ego3D-VLM
- The Ego3D-Bench benchmark is introduced for ego-centric multi-view 3D spatial reasoning, enhancing visual language models' performance in various tasks [37][38].
- The Ego3D-VLM post-training framework improves 3D spatial reasoning capabilities, achieving an average accuracy increase of 12% in multiple-choice questions and a 56% reduction in RMSE for absolute distance estimation [38][41].
- The framework demonstrates adaptability across various state-of-the-art visual language models, confirming its generalization capabilities [38][41].
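The SliceSemOcc figures in Group 2 can be checked with a few lines of arithmetic, and the metric itself is easy to state in code. The sketch below is illustrative only: `relative_gain` and `mean_iou` are hypothetical helper names, and the tiny label lists are made-up toy data, not results from the paper.

```python
from typing import List


def relative_gain(before: float, after: float) -> float:
    """Relative improvement in percent: (after - before) / before * 100."""
    return (after - before) / before * 100.0


def mean_iou(pred: List[int], target: List[int], num_classes: int) -> float:
    """Mean Intersection over Union across classes that occur in pred or target."""
    ious = []
    for c in range(num_classes):
        inter = sum(1 for p, t in zip(pred, target) if p == c and t == c)
        union = sum(1 for p, t in zip(pred, target) if p == c or t == c)
        if union > 0:  # skip classes absent from both prediction and target
            ious.append(inter / union)
    return sum(ious) / len(ious)


# 24.7% -> 28.2% mIoU is a gain of 3.5 points, about 14.2% relative to 24.7.
print(round(relative_gain(24.7, 28.2), 1))

# Toy labels (assumed): class 0 IoU = 1/2, class 1 IoU = 2/3.
print(mean_iou([0, 0, 1, 1], [0, 1, 1, 1], num_classes=2))
```

The distinction matters when reading such summaries: the absolute gain is 3.5 percentage points, while 14.2% is that gain expressed relative to the 24.7% baseline.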
π0.5 is now open source!!!
自动驾驶之心· 2025-09-09 07:51
Core Viewpoint - The article discusses the release of the π0.5 model, an upgraded version of the π0 model, which enhances open-world generalization capabilities through knowledge insulation [3].

Group 1
- The π0.5 model has been open-sourced and is available on GitHub, providing a base model pre-trained on over 10,000 hours of robotic data [5][4].
- The repository includes practical examples and guidelines for fine-tuning custom datasets, making it accessible for users [5].
- The PyTorch version of both π0 and π0.5 models has been implemented and validated on the LIBERO benchmark, which includes inference and fine-tuning [10].

Group 2
- The π0-FAST model features mixed precision training, fully sharded data parallel (FSDP) training, low-rank adaptation (LoRA) training, and exponential moving average (EMA) weights during training [12].
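Of the training features listed above, exponential moving average (EMA) weights are the simplest to show concretely. The following is a minimal sketch under stated assumptions, not the repository's implementation: the function name `ema_update`, the dict-of-floats parameter format, and the decay value are all hypothetical choices for illustration.

```python
from typing import Dict


def ema_update(ema: Dict[str, float],
               params: Dict[str, float],
               decay: float = 0.999) -> Dict[str, float]:
    """Return new EMA weights: decay * old_ema + (1 - decay) * current_param."""
    return {name: decay * ema[name] + (1.0 - decay) * value
            for name, value in params.items()}


# Toy run with a large step size (decay=0.9 is assumed, chosen for visibility).
ema = {"w": 0.0}
for step_params in ({"w": 1.0}, {"w": 1.0}):
    ema = ema_update(ema, step_params, decay=0.9)
print(ema)  # the average moves toward 1.0 a little more on each step
```

Because the averaged weights change more slowly than the raw weights, they tend to smooth out step-to-step noise, which is why an EMA copy is often the one used for evaluation.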
After my advisor pointed me to VLA as a research direction...
自动驾驶之心· 2025-09-09 03:42
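Xiao Lin is a second-year master's student at a C9 university whose lab mainly works on autonomous driving and robotics. The term started two weeks ago, and after sorting out the odds and ends in the dorm and the class, it was time to go to the lab for a meeting with the advisor. The advisor had not been idle over the summer: having seen that quite a few companies have put VLA into mass-production vehicles, the advisor suggested that the lab could try it too and publish some papers.

Indeed, the recent hot topics in autonomous driving are converging on large models and VLA. VLA is not that easy to work on, though, and for a newcomer or someone switching fields, getting research started is painful. You can spend a year falling into pitfalls and still have nothing to show for it. At this point, Feng Ge recommended 自动驾驶之心's 1-on-6 paper mentoring to him.

1. VLA research paper mentoring topics are here ⭐

End-to-end autonomous driving aims to build a unified intelligent model that maps raw sensor input (such as camera images) directly to driving control commands (such as steering, throttle, and braking), replacing the traditional multi-module cascaded architecture (perception, prediction, planning, control). This evolution can be roughly divided into the following stages, and VLA models emerged precisely to resolve the bottlenecks of the earlier stages, marking the start of a new paradigm.

1. The era of traditional modular architectures: Early autonomous driving systems (L2-L4) generally used a modular design. Each module (such as object detection, trajectory prediction, or path planning) was developed and optimized independently. Advantages: the logic is clear, each module can be debugged and verified independently, and the design offers good ...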
We quietly built a large-model technical community...
自动驾驶之心· 2025-09-09 03:42
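A community that takes content seriously, and a place that cultivates future leaders.

With autonomous-driving VLA so popular, do you want to use the opportunity to learn more about large-model technology: which directions are worth pursuing, and where the current hot spots are? To that end we set up the 大模型之心Tech community. The platform focuses on large-model RAG, large-model AI agents, multimodal large models (pre-training, fine-tuning, reinforcement learning), large-model deployment and inference optimization, and more. If you are interested in large-model technology, you are welcome to follow us.

If you want to go further, you are also welcome to join our 大模型之心Tech知识星球. Our goal there is to build the largest large-model technical community in China, and we have been supplying the industry and individuals with talent and with industry and academic information. The community is quickly building out its modules; join us and travel alongside large models. ...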