自动驾驶之心

Now at 2,000 members: what does this "Huangpu Military Academy" of embodied intelligence have in store?
自动驾驶之心· 2025-08-09 08:21
Yesterday afternoon a student came to Feng Ge to vent: his company asked him to debug a robot, and he had no idea how to handle data collection and debugging, since there were far too many degrees of freedom. He was equally lost on how to analyze problems. Running demos at school was fine, but once he got his hands on a real robot there were still plenty of pitfalls. We have run into this kind of question many times in our embodied intelligence community: how to use the hardware, how to collect data effectively, how to deploy VA and VLA models, and whether the problem is a capture background that is too complex or data that is simply too dirty. We soon gave him the relevant answers, and he quickly applied them to his project. A community that can solve problems at the moment people need help most is clearly valuable. The Embodied Intelligence Heart Knowledge Planet (the first full-stack embodied intelligence technical community in China) has by now closed the loop across industry, academia, job hunting, and Q&A exchange. Whatever problem comes up, a solution gets shared; wherever research is most cutting-edge, we keep supplying ideas, and job openings are passed on to members as soon as they appear. Beyond the questions above, we have also organized plenty of other topics for everyone:
Which platforms are there for robot simulation and data collection?
How do humanoid robots do imitation learning, and why is VLA hard to get right?
How is VLA used in robotic grasping and planning tasks?
How is VLA+RL done, and why does it work?
What to do when sim2real performs poorly, and how does real2sim2real work?
How is hierarchical decision-making typically done, and what are its advantages and disadvantages compared with end-to-end?
Research on embodied robots ...
A Crash Course in Planning for Autonomous Driving Perception Engineers
自动驾驶之心· 2025-08-08 16:04
Core Insights
- The article discusses the evolution and importance of planning modules in autonomous driving, emphasizing the need for engineers to understand both traditional and machine learning-based approaches to effectively address challenges in the field [5][8][10].

Group 1: Importance of Planning
- Understanding planning is crucial for engineers, especially in the context of autonomous driving, as it allows for better service to downstream customers and enhances problem-solving capabilities [8][10].
- The transition from rule-based systems to machine learning systems in planning will likely see a coexistence of both methods for an extended period, with a gradual shift in their usage ratio from 8:2 to 2:8 [8][10].

Group 2: Planning System Overview
- The planning system in autonomous vehicles is essential for generating safe, comfortable, and efficient driving trajectories, relying on inputs from perception outputs [11][12].
- Traditional planning modules consist of global path planning, behavior planning, and trajectory planning, with behavior and trajectory planning often working in tandem [12].

Group 3: Challenges in Planning
- A significant challenge in the planning technology stack is the lack of standardized terminology, leading to confusion in both academic and industrial contexts [15].
- The article highlights the need for a unified approach to behavior planning, as the current lack of consensus on semantic actions limits the effectiveness of planning systems [18].

Group 4: Planning Techniques
- The article outlines three primary tools used in planning: search, sampling, and optimization, each with its own methodologies and applications in autonomous driving [24][41].
- Search methods, such as Dijkstra and A* algorithms, are popular for path planning, while sampling methods like Monte Carlo are used for evaluating numerous options quickly [25][32] (a minimal search sketch follows this summary).

Group 5: Industrial Practices
- The article discusses the distinction between decoupled and joint spatiotemporal planning methods, with decoupled solutions being easier to implement but potentially less optimal in complex scenarios [52][54].
- The Apollo EM planner is presented as an example of a decoupled planning approach, which simplifies the problem by breaking it into two-dimensional issues [56][58].

Group 6: Decision-Making in Autonomous Driving
- Decision-making in autonomous driving focuses on interactions with other road users, addressing uncertainties and dynamic behaviors that complicate planning [68][69].
- The use of Markov Decision Processes (MDP) and Partially Observable Markov Decision Processes (POMDP) frameworks is essential for handling the probabilistic nature of interactions in driving scenarios [70][74].
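Since Group 4 names search (Dijkstra, A*) as one of the three planning tools, a minimal sketch may help make the idea concrete. The code below is a generic A* on a 2D occupancy grid, included only for illustration; the grid representation, uniform step cost, and Manhattan heuristic are assumptions of this sketch, not the article's implementation.

```python
import heapq

def astar(grid, start, goal):
    """Minimal A* on a 2D occupancy grid (0 = free, 1 = blocked).

    grid: list of lists; start/goal: (row, col) tuples.
    Returns a list of cells from start to goal, or None if unreachable.
    """
    def h(cell):  # admissible heuristic: Manhattan distance to the goal
        return abs(cell[0] - goal[0]) + abs(cell[1] - goal[1])

    rows, cols = len(grid), len(grid[0])
    open_heap = [(h(start), 0, start)]          # entries are (f = g + h, g, cell)
    came_from, g_score = {}, {start: 0}

    while open_heap:
        _, g, cur = heapq.heappop(open_heap)
        if cur == goal:                          # reconstruct the path backwards
            path = [cur]
            while cur in came_from:
                cur = came_from[cur]
                path.append(cur)
            return path[::-1]
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nxt = (cur[0] + dr, cur[1] + dc)
            if 0 <= nxt[0] < rows and 0 <= nxt[1] < cols and grid[nxt[0]][nxt[1]] == 0:
                ng = g + 1                       # uniform step cost
                if ng < g_score.get(nxt, float("inf")):
                    g_score[nxt] = ng
                    came_from[nxt] = cur
                    heapq.heappush(open_heap, (ng + h(nxt), ng, nxt))
    return None

# Example: plan around a blocked row segment.
grid = [[0, 0, 0, 0],
        [0, 1, 1, 0],
        [0, 0, 0, 0]]
print(astar(grid, (0, 0), (2, 3)))
```

Swapping the heuristic for a constant zero turns the same loop into Dijkstra, which is why the two are usually taught together.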
What is the VLM so often mentioned in autonomous driving, and how does it differ from VLA?
自动驾驶之心· 2025-08-08 16:04
Core Viewpoint
- The article discusses the significance of Vision-Language Models (VLM) in the context of autonomous driving, highlighting their ability to integrate visual perception and natural language processing to enhance vehicle understanding and interaction with complex road environments [4][19].

Summary by Sections

What is VLM?
- VLM stands for Vision-Language Model, which combines the capabilities of understanding images and text within a single AI system. It enables deep comprehension of visual content and natural language interaction, enhancing applications like image retrieval, writing assistance, and robotic navigation [6].

How to Make VLM Work Efficiently?
- VLM processes raw road images into feature representations using visual encoders, such as Convolutional Neural Networks (CNN) and Vision Transformers (ViT). Language encoders and decoders handle natural language input and output, learning semantic relationships between tokens [8].

Key Mechanism of VLM
- The alignment of visual features and language modules is crucial for VLM. Cross-attention mechanisms allow the language decoder to focus on relevant image areas when generating text, ensuring high consistency between generated language and actual scenes [9] (a toy sketch of this cross-attention follows the summary).

Training Process of VLM
- The training process for VLM typically involves pre-training on large datasets followed by fine-tuning with specific datasets related to autonomous driving scenarios, ensuring the model can accurately recognize and respond to traffic signs and conditions [11].

Applications of VLM
- VLM supports various intelligent functions, including real-time scene alerts, interactive semantic Q&A, and recognition of road signs and text. It can generate natural language prompts based on visual inputs, enhancing driver awareness and decision-making [12].

Real-time Operation of VLM
- VLM operates in a "cloud-edge collaboration" architecture, where large-scale pre-training occurs in the cloud, and optimized lightweight models are deployed in vehicles for real-time processing. This setup allows for quick responses to safety alerts and complex analyses [14].

Data Annotation and Quality Assurance
- Data annotation is critical for VLM deployment, requiring detailed labeling of images under various conditions. This process ensures high-quality training data, which is essential for the model's performance in real-world scenarios [14].

Safety and Robustness
- Safety and robustness are paramount in autonomous driving. VLM must quickly assess uncertainties and implement fallback measures when recognition errors occur, ensuring reliable operation under adverse conditions [15].

Differences Between VLA and VLM
- VLA (Vision-Language-Action) extends VLM by integrating action decision-making capabilities. While VLM focuses on understanding and expressing visual information, VLA encompasses perception, cognition, and execution, making it essential for real-world applications like autonomous driving [18].

Future Developments
- The continuous evolution of large language models (LLM) and large vision models (LVM) will enhance VLM's capabilities in multi-modal integration, knowledge updates, and human-machine collaboration, leading to safer and more comfortable autonomous driving experiences [16][19].
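The cross-attention alignment described under "Key Mechanism of VLM" is the part most easily shown in code. Below is a toy PyTorch sketch, not any production VLM: a single cross-attention block in which text-token queries attend over image-patch features. The dimensions, class name, and patch counts are assumptions made for illustration.

```python
import torch
import torch.nn as nn

class TextToImageCrossAttention(nn.Module):
    """Toy cross-attention block: text tokens (queries) attend over image patches (keys/values)."""

    def __init__(self, dim=256, num_heads=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, text_tokens, image_patches):
        # text_tokens:   (batch, num_text_tokens, dim) from the language decoder
        # image_patches: (batch, num_patches,     dim) from the visual encoder (e.g. a ViT)
        attended, attn_weights = self.attn(
            query=text_tokens, key=image_patches, value=image_patches
        )
        # Residual connection plus layer norm, as in a standard transformer block.
        return self.norm(text_tokens + attended), attn_weights

# Example: one image split into 196 patches (14x14), a 12-token caption prefix.
block = TextToImageCrossAttention()
text = torch.randn(1, 12, 256)
patches = torch.randn(1, 196, 256)
out, weights = block(text, patches)
print(out.shape, weights.shape)  # torch.Size([1, 12, 256]) torch.Size([1, 12, 196])
```

The attention-weight tensor is what gives the "which image region did this word look at" interpretability the summary alludes to.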
From autonomous driving to embodied intelligence: these communities hold up half the sky!
自动驾驶之心· 2025-08-08 16:04
Core Viewpoint
- The embodied intelligence and autonomous driving industries are experiencing significant growth in production, financing, and recruitment, leading to a highly competitive job market where skilled professionals are in high demand [1].

Group 1: Industry Trends
- The industry is focusing on practical technologies, with companies competing to secure talent with relevant skills [1].
- The job market is described as "highly competitive," making it difficult for candidates to secure positions despite the availability of openings [1].

Group 2: Recommended Learning Communities
- "Smart Driving Frontier" is a comprehensive media platform dedicated to the autonomous driving sector, providing technical insights and industry news [1].
- "Computer Vision Research Institute" focuses on AI research and practical applications, sharing the latest algorithms and project experiences [3].
- "Visual Language Navigation" aims to create a professional platform for navigation technologies, sharing technical insights and industry news [5].
- "Embodied Intelligence Research Lab" emphasizes core areas such as reinforcement learning and multi-agent collaboration, providing research updates and practical case studies [6].
- "Embodied Intelligence Heart" is the largest community for embodied intelligence, covering various technical directions and encouraging collaboration among developers [7].
- "arXiv Daily Academic Express" offers daily updates on academic papers across multiple fields, including AI and robotics, facilitating quick access to relevant research [8].
- "Autonomous Driving Heart" is a community for developers in the autonomous driving field, focusing on various technical aspects and job opportunities [10].
Fine-tuning an autonomous driving VLM based on the open-source Qwen2.5-VL
自动驾驶之心· 2025-08-08 16:04
Core Viewpoint
- The article discusses advancements in autonomous driving technology, particularly focusing on the LLaMA Factory framework and the Qwen2.5-VL model, which enhance the capabilities of vision-language-action models for autonomous driving applications [4][5].

Group 1: LLaMA Factory Overview
- LLaMA Factory is an open-source low-code framework for fine-tuning large models, gaining popularity in the open-source community with over 40,000 stars on GitHub [3].
- The framework integrates widely used fine-tuning techniques, making it suitable for developing autonomous driving assistants that can interpret traffic conditions through natural language [3].

Group 2: Qwen2.5-VL Model
- The Qwen2.5-VL model serves as the foundational model for the project, achieving significant breakthroughs in visual recognition, object localization, document parsing, and long video understanding [4].
- It offers three model sizes, with the flagship Qwen2.5-VL-72B performing comparably to advanced models like GPT-4o and Claude 3.5 Sonnet, while smaller versions excel in resource-constrained environments [4].

Group 3: CoVLA Dataset
- The CoVLA dataset, comprising 10,000 real driving scenes and over 80 hours of video, is utilized for training and evaluating vision-language-action models [5].
- This dataset surpasses existing datasets in scale and annotation richness, providing a comprehensive platform for developing safer and more reliable autonomous driving systems [5].

Group 4: Model Training and Testing
- Instructions for downloading and installing LLaMA Factory and the Qwen2.5-VL model are provided, including commands for setting up the environment and testing the model [6][7] (a minimal loading and test sketch follows this summary).
- The article details the process of fine-tuning the model using the SwanLab tool for visual tracking of the training process, emphasizing the importance of adjusting parameters to avoid memory issues [11][17].
- After training, the fine-tuned model demonstrates improved response quality in dialogue scenarios related to autonomous driving risks compared to the original model [19].
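Group 4 boils down to "install, load, and smoke-test the model before fine-tuning." The sketch below follows the publicly documented Qwen2.5-VL usage pattern in transformers rather than the article's exact script: the chosen model size, image path, and prompt are placeholders, and it assumes a recent transformers release plus the qwen-vl-utils helper package.

```python
import torch
from transformers import Qwen2_5_VLForConditionalGeneration, AutoProcessor
from qwen_vl_utils import process_vision_info  # pip install qwen-vl-utils

model_id = "Qwen/Qwen2.5-VL-7B-Instruct"  # one of the published sizes; pick to fit your GPU
model = Qwen2_5_VLForConditionalGeneration.from_pretrained(
    model_id, torch_dtype="auto", device_map="auto"
)
processor = AutoProcessor.from_pretrained(model_id)

# Hypothetical driving frame and prompt; replace with a CoVLA-style image of your own.
messages = [{
    "role": "user",
    "content": [
        {"type": "image", "image": "file:///path/to/driving_frame.jpg"},
        {"type": "text", "text": "Describe the traffic scene and point out any risks."},
    ],
}]

# Build the chat prompt, pack image tensors, and run a short generation as a smoke test.
text = processor.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
image_inputs, video_inputs = process_vision_info(messages)
inputs = processor(
    text=[text], images=image_inputs, videos=video_inputs,
    padding=True, return_tensors="pt"
).to(model.device)

with torch.no_grad():
    output_ids = model.generate(**inputs, max_new_tokens=128)

# Strip the prompt tokens before decoding the model's answer.
trimmed = [out[len(inp):] for inp, out in zip(inputs.input_ids, output_ids)]
print(processor.batch_decode(trimmed, skip_special_tokens=True)[0])
```

Once this baseline responds sensibly, LLaMA Factory's fine-tuning configs can point at the same checkpoint, with SwanLab tracking the run as the article describes.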
We're getting ready to expand our autonomous driving team, come join us!
自动驾驶之心· 2025-08-08 03:20
In both the autonomous driving and embodied intelligence directions we have already built deep collaborations with the mainstream companies in the industry and with the relevant universities, and the large-model direction is being built up quickly as well. We do not focus only on the technology itself; we would rather co-create the whole AI field with everyone and share the joy of growing in understanding. For trending events, we likewise want to offer content value you will not find anywhere else on the web.

A journey of a thousand miles is made of single steps. We know very well that one person's strength is limited, which is why we hope more outstanding people will walk this road with us.

Content Operations - Intern
Job duties:
Requirements:
1. Research background related to autonomous driving, large models, or embodied intelligence; bachelor's degree or above, master's preferred;

Hello everyone, we are the Autonomous Driving Heart / Embodied Intelligence / LLM Heart Tech team. We are very glad to meet you here. If you also believe that technical content can change the world, you may be exactly who we are looking for!

What do we do? We hope to connect academia and industry through technical content, to serve as the bridge between companies and universities, and beyond that to reach the hundreds of thousands of AI developers and entrepreneurs. We are committed to bringing you the newest and most authoritative technical information on the web. The team focuses on the most cutting-edge AI areas, including autonomous driving, embodied intelligence, and large models, covering paper interpretations, analyses of production solutions in industry, large-model evaluations, business news, industry recruiting, open-source projects, and more, and through our public ...
The technology-obsessed "Huangpu Military Academy" of autonomous driving has hit 4,000 members!
自动驾驶之心· 2025-08-08 03:20
The Autonomous Driving Heart Knowledge Planet has officially passed 4,000 members; three years in, it has not been easy. So far the Knowledge Planet has closed the loop across industry, academia, job hunting, and Q&A exchange. The few people running it review things every day: what kind of community do members actually need? Is there anything we have failed to consider? Flashy but empty will not do, a place where nobody talks will not do, and a place where people cannot find jobs certainly will not do.

Going forward we plan to keep improving the Planet's content, and today we would like to report on that. We intend to launch a face-to-face module for members, aiming to chat with everyone online every month and discuss common questions together. We will also keep inviting leading figures from academia and industry for in-depth roundtable interviews!

We are a community that takes content seriously, a place for cultivating future leaders. Autonomous Driving Heart has always worked to push the industry forward and to serve as a bridge between companies and universities. Our vision is to bring AI and autonomous driving to every student who needs them!

The Planet currently organizes roughly 40+ technical roadmaps for members. Whether you are asking about industry applications or looking for the latest VLA benchmarks, surveys, and beginner learning paths, it can greatly cut down your search time. The Planet has also invited dozens of guests from the autonomous driving field, all big names active on the front lines of industry and academia (the kind who frequently appear at top conferences and in various interviews). Feel free to ask questions at any time; they ...
Reconstruct point clouds online and in real time with a handheld LiDAR! An ultra cost-effective 3D scanner has arrived
自动驾驶之心· 2025-08-07 23:32
Core Viewpoint
- The GeoScan S1 is presented as the most cost-effective 3D laser scanner in China, designed for applications such as campus and indoor scene reconstruction, featuring a lightweight design and user-friendly operation [1][7].

Group 1: Product Features
- The GeoScan S1 offers centimeter-level precision in real-time 3D scene reconstruction using a multi-modal sensor fusion algorithm [1].
- It can generate point clouds at a rate of 200,000 points per second, with a maximum measurement distance of 70 meters and 360° coverage, supporting large scenes of over 200,000 square meters [1][27].
- The device is equipped with a built-in Ubuntu system and various sensor devices, allowing for flexible power supply and integration with other equipment [3][10].
- It supports real-time mapping and high-precision modeling, with relative accuracy better than 3 cm and absolute accuracy better than 5 cm [20].

Group 2: User Experience
- The device is designed for ease of use, allowing users to start scanning with a single button and export results without complex setups [5].
- It features a compact design with integrated sensors and expandable interfaces, enhancing hardware performance [10][36].
- The GeoScan S1 supports offline and online rendering, providing users with immediate visualization of scanning results [6].

Group 3: Market Positioning
- The product is marketed at a competitive price, starting from 19,800 yuan for the basic version, with additional versions available to meet various needs [7][56].
- The company emphasizes its strong background and project validation through collaborations with academic institutions, showcasing its credibility in the industry [7].

Group 4: Application Scenarios
- The GeoScan S1 is suitable for a wide range of environments, including office buildings, parking lots, industrial parks, tunnels, forests, and mines, effectively completing 3D scene mapping [36][45].
- It can be integrated with various platforms such as drones, unmanned vehicles, and robots, facilitating unmanned operations [42].
DriveBench: Are VLMs in autonomous driving really reliable? (ICCV'25)
自动驾驶之心· 2025-08-07 23:32
Core Insights
- The article discusses the advancements in Visual Language Models (VLMs) and their potential application in autonomous driving, particularly focusing on the reliability and interpretability of driving decisions generated by VLMs [3][5].

Group 1: DriveBench Overview
- DriveBench is introduced as a benchmark dataset designed to evaluate the reliability of VLMs in 17 different settings, comprising 19,200 frames and 20,498 question-answer pairs [3].
- The framework covers four core tasks in autonomous driving: perception, prediction, planning, and behavior, and incorporates 15 types of Out-of-Distribution (OoD) scenarios to systematically test VLMs in complex driving environments [7][9] (a toy evaluation-slicing sketch follows this summary).

Group 2: Presentation Details
- The article highlights a live presentation by Shaoyuan Xie, a PhD student at the University of California, Irvine, who will discuss the empirical study on VLMs and their readiness for autonomous driving [9].
- The presentation will cover an overview of VLMs in autonomous driving, the reliability assessment of DriveBench, and future prospects for VLM applications in the industry [9].
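To make the benchmark's shape easier to picture, here is a toy sketch of how a question-answer benchmark sliced by task and OoD condition would typically be consumed. The record fields and the exact-match metric are invented for the example; they are not DriveBench's actual schema or scoring protocol.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

@dataclass
class DriveQA:
    """Hypothetical per-sample record, only to illustrate the kind of slicing involved."""
    frame_id: str
    task: str          # e.g. "perception", "prediction", "planning", "behavior"
    corruption: str    # an OoD condition such as "fog", or "clean"
    question: str
    reference: str     # ground-truth answer
    model_answer: str  # answer produced by the VLM under evaluation

def accuracy_by_condition(samples: List[DriveQA]) -> Dict[Tuple[str, str], float]:
    """Group samples by (task, corruption) and report exact-match accuracy per slice."""
    stats: Dict[Tuple[str, str], Tuple[int, int]] = {}
    for s in samples:
        key = (s.task, s.corruption)
        hit, total = stats.get(key, (0, 0))
        stats[key] = (hit + (s.model_answer.strip() == s.reference.strip()), total + 1)
    return {k: hit / total for k, (hit, total) in stats.items()}

# Tiny demo: the same question answered correctly on a clean frame and wrongly under fog.
demo = [
    DriveQA("f001", "perception", "clean", "Is the traffic light red?", "yes", "yes"),
    DriveQA("f001", "perception", "fog",   "Is the traffic light red?", "yes", "no"),
]
print(accuracy_by_condition(demo))
```

Comparing the clean and corrupted slices is what surfaces the reliability gaps the benchmark is built to expose.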
Fast-slow dual-system evaluation! Bench2ADVLM: designed specifically for autonomous driving VLMs (Nanyang Technological University)
自动驾驶之心· 2025-08-07 23:32
Today Autonomous Driving Heart shares the latest work from XX! If you have related work you would like to share, please contact us at the end of the article.

Paper authors | Tianyuan Zhang et al. Editor | Autonomous Driving Heart

Preface & the author's personal take: Vision-language models (VLMs) have recently emerged as a promising paradigm for autonomous driving (AD). However, current protocols for evaluating the performance of VLM-based autonomous driving systems (ADVLMs) are largely confined to open-loop settings with static inputs, overlooking the more realistic and informative closed-loop setting, which can capture interactive behavior, feedback resilience, and real-world safety. To address this, we introduce BENCH2ADVLM, a unified hierarchical closed-loop evaluation framework for real-time, interactive evaluation of ADVLMs on both simulation and physical platforms. Inspired by the dual-process theory of cognition, we first adapt a variety of ADVLMs to the simulation environment through a dual-system adaptation architecture. In this design, heterogeneous high-level driving commands generated by the target ADVLMs (the fast system) are interpreted by a general-purpose VLM (the slow system) into a form suitable for exec ...
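To make the fast/slow split above easier to picture, here is a toy sketch of the pattern. Every class, method, and control-signal field below is invented for illustration; Bench2ADVLM's actual interfaces are not shown in this excerpt, and real systems would replace both stand-ins with model inference.

```python
from dataclasses import dataclass

@dataclass
class ControlSignal:
    """Low-level command a simulator could execute directly (hypothetical format)."""
    steer: float      # [-1, 1]
    throttle: float   # [0, 1]
    brake: float      # [0, 1]

class FastSystemADVLM:
    """Stand-in for a target ADVLM: maps a sensor frame to a free-form high-level command."""
    def drive_command(self, camera_frame) -> str:
        # A real ADVLM would run inference here; a fixed string keeps the sketch runnable.
        return "slow down and yield to the pedestrian crossing ahead"

class SlowSystemInterpreter:
    """Stand-in for the general-purpose VLM that translates heterogeneous high-level
    commands into simulator-executable low-level control."""
    def interpret(self, command: str) -> ControlSignal:
        # Real interpretation would itself be a VLM call; here a toy keyword rule suffices.
        if "slow down" in command or "yield" in command:
            return ControlSignal(steer=0.0, throttle=0.0, brake=0.4)
        return ControlSignal(steer=0.0, throttle=0.3, brake=0.0)

def closed_loop_step(fast: FastSystemADVLM, slow: SlowSystemInterpreter, frame) -> ControlSignal:
    """One tick of the hierarchical closed loop: the fast system proposes, the slow system grounds."""
    return slow.interpret(fast.drive_command(frame))

print(closed_loop_step(FastSystemADVLM(), SlowSystemInterpreter(), frame=None))
```

The point of the split is that the evaluated model only needs to emit high-level intent, while a single shared interpreter makes heterogeneous ADVLMs comparable inside the same closed-loop simulator.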