自动驾驶之心 (Heart of Autonomous Driving)

The 自动驾驶之心 end-to-end VLA technical discussion group has been set up~
自动驾驶之心· 2025-08-07 23:32
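Interested readers are welcome to add our assistant on WeChat to join the group: AIDriver005, with the note "nickname + VLA加群" (VLA加群 means "join the VLA group"). The 自动驾驶之心 large-model VLA technical discussion group has been set up. Everyone is welcome to join and discuss end-to-end VLA topics, including VLA dataset construction, one-stage VLA, hierarchical VLA, large-model-based end-to-end solutions, VLM+DP-based solutions, mass-production deployment, and job hunting. ...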
自动驾驶之心 is hiring a content operations intern! One-on-one mentoring by a partner (one position only)
自动驾驶之心· 2025-08-07 12:00
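Hello everyone, we are the 自动驾驶之心 / 具身智能 / 大模型之心Tech team. We are delighted to meet you here. If you also believe that technical content can change the world, you may be the person we are looking for! What are we doing? We hope to connect academia and industry through technical content, serving as a bridge between companies and universities and, beyond that, reaching hundreds of thousands of AI developers and entrepreneurs. We are committed to bringing you the latest and most authoritative technical information available online. The team focuses on frontier AI fields such as autonomous driving, embodied intelligence, and large models, covering academic paper walkthroughs, analysis of industry mass-production solutions, large-model evaluations, business news, industry recruitment, and open-source projects. We share content, interact with followers, and connect with companies through our WeChat official account, community groups, WeChat Channels, Zhihu, Xiaohongshu, Bilibili, and other platforms. In autonomous driving and embodied intelligence we have already built deep partnerships with mainstream companies and relevant universities, and the large-model direction is being built up quickly. We focus not only on the technology itself but also want to co-create the whole AI field with everyone and share the joy of growing in understanding. For trending events, I likewise hope we can offer content value found nowhere else. A journey of a thousand miles is made of small steps, and we know well that one person's strength is limited, so we look forward to more outstanding people joining us~ Content operations - intern. Responsibilities: ...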
自动驾驶之心 project and paper mentoring is here~
自动驾驶之心· 2025-08-07 12:00
Core Viewpoint - The article announces the launch of the "Heart of Autonomous Driving" project and paper guidance program, aimed at students who run into obstacles in autonomous driving research and development [1].
Group 1: Project and Guidance Overview
- The program supports students who hit difficulties in their research, such as environment configuration issues and debugging challenges [1].
- Last year's outcomes were positive, with several students publishing papers at top conferences such as CVPR and ICRA [1].
Group 2: Guidance Directions
- **Direction 1**: Multi-modal perception and computer vision, end-to-end autonomous driving, large models, and BEV perception. The guiding teacher has published over 30 papers at top AI conferences, with more than 6000 citations [3].
- **Direction 2**: 3D Object Detection, Semantic Segmentation, Occupancy Prediction, and multi-task learning based on images or point clouds. The guiding teacher is a top-tier PhD with multiple publications at ECCV and CVPR [5].
- **Direction 3**: End-to-end autonomous driving, OCC, BEV, and world models. The guiding teacher is also a top-tier PhD and has contributed to several mainstream perception solutions [6].
- **Direction 4**: NeRF / 3D GS neural rendering and 3D reconstruction. The guiding teacher has published four CCF-A class papers, two at CVPR and two in IEEE Transactions [7].
This 2,000-member embodied-AI community has real substance......
自动驾驶之心· 2025-08-07 09:52
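Yesterday afternoon a reader came to 峰哥 to vent: he had just joined an embodied-AI company, his lead asked him to debug a robot, and he did not know how to do data collection and debugging because there were too many degrees of freedom. He was also at a loss about how to analyze problems. Running demos at school was fine, but once he got hands-on with a real robot there were still plenty of pitfalls. We have run into this kind of question many times in our embodied-AI community: how to use the equipment, how to collect data effectively, how to deploy VA and VLA models, and whether the capture background is too complex or the data is simply dirty. We quickly gave him relevant answers, and he soon put them to use in his project. A community that can solve problems when people need help most is undoubtedly very valuable. 具身智能之心知识星球 (the first full-stack embodied-AI technical community in China) has now closed the loop across industry, academia, job hunting, Q&A, and other areas. Whatever problem comes up, we share a solution; wherever research is most cutting-edge, we keep supplying ideas; and job openings are passed on to members right away! Beyond the questions above, we have also organized much other content for you:
- What platforms are there for robot simulation and data collection?
- How do humanoid robots do imitation learning? Why is VLA hard to do?
- How is VLA used in robot grasping and planning tasks?
- How is VLA+RL done, and why does it work?
- What can be done when sim2real performs poorly? How does real2sim2real work?
- How is hierarchical decision-making usually done? What are its advantages and disadvantages compared with end-to-end ...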
A 10,000-character long read! A complete walkthrough of RAG in practice: a year of exploration
自动驾驶之心· 2025-08-07 09:52
Core Viewpoint - The article discusses Retrieval Augmented Generation (RAG), which combines retrieval-based models with generative models to improve the quality and relevance of generated text. It addresses hallucination, knowledge timeliness, and long-text processing in large models [1].
Group 1: Background and Challenges
- RAG was proposed by Meta in 2020 to let language models access external information beyond their internal knowledge [1].
- RAG faces three main challenges: retrieval quality, the enhancement process, and generation quality [2].
Group 2: Challenges in Retrieval Quality
- Vector representations can introduce semantic ambiguity, leading to irrelevant results [5].
- User input has become more complex, moving from keywords to natural dialogue, which complicates retrieval [5].
- Document segmentation methods affect how well document blocks match user queries [5].
- Extracting and representing multimodal content (e.g., tables, charts) poses significant challenges [5].
- Integrating context from retrieved paragraphs into the current generation task is crucial for coherence [5].
- Redundancy and repetition in retrieved content can lead to duplicated information in generated outputs [5].
- Determining the importance of multiple retrieved paragraphs for the generation task is challenging [5].
- Over-reliance on retrieved content can exacerbate hallucination [5].
- Generated answers may be irrelevant to the query [5].
- Generated answers may be toxic or biased [5].
Group 3: Overall Architecture
- The product architecture consists of four layers: the model layer, the offline understanding layer, the online Q&A layer, and the scenario layer [7].
- The RAG framework is divided into three main components: query understanding, the retrieval model, and the generation model [10].
Group 4: Query Understanding
- The query understanding module aims to improve retrieval by interpreting user queries and generating structured queries [14].
- Intent recognition helps select relevant modules based on user queries [15].
- Query rewriting uses an LLM to rephrase user queries for better retrieval [16].
- Query expansion breaks complex questions into simpler sub-questions for more effective retrieval [22].
Group 5: Retrieval Model
- The retrieval model's effectiveness depends on the accuracy of the embedding model [33].
- Document loaders load document data from various sources [38].
- Text converters prepare documents for retrieval by segmenting them into smaller, semantically meaningful chunks [39].
- Document embedding models create vector representations of text to enable semantic search [45].
- Vector databases support efficient storage and search of embedded data [47].
Group 6: Generation Model
- The generation model uses retrieved information to generate coherent responses to user queries [60].
- Different prompt assembly strategies are used to improve response generation [62][63].
Group 7: Attribution Generation
- Attribution in RAG is crucial for aligning generated content with reference information and ensuring accuracy [73].
- Dynamic computation methods can improve generation by matching generated text with reference sources [76].
Group 8: Evaluation
- The article emphasizes the importance of defining metrics and evaluation methods for assessing RAG system performance [79].
- Evaluation frameworks such as RGB and RAGAS are introduced to benchmark RAG systems [81].
Group 9: Conclusion
- The article summarizes the key modules in RAG practice and highlights the need for continued research and development to refine these technologies [82].
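The chunking, embedding, retrieval, and prompt assembly steps summarized above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the article's implementation: the function names (`chunk_text`, `embed`, `retrieve`, `build_prompt`) are hypothetical, the "embedding" is a toy bag-of-words count standing in for a learned embedding model, and a plain Python list stands in for a vector database.

```python
import math
import re
from collections import Counter


def chunk_text(text, max_words=40):
    """Split text into chunks of roughly max_words words, breaking on sentence ends."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    chunks, current = [], []
    for sentence in sentences:
        words = sentence.split()
        # Start a new chunk when adding this sentence would exceed the budget.
        if current and len(current) + len(words) > max_words:
            chunks.append(" ".join(current))
            current = []
        current.extend(words)
    if current:
        chunks.append(" ".join(current))
    return chunks


def embed(text):
    """Toy embedding (assumption): term counts instead of a learned dense vector."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query, chunks, top_k=2):
    """Return the top_k chunks most similar to the query."""
    q = embed(query)
    return sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)[:top_k]


def build_prompt(query, retrieved):
    """Assemble a prompt with numbered context so the answer can cite its sources."""
    context = "\n".join(f"[{i + 1}] {c}" for i, c in enumerate(retrieved))
    return (
        "Answer using only the context below and cite sources by number.\n\n"
        f"Context:\n{context}\n\nQuestion: {query}"
    )
```

In a real system the string returned by `build_prompt` would be sent to a generation model, and `embed` plus the list scan in `retrieve` would be replaced by an embedding model and a vector database query. Numbering the context chunks is one simple way to support the attribution step the summary mentions.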
What stage has autonomous driving motion planning reached?
自动驾驶之心· 2025-08-06 23:34
Core Insights - The article discusses advances in end-to-end (end2end) autonomous driving systems, highlighting the prominence of bird's-eye-view (BEV) based frameworks while noting that planning remains challenging because interaction is hard to model [2][40].
Group 1: Interaction Modeling
- Interaction modeling is identified as a critical area in planning. It involves game theory and uncertainty modeling, which current supervised learning methods struggle to address effectively [2][5].
- The report emphasizes incorporating ego and agent trajectories into loss functions or constraints to improve planning outcomes [2][5].
Group 2: Planning Frameworks
- Several frameworks for interactive planning are discussed, including POMDP, contingency planners, and game-theoretic approaches, with a focus on how to integrate interaction into the planning pipeline [5][40].
- The article outlines a typical interactive planning process: perturb ego trajectories, predict all agents' movements, and use dynamic programming to derive optimal policies [6][12].
Group 3: Loss Functions and Constraints
- The planning loss function is detailed. It includes terms for collision avoidance between ego and agent trajectories, with specific components for prediction accuracy and collision penalties [9][16].
- The article explains how interaction is modeled within the loss function so that agent predictions do not lead to collisions with the ego vehicle [9][16].
Group 4: Real-Time Optimization
- The article discusses latency in planning and proposes the Alternating Direction Method of Multipliers (ADMM) to reach real-time performance, up to 125 Hz with multiple agents [19][18].
- It highlights the need for efficient optimization techniques that reduce computation time so that planning can run in real time in autonomous driving systems [19][18].
Group 5: Future Considerations
- The article questions the effectiveness of prediction-oriented methods in dynamic scenarios, suggesting they may not adequately handle counterfactual situations where agent behavior diverges from predictions [41][42].
- It discusses the need for improved prediction models and the potential of modular frameworks to improve trajectory optimization in autonomous vehicles [45][44].
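The idea of scoring perturbed ego trajectories with a loss that penalizes collisions against predicted agent trajectories can be sketched as follows. This is a deliberately simplified illustration, not the formulation from the article: the function names, the squared hinge penalty, the weights, and the lateral-offset perturbation are all assumptions, agent predictions are held fixed rather than re-predicted for each ego candidate, and exhaustive search over a handful of candidates replaces dynamic programming or ADMM.

```python
import math


def collision_penalty(ego, agent, safe_dist=2.0):
    """Squared hinge penalty at each timestep where ego and agent are closer than safe_dist."""
    total = 0.0
    for (ex, ey), (ax, ay) in zip(ego, agent):
        gap = math.hypot(ex - ax, ey - ay)
        total += max(0.0, safe_dist - gap) ** 2
    return total


def trajectory_cost(ego, reference, agents, w_track=1.0, w_coll=100.0):
    """Weighted sum of deviation from the reference path and collision penalties."""
    track = sum((ex - rx) ** 2 + (ey - ry) ** 2
                for (ex, ey), (rx, ry) in zip(ego, reference))
    coll = sum(collision_penalty(ego, agent) for agent in agents)
    return w_track * track + w_coll * coll


def perturb_laterally(reference, offsets):
    """Generate candidate ego trajectories by shifting the reference sideways."""
    return [[(x, y + d) for x, y in reference] for d in offsets]


def plan(reference, agents, offsets=(-3.0, -1.5, 0.0, 1.5, 3.0)):
    """Pick the candidate ego trajectory with the lowest total cost."""
    candidates = perturb_laterally(reference, offsets)
    return min(candidates, key=lambda c: trajectory_cost(c, reference, agents))


if __name__ == "__main__":
    # Ego wants to drive straight along y = 0; a stationary agent blocks the path.
    reference = [(float(t), 0.0) for t in range(5)]
    blocking_agent = [(2.0, 0.0)] * 5
    print(plan(reference, [blocking_agent]))
```

With these weights the planner swerves by 1.5 units rather than 3.0, because the small residual collision penalty at that offset costs less than the extra tracking error of the wider swerve. A full interactive planner would also let the predicted agent trajectories react to each ego candidate, which is the part this sketch leaves out.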
Large-model solutions for autonomous driving: an overview of vision-language model (VLM) work, for mass production and research~
自动驾驶之心· 2025-08-06 23:34
Core Insights - The article emphasizes the potential of Vision-Language Models (VLMs) to improve the perception and cognitive capabilities of autonomous driving systems, so that they not only "see" but also "understand" complex driving environments [2][3].
Group 1: VLM Applications in Autonomous Driving
- VLMs can go beyond traditional visual models by taking camera images or video streams and comprehending the semantics of traffic scenes, for example recognizing "a pedestrian waving to cross the street" [6].
- VLMs can convert intricate visual scenes into clear natural language descriptions, making the decisions of autonomous systems more interpretable, which aids debugging and increases trust among passengers and regulators [6].
- VLMs are crucial for natural language interaction in future smart cabins, allowing passengers to communicate intentions to vehicles through spoken commands [6].
Group 2: Scenario Generation and Testing
- CrashAgent is a multi-agent framework that uses multi-modal large language models to convert accident reports into structured scenarios for simulation environments, addressing the long-tail distribution issue in existing datasets [7].
- CurricuVLM is a personalized curriculum learning framework that uses VLMs to analyze agent behavior and dynamically generate tailored training scenarios, improving safety in autonomous driving [13].
- TRACE is a framework that generates key test cases from real accident reports, significantly improving the efficiency of defect detection in autonomous driving systems [17].
Group 3: Out-of-Distribution (OOD) Scenario Generation
- A framework based on large language models is proposed to generate diverse OOD driving scenarios, addressing the sparsity of such scenarios in urban driving datasets [21][22].
- The article discusses a method that automatically converts real-world driving videos into detailed simulation scenarios, improving the testing of autonomous driving systems [26].
Group 4: Enhancing Safety and Robustness
- WEDGE is a synthetic dataset created with generative vision-language models, aimed at improving the robustness of perception systems in extreme weather conditions [39][40].
- LKAlert is a predictive alert system that uses VLMs to forecast potential lane-keeping assist (LKA) risks, improving driver situational awareness and trust [54][55].
Group 5: Advancements in Decision-Making Frameworks
- The CBR-LLM framework combines semantic scene understanding with case retrieval to improve decision-making in complex driving scenarios, raising accuracy and reasoning consistency [44][45].
- ORION is a holistic end-to-end autonomous driving framework built on vision-language instructed action generation, achieving superior performance in closed-loop evaluations [69][70].
After the noise dies down, the Li Auto i8 will go on to earn a very strong reputation
自动驾驶之心· 2025-08-06 23:34
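Author | 理想TOP2 Source | 理想TOP2
The logical chain and foundational premises of this article are highly unconventional and sharply at odds with mainstream views; most people are expected to disagree. This article is the product of very deep thought, but given the complexity and diversity of the real world, even carefully considered views still have a non-trivial chance of being wrong. Readers are strongly advised not to assume TOP2's views are true by default and are strongly encouraged to read critically.
Core views:
1. The i8 will not have to wait too long. Its market reputation and reception will both be quite good. Exactly how long is hard to say, but it will be clearly shorter than the time MEGA's reputation took to build.
2. The i8 will boost MEGA orders and will eat into some L7/L8/L9 orders. The next generation of range-extended vehicles will then win back some battery-electric orders. L-series orders this year will be rather mediocre.
A flood of complaints about the i8's SKU setup and other aspects has appeared online. At root, the i8 did not meet the expectation, held by a great many people, that it should make people want to buy immediately and produce a surge of orders in the very short term. Once that expectation fell through, people looked for an excuse. A great many believe that the SKU setup, the cancellation of interest-free financing, or various other reasons caused the expectation to fall through. This article only offers academic ...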
Embodied-AI data collection solutions: an overview of whole-body motion capture work
自动驾驶之心· 2025-08-06 23:34
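Recently many readers have asked us about solutions for whole-body motion-capture data. Compared with teleoperation or VR plus motion-capture gloves, this approach is technically harder. Today we round up several well-known whole-body motion-capture solutions in the industry. For more, head over to 具身智能之心知识星球, a place to discuss technology and solutions.
OpenWBC
Project link: https://github.com/jiachengliu3/WBC_Deploy
This project implements whole-body control of the Unitree G1 robot: Apple Vision Pro combined with avp_teleoperate controls the robot's upper body, and the OpenHomie algorithm controls lower-body locomotion. It also supports whole-body data collection. Main features:
TWIST
TWIST: Teleoperated Whole-Body Imitation System
Project link: https://yanjieze.com/TWIST/
Work from a Stanford University team. Whole-body teleoperation of humanoid robots marks a key step toward general-purpose robot intelligence, and human motion provides an ideal interface for controlling all degrees of freedom. However, most current humanoid teleoperation systems struggle to achieve coordinated whole-body behavior and are usually limited to isolated locomotion or manipulation tasks. The team proposes the Teleoperated Whole-Body Imitation System (TWIST), ...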
Flunked the early-batch recruitment at the new EV makers...
自动驾驶之心· 2025-08-06 11:25
Core Viewpoint - The article emphasizes the importance of preparing for non-technical interview questions in the autonomous driving industry, highlighting the need for candidates to articulate their interests, communication skills, and learning abilities effectively [6][10][11].
Group 1: Interview Preparation
- Candidates should reflect on their interests and experiences to answer open-ended questions, as interviewers are often looking for personal insights and opinions [6][10].
- Communication skills are crucial; candidates should demonstrate their ability to engage with mentors and express their thought processes when seeking guidance [6][10].
- Learning ability is assessed through candidates' approaches to acquiring new technical knowledge, emphasizing the importance of establishing a comprehensive understanding before diving into specifics [7][10].
Group 2: Community and Resources
- The "Autonomous Driving Heart Knowledge Planet" community provides a platform for technical exchange, featuring members from renowned universities and leading companies in the autonomous driving sector [23][11].
- The community offers a wealth of resources, including over 40 technical routes and numerous open-source projects, aimed at facilitating learning and career development in the autonomous driving field [11][19].
- Members can access job opportunities and industry insights, fostering a complete ecosystem for autonomous driving professionals [21][22].
Group 3: Learning and Development
- The community has curated a comprehensive learning path for beginners and advanced researchers, covering various aspects of autonomous driving technology [17][19].
- Regular discussions and Q&A sessions are held to address common industry challenges and share knowledge on emerging technologies [24][87].
- The platform also features live sessions with industry experts, providing members with direct access to cutting-edge research and practical applications [86][11].