自动驾驶之心
The most comprehensive robot manipulation survey to date, covering up to 1,200 papers! Jointly released by eight institutions
自动驾驶之心· 2025-10-14 23:33
Core Insights
- The article discusses the rapid advances in artificial intelligence, particularly in embodied intelligence, which connects cognition and action, and emphasizes the central role of robot manipulation in achieving artificial general intelligence (AGI) [5][9]

Summary by Sections

Overview of Robot Manipulation
- The paper "Towards a Unified Understanding of Robot Manipulation: A Comprehensive Survey" provides a comprehensive overview of the field, detailing the evolution from rule-based control to intelligent control systems that integrate reinforcement learning and large models [6][10]

Key Challenges in Embodied Intelligence
- Robot manipulation is identified as a core challenge in embodied intelligence because it requires the seamless integration of perception, planning, and control, which is essential for real-world interaction in diverse, unstructured environments [9][10]

Unified Framework
- A unified understanding framework is proposed that expands the traditional high-level-planning/low-level-control paradigm to include language, code, motion, affordance, and 3D representation, enhancing the semantic decision-making role of high-level planning [11][21]

Classification of Learning Control
- A novel classification of low-level learned control is introduced, dividing it into input modeling, latent learning, and policy learning, providing a systematic perspective for research on low-level control [24][22]

Bottlenecks in Robot Manipulation
- Two major bottlenecks are identified: data collection and utilization, and system generalization; the article summarizes existing research progress and solutions for both [27][28]

Future Directions
- Four key future directions are highlighted: building a true "robot brain" for general cognition and control, breaking data bottlenecks for scalable data generation and utilization, enhancing multimodal perception for complex object interactions, and ensuring human-robot coexistence safety [35][33]
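The survey's "policy learning" category of low-level control can be illustrated by its simplest instance, a behaviour-cloning step that fits a policy directly to demonstration data. This is a generic toy sketch for illustration only, not code from the survey; the linear policy and synthetic demonstrations are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic demonstrations: an "expert" linear policy maps 8-D observations
# (e.g. gripper pose + object pose) to 2-D actions (e.g. planar end-effector motion).
obs = rng.normal(size=(256, 8))
expert_w = rng.normal(size=(8, 2))
acts = obs @ expert_w

# Behaviour cloning: minimise mean-squared error against the demos by gradient descent.
w = np.zeros((8, 2))
for _ in range(200):
    grad = obs.T @ (obs @ w - acts) / len(obs)  # MSE gradient
    w -= 0.1 * grad

imitation_loss = float(np.mean((obs @ w - acts) ** 2))
print(imitation_loss < 1e-3)  # → True: the cloned policy reproduces the expert
```

Input modeling and latent learning, the other two categories, would instead change what `obs` is (raw sensors vs. learned representations) rather than how the action map is fit.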
Fudan's SeerDrive: an end-to-end framework with bidirectional modeling of trajectory planning and scene evolution
自动驾驶之心· 2025-10-14 23:33
Core Insights
- The article discusses advances in end-to-end autonomous driving, focusing on the SeerDrive model, which improves trajectory planning by incorporating bidirectional modeling of trajectory planning and scene evolution [1][3][4]

Group 1: SeerDrive Overview
- SeerDrive introduces a bidirectional modeling paradigm that captures scene dynamics while allowing planning results to optimize scene predictions, creating a closed-loop iteration [3][4]
- The overall pipeline consists of four main modules: feature encoding, future BEV world modeling, future perception planning, and iterative optimization [4]

Group 2: Challenges in Current Systems
- Current one-shot paradigms in autonomous driving overlook dynamic scene evolution, leading to inaccurate planning in complex interactions [5]
- Existing systems fail to model the impact of the ego vehicle's behavior on the surrounding environment, which is crucial for accurate trajectory planning [5]

Group 3: Technical Components
- Feature encoding transforms multimodal sensor inputs and vehicle states into structured features, laying the groundwork for subsequent modeling [8][9]
- Future BEV world modeling predicts scene dynamics by generating future bird's-eye-view (BEV) features, balancing efficiency and structured representation [10][13]

Group 4: Planning and Optimization
- SeerDrive employs a decoupled strategy for planning, letting current and future scenes guide planning separately and thus avoiding representation entanglement [15]
- The iterative optimization process strengthens the bidirectional dependency between trajectory planning and scene evolution, improving performance [17]

Group 5: Experimental Results
- SeerDrive achieved a PDMS score of 88.9 on the NAVSIM test set, outperforming several state-of-the-art methods [23]
- On the nuScenes validation set, SeerDrive demonstrated an average L2 displacement error of 0.43 m and a collision rate of 0.06%, significantly better than competing methods [24]

Group 6: Component Effectiveness
- Removing future perception planning or iterative optimization reduced PDMS scores, indicating the importance of these components for performance [26]
- Design choices such as the decoupled strategy and the use of anchored endpoints for future ego-feature initialization proved critical for optimal results [30]

Group 7: Limitations and Future Directions
- The BEV world model does not leverage the generalization capabilities of foundation models, which could enhance performance in complex scenarios [41]
- Future research may explore integrating foundation models with planning to improve generalization while maintaining efficiency [41]
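The nuScenes open-loop planning result above is reported as an average L2 displacement error. That metric is conventionally computed as the mean Euclidean distance between predicted and ground-truth waypoints over the planning horizon; the sketch below uses toy waypoints for illustration, not SeerDrive's actual evaluation code.

```python
import math

def avg_l2_error(pred, gt):
    """Average L2 displacement error over a trajectory horizon (metres)."""
    assert len(pred) == len(gt)
    return sum(math.dist(p, g) for p, g in zip(pred, gt)) / len(pred)

# Toy 3-waypoint horizon, (x, y) in metres; values are made up.
pred = [(1.0, 0.0), (2.0, 0.1), (3.0, 0.2)]
gt   = [(1.0, 0.0), (2.0, 0.0), (3.0, 0.0)]
print(round(avg_l2_error(pred, gt), 3))  # → 0.1
```

Benchmarks typically report this averaged over 1 s, 2 s, and 3 s horizons and across all validation scenes.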
The divergence between academia and mass production, an ongoing contest of technical routes! A decade of intelligent driving from the perspective of its technical leaders....
自动驾驶之心· 2025-10-14 23:33
Core Insights
- The article reviews the significant technological advances in autonomous driving over the past decade, highlighting key innovations such as Vision Transformers, BEV perception, multi-sensor fusion, end-to-end autonomous driving, large models, VLA, and world models [3][4]

Group 1: Technological Milestones
- The past ten years have seen remarkable technical progress in autonomous driving, with diverse solutions emerging from the collision and fusion of different technologies [3]
- A roundtable discussion will reflect on the industry's technological milestones, focusing on the debate between world models and VLA [4][13]

Group 2: Industry Perspectives
- The roundtable will feature top industry leaders discussing the evolution of autonomous-driving technology and offering career advice for newcomers to the field [4][5]
- It will also cover academic and industry perspectives on L3 autonomous driving, emphasizing the convergence of research directions and practical engineering implementation [13]

Group 3: Future Directions
- The article raises questions about the future direction of autonomous-driving technology, particularly whether end-to-end systems will remain the foundation of intelligent driving [13]
- It highlights the ongoing contest between academic research and engineering practice, suggesting that new entrants need to adapt and innovate [13]
The most professional platform and operations team! We are recruiting operations staff~
自动驾驶之心· 2025-10-14 07:12
Core Viewpoint
- The company has evolved from a small workshop into a platform with significant technical depth and breadth, reflecting growing industry demand for embodied intelligence and related technologies [1]

Group 1: Team and Operations
- The team has spent over two years developing four key IPs: Embodied Intelligence, Autonomous Driving, 3D Vision, and Large Model Tech, with a combined online following of nearly 360,000 across platforms [1]
- The company is hiring for full-time and part-time positions in operations and sales to support its expanding business lines [2]

Group 2: Job Responsibilities and Requirements
- The operations role covers managing course progress, enhancing platform engagement, planning commercialization projects, and creating content related to the AI industry [4]
- The sales role involves creating promotional content for online and hardware products and liaising with hardware manufacturers and academic/enterprise clients [5][6]
- Candidates for both roles should have strong execution and communication skills and a background in computer science, AI, or robotics; familiarity with social-media operations is a plus [12]

Group 3: Growth Opportunities
- The company offers exposure to top-tier operational teams, with opportunities to learn operational techniques and sales strategies, leading to rapid personal growth [7]
- Employees will engage with cutting-edge content in autonomous driving, embodied intelligence, 3D vision, and large models, broadening their technical perspective [8]
- There are opportunities for further academic pursuits, such as research and doctoral studies, which can enhance personal development [9]
Opinion: VLA addresses conceptual cognition, so can it not effectively model the real world's four-dimensional space-time?
自动驾驶之心· 2025-10-14 07:12
Core Viewpoint
- The article discusses the importance of world models in intelligent driving, arguing that truly understanding the environment requires a high-bandwidth cognitive system rather than merely an extension of language models [2][3][5]

Summary by Sections

World Model vs. Language Model
- The world model targets spatiotemporal cognition, while the language model addresses conceptual cognition. Language models are low-bandwidth and sparse, making them ineffective for modeling the real world's four-dimensional space-time [2][3]
- The world model aims to build capabilities directly at the video level rather than first converting information into language [3][4]

VLA and WA
- VLA (Vision-Language-Action) is essentially an extension of language models: it adds new modalities but remains rooted in language. The world model, in contrast, seeks to build a comprehensive cognitive system [3][5]
- The ultimate goal of autonomous driving is open-set interaction, letting users express commands freely rather than being limited to a fixed instruction set [3][4]

Importance of Language
- Language remains crucial for three main reasons:
  1. Incorporating physical laws such as gravity and inertia into the model [6]
  2. Understanding and predicting object movements in three-dimensional space over time [6]
  3. Absorbing vast amounts of data from the internet, which aids in training autonomous-driving systems [7]

Integration of Models
- Combining language models (conceptual cognition) with world models (spatiotemporal cognition) is essential for progress toward Artificial General Intelligence (AGI) [8]

Industry Trends
- The autonomous-driving industry faces intense competition, and many professionals are considering a transition to embodied AI as current technologies saturate [9]
- The ongoing VLA-versus-WA debate reflects a larger industry transformation and the need for innovative solutions to break through current limitations [9]

Community and Resources
- A community platform has been established to facilitate knowledge sharing and collaboration among autonomous-driving professionals, offering learning routes, technical discussions, and job opportunities [25][26]
FutureSightDrive: unified training of a world model & VLM
自动驾驶之心· 2025-10-13 23:33
Author | 么么牛    Editor | 自动驾驶之心
Original link: https://zhuanlan.zhihu.com/p/1961012043571266494
Paper: https://arxiv.org/pdf/2505.17685
This article is shared for academic purposes only; in case of infringement, please contact us for removal.

Q1: What problem does this paper try to solve?
The paper addresses two problems that arise when vision-language models (VLMs) perform trajectory planning and scene understanding in autonomous driving: ambiguous spatio-temporal relationships and loss of fine-grained information. Existing VLMs typically process the current scene with a discrete textual chain-of-thought (CoT), which is essentially a highly abstract, symbolic compression of the visual information; this can blur spatio-temporal relations, discard fine-grained detail, and introduce a modality-conversion gap. The paper proposes a spatio-temporal CoT that lets the model think in visual form, enabling more effective trajectory planning and scene understanding.

Q2: What related work is there?
The paper mentions the following related research: unified multimodal understanding ...
Opening several autonomous-driving technical exchange groups (world models / end-to-end / VLA)
自动驾驶之心· 2025-10-13 23:33
Group 1
- The establishment of technical exchange groups focused on autonomous-driving technology has been announced, covering areas such as world models, end-to-end systems, and VLA [1]
- The company invites interested individuals to join the discussion by adding a designated assistant on WeChat, following the stated instructions for group entry [1]
How does Horizon's residual end-to-end approach work? ResAD: residual learning brings autonomous-driving decisions closer to human logic
自动驾驶之心· 2025-10-13 23:33
Paper authors | Zhiyu Zheng et al.    Editor | 自动驾驶之心

To make a car drive itself, the traditional approach works like stacking blocks: first "see" (perception), then "guess" (prediction), and finally "decide" (planning). These stages are tightly chained, so an error in one stage propagates to everything after it; the result is neither efficient nor safe.

End-to-end autonomous driving emerged as a new path: let the AI, like an experienced driver, map what it sees (sensor data) directly to the route it will take (a future trajectory). The idea is appealing, but reality is harder: most existing end-to-end models wrestle with a single question, "what does the future trajectory look like?"

To address this, a team from Horizon Robotics, Huazhong University of Science and Technology, and Wuhan University proposed the ResAD framework. The core idea is simple: instead of predicting the whole trajectory directly, start from an "inertial reference line," the path the car would follow if the steering wheel were never turned, and have the model learn only an adjustment (a residual): how far to deviate from that reference line in order to drive safely.

The learning target thus shifts from "what is the trajectory?" to "why should we adjust course?". The model is forced to attend to the real causes of adjustments, such as obstacles and traffic rules, rather than memorizing coincidences in the data. We ...
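The "inertial reference line plus residual" decomposition can be sketched in a few lines. This is a toy illustration of the decomposition only: the function names are hypothetical, and in ResAD the residuals are predicted by a learned network, not hand-written as here.

```python
import math

def inertial_reference(x, y, heading, speed, dt=0.5, horizon=6):
    """Straight-line rollout: where the car ends up if the wheel never moves."""
    return [(x + speed * dt * k * math.cos(heading),
             y + speed * dt * k * math.sin(heading))
            for k in range(1, horizon + 1)]

def apply_residuals(reference, residuals):
    """Final trajectory = inertial reference + per-waypoint offsets (the residuals)."""
    return [(rx + dx, ry + dy) for (rx, ry), (dx, dy) in zip(reference, residuals)]

# Ego at the origin heading east at 10 m/s; a hand-written residual nudges the
# path to the left, e.g. to clear an obstacle (a network would predict this).
ref = inertial_reference(0.0, 0.0, 0.0, speed=10.0)
swerve = [(0.0, 0.1 * k) for k in range(1, 7)]
traj = apply_residuals(ref, swerve)
print(traj[0])  # → (5.0, 0.1)
```

With all-zero residuals the output collapses to the reference line, which is exactly the inductive bias the article describes: the model only has to explain deviations.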
Led by industry veterans! Hands-on full pipeline of autonomous-driving 4D annotation (dynamic/static elements, OCC)
自动驾驶之心· 2025-10-13 23:33
Core Insights
- The article emphasizes the importance of automated 4D annotation data for enhancing autonomous-driving capabilities, driven by increasingly complex training-data formats [1]
- It highlights the challenges of automated annotation, including sensor calibration, occlusion handling, and annotation quality control [3]

Group 1: Automated 4D Annotation
- The backbone of autonomous-driving capability is the vast amount of training data produced by automated 4D annotation pipelines [1]
- Training-data requirements have grown more complex, now demanding synchronized annotation of dynamic and static elements, occlusions, and trajectories [1]
- As annotation demands rise in complexity, the significance of automated 4D annotation keeps growing [1]

Group 2: Challenges in Automated Annotation
- Key challenges include calibrating and synchronizing different sensors across varied driving scenarios [3]
- Occlusion between sensors and maintaining algorithm generalization remain critical industry pain points [3]
- High-quality annotation results and effective automated quality checks are paramount [3]

Group 3: Course Offering
- A course titled "Automated Driving 4D Annotation Algorithm Employment Class" is offered to address these challenges, featuring insights from industry leaders [3][4]
- The course aims to provide a comprehensive understanding of the full 4D automated-annotation process and its core algorithms, along with practical exercises [6]
- Key topics include dynamic-obstacle detection, static-element annotation, and mainstream paradigms for end-to-end annotation [6]
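One concrete sub-problem behind the "calibration and synchronization" challenge is timestamp alignment: pairing each camera frame with the nearest lidar sweep before labels can be propagated across sensors. A minimal sketch with made-up timestamps and an illustrative tolerance, not the course's actual tooling:

```python
import bisect

def match_nearest(cam_ts, lidar_ts, tol=0.05):
    """Pair each camera timestamp with the closest lidar sweep within `tol` seconds.

    `lidar_ts` must be sorted ascending (sensor logs usually are)."""
    pairs = []
    for t in cam_ts:
        i = bisect.bisect_left(lidar_ts, t)
        # Only the neighbours around the insertion point can be nearest.
        candidates = lidar_ts[max(0, i - 1):i + 1]
        best = min(candidates, key=lambda s: abs(s - t))
        if abs(best - t) <= tol:
            pairs.append((t, best))
    return pairs

cams   = [0.00, 0.10, 0.20, 0.30]   # 10 Hz camera
lidars = [0.02, 0.12, 0.19, 0.31]   # 10 Hz lidar, slightly jittered
print(match_nearest(cams, lidars))  # → [(0.0, 0.02), (0.1, 0.12), (0.2, 0.19), (0.3, 0.31)]
```

Real pipelines go further (hardware triggering, ego-motion compensation of the sweep), but nearest-timestamp matching is the usual first step.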
Xiaomi's third car is coming! Lei Jun and Hu Zhengnan personally travel to Xinjiang for road tests; multiple spy shots leaked
自动驾驶之心· 2025-10-13 04:00
Core Viewpoint
- Xiaomi is actively testing its third vehicle model, tentatively named Xiaomi YU9, in Xinjiang, indicating significant progress in its automotive development [2][3][40]

Group 1: Testing and Development
- Xiaomi founder Lei Jun is personally overseeing high-altitude tests of the Xiaomi YU9 in Xinjiang, with a team of over 20 engineers covering aspects including range and charging [15][25]
- Multiple Xiaomi executives, including senior advisor Hu Zhengnan and VP Zhang Jianhui, are also participating, suggesting a major testing campaign [5][25][40]
- Videos and images of the Xiaomi YU9 under road testing have surfaced, showing a camouflaged vehicle with a specific identification number [9][10][41]

Group 2: Vehicle Specifications
- The Xiaomi YU9 is expected to use a range-extended powertrain, as indicated by the exhaust system visible in leaked images [26][27]
- The vehicle is projected to exceed 5.2 m in length and stand roughly 1.8 m tall, making it a large SUV [28][29][33]
- It is anticipated to carry an 80 kWh battery with an electric-only range exceeding 400 km [28][29]

Group 3: Market Impact and Future Plans
- The YU9 is expected to strengthen Xiaomi's product lineup and could raise sales targets, following September's record of over 40,000 units delivered [42][40]
- The vehicle is projected to launch in 2026, in line with Xiaomi's strategy of expanding its automotive offerings [39][40]