自动驾驶之心

A Summary of and Reflections on Recent Developments in 3D/4D World Models (WM)
自动驾驶之心· 2025-09-16 23:33
Core Viewpoint - The article discusses the current state of embodied intelligence, focusing on data collection and utilization, and emphasizes the importance of 3D/4D world models in enhancing spatial understanding and interaction capabilities in autonomous driving and related fields [3][4].

Group 1: 3D/4D World Models
- The development of 3D/4D world models has diverged into two main approaches, implicit and explicit, each with its own limitations [4][7].
- Implicit models enhance spatial understanding by extracting 3D/4D content, while explicit models require detailed structural information to ensure system stability and usability [7][8].
- Current research primarily focuses on static 3D scenes, with methods for constructing and enriching environments being well established and ready for practical application [8].

Group 2: Challenges and Solutions
- Existing challenges in 3D geometry modeling include the rough optimization of physical surfaces and the visual gap between generated meshes and real-world applications [9][10].
- The integration of mesh supervision and structured processing is being explored to improve surface quality in 3D reconstruction [10].
- Cross-physics-simulator deployment is highlighted as a need, since existing solutions often rely on the specific physics parameters of platforms such as MuJoCo [10].

Group 3: Video Generation and Motion Understanding
- Large-scale data cleaning and annotation have improved motion prediction capabilities in 3D models, with advancements in 3DGS/4DGS and world model integration [11].
- Current video generation techniques struggle to understand physical interactions and changes in the environment, indicating a gap in the ability to simulate realistic motion [15].
- Future developments may focus on combining simulation and video generation to enhance the understanding of physical properties and interactions [15].

Group 4: Future Directions
- The article predicts that future work will increasingly incorporate physical knowledge into 3D/4D models, aiming for better direct physical understanding and visual reasoning capabilities [16].
- The evolution of world models is expected to become modular within embodied intelligence frameworks, depending on ongoing research and simplification of world model definitions [16].
Faced with AI that reads your question and answers nonsense, how do you tell truth from falsehood? A HIT & Huawei survey of large model hallucination!
自动驾驶之心· 2025-09-16 23:33
Faced with AI that confidently answers nonsense, how do we tell truth from falsehood? With the rise of large language models, more and more people have come into contact with and started using them, and nearly everyone who uses a large model has run into cases where it gets things wrong. We mainly interact with mainstream large language models through text QA or instructions, and hallucination is one of the more common errors that appears in their responses. The survey co-authored by Huawei and Harbin Institute of Technology, "A Survey on Hallucination in Large Language Models: Principles, Taxonomy, Challenges, and Open Questions", lays out very comprehensively what large model hallucination is, why it arises, and how it can be mitigated. Paper link: https://arxiv.org/abs/2311.05232 The paper opens by introducing background knowledge on large models. Background knowledge: Definition of large models. LLMs refer to a family of general-purpose models, almost all of which are built on the transformer network architecture ...
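The background section above notes that LLMs are almost all built on the transformer architecture. As a hedged illustration only (this sketch is not from the survey), the core transformer operation, scaled dot-product self-attention, can be written in a few lines of NumPy:

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """softmax(Q K^T / sqrt(d)) V -- the heart of a transformer layer."""
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)                    # (seq, seq) similarity logits
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)   # row-wise softmax
    return weights @ V                               # weighted mix of value vectors

rng = np.random.default_rng(0)
x = rng.normal(size=(4, 8))                          # 4 tokens, embedding dim 8
out = scaled_dot_product_attention(x, x, x)          # self-attention: Q = K = V = x
print(out.shape)                                     # (4, 8)
```

Each output token is a probability-weighted mixture of all value vectors, which is what lets the model condition every position on the whole context.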
China's First Hands-On Autonomous Driving VLA Course Is Here (Modular / Unified / Reasoning-Enhanced VLA)
自动驾驶之心· 2025-09-16 10:49
VLA is without question this year's dominant keyword in both academia and industry for autonomous driving. Last year's end-to-end + VLM marked the fundamental shift of intelligent driving from rule-driven to data-driven. In practice, we found that although end-to-end offers a view that connects upstream and downstream, it still struggles in complex, difficult scenarios. Anyone who has worked at an autonomous driving company knows that production model iteration remains trapped in an endless loop of corner cases. VLA can essentially be seen as a form of end-to-end, but one that is more direct and cleaner; many methods also drop the traditional end-to-end pipeline's complex 3D perception tasks. By borrowing the stronger general-purpose generalization of VLMs, VLA not only simplifies the task but, more importantly, offers a possible way out of the corner-case loop. As academia and industry turn their attention to this area, we have found many open problems: the autonomous driving VLA tech stack has not yet converged, and new algorithms keep springing up. We therefore joined teaching and research teams at home and abroad to build the 《自动驾驶VLA实战教程》 course, which systematically organizes the autonomous driving VLA tech stack. Learning autonomous driving VLA is a one-stop opportunity to strengthen knowledge across many fields: visual perception, language modules, action modules, and the frontier techniques of large models (RAG/CoT/reinforcement learning/MoE); the stack involved is very broad. But that learning path is often painful: mastering several fields at once is already hard enough, and each ...
BEVTraj:一个端到端的无地图轨迹预测新框架
自动驾驶之心· 2025-09-16 07:22
Author | 我爱计算机视觉 Source | 我爱计算机视觉 Original link: https://zhuanlan.zhihu.com/p/1950969540805656985 In autonomous driving, accurately predicting the future trajectories of other vehicles and pedestrians on the road is central to driving safety and efficient navigation. Today's state-of-the-art trajectory prediction methods mostly rely heavily on high-definition (HD) maps, using the fine-grained information they provide, such as lane lines and intersection topology, as strong priors. However, HD maps are expensive to build and maintain, have limited coverage, and cannot handle dynamic changes such as temporary road construction or traffic accidents, which greatly limits the large-scale deployment of autonomous driving. To remove the dependence on HD maps, a research team from Kookmin University in South Korea proposed a new trajectory prediction framework named BEVTraj. The method needs no map at all: it processes raw real-time sensor data directly in BEV space and performs end-to-end trajectory prediction. Remarkably, experiments show that BEVTraj's performance rivals HD-map-based SOTA models, opening a new path toward more flexible and scalable autonomous driving systems. Research background: is the HD map "honey" or "poison"? HD maps provide rich structured information for trajectory prediction and are undoubtedly the "honey" that boosts prediction accuracy. But their "shelf life" is too short (they cannot be updated in real time), and their "price ...
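The summary says BEVTraj's performance rivals HD-map-based SOTA models. The excerpt does not name the metrics, but trajectory prediction work in this area is conventionally scored with ADE/FDE (average/final displacement error); as a hedged sketch under that assumption, the two metrics for a single agent are:

```python
import numpy as np

def ade_fde(pred, gt):
    """ADE/FDE for one predicted trajectory.

    pred, gt: (T, 2) arrays of (x, y) waypoints in the BEV plane.
    ADE averages the per-step Euclidean error; FDE is the last step's error.
    """
    dists = np.linalg.norm(pred - gt, axis=-1)   # (T,) per-step errors
    return dists.mean(), dists[-1]

# Toy example: prediction goes straight while ground truth turns.
pred = np.array([[0.0, 0.0], [1.0, 0.0], [2.0, 0.0]])
gt   = np.array([[0.0, 0.0], [1.0, 1.0], [2.0, 2.0]])
ade, fde = ade_fde(pred, gt)
print(ade, fde)  # -> 1.0 2.0
```

Lower is better for both; FDE penalizes endpoint drift, which matters most for planning.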
The Technical Leads of Embodied Intelligence in China
自动驾驶之心· 2025-09-16 03:34
Core Viewpoint - The article highlights the rapid development and commercialization of embodied intelligence, emphasizing the competitive landscape among global teams and the importance of technological breakthroughs in the field [4][5].

Group 1: Industry Overview
- The last two years have seen significant advancements in hardware, data collection, and algorithms, leading to the expansion of embodied intelligence beyond laboratory settings [4].
- Embodied intelligence has become a recognized core direction for commercialization globally, with various teams competing intensely in this space [4].
- The next generation of technological breakthroughs will focus on general embodied intelligence and scene-adaptive learning [4].

Group 2: Key Players in Embodied Intelligence
- **Yushu Technology**: Founded by Wang Xingxing, the company specializes in quadruped robots and has developed multiple models, including Laikago and AlienGo. Wang has over 10 years of experience in robot development and holds over 100 patents [8].
- **Xinghai Map**: Co-founded by Zhao Xing, the company focuses on embodied intelligence and multimodal learning, contributing to the development of the first mass-produced autonomous driving model based on large models [12][13].
- **Galaxy General**: Founded by Wang He, the company is dedicated to embodied intelligence and humanoid robots, with significant research contributions in 3D vision and robot learning [18].
- **Zhiyuan Robotics**: Led by Luo Jianlan, the company focuses on high-precision assembly tasks using reinforcement learning, achieving a 100% success rate in real-world applications [23].
- **Variable Robotics**: Co-founded by Wang Hao, the company aims to integrate large models with embodied intelligence, launching the WALL-A model, which is the largest operational model globally [26].
- **Zhuji Power**: Founded by Zhang Wei, the company is developing full-size humanoid robots and has launched the W1 commercial robot, with plans for mass production of humanoid robots by 2025 [30].
- **Stardust Intelligence**: Founded by Lai Jie, the company focuses on creating AI robots for household use, achieving breakthroughs in embodied intelligence data acquisition [32].
- **Cloud Deep**: Founded by Zhu Qiuguo, the company specializes in humanoid and quadruped robots, with a strong emphasis on self-research and development of core components [34].
- **Qianxun Intelligence**: Founded by Han Fengtao, the company has developed the Moz1 humanoid robot, which features advanced control capabilities and has raised over 1 billion yuan in funding [38].
- **Physical Intelligence**: Co-founded by Sergey Levine, the company focuses on creating advanced AI models for robots, achieving significant funding milestones and technological advancements [40][41].
- **Figure AI**: Founded by Brett Adcock, the company has developed humanoid robots for commercial applications, with significant advancements in collaborative robot control [44][45].

Group 3: Future Outlook
- The article concludes that the vision and persistence of technology leaders are crucial for advancing the industry, with various paths being taken towards a flexible, adaptive, and highly interactive future in embodied intelligence [54][55].
Ant Group Is Hiring Large Model Data Intelligence Algorithm Engineers (Internal Referrals Available)
自动驾驶之心· 2025-09-15 23:33
Core Viewpoint - The article discusses the responsibilities and requirements for a position focused on developing advanced algorithms for large model data production, emphasizing the importance of data knowledge systems, automatic classification, authoritative evaluation sets, quality assessment, and innovative solutions in the field of artificial intelligence and deep learning [1][2][3].

Group 1: Responsibilities
- The role involves designing and developing algorithms to address key issues in large model data production, including data knowledge system generation, automatic corpus classification, authoritative evaluation set construction, and quality assessment of training data [1][5].
- Specific tasks include researching automatic knowledge graph generation based on LLMs, developing classification algorithms, and creating standardized evaluation sets to assess model performance [1][5].
- The position also requires establishing a data-driven system for quality assessment, identifying low-quality data, and synthesizing training data to improve model performance [1][5].

Group 2: Requirements
- Candidates should possess a master's degree or higher in computer science, artificial intelligence, deep learning, or related fields, and be proficient in deep learning frameworks such as PyTorch and TensorFlow [2][6].
- Strong problem-solving skills, self-motivation, and the ability to analyze and address issues are essential, along with effective communication and coordination abilities [2][6].
- Preference is given to candidates with practical experience in large model data system design, corpus classification, evaluation set construction, and data annotation algorithms [3][4][6].
The Spatial Understanding Capability of VLA Is Far from Fully Tapped! OccVLA's New Attempt (Shanghai Qi Zhi Institute & Tsinghua & SJTU, Among Others)
自动驾驶之心· 2025-09-15 23:33
The spatial understanding capability of autonomous driving VLA urgently needs a breakthrough. (1) Building usable and effective 3D representations without expensive manual annotation is difficult; (2) for lack of large-scale 3D vision-language pre-training, fine-grained spatial details are lost in vision-language models (VLMs). Paper title: OccVLA: Vision-Language-Action Model with Implicit 3D Occupancy Supervision. Paper link: https://arxiv.org/abs/2509.05578 Existing work has explored this challenge extensively (see Figure 1(a)). In VLM-based perception pipelines, supervision relies on text-described 3D annotations (e.g., coordinates or bounding boxes), which are inherently sparse and carry limited information. Producing such annotations requires heavy manual labeling, which limits model scalability. As shown in Figure 1(b), some recent methods attempt to integrate 3D inputs, but they are constrained by two problems: the lack of large-scale 3D vision-language pre-training data, and the lack of detailed descriptive text for complex spatial scenes. These 3D VLMs usually concentrate supervision on the text outputs while ignoring the rich 3D visual modality, so there is still room to improve their spatial understanding for autonomous driving. Against this backdrop, the core challenge has two aspects: (1) ...
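The paper's title names implicit 3D occupancy supervision. The excerpt does not give the loss formulation, so purely as a hedged sketch of the general idea (not OccVLA's actual objective), occupancy supervision is often posed as per-voxel binary classification over a BEV voxel grid:

```python
import numpy as np

def occupancy_bce(pred_logits, gt_occ, eps=1e-7):
    """Per-voxel binary cross-entropy between predicted occupancy logits
    and a 0/1 ground-truth grid of shape (X, Y, Z). Hypothetical sketch."""
    p = 1.0 / (1.0 + np.exp(-pred_logits))       # sigmoid -> occupancy probability
    p = np.clip(p, eps, 1.0 - eps)               # numerical safety for log()
    bce = -(gt_occ * np.log(p) + (1.0 - gt_occ) * np.log(1.0 - p))
    return bce.mean()

rng = np.random.default_rng(0)
logits = rng.normal(size=(16, 16, 4))                     # toy 16x16x4 voxel grid
gt = (rng.random(size=(16, 16, 4)) > 0.9).astype(float)   # sparse occupied voxels
loss = occupancy_bce(logits, gt)
assert loss > 0.0                                         # untrained logits -> nonzero loss
```

The appeal of such supervision is that dense occupancy targets can be derived from sensor data rather than per-object text annotations, which is exactly the scalability bottleneck the passage describes.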
Paper Walkthrough: HKUST's PLUTO, the First Planner to Surpass Rule-Based Methods!
自动驾驶之心· 2025-09-15 23:33
In end-to-end autonomous driving, this paper presents a Planner model in the classic "two-stage network architecture". Rather than performing downstream control tasks on a BEV feature map, it directly encodes the structured outputs of perception (bounding boxes, lanes, etc.) and feeds them into the decoder as sequence tokens; today we walk through it. Two-stage end-to-end is well suited for newcomers to practice on. To aid understanding, we have added detailed module annotations to the network architecture diagram. First, an overall look at PLUTO's key points: PLUTO has three main losses. The main-task loss consists of a regression loss and a classification loss, which together form the imitation learning loss; the agent trajectory prediction loss is shown in the figure below. PLUTO also adds several auxiliary losses to help the model converge. 1) Hits the pain points for a quick start: the course follows a just-in-time learning philosophy, using plain language and cases to help students master the core tech stack in a short time; once the key concepts are understood, expanding into specific domains becomes much easier. 2) Builds a domain framework and improves research ability. This article comes from the platform's newly launched "End-to-End and VLA Autonomous Driving Small-Class Course", built jointly with algorithm experts from top domestic OEMs! Technical experts guide you deep into end-to-end and VLA algorithm principles and development; the course has now officially started! Too many tech stacks? ...
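The summary above says PLUTO's main-task loss combines a regression loss and a classification loss to form the imitation learning loss. PLUTO's exact formulation and weights are not given in this excerpt, so the following is only a hedged sketch of that general pattern: L1 regression toward the expert trajectory for the best-matching candidate, plus cross-entropy pushing the candidate scorer toward that candidate.

```python
import numpy as np

def imitation_loss(traj_preds, traj_logits, expert_traj, w_cls=1.0):
    """Sketch of a combined imitation loss (illustrative, not PLUTO's exact loss).

    traj_preds:  (K, T, 2) candidate trajectories
    traj_logits: (K,)      scores for the candidates
    expert_traj: (T, 2)    expert (ground-truth) trajectory
    """
    # Regression target: the candidate closest to the expert trajectory.
    errs = np.abs(traj_preds - expert_traj).mean(axis=(1, 2))   # (K,) L1 errors
    target = int(errs.argmin())
    reg = errs[target]                                          # regression term
    # Classification term: softmax cross-entropy on candidate scores.
    z = traj_logits - traj_logits.max()                         # stable log-softmax
    log_probs = z - np.log(np.exp(z).sum())
    cls = -log_probs[target]
    return reg + w_cls * cls

rng = np.random.default_rng(0)
preds = rng.normal(size=(6, 10, 2))      # 6 candidates, 10 waypoints each
logits = rng.normal(size=(6,))
expert = rng.normal(size=(10, 2))
loss = imitation_loss(preds, logits, expert)
```

The winner-take-all target selection keeps the regression term from averaging multimodal candidates into an infeasible middle path, which is the usual motivation for pairing the two terms.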
Everything About Large Models and Autonomous Driving
自动驾驶之心· 2025-09-15 23:33
Group 1
- The article emphasizes the growing interest in large model technologies, particularly in areas such as RAG (Retrieval-Augmented Generation), AI Agents, multimodal large models (pre-training, fine-tuning, reinforcement learning), and optimization for deployment and inference [1]
- A community named "Large Model Heart Tech" is being established to focus on these technologies and aims to become the largest domestic community for large model technology [1]
- The community is also creating a knowledge platform to provide industry and academic information, as well as to cultivate talent in the field of large models [1]

Group 2
- The article describes the community as a serious content-driven platform aimed at nurturing future leaders [2]
As Research, VLA at Least Offers a Possible Escape from Endless Corner Cases!
自动驾驶之心· 2025-09-15 03:56
VLA is without question this year's dominant keyword in autonomous driving: in the second half of the year the new EV makers are all racing to seize the VLA high ground, industry is rapidly bringing it to mass production in vehicles, and academia keeps refreshing competition leaderboards. In the past, the industry's iteration scheme was a loop of adding issue cases and removing issue cases; such a scheme is clearly endless, and however mature the iteration becomes, it can hardly reach the level of autonomous driving we have in mind. Compared with end-to-end, by leveraging the stronger generalization of large models, VLA at least offers a possible escape from endless corner cases! VLA is not that easy to do, however. For a newcomer or someone switching fields, getting research off the ground is rough; after a year of stepping on pitfalls, there may still be nothing to show for it. At that point, 峰哥 recommended 自动驾驶之心's 1v6 paper coaching. I. The VLA research paper coaching topic is here ⭐ End-to-end autonomous driving aims to build a unified intelligent model that maps raw sensor inputs (such as camera images) directly to the vehicle's driving control commands (such as steering, throttle, and brake), replacing the traditional multi-module cascaded architecture (perception, prediction, planning, control). This evolution can be roughly divided into several stages, and the VLA model emerged precisely to address the bottlenecks of the earlier stages, marking the start of a new paradigm. The model learns "brake" rather than understanding "the vehicle ahead is decelerating, so brake". Limited generalization: for long-tail scenarios absent from the training data, the model ...