自动驾驶之心

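Tsinghua's latest SOTA! ArbiViewGen: a self-supervised framework for controllable arbitrary-viewpoint image generation across multiple vehicle types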
自动驾驶之心· 2025-08-10 23:32
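Today 自动驾驶之心 shares the latest work from Tsinghua: ArbiViewGen, a self-supervised framework for controllable image generation from arbitrary viewpoints across multiple vehicle types, with SOTA performance.

Paper authors | Yatong Lan et al. Editor | 自动驾驶之心

Preface & the author's personal take

Arbitrary-viewpoint image generation has significant potential for autonomous driving, but it remains a challenging task: ground-truth data for extrapolated viewpoints is missing, which hinders the training of high-fidelity generative models. In this work, we propose ArbiViewGen, a new diffusion-based framework for generating controllable camera images from arbitrary viewpoints. To address the lack of ground-truth data for unseen viewpoints, we introduce two key components: Feature-Aware Adaptive View Stitching (FAVS) and Cross-View Consistency Self-Supervised Learning (CVC-SSL). FAVS adopts a hierarchical matching strategy: it first uses camera poses to establish coarse geometric correspondences, then performs fine-grained alignment with an improved feature-matching algorithm, and through ...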
We have set up an autonomous driving job-hunting discussion group
自动驾驶之心· 2025-08-10 23:32
Group 1
- The core viewpoint is that the autonomous driving technology stack is beginning to converge, moving away from the previously diverse approaches that required numerous algorithm engineers [1]
- The emergence of unified approaches such as One Model, VLM, and VLA indicates higher technical barriers in the industry [1]
- The company aims to build a large community to support industry professionals, facilitating discussions on industry trends, company developments, product research, and job opportunities [1]
Starting soon! A small-class course on end-to-end and VLA autonomous driving (diffusion models, VLA, and more)
自动驾驶之心· 2025-08-10 23:32
Core Viewpoint - End-to-end (E2E) autonomous driving is identified as the core algorithm for mass-production intelligent driving, with significant advancements and competition emerging in the industry following the recognition of UniAD at CVPR [2][3]

Group 1: E2E Autonomous Driving Overview
- E2E systems directly model the relationship between sensor inputs and vehicle control information, avoiding the error accumulation seen in traditional modular approaches [2]
- The introduction of BEV perception has bridged gaps between modular methods, leading to a significant technological leap [2]
- The emergence of various algorithms indicates that UniAD is not the ultimate solution for E2E, highlighting the rapid development in this field [2]

Group 2: Learning Challenges in E2E
- The fast pace of E2E development has made previous educational resources inadequate; learners now need a comprehensive understanding of multiple domains such as multimodal large models, BEV perception, and reinforcement learning [3][4]
- Beginners face fragmented knowledge and an overwhelming volume of literature, and often give up before mastering the concepts [3]

Group 3: Course Development
- A new course titled "End-to-End and VLA Autonomous Driving" has been developed to address these learning challenges, focusing on integrating practice and theory [4][5][6]
- The course aims to provide a structured framework for understanding E2E research and to enhance research capabilities by categorizing papers and extracting innovative points [5]

Group 4: Course Structure
- The course includes five chapters covering topics from the introduction of E2E algorithms to practical applications involving RLHF fine-tuning [9][10][11][12][13]
- Key areas of focus include the evolution of E2E paradigms, the significance of VLA in the current landscape, and practical implementations of diffusion models [11][12]

Group 5: Expected Outcomes
- Participants are expected to achieve a level equivalent to one year of experience as an E2E autonomous driving algorithm engineer, mastering various methodologies and key technologies [18]
- The course aims to facilitate the application of learned concepts in real-world projects, enhancing employability in the autonomous driving sector [18]
A long read on the "coming of age" of embodied intelligence: what it has overcome and where it is headed
自动驾驶之心· 2025-08-10 03:31
Core Viewpoint - The article discusses the rapid advancements in embodied intelligence and robotics, emphasizing that robots need to integrate AI with physical capabilities to perform tasks that remain challenging for them, including simple actions that even children can do [8][9].

Group 1: Evolution of Embodied Intelligence
- Over the past decade, embodied intelligence has evolved significantly, with a focus on integrating AI into robots' control systems to enhance their performance in the physical world [9].
- The gap between research prototypes and practical applications is highlighted: robots need to reach a Technology Readiness Level (TRL) of 8 to 9 for industrial acceptance [10].
- Machine learning advancements, including better sensors and algorithms, have led to substantial improvements in robotics, but achieving high success rates in real-world applications remains a challenge [12][14].

Group 2: Opportunities and Challenges in Robotics
- The current landscape presents both opportunities and challenges for robotics, with initial applications targeting structured environments before tackling more complex, unstructured settings [14][17].
- The importance of scalable learning systems in robotics is emphasized, as researchers aim to leverage data from multiple robots to enhance performance across various tasks [20].

Group 3: Specialized vs. General Intelligence
- The discussion contrasts Artificial Specialized Intelligence (ASI) with Artificial General Intelligence (AGI), suggesting that while ASI focuses on high performance in specific tasks, AGI aims for broader capabilities [27][29].
- The advantages of specialized models include efficiency, robustness, and the ability to run on-premise, while general models offer greater flexibility but are more complex and costly to operate [31][35].

Group 4: Future Directions in Robotics
- The emergence of vision-language-action (VLA) models, such as RT-2, represents a significant step forward in robotics, allowing for more complex task execution through remote API calls [44][46].
- The development of the RT-X dataset, which includes diverse robotic data, has shown that cross-embodiment models can outperform specialized models in various tasks, indicating the potential for generalization in robotics [47][48].
- The second-generation VLA models, like PI-Zero, are designed to handle continuous actions and complex tasks, showcasing advancements in robot dexterity and adaptability [49][50].

Group 5: Data and Performance in Robotics
- The importance of data in achieving high performance in robotics is underscored, with a call for large-scale data collection to support the development of robust robotic systems [62][70].
- The article concludes with a discussion on the need for a balance between performance and generalization in robotics, suggesting that achieving high performance is crucial for real-world deployment [66][68].
Frontier autonomous driving approaches: an overview of work from end-to-end to VLA
自动驾驶之心· 2025-08-10 03:31
Core Viewpoint - The article discusses the advancements in end-to-end (E2E) and VLA (Vision-Language-Action) algorithms in the autonomous driving industry, highlighting their potential to enhance driving capabilities through unified perception and control modeling, despite their higher technical complexity [1][5].

Summary by Sections

End-to-End Algorithms
- End-to-end approaches are categorized into single-stage and two-stage methods, with the latter focusing more on joint prediction, where perception serves as input for trajectory planning and prediction [3].
- Single-stage end-to-end models include methods such as UniAD, DiffusionDrive, and Drive-OccWorld. Each emphasizes different aspects, and production systems are likely to be optimized by combining their strengths [3][37].

VLA Algorithms
- VLA extends the capabilities of large models to enhance scene understanding in production models, with internal discussions on language models as interpreters and various algorithm summaries for modular and unified end-to-end VLA [5][45].
- The community has compiled over 40 technical routes, facilitating quick access to industry applications, benchmarks, and learning pathways [7].

Community and Resources
- The community provides a platform for knowledge exchange among members from renowned universities and leading companies in the autonomous driving sector, offering resources such as open-source projects, datasets, and learning routes [19][35].
- A comprehensive technical stack and roadmap for beginners and advanced researchers are available, covering various aspects of autonomous driving technology [12][15].

Job Opportunities and Networking
- The community has established job referral mechanisms with multiple autonomous driving companies, encouraging members to connect and share job opportunities [10][17].
- Regular discussions on industry trends, research directions, and practical applications are held, fostering a collaborative environment for learning and professional growth [20][83].
Two-stage SOTA! HKUST's FiM: rethinking trajectory prediction from a planning perspective
自动驾驶之心· 2025-08-09 16:03
Core Insights - The article presents a novel approach to trajectory prediction in autonomous driving, emphasizing a "First Reasoning, Then Forecasting" strategy that integrates intention reasoning to enhance prediction accuracy and reliability [2][4][48].

Group 1: Methodology
- The proposed method introduces an intention reasoner based on a query-centric Inverse Reinforcement Learning (IRL) framework, which captures the behavior of traffic participants and their intentions in a compact representation [2][6][48].
- A bidirectional selective state space model (Bi-Mamba) is developed to improve trajectory decoding, effectively capturing the sequential dependencies of trajectory states [7][9][48].
- The framework utilizes a grid-level graph to represent the driving context, allowing for efficient modeling of participant behavior and intentions [5][6][20].

Group 2: Experimental Results
- Extensive experiments on large datasets such as Argoverse and nuScenes demonstrate that the proposed method significantly enhances prediction confidence and achieves competitive performance compared to state-of-the-art models [9][34][38].
- On the Argoverse 1 dataset, the proposed method (FiM) outperformed several strong baseline methods in key metrics such as Brier score and minFDE6, indicating its robust predictive capabilities [34][35].
- The results on Argoverse 2 further validate the effectiveness of the intention reasoning strategy, showing that longer-term intention supervision improves prediction reliability [36][37].

Group 3: Challenges and Innovations
- The article highlights the inherent challenges in modeling intentions due to the complexity of driving scenarios, advocating for the use of large reasoning models (LRMs) to enhance intention inference [5][6][12].
- A dense occupancy grid map (OGM) prediction head is introduced to model future interactions among participants, which enhances the overall prediction performance [7][25][41].
- The study emphasizes the importance of intention reasoning in motion prediction, establishing a promising baseline for future research in trajectory prediction [48].
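
The summary above cites minFDE6 and a Brier score without defining them. As a rough illustration only, the sketch below uses the definitions commonly applied on Argoverse-style benchmarks, which is an assumption and not taken from the FiM paper: minFDE over K candidate trajectories (K = 6 for minFDE6) is the smallest endpoint error among the candidates, and the Brier variant adds a penalty of (1 - p)^2, where p is the probability assigned to that best candidate. The function names `min_fde` and `brier_min_fde` are hypothetical.

```python
import math


def min_fde(predictions, ground_truth):
    """Return (minFDE, index of the best mode) over K candidate trajectories.

    predictions: list of K trajectories, each a list of (x, y) points.
    ground_truth: a single trajectory as a list of (x, y) points.
    """
    gt_end = ground_truth[-1]
    # Final displacement error of each candidate: distance between endpoints.
    errors = [math.dist(traj[-1], gt_end) for traj in predictions]
    best = min(range(len(errors)), key=errors.__getitem__)
    return errors[best], best


def brier_min_fde(predictions, probabilities, ground_truth):
    """minFDE plus a Brier-style penalty (1 - p)^2 on the best mode's probability.

    Assumed definition (Argoverse-style); not confirmed by the source article.
    """
    fde, best = min_fde(predictions, ground_truth)
    return fde + (1.0 - probabilities[best]) ** 2


# Toy example with K = 3 candidates (minFDE6 would use K = 6).
gt = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
preds = [
    [(0.0, 0.0), (1.0, 1.0), (2.0, 2.0)],   # endpoint error 2.0
    [(0.0, 0.0), (1.0, 0.0), (2.0, 0.5)],   # endpoint error 0.5
    [(0.0, 0.0), (1.0, -1.0), (5.0, 0.0)],  # endpoint error 3.0
]
probs = [0.2, 0.5, 0.3]

fde, best_mode = min_fde(preds, gt)
score = brier_min_fde(preds, probs, gt)
```

Under this definition, a model is rewarded both for having one candidate close to the true endpoint and for placing high probability on that candidate, which is one way to read the summary's claim about improved prediction confidence.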
自动驾驶之心 is hiring interns! Come join us
自动驾驶之心· 2025-08-09 16:03
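In autonomous driving and embodied intelligence, we have already built deep partnerships with mainstream companies and universities, and our large-model track is being built out quickly. We focus on more than the technology itself: we want to build the wider AI field together with everyone and share the joy of learning and growth. For trending events, we also hope to offer content value that cannot be found anywhere else.

Hello everyone, we are the 自动驾驶之心/具身智能/大模型之心Tech team. We are delighted to meet you here. If you also believe that technical content can change the world, you may be exactly the person we are looking for!

What are we doing? We want to use technical content to connect academia and industry, to serve as a bridge between companies and universities, and beyond that to reach hundreds of thousands of AI developers and entrepreneurs. We are committed to bringing you the latest and most authoritative technical information. The team focuses on frontier AI fields such as autonomous driving, embodied intelligence, and large models, covering academic paper walkthroughs, analysis of production solutions, large-model evaluations, business news, industry hiring, and open-source projects. We share content, engage with followers, and connect with companies through our WeChat public account, community groups, WeChat Channels, Zhihu, Xiaohongshu, Bilibili, and other platforms.

Working hours: A journey of a thousand miles is made of small steps. We know that one person's strength is limited, so we look forward to more outstanding people joining us on the road. Content operations - intern ...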
Twenty years of autonomous driving, and this "Whampoa Academy" of autonomous driving has been refining its craft all along...
自动驾驶之心· 2025-08-09 16:03
Core Viewpoint - The article emphasizes the ongoing evolution and critical phase of the autonomous driving industry, highlighting the transition from modular approaches to end-to-end/VLA methods, and the community's commitment to fostering knowledge and collaboration in this field [2][4].

Group 1: Industry Development
- Since Google began autonomous driving research in 2009, the industry has progressed significantly and is now entering a crucial phase of development [2].
- The community aims to integrate intelligent driving into daily transportation, reflecting a growing expectation for advancements in autonomous driving capabilities [2].

Group 2: Community Initiatives
- The community has established a knowledge-sharing platform, offering resources across various domains such as industry insights, academic research, and job opportunities [2][4].
- Plans to enhance community engagement include monthly online discussions and roundtable interviews with industry and academic leaders [2].

Group 3: Educational Resources
- The community has compiled over 40 technical routes to assist individuals at different levels, from beginners to those seeking advanced knowledge in autonomous driving [4][16].
- A comprehensive entry-level technical stack and roadmap have been developed for newcomers to the field [9].

Group 4: Job Opportunities and Networking
- The community has established internal referral mechanisms with multiple autonomous driving companies, facilitating job placements for members [7][14].
- Continuous job sharing and networking opportunities are provided to create a complete ecosystem for autonomous driving professionals [14][80].

Group 5: Research and Technical Focus
- The community has gathered extensive resources on various research areas, including 3D object detection, BEV perception, and multi-sensor fusion, to support practical applications in autonomous driving [16][30][32].
- Detailed summaries of cutting-edge topics such as end-to-end driving, world models, and vision-language models (VLM) have been compiled to keep members informed about the latest advancements [34][40][42].
An embodied intelligence founder who "wears many hats"
自动驾驶之心· 2025-08-09 16:03
Core Viewpoint - The article discusses the current state of investment in embodied intelligence companies, highlighting founders who often keep academic positions while running startups, which raises questions about their commitment to the entrepreneurial venture [5][6].

Group 1: Investment Trends
- This year has seen a surge in investment in embodied intelligence, with significant capital flowing into these companies, often amounting to hundreds of millions [5][6].
- Founders of these companies frequently hold dual roles, such as being assistant professors at prestigious universities, which raises concerns about their full commitment to their startups [6].

Group 2: Founder Dynamics
- Many founders are described as "multi-tasking," taking on various roles including consulting for automotive companies and publishing papers, which can lead to a lack of focus on their primary business [5][6].
- The article notes that some founders, despite their academic accolades, may lack the practical experience necessary for the high-pressure environment of production, leading to a disconnect between their academic background and industry demands [7].

Group 3: Industry Challenges
- The transition from academia to industry can be challenging, with some academics struggling to adapt to the rigorous demands of production, resulting in a shift in their professional demeanor [7].
- The article suggests that embodied intelligence is still in its early stages, characterized by storytelling and presentations rather than tangible product development [7].
Autonomous driving paper digest | end-to-end, segmentation, trajectory planning, simulation, and more
自动驾驶之心· 2025-08-09 13:26
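DRIVE: a dynamic rule inference and verified evaluation framework for constraint-aware autonomous driving

Stanford University and Microsoft propose the DRIVE framework, which uses dynamic rule inference and verified evaluation to integrate the learning of probabilistic soft constraints with planning in autonomous driving. It achieves a 0.0% soft-constraint violation rate on the inD, highD, and rounD datasets and significantly improves trajectory smoothness and generalization.

Paper title: DRIVE: Dynamic Rule Inference and Verified Evaluation for Constraint-Aware Autonomous Driving
Paper link: https://arxiv.org/abs/2508.04066
Code: https://github.com/genglongling/DRIVE

Main contributions:
1. Proposes the DRIVE framework, which learns probabilistic soft constraints from expert driving demonstrations via exponential-family likelihood modeling. This overcomes the limitation of traditional methods that rely on fixed constraint forms or pure reward modeling, and tightly couples dynamic rule inference with trajectory-level decision-making.
2. Embeds the learned constraint distribution into a convex-optimization planning module, generating ...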