Vision-Language Models (VLM)

The NeurIPS 2025 MARS Multi-Agent Embodied Intelligence Challenge Officially Launches!
Embodied Intelligence Heart · 2025-08-18 00:07
Multi-agent collaboration, a brand-new embodied intelligence challenge, jointly exploring the future of robotics. Pushing embodied agents toward a new level of general autonomous collaboration!

A new challenge for embodied intelligence: as robotic systems move into the real world, a single agent can no longer handle complex and changing task scenarios. Achieving efficient collaboration among embodied agents is becoming the core problem for the next generation of general-purpose autonomous systems. Multi-agent embodied systems (humanoid robots, quadrupeds, robotic arms) are emerging as a key force for general autonomy. In complex environments they must not only formulate high-level task plans but also execute fine-grained manipulation robustly. Yet, in the face of heterogeneous robots, differing sensing capabilities, and partial observability, achieving cross-level collaboration from symbolic planning to continuous control remains an open problem.

Targeting these research difficulties in multi-agent embodied intelligence, the competition decouples "planning" from "control": two complementary tracks will explore, respectively, the planning capabilities of vision-language models (VLMs) and the potential of end-to-end models for multi-agent embodied coordination and control, comprehensively evaluating agents on complex tasks. Results are expected to be announced, and awards presented, at the SpaVLE Workshop at NeurIPS 2025. Participants will have the chance to win prize money and co-author the competition report. Track 1: Multi-Agent Embodied Planning, targeting collaboration among heterogeneous robots ...
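The planning/control decoupling behind the two tracks can be illustrated with a minimal sketch: a high-level planner emits symbolic subtasks, and a low-level controller maps each subtask to continuous commands. All names here (`Subtask`, `plan`, `control`) are invented placeholders, not the competition's actual interfaces.

```python
from dataclasses import dataclass

@dataclass
class Subtask:
    agent: str   # e.g. "humanoid_1", "quadruped_1"
    action: str  # symbolic action, e.g. "navigate", "pick"
    target: str  # object or location identifier

def plan(goal: str, agents: list) -> list:
    """Toy symbolic planner: assign actions to agents round-robin (Track 1 level)."""
    actions = ["navigate", "pick", "place"]
    return [Subtask(agents[i % len(agents)], actions[i], goal)
            for i in range(len(actions))]

def control(task: Subtask) -> list:
    """Toy continuous controller: map a symbolic subtask to a command vector (Track 2 level)."""
    gain = {"navigate": 1.0, "pick": 0.5, "place": 0.25}[task.action]
    return [gain, -gain, 0.0]  # e.g. (vx, vy, gripper)

subtasks = plan("red_cube", ["humanoid_1", "quadruped_1"])
commands = [control(t) for t in subtasks]
print(len(subtasks), commands[0])
```

The point of the decoupling is visible in the interfaces: the planner reasons only over symbols and agents, while the controller sees only one subtask at a time and produces continuous commands.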
Frontier Approaches in Autonomous Driving: A Survey of End-to-End and VLA Work
Autonomous Driving Heart · 2025-08-10 03:31
Many readers have recently asked about end-to-end and VLA approaches. Compared with modular methods, unified modeling of perception, planning, and control raises the ceiling of driving capability, but these approaches are also technically harder. Today, Autonomous Driving Heart has compiled the most widely referenced end-to-end and VLA algorithms in the industry; more material is available in the Autonomous Driving Heart knowledge planet, a place for exchanging techniques and solutions.

An overview of the end-to-end landscape: broadly, end-to-end splits into two major directions, one-stage and two-stage. Two-stage methods lean toward joint prediction: perception results serve as input, and the model focuses on planning the ego vehicle's trajectory and predicting other vehicles' trajectories. One-stage end-to-end instead models the full mapping from sensor input to ego trajectory, and can be further subdivided into perception-based one-stage end-to-end (UniAD), diffusion-based one-stage end-to-end (DiffusionDrive), and world-model-based one-stage end-to-end (Drive-OccWorld). Each has its own emphasis, and production systems combine their strengths when optimizing models.

VLA, in turn, is an extension of VLM+E2E, aiming to use large-model capabilities to give production models stronger scene understanding. Inside the knowledge planet we have organized related work on language models as interpreters, a roundup of modular VLA, unified end-to-end VLA, and many reasoning-enhanced VLA algorithms. Members ...
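The one-stage vs. two-stage split described above can be sketched as two function signatures. This is a hedged illustration only: `SensorFrame`, `PerceptionOutput`, and both planner functions are invented stand-ins, not the actual UniAD / DiffusionDrive / Drive-OccWorld interfaces.

```python
from dataclasses import dataclass

@dataclass
class SensorFrame:
    image: list  # stand-in for raw camera/LiDAR input

@dataclass
class PerceptionOutput:
    ego_speed: float
    other_agents: list  # (x, y) positions of nearby vehicles

def two_stage_plan(p: PerceptionOutput, horizon: int = 3) -> list:
    """Two-stage: perception output comes in; the model plans ego waypoints
    while (implicitly) predicting other agents' motion."""
    return [(p.ego_speed * t, 0.0) for t in range(1, horizon + 1)]

def one_stage_plan(f: SensorFrame, horizon: int = 3) -> list:
    """One-stage: map raw sensor input directly to an ego trajectory."""
    speed = sum(f.image) / len(f.image)  # toy stand-in for a learned feature
    return [(speed * t, 0.0) for t in range(1, horizon + 1)]

perc = PerceptionOutput(ego_speed=2.0, other_agents=[(5.0, 1.0)])
print(two_stage_plan(perc))                      # waypoints from perception output
print(one_stage_plan(SensorFrame([1.0, 3.0])))   # waypoints straight from raw input
```

The contrast is in the input type: the two-stage planner consumes structured perception results, while the one-stage planner owns the entire mapping from sensors to trajectory.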
DriveBench: Are VLMs Really Reliable in Autonomous Driving? (ICCV'25)
Autonomous Driving Heart · 2025-08-07 23:32
Recent advances in vision-language models (VLMs) have sparked interest in applying them to autonomous driving, particularly for generating interpretable driving decisions through natural language. However, the assumption that VLMs can provide visually grounded, reliable, and interpretable explanations for driving remains largely unvalidated. To fill this gap, we introduce DriveBench, a benchmark dataset designed to evaluate VLM reliability across 17 settings, comprising 19,200 frames, 20,498 question-answer pairs, three question types, four mainstream driving tasks, and a total of 12 popular VLMs.

Autonomous Driving Heart is honored to invite Shaoyuan Xie, a PhD student at the University of California, Irvine, to present DriveBench, accepted at ICCV 2025: a benchmark framework designed for vision-language models (VLMs) in autonomous driving, evaluating their reliability across different environments and tasks. DriveBench covers four core tasks (perception, prediction, planning, and behavior) and introduces 15 OoD types to systematically test VLM reliability in complex driving scenarios.

Join the Autonomous Driving Heart livestream today at 11 AM. Paper title ...
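The kind of reliability evaluation DriveBench performs, per-task accuracy under clean vs. corrupted (OoD) inputs, can be sketched as a small aggregation loop. The record layout and scoring below are simplified stand-ins, not DriveBench's actual data format or metrics.

```python
from collections import defaultdict

records = [
    # (task, setting, model_answer, gold_answer) -- toy examples
    ("perception", "clean",     "car ahead",  "car ahead"),
    ("perception", "ood_fog",   "clear road", "car ahead"),
    ("planning",   "clean",     "brake",      "brake"),
    ("planning",   "ood_night", "brake",      "brake"),
]

def reliability_report(recs):
    """Accuracy grouped by (task, clean-vs-OoD)."""
    hits, totals = defaultdict(int), defaultdict(int)
    for task, setting, pred, gold in recs:
        key = (task, "ood" if setting.startswith("ood") else "clean")
        totals[key] += 1
        hits[key] += int(pred == gold)
    return {k: hits[k] / totals[k] for k in totals}

report = reliability_report(records)
print(report[("perception", "ood")])  # drops to 0.0 when fog corrupts the input
```

Comparing the clean and OoD columns of such a report is what exposes whether a model's answers are actually grounded in the visual input or merely plausible-sounding.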
4,000 Members: We've Built a Truly Full-Stack Autonomous Driving Community!
Autonomous Driving Heart · 2025-08-03 00:33
Core Viewpoint
- The article discusses the current state and future prospects of the autonomous driving industry, highlighting the shift towards embodied intelligence and large models, while questioning whether traditional autonomous driving technologies are becoming obsolete [2][3].

Group 1: Industry Perspectives
- Some professionals have transitioned away from autonomous driving, believing that the technology stack has become homogenized, with only end-to-end and large models remaining viable [2].
- Those still observing the field are reluctant to leave their current high-paying jobs and lack reliable resources in the embodied intelligence sector [3].
- Many individuals remain committed to the autonomous driving field, viewing it as the most promising path towards achieving general embodied intelligence [3].

Group 2: Industry Challenges
- The current state of mass production in autonomous driving is perceived as somewhat chaotic, with existing solutions not yet fully refined before new ones are rushed to market [3].
- The article suggests that the past hype around autonomous driving may have been beneficial, allowing for a more focused approach to solidifying mass production capabilities [3].

Group 3: Future Directions
- The future of mass production in autonomous driving is expected to be unified, multi-modal, and end-to-end, requiring full-stack talent who are knowledgeable in perception, planning, prediction, and large models [3].
- The community aims to bridge the gap between academia and industry, facilitating communication and collaboration to advance the field [3][6].

Group 4: Community Initiatives
- The "Autonomous Driving Heart" knowledge platform has created a comprehensive ecosystem for sharing academic and industrial insights, including job opportunities and technical resources [5][12][14].
- The platform has organized various resources, including over 40 technical routes and numerous open-source projects, to assist both newcomers and experienced professionals in the field [5][15][16].

Group 5: Educational Resources
- The community provides a well-structured entry-level technical stack and roadmap for beginners, as well as valuable industry frameworks and project plans for those already engaged in research [10][12].
- Continuous job postings and sharing of opportunities are part of the community's offerings, aimed at building a complete ecosystem for autonomous driving [14].
Resource Roundup | VLM · World Models · End-to-End
Autonomous Driving Heart · 2025-07-12 12:00
Core Insights
- The article discusses the advancements and applications of visual language models (VLMs) and large language models (LLMs) in the field of autonomous driving and intelligent transportation systems [1][2].

Summary by Sections

Overview of Visual Language Models
- Visual language models are becoming increasingly important in the context of autonomous driving, enabling better understanding and interaction between visual data and language [4][10].

Recent Research and Developments
- Several recent papers presented at conferences like CVPR and NeurIPS focus on improving the performance of VLMs through various techniques such as behavior alignment, efficient pre-training, and enhancing compositionality [5][7][10].

Applications in Autonomous Driving
- The integration of LLMs and VLMs is expected to enhance various tasks in autonomous driving, including object detection, scene understanding, and planning [10][13].

World Models in Autonomous Driving
- World models are being developed to improve the representation and prediction of driving scenarios, with innovations like DrivingGPT and DriveDreamer enhancing scene understanding and video generation capabilities [10][13].

Knowledge Distillation and Transfer Learning
- Techniques such as knowledge distillation and transfer learning are being explored to optimize the performance of vision-language models in multi-task settings [8][9].

Community and Collaboration
- A growing community of researchers and companies is focusing on the development of autonomous driving technologies, with numerous resources and collaborative platforms available for knowledge sharing and innovation [17][19].