Vision-Language Reasoning

自动驾驶之心· 2026-01-16 07:35

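Source: 机器之心. Original article: "New SOTA in End-to-End Intelligent Driving | KnowVal: An Intelligent Driving System That Understands Law and Ethics and Holds Values"

What capabilities should an intelligent driving system have as it moves toward high-level autonomous driving? Beyond the basic capabilities of perception, prediction, planning, and decision-making, how can it understand 3D space more deeply? How can it acquire knowledge such as laws and regulations, ethical principles, and defensive driving principles? How can it perform basic vision-language reasoning? How can an intelligent system be given a worldview and values? KnowVal, the latest work from Wang Yongtao's team at the Wangxuan Institute of Computer Technology, Peking University, offers an effective and feasible approach. Using perception specialized for autonomous driving together with open-world 3D perception, the system extracts 3D object detection results and instance features for both common and long-tail instances, as well as open-world full-scene occupancy grid predictions and voxel features; extracting features keeps feature propagation intact and the whole system differentiable. In addition, abstract concept understanding implemented with a lightweight VLM supplements the information requested by the knowledge retrieval branch in the previous time frame, describing in natural language abstract concepts such as "Is this a tunnel or bridge scene? Is it a nighttime scene?" ...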

New SOTA in End-to-End Intelligent Driving | KnowVal: An Intelligent Driving System That Understands Law and Ethics and Holds Values

机器之心· 2026-01-14 07:18

Core Viewpoint
- The article discusses KnowVal, an autonomous driving system that integrates perception with knowledge retrieval to support vision-language reasoning, with the aim of reaching higher levels of automated driving [4][21].

Group 1: System Overview
- KnowVal is a novel autonomous driving system that combines perception modules with knowledge retrieval modules to achieve vision-language reasoning [4].
- The system constructs a comprehensive driving knowledge graph covering traffic regulations, defensive driving principles, and ethical considerations, supported by an efficient retrieval mechanism based on large language models [4][15].
- KnowVal integrates a world model and a value model within its planner to ensure value-aligned decision-making [4][11].

Group 2: Technical Framework
- The system employs an open 3D perception and knowledge retrieval framework that extends the traditional vision-language paradigm to support basic vision-language reasoning [7][9].
- It uses perception specialized for autonomous driving together with open-world 3D perception to extract features of both common and rare (long-tail) instances, so that features can be passed through the whole system [9].
- For knowledge graph retrieval, perception results are expressed in natural language and used to look up relevant knowledge entries, which are ranked by relevance [10][15].

Group 3: Value Model and Trajectory Planning
- KnowVal's trajectory planning is built on a world prediction model and a value model: it iteratively generates candidate trajectories and scores each one against the retrieved knowledge [11][19].
- To train the value model, a large-scale driving value preference dataset was created, consisting of 160,000 trajectory-knowledge pairs annotated with value scores ranging from -1 to 1 [19].
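The retrieval and value-scoring steps summarized in Groups 2 and 3 can be illustrated with a minimal Python sketch. This is not KnowVal's implementation: `KnowledgeEntry`, its `max_speed` field, `toy_value_fn`, and the relevance-weighted averaging are all hypothetical stand-ins chosen for illustration, trajectories are reduced to lists of speeds, and the candidate set is fixed rather than generated iteratively. The only details taken from the summary are that retrieved entries are ranked by relevance and that value scores lie in [-1, 1].

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple

Trajectory = List[float]  # hypothetical: speed in m/s at each waypoint


@dataclass(frozen=True)
class KnowledgeEntry:
    """One retrieved rule with an assumed relevance weight in [0, 1]."""
    text: str
    relevance: float
    max_speed: float  # hypothetical field used only by the toy value function


def rank_entries(entries: Sequence[KnowledgeEntry], top_k: int) -> List[KnowledgeEntry]:
    """Keep the top_k entries, most relevant first."""
    return sorted(entries, key=lambda e: e.relevance, reverse=True)[:top_k]


def clamp(score: float, lo: float = -1.0, hi: float = 1.0) -> float:
    """Force a score into the [-1, 1] range used for value annotations."""
    return max(lo, min(hi, score))


def toy_value_fn(trajectory: Trajectory, entry: KnowledgeEntry) -> float:
    """Toy stand-in for a learned value model: +1 if no waypoint exceeds
    the entry's speed limit, -1 if every waypoint does."""
    if not trajectory:
        return 0.0
    violations = sum(1 for v in trajectory if v > entry.max_speed)
    return 1.0 - 2.0 * violations / len(trajectory)


def trajectory_value(
    trajectory: Trajectory,
    entries: Sequence[KnowledgeEntry],
    value_fn: Callable[[Trajectory, KnowledgeEntry], float],
) -> float:
    """Relevance-weighted mean of clamped per-entry scores (an assumed aggregation)."""
    total_weight = sum(e.relevance for e in entries)
    if total_weight == 0:
        return 0.0
    weighted = sum(e.relevance * clamp(value_fn(trajectory, e)) for e in entries)
    return weighted / total_weight


def select_trajectory(
    candidates: Sequence[Trajectory],
    entries: Sequence[KnowledgeEntry],
    value_fn: Callable[[Trajectory, KnowledgeEntry], float],
) -> Tuple[float, Trajectory]:
    """Return (score, trajectory) for the highest-valued candidate."""
    scored = [(trajectory_value(t, entries, value_fn), t) for t in candidates]
    return max(scored, key=lambda pair: pair[0])


retrieved = [
    KnowledgeEntry("Slow down on wet roads", relevance=0.9, max_speed=10.0),
    KnowledgeEntry("Obey the posted speed limit", relevance=0.4, max_speed=15.0),
    KnowledgeEntry("Yield to trams", relevance=0.1, max_speed=100.0),
]
top_entries = rank_entries(retrieved, top_k=2)
fast, slow = [14.0, 14.0, 14.0, 14.0], [8.0, 8.0, 9.0, 9.0]
best_score, best_trajectory = select_trajectory([fast, slow], top_entries, toy_value_fn)
print(best_score, best_trajectory)
```

In this toy setup the slower candidate wins because the most relevant entry (wet roads) penalizes every waypoint of the faster one. A learned value model would replace `toy_value_fn`, and an iterative planner would regenerate candidates between scoring rounds.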
Group 4: Experimental Results
- The KnowVal framework was tested against the baseline models GenAD, HENet++, and SimLingo, achieving the lowest collision rate on the nuScenes dataset and the highest driving score and success rate on the Bench2Drive benchmark [21].
- The results indicate that KnowVal outperforms existing end-to-end and vision-language-action models on these benchmarks [21][22].

Group 5: Qualitative Analysis
- The article presents qualitative examples of KnowVal adhering to legal and ethical driving behavior, such as slowing down in wet conditions and obeying lane-change regulations in tunnels [23][25].