Spatiotemporal Reasoning
News Flash | With a $134M Mega Seed Round, General Intuition Uses Video Games to Train Agents' Spatial Reasoning
Z Potentials · 2025-10-17 03:04
Core Insights
- General Intuition, a startup spun off from Medal, is leveraging a vast library of gaming videos to train AI models capable of understanding how objects and entities move through space and time, a capability known as spatiotemporal reasoning [2]
- The company has raised $133.7 million in seed funding led by Khosla Ventures and General Catalyst, with participation from Raine [3]
- General Intuition aims to expand its team focused on training general intelligence agents that can interact with their environment, initially applying the technology in gaming and in search-and-rescue drones [5]

Funding and Growth
- The funding will be used to grow the research engineering team dedicated to developing general intelligence agents [5]
- The company has made breakthroughs in models that can understand environments they were never trained on and predict behavior from visual input alone [5]

Technology and Applications
- General Intuition's next milestones include generating new simulated worlds for training other agents and enabling autonomous navigation in unfamiliar physical environments [6]
- Unlike competitors that focus on building world models for agent training, General Intuition is concentrating on applications that avoid copyright issues [6][7]

Strategic Focus
- The company is not aiming to compete with game developers; instead, it wants to build adaptable robots and non-player characters that can adjust to a player's skill level, maximizing engagement and retention [8]
- The founders believe spatiotemporal reasoning is a core capability required for artificial general intelligence (AGI), one that large language models (LLMs) lack [8][9]
Li Auto OmniReason: A More Human-Like VLA Decision Framework
理想TOP2 · 2025-09-07 12:09
Core Insights
- The article introduces OmniReason, a framework designed to make autonomous driving systems more intelligent and reliable by integrating temporally guided vision-language-action (VLA) capabilities [1][2]

Group 1: Innovation Highlights
- OmniReason's primary breakthrough is shifting autonomous-driving decision-making from static perception to dynamic spatiotemporal reasoning, enabling the system to understand scene changes and generate decisions that follow human-like logic [2]
- The framework incorporates a closed loop that injects human driving knowledge and temporal causal chains into the model through knowledge distillation, so that autonomous behavior is safe, reliable, and interpretable [2]

Group 2: Key Contributions
- Two spatiotemporal VLA datasets, OmniReason-nuScenes and OmniReason-Bench2Drive, have been released, featuring dense spatiotemporal annotations and natural-language causal explanations, with broader coverage than existing datasets such as DRAMA and DriveLM [3]
- The OmniReason-Agent model architecture integrates a sparse temporal memory module to continuously interpret scene changes and generate human-readable decision rationales (a toy sketch of such a memory appears after this summary) [3]
- A spatiotemporal knowledge distillation method transfers the causal reasoning patterns in the datasets to the Agent model, internalizing human decision logic (a generic sketch follows below) [3]

Group 3: Technical Framework
- The framework consists of OmniReason-Data, which focuses on high-quality data construction, and OmniReason-Agent, which serves as the execution model [4]

Group 4: OmniReason-Data
- The goal is to remedy the lack of temporal and causal dimensions in existing datasets, creating a data foundation that teaches the model to "think" [5]
- A three-step automated annotation pipeline ensures high-quality, physically realistic data while effectively mitigating hallucination [6]

Group 5: OmniReason-Agent
- The objective is an end-to-end autonomous driving model that uses the high-quality data for interpretable, temporally aware decision-making [7]
- The architecture comprises three modules: environmental perception and temporal memory, a VLM reasoning core, and knowledge distillation, which together improve the reliability and transparency of decisions [7]

Group 6: Experimental Results
- In open-loop trajectory planning, OmniReason-Agent achieved an average L2 distance error of 0.34 m (the metric is sketched below), matching the best-performing ORION method, with a 0.40% collision rate and a 3.18% violation rate, setting new state-of-the-art (SOTA) records [8]
- The model also excelled at visual question answering (VQA), with significant gains in CIDEr and BLEU-4 on the OmniReason-nuScenes dataset [8]
- On the third-party OmniDrive dataset it outperformed existing models on all evaluation metrics, confirming the framework's advanced architecture and robustness [8]
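The article only names the sparse temporal memory module without describing it. Below is a minimal sketch of one plausible reading: a fixed-capacity buffer that retains only the most salient frame features so past scene changes can be referenced cheaply. The class, capacity, and saliency scoring are illustrative assumptions, not OmniReason's actual design.

```python
import torch

class SparseTemporalMemory:
    """Toy fixed-capacity memory keeping only the most salient frame
    features. Purely illustrative; the article does not describe how
    OmniReason's sparse temporal memory module actually works."""

    def __init__(self, capacity: int = 8):
        self.capacity = capacity
        self.feats: list[torch.Tensor] = []
        self.scores: list[float] = []

    def write(self, feat: torch.Tensor, saliency: float) -> None:
        """Store one frame feature together with a saliency score."""
        self.feats.append(feat)
        self.scores.append(saliency)
        if len(self.feats) > self.capacity:
            # Evict the least-salient entry to keep the memory sparse.
            i = min(range(len(self.scores)), key=self.scores.__getitem__)
            del self.feats[i], self.scores[i]

    def read(self) -> torch.Tensor:
        """Return retained features, e.g. for cross-attention by a VLM core."""
        return torch.stack(self.feats)

# Usage: write per-frame features as they arrive, read before decoding.
mem = SparseTemporalMemory(capacity=4)
for t in range(10):
    mem.write(torch.randn(256), saliency=float(t % 3))
print(mem.read().shape)  # torch.Size([4, 256])
```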
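Likewise, the summary does not say how the spatiotemporal knowledge distillation is implemented. The generic recipe blends a hard-label loss with a KL term that pulls the student's softened output distribution toward a frozen teacher's. A minimal sketch of that standard pattern follows; the function name, temperature `T`, and blend weight `alpha` are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, targets,
                      T: float = 2.0, alpha: float = 0.5):
    """Generic KD loss: hard-label cross-entropy blended with a
    temperature-softened KL term against a (frozen) teacher."""
    # Hard-label term: supervised loss on ground-truth labels/actions.
    ce = F.cross_entropy(student_logits, targets)
    # Soft-label term: match the teacher's softened distribution.
    kd = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)  # rescale gradients to account for the temperature
    return alpha * ce + (1.0 - alpha) * kd

# Usage with random stand-ins for student/teacher logits and labels.
s = torch.randn(32, 10, requires_grad=True)  # student logits
t = torch.randn(32, 10)                      # frozen teacher logits
y = torch.randint(0, 10, (32,))              # ground-truth class indices
distillation_loss(s, t, y).backward()
```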
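The average L2 distance error cited in Group 6 is the standard open-loop planning metric: the mean Euclidean distance between predicted and ground-truth ego waypoints over the planning horizon. A minimal sketch, assuming (x, y) waypoints sampled at a fixed interval (the 3 s horizon and 0.5 s step here are illustrative, not taken from the paper):

```python
import numpy as np

def avg_l2_error(pred: np.ndarray, gt: np.ndarray) -> float:
    """Average L2 distance (m) between predicted and ground-truth
    trajectories, averaged over waypoints of shape (T, 2)."""
    return float(np.linalg.norm(pred - gt, axis=-1).mean())

# Toy example: a 3 s horizon sampled at 0.5 s (6 waypoints).
gt = np.array([[0.0, 1.0], [0.0, 2.1], [0.1, 3.3],
               [0.1, 4.6], [0.2, 6.0], [0.2, 7.5]])
pred = gt + np.array([0.3, -0.2])  # constant offset for illustration
print(f"avg L2: {avg_l2_error(pred, gt):.2f} m")
```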