"Countering" Musk: Altman says OpenAI has "much better" self-driving technology
36Kr · 2025-07-07 00:32
Group 1: Conflict Between OpenAI and Tesla
- The conflict between OpenAI CEO Sam Altman and Tesla CEO Elon Musk has become a hot topic in Silicon Valley, with Musk accusing Altman of deviating from OpenAI's original mission after its commercialization [1]
- Musk has filed a lawsuit against Altman for allegedly breaching the founding agreement, while also establishing xAI to compete directly with OpenAI [1]
- Altman has countered Musk's claims by revealing emails that suggest Musk attempted to take control of OpenAI and has been obstructing its progress since being denied [1]

Group 2: OpenAI's Autonomous Driving Technology
- Altman has hinted at new technology that could enable self-driving capabilities for standard cars, claiming it to be significantly better than current approaches, including Tesla's Full Self-Driving (FSD) [3][4]
- However, Altman did not provide detailed information about this technology or a timeline for its development, indicating that it is still in the early stages [5]
- The technology is believed to involve OpenAI's Sora video software and its robotics team, although OpenAI has not previously worked on autonomous driving directly [6][7]

Group 3: Sora and Its Implications for Autonomous Driving
- Sora, a video generation model released by OpenAI, can create high-fidelity videos from text input and is seen as a potential tool for simulating and training autonomous driving systems [10]
- While Sora's generated videos may not fully adhere to physical principles, they could still provide valuable data for training models, particularly in extreme scenarios [10][11]
- The concept of "world models" in autonomous driving aligns with Sora's capabilities, as it aims to help AI systems understand the physical world and improve driving performance [11][21]

Group 4: OpenAI's Investments and Collaborations
- OpenAI has invested in autonomous driving companies, including a $5 million investment in Ghost Autonomy, which later failed, and a partnership with Applied Intuition to integrate AI technologies into modern vehicles [12][15]
- The collaboration with Applied Intuition focuses on enhancing human-machine interaction rather than direct autonomous driving applications [15]
- OpenAI's shift towards multi-modal and world models indicates a strategic expansion into spatial intelligence, which could eventually benefit autonomous driving efforts [16][24]

Group 5: Industry Perspectives on AI and Autonomous Driving
- Experts in the AI field, including prominent figures like Fei-Fei Li and Yann LeCun, emphasize the need for AI to possess a deeper understanding of the physical world to effectively drive vehicles [19][20]
- NVIDIA's introduction of the Cosmos world model highlights the industry's focus on creating high-quality training data for autonomous systems, which could complement OpenAI's efforts [22][24]
- The autonomous driving market is recognized as a multi-trillion-dollar opportunity, making it a critical area for competition between companies like OpenAI and Tesla [24]
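The idea of using a generative world model as a training-data source for rare driving scenarios reduces to a simple loop: prompt the model for clips of an extreme situation, filter out physically implausible ones, and add the rest to the training pool. The sketch below is purely illustrative; the `generate` and `plausible` callables are hypothetical stand-ins, since OpenAI has published no such pipeline:

```python
from typing import Any, Callable, List

def synthesize_rare_scenarios(
    prompts: List[str],
    generate: Callable[[str], Any],    # stand-in for a text-to-video world model
    plausible: Callable[[Any], bool],  # stand-in for a physics-consistency check
    per_prompt: int = 3,
) -> List[Any]:
    """Generate candidate clips per rare-scenario prompt; keep plausible ones."""
    pool = []
    for prompt in prompts:
        for i in range(per_prompt):
            clip = generate(f"{prompt} #{i}")
            if plausible(clip):  # discard clips that violate physics checks
                pool.append(clip)
    return pool

# Toy stand-ins: "clips" are just strings; reject clips tagged implausible.
clips = synthesize_rare_scenarios(
    ["pedestrian darting from occlusion", "black ice at night"],
    generate=lambda p: p,
    plausible=lambda c: "ice" not in c,
)
print(len(clips))  # 3
```

The filtering step matters because, as the summary notes, generated videos may not fully adhere to physical principles.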
The "Whampoa Military Academy" of autonomous driving: a place obsessed with technology~
自动驾驶之心· 2025-07-06 12:30
Core Viewpoint
- The article discusses the transition of autonomous driving technology from Level 2/3 (assisted driving) to Level 4/5 (fully autonomous driving), highlighting the challenges and opportunities in the industry as well as the evolving skill requirements for professionals in the field [2]

Industry Trends
- The shift towards high-level autonomous driving is creating a competitive landscape in which traditional sensor-based approaches, such as LiDAR, are being challenged by cost-effective vision-based solutions like those from Tesla [2]
- The demand for skills in reinforcement learning and advanced perception algorithms is increasing, creating a sense of urgency among professionals to upgrade their capabilities [2]

Talent Market Dynamics
- The article notes growing anxiety among seasoned professionals as they adapt to new technologies and methodologies, while newcomers struggle with the overwhelming number of career paths available in the autonomous driving sector [2]
- The reduction in LiDAR costs, exemplified by Hesai Technology's price drop to $200 and BYD's 70% price reduction, indicates a market shift that requires continuous learning and adaptation from industry professionals [2]

Community and Learning Resources
- The "Autonomous Driving Heart Knowledge Planet" aims to create a comprehensive learning community for professionals, offering resources and networking opportunities to help individuals navigate the rapidly changing landscape of autonomous driving technology [7]
- The community has attracted nearly 4,000 members and over 100 industry experts, providing a platform for knowledge sharing and career advancement [7]

Technical Focus Areas
- The article outlines several key technical areas within autonomous driving, including end-to-end driving systems, perception algorithms, and the integration of AI models for improved performance [10][11]
- It emphasizes the importance of understanding subfields such as multi-sensor fusion, high-definition mapping, and AI model deployment, which are critical for the development of autonomous driving technologies [7]
Latest survey: learning embodied intelligence from physical simulation and world models
自动驾驶之心· 2025-07-05 13:41
Core Viewpoint
- The article focuses on the advancements in embodied intelligence within robotics, emphasizing the integration of physical simulators and world models as crucial for developing robust embodied intelligence [3][5]

Group 1: Embodied Intelligence and Robotics
- Embodied intelligence is highlighted as a key area of research, emphasizing the importance of physical interaction with the environment for perception, action, and cognition [5]
- The article discusses the necessity for a scientific and reasonable grading system for robotic intelligence, especially in dynamic and uncertain environments [5][6]
- A proposed grading model for intelligent robots includes five progressive levels (IR-L0 to IR-L4), covering autonomy and task-handling capabilities [6][10]

Group 2: Grading System for Intelligent Robots
- The grading system categorizes robots based on their task execution capabilities, decision-making depth, interaction complexity, and ethical cognition [7][10]
- Key dimensions for grading include autonomy, task-processing ability, environmental adaptability, and social cognition [11]

Group 3: Physical Simulators and World Models
- The article reviews the complementary roles of physical simulators and world models in enhancing robot autonomy, adaptability, and generalization capabilities [3][72]
- A resource repository is maintained to provide comprehensive insights into the development of embodied AI systems and future challenges [3]

Group 4: Key Technologies and Trends
- The advancements in robotics include the integration of technologies such as model predictive control, reinforcement learning, and imitation learning to enhance robot capabilities [24][25]
- The article discusses the evolution of world models, which simulate real-world dynamics and improve the robustness of robotic systems [45][60]

Group 5: Future Directions and Challenges
- Future directions include the development of structured world models, multi-modal integration, and lightweight models for efficient inference [73][72]
- The challenges faced by the industry include high-dimensional perception, causal reasoning, and real-time processing requirements [71][73]
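The five-level grading idea can be captured as a small ordered data structure. The one-line capability summaries below are illustrative paraphrases built from the dimensions the article names (autonomy, task handling, adaptability, social cognition), not the survey's exact level definitions:

```python
from enum import IntEnum

class IRLevel(IntEnum):
    """Progressive grading for intelligent robots, IR-L0 (lowest) to IR-L4.

    Capability notes are illustrative assumptions, not the survey's wording.
    """
    IR_L0 = 0  # scripted execution, no autonomy
    IR_L1 = 1  # reactive behavior in constrained settings
    IR_L2 = 2  # partial autonomy for bounded tasks
    IR_L3 = 3  # broad autonomy with environmental adaptation
    IR_L4 = 4  # full autonomy, including social and ethical cognition

def at_least(robot: IRLevel, required: IRLevel) -> bool:
    """Check whether a robot's grade meets a task's required level."""
    return robot >= required

print(at_least(IRLevel.IR_L3, IRLevel.IR_L2))  # True
```

Using `IntEnum` keeps the levels comparable with ordinary `<`/`>=` operators, which is the natural way to express "this task needs at least IR-L2".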
I had decided to go into embodied intelligence, but now I'm having second thoughts...
自动驾驶之心· 2025-07-05 09:12
Core Insights
- The article discusses the evolving landscape of embodied intelligence, highlighting its transition from a period of hype to a more measured approach as the technology matures; it is not yet at a productivity stage [2]

Group 1: Industry Trends
- Embodied intelligence has gained significant attention over the past few years, but the industry is now recognizing that it is still in the early stages of development [2]
- There is a growing demand for skills in multi-sensor fusion and robotics, particularly in areas like SLAM and ROS, which are crucial for engaging with embodied intelligence [3][4]
- Many companies in the robotics sector are rapidly developing, with numerous startups receiving substantial funding, indicating a positive outlook for the industry in the coming years [3][4]

Group 2: Job Market and Skills Development
- The job market for algorithm positions is competitive, with a focus on cutting-edge technologies such as end-to-end models, VLA, and reinforcement learning [3]
- Candidates with a background in robotics and a solid understanding of the latest technologies are likely to find opportunities, especially as traditional robotics remains a primary product line [4]
- The article encourages individuals to enhance their technical skills in robotics and embodied intelligence to remain competitive in the job market [3][4]

Group 3: Community and Resources
- The article promotes a community platform that offers resources for learning about autonomous driving and embodied intelligence, including video courses and job postings [5]
- The community aims to gather professionals and students interested in smart driving and embodied intelligence, fostering collaboration and knowledge sharing [5]
- The platform provides access to the latest industry trends, technical discussions, and job opportunities, making it a valuable resource for those looking to enter or advance in the field [5]
Think it through before acting: embodied intelligence also needs to imagine the future and execute the best option | RSS 2025
机器之心· 2025-07-05 05:53
Core Viewpoint
- The article discusses FOREWARN, a new framework that combines world models and multimodal language reasoning to enhance the deployment intelligence of robotic systems, enabling them to make real-time decisions without additional data collection [5][21]

Group 1: Research Background
- The first author, Wu Yilin, is a second-year PhD student at Carnegie Mellon University, focusing on object manipulation and lifelong learning in robotics [1]
- The second author, Tian Ran, is a PhD candidate at UC Berkeley and a research scientist at NVIDIA, working on the safe and reliable application of foundation models in robotics [2]

Group 2: Challenges in Deployment Intelligence
- Current embodied intelligence models often struggle in real-world deployments because they cannot adapt to environmental disturbances and variations in user preferences, leading to execution failures [3][21]
- The two main challenges in deployment are predicting the future consequences of actions and evaluating the predicted outcomes against task goals and user preferences [8][10]

Group 3: FOREWARN Framework
- The framework consists of two modules, Foresight (simulating future outcomes) and Forethought (evaluating those outcomes), allowing for a more structured decision-making process [11]
- The system uses a world model to predict environmental changes for candidate actions and employs a fine-tuned multimodal language model to interpret these predictions semantically [12][18]

Group 4: Innovation Highlights
- The framework achieves cross-modal alignment between the world model's predictions and the language model's understanding, closing the reasoning loop from perception to decision-making [18]
- FOREWARN automates the decision-making process, significantly reducing deployment barriers and labor costs by enabling real-time selection of optimal action plans [19]

Group 5: Performance Evaluation
- FOREWARN improved the success rate of robotic tasks from below 30% to 70%-80%, demonstrating its effectiveness in adapting to changing task instructions and user preferences [21]
- Even under varying conditions, the system maintained a success rate of 60%-80%, showcasing its robustness and adaptability [21]

Group 6: Future Directions
- The research team identifies three challenges for broader application: enhancing the diversity and generalization of underlying policies, addressing data scarcity, and optimizing reasoning efficiency and computational cost [23]
- Ongoing advances in multimodal language models and world models are expected to further enhance the deployment intelligence of robots, enabling them to autonomously select safe and reasonable operational plans from natural language instructions [23]
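The select-then-execute loop described above (Foresight rollouts scored by a Forethought evaluator) can be sketched in a few lines. The `world_model` and `evaluator` callables here are hypothetical stand-ins for the paper's learned modules, not the authors' actual interfaces:

```python
from typing import Any, Callable, Sequence, Tuple

def forewarn_select(
    candidates: Sequence[Any],
    world_model: Callable[[Any], Any],       # Foresight: imagine the outcome
    evaluator: Callable[[Any, str], float],  # Forethought: judge the outcome
    task_goal: str,
) -> Tuple[Any, float]:
    """Return the candidate plan whose predicted outcome scores highest
    against the task goal, without executing anything on the robot."""
    best_plan, best_score = None, float("-inf")
    for plan in candidates:
        predicted_outcome = world_model(plan)
        score = evaluator(predicted_outcome, task_goal)
        if score > best_score:
            best_plan, best_score = plan, score
    return best_plan, best_score

# Toy stand-ins: outcomes are labels; the evaluator rewards goal overlap.
plans = ["grasp-left", "grasp-right", "push"]
wm = lambda plan: f"outcome-of-{plan}"
ev = lambda outcome, goal: 1.0 if goal in outcome else 0.0
best, score = forewarn_select(plans, wm, ev, "grasp-right")
print(best)  # grasp-right
```

The point of the structure is that scoring happens on *predicted* outcomes, so a bad plan is rejected before it ever reaches the robot.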
750 cities + 5,000 hours of first-person video: Shanghai AI Lab open-sources a high-quality video dataset for world exploration
量子位· 2025-07-05 04:03
Core Viewpoint
- The Sekai project aims to create a high-quality video dataset that serves as a foundation for interactive video generation, visual navigation, and video understanding, emphasizing the importance of high-quality data in building world models [1][2]

Group 1: Project Overview
- The Sekai project is a collaborative effort involving institutions such as Shanghai AI Lab, Beijing Institute of Technology, and the University of Tokyo, focusing on world exploration through a continuously iterated high-quality video dataset [2]
- The dataset includes over 5,000 hours of first-person walking and drone footage from more than 750 cities across 101 countries, featuring detailed labels such as text descriptions, location, weather, time, crowd density, scene type, and camera trajectory [2][10]

Group 2: Dataset Composition
- Sekai consists of two complementary datasets: Sekai-Real, which focuses on real-world videos sourced from YouTube, and Sekai-Game, which includes high-fidelity game footage [3]
- Sekai-Real was created from over 8,600 hours of YouTube videos, ensuring a minimum resolution of 1080p and a frame rate above 30 FPS, with all videos published within the last three years [3][5]
- Sekai-Game was developed from over 60 hours of gameplay in the high-fidelity game "Lushfoil Photography Sim," capturing realistic lighting effects and consistent image formats [3][9]

Group 3: Data Processing and Quality Control
- The data collection process gathered 8,623 hours of video from YouTube and over 60 hours from games, followed by a preprocessing phase that yielded 6,620 hours of Sekai-Real and 40 hours of Sekai-Game [5][6]
- Video annotation for Sekai-Real used large visual language models for efficient labeling, and the dataset underwent rigorous quality control, including brightness assessment and video quality scoring [7][8]
- The final dataset features segments ranging from 1 minute to nearly 6 hours, with an average length of 18.5 minutes, and includes structured location information and detailed content classification [10]

Group 4: Future Goals
- The Sekai team aims to leverage this dataset to advance world modeling and multimodal intelligence, supporting applications in world generation, video understanding, and autonomous navigation [10]
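The stated intake filters for Sekai-Real (minimum 1080p, frame rate above 30 FPS, published within the last three years) amount to a simple predicate over video metadata. The field names and the three-year cutoff arithmetic below are assumptions for illustration, not the project's actual pipeline code:

```python
from datetime import date, timedelta

def keep_for_sekai_real(meta: dict, today: date) -> bool:
    """Apply the stated Sekai-Real intake filters to one video's metadata.

    Field names ("height", "fps", "published") are illustrative assumptions.
    """
    return (
        meta["height"] >= 1080                                     # >= 1080p
        and meta["fps"] > 30                                       # above 30 FPS
        and meta["published"] >= today - timedelta(days=3 * 365)   # ~last 3 years
    )

videos = [
    {"height": 1080, "fps": 60, "published": date(2024, 5, 1)},
    {"height": 720,  "fps": 60, "published": date(2024, 5, 1)},  # too low-res
    {"height": 1080, "fps": 30, "published": date(2024, 5, 1)},  # FPS not above 30
]
kept = [v for v in videos if keep_for_sekai_real(v, date(2025, 7, 5))]
print(len(kept))  # 1
```

Filtering 8,600+ hours down to 6,620 hours is exactly this kind of predicate applied at scale, followed by the quality-scoring passes the summary mentions.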
Latest survey: learning embodied intelligence from physical simulators and world models
具身智能之心· 2025-07-04 09:48
Author: Xiaoxiao Long et al. | Editor: 具身智能之心

Motivation and Background

This survey focuses on frontier progress in embodied intelligence for robotics research, arguing that the key to achieving robust embodied intelligence lies in the integration of physical simulators and world models. Physical simulators provide controllable, high-fidelity environments for training and evaluating robotic agents, while world models give robots internal representations of the environment to support prediction, planning, and decision-making.

The survey systematically reviews recent progress, analyzes the complementary roles of the two in enhancing robot autonomy, adaptability, and generalization, and explores the interplay between external simulation and internal modeling to bridge the gap between simulated training and real-world deployment. The authors also maintain a repository of up-to-date literature and open-source projects at https://github.com/NJU3DV-LoongGroup/Embodied-World-Models-Survey, intended to provide a comprehensive perspective on the development of embodied AI systems and to clarify future challenges.

Introduction

With the development of artificial intelligence and robotics, interaction between agents and the physical world has become a research ...
Xiaomi experienced & campus hiring | Algorithm Researcher, Autonomous Driving and Robot Embodied Intelligence (VLA direction)
具身智能之心· 2025-07-03 13:36
Job Description

We are looking for an outstanding researcher/scientist to join our frontier exploration team and help define and build the next-generation "brain" for autonomous driving and robotics. You will work on groundbreaking research into an Embodied Foundation Model that deeply integrates vision-language-action (VLA) capabilities and possesses superior spatial perception and spatial reasoning.

Core responsibilities include:
- Frontier algorithm research and construction: design and implement leading embodied multimodal large models. Your research will go beyond existing VLA frameworks to explore how to build a World Model capable of understanding the complex three-dimensional world and performing long-horizon, multi-step task planning.
- Core model capability breakthroughs: lead breakthroughs in the following key capabilities:
  - Multimodal scene understanding: fuse vision, language, radar, and other multi-source information to achieve deep understanding and spatial perception of dynamic, open environments.
  - Learning and adaptation mechanisms: study reinforcement learning (RL), imitation learning (IL), and self-supervised learning so the model can continuously learn and evolve from massive data and interaction with the environment.
- Technical vision and roadmap: lead the construction of a generalizable, high-efficiency embodied foundation model to underpin the technical evolution of the next 1-3 years, and explore its unified application potential in autonomous driving and general robotics.
- Complex semantic reasoning and decision-making: enable the model to understand vague, abstract human instructions and combine them with ...
A first! World model and action model fused: the fully autoregressive WorldVLA is here
机器之心· 2025-07-03 08:01
Core Viewpoint
- Alibaba's Damo Academy has introduced WorldVLA, a model that integrates a World Model and an Action Model into a unified autoregressive framework, enhancing understanding and generation across text, images, and actions [1][4]

Summary by Sections

Research Overview
- The development of Vision-Language-Action (VLA) models has become a significant focus in robotic action modeling, typically built on large-scale pretrained multimodal language models (MLLMs) with added action output capabilities [4]
- Existing VLA models often lack a deep understanding of actions, treating them merely as output rather than analyzing them as input [5]

Model Description
- WorldVLA addresses the limitations of both VLA and World Models by using a unified autoregressive mechanism for action and image understanding and generation [5][10]
- It employs three independent encoders for processing images, text, and action data, sharing the same vocabulary to facilitate cross-modal tasks [12]

Mechanism and Strategy
- The World Model component generates visual representations based on input actions, learning the physical dynamics of the environment, while the Action Model enhances visual understanding [7]
- An action attention masking strategy is introduced to mitigate error accumulation during the generation of multiple actions, significantly improving performance in action chunking tasks [8][14]

Experimental Results
- In the LIBERO benchmark, WorldVLA achieved a 4% improvement in grasp success rate compared to traditional action models and a 10% reduction in Fréchet Video Distance (FVD) compared to traditional world models [8]
- The attention mask strategy improved grasp success rates by 4% to 23% in action chunking tasks [8]

Comparative Analysis
- WorldVLA outperformed other models on various metrics, demonstrating its effectiveness in integrating action and world modeling [18]
- The model's ability to generate the next frame based on actions and images showcases its advanced capabilities in visual prediction [24]
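A minimal sketch of the action attention masking idea: when decoding a chunk of actions autoregressively, each action token is blocked from attending to earlier action tokens (so their errors cannot propagate) while still attending to image and text context. The token-type layout and boolean mask convention here are illustrative assumptions, not Damo Academy's implementation:

```python
from typing import List

def action_chunk_mask(token_types: List[str]) -> List[List[bool]]:
    """token_types: sequence of 'img' | 'txt' | 'act' in decoding order.
    Returns mask[i][j] == True when token i may attend to token j.
    Causal overall, but action tokens mask out *earlier* action tokens
    to stop error accumulation across a generated action chunk."""
    n = len(token_types)
    mask = [[False] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):  # causal: attend only to positions <= i
            if token_types[i] == "act" and token_types[j] == "act" and j < i:
                continue  # block attention to previous, possibly wrong, actions
            mask[i][j] = True
    return mask

m = action_chunk_mask(["img", "txt", "act", "act", "act"])
# The action at position 3 sees img, txt, and itself, but not the action at 2.
print(m[3])  # [True, True, False, True, False]
```

Conditioning every action on the clean visual/text context rather than on earlier generated actions is what makes the reported action-chunking gains plausible: a mistake in one action no longer contaminates the next.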
What does the "grandfather" of Chinese cars look like? 70 years of change, gone in a flash!
电动车公社· 2025-07-02 15:59
Core Viewpoint
- The article emphasizes the evolution of the Chinese automotive industry, highlighting its journey from manual craftsmanship to becoming the world's largest producer and exporter of automobiles, and the current advancements in technology and culture within the sector [1]

Group 1
- The Beijing Automobile Museum serves as a platform to reflect on the history of Chinese automotive development and its cultural roots [1]
- The article mentions the significance of national-level models in understanding the progress of the automotive industry in China [1]
- There is a focus on the future of new energy vehicles and the direction of automotive culture in China [1]

Group 2
- The article introduces recent vehicle launches, specifically mentioning the Xiaopeng G7, indicating ongoing innovation in the market [3]
- It discusses the new national standards for batteries, suggesting regulatory changes that could impact the industry [3]
- The concept of world models and the underlying logic of AI and intelligent driving are explored, indicating a shift towards advanced technology in automotive operations [3]