Spatial Intelligence
After Fei-Fei Li's 3D world model went viral, China's first free version is here: I got to play a do-as-I-please creator
36Kr · 2025-12-22 09:21
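Remember Fei-Fei Li's "3D world generation model" that flooded AI circles a while back? Now a homegrown Chinese version has finally arrived. Last week, while news that Tencent had officially announced Yao Shunyu's arrival was making the rounds, the Tencent Hunyuan team quietly launched World Model 1.5 (TencentHY WorldPlay), the first real-time world model in China open for public trial.
What is a world model? Put simply: you enter a few sentences or an image, and the AI generates a virtual world you can "walk into and play in". It is not a video you can only watch, but a 3D space you can control in real time with a keyboard, a mouse, or even a gamepad.
[Image: a game scene generated from a first-frame image]
What are the highlights this time:
Dizzy from all these obscure technical terms? APPSO will take you straight into hands-on play below and create some wildly imaginative "worlds".
Online demo: https://3d.hunyuan.tencent.com/sceneTo3D?tab=worldplay
Text → world: the thrill of playing "creator"
The first thing I noticed on opening the page was that the interface is styled as a retro television set. Think back to watching TV as children: we could only watch whatever CCTV aired or whatever Hunan TV broadcast, and however we changed channels with the remote, we could not escape the pre-arranged programme schedule. Now there is no need to wait for the 8 p.m. prime-time slot or for a director to finish shooting; you are the chief director of this world. Want to ride a roller coaster? Type a few words and generate it. Want to go back to the turn of the millennium ...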
HKU-led DrivePI: a spatially intelligent 4D MLLM unifying understanding, perception, prediction, and planning for autonomous driving
自动驾驶之心· 2025-12-22 09:20
Core Viewpoint
- DrivePI is introduced as a novel unified, spatially aware 4D multimodal large language model (MLLM) framework that integrates coarse-grained language understanding with fine-grained 3D perception capabilities, bridging the gap between vision-based and VLA paradigms in autonomous driving [2][38].
Group 1: Project Overview
- DrivePI is a collaborative project led by the University of Hong Kong, with contributions from companies such as Huawei and universities such as Tianjin University and Huazhong University of Science and Technology [2].
- The model is designed to perform spatial understanding, 3D perception, prediction, and planning tasks through end-to-end optimization, showcasing its capability to handle complex autonomous driving scenarios [4][6].
Group 2: Technical Innovations
- DrivePI takes a multimodal perception approach, using LiDAR alongside camera images to enhance spatial understanding and provide accurate 3D geometric information [11].
- The model generates intermediate fine-grained 3D perception and prediction representations, ensuring reliable spatial awareness and enhancing the interpretability and safety of autonomous driving systems [11].
- A rich data engine integrates 3D occupancy and flow representations into natural language scene descriptions, allowing the model to understand complex spatiotemporal dynamics [11].
Group 3: Performance Metrics
- DrivePI outperforms existing VLA models, achieving a 2.5% higher average accuracy on nuScenes-QA than OpenDriveVLA-7B and cutting the collision rate by 70%, from 0.37% to 0.11% [5][16].
- In 3D occupancy and flow prediction, DrivePI achieved 49.3% OccScore and 49.3% RayIoU, surpassing the FB-OCC method by 10.3 percentage points [15][21].
- The model showed a 32% reduction in L2 error for trajectory planning compared with VAD, demonstrating its effectiveness in planning tasks [16].
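The headline numbers in Group 3 mix relative reductions and percentage-point gains, so a quick arithmetic check helps keep them apart. The sketch below is only illustrative: the helper names `relative_reduction` and `baseline_from_gain` are hypothetical, and it assumes the 10.3-point gain over FB-OCC applies to the reported 49.3% figure, which the summary does not spell out.

```python
def relative_reduction(before: float, after: float) -> float:
    """Fractional drop from `before` to `after` (0.70 means a 70% reduction)."""
    if before <= 0:
        raise ValueError("before must be positive")
    return (before - after) / before


def baseline_from_gain(score: float, gain_points: float) -> float:
    """Baseline score implied by an absolute gain given in percentage points."""
    return score - gain_points


# Collision rate reported as falling from 0.37% to 0.11%.
collision_drop = relative_reduction(0.37, 0.11)

# Assumption: the 10.3-point margin over FB-OCC refers to the 49.3% score.
implied_fb_occ = baseline_from_gain(49.3, 10.3)

print(f"collision-rate reduction: {collision_drop:.1%}")
print(f"implied FB-OCC score: {implied_fb_occ:.1f}%")
```

The collision figures work out to a relative reduction of about 70.3%, which matches the rounded "70%" in the summary, while the FB-OCC comparison is an absolute difference in percentage points rather than a relative one.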
Group 4: Data Engine and Annotation
- The data engine for DrivePI operates in three main stages, focusing on generating diverse question-answer pairs for 4D spatial understanding and planning reasoning [12][18].
- Scene understanding annotations are generated to avoid confusion in distinguishing different views, enhancing the model's ability to interpret various perspectives [18].
Group 5: Ablation Studies and Insights
- Ablation studies indicate that combining text and visual heads improves performance across most tasks, demonstrating the effectiveness of unifying text understanding with 3D perception, prediction, and planning [23].
- The impact of different text data scales was explored, revealing significant improvements in occupancy state prediction accuracy when increasing the training data size [26].
Group 6: Future Prospects
- DrivePI is expected to inspire future research directions in autonomous driving by enhancing the interpretability and decision-making capabilities of systems through language reasoning and detailed 3D outputs [38].
Zhao Hejuan's exclusive conversation with Fei-Fei Li: "I believe in humanity, not AI"
Xin Lang Cai Jing· 2025-12-22 05:27
Core Insights
- The article discusses advances in AI, particularly in "world models" and spatial intelligence, led by Professor Fei-Fei Li and her company World Labs; the field is expected to see significant application-level breakthroughs within two years [2][5].
Group 1: AI Developments
- Professor Fei-Fei Li's World Labs has launched Marble, the first commercial "world model", which generates persistent, navigable, and geometrically consistent 3D worlds from images or text prompts [4][5].
- "World models" are becoming a competitive frontier in the industry, with companies like Google DeepMind releasing models that emphasize interactive environments and spatial understanding [5][6].
- The transition from "language generation" to "world generation" is anticipated to accelerate, with spatial intelligence expected to experience an application-level explosion in the next two years [5][6].
Group 2: Historical Context and Impact
- The article reflects on the historical significance of the ImageNet project, which was pivotal in demonstrating the importance of large datasets in AI development and laid the groundwork for advances in generative AI [2][3][29].
- Li's leadership of the ImageNet initiative has been recognized as a milestone in the evolution of AI, showcasing the critical role of data alongside algorithms in enhancing AI capabilities [3][29].
Group 3: Challenges and Future Directions
- The development of spatial intelligence faces a "data bottleneck" that hampers the advancement of world models, because collecting spatial data is inherently more complex than collecting visual or textual data [32][37].
- Li emphasizes the need for patience in the AI field, acknowledging that while expectations for rapid advances are high, meaningful progress often takes time [6][20].
- The article suggests that the journey towards Artificial General Intelligence (AGI) is incremental, with spatial intelligence a crucial component of this ongoing quest [25][26].
New SOTA in complex spatial reasoning, with a 55% performance gain! SpatialDreamer, new work from Sun Yat-sen University
具身智能之心· 2025-12-22 01:22
Core Insights
- The article introduces SpatialDreamer, a framework developed by researchers from Sun Yat-sen University and MBZUAI that improves performance on complex spatial tasks through active mental imagery and spatial reasoning [1][4].
Group 1: Limitations of Current Models
- Despite significant advances in multimodal large language models (MLLMs) for scene understanding, their performance remains limited on complex spatial reasoning tasks that require mental simulation [2].
- Existing methods rely primarily on passive observation of spatial data and lack the distinctly human ability to imagine actively and update internal representations dynamically [3].
Group 2: SpatialDreamer Framework
- SpatialDreamer simulates human spatial cognition through a closed-loop reasoning process consisting of three steps: exploration, imagination, and reasoning [6].
- In the exploration phase, the model determines the best egocentric action for the current scene, such as "move forward 0.75 meters" or "turn left 45 degrees" [6].
- In the imagination phase, a world model generates the new-viewpoint image that results from executing the action [6].
- In the reasoning phase, the model integrates all accumulated visual evidence to produce a final answer [6].
Group 3: GeoPO Policy Optimization
- To address sparse rewards in long-sequence reasoning tasks, the research team introduced GeoPO, a policy optimization method that combines tree-structured sampling with geometric consistency constraints [8].
- Tree sampling allows multiple action branches at each step, supporting backtracking and multi-path exploration [8].
- A multi-level reward design merges task-level and step-level rewards to provide fine-grained feedback [8].
- A geometric penalty mechanism penalizes redundant or conflicting actions, encouraging efficient trajectory generation [8].
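The exploration, imagination, and reasoning loop described in Group 2 can be sketched as a toy program. This is a minimal illustration under loud assumptions, not SpatialDreamer's implementation: `Pose`, `explore`, `imagine`, `reason`, and `run` are hypothetical names, the policy is a fixed two-action script, and the world model is replaced by a simple pose update instead of image generation.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Pose:
    x: float = 0.0
    y: float = 0.0
    heading_deg: float = 0.0  # 0 faces +x; counter-clockwise is positive


def explore(step: int):
    """Hypothetical policy: choose the next egocentric action (fixed script)."""
    script = [("turn_left", 45.0), ("forward", 0.75)]
    return script[step] if step < len(script) else None


def imagine(pose: Pose, action) -> Pose:
    """Stand-in for the world model: returns the new viewpoint, not an image."""
    kind, amount = action
    if kind == "turn_left":
        return Pose(pose.x, pose.y, (pose.heading_deg + amount) % 360.0)
    if kind == "forward":
        rad = math.radians(pose.heading_deg)
        return Pose(pose.x + amount * math.cos(rad),
                    pose.y + amount * math.sin(rad),
                    pose.heading_deg)
    raise ValueError(f"unknown action: {kind}")


def reason(evidence: list) -> str:
    """Stand-in for the final step: answer from all accumulated views."""
    return f"answered from {len(evidence)} imagined views"


def run(max_steps: int = 4):
    """Closed loop: explore -> imagine -> accumulate evidence -> reason."""
    pose, evidence = Pose(), []
    for step in range(max_steps):
        action = explore(step)
        if action is None:  # the policy decides it has seen enough
            break
        pose = imagine(pose, action)
        evidence.append((action, pose))
    return pose, reason(evidence)


final_pose, answer = run()
print(final_pose, answer)
```

In this sketch the loop stops when the policy returns no further action; a learned policy would instead decide when the accumulated views suffice, and a GeoPO-style penalty could be applied to redundant entries in `evidence`.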
Group 4: Performance Validation
- SpatialDreamer was validated across multiple spatial reasoning benchmarks, achieving state-of-the-art (SOTA) results on the SAT benchmark with average accuracies of 93.9% on real images and 92.5% on synthetic images [13].
- On the MindCube-Tiny benchmark, it achieved an overall accuracy of 84.9%, surpassing the baseline Qwen2.5-VL-7B by over 55% [13].
- On VSI-Bench, it came out ahead on tasks such as object counting, relative direction, and path planning, with an average accuracy of 62.2% [13].
Group 5: Significance of SpatialDreamer
- The significance of SpatialDreamer lies not only in improving spatial reasoning accuracy but also in demonstrating that MLLMs can enhance their reasoning through "imagination", marking a significant step towards human-like spatial intelligence [14].
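"One Brain, Many Forms" roundtable: what concrete progress have world models and spatial intelligence made in embodied intelligence? | GAIR 2025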
雷峰网· 2025-12-20 04:07
Core Viewpoint
- The article discusses the current state and future potential of embodied intelligence, focusing on the challenges and opportunities presented by world models and spatial intelligence in the field of robotics and AI [2][4][10].
Group 1: Development of Embodied Intelligence
- The technology route for embodied intelligence is still in an exploratory phase, with no convergence yet, which is seen as a positive sign for innovation [4][3].
- There is a consensus among experts that the core issues of embodied intelligence, such as interaction and human-machine collaboration, should be addressed by academic institutions, while industries focus on practical applications [4][5].
- The integration of AI with physical entities is expected to lead to significant advancements in intelligence, but the field must avoid reverting to industrial automation without achieving generalized intelligence [4][5][30].
Group 2: World Models in Autonomous Driving
- World models are currently being utilized by leading companies like Tesla to enhance data generation and improve decision-making processes through closed-loop testing [11][12].
- The concept of world models has gained traction in autonomous driving due to the simplicity of generating scenarios compared to robotics, with advancements in generative AI enabling the creation of realistic training samples [12][13].
- There is ongoing debate regarding the definition and application of world models in both autonomous driving and robotics, with differing opinions on the necessity of pixel-level reconstruction versus latent state representation [12][13][14].
Group 3: Spatial Intelligence in Robotics
- Spatial intelligence is a critical aspect of robotics, with a focus on perception and understanding spatial relationships, which has evolved from traditional SLAM techniques to more learning-based approaches [20][21].
- The current challenges in spatial intelligence include the need for better data representation and understanding of complex spatial relationships, which are still underdeveloped in robotic systems [22][23].
- The integration of visual and semantic information is essential for enhancing robots' spatial capabilities, but the field is still in its early stages [22][23][24].
Group 4: Commercialization and Future Applications
- The future of drone applications is expected to expand significantly, with potential uses in various sectors, but the timeline for widespread adoption remains uncertain [26][27].
- The gap between technological capabilities and market needs poses challenges for entrepreneurs, as there is often a mismatch between innovative ideas and practical industrial requirements [30][31].
- The shift towards learning-based control paradigms is anticipated to increase the applicability of drones and robots in real-world scenarios, moving beyond traditional automation [28][29].
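Letting AI "open its eyes to the world" at the forefront of international technological change: the blueprint for Shanghai's quantum city is unfolding from Fuxing Island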
Jie Fang Ri Bao· 2025-12-20 00:59
Core Insights
- The launch of the Global Maker Island and the 2025 Shanghai Quantum City Annual Conference marks a significant step in building smart infrastructure on Fuxing Island, aiming for a standard of 100,000 intelligent sensing facilities per square kilometer [1].
- The rapid evolution of next-generation artificial intelligence technologies is set to transform urban landscapes, with Fuxing Island positioned as a key player in this transformation [2].
Group 1: Artificial Intelligence and Urban Development
- The Shanghai Quantum City Time-Space Innovation Base opened in December 2024, focusing on building a "world model" for artificial intelligence, which is essential for capturing technological changes [3].
- The city aims to enhance AI's understanding of the physical world by creating training environments, such as the first heterogeneous humanoid robot training ground and the issuance of operational licenses for smart connected vehicles [3][4].
- The complexity of urban environments and AI's ability to interpret them are central to the mission of the Shanghai Quantum City [4].
Group 2: Scientific and Technological Advancements
- Leading scientists are increasingly focusing on "spatial intelligence" as the next frontier for AI, which will define the development direction for the next decade [5].
- The Shanghai Quantum City has already achieved significant milestones, including the release of a time-space data sharing platform and a specialized corpus for planning and natural resources [6].
Group 3: Talent and Innovation Ecosystem
- The construction of the Shanghai Quantum City emphasizes the importance of talent investment to gain strategic advantages in the new technological revolution [8].
- Fuxing Island is actively inviting global creators to participate in its innovation ecosystem, offering a low-cost entrepreneurial environment and support for startups [9].
- Currently, 12 well-known incubators and 14 innovative startups have officially settled on Fuxing Island, indicating a growing entrepreneurial landscape [10].
[Jinyuan People Showcase] 袋鼠云 (DTStack) CEO Ning Haiyuan: the survival and leap of the data middle platform amid the AI wave
Sou Hu Cai Jing· 2025-12-18 12:20
Core Insights
- The article emphasizes the transformation of data middle platforms from mere data managers to enablers of AI capabilities, driven by the urgent need for high-quality data supply in the era of AI technology [2][3].
Industry Trends
- The past decade has seen a shift in data infrastructure from serving only internet companies to becoming a public infrastructure for all industries, indicating a broader application of big data [2][3].
- The evolution of data platforms has moved through three phases: installation, bubble, and deployment, with the current focus on integrating AI capabilities into business processes [6][12].
Company Strategy
- The company has adopted a "one body, two wings" strategy, focusing on a multi-modal data intelligence platform as the core, with data intelligence and spatial intelligence as supporting wings [4][6].
- The transition from traditional BI tools to Data Agents is highlighted, where the latter will serve as the primary interface for business personnel, simplifying data interaction and decision-making [15][17].
Future Outlook
- The future of data middle platforms is seen as a "multi-modal data operating system," which will unify governance and management of diverse data types, essential for supporting AI applications [12][14].
- The concept of "world modeling" is expected to evolve, integrating big data, AI, and spatial intelligence into a cohesive methodology for real-world applications [18][19].
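Xiaomi MiMo large model goes into real-world use; Xiaomi's "Human-Vehicle-Home Ecosystem" Partner Conference presents new progress in the IoT platform ecosystem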
Sou Hu Wang· 2025-12-18 10:06
Core Insights
- Xiaomi successfully held the "Human-Vehicle-Home Ecosystem" Partner Conference in Beijing, showcasing its latest IoT platform capabilities and user experience innovations [1][3].
Group 1: IoT Platform Progress
- As of Q3, Xiaomi's IoT platform has surpassed 1 billion connected devices, reaching 1.04 billion units, with the Mi Home app achieving over 110 million monthly active users [3].
- The annual shipment of Xiaomi IoT modules has exceeded 10 million units for the first time, solidifying its position as a leading global smart ecosystem platform [3].
- Xiaomi has partnered with over 15,000 companies globally, including renowned brands like Miele, Bosch, Siemens, and LG, while also focusing on social responsibility initiatives [3].
Group 2: Future Innovations
- Xiaomi introduced the Xiaomi Miloco smart home exploration plan, which integrates visual perception into smart home systems, allowing users to create smart rules through natural language [3].
- The company is collaborating with leading brain-computer interface firms to enhance interaction possibilities for individuals with mobility impairments [4].
Group 3: AI and Ecosystem Integration
- The IoT Future Summit 2026 highlighted the role of AI in driving innovation across the entire ecosystem, moving beyond isolated breakthroughs to a comprehensive approach [6].
- Various partners presented advancements in smart solutions, emphasizing user experience improvements and seamless integration of devices [6][7].
- Xiaomi's IoT platform is transitioning towards "spatial intelligence," focusing on proactive decision-making through multi-modal perception and distributed computing technologies [7][11].
Group 4: User Experience Enhancements
- The IoT Ecosystem Access and Experience Innovation Forum focused on the new capabilities of the Mi Home 11.0 experience, addressing user demands for comfort, safety, and energy efficiency [9].
- Xiaomi upgraded its scene capabilities and 3D central control interactions, enhancing user experience for over 110 million monthly active users [9].
Group 5: Technical Developments
- The IoT Platform Technology Forum showcased a full-stack upgrade of Xiaomi's IoT capabilities, including the launch of the IoT-BLE 2.0 module matrix and advancements in AI-driven device interactions [11].
- The forum discussed strategies for AIoT developers in the context of global trends in security and privacy compliance [11].
Group 6: Exhibition Highlights
- The conference featured an IoT exhibition area displaying various smart home solutions, IoT connection technologies, and the overall capabilities of Xiaomi's IoT platform [13].
With Gaode Map integrated, Qianwen closes the last mile of "AI that gets work done"
华尔街见闻· 2025-12-18 09:58
Core Viewpoint
- Alibaba is strategically positioning itself in the AI landscape by integrating its services through the Qianwen app, which connects various local life services and enhances user experience by utilizing real-time data from Gaode Map [1][2][3][4].
Group 1: Integration of Services
- The Qianwen app is becoming a comprehensive entry point for Alibaba's ecosystem, effectively linking its various local life services such as Taobao, Fliggy, and Ele.me [2][3].
- Qianwen's integration with Gaode Map allows it to provide real-time recommendations and planning for restaurants, travel, and more, thus enhancing user engagement [1][3][9].
- This integration addresses the previous challenge of directing traffic to Alibaba's local services, which were previously disjointed [3][16].
Group 2: AI and Real-World Application
- The incorporation of Gaode Map into Qianwen signifies a shift from theoretical AI capabilities to practical applications that solve real-world problems [4][8][25].
- Qianwen's ability to process spatial data and provide actionable insights marks a significant advancement in AI's role in everyday life, moving beyond mere information output to actionable solutions [4][13][25].
- The app's features, such as visual understanding and weather-based recommendations, demonstrate its advanced capabilities in handling complex real-world scenarios [12][13].
Group 3: Competitive Advantage
- Alibaba's unique position combines top-tier AI model capabilities with a robust offline service fulfillment system, creating a competitive moat that is difficult for rivals to replicate [21][23].
- Unlike competitors like OpenAI and Google, which lack comprehensive service fulfillment networks, Alibaba leverages its extensive ecosystem to provide seamless user experiences [21][22][23].
- The integration of Qianwen and Gaode Map is a strategic move that enhances Alibaba's ability to capture user intent and streamline service delivery, positioning it favorably in the AI-driven market [18][25][26].
Tesla once again anticipates the direction of the tide
自动驾驶之心· 2025-12-18 09:35
Core Viewpoint
- Tesla's AI leader Ashok Elluswamy revealed the technical methodology behind Tesla's Full Self-Driving (FSD) in a recent article, emphasizing the choice of an end-to-end neural network model and addressing the challenges faced in practice [4][6].
Group 1: End-to-End Neural Network Model
- Tesla's decision to adopt an end-to-end neural network model is driven by the need to address complex driving scenarios that cannot be pre-defined by rules, such as the "trolley problem" and second-order effects [6][10].
- The end-to-end model is described as a complete overhaul of previous architectures, fundamentally changing design, coding, and validation processes, leading to a more human-like driving experience [11][19].
- The model outputs driving instructions alongside interpretable "intermediate results," utilizing technologies like generative Gaussian splatting to create dynamic 3D models of the environment in real-time [8][17].
Group 2: VLA and World Model Concepts
- VLA (Vision-Language-Action) is an extension of the end-to-end model that incorporates language information, allowing for a more visual representation of driving behavior [12][14].
- The world model aims to establish a high-bandwidth cognitive system based on video/image data, addressing the limitations of language models in understanding complex, dynamic environments [15][19].
- The relationship between end-to-end, VLA, and world models is clarified, with end-to-end serving as the foundation, VLA as an upgrade, and the world model as the ultimate form of understanding spatial dynamics [12][19].
Group 3: Industry Perspectives and Trends
- The industry is divided into three main technical routes: end-to-end, VLA, and world model, with companies like Horizon Robotics and Bosch primarily adopting end-to-end due to lower costs and higher stability [13][19].
- VLA has faced criticism from industry leaders who argue that its reliance on language models may not be essential for effective autonomous driving, emphasizing the need for spatial understanding instead [16][19].
- Tesla's recent publication has reignited discussions in the industry, positioning the company at the forefront of current technological directions and providing a systematic analysis of practical applications [20].