World Models
From ChatGPT to Marble: Is 3D World Generation the Next Breakout Fei-Fei Li Is Betting On?
Jinqiu Ji (锦秋集) · 2025-09-18 07:33
Core Viewpoint
- World Labs has launched Marble, its latest spatial intelligence model, which lets users generate persistent, navigable 3D worlds from images or text prompts. The article presents this as a significant advance in spatial intelligence technology [1][2].
Summary by Sections
Marble's Features and Comparison
- Marble improves markedly on similar products in geometric consistency, style diversity, world scale, and cross-device support, allowing users to truly "walk into" AI-generated spaces [2].
Fei-Fei Li's Vision and World Model Narrative
- Fei-Fei Li's approach emphasizes a transition from language understanding to world understanding, culminating in spatial intelligence as a pathway to AGI (Artificial General Intelligence) [3][6].
Limitations of LLMs
- While acknowledging the achievements of large language models (LLMs), Li highlights their limited understanding of the three-dimensional world and asserts that true intelligence requires spatial awareness [5][7].
The Necessity of Spatial Intelligence for AGI
- Spatial intelligence is deemed essential for AGI: the real world is inherently three-dimensional, and understanding it requires more than two-dimensional observations [16].
Evolution of AI Learning Paradigms
- The article outlines three phases in the evolution of AI learning: supervised learning, generative modeling, and the current focus on three-dimensional world models. It emphasizes the importance of data, computation, and algorithms throughout [21][24].
Data Strategy for World Models
- Training world models requires a mixed approach to data collection that combines real data acquisition, reconstruction, and simulation to overcome the scarcity of high-quality three-dimensional data [26].
Practical Applications and Development Path
- Marble's initial application focus is content production, followed by robotics and AR/VR, with an emphasis on creating interactive 3D worlds for various industries [29][30].
From MIT's Top AI Lab: OpenAI's Star Chinese Researcher Completes His PhD
36Kr · 2025-09-17 07:05
Core Insights
- Boyuan Chen, a Chinese researcher at OpenAI, recently completed his PhD at MIT in under four years, focusing on world models, embodied AI, and reinforcement learning [1][5][7].
Group 1: Academic Background and Achievements
- Chen holds a PhD in Electrical Engineering and Computer Science from MIT, with a minor in philosophy [7][24].
- At OpenAI he has been involved in significant projects, including the development of GPT image generation technology and the Sora video generation team [5][1].
- During his time at Google DeepMind, he contributed to training multimodal large language models (MLLMs) on large-scale synthetic data [7][10].
Group 2: Research Focus and Future Aspirations
- Chen emphasizes the importance of visual world models for embodied intelligence and believes that integrating the two fields will improve AI's understanding of, and interaction with, the physical world [4][7].
- He is optimistic about the future of embodied intelligence, predicting it will be a key technology of the next century, and hopes to witness the emergence of general-purpose robots [17][20].
- OpenAI is reportedly increasing its efforts in robotics, aiming to develop algorithms for controlling robots and hiring experts in humanoid robotics [20].
DeepMind's Hassabis: All of His Latest Thinking Is Here
QbitAI (量子位) · 2025-09-15 05:57
Core Insights
- The discussion emphasizes that Artificial General Intelligence (AGI) could be achieved within the next decade, which could usher in a new scientific renaissance and significant advances in fields such as energy and health [2][7][51].
- Current AI systems, while advanced, lack true creativity and the ability to generate new hypotheses, both of which are essential characteristics of AGI [5][34].
Group 1: AGI Development
- Demis Hassabis predicts that AGI could be realized around 2030, but says current AI systems are not yet at "PhD-level intelligence" because their capabilities remain limited in various domains [4][35].
- Building AGI requires a comprehensive understanding of the physical world, not just abstract domains such as language or mathematics [6][22].
- Hassabis believes the arrival of AGI will lead to a "scientific golden age" that brings immense benefits to humanity [7][51].
Group 2: DeepMind's Role
- DeepMind is viewed as a central engine within Alphabet, integrating various AI teams to develop models such as Gemini, which are now embedded in Google's ecosystem [15].
- The DeepMind team has approximately 5,000 members, primarily engineers and researchers, focused on advancing AI technologies [16].
Group 3: Innovations in AI Models
- The Genie 3 model represents a breakthrough in creating interactive virtual environments from textual descriptions, showing that it can generate realistic physical interactions [17][20].
- Hybrid models, which combine learned components with established solutions, are seen as crucial for advancing toward AGI [45][47].
Group 4: Future of Robotics
- Hassabis envisions a future in which robots understand and interact with the physical world through language commands, making them more useful in everyday tasks [23][25].
- A humanoid design is considered beneficial for navigating human environments, while specialized robots will still have their own applications [26][27].
Group 5: AI in Drug Development
- DeepMind is working to transform drug development, aiming to cut the timeline from years to weeks or days by building on breakthroughs such as AlphaFold [41][43].
- Collaborations with pharmaceutical companies are underway to advance research in areas such as cancer and immunology [44].
Group 6: Energy Efficiency and AI
- The conversation highlights the importance of energy efficiency in AI systems; advances in model architecture and hardware optimization could mitigate energy demands [49][50].
- Hassabis believes that AI's contributions to energy efficiency and to addressing climate change will outweigh its energy consumption in the long run [50].
Group 7: Creative Tools and User Experience
- Future creative tools such as Nano Banana are characterized by intuitive interaction that enables rapid iteration in the creative process [38][39].
- These tools are designed to democratize creativity, making advanced capabilities accessible to a broader audience while raising the productivity of professional creators [39][40].
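Li Auto Rolls Out OTA 8.0; Li Xiang Says Its Driver Assistance Is Now "Fully Leading." Is VLA Better Than World Models?
National Business Daily (Mei Ri Jing Ji Xin Wen) · 2025-09-12 10:06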
Core Viewpoint
- According to Li Auto, its advanced driver assistance and smart cockpit have moved from "partially leading" to "fully leading" with the OTA 8.0 update of its vehicle system [1].
Group 1: OTA 8.0 Update
- OTA 8.0 has officially launched, upgrading driver assistance, smart cockpit, and smart electric features [3].
- The new VLA (Vision-Language-Action) driver model is being rolled out to all Li MEGA and L series AD Max models [3].
- Li Auto's chairman, Li Xiang, described VLA as the third generation of the company's driver assistance technology, emphasizing its ability to understand road conditions, comprehend human commands, and remember user habits [3].
Group 2: VLA Model Features
- The current version of VLA is referred to as a "crippled version" because a highly praised feature is temporarily absent [4].
- Li Auto has acknowledged the need for caution in rolling out new features, especially after suspending the VLA remote summon function [4].
- The VLA model improves route selection accuracy in complex scenarios and remembers the user's speed preferences for specific roads [6].
Group 3: Industry Competition and Technology
- Other companies, including Yuanrong Qixing (DeepRoute.ai) and XPeng Motors, are also developing VLA models, which points to a competitive landscape for this technology [7].
- The VLA model is seen as an "intelligence-enhanced version" of end-to-end models that addresses the difficulty of handling unseen scenarios [8].
- The VLA model integrates perception, language processing, and action execution, improving its ability to understand complex environments and make decisions in them [8].
Group 4: Differing Approaches
- Huawei's approach centers on the World Action model, which skips the language processing step and emphasizes direct control from vision [12].
- The debate between VLA and world models highlights differing strategies for achieving advanced autonomous driving capabilities [12][13].
- Experts suggest that VLA and world models can coexist and complement each other, with companies choosing paths based on their specific goals [13].
Chengdu Develops China's First World-Model-Based Robot Task Execution System, Giving Humanoid Robots "Human-Like Thinking"
Sichuan Daily (Si Chuan Ri Bao) · 2025-09-12 06:23
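Reportedly, once the robot is given an image of the desired goal, it automatically assesses the current state, plans the task on its own, and executes it so that the final result matches the goal image. The system shows strong adaptability and task completion in unfamiliar environments, addresses at the source the problem of humanoid robots not being "smart" enough, and is an important step toward making robots practical and commercially viable.

According to a representative of the Chengdu Humanoid Robot Innovation Center, a world model is a system framework that genuinely approximates how the human brain thinks. By learning the physical and causal laws of the real world, it acquires a "reflex-like physical intuition": it can simulate environmental changes internally, infer future states from the current state of the environment, and evaluate the consequences of actions. Take humans as an example: when people see dark clouds gathering, they naturally anticipate that "it is about to rain," because the brain has already simulated the coming change in the weather.

In the demonstration video, the reporter saw that when the robot was given an image of a glass bottle with a straw in it as the goal, it observed the scene and saw a glass bottle without a straw. The R-WMES system then planned and generated a complete action sequence for the robot: first grasp a straw, then insert it into the glass bottle, so that the final result exactly matches the preset "glass bottle with straw" goal image.

This is China's first world-model-based robot task execution system (R-WMES), which marks, in Chengdu's core artificial intelligence and humanoid robot technology, the "world mod ...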
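The goal-image workflow described above (observe the current state, imagine the effect of candidate actions, pick a sequence that reaches the goal) can be illustrated with a toy planner. This is a minimal sketch under stated assumptions, not the R-WMES implementation: the symbolic facts, the hand-written `ACTIONS` table standing in for a learned world model, and the names `predict` and `plan` are all hypothetical.

```python
from collections import deque
from typing import Dict, FrozenSet, List, Optional, Tuple

State = FrozenSet[str]  # a state is a set of symbolic facts about the scene

# Hypothetical stand-in for a learned world model: each action maps to
# (preconditions, facts removed, facts added). A real system would learn
# these dynamics from data instead of using a hand-written table.
ACTIONS: Dict[str, Tuple[State, State, State]] = {
    "grasp_straw": (
        frozenset({"straw_on_table", "hand_empty"}),
        frozenset({"straw_on_table", "hand_empty"}),
        frozenset({"straw_in_hand"}),
    ),
    "insert_straw": (
        frozenset({"straw_in_hand", "bottle_without_straw"}),
        frozenset({"straw_in_hand", "bottle_without_straw"}),
        frozenset({"bottle_with_straw", "hand_empty"}),
    ),
    "put_down_straw": (
        frozenset({"straw_in_hand"}),
        frozenset({"straw_in_hand"}),
        frozenset({"straw_on_table", "hand_empty"}),
    ),
}


def predict(state: State, action: str) -> Optional[State]:
    """Imagine the next state; return None if the action is not applicable."""
    preconditions, removed, added = ACTIONS[action]
    if not preconditions <= state:
        return None
    return (state - removed) | added


def plan(start: State, goal: State, max_depth: int = 5) -> Optional[List[str]]:
    """Breadth-first search over imagined rollouts until the goal facts hold."""
    queue = deque([(start, [])])
    seen = {start}
    while queue:
        state, actions = queue.popleft()
        if goal <= state:
            return actions
        if len(actions) >= max_depth:
            continue
        for name in ACTIONS:
            next_state = predict(state, name)
            if next_state is not None and next_state not in seen:
                seen.add(next_state)
                queue.append((next_state, actions + [name]))
    return None


if __name__ == "__main__":
    observed = frozenset({"straw_on_table", "hand_empty", "bottle_without_straw"})
    goal_image = frozenset({"bottle_with_straw"})
    print(plan(observed, goal_image))  # ['grasp_straw', 'insert_straw']
```

The search never touches the real environment: every candidate action is evaluated only through `predict`, which is the role the article assigns to the world model. Breadth-first search is used here purely for brevity; the article does not say which planning method the system uses.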
Tesla, Huawei, and the New EV Makers Face Off: The World Model War
36Kr · 2025-09-12 02:45
Core Viewpoint
- The emergence of "World Models" has complicated the high-end intelligent driving landscape, prompting debates over the authenticity and effectiveness of approaches such as VLA and WEWA [3][5].
Group 1: Company Perspectives
- XPeng Motors claims to be the only company in China that has truly developed VLA and criticizes competitors for building modified versions [3][7].
- The CEO of Huawei's Intelligent Automotive Solutions unit stated that the company will not pursue the VLA path and will focus on World Action (WA) rather than language processing [3][5].
- Li Auto is developing a foundation model to support its MindVLA algorithm, which the article characterizes as a marketing strategy rather than a true VLA implementation [7][8].
Group 2: Technical Insights
- VLA (Vision-Language-Action) is seen as an evolution of the "end-to-end plus VLM (Vision-Language Model)" approach that addresses some limitations of earlier models [5][7].
- XPeng is developing a large-scale driving model with 72 billion parameters and uses cloud distillation to deploy smaller models to vehicles [8][15].
- The concept of World Models, initially proposed by Tesla, aims to create a virtual environment in which autonomous driving systems can learn and be validated [9][11].
Group 3: Industry Trends
- The industry is shifting from perception-driven to cognition-driven approaches, with companies exploring different architectures for intelligent driving [12][13].
- The debate over VLA versus World Models reflects a broader struggle within the industry to define the best methodology for achieving autonomous driving [17].
- Integrating cloud-based and in-vehicle models is seen as essential for optimizing perception and decision-making in autonomous systems [17].
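The summary mentions distilling a large cloud model into smaller in-vehicle models but does not say how. One common formulation is temperature-scaled knowledge distillation, where the small "student" is trained to match the softened output distribution of the large "teacher." The sketch below shows only that loss term; the article does not say which loss is actually used, and the function names and toy logits are illustrative assumptions.

```python
import math
from typing import List


def softmax(logits: List[float], temperature: float = 1.0) -> List[float]:
    """Temperature-scaled softmax; higher temperature gives a softer distribution."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the maximum for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(
    teacher_logits: List[float],
    student_logits: List[float],
    temperature: float = 2.0,
) -> float:
    """KL(teacher || student) on softened outputs, scaled by temperature squared."""
    p = softmax(teacher_logits, temperature)  # soft targets from the large model
    q = softmax(student_logits, temperature)  # predictions of the small model
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
    return temperature ** 2 * kl


if __name__ == "__main__":
    # Toy logits over three hypothetical driving actions (keep lane, brake, turn).
    teacher = [4.0, 1.0, 0.5]
    good_student = [3.8, 1.1, 0.4]
    poor_student = [0.5, 1.0, 4.0]
    print(distillation_loss(teacher, good_student))
    print(distillation_loss(teacher, poor_student))
```

A student whose outputs track the teacher's gets a loss near zero, while one that ranks the actions differently is penalized heavily. In practice this term is usually combined with a task loss on labeled data.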
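Autonomous Driving World Model Technical Discussion Group Launched
自动驾驶之心 (Heart of Autonomous Driving) · 2025-09-11 23:33
The 自动驾驶之心 world model technical discussion group has been set up, and everyone is welcome to join and discuss world-model topics together. If you are interested, add the assistant on WeChat to join the group: AIDriver005, with the note "nickname + world model group". ...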
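News Flash | One-Month-Old Embodied-AI Dark Horse Raises 200 Million RMB; China's First World-Model-Based Robot Task Execution System; MIIT: China Now Has Full-Industry-Chain Manufacturing Capability for Humanoid Robots; and More
机器人大讲堂 (Robot Lecture Hall) · 2025-09-11 12:57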
Group 1
- The Chengdu Humanoid Robot Innovation Center has developed China's first robot task execution system based on a world model (R-WMES), a significant milestone for intelligent humanoid robots [2].
- The world model framework mimics human thinking by learning physical and causal laws from the real world, enabling robots to plan and execute tasks autonomously from a target image [2].
- R-WMES shows strong adaptability and task completion in unfamiliar environments, addressing the intelligence gap in humanoid robots and accelerating their practical and commercial application [2].
Group 2
- The Ministry of Industry and Information Technology (MIIT) stated that China has established full manufacturing capability for humanoid robots, covering key chips, components, and complete machines [5].
- Since the start of the 14th Five-Year Plan, 46 cities have been supported in new technology transformation pilot projects, resulting in over 230 excellent smart factories and 1,260 5G factories [5].
- China accounts for over 50% of global industrial robot installations, and energy consumption efficiency has improved significantly for products such as steel and cement [5].
Group 3
- Xingyuan Intelligent, a company focused on embodied intelligence, has completed a 200 million RMB angel round to accelerate the development and commercialization of its embodied brain technology [6].
- The company was incubated by the Beijing Academy of Artificial Intelligence and aims to create a universal embodied brain for the physical world, drawing on a team of top talent in the field [6].
- The founding team includes experienced professionals from leading companies and is building a closed-loop ecosystem of "technical barriers + commercial realization" [6].
Group 4
- ETH Zurich has proposed a control framework for legged robots that combines reinforcement learning with multi-head attention mechanisms, enabling precise control and a 100% success rate in obstacle navigation [11].
- The method improves the robot's adaptability to complex terrain by dynamically adjusting its focus based on real-time motion states and environmental data [11].
- Both the GR-1 and ANYmal-D robots performed well in experimental and real-world environments, opening up new possibilities for practical applications [11].
Group 5
- Lifeward's seventh-generation personal exoskeleton, ReWalk 7, has received CE certification for the European market, a significant milestone for medical devices in spinal cord injury rehabilitation [12].
- ReWalk 7 features cloud connectivity, letting users control the device and track usage data through a smartwatch and mobile app and set personalized rehabilitation goals [12].
- The new system supports seamless transitions between indoor and outdoor environments and includes one-click activation for stairs and sidewalks, increasing user independence [12].
VLA: Some Call It the "Strongest Solution," Others Say It "Can't Run"
36Kr · 2025-09-11 08:17
Core Viewpoint
- The intelligent driving industry is at a critical juncture with the emergence of VLA (Vision-Language-Action) technology, and key players are divided over its potential and implementation [1][2][3].
Group 1: VLA Technology and Its Implications
- VLA is seen as a potential answer to the limitations of end-to-end systems in intelligent driving, which can address only about 90% of the challenges [6][10].
- By introducing language as a bridge, the VLA model aims to improve the system's understanding and decision-making, allowing more complex and nuanced driving actions [12][14][18].
- VLA is believed to improve three key areas: understanding dynamic traffic signals, enabling natural voice interaction, and strengthening risk prediction [19][20][21].
Group 2: Challenges and Criticisms of VLA
- Despite these potential advantages, VLA faces significant challenges, including the need for substantial financial investment and the technical difficulty of aligning multimodal data [31][32].
- Critics argue that VLA may not be necessary for reaching higher levels of autonomous driving, and some suggest it is a supplementary enhancement rather than a fundamental solution [35][36].
- The limitations of current intelligent driving chips hinder effective deployment of VLA models, raising concerns about their practicality in real-world scenarios [31][32].
Group 3: Industry Perspectives and Strategies
- Companies such as Li Auto, Yuanrong, and XPeng are betting on VLA and emphasize high investment and computational intensity in pursuing it [41][42].
- In contrast, players such as Huawei and Horizon are focusing on structural solutions and world models, arguing that these approaches may offer more reliable paths to advanced autonomous driving [43][46].
- The ongoing debate over VLA reflects broader strategic choices within the industry, with companies prioritizing different technological pathways based on their resources and market positioning [47].
2025: Who Are the Top Autonomous Driving Leaders in China's Intelligent Driving Sector?
自动驾驶之心 (Heart of Autonomous Driving) · 2025-09-10 23:33
Core Viewpoint
- The autonomous driving industry is undergoing a significant technological shift toward "end-to-end" solutions, driven by Tesla's leadership and advances in large model technologies. The shift is prompting domestic automakers to increase investment and adjust their organizational structures, making "end-to-end" a mainstream production solution by 2024 [1].
Group 1: Key Figures in Autonomous Driving
- The article highlights key figures in China's autonomous driving sector, focusing on those who directly influence technology routes and team growth [1].
- Notable leaders include:
  - **Lang Xianpeng** from Li Auto, who has led advances in assisted driving technology, including the launch of full-scene NOA and the no-map NOA feature [5].
  - **Ye Hangjun** from Xiaomi, who has been pivotal in developing Xiaomi's end-to-end driving system and has overseen multiple cutting-edge projects [7][9].
  - **Ren Shaoqing** from NIO, who has contributed significantly to the development of urban NOA and emphasizes the importance of data in smart driving [11].
  - **Li Liyun** from XPeng, who has taken over leadership of smart driving and focuses on a pure vision solution [14][15].
  - **Yang Dongsheng** from BYD, who has led the development of the DM-i hybrid system and is pushing to integrate advanced driving systems across all BYD models [17][20].
  - **Su Jing** from Horizon Robotics, who is leading the development of end-to-end HSD solutions [21][22].
  - **Cao Xudong** from Momenta, who has developed a data-driven strategy for autonomous driving and is focusing on end-to-end large models [25][26].
Group 2: Technological Trends and Innovations
- The article discusses the technological evolution of the autonomous driving sector, emphasizing the transition to end-to-end architectures and the emergence of large models, world models, and VLM solutions [1][53].
- Companies are adopting various strategies:
  - Li Auto is focusing on E2E and VLA systems [5].
  - Xiaomi is investing heavily in end-to-end technology, with significant output [9].
  - NIO is pursuing a world behavior model approach [11].
  - XPeng is committed to a pure vision strategy [15].
  - BYD is integrating advanced driving systems across its entire lineup [20].
  - Momenta is using a dual strategy of L2 and L4 development to strengthen its market position [26].
Group 3: Future Outlook
- The article concludes that the leaders of the autonomous driving industry are crucial in shaping the future of smart driving in China, with a shared goal of creating systems that are safe, reliable, and tailored to local conditions [51][53].
- Ongoing competition and collaboration among these leaders will drive the industry toward more intelligent and user-friendly solutions [51].