World Models
Behind Google's New Work: The Paradigm for Robot Evaluation Is Changing
具身智能之心· 2025-12-19 00:05
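Republished from 具身纪元 (Embodied Era), which is also credited as author and editor. In his essay "The Second Half," Yao Shunyu (姚顺雨) argued that in the second half of AI, the technical approaches are already mature and the bottleneck shifts to evaluation. In the second half of embodied intelligence, model evaluation matters even more and is even harder. Fully evaluating even a single policy is not easy. Traditional evaluation requires testing on real robots, which brings a series of difficulties:
First, high cost: large-scale testing on real hardware is time-consuming and labor-intensive, especially when several policy versions must be compared. Speeding up testing inevitably means deploying more hardware, which adds further cost. The hidden cost of controlling evaluation variables is also significant; for example, to reduce the influence of lighting, evaluations must be run under matching lighting conditions.
Second, limited coverage: evaluation needs varied conditions to check whether a model still performs well, but real-world testing cannot exhaust every situation, such as distractors, cluttered tabletops, and changing light.
Third, safety risk: testing a robot's safety often means letting the robot attempt ...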
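The article stresses that real-robot evaluation is costly, especially when several policy versions must be compared, but it does not say how results should be compared. The sketch below is an illustrative assumption, not the article's method: it uses the standard Wilson score interval for a binomial success rate to show why a handful of trials rarely separates two policies. The trial counts are hypothetical, and `wilson_interval` and `intervals_overlap` are names introduced here.

```python
import math


def wilson_interval(successes: int, trials: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a success rate (z=1.96 gives roughly 95% coverage)."""
    if trials <= 0:
        raise ValueError("trials must be positive")
    p_hat = successes / trials
    denom = 1 + z**2 / trials
    center = (p_hat + z**2 / (2 * trials)) / denom
    half = (z / denom) * math.sqrt(p_hat * (1 - p_hat) / trials + z**2 / (4 * trials**2))
    return center - half, center + half


def intervals_overlap(a: tuple[float, float], b: tuple[float, float]) -> bool:
    """Conservative heuristic: overlapping intervals mean we cannot yet rank the policies."""
    return a[0] <= b[1] and b[0] <= a[1]


if __name__ == "__main__":
    # Hypothetical real-robot results: policy A succeeds 70% of the time, policy B 50%.
    for n in (20, 200):
        a = wilson_interval(int(0.7 * n), n)
        b = wilson_interval(int(0.5 * n), n)
        print(n, a, b, "overlap" if intervals_overlap(a, b) else "separated")
```

With 20 trials per policy, the 70% and 50% intervals still overlap; at 200 trials per policy they separate. Interval overlap is a conservative heuristic rather than a formal significance test, but it shows how quickly firm conclusions multiply the number of real-world runs.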
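Global Times Reporter Visits the 2025 AI Innovation Conference: AI's Next Stage Moves from "Single-Point Breakthroughs" to "Ecosystem Co-Advancement"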
Huan Qiu Wang· 2025-12-18 22:49
Core Insights - Artificial Intelligence (AI) is becoming the core engine driving the development of new productive forces, but traditional scaling methods are no longer sufficient to sustain rapid iteration in AI technology [1] - The key paths for advancing AI technology and industrial upgrading in China are collaboration and integration [2] Industry Developments - China is promoting "AI+" at the national strategic level, aiming for a comprehensive layout in response to rapid technological advancements [2] - The AI+ model, driven by large models, has permeated nearly all industries within a few years, but faces challenges such as shortages of high-end computing power and high application costs [2][3] - The HAIC2025 conference emphasized "open computing" to combine the strengths of enterprises across the AI industry chain, moving from isolated technological breakthroughs to collaborative industrial ecosystems [2][3] Technological Innovations - The scaleX supercluster, designed for trillion-parameter models and complex tasks, was showcased at HAIC2025; it achieves a 20-fold increase in computing density per cabinet and significantly lowers total cost of ownership [4] - The supercluster supports AI accelerator cards from multiple brands and is compatible with mainstream computing ecosystems [4] Future Directions - The future of AI development is characterized by "two supers," "one openness," and "two integrations," focusing on super-node and super-density computing, open ecosystems, and the integration of various computing resources [6][7] - AI superclusters are seen as a promising direction, overcoming traditional communication bottlenecks and improving computational efficiency [7] Practical Applications - The HAIC2025 conference highlighted numerous successful AI+ applications, including the world's first multimodal language model focused on geoscience, which addresses global change and sustainable development issues [8] - Examples of AI+ applications include the rapid iteration of domestic electric vehicles, supported by AI computing in design and testing, and the "5G+ smart highway" project in Gansu province, which uses AI for traffic management [8]
Pioneering the ACE Embodied R&D Paradigm, Daxiao Robotics Builds a New Open Ecosystem for Embodied Intelligence
Core Insights - The launch of the ACE (Ambient Capture Engine) and the open-source Kairos 3.0 model marks a significant advance in embodied intelligence, aiming to build a self-reliant and controllable ecosystem for the industry [1][2] - The "human-centered" ACE development paradigm emphasizes the interaction between humans and the physical world, enabling large-scale data collection and increasing the value of real-world data [1][2] Group 1 - The ACE paradigm enables the collection of millions of hours of environmental data, which the Kairos 3.0 model can scale to more than a billion hours' worth of data value [1] - The Kairos 3.0 model is open-sourced for developers, facilitating the rapid emergence of lightweight, customized embodied intelligence products [2] - Strategic partnerships have been established with companies including Mu Xi Co., Wallen Technology, and Zhongke Shuguang to enhance chip performance and adapt the Kairos 3.0 model [2] Group 2 - The launch of the A1 super brain module aims to accelerate the commercialization of robots and increase the value of the embodied intelligence industry [2] - Collaboration with leading robotics companies such as Zhiyuan Robotics and Galaxy General focuses on building solutions suited to a variety of scenarios [2] - Large-scale deployment of quadruped robots in retail settings such as forward warehouses and flash-purchase warehouses is expected to begin next year [3]
Tesla Once Again Anticipates Which Way the Tide Is Turning
自动驾驶之心· 2025-12-18 09:35
Core Viewpoint - Tesla's AI leader Ashok Elluswamy revealed the technical methodology behind Tesla's Full Self-Driving (FSD) in a recent article, explaining why Tesla chose an end-to-end neural network model and addressing the challenges it faces in practice [4][6]. Group 1: End-to-End Neural Network Model - Tesla adopted an end-to-end neural network model to handle complex driving scenarios that cannot be pre-defined by rules, such as "trolley problem" dilemmas and second-order effects [6][10]. - The end-to-end model is described as a complete overhaul of previous architectures, fundamentally changing design, coding, and validation processes and producing a more human-like driving experience [11][19]. - The model outputs driving instructions alongside interpretable "intermediate results," using technologies such as generative Gaussian splatting to build dynamic 3D models of the environment in real time [8][17]. Group 2: VLA and World Model Concepts - VLA (Vision-Language-Action) extends the end-to-end model with language information, making driving behavior more intuitive to interpret [12][14]. - The world model aims to build a high-bandwidth cognitive system based on video and image data, addressing the limitations of language models in understanding complex, dynamic environments [15][19]. - The article clarifies the relationship among the three: end-to-end is the foundation, VLA is an upgrade, and the world model is the ultimate form for understanding spatial dynamics [12][19]. Group 3: Industry Perspectives and Trends - The industry is divided into three main technical routes (end-to-end, VLA, and world model), with companies such as Horizon Robotics and Bosch mainly adopting end-to-end because of its lower cost and higher stability [13][19].
- VLA has been criticized by industry leaders who argue that language models may not be essential for effective autonomous driving and that spatial understanding matters more [16][19]. - Tesla's recent publication has reignited discussion in the industry, placing the company at the forefront of current technical directions and offering a systematic analysis of practical deployment [20].
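The summary says Tesla uses generative Gaussian splatting to build 3D scene representations but gives no implementation details. As a hedged illustration only, the sketch below shows the front-to-back alpha compositing step used when rendering 3D Gaussian splats at a single pixel, assuming the Gaussians have already been projected to 2D. It is not Tesla's model; `Gaussian2D`, `gaussian_alpha`, and `composite_pixel` are hypothetical names, and the 0.99 alpha cap is a numerical safeguard common in splatting renderers rather than something the article specifies.

```python
import math
from dataclasses import dataclass


@dataclass
class Gaussian2D:
    mean: tuple[float, float]         # projected center in pixel coordinates
    cov: tuple[float, float, float]   # projected 2D covariance as (sxx, sxy, syy)
    color: tuple[float, float, float]
    opacity: float                    # peak opacity in [0, 1]
    depth: float                      # camera-space depth used for sorting


def gaussian_alpha(g: Gaussian2D, px: float, py: float) -> float:
    """Opacity contributed by one Gaussian at pixel (px, py)."""
    sxx, sxy, syy = g.cov
    det = sxx * syy - sxy * sxy
    if det <= 0:
        raise ValueError("covariance must be positive definite")
    # Inverse of the 2x2 covariance matrix.
    ixx, ixy, iyy = syy / det, -sxy / det, sxx / det
    dx, dy = px - g.mean[0], py - g.mean[1]
    mahalanobis_sq = ixx * dx * dx + 2 * ixy * dx * dy + iyy * dy * dy
    return min(0.99, g.opacity * math.exp(-0.5 * mahalanobis_sq))


def composite_pixel(gaussians: list[Gaussian2D], px: float, py: float):
    """Front-to-back blending: C = sum_i c_i * a_i * T_i, with T_i = prod_{j<i} (1 - a_j)."""
    color = [0.0, 0.0, 0.0]
    transmittance = 1.0
    for g in sorted(gaussians, key=lambda g: g.depth):  # nearest first
        a = gaussian_alpha(g, px, py)
        for c in range(3):
            color[c] += g.color[c] * a * transmittance
        transmittance *= 1.0 - a
        if transmittance < 1e-4:  # pixel is effectively opaque; stop early
            break
    return tuple(color), transmittance
```

Sorting by depth and accumulating transmittance is what lets many soft, semi-transparent Gaussians form a solid-looking surface while remaining differentiable, which is why the representation suits learned scene models.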
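SenseTime, Daxiao Robotics, and Zhongke Shuguang Formally Enter a Strategic Partnership to Jointly Build a Domestic "Computing Infrastructure + World Models + Embodied Intelligence" Ecosystem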
Xin Lang Cai Jing· 2025-12-18 07:04
Core Insights - The first AI Innovation Conference (HAIC2025) was held on December 18, where SenseTime, Daxiao Robotics, and Zhongke Shuguang announced a strategic partnership to advance AI infrastructure and embodied intelligence technologies [1] Group 1 - The collaboration focuses on domestic AI infrastructure and key technologies in embodied intelligence, leveraging each company's technological and industrial strengths [1] - The partnership aims to promote the development of a comprehensive ecosystem that integrates computing power infrastructure, world models, and embodied intelligence [1] - This initiative is expected to accelerate the extension of AI capabilities into the physical world [1]
2026 Industry Outlook: AI Agents Take Over the Internet, and Cognitive Gaps Will Reshape the Distribution of Wealth
Tai Mei Ti APP· 2025-12-18 04:20
Core Insights - The report from Andreessen Horowitz (a16z) signals a significant shift in the internet landscape, indicating that the foundational elements built over the past fifteen years are collapsing as AI agents replace human users [1][2] - The transition to AI-driven interactions will lead to a "recursive storm" in infrastructure, fundamentally altering the way businesses operate and compete [1][3] Infrastructure Crisis - The internet infrastructure was originally designed for human users, characterized by low concurrency and predictable behavior, but this assumption will be overturned by 2026 [2] - The emergence of Agentic Architecture will replace existing backend logic, as AI agents will execute thousands of tasks in milliseconds, resembling a DDoS attack rather than typical user traffic [3] Business Logic Transformation - The traditional attention economy, which focused on screen time and user engagement, is becoming obsolete as AI can complete tasks without human interaction [6] - Future monetization models will shift from "per user" to "per outcome," with a focus on ROI rather than user engagement metrics [6][7] - SEO will be replaced by GEO (Generative Engine Optimization), emphasizing machine readability over human-centric design [7] SaaS Evolution - A future "multi-agent collaboration network" in the B2B sector will enable various AI agents to negotiate and process information autonomously, creating a new competitive landscape for SaaS companies [9] - The core competency for SaaS firms will shift from feature accumulation to ecosystem connectivity [9] Experience and Service Enhancement - The concept of "world models" will transform media from passive consumption to interactive environments, leading to highly customized service models across various sectors [9] Educational and Healthcare Innovations - The emergence of "AI-native universities" will allow for real-time updates to curricula based on the latest research and student feedback, enabling personalized education [12] - Healthcare will transition from low-frequency, high-cost treatments to high-frequency, subscription-based preventive care, creating a new demographic of "health MAUs" [12]
World Models Are One Path to End-to-End Autonomous Driving
自动驾驶之心· 2025-12-18 03:18
Core Viewpoint - The article discusses the distinction between world models and end-to-end models in autonomous driving, clarifying that a world model is not itself an end-to-end model but is one pathway to achieving end-to-end autonomous driving [2][3][4]. Group 1: Definitions and Concepts - End-to-end autonomous driving is defined as a model that takes information in at one end and outputs decisions at the other, without explicitly designed intermediate information processing or decision logic [3]. - A world model is defined as a model that takes information in and internally builds a complete understanding of the environment, allowing it to reconstruct the scene and predict how it will change [4]. Group 2: Course Introduction - A new course on world models has been launched, covering general world models, video generation, and OCC (occupancy) generation algorithms, including work from Tesla and Fei-Fei Li's team [5]. - The course aims to deepen understanding of end-to-end autonomous driving and is designed for people looking to enter the autonomous driving industry [15]. Group 3: Course Structure - Chapter 1 introduces world models and their relationship to end-to-end autonomous driving, covering their historical development and current applications [10]. - Chapter 2 covers foundational knowledge for world models, including scene representation and related technologies such as Transformers and BEV perception [10][16]. - Chapter 3 discusses general world models and popular algorithms such as Marble and Genie 3, explaining their core technologies and design philosophies [11]. - Chapter 4 focuses on video-generation world models, detailing significant works and advances in this area [12]. - Chapter 5 covers OCC generation models, discussing their applications and their potential for trajectory planning [13]. - Chapter 6 shares industry insights and interview preparation tips for roles related to world models [14].
Group 4: Learning Outcomes - The course aims to elevate participants to the level of a world model autonomous driving algorithm engineer within approximately one year, covering key technologies and enabling practical application in projects [18].
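To make the two definitions concrete, here is a toy sketch under stated assumptions: a hypothetical one-dimensional lane-position task in which a learned world model predicts future states and a planner chooses actions by rolling out those predictions instead of acting in the real world. It illustrates how a world model can sit inside a driving stack on the way to end-to-end control; it is not an algorithm from the course, and every name in it (`true_env_step`, `LinearWorldModel`, `plan`) is invented for illustration.

```python
import random


def true_env_step(pos: float, action: float) -> float:
    """Hypothetical 'real world': lateral position changes by 0.1 * action per step."""
    return pos + 0.1 * action


class LinearWorldModel:
    """Toy world model: learns next_pos = pos + k * action from logged transitions."""

    def __init__(self) -> None:
        self.k = 0.0

    def fit(self, transitions: list[tuple[float, float, float]]) -> None:
        # Least-squares estimate of k from (pos, action, next_pos) triples.
        num = sum(a * (nxt - p) for p, a, nxt in transitions)
        den = sum(a * a for _, a, _ in transitions)
        self.k = num / den if den else 0.0

    def predict(self, pos: float, action: float) -> float:
        return pos + self.k * action


def plan(model: LinearWorldModel, pos: float, target: float,
         candidates: list[float], horizon: int = 5) -> float:
    """Choose the constant action whose imagined rollout ends closest to the target."""
    def imagined_error(action: float) -> float:
        p = pos
        for _ in range(horizon):
            p = model.predict(p, action)  # roll forward inside the model, not the real world
        return abs(p - target)
    return min(candidates, key=imagined_error)


if __name__ == "__main__":
    random.seed(0)
    data, pos = [], 0.0
    for _ in range(50):  # log random driving to train the world model
        a = random.uniform(-1.0, 1.0)
        nxt = true_env_step(pos, a)
        data.append((pos, a, nxt))
        pos = nxt
    model = LinearWorldModel()
    model.fit(data)
    action = plan(model, pos=0.0, target=1.0, candidates=[-2.0, -1.0, 0.0, 1.0, 2.0])
    print(f"learned k={model.k:.3f}, chosen action={action}")
```

The design point is the split the article draws: an end-to-end policy would map input directly to `action`, whereas here the world model's predictions supply the reasoning step in between, and a learned policy could later be distilled from such imagined rollouts.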
67-Page Deep Dive | Intelligent Driving Industry Report: Robo-X Industry Trends, Market Size, and Industry-Chain Breakdown [Guosen Auto]
车中旭霞· 2025-12-18 01:09
Industry Insights - The Robo-X sector is expected to reach a milestone in 2026, driven by supportive policies, technological advances, and cost reductions in L4 autonomous driving [3][4] - The global L4 market is projected to reach the trillion scale by 2030, with the domestic Robotaxi market estimated at 236 billion yuan annually and the Robovan and Robotruck markets also showing significant potential [4][12] - The competitive landscape includes key players such as Pony.ai and WeRide in Robotaxi, with various companies emerging in the Robovan, Robotruck, Robobus, and Robosweeper markets [4] Company Analysis - Pony.ai reported 72% year-on-year revenue growth in Q3, with continued progress in commercializing its Robotaxi services [1][2] - WeRide reported 144% year-on-year revenue growth in Q3, indicating accelerating commercialization of its L4 products [1][2] Policy Developments - Global policies are increasingly supportive of autonomous driving, with countries such as the UAE and Singapore implementing frameworks for testing and deploying autonomous vehicles [12][14] - In China, the Ministry of Industry and Information Technology has launched pilot programs for intelligent connected vehicles involving major automakers [14][15] Investment Trends - In 2025, the L4 sector attracted significant investment, with more than 49 financing events reported, totaling nearly 21.8 billion yuan [16]
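As a rough sanity check on the investment figures, taking the reported totals at face value: because there were more than 49 financing events and the total was just under 21.8 billion yuan, the average deal size is bounded above by

```latex
\bar{F} \lesssim \frac{21.8 \times 10^{9}\ \text{yuan}}{49} \approx 4.45 \times 10^{8}\ \text{yuan}
```

that is, under roughly 445 million yuan per financing event.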
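Future Intelligent Manufacturing Bureau | When AI Enters the Physical World: What Embodied Intelligence Can and Cannot Do, as Seen at a Skills Competition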
Xin Hua Cai Jing· 2025-12-17 16:53
Core Insights - The 2025 Global Developer Pioneer Conference showcased advances in robotics, highlighting both capabilities and limitations in real-world applications [1][2] - The field of embodied intelligence has made significant progress over the past year, with robots demonstrating improved stability and functionality across a range of tasks [2][3] Group 1: Technological Advancements - The A2 humanoid robot completed a continuous 100-kilometer cross-province walk, demonstrating its stability [2] - The evolution of Vision-Language-Action (VLA) models has enhanced robots' cognitive abilities, allowing them to understand human commands and adapt to unfamiliar environments [2] - Robots showcased their skills in tasks such as flower arrangement and restaurant service, correctly identifying materials and controlling grip strength to prevent spills [2] Group 2: Limitations and Challenges - Robots still struggle with complex tasks such as folding clothes because soft materials deform unpredictably, which requires extensive training data [4] - Precision tasks such as tightening screws still require human teleoperation, as robots lack the necessary tactile feedback and an understanding of physical properties such as friction and torque [6] - In industrial settings, robots can navigate and grasp objects but still face stability and precision problems during operation [7] Group 3: Future Directions - The industry is exploring new research paradigms to address these challenges, with "world models" a focal point for improving spatial understanding and causal reasoning [8] - Experts suggest that embodied intelligence should evolve from imitation to reasoning, integrating planning and control into a unified framework [8][9] - The industry must overcome data scarcity and promote collaboration through open standards and challenges to support algorithm reproducibility and commercialization [9]
Deep Dive into World Models: The Route Debate over a New Paradigm, Real-Time Interaction, and Physical Simulation
海外独角兽· 2025-12-17 07:53
Core Insights - The article argues that 2026 will be a pivotal year for multimodal technology, particularly video generation and world models, with significant advances expected in both research and practical applications [2][3]. Group 1: Definition and Importance of World Models - Definitions of world models vary, ranging from analogies to the brain's internal representations to neural networks that understand physical rules [4][5]. - World models are becoming more important because of three trends: the limitations of language-based intelligence, rapid advances in architectures and algorithms, and the demand for embodied intelligence [5]. Group 2: Key Improvements Needed for World Models - Long-term memory is crucial for generating coherent, continuous worlds, yet current models are limited to short video segments [6][7]. - Interactivity is essential, letting users influence world generation through real-time actions, which requires new training methods [8][11]. - Real-time feedback is critical for applications such as gaming and VR, and current models struggle to meet their low-latency requirements [12][15]. - Physical realism is vital for high-stakes applications such as autonomous driving, requiring models that obey real-world physics [16][18]. Group 3: Two Development Paths for World Models - The first path focuses on real-time video world models for consumer applications, prioritizing interactivity and long-term memory over physical realism [19][20]. - The second path emphasizes structured 3D models for robotics and autonomous driving, prioritizing physical accuracy and reliability [21][22]. Group 4: Market Players and Their Positions - The market is divided into four quadrants by representation form and target audience, with players such as Decart and Odyssey positioned in different segments [24][26].
- World Labs is highlighted as a leading startup focusing on spatial intelligence, emphasizing 3D consistency and persistence in its models [26][28]. - General Intuition leverages vast gaming data to train agents for spatial-temporal reasoning, positioning itself uniquely in the market [33][35]. - Decart aims for speed and efficiency with its interactive AI model Oasis, while Odyssey focuses on high-fidelity reconstruction for creative industries [39][45].
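The three requirements discussed above (interactivity, long-term memory, and real-time feedback) can be illustrated with a toy loop. The sketch below is a hypothetical stand-in, not the method of Decart, Odyssey, World Labs, or any other company named: frames are small dicts instead of images, each frame is conditioned on history plus a user action, the context window is bounded so older frames are forgotten, and each step is timed against a per-frame budget of 1000/fps milliseconds (about 33.3 ms at 30 fps, 16.7 ms at 60 fps). `ToyInteractiveWorldModel`, `frame_budget_ms`, and `run_session` are invented names.

```python
import time
from collections import deque


def frame_budget_ms(fps: float) -> float:
    """Time available to produce one frame at a given frame rate."""
    return 1000.0 / fps


class ToyInteractiveWorldModel:
    """Hypothetical stand-in for an action-conditioned, autoregressive frame generator.

    The context window is bounded: frames older than `context_len` are dropped,
    a simple picture of why long-horizon memory is hard.
    """

    def __init__(self, context_len: int):
        self.context = deque(maxlen=context_len)

    def step(self, action: str) -> dict:
        prev = self.context[-1] if self.context else {"t": -1, "x": 0}
        dx = {"left": -1, "right": 1}.get(action, 0)
        # The next frame depends on the remembered history and the user's action.
        frame = {"t": prev["t"] + 1, "x": prev["x"] + dx}
        self.context.append(frame)
        return frame

    def visible_timesteps(self) -> list[int]:
        return [f["t"] for f in self.context]


def run_session(model: ToyInteractiveWorldModel, actions: list[str], fps: float = 30) -> int:
    """Generate one frame per user action; return how many frames missed the budget."""
    budget = frame_budget_ms(fps)
    late = 0
    for a in actions:
        start = time.perf_counter()
        model.step(a)
        if (time.perf_counter() - start) * 1000.0 > budget:
            late += 1
    return late


if __name__ == "__main__":
    m = ToyInteractiveWorldModel(context_len=4)
    late = run_session(m, ["right"] * 10, fps=30)
    print("frames still in memory:", m.visible_timesteps(), "late frames:", late)
```

A real video world model would replace `step` with neural network inference whose cost, not a dict update, dominates the budget; lengthening the context window improves memory but raises that per-frame cost, which hints at why memory and latency pull against each other.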