World Models
Xiaomi's Chen Guang: We Don't Want to Create Tech Anxiety Anymore
21 Shi Ji Jing Ji Bao Dao· 2025-12-25 08:24
Core Viewpoint
- The smart driving industry is experiencing "term overload," with rival camps forming around paradigms such as VLA (Vision-Language-Action), VA (Vision-Action), and WA (World Action) [2].

Group 1: Industry Trends
- The industry is split between VLA proponents, such as Li Auto and Yuanrong Qixing, and opponents such as Huawei and Xiaopeng, who prefer WA [2].
- Xiaomi is concentrating on end-to-end development and has shown strong progress there, despite starting later than competitors such as Li Auto and NIO [3][6].
- Xiaomi's end-to-end algorithm has evolved rapidly, with multiple versions released within a single year [6].

Group 2: Technological Development
- The latest version of Xiaomi's HAD (Highly Automated Driving) system incorporates world models and reinforcement learning to strengthen its cognitive capabilities [3][4].
- Adding world models and reinforcement learning is framed as a necessary evolution from purely data-driven approaches toward cognition-driven ones [9][10].
- Xiaomi's approach emphasizes maximizing the model's intelligence density within limited computational resources [8][15].

Group 3: Team Structure and Strategy
- Xiaomi's smart driving team has grown to more than 1,800 people, a rapid scale-up compared with competitors [6][12].
- The team is split into three groups pursuing different technical routes: end-to-end, VLA, and other exploratory research [4][13].
- Xiaomi introduces new technologies gradually, prioritizing user experience over adopting the latest techniques for their own sake [5][10].

Group 4: Challenges and Responses
- Integrating reinforcement learning raises challenges such as ensuring the fidelity of the world model and keeping computation efficient [4][33].
- The team has faced external criticism, which it regards as a necessary part of its growth and development [25][26].
- The company aims to balance introducing new technologies against the need for practical, user-friendly solutions [10][11].
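As a rough illustration of how a world model and reinforcement learning fit together, the sketch below scores candidate actions by rolling them out inside a world model ("in imagination") instead of trying them on the road, which is the basic idea behind model-based planning and model-based RL. It is not Xiaomi's method: the 1-D car-following dynamics, the reward, and every function name are hypothetical, and the hand-written `world_model` merely stands in for a learned network.

```python
from dataclasses import dataclass


@dataclass
class State:
    gap: float    # distance to the lead vehicle in metres
    speed: float  # ego speed in m/s


def world_model(s: State, accel: float, lead_speed: float = 10.0, dt: float = 0.5) -> State:
    """Hypothetical stand-in for a learned dynamics model: predicts the next state."""
    new_speed = max(0.0, s.speed + accel * dt)
    new_gap = s.gap + (lead_speed - new_speed) * dt
    return State(new_gap, new_speed)


def reward(s: State) -> float:
    """Hypothetical reward: heavy penalty for a collision, otherwise prefer a ~20 m gap."""
    if s.gap <= 0.0:
        return -100.0
    return -abs(s.gap - 20.0)


def imagined_return(s: State, accel: float, horizon: int = 6, gamma: float = 0.95) -> float:
    """Roll out a constant action inside the world model and sum discounted rewards."""
    total, discount = 0.0, 1.0
    for _ in range(horizon):
        s = world_model(s, accel)
        total += discount * reward(s)
        discount *= gamma
    return total


def plan(s: State, candidates=(-3.0, -1.0, 0.0, 1.0, 2.0)) -> float:
    """Pick the acceleration whose imagined return is highest; no real-world trial is needed."""
    return max(candidates, key=lambda a: imagined_return(s, a))
```

Here `plan(State(gap=5.0, speed=15.0))` chooses hard braking, because every milder option leads to a predicted collision within the six-step horizon, while a car trailing 40 m behind at the lead's speed is told to accelerate.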
In Conversation with DaXiao Robotics Chairman Wang Xiaogang: Not Betting on VLA, Betting on World Models
Sou Hu Cai Jing· 2025-12-25 07:59
Core Insights
- Current technical routes in embodied intelligence, particularly the VLA model, have significant flaws in understanding the physical world and its laws [4][11].
- Many companies are building robot bodies, but few products can truly understand the world and solve real problems [5].
- In 2025, the domestic market is expected to see a surge in instant-retail warehousing applications, which require round-the-clock (24/7) operation and give robots an opportunity to excel [5].

Group 1: Company Strategy
- Wang Xiaogang, chairman of DaXiao Robotics, describes a deliberately restrained approach: the company is neither entering the crowded robot-body (hardware) market nor betting on VLA, and is instead focusing on world models, which he sees as the industry's consensus direction [6][8].
- DaXiao Robotics aims to integrate software and hardware and to address the shortcomings of existing technical routes, particularly VLA, which does not require a true understanding of the physical world [11][12].
- The company's world model has three parts, which form the core of its technology: multimodal understanding, long-horizon dynamic interactive scenes, and prediction [13].

Group 2: Market Position and Opportunities
- The industry is still maturing and its leading positions are not yet settled, leaving significant room for new startups because of flaws in existing technology [17].
- The company sees a distinctive opportunity in combining hardware and software, leveraging the large client base it built in previous years to scale quickly in robotics [18].
- Short-term goals include deploying quadruped robot dogs with navigation and AI capabilities; in the mid term, the focus shifts to commercial service scenarios such as instant-retail ("flash purchase") warehouses [19].

Group 3: Technological Differentiation
- DaXiao Robotics regards its proposed ACE research paradigm as a fundamental shift that could give it a competitive edge [18].
- The world-model approach is believed to adapt better and cover a wider range of scenarios than VLA, which is tied to a specific embodiment [21].
- The company plans to open-source its model to gather diverse feedback and data, a development path it presents as distinct from those taken in other countries [22].
We Just Made a World Model Learning Roadmap for Beginners...
自动驾驶之心· 2025-12-25 03:24
Core Viewpoint
- The article distinguishes world models from end-to-end models in autonomous driving: a world model is not a specific technology but a category of models with certain capabilities. It highlights the industry trend of using world models for closed-loop simulation to reduce the high cost of handling corner cases in autonomous driving [2].

Course Overview
- The course on world models in autonomous driving has six chapters: an introduction, background knowledge, general world models, video generation-based world models, OCC (occupancy)-based world models, and industry job topics [5][6][7][8][9].

Chapter Summaries
- **Chapter 1: Introduction to World Models** Outlines the relationship between world models and end-to-end autonomous driving, the development history and current applications of world models, and the main streams: pure simulation, simulation plus planning, and generation of sensor inputs [5].
- **Chapter 2: Background Knowledge** Covers the foundations needed for later chapters, including scene representation, Transformers, and BEV (bird's-eye-view) perception [6].
- **Chapter 3: General World Models** Focuses on popular general world models such as Marble from Li Fei-Fei's team and Genie 3 from DeepMind, along with their core technologies and design philosophies [7].
- **Chapter 4: Video Generation-Based World Models** Covers video generation algorithms, from GAIA-1 & GAIA-2 to recent work such as UniScene and OpenDWM, spanning both classic and cutting-edge results [8].
- **Chapter 5: OCC-Based World Models** Concentrates on occupancy (OCC) generation algorithms through three major papers and a hands-on project, and notes that these methods can be extended to vehicle trajectory planning [9].
- **Chapter 6: World Model Job Topics** Shares practical insights from the instructor's experience: industry applications, pain points, and interview preparation for world model positions [9].

Learning Outcomes
- The course aims to give participants a thorough understanding of world models in autonomous driving, with the stated goal of bringing them to a level comparable to one year of experience as a world model algorithm engineer [10].
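The closed-loop simulation idea in the summary above can be sketched in a few lines: the planner's actions are fed back into a world model that produces the next state, so the planner has to live with the consequences of its own mistakes, and failures show up in simulation instead of on the road. This is a hypothetical toy (a 1-D lateral offset, hand-written dynamics standing in for a learned world model, made-up names), not material from the course.

```python
def world_model_step(offset: float, steer: float, drift: float = 0.2) -> float:
    """Hypothetical world model: predicts the next lateral offset (metres from the
    lane centre) from the current offset, a steering correction, and a constant drift."""
    return offset + steer + drift


def planner(offset: float, gain: float) -> float:
    """Toy proportional planner that steers back toward the lane centre."""
    return -gain * offset


def closed_loop_rollout(gain: float, steps: int = 20, start: float = 0.0) -> list[float]:
    """Closed loop: each planner action goes into the world model, and the planner then
    reacts to the state the world model produced, so errors can compound."""
    offsets = [start]
    for _ in range(steps):
        action = planner(offsets[-1], gain)
        offsets.append(world_model_step(offsets[-1], action))
    return offsets


def passes_closed_loop(gain: float, lane_half_width: float = 1.5) -> bool:
    """Flag the planner as failing if the simulated car ever leaves the lane."""
    return all(abs(x) <= lane_half_width for x in closed_loop_rollout(gain))
```

With these assumptions `passes_closed_loop(0.5)` is true, while a gain of 0.0 (no correction) drifts out of the lane and a gain of 2.5 overcorrects into a growing oscillation; both failures only become visible once the planner's outputs are fed back through the model.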
LeCun and Hassabis in a Clash of the Titans, and Musk Picks a Side
量子位· 2025-12-25 00:27
Core Viewpoint
- The article covers a heated debate between AI researchers Yann LeCun and Demis Hassabis over the nature of intelligence, centered on the concept of "general intelligence" and its implications for AI development [3][8][30].

Group 1: Debate Overview
- Yann LeCun argues that "general intelligence" is a nonsensical idea, asserting that human intelligence is highly specialized rather than universal [9][13].
- Demis Hassabis counters that human brains exhibit substantial generality and complexity, and that general intelligence is a valid concept [17][22].
- The debate has drawn wide attention, with notable figures such as Elon Musk publicly siding with Hassabis [5][7].

Group 2: Key Arguments
- LeCun emphasizes that evolutionary pressure shaped human intelligence to fit specific environments, producing specialized skills rather than general capabilities [14][36].
- Hassabis argues that the brain's complexity enables general intelligence and that, given sufficient resources, it can learn any computable task, much like a Turing machine [18][24].
- Both agree that world models are important to AI development but differ in how they interpret and apply the concept [50][42].

Group 3: Future Directions
- LeCun plans to found a new company, Advanced Machine Intelligence Labs, focused on world models, with a target valuation of €3 billion (approximately ¥24.7 billion) [43].
- Hassabis says Google DeepMind is also prioritizing world models, with an emphasis on understanding causal relationships and interactions in the world [47][49].
- The article concludes that although the two appear to be discussing different aspects of intelligence, they are ultimately addressing the same fundamental question of how to achieve artificial general intelligence (AGI) [41][42].
Classes Start Next Week! We've Designed a Learning Roadmap for Autonomous Driving World Models...
自动驾驶之心· 2025-12-24 09:22
Core Viewpoint
- The article distinguishes world models from end-to-end models in autonomous driving, stressing that a world model is a means of achieving end-to-end autonomous driving rather than a specific technology [2].

Chapter 1: Introduction to World Models
- This chapter covers the relationship between world models and end-to-end autonomous driving, along with the development history and current applications of world models. It introduces the main types (pure simulation, simulation plus planning, and models that generate sensor inputs and perception results) together with their industry applications and relevant datasets [5].

Chapter 2: Background Knowledge of World Models
- The second chapter covers the foundations needed to understand world models, starting with scene representation and extending to Transformers and BEV perception. It highlights key technical terms that come up frequently in world model job interviews [6][11].

Chapter 3: Discussion on General World Models
- This chapter centers on general world models and recent high-profile work relevant to autonomous driving, including Marble from Li Fei-Fei's team, Genie 3 from DeepMind, and JEPA from Meta. It also discusses the widely debated VLA + world model algorithms and the world model simulator Tesla presented at ICCV [7].

Chapter 4: Video Generation-Based World Models
- The fourth chapter focuses on video generation algorithms, currently the most heavily researched direction in both academia and industry. It covers classic works such as Wayve's GAIA-1 & GAIA-2 and recent advances such as UniScene and OpenDWM, giving a comprehensive view of the field's progress [8].

Chapter 5: OCC-Based World Models
- This chapter covers occupancy (OCC) generation algorithms through three major papers and a hands-on project. These methods extend readily to vehicle trajectory planning, contributing to end-to-end solutions [9].

Chapter 6: World Model Job Topics
- The final chapter draws on the instructor's years of experience to cover how world models are applied in industry, existing pain points, and how to prepare for related job interviews, with a focus on what companies prioritize [10].

Course Outcomes
- The course aims to deepen participants' understanding of end-to-end autonomous driving, teach world model techniques including video generation and OCC generation methods, and prepare participants for roles in the autonomous driving industry [10][13].
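Chapter 5's point that occupancy (OCC) world models extend naturally to trajectory planning can be illustrated with a toy bird's-eye-view grid: a forecast of which cells will be occupied at each future step is used to discard candidate ego trajectories that would enter an occupied cell at the same step. The constant-velocity forecast below is a hypothetical stand-in for a learned OCC world model, and all names are made up for illustration.

```python
from typing import Optional

Cell = tuple[int, int]  # (row, col) on a bird's-eye-view grid


def forecast_occupancy(occupied: set[Cell], velocity: Cell, horizon: int) -> list[set[Cell]]:
    """Stand-in world model: shift every occupied cell by a constant velocity per step.
    Element t of the result is the predicted occupancy at step t + 1."""
    dr, dc = velocity
    return [
        {(r + dr * t, c + dc * t) for r, c in occupied}
        for t in range(1, horizon + 1)
    ]


def is_collision_free(trajectory: list[Cell], future_occ: list[set[Cell]]) -> bool:
    """trajectory[t] is the ego cell at step t + 1; compare it with the forecast for the
    same step. This toy check ignores two agents swapping cells between steps."""
    return all(cell not in occ for cell, occ in zip(trajectory, future_occ))


def pick_trajectory(candidates: dict[str, list[Cell]],
                    future_occ: list[set[Cell]]) -> Optional[str]:
    """Return the name of the first collision-free candidate, or None if all collide."""
    for name, traj in candidates.items():
        if is_collision_free(traj, future_occ):
            return name
    return None
```

With an obstacle at (0, 4) moving one cell left per step, a keep-lane candidate of (0, 1), (0, 2), (0, 3) meets it at (0, 2) on step 2, so `pick_trajectory` falls back to a lane-change candidate in row 1.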
No More Pretending: LeCun and Hassabis in a Clash of the Titans, and Musk Picks a Side
36 Ke· 2025-12-24 07:47
Core Argument
- The debate centers on the essence of intelligence: Yann LeCun argues against the concept of "general intelligence," while Demis Hassabis defends its existence and potential [6][8][12].

Group 1: Key Perspectives
- LeCun claims that human intelligence is not "general" but a specialized adaptation to the physical world, noting that humans excel in some areas and fail in others [6][8][14].
- Hassabis counters that the human brain, the most complex entity known in the universe, possesses significant generality, and argues that general intelligence is a valid and essential concept for understanding cognitive capabilities [9][10][12].
- The disagreement reflects a basic difference in emphasis: LeCun focuses on what intelligence is, while Hassabis focuses on what intelligence can become [20].

Group 2: World Models
- Both LeCun and Hassabis agree that world models are important for achieving artificial general intelligence (AGI), though they interpret the term differently [20][22].
- LeCun's upcoming venture, Advanced Machine Intelligence Labs, aims to build world models grounded in control theory and cognitive science rather than visual representation alone [20][21].
- Hassabis has introduced the Genie 3 model, which aims to capture causal relationships and interactions in the world, and views it as a step toward AGI [21][22].
No More Pretending! LeCun and Hassabis in a Clash of the Titans, and Musk Picks a Side
量子位· 2025-12-24 05:14
Core Viewpoint
- The article covers a heated debate between AI researchers Yann LeCun and Demis Hassabis over the nature of intelligence, centered on the concept of "general intelligence" and its implications for AI development [3][8][30].

Group 1: Debate Overview
- Yann LeCun argues that "general intelligence" is a nonsensical idea, asserting that human intelligence is highly specialized rather than universal [9][13].
- Demis Hassabis counters that human brains exhibit substantial generality and complexity, and that general intelligence is a valid concept [17][22].
- The debate has drawn wide attention, with notable figures such as Elon Musk publicly siding with Hassabis [5][7].

Group 2: Key Arguments
- LeCun emphasizes that evolutionary pressure shaped human intelligence to fit specific environments, producing specialized skills rather than general capabilities [14][36].
- Hassabis argues that the brain functions like a Turing machine, able to learn anything computable given sufficient resources, which supports the existence of general intelligence [18][24].
- The discussion exposes a basic disagreement over terminology: LeCun stresses the specialized nature of human cognition, while Hassabis argues for the potential of general intelligence [32][41].

Group 3: Future Directions in AI
- Both experts agree that "world models" are important for advancing artificial general intelligence (AGI), though they interpret the term differently [42][50].
- LeCun's upcoming venture, Advanced Machine Intelligence Labs, aims to develop world models grounded in control theory and cognitive science [43][44].
- Hassabis and Google DeepMind are also focusing on world models, emphasizing models that comprehend causal relationships and interactions in the world [46][47].
After Going Through All of NVIDIA's Projects This Year, We Recommend These...
自动驾驶之心· 2025-12-24 03:29
Core Insights
- NVIDIA has become a focal point of the AI landscape, reaching a market valuation of $5 trillion, an elevenfold increase over three years, and becoming the first company to hit that milestone [2].
- The company has transformed from a graphics chip maker into a leading AI infrastructure provider, with major advances in AI domains including autonomous driving and embodied intelligence [2].

Group 1: Technological Developments
- The Cosmos series, launched in January, has produced foundation models such as Cosmos-Transfer1, Cosmos-Reason1, and Cosmos-Predict2.5, which support downstream applications in autonomous driving and embodied intelligence [5].
- The Nemotron series aims to provide a "digital brain" for the era of agentic AI, offering efficient models and tools that enterprises can use to build specialized AI systems [5].
- The Isaac Lab project provides a GPU-accelerated simulation framework for multimodal robot learning, addressing data scarcity and the simulation-to-reality (sim-to-real) gap [6].

Group 2: Key Projects and Papers
- Nemotron Nano V2 VL, a 12-billion-parameter vision-language model, achieves state-of-the-art performance on document understanding and long-video reasoning tasks while retaining its text reasoning capabilities [12].
- The Alpamayo-R1 project introduces a vision-language-action model that combines causal reasoning with trajectory planning to improve decision-making in complex driving scenarios [13].
- The Cosmos-Predict2.5 model unifies text, image, and video generation capabilities, significantly improving video quality and consistency for physical AI tasks [17].

Group 3: Performance Metrics
- Nemotron Nano V2 VL has shown superior performance across 45 multimodal benchmarks, particularly in document understanding and long-video question answering [12].
- Alpamayo-R1 showed a 12% improvement in planning accuracy and a 35% reduction in off-road rate in challenging scenarios compared with baseline models [16].
- After fine-tuning, Cosmos-Reason1 improved physical reasoning performance by more than 10%, demonstrating its ability to understand physical laws [33].
An In-Depth Analysis of Three Technical Paradigms for Embedding World Models in Embodied Systems
具身智能之心· 2025-12-24 00:25
Core Insights
- The article examines how world models are being integrated into embodied intelligent systems, emphasizing the shift from reactive to predictive capabilities [1][3][8].

Introduction to World Models
- Embodied intelligent systems have traditionally relied on a reactive "perception-action" loop with no predictive capability. World models let these systems "imagine" future scenarios before acting [1][3].

Research Overview
- A comprehensive survey by a research team including Tsinghua University and Harbin Institute of Technology classifies existing research into three paradigms according to how the world model is integrated into the architecture [3][5].

Paradigm Classification
- The relationship between the world model (WM) and the policy model (PM) is described as a "coupling strength spectrum," ranging from weak to strong dependence [11].
- Three categories are identified: Modular, Sequential, and Unified architectures, which differ in gradient flow and information dependency [12].

Modular Architecture
- The WM and PM are independent, and no gradients flow between them. The WM acts as a simulator, predicting future states from current observations and candidate actions [16].

Sequential Architecture
- A two-stage pipeline: the WM predicts future states, and the PM executes actions conditioned on those predictions. This decomposes complex tasks into goal generation followed by goal-conditioned execution [17][18].

Unified Architecture
- The WM and PM are merged into a single end-to-end network that is trained and optimized jointly. The system predicts future states and generates actions without an explicit split between simulation and decision-making [19][21].

Future Directions
- The article outlines possible research directions, including choosing representation spaces for world models, generating structured intentions, and developing unified world-policy paradigms to improve decision-making efficiency [22][24].
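The difference between the three paradigms is mainly where the world model (WM) sits relative to the policy model (PM). The sketch below contrasts them on a hypothetical 1-D task of moving a state toward a target, with plain functions standing in for learned networks. It illustrates the taxonomy under those assumptions only; it is not code from the survey, and it does not model gradient flow, which in practice is what separates the paradigms during training.

```python
def wm_predict(state: float, action: float) -> float:
    """Hypothetical world model: predicts the next state from the state and an action."""
    return state + action


def modular_policy(state: float, target: float, candidates=(-1.0, 0.0, 1.0)) -> float:
    """Modular: the WM is an independent simulator. The PM queries it for each candidate
    action and picks the one whose predicted next state lands closest to the target."""
    return min(candidates, key=lambda a: abs(wm_predict(state, a) - target))


def wm_imagine_goal(state: float, target: float, max_step: float = 1.0) -> float:
    """Sequential, stage 1: the WM imagines a reachable future state (a subgoal)."""
    step = max(-max_step, min(max_step, target - state))
    return state + step


def goal_conditioned_policy(state: float, goal: float) -> float:
    """Sequential, stage 2: the PM only has to reach the subgoal the WM produced."""
    return goal - state


def sequential_policy(state: float, target: float) -> float:
    """Chain the two stages: imagine a goal, then execute toward it."""
    return goal_conditioned_policy(state, wm_imagine_goal(state, target))


def unified_model(state: float, target: float) -> tuple[float, float]:
    """Unified: one model jointly emits (predicted next state, action). In a real system
    these would be two heads of a single network trained end to end."""
    action = max(-1.0, min(1.0, target - state))
    return state + action, action
```

In all three variants an agent at state 0.0 with target 3.0 moves by 1.0; what changes is whether the WM is queried as a simulator (modular), used to generate a subgoal (sequential), or fused with the policy into one model (unified).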
The 700 Billion Gamble That Left Zuckerberg Deserted by Allies and Friends
创业邦· 2025-12-23 10:51
Core Viewpoint
- 2025 has been a tumultuous year for Meta, marked by significant internal challenges and strategic shifts in its AI initiatives [3][4].

Group 1: AI Strategy and Developments
- Meta is aggressively pursuing AI, restructuring its AI organization around Meta Superintelligence Labs (MSL) and investing hundreds of billions to compete with rivals such as OpenAI and Google [5][6].
- The company is developing new AI models, "Mango" for image and video generation and "Avocado" for advanced code generation, with release planned for 2026 [12][19].
- Internal problems plagued the development of Llama 4, which underperformed and was delayed multiple times, raising concerns about Meta's AI capabilities [16][19].

Group 2: Leadership and Internal Dynamics
- CEO Mark Zuckerberg's management style has shifted toward micromanagement, causing internal chaos and dissatisfaction among employees, including key figures such as Alexandr Wang [10][31].
- Wang, who was brought in to lead AI initiatives, has expressed frustration with Zuckerberg's tight control, which he believes stifles innovation [31][32].
- A wave of executive departures, including long-serving leaders and key AI talent, has raised concerns about the company's internal stability and future direction [40][41].

Group 3: Financial Commitments and Future Outlook
- Meta's capital expenditures are projected to reach at least $70 billion in 2025, well above the previous year's $39 billion, as the company invests heavily in AI infrastructure [48].
- The company issued a $30 billion corporate bond, one of the largest in U.S. history, to fund its AI initiatives and maintain a competitive edge [53].
- Despite these investments, it remains unclear how Meta will monetize its AI work, and there are calls for a clearer strategy for integrating AI into its existing business [57][58].