Causal Reasoning
Latest Survey on World Models! CAS Joins MBZ, NTU, and Oxford to Systematically Review Frontier Progress
机器之心· 2026-03-24 09:17
Core Insights
- The article emphasizes the significance of world models in advancing AI toward reasoning, planning, and decision-making, moving beyond mere understanding of the present [2][3]
- A comprehensive survey categorizes existing world models into four main branches: observation-level generative models, latent-space models, reinforcement learning-based models, and object-centric models [2][9]

Group 1: Research Motivation
- The resurgence of world models is attributed to advances in video generation, multimodal foundation models, and large-scale training, highlighting their importance in building general intelligent systems [6]
- The article notes that discussions of world models are fragmented across fields, with no unified technical routes or evaluation protocols [6][7]

Group 2: Distinctive Features of the Survey
- Unlike previous reviews that focus on specific applications or basic definitions, this survey systematically analyzes world models by modeling paradigm, mathematical form, and key functionality [10]
- The survey provides a clear technical classification of existing world models and covers their progress across multiple application scenarios, including robotics, autonomous driving, and scientific discovery [10][19]

Group 3: Applications of World Models
- World models are positioned as central to connecting perception, prediction, reasoning, and action in robotics, emphasizing their role in control loops and navigation [20]
- In autonomous driving, world models are integrated into decision-making, enhancing predictive modeling and action-conditioned imagination [22]
- In scientific discovery, world models show potential for long-horizon prediction and simulation in both the social and natural sciences [26]

Group 4: Benchmarking and Evaluation
- The article outlines the importance of benchmarking in evaluating world models, emphasizing that future assessments should consider generalization, causal reasoning, and long-term consistency [31]
- A detailed comparison of simulators and their functionality illustrates the diversity of tools available for world-model development [32]

Group 5: Challenges and Future Directions
- Key obstacles include long-term temporal consistency, causal reasoning, and the integration of physical and semantic constraints [34][35]
- Future research should focus on multimodal large-scale pre-training, data-efficient learning, and real-world deployment validation [35]
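The survey's core notion of a world model, a learned transition function used to predict and plan "in imagination" without touching the real environment, can be sketched in a few lines. The tabular model, one-dimensional corridor environment, and greedy planner below are illustrative assumptions for this digest, not constructs from the survey itself:

```python
from collections import Counter, defaultdict

class TabularWorldModel:
    """Minimal observation-level world model: learns transitions
    (state, action) -> next_state from logged data and supports
    rollout-based planning entirely inside the model."""

    def __init__(self):
        self.counts = defaultdict(Counter)  # (state, action) -> Counter of successors

    def observe(self, state, action, next_state):
        self.counts[(state, action)][next_state] += 1

    def predict(self, state, action):
        """Return the most frequently observed successor state."""
        successors = self.counts[(state, action)]
        return successors.most_common(1)[0][0] if successors else state

    def plan(self, state, goal, actions, horizon=10):
        """Greedy planning in imagination: pick the action whose
        predicted successor lies closest to the goal, then imagine
        taking it; no real environment is queried."""
        plan = []
        for _ in range(horizon):
            if state == goal:
                break
            action = min(actions,
                         key=lambda a: abs(self.predict(state, a) - goal))
            plan.append(action)
            state = self.predict(state, action)
        return plan

# Train on transitions from a corridor of cells 0..9: +1 moves right, -1 left.
model = TabularWorldModel()
for s in range(10):
    model.observe(s, +1, min(s + 1, 9))
    model.observe(s, -1, max(s - 1, 0))

print(model.predict(3, +1))        # 4
print(model.plan(2, 5, [+1, -1]))  # [1, 1, 1]
```

The same interface generalizes to the survey's other branches: a latent-space model would replace the raw state with a learned encoding, and an observation-level generative model would replace the counter with a learned image predictor.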
Ian Goodfellow, the Father of GANs, Returns After Illness, Taking Aim at Efficient World Models
机器之心· 2026-03-07 11:20
Core Viewpoint
- Ian Goodfellow, known as the father of GANs, has re-emerged in AI discussions, focusing on the development of multimodal world models that can predict and plan actions in complex environments [1][6][20]

Group 1: Importance of World Models
- World models represent how environments operate, including their dynamics and causal structure, and are essential for predicting and planning actions without direct interaction with the real world [8][9]
- The goal of constructing world models is to unlock significant economic value in AI and help automate undesirable tasks, which requires understanding causal relationships in complex environments [12][22]

Group 2: Multimodal World Models
- Multimodal world models integrate sensory modalities beyond text, such as visual and auditory data, to build a more comprehensive understanding of the environment [11][12]
- Constructing these models raises critical questions about the model's purpose and the availability of scalable data sources for training [11][17]

Group 3: Data Sources and Efficiency
- Data is crucial for building effective models; current pixel-based models lack action-conditional capabilities because data recording actions together with their outcomes is scarce [18]
- Using software abstractions to create synthetic worlds can improve training efficiency and data utilization [18][19]

Group 4: Cognitive Tools and Symbolic Representations
- Human cognitive tools, such as natural language and symbolic representations, enable more efficient abstraction and expression of causal relationships, which can improve model performance [15][19]
- These symbolic systems support a data feedback loop combining actions and observations, which is essential for training effective world models [19]
Group 5: Future Directions
- The article suggests starting the construction of multimodal world models in digital environments, such as interactive media and games, which offer scalable data collection and engagement incentives [20][22]
- World-model design should focus on learning strategies that prioritize key environmental factors, ensuring consistency and realism in long-term predictions [22]
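Goodfellow's point about software abstractions can be made concrete: in a scripted synthetic world, every logged step is action-conditional by construction, which is exactly the supervision that scraped pixel data lacks. The grid environment and `collect` helper below are hypothetical illustrations, not from the talk:

```python
import random

class SyntheticGrid:
    """Toy software-defined world: a bounded 2-D grid whose dynamics are
    fully known, so logged steps come with their causing actions for free."""

    MOVES = {"up": (0, 1), "down": (0, -1), "left": (-1, 0), "right": (1, 0)}

    def __init__(self, size=5):
        self.size = size
        self.pos = (0, 0)

    def step(self, action):
        dx, dy = self.MOVES[action]
        x = min(max(self.pos[0] + dx, 0), self.size - 1)
        y = min(max(self.pos[1] + dy, 0), self.size - 1)
        self.pos = (x, y)
        return self.pos

def collect(env, n_steps, seed=0):
    """Roll a random policy and record (obs, action, next_obs) triples,
    the action-conditional tuples a world model trains on."""
    rng = random.Random(seed)
    data = []
    for _ in range(n_steps):
        obs = env.pos
        action = rng.choice(list(env.MOVES))
        data.append((obs, action, env.step(action)))
    return data

dataset = collect(SyntheticGrid(), n_steps=1000)
print(len(dataset))  # 1000
```

Because the world is synthetic, the dataset scales with compute rather than with human demonstration time, which is the efficiency argument the article attributes to software abstractions.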
2026 Enterprise AI Outlook: Three Major New Technology Trends
Sou Hu Cai Jing· 2026-02-26 09:00
Group 1
- In 2026, global AI spending is projected to reach $2.52 trillion, a 44% year-over-year increase, with IBM's generative AI business expected to exceed $12.5 billion by Q4 2025 [2]
- AI is anticipated to transition from a bubble phase to a period of maturity and widespread application, similar to the role of PCs in modern productivity [2]
- The emergence of Causal AI is highlighted: it focuses on understanding the causal relationships behind decisions, moving beyond traditional knowledge graphs [2][3]

Group 2
- Causal AI is expected to enter the practical stage by 2026, with applications in fields such as healthcare and the social sciences that aim to establish cause-and-effect relationships [3]
- The integration of machine learning with causal reasoning is a significant trend, enabling intelligent agents to test interventions and produce explainable decision outputs [4]
- Major events like CES 2026 and NRF 2026 signal a shift toward AI's integration into physical products and services, indicating a new era of productivity tools [4]

Group 3
- Companies are urged to focus on measurable improvements in productivity and safety for end users rather than abstract technological promises [5]
- NRF 2026 emphasizes results-driven execution, with retailers deploying AI in scenarios that yield immediate, repeatable value, marking a shift from experimentation to execution [6]
- The traditional approach of organizing data assets before deploying AI is being challenged; AI can instead be implemented first to clean and organize the data [7][9]

Group 4
- Successful companies adopt AI rapidly without waiting for perfect data, focusing on early implementation and iterative improvement [8][10]
- Technology investment is expected to rise significantly, from roughly 4% to 10% of revenue, as AI capabilities replace traditional processes [10]
- By 2026, AI applications are expected to become as ubiquitous as electricity, transforming traditional technology assets into AI-driven business value [11]
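The distinction between correlational AI and the Causal AI described above is the gap between conditioning on an observation and intervening on it. A minimal sketch, assuming a toy structural causal model with invented variables (the do-operator semantics are standard; the mechanism and numbers are illustrative only):

```python
import random

def simulate(n, do_x=None, seed=0):
    """Toy structural causal model with a confounder:
        Z ~ Bernoulli(0.5)   # hidden common cause
        X := Z               # treatment follows Z unless we intervene
        Y := X + 2 * Z       # outcome driven by both
    Passing do_x overrides the mechanism for X (Pearl's do-operator)."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        z = 1 if rng.random() < 0.5 else 0
        x = z if do_x is None else do_x
        rows.append((x, z, x + 2 * z))
    return rows

# Observational query: condition on X == 1. Because X tracks Z, the
# cases with X == 1 all have Z == 1, so Y averages exactly 3.
obs = simulate(100_000)
seen = [y for x, _, y in obs if x == 1]
print(sum(seen) / len(seen))  # 3.0

# Interventional query: do(X = 1) leaves Z at its natural 50/50 split,
# so Y averages about 2. Acting on the correlational estimate of 3
# would overstate the effect of the intervention.
intervened = [y for _, _, y in simulate(100_000, do_x=1, seed=1)]
print(sum(intervened) / len(intervened))
```

The one-point gap between the two estimates is the confounding bias that intervention-testing agents are meant to eliminate, and it is what makes their decision outputs explainable in causal terms.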
Shengda Technology's Strategic Shift Focuses on a New AI Direction; Historical Stock Volatility Draws Attention
Jing Ji Guan Cha Wang· 2026-02-13 22:39
Recent Events
- Shengda Group founder Chen Tianqiao released an internal letter on February 6, 2026, outlining MiroMind's technological direction, focusing on "discovery-based intelligence" and "general solvers" while avoiding the general chatbot arena [1]
- The emphasis is on promoting AI development through causal reasoning and systematic innovation, which may indirectly affect Shengda Technology's business ecosystem [1]

Recent Stock Trends
- Historical data indicate that Shengda Technology's stock price fluctuated repeatedly in December 2025, including a 5.34% surge on December 29 and a 5.00% drop on December 17 [2]
Building a Brain That Can Think for Robots: Bai Huiyuan's "Anti-Consensus" Breakout
财富FORTUNE· 2026-01-21 13:03
Core Viewpoint
- The article argues that robots need a "thinking" brain that understands the world's causal relationships, rather than merely perfected physical forms. This perspective comes from Bai Huiyuan, founder and CEO of Infiforce, who holds that the essence of embodied intelligence lies in the brain's ability to perceive and predict the physical world [1][2][3]

Group 1: Industry Context
- In 2023, Bai Huiyuan left Alibaba to establish Infiforce amid a competitive landscape in which many companies focused on hardware advances, leading to a "body-making" arms race in the robotics industry [2]
- The robotics industry is currently fascinated with hardware, with companies competing on joint flexibility and human-like movement while neglecting robots' cognitive capabilities [2][3]
- Infiforce aims to break this trend by developing a "thinking" brain that can adapt to various bodies and understand the physical world, rather than merely improving hardware specifications [3][12]

Group 2: Technological Approach
- Infiforce's strategy combines a continual-learning model called Hyper-VLA with a causal world model, in contrast to the mainstream AI approach that relies primarily on correlation [5][6]
- Existing AI models often depend on vast amounts of training data, which is infeasible in the physical world, leading to data scarcity and a lack of robustness [6]
- By integrating causal reasoning into its models, Infiforce lets robots understand the implications of their actions, improving decision-making in unfamiliar environments [6]
Group 3: Business Development
- In 2025, Infiforce secured over 500 million yuan in commercial orders, a significant industry milestone, although these orders are seen more as experimental partnerships than the launch of standardized products [8]
- The orders came from leading clients across cultural tourism, research, energy, and smart manufacturing, indicating a willingness to invest in robotics beyond mere demonstrations [8]
- Infiforce's AstroDroid AD series is transitioning from demonstrations to pilot projects in which robots perform real-world tasks, such as understanding visitor intentions in museums and doing household chores [8]

Group 4: Vision and Future Aspirations
- Bai Huiyuan envisions Infiforce becoming as integral to the robotics ecosystem as "air" and "water," with the core intelligence of future robots stemming from Infiforce [13]
- The ultimate goal is robots that integrate seamlessly into human environments, with intelligence so advanced that users forget they are interacting with machines [13]
8,300 Hours of Annotated Data Open-Sourced: New-Generation Real-Time General Game AI Pixel2Play Released
机器之心· 2026-01-17 03:24
Core Insights
- The article discusses advances in AI models for gaming, focusing on the Pixel2Play (P2P) model developed by researchers at Player2, which aims to enhance AI performance in real-time gaming environments [2][5]

Group 1: Model Development
- The P2P model takes game visuals and text instructions as inputs and generates the corresponding keyboard and mouse operation signals, achieving over 20 Hz end-to-end inference on a consumer-grade RTX 5090 graphics card [2]
- P2P was trained on more than 40 games, totaling over 8,300 hours of gameplay data, and can play multiple Roblox and Steam games zero-shot [2]
- The model is built from scratch on a lightweight framework, pairing a decoder Transformer with a lightweight action decoder that improves inference speed fivefold [10]

Group 2: Training Data and Open Source
- High-quality "visual-action" data is scarce online, prompting the Open-P2P project to open-source all of its training datasets to fill the gap [3][5]
- The training data include game images, text instructions, and precise keyboard and mouse operation annotations, which are crucial for training effective game-AI models [5][8]

Group 3: Model Evaluation
- P2P was evaluated at four model sizes, with parameters ranging from 150M to 1.2B, achieving inference speeds of 80 Hz for the 150M model and 40 Hz for the 1.2B model [12]
- In human evaluations, the 1.2B model was preferred over smaller models at rates of 80%, 83%, and 75% across various games, indicating superior performance [13]
- The model's ability to follow text instructions significantly improved its task success rate, demonstrating strong understanding and execution capabilities [15]
Group 4: Causal Reasoning
- The article highlights the challenge of causal confusion in behavior cloning, particularly in high-frequency interaction environments, and notes that increasing model size and training data improves the model's grasp of causal relationships [17]
- As training data and model parameters scale up, P2P's performance on causal-inference assessments trends upward [19]
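The causal-confusion failure mode mentioned above can be reproduced in miniature: at high control frequencies, consecutive expert actions are highly correlated, so a cloner that simply copies the previous action scores well offline yet never reacts to the event that actually causes action changes. The simulated rollout below is an illustrative toy, not the P2P evaluation:

```python
import random

def expert_rollout(n, p_switch=0.05, seed=0):
    """High-frequency control trace: the correct action flips only when a
    rare event fires, so consecutive expert actions are usually identical."""
    rng = random.Random(seed)
    action, trace = 0, []
    for _ in range(n):
        if rng.random() < p_switch:  # rare event the expert reacts to
            action = 1 - action
        trace.append(action)
    return trace

trace = expert_rollout(10_000)

# "Copycat" cloner: predict the previous action, a spurious shortcut that
# behavior cloning can latch onto when past actions appear in the input.
pairs = list(zip(trace, trace[1:]))
offline_acc = sum(prev == nxt for prev, nxt in pairs) / len(pairs)

# Restricted to the steps where the expert actually changed its action
# (the causally relevant ones), the shortcut is wrong every time.
switches = [(prev, nxt) for prev, nxt in pairs if prev != nxt]
switch_acc = 0.0 if switches else 1.0

print(round(offline_acc, 2))  # ~0.95: looks like a strong policy offline
print(switch_acc)             # 0.0: never reacts to the rare event
```

This is why offline cloning accuracy alone is a misleading metric in high-frequency environments, and why the article ties causal-inference assessments to scaling rather than to raw imitation scores.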
索菱股份 Surges 10.00% to Limit-Up! Intelligent Driving Business Becomes the Core of Speculation as NVIDIA's Open-Source AI Models Heat Up the Smart-Driving Sector
Jin Rong Jie· 2026-01-15 02:13
Group 1
- The news highlights significant market interest in intelligent driving technologies, particularly in companies like 索菱股份, whose stock price rose 10% on advances in the field [1][3]
- 索菱股份 has accumulated substantial research and delivery experience in assisted driving, with successful mass-production cases and driving data for L2 and L2++ products, plus feasible technical solutions for L3 autonomous driving [1]
- The company serves a diverse customer base spanning passenger and commercial vehicles, covering sedans, SUVs, MPVs, and light trucks [1]

Group 2
- NVIDIA has launched the Alpamayo series of open-source AI models, simulation tools, and datasets aimed at the development of safe, reliable, reasoning-based assisted-driving vehicles [2]
- The Alpamayo initiative introduces a complete open-source closed loop that brings "causal reasoning" capabilities into the autonomous-driving research paradigm, significantly lowering the barriers to industry adoption [2]
- The open-source framework includes a high-fidelity end-to-end simulation environment and a large, diverse dataset with over 1,700 hours of driving data, supporting rapid validation and strategy optimization for autonomous driving [2]
Solving Task-Dependency Challenges in Multi-Agent Collaboration with "Causal Planning" | HKUST(GZ) & Tencent
量子位· 2025-09-03 05:49
Core Viewpoint
- The article discusses the challenges traditional single-agent systems face in long-horizon, multi-step collaborative tasks, highlighting the need for a distributed agent framework with global planning and causal dependency management [1][2]

Group 1: CausalMACE Method
- CausalMACE, proposed by a research team from the Hong Kong University of Science and Technology and Tencent, integrates causal reasoning mechanisms into open-world multi-agent systems, providing a scalable engineering solution for complex task collaboration [2]
- The method introduces a "global causal task graph," allowing the AI to learn "if-then" logic and enabling dynamic adjustment and a clear division of labor among agents [5][6]

Group 2: Framework Components
- The CausalMACE framework consists of three main components: Judger, Planner, and Worker [7]
- The Judger (the "referee") verifies the legality of actions in real time and reports success or failure, ensuring all agents operate under the same game rules [11]
- The Planner (the "chief engineer") breaks complex tasks into smaller sub-tasks and drafts a rough flowchart from the game rules, refining it through causal reasoning so that task dependencies remain valid [12][14]
- The Worker (the "dispatch office") uses depth-first search to split the causal graph into multiple production lines and computes a "busy index" for real-time task reassignment among agents [16]

Group 3: Experimental Results
- Experiments show that CausalMACE significantly improves both completion rate and efficiency on benchmark tasks such as construction, cooking, and escape rooms, achieving up to a 12% increase in task completion rate and up to a 1.5x efficiency gain over baseline methods [17]
- On the VillagerBench benchmark tasks, CausalMACE outperformed AgentVerse and VillagerAgent across metrics, demonstrating its effectiveness in multi-agent collaboration [18]
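The general pattern behind the Planner and Worker components, a causal dependency graph traversed depth-first and dispatched by agent load, can be sketched as follows. The task names, unit task costs, and greedy dispatcher are illustrative assumptions, not the paper's actual implementation:

```python
from collections import defaultdict

# Hypothetical causal task graph for a cooking job: each entry maps a task
# to its prerequisites ("if-then" dependencies: no bread before a furnace).
deps = {
    "gather_wood": [],
    "make_furnace": ["gather_wood"],
    "harvest_wheat": [],
    "make_bread": ["harvest_wheat", "make_furnace"],
}

def topo_order(deps):
    """Depth-first traversal that emits each task only after its
    prerequisites, turning the causal graph into executable task lines."""
    order, seen = [], set()
    def visit(task):
        if task in seen:
            return
        seen.add(task)
        for pre in deps[task]:
            visit(pre)
        order.append(task)
    for task in deps:
        visit(task)
    return order

def assign(order, agents):
    """Greedy dispatch: each task goes to the least busy agent, a simple
    stand-in for the 'busy index' used for real-time reassignment."""
    busy = {a: 0 for a in agents}
    schedule = defaultdict(list)
    for task in order:
        agent = min(busy, key=busy.get)
        schedule[agent].append(task)
        busy[agent] += 1  # unit cost per task, for illustration
    return dict(schedule)

order = topo_order(deps)
print(order)
print(assign(order, ["alice", "bob"]))
```

The key property is that any schedule drawn from the topological order respects every causal dependency, so agents can be reassigned freely without ever attempting a task before its prerequisites.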
Group 4: Author Information
- The lead author is Professor Wang Hao, an assistant professor and doctoral supervisor at the Hong Kong University of Science and Technology (Guangzhou), whose research covers generative AI models and 3D reconstruction [19][20]
A Roundup of Work Combining LLMs, Reinforcement Learning, and World Models in Embodied AI
具身智能之心· 2025-07-29 06:15
Core Viewpoint
- The article surveys recent advances in embodied intelligence, focusing on the integration of large language models (LLMs) with reinforcement learning and world models, and highlights several notable research papers from 2024 [2][3]

Group 1: UniSim
- UniSim aims to learn a general real-world interactive simulator through generative modeling, showing that natural datasets provide diverse advantages for learning simulators [3]
- The research demonstrates that integrating various datasets allows simulation of both high-level commands and low-level controls, enabling zero-shot application in real-world scenarios [3]

Group 2: Robust Agents
- The Google DeepMind study asserts that causal reasoning is essential for robust and general AI, concluding that agents capable of satisfying regret bounds must learn approximate causal models [5]
- This finding has significant implications for transfer learning and causal inference [5]

Group 3: MAMBA
- MAMBA introduces an efficient world-model approach for meta-reinforcement learning, addressing the sample-efficiency issues prevalent in current methods [8]
- The framework improves sample efficiency markedly, achieving up to 15x better performance on high-dimensional tasks [8]

Group 4: EMMA
- EMMA leverages an LLM trained in a text-based world to guide the training of a visual-world agent, enhancing its ability to interact with dynamic environments [10]
- The approach yields a 20%-70% success-rate improvement on diverse tasks compared with existing VLM agents [10]

Group 5: Text2Reward
- The Text2Reward framework uses LLMs to automatically generate dense reward functions, addressing the difficulty of reward-function design in reinforcement learning [13][14]
- The method outperforms alternatives on 13 of 17 tasks and achieves over 94% success on novel motion behaviors [14]
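Text2Reward's central idea is that an LLM emits an executable dense reward function as code. The function below is a hand-written example of the kind of code such a pipeline might generate for a hypothetical "move the gripper to the target" task; it is not actual model output, and the function and argument names are invented:

```python
import math

def generated_reward(gripper_pos, target_pos, grasped):
    """Dense shaped reward of the kind Text2Reward-style pipelines emit:
    a continuous distance term plus a sparse success bonus, instead of a
    bare 0/1 task signal. (Illustrative hand-written example.)"""
    dist = math.dist(gripper_pos, target_pos)
    reward = -dist                       # dense shaping: closer is better
    reward += 1.0 if grasped else 0.0    # sparse success bonus on top
    return reward

# The shaped signal ranks a nearer state above a farther one even before
# the task succeeds, which is what makes RL exploration tractable.
far = generated_reward((0.0, 0.0, 0.0), (1.0, 1.0, 0.0), grasped=False)
near = generated_reward((0.9, 0.9, 0.0), (1.0, 1.0, 0.0), grasped=False)
print(near > far)  # True
```

Because the reward is ordinary code, it can be inspected, unit-tested, and iteratively refined, which is the practical advantage of generating rewards as programs rather than as learned black boxes.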
Group 6: Online Continual Learning
- The research proposes two frameworks for continual learning in interactive instruction-following agents, emphasizing that agents should learn incrementally as they explore their environments [17][18]
- A confidence-aware moving-average mechanism updates parameters without relying on task-boundary information [18]

Group 7: AMAGO
- AMAGO is a scalable in-context reinforcement learning framework that addresses challenges in generalization, long-term memory, and meta-learning [21]
- The framework enables parallel training of long-sequence Transformers, improving scalability and performance on complex tasks [21]

Group 8: PDDL-based Planning
- The study presents a novel paradigm for task planning with pre-trained LLMs, building explicit world models in PDDL [22][23]
- The framework greatly reduces the need for human intervention by letting LLMs translate between PDDL and natural language, enabling efficient model correction [23]
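The PDDL-based paradigm in Group 8 rests on an explicit symbolic world model: actions declared as preconditions plus add/delete effects, searched by a classical planner. A minimal STRIPS-style sketch in Python follows; the toy domain and action names are invented for illustration, not taken from the paper:

```python
from collections import deque

# STRIPS-style action schemas: (preconditions, add list, delete list),
# the same ingredients a PDDL domain file declares. Toy domain.
ACTIONS = {
    "pick(key)":  ({"at(key)"},    {"have(key)"},  {"at(key)"}),
    "open(door)": ({"have(key)"},  {"open(door)"}, set()),
    "go(room)":   ({"open(door)"}, {"in(room)"},   set()),
}

def plan(initial, goal):
    """Breadth-first search over the explicit world model. Because the
    model is symbolic, every returned plan is sound with respect to it,
    which is what LLM-written PDDL buys over free-form LLM planning."""
    start = frozenset(initial)
    queue = deque([(start, [])])
    visited = {start}
    while queue:
        state, steps = queue.popleft()
        if goal <= state:           # all goal facts hold
            return steps
        for name, (pre, add, delete) in ACTIONS.items():
            if pre <= state:        # preconditions satisfied
                nxt = frozenset((state - delete) | add)
                if nxt not in visited:
                    visited.add(nxt)
                    queue.append((nxt, steps + [name]))
    return None                     # goal unreachable in this model

print(plan({"at(key)"}, {"in(room)"}))
# ['pick(key)', 'open(door)', 'go(room)']
```

In the paper's paradigm the LLM authors and repairs the domain description (here, `ACTIONS`) in PDDL, while an off-the-shelf planner performs the search; the sketch above collapses both roles into one file for clarity.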
Under Probabilistic-Statistical Mechanisms, Does LLM Reasoning Truly "Understand the World"?
机器之心· 2025-06-21 06:32
Group 1
- The article examines whether LLMs (Large Language Models) truly "understand the world" or whether their reasoning is merely a form of pattern matching, highlighting the industry debate over the nature of LLM reasoning capabilities [1][3][4]
- It references a paper from Apple suggesting that current reasoning models do not genuinely think but instead engage in pattern matching, which has reignited discussion in the AI community [3][4]
- True reasoning involves understanding causal relationships, as various researchers emphasize, indicating that LLMs lack the causal framework needed for deep, flexible reasoning [5][6][7]

Group 2
- The article also explores why enterprises are increasing their spending on generative AI, noting a shift from building in-house solutions to purchasing third-party AI applications [1][2]
- It outlines an evaluation framework for selecting AI models, covering the key factors that influence procurement decisions alongside traditional software-purchasing considerations [1][2]