World Model - filings, earnings calls, financial reports, news - Reportify

World Model

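Yann LeCun criticizes Meta's AI strategy, says LLMs are not the path to human-level intelligence; Quark fully integrates the Qianwen chat assistant and will launch a new AI browser | AIGC Daily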

创业邦· 2025-11-19 00:12

Group 1
- Ant Group launched a multimodal AI assistant named "Lingguang" on November 18. It can generate small applications from natural-language prompts within 30 seconds on mobile devices, supports output formats including 3D, audio, video, charts, animations, and maps, and is available on both Android and Apple app stores [2]
- Jeff Bezos founded a new AI startup called "Project Prometheus," which has raised $6.2 billion in funding, including contributions from Bezos himself. The company has nearly 100 employees, including researchers from Meta, OpenAI, and Google DeepMind. Elon Musk responded by calling Bezos a "copycat" [2]
- The Quark app has fully integrated the Qianwen dialogue assistant and is positioning itself as an AI browser. A major upgrade of the PC version is also expected, which will strengthen its collaboration with the Qianwen app [2]
- Apple designer Abidur Chowdhury has left the company to join an AI startup, causing significant internal reactions because of his importance to the design team [2]
- Yann LeCun (杨立昆), former chief AI scientist at Meta, criticized the company's AI strategy, arguing that large investments in large language models (LLMs) are misguided. He believes that true breakthroughs in AI will come from "world models" rather than relying solely on visual data [3]

Artificial Intelligence

Large Language Model

Emergent Behavior in Autonomous Driving with Wayve CEO Alex Kendall

Sequoia Capital· 2025-11-18 17:01

Reasoning in the physical world can be really well expressed as a world model. In 2018, we put our very first world model approach on the road. It was a very small 100,000-parameter neural network that could simulate a 30x3 pixel image of a road in front of us. But we were able to use it as this internal simulator to train a model-based reinforcement learning algorithm. Fast forward to today and we've developed GAIA. It's a full generative world model that's able to simulate multiple cameras and sensors and v ...
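The idea of using a learned model as an "internal simulator" can be illustrated with a deliberately tiny sketch. Everything here is an assumption made for illustration and is not Wayve's method: a one-dimensional lane-offset toy (`true_step`), a linear world model fitted by least squares (`fit_world_model`), and one-step planning (`plan`) as a simpler stand-in for the model-based reinforcement learning mentioned in the excerpt.

```python
import random

def true_step(offset, steer):
    """Hypothetical real-world dynamics: steering shifts the lateral offset."""
    return offset + 0.5 * steer

def fit_world_model(transitions):
    """Least-squares fit of next_offset ~ a*offset + b*steer (2x2 normal equations)."""
    soo = sum(o * o for o, s, n in transitions)
    sos = sum(o * s for o, s, n in transitions)
    sss = sum(s * s for o, s, n in transitions)
    son = sum(o * n for o, s, n in transitions)
    ssn = sum(s * n for o, s, n in transitions)
    det = soo * sss - sos * sos
    a = (son * sss - ssn * sos) / det
    b = (ssn * soo - son * sos) / det
    return a, b

def plan(model, offset, candidates=(-1.0, -0.5, 0.0, 0.5, 1.0)):
    """Imagine each candidate action inside the learned model; keep the best one."""
    a, b = model
    return min(candidates, key=lambda steer: abs(a * offset + b * steer))

def run_demo(seed=0, steps=10):
    """Collect random experience, fit the model, then drive using imagined rollouts."""
    rng = random.Random(seed)
    data = []
    for _ in range(200):
        o, s = rng.uniform(-2, 2), rng.uniform(-1, 1)
        data.append((o, s, true_step(o, s)))
    model = fit_world_model(data)
    offset = 1.5  # start off-center
    for _ in range(steps):
        offset = true_step(offset, plan(model, offset))
    return model, offset
```

Calling `run_demo()` returns the fitted coefficients and the final lateral offset. The controller never queries `true_step` when choosing an action; it only consults the learned model, which is the sense in which the model acts as a simulator.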

Autonomous Driving

Emergent Behavior

Reinforcement Learning

Fei-Fei Li's latest podcast: understanding world models through the cave experiment | Jinqiu Select

锦秋集· 2025-11-17 08:43

Core Insights
- The essence of AI is not "artificial" but an extension of "intelligence," enhancing human understanding of the world [3][11]
- The concept of "world models" is crucial for advancing AI, particularly in spatial and visual understanding beyond language models [4][39]
- The development of AI has transitioned from skepticism to widespread acceptance, with companies now identifying as AI firms [9][30]

Group 1: AI Development and Historical Context
- AI's evolution has been marked by significant milestones, including the creation of ImageNet, which provided a vast dataset for training models [6][23]
- The combination of big data, neural networks, and GPUs has been pivotal in the modern AI landscape, leading to breakthroughs like ChatGPT [24][25]
- The early days of AI were characterized by limited public interest and funding, with a resurgence occurring in the last decade [9][19]

Group 2: World Models and Their Importance
- World models are foundational capabilities that enable reasoning, interaction, and world creation, essential for both AI and robotics [40][41]
- The development of world models aims to bridge the gap between language understanding and spatial intelligence, enhancing AI's ability to operate in real-world scenarios [39][43]
- The recent launch of Marble, a product that generates navigable 3D worlds, exemplifies the application of world models in various fields, including gaming and virtual production [53][60]

Group 3: Challenges in Robotics
- The "Bitter Lesson" suggests that simple models with large datasets outperform complex models with limited data, but this principle faces challenges in robotics due to data scarcity [45][47]
- Robotics requires not only advanced algorithms but also physical systems and real-world applications, complicating the training process [48][49]
- The current state of robotics is still experimental, with significant hurdles remaining before achieving desired capabilities [47][50]

Group 4: Future Directions and Innovations
- Continuous innovation is necessary for AI to reach new heights, as current models still lack capabilities like abstract reasoning and emotional intelligence [35][36]
- The focus on spatial intelligence and world modeling is expected to drive future advancements in AI, particularly in enhancing human-machine collaboration [39][44]
- The integration of AI into various sectors, including psychology and design, highlights its potential to transform industries and improve efficiency [60][61]

Artificial Intelligence

Li Auto's Zhan Kun at ICCV'25 on world models: from data closed loop to training closed loop (slides)

理想TOP2· 2025-10-28 15:18

Core Insights
- The article discusses the evolution of autonomous driving technology, emphasizing the transition from data closed-loop systems to training closed-loop systems, which focus on real-world utility and evaluation of progress [13][14].

Group 1: Data and Infrastructure
- The company has accumulated 1.5 billion kilometers of driving data, which is crucial for training autonomous systems [8].
- A closed-loop data system is in place, utilizing over 200 trigger data points for training datasets, with clips ranging from 15 to 45 seconds [8].
- The data scaling law indicates a significant increase in the number of clips used for training, with projections showing up to 600 million clips by 2025 [10].

Group 2: Technology Stack
- The key technology stack for autonomous driving includes regional-scale simulation, synthetic data, reinforcement learning, and multimodal generation [18].
- The focus is on enhancing simulation quality through advanced techniques like scene reconstruction and traffic agent modeling [18][19].
- The transition from reconstruction to generation in simulation is highlighted, utilizing diffusion models for improved scene generation [19].

Group 3: Training and Evaluation
- The article emphasizes the importance of building a training closed loop that integrates various models, including VLA (Vision-Language-Action) and reinforcement learning [15].
- The evaluation environment and reward systems are critical for assessing the performance of autonomous driving systems [14][35].
- Interactive agents are identified as a key challenge in the training closed loop, necessitating accurate feedback and generalization ability [38][40].

Group 4: Future Directions
- The company is working on various projects aimed at enhancing both reconstruction and generation capabilities, with milestones set for 2024 and 2025 [21][24].
- The application of generated data includes scene editing, scene transfer, and scene generation, which are essential for improving the realism of simulations [27][33].

Autonomous Driving

Data Closed-loop

Training Closed-loop

Autonomous driving paper roundup: VLA, world models, reinforcement learning, trajectory planning, and more

自动驾驶之心· 2025-10-18 04:00

Core Insights
- The article discusses advancements in autonomous driving technologies, highlighting various research contributions and their implications for the industry.

Group 1: DriveVLA-W0
- The DriveVLA-W0 training paradigm enhances the generalization ability and data scalability of VLA models by using world modeling to predict future images, achieving 93.0 PDMS and 86.1 EPDMS on NAVSIM benchmarks [6][12]
- A lightweight Mixture-of-Experts (MoE) architecture reduces inference latency to 63.1% of the baseline VLA, meeting real-time deployment needs [6][12]
- The data scaling law amplification effect is validated, showing significant performance improvements as data volume increases, with a 28.8% reduction in ADE (average displacement error) and a 15.9% decrease in collision rates when using 70M frames [6][12]

Group 2: CoIRL-AD
- The CoIRL-AD framework combines imitation learning and reinforcement learning within a latent world model, achieving an 18% reduction in collision rates on the nuScenes dataset and a PDMS score of 88.2 on the NAVSIM benchmark [13][16]
- The framework integrates RL into an end-to-end autonomous driving model, addressing offline RL's scene expansion issues [13][16]
- A decoupled dual-policy architecture facilitates structured interaction between imitation learning and reinforcement learning, enhancing knowledge transfer [13][16]

Group 3: PAGS
- The Priority-Adaptive Gaussian Splatting (PAGS) framework achieves high-quality real-time 3D reconstruction in dynamic driving scenarios, with a PSNR of 34.63 and SSIM of 0.933 on the Waymo dataset [23][29]
- PAGS incorporates semantic-guided pruning and regularization to balance reconstruction fidelity and computational cost [23][29]
- The framework demonstrates a rendering speed of 353 FPS with a training time of only 1 hour and 22 minutes, outperforming existing methods [23][29]

Group 4: Flow Planner
- The Flow Planner achieves a score of 90.43 on the nuPlan Val14 benchmark, marking the first learning-based method to surpass 90 without prior knowledge [34][40]
- It introduces fine-grained trajectory tokenization to enhance local feature extraction while maintaining motion continuity [34][40]
- The architecture employs adaptive layer normalization and scale-adaptive attention to filter redundant information and strengthen key interaction information extraction [34][40]

Group 5: CymbaDiff
- The CymbaDiff model defines a new task for sketch-based 3D outdoor semantic scene generation, achieving a FID of 40.74 on the Sketch-based SemanticKITTI dataset [44][47]
- It introduces a large-scale benchmark dataset, SketchSem3D, for evaluating 3D semantic scene generation [44][47]
- The model employs a Cylinder Mamba diffusion mechanism to enhance spatial coherence and local neighborhood relationships [44][47]

Group 6: DriveCritic
- The DriveCritic framework utilizes vision-language models for context-aware evaluation of autonomous driving, achieving a 76.0% accuracy in human preference alignment tasks [55][58]
- It addresses limitations of existing evaluation metrics by focusing on context sensitivity and human alignment in nuanced driving scenarios [55][58]
- The framework demonstrates superior performance compared to traditional metrics, providing a reliable solution for human-aligned evaluation in autonomous driving [55][58]
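The DriveVLA-W0 item above reports a relative reduction in ADE. As a reminder of what that metric usually measures, here is a minimal sketch assuming the common definition of average displacement error: the mean Euclidean distance between predicted and ground-truth waypoints. The summary does not state the exact evaluation protocol (horizon, sampling rate), so the function names and sample trajectories below are illustrative assumptions only.

```python
import math

def average_displacement_error(predicted, ground_truth):
    """Mean Euclidean distance between matching waypoints of two trajectories."""
    if not predicted or len(predicted) != len(ground_truth):
        raise ValueError("trajectories must be non-empty and of equal length")
    total = sum(math.dist(p, g) for p, g in zip(predicted, ground_truth))
    return total / len(predicted)

def relative_reduction(baseline, improved):
    """Fractional improvement of a lower-is-better metric, e.g. 0.288 for 28.8%."""
    return (baseline - improved) / baseline

# Hypothetical waypoints (x, y) in meters; not taken from any benchmark.
predicted = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
ground_truth = [(0.0, 0.0), (1.0, 1.0), (2.0, 2.0)]
ade = average_displacement_error(predicted, ground_truth)
```

Here the per-waypoint errors are 0, 1, and 2 meters, so `ade` is 1.0. A figure such as "28.8% reduction in ADE" corresponds to `relative_reduction(baseline, improved)` being about 0.288.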

Reinforcement Learning

3D Reconstruction

Trajectory Planning

Autonomous Driving

X @Demis Hassabis

Demis Hassabis· 2025-10-09 21:44

Product Innovation
- Google DeepMind's Genie 3, a world model capable of generating interactive environments from text or image prompts, is recognized as one of TIME's Best Inventions of 2025 [1][2]
- Genie 3 enables the creation of entire playable worlds from a single image or text prompt [1]

Company Recognition
- Google DeepMind expresses pride in the Genie 3 team for its recognition by TIME [1]
- Google DeepMind announces Genie 3 has been named one of TIME's Best Inventions of 2025 [1]

Artificial Intelligence

First code world model sets the AI community abuzz, lets agents learn "real reasoning"; open-sourced by Meta

具身智能之心· 2025-09-26 00:04

Core Viewpoint
- The article discusses the introduction of the Code World Model (CWM) by Meta, which represents a significant evolution in AI models aimed at improving code generation through world modeling techniques [2][5][31].

Group 1: Model Overview
- CWM is a 32-billion-parameter open-weight large language model (LLM) designed to advance research on code generation with world models [7][12].
- It features a dense, decoder-only architecture with a context length of up to 131k tokens, demonstrating strong performance in general programming and mathematical tasks [8][9].

Group 2: Performance Metrics
- CWM achieved notable scores in various benchmarks: SWE-bench Verified (pass@1 65.8%), LiveCodeBench (68.6%), Math-500 (96.6%), and AIME 2024 (76.0%) [8][23].
- In comparison to other models, CWM's performance is competitive, particularly in the 30B parameter range [9][30].

Group 3: Training Methodology
- The model was trained on extensive observation-action trajectories collected in a Python interpreter and in agentic Docker environments, with the aim of improving code understanding beyond what training on static code provides [12][22].
- Meta has made available checkpoints from the mid-training, SFT, and reinforcement learning phases to support further research [13].

Group 4: Research Implications
- CWM serves as a robust testing platform to explore the potential of world modeling in enhancing reasoning and planning capabilities in code generation [15][31].
- The research indicates that world models can benefit agent-based coding by allowing stepwise simulation of Python code execution and reasoning based on such simulations [16][31].

Group 5: Future Directions
- Meta envisions that the code world model will bridge the gap between linguistic reasoning and executable semantics, with ongoing research needed to fully leverage its advantages across tasks [31].
- The model aims to improve reinforcement learning by enabling agents familiar with environmental dynamics to focus on learning actions that yield rewards [31].
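To make "stepwise simulation of Python code execution" more concrete, the sketch below records a line-by-line execution trace using the standard `sys.settrace` hook. The summary does not describe CWM's actual trajectory format, so `trace_locals` and `running_total` are hypothetical names, and the trace layout is only an illustration of the kind of state-by-state record a model could be asked to predict.

```python
import sys

def trace_locals(func, *args):
    """Run func(*args) and record (line offset, locals snapshot) before each executed line."""
    steps = []
    code = func.__code__

    def tracer(frame, event, arg):
        # Only follow the frame of the function under study.
        if frame.f_code is not code:
            return None
        if event == "line":
            offset = frame.f_lineno - code.co_firstlineno
            steps.append((offset, dict(frame.f_locals)))
        return tracer

    previous = sys.gettrace()
    sys.settrace(tracer)
    try:
        result = func(*args)
    finally:
        sys.settrace(previous)  # always restore the previous trace hook
    return result, steps

def running_total(n):
    total = 0
    for i in range(n):
        total += i
    return total

result, steps = trace_locals(running_total, 3)
for offset, snapshot in steps:
    print(offset, snapshot)
```

Each printed row pairs a source line (relative to the `def` line) with the local variables just before that line runs. Predicting the next snapshot from the current one is one way to frame execution as a world-modeling problem.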

Meta Platforms(US:META)

Artificial Intelligence

Code World Model (CWM)

Humanoid Robot tour takeaways: market outlook, components and embodied AI

2025-09-18 13:09

Summary of Conference Call Notes on Greater China Industrials (Humanoid Robots and Autonomous Driving)

Industry Overview
- The humanoid robot and autonomous driving (AD) sectors in China are expected to experience rapid expansion over the next decade, with significant growth anticipated in factory settings within 2-3 years and further opportunities in commercial and household applications in the long term [1]
- The current bill of materials (BOM) cost for a fully functional humanoid robot is approximately US$50-60k, with expectations for rapid cost reductions in the next five years due to improved product design and economies of scale [1]
- Stricter regulations in the AD sector are anticipated to create more opportunities for AD components, particularly for LiDAR technology, which will benefit from new long-distance object detection requirements [1]

Key Players and Developments

Dobot
- Dobot is a leading global collaborative robot (COBOT) brand, achieving 47% year-over-year growth in 6-axis COBOT sales in the first half of 2025, indicating market share gains [8]
- The company has entered the humanoid robot market, launching its first prototype in early 2025 and planning deployment in manufacturing and business scenarios [9]

RoboSense
- RoboSense is focusing on its new EMX LiDAR products, which offer superior precision and detection distance compared to competitors, with expectations to ship 600-700k units in 2025 and 1.5 million units in 2026 [10]
- The company is also exploring opportunities in the lawn mower, unmanned delivery, and robotaxi industries, with significant partnerships established [11]

Zhaowei Machinery & Electronics
- Zhaowei has launched new dexterous hand models for humanoid robots and aims for a 10-15% global market share in this segment [12][13]
- The BOM cost of the dexterous hand is estimated to account for 20-30% of the total BOM cost of a humanoid robot [13]

Googol Technology
- Googol Technology specializes in high-end control systems for advanced manufacturing and sees strong growth potential in humanoid robots due to its expertise in multi-degree-of-freedom (DoF) control [14][15]

Minieye
- Minieye is making progress with its smart driving solutions, including iPilot and iRobo, and anticipates significant growth in the penetration of front-view camera modules and driver monitoring systems due to new safety regulations [16][17]

Leju Robotics
- Leju aims to deliver over 1,000 robots in 2025, focusing on stability and durability for large-scale applications [18]

Orbbec
- Orbbec is a leading player in robot vision systems, holding over 70% market share in 3D vision systems for service robots in China [21][22]

UBTECH
- UBTECH aims to ship 500 humanoid robots in 2025 and 2,000-3,000 units in 2026, with expectations for BOM cost reductions in the coming years [23][24]

LK Tech
- LK Tech is focusing on magnesium alloy technology for humanoid robots, which offers lightweighting and other advantages, and has signed cooperation agreements for R&D projects [25][26]

Technology Insights
- The competition between VLA (Vision-Language-Action) and world model technologies for embodied AI is highlighted, with data availability being a key bottleneck [3]
- The vision system of humanoid robots is evolving, with depth cameras becoming the mainstream choice for enhancing sensing and navigation capabilities [22]

Market Outlook
- The humanoid robot market is expected to grow significantly, with projections of 3 million units shipped by 2030, leading to substantial opportunities for component suppliers [13]
- The average selling price (ASP) of humanoid robots is expected to decline to approximately RMB150k (~US$20k) by 2026-2028 due to scale effects [20]

Conclusion
- The humanoid robot and AD sectors in Greater China are poised for significant growth, driven by technological advancements, regulatory changes, and increasing market demand. Key players are actively innovating and expanding their product offerings to capture market share in this rapidly evolving landscape.

SIASUN(SZ:300024)

Autonomous Driving

X @Demis Hassabis

Demis Hassabis· 2025-08-24 02:15

AI Development & Innovation
- AI can now be trained within another AI, indicating a significant advancement in AI training methodologies [1]
- The world model, Genie 3, can imagine and generate new worlds dynamically, showcasing its advanced simulation capabilities [1]
- An embodied agent, SIMA, can autonomously navigate these AI-generated environments, demonstrating progress in embodied intelligence [1]
- The entire environment-to-action loop is now generated by AI, highlighting the potential for fully AI-driven training simulations [1]
- The industry anticipates the development of world simulators for training general embodied intelligence, suggesting future research directions [1]

Artificial Intelligence

Embodied Intelligence

Many more autonomous driving papers were accepted to ICCV 2025, and we noticed some new trends...

自动驾驶之心· 2025-08-16 00:03

Core Insights
- The article discusses the latest trends and research directions in the field of autonomous driving, highlighting the integration of multimodal large models and vision-language action generation as key areas of focus for both academia and industry [2][5].

Group 1: Research Directions
- The research community is concentrating on several key areas, including the combination of MoE (Mixture of Experts) with autonomous driving, benchmark development for autonomous driving, and trajectory generation using diffusion models [2].
- Closed-loop simulation and world models are emerging as critical needs in autonomous driving, driven by the limitations of real-world open-loop testing. This approach aims to reduce costs and improve model iteration efficiency [5].
- There is a notable emphasis on performance improvement in object detection and OCC (occupancy prediction), with many ongoing projects exploring specific pain points and challenges in these areas [5].

Group 2: Notable Projects and Publications
- "ORION: A Holistic End-to-End Autonomous Driving Framework by Vision-Language Instructed Action Generation" is a significant project from Huazhong University of Science and Technology and Xiaomi, focusing on integrating vision and language for action generation in autonomous driving [5].
- "All-in-One Large Multimodal Model for Autonomous Driving" is another important work from Sun Yat-sen University and Meituan, contributing to the development of comprehensive models for autonomous driving [6].
- "MCAM: Multimodal Causal Analysis Model for Ego-Vehicle-Level Driving Video Understanding" from Chongqing University aims to enhance understanding of driving scenarios through multimodal analysis [8].

Group 3: Simulation and Reconstruction
- The project "Dream-to-Recon: Monocular 3D Reconstruction with Diffusion-Depth Distillation from Single Images" from TU Munich focuses on advanced reconstruction techniques for autonomous driving [14].
- "CoDa-4DGS: Dynamic Gaussian Splatting with Context and Deformation Awareness for Autonomous Driving" from Fraunhofer IVI and TU Munich is another notable work that addresses dynamic scene reconstruction [16].

Group 4: Trajectory Prediction and World Models
- "Foresight in Motion: Reinforcing Trajectory Prediction with Reward Heuristics" from Hong Kong University of Science and Technology and Didi emphasizes the importance of trajectory prediction in autonomous driving [29].
- "World4Drive: End-to-End Autonomous Driving via Intention-aware Physical Latent World Model" from the Chinese Academy of Sciences focuses on developing a comprehensive world model for autonomous driving [32].

Multimodal Large Model

Closed-loop Simulation

OCC and Detection

Autonomous Driving
