World Models
Putting a "Runnable Code World" into AI: Meta Open-Sources the First Code World Model, Letting AI Think Like a Programmer
36Ke· 2025-09-25 13:02
Core Insights
- Meta's FAIR team has launched the Code World Model (CWM), a large language model (LLM) with 32 billion parameters and a context length of up to 131k tokens, aimed at integrating "world model" concepts into code generation and reasoning [1][2][3]
- CWM is designed not only to write code but also to simulate code execution, reason about program states, and detect and fix its own bugs, deepening the model's understanding of code execution [2][3]

Training Phases
- The training of CWM is divided into three main phases:
  - Pre-training on 8 trillion tokens, of which approximately 30% are code-related [3][4]
  - Mid-training, which incorporates 5 trillion tokens of world-modeling data and extends the context length to 131k tokens [4][6]
  - Post-training (SFT + RL), involving 100 billion tokens for instruction-following and reasoning, followed by large-scale multi-task reinforcement learning with 172 billion tokens [4][10]

Data Utilization
- CWM's world-model capabilities are driven by two main types of mid-training data:
  - Execution traces from Python, which teach the model how code execution alters local state [6][8]
  - Interaction trajectories from an automated agent executing tasks in repositories, yielding around 3 million trajectories collected from 10.2k images and 3.15k repositories [9]

Performance Metrics
- In benchmark tests, CWM achieved 65.8% pass@1 on SWE-bench Verified with Test-Time Scaling enabled, along with strong results on LiveCodeBench (68.6%), Math-500 (96.6%), and AIME 2024 (76.0%) [10][12]
- CWM is competitive with larger or closed-source LLMs, approaching GPT-4 levels, though it has limitations in certain editing formats and multi-language scenarios [12]

Industry Reception
- The release of CWM has garnered significant attention, with Meta's AI researchers actively promoting it and highlighting its potential impact on software development [13][15]
- While the open-sourcing of CWM's training checkpoints is praised for its value in academic and engineering replication, concerns remain about the model's computational demands and the need for practical testing in real development environments [15]
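The Python execution traces mentioned above pair each executed line with the resulting local variable state. As an illustration of what such a trace can look like, here is a minimal sketch built on the standard `sys.settrace` hook; this is not Meta's actual data format, only a toy demonstration of the idea:

```python
import sys

def trace_locals(func, *args):
    """Record (line number, local variables) after each executed line
    of `func`, a toy version of a line-level execution trace."""
    steps = []

    def tracer(frame, event, arg):
        # Only record 'line' events inside the traced function itself.
        if event == "line" and frame.f_code is func.__code__:
            steps.append((frame.f_lineno, dict(frame.f_locals)))
        return tracer

    sys.settrace(tracer)
    try:
        result = func(*args)
    finally:
        sys.settrace(None)
    return result, steps

def running_sum(n):
    total = 0
    for i in range(n):
        total += i
    return total

result, steps = trace_locals(running_sum, 3)
print(result)  # 3  (0 + 1 + 2)
for lineno, state in steps:
    print(lineno, state)  # e.g. local state evolves: total 0 -> 1 -> 3
```

A model trained on pairs like `(lineno, {"n": 3, "total": 1, "i": 1})` sees not just the source text but how each statement transforms program state.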
Is Code Generation About to Change? After Rumors of Being Sidelined, Yann LeCun Returns with a 32-Billion-Parameter Open-Source World Model
AI Frontline· 2025-09-25 08:04
Core Viewpoint
- The article discusses Meta's release of the Code World Model (CWM), which aims to enhance code generation by integrating a deeper understanding of code execution, addressing the limitations of earlier models that could generate syntactically correct code but failed at execution [4][10]

Group 1: Model Overview
- CWM is the first open-source code world model, with 32 billion parameters, designed to advance world-model-based code generation research [4][5]
- Unlike traditional models trained on static code, CWM incorporates dynamic interaction data from Python interpreters and Docker environments to improve its understanding of and reasoning about code [7][14]
- The model can simulate the step-by-step execution of code, tracking how variables change and what feedback the program receives [7][10]

Group 2: Performance Metrics
- CWM achieved a score of 65.8% on the SWE-bench Verified task, outperforming all other open-source models of similar size and nearing GPT-4 levels [8]
- It scored 68.6% on LiveCodeBench, 96.6% on Math-500, and 76.0% on AIME 2024, showcasing its strong performance across various benchmarks [8]

Group 3: Training Methodology
- The training of CWM involved three key phases: pre-training, mid-training, and post-training, the last utilizing supervised fine-tuning (SFT) and reinforcement learning (RL) [15][16]
- The model was pre-trained on 8 trillion tokens, followed by mid-training on an additional 5 trillion tokens of code world-modeling data, enhancing its contextual understanding [15][16]

Group 4: Industry Context and Implications
- The release of CWM marks a significant step in Meta's AI strategy, especially following the restructuring of its AI business [5][23]
- The model's development reflects a shift toward balancing open-source initiatives with commercial interests as Meta navigates its AI strategy amid organizational changes [26]
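Both CWM summaries above quote pass rates such as 65.8% on SWE-bench Verified. For context, pass@k scores on code benchmarks are commonly computed with the unbiased estimator from the Codex paper (Chen et al., 2021); this is the standard formulation, and CWM's exact evaluation protocol may differ:

```python
from math import comb

def pass_at_k(n, c, k):
    """Unbiased pass@k estimator: the probability that at least one of
    k samples, drawn without replacement from n generations of which c
    are correct, passes the tests."""
    if n - c < k:
        return 1.0  # too few failing samples to fill k draws: success guaranteed
    return 1.0 - comb(n - c, k) / comb(n, k)

# Example: 10 samples per task, 4 of them pass -> pass@1 is simply 4/10
print(pass_at_k(10, 4, 1))  # 0.4
```

For k = 1 the estimator reduces to the plain fraction of correct samples, which is why single-attempt scores like 65.8% pass@1 can be read directly as success rates.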
AI Races Ahead in the Auto Industry: "Intelligent Wheeled Life Forms" Are Coming
Hua Xia Shi Bao· 2025-09-25 07:58
Core Insights
- The automotive industry is on the brink of a significant transformation driven by artificial intelligence, moving from traditional vehicles to "intelligent wheeled life forms" that can interact with users and adapt to their needs [1][2][4]

Industry Trends
- The "Global AI Technology Conference" coincided with the release of the State Council's document on deepening the integration of AI with the real economy, setting a target for significant advancements by 2027 [2]
- Industry leaders emphasize the need to shift focus from hardware specifications and price wars to creating vehicles that can think, learn, and collaborate within smart city traffic networks [2][4]

Technological Developments
- Discussions at the conference highlighted the importance of AI in transforming the automotive landscape, with leaders proposing that future vehicles will communicate with traffic systems to optimize travel efficiency [4][6]
- The report released by the Automotive Home Research Institute identified five core trends shaping the future of China's electric vehicle market, including the widespread adoption of advanced driver-assistance systems and the emergence of RoboTaxi services [6][7]

Consumer Behavior Changes
- A significant shift in consumer perception has occurred, with the percentage of users viewing "intelligence" as the core advantage of electric vehicles rising from 30% to 73% over three years [7]
- The relationship between consumers and vehicles is evolving into a "two-way selection" process, in which consumers will demand rigorous testing of vehicles' intelligent features before purchase [8]

Safety and Ethical Considerations
- Despite advancements, a report indicated that 85% of tested vehicles required human intervention during assisted driving, highlighting the critical need for safety and reliability in AI-driven vehicles [8]
- Industry leaders have called for a focus on core technological breakthroughs, expanding from "single vehicle intelligence" to "industry-wide intelligence," while maintaining safety and ethical standards [8][9]

Company Strategies
- Automotive Home is leveraging its data assets and self-developed models to enhance both consumer and business services, aiming for a dual upgrade in user experience and ecosystem services [9]
- The integration of AI into vehicles is seen as essential for modern automotive products, positioning it as a critical competitive advantage for all market participants [9]
Zhou Hongyi: Language Is the Most Important Thing; Master Language and Everything Else Follows
Xin Lang Ke Ji· 2025-09-24 05:09
Core Insights
- The discussion between Luo Yonghao and Zhou Hongyi emphasizes the importance of language in understanding and developing world models in artificial intelligence [1]
- Zhou Hongyi critiques the focus on world models by figures like Yann LeCun of Meta and Li Feifei, arguing that the key to progress in AI lies in comprehending language [1]
- The recent launch of Google's product "nano banana" showcases advancements in understanding graphics that surpass mere visual perception, integrating extensive knowledge [1]

Summary by Categories

Language and AI Development
- Zhou Hongyi asserts that language is crucial for communication, knowledge transfer, logical reasoning, and world description, all of which are essential for creating effective world models [1]
- The lack of progress in AI is attributed to a failure to grasp the significance of language, which serves as a key to understanding human knowledge and reasoning [1]

Technological Advancements
- The introduction of Google's "nano banana" product is highlighted as a significant breakthrough, demonstrating enhanced graphic understanding that integrates knowledge beyond visual capabilities [1]
- The advancements in various models, including music, video, and visual models, are linked to breakthroughs in language comprehension [1]
Recruiting Several Experts to Co-Build a Platform (4D Annotation / World Models / VLA, and Other Directions)
Autonomous Driving Heart (自动驾驶之心)· 2025-09-23 23:32
Core Viewpoint
- The article discusses the recruitment of business partners for the autonomous driving sector, emphasizing the need for expertise in various advanced technologies and offering attractive incentives for potential candidates [2][3][5]

Group 1: Recruitment Details
- The company plans to recruit 10 outstanding partners for autonomous-driving-related course development, paper guidance, and hardware research [2]
- Candidates with expertise in areas such as large models, multimodal models, diffusion models, and 3D target detection are particularly welcome [3]
- Preferred qualifications include a master's degree or higher from universities ranked within the QS200, with priority given to candidates who have published in top conferences [4]

Group 2: Incentives and Opportunities
- The company offers resource sharing related to autonomous driving, including job recommendations, PhD opportunities, and study abroad guidance [5]
- Attractive cash incentives and opportunities for collaboration on entrepreneurial projects are part of the recruitment package [5]
3DGS Reconstruction! A Source-Code Walkthrough of the gsplat Library
Autonomous Driving Heart (自动驾驶之心)· 2025-09-23 23:32
Core Insights
- The article discusses the implications of OpenAI's video generation model, Sora, for computer graphics, particularly in relation to 3D Gaussian Splatting (3DGS) and its potential to replace traditional rendering techniques [7][8]

Group 1: 3D Gaussian Splatting (3DGS)
- 3DGS is highlighted as a significant area of research, with ongoing developments in its application to self-driving perception and scene reconstruction [4][9]
- The gsplat library is recommended for its better documentation and maintenance compared to the original Gaussian Splatting library, indicating a preference for more user-friendly resources in the field [5]
- The article mentions the potential for 3DGS to integrate with other technologies, such as NeRF (Neural Radiance Fields), to enhance video generation and scene understanding [4][9]

Group 2: Technical Aspects of Sora and 3DGS
- Sora's capabilities are positioned as a potential game-changer in computer graphics, with the possibility of it being recognized as a foundational technology in the field [6][7]
- The article outlines various technical components of 3DGS, including the use of Gaussian parameters, covariance matrices, and the importance of camera coordinate transformations [21][22][30]
- The compression capabilities of gsplat are noted, with the ability to reduce Gaussian parameters significantly while maintaining quality, which is crucial for efficient rendering [13][14]

Group 3: Future Prospects and Community Engagement
- The article expresses optimism about the broader application of "world models" in video generation and scene reconstruction, suggesting that even smaller players in the industry could benefit from advancements in these technologies [9]
- The community around autonomous driving and related technologies is emphasized, with numerous technical groups and resources available for learning and collaboration [78]
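The Gaussian parameters and covariance matrices mentioned above follow the standard 3DGS parameterization: each Gaussian's covariance is factored into a rotation and per-axis scales, Σ = R S Sᵀ Rᵀ, which keeps it positive semi-definite during optimization. A minimal NumPy sketch of that construction (illustrative only; gsplat's internal data layout and conventions may differ):

```python
import numpy as np

def covariance_3d(quat, scale):
    """Build a 3D Gaussian covariance Sigma = R S S^T R^T from a unit
    quaternion (w, x, y, z) and per-axis scales, as in the 3DGS
    parameterization (guarantees a valid, PSD covariance)."""
    w, x, y, z = quat / np.linalg.norm(quat)
    # Rotation matrix from the (normalized) quaternion.
    R = np.array([
        [1 - 2 * (y * y + z * z), 2 * (x * y - w * z),     2 * (x * z + w * y)],
        [2 * (x * y + w * z),     1 - 2 * (x * x + z * z), 2 * (y * z - w * x)],
        [2 * (x * z - w * y),     2 * (y * z + w * x),     1 - 2 * (x * x + y * y)],
    ])
    M = R @ np.diag(scale)
    return M @ M.T  # symmetric positive semi-definite

quat = np.array([1.0, 0.0, 0.0, 0.0])  # identity rotation
scale = np.array([0.1, 0.2, 0.3])      # anisotropic extent along each axis
sigma = covariance_3d(quat, scale)
print(np.diag(sigma))  # [0.01 0.04 0.09] -- squared scales on the diagonal
```

Rendering then projects this 3D covariance into camera space via the view transform and a local affine (Jacobian) approximation of the perspective projection, which is where the camera coordinate transformations the article discusses come in.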
Predicting Future Trends in AI Technology
Sou Hu Cai Jing· 2025-09-21 13:31
Group 1: Technological Breakthroughs
- The emergence of native multimodal large models will replace piecemeal multimodal systems, achieving a 300% improvement in inference efficiency through deep integration of text, images, audio, and 3D data [1]
- The acceleration of world models will establish a core technology foundation for embodied intelligence by 2025 [1]
- The training paradigm will shift toward post-training scaling laws, optimizing reinforcement learning to reduce computational power consumption by 50% [4]

Group 2: Industry Restructuring Trends
- AI agents will provide hyper-personalized product customization, increasing customer satisfaction by 40% [6]
- Real-time decision systems will triple the speed of market response in logistics and marketing [6]
- Humanoid robots in industrial scenarios will achieve millimeter-level control precision, with smart factory coverage exceeding 80% and manufacturing R&D cycles shortened by 28.4% [6]

Group 3: Social Integration Challenges
- "Responsible AI" will become a mandatory standard, with non-compliant companies facing regulatory penalties and user attrition risks [8]
- The automation rate of repetitive jobs will exceed 30%, while demand for creative and emotionally interactive roles grows by 200% [8]
- New mechanisms for privacy and copyright will emerge, with blockchain-enabled AI data rights technology addressing content ownership disputes [8]

Group 4: Future Milestones
- By 2027, artificial general intelligence (AGI) is expected to pass the Turing test in closed environments, and by 2030, neuromorphic chips will achieve a 1000-fold increase in energy efficiency [12]
- By 2035, AI is projected to contribute over 40% to global GDP growth [12]
Recruiting Several Experts to Co-Build a Platform (World Models / VLA, and Other Directions)
Autonomous Driving Heart (自动驾驶之心)· 2025-09-21 06:59
Group 1
- The article announces the recruitment of 10 partners for the autonomous driving sector, focusing on course development, paper guidance, and hardware research [2]
- The recruitment targets individuals with expertise in advanced technologies such as large models, multimodal models, and 3D target detection [3]
- Candidates from QS200 universities with a master's degree or higher are preferred, especially those with significant conference contributions [4]

Group 2
- The compensation package includes resource sharing for job seeking, PhD recommendations, and study abroad opportunities, along with substantial cash incentives [5]
- The company encourages potential partners to reach out via WeChat for collaboration inquiries, specifying the need to mention their organization or company [6]
A Training-Free World Model? Westlake University's WorldForge Opens a New Path to Spatial Intelligence, Letting AI Understand the 3D World
QbitAI (量子位)· 2025-09-21 06:36
Core Viewpoint
- The article discusses advancements in AI-generated video content, highlighting the challenge of controllability in video generation models and introducing WorldForge as a solution that enhances precision in video creation without altering a model's weights [1][2]

Group 1: Challenges in Video Generation
- AI-generated videos have gained significant attention for their realistic visuals, but the lack of precise control over generated content remains a major limitation [1]
- Current models often require extensive retraining to improve controllability, which is costly in time and computational resources and can degrade the model's generalization ability [1]

Group 2: Introduction of WorldForge
- WorldForge offers an innovative approach by guiding existing video generation models during the inference phase, allowing for precise control without modifying the model's weights [2][14]
- The framework consists of three collaborative modules designed to enhance the generation process [4]

Group 3: Key Modules of WorldForge
- **Intra-step Recursive Refinement (IRR)**: sets boundaries for the AI's imagination by running a "predict-correct" micro-loop, applying a correction after each prediction so that generation adheres to a predefined trajectory [4][5]
- **Flow-Gated Latent Fusion (FLF)**: separates appearance and motion features, injecting motion signals only into the relevant channels to control the perspective while preserving the quality of the generated content [6][7]
- **Dual-Path Self-Correcting Guidance (DSG)**: addresses imperfections in the injected guidance signals by utilizing two parallel denoising paths, ensuring high-quality output while adhering to trajectory constraints [7]
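The IRR module's "predict-correct" micro-loop can be pictured as a denoising loop that re-injects trajectory-consistent content after every prediction step. The sketch below is purely illustrative: the function name, the mask-based correction, and the `denoise_step` interface are assumptions for exposition, not WorldForge's actual implementation:

```python
import numpy as np

def denoise_with_refinement(latent, denoise_step, reference, mask, num_steps):
    """Illustrative 'predict-correct' loop: after each denoising
    prediction, regions covered by the trajectory-warped reference
    (mask == 1) are corrected back toward it, while uncovered regions
    are left for the generative model to fill in freely."""
    x = latent
    for t in range(num_steps, 0, -1):
        x = denoise_step(x, t)                   # predict
        x = mask * reference + (1.0 - mask) * x  # correct on known regions
    return x

# Demo with a dummy denoiser that just shrinks the latent toward zero.
rng = np.random.default_rng(0)
reference = np.ones((4, 4))  # stands in for warped source-view content
mask = np.zeros((4, 4))
mask[:2] = 1.0               # top half is "known" from the camera trajectory
out = denoise_with_refinement(
    rng.standard_normal((4, 4)), lambda x, t: 0.5 * x, reference, mask, 5
)
print(out[:2].min(), out[:2].max())  # both 1.0: known regions match the reference
```

Because the correction happens inside every step rather than once at the end, each subsequent prediction starts from a trajectory-consistent state, which is the intuition behind training-free control.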
Group 4: Applications of WorldForge
- WorldForge demonstrates remarkable capabilities, such as reconstructing 3D static scenes from a single image and generating 360° surround videos, indicating its potential for efficient world-model exploration [9][8]
- The system allows users to design new camera trajectories for existing videos, executing complex movements and intelligently filling in newly exposed areas, outperforming traditional approaches that require extensive training [11]
- Additionally, WorldForge supports video content editing, including subject replacement and object manipulation, enabling creative modifications [12]

Group 5: Future Implications
- WorldForge introduces a novel interaction and control approach to video generation, paving the way for controllable world models without increased training costs or loss of prior knowledge [14]
- Potential future advancements include more natural interaction through language or gestures, allowing models to better understand and execute creative visions [14]
Opening Several Autonomous Driving Technical Discussion Groups (World Models / End-to-End / VLA)
Autonomous Driving Heart (自动驾驶之心)· 2025-09-20 16:03
Everyone is welcome to join and discuss these topics together. Interested readers can add the assistant on WeChat to join a group: AIDriver005, with the note: nickname + research direction. The Autonomous Driving Heart (自动驾驶之心) technical discussion groups have been set up; for the new school term and autumn recruiting season, we have opened several technical discussion groups (world models / end-to-end / VLA, and other directions). ...