World Model

DeepMind Scientists Reveal Genie 3: How an Autoregressive Architecture Lets AI Build Entire Worlds | Jinqiu Select
锦秋集 (Jinqiu Select) · 2025-08-06 09:07
On the evening of August 4, 2025, Google DeepMind released Genie 3, a revolutionary general-purpose world model that can generate highly interactive 3D environments from a text prompt or an image, supporting real-time interaction and dynamic modification. What does it mean when a virtual world is no longer "prescribed" line by line in code, but instead "emerges" on its own from data? And what qualitative leap could that bring to the pursuit of AGI? This article compiles and translates an exclusive interview with two core Google DeepMind researchers, Shlomi Fuchter and Jack Parker Holder, about their newly released generative interactive environment model, Genie 3. Jinqiu Capital (WeChat account: 锦秋集; ID: jqcapital) believes the piece surfaces first-hand information behind the Genie 3 model and highlights a distinctive path DeepMind is taking toward AGI, which is why we produced this translation.

01 A "Paradigm-Shifting" Breakthrough

Google DeepMind recently gave an exclusive demonstration of an AI technology hailed as "unprecedented and the most stunning yet seen," one that could open the next trillion-dollar market and may become the "killer app" of virtual reality (VR). At the core of this technology is a brand-new kind of AI model: the "Generative Interactive Enviro ...
Overnight, OpenAI, Google, and Others Release Multiple Model Updates
第一财经 (Yicai) · 2025-08-06 07:17
Core Insights
- The article discusses recent product launches by major AI model companies, highlighting shifts in product strategy and advances in AI capabilities [3][11].

Group 1: OpenAI Developments
- OpenAI has released two new open-source models, gpt-oss-120b with 117 billion parameters and gpt-oss-20b with 21 billion parameters, both using a mixture-of-experts (MoE) architecture [4][5].
- gpt-oss-120b can run on a single 80GB GPU, while gpt-oss-20b runs on consumer devices with 16GB of memory, allowing local deployment on laptops and smartphones [5][6].
- The new models are competitive in benchmark tests, with gpt-oss-120b scoring close to or above the closed-source o4-mini model [5][6].

Group 2: Anthropic's Strategy
- Anthropic has shifted to more frequent incremental updates, exemplified by Claude Opus 4.1, which improves on its predecessor in areas such as coding and data analysis [6][7].
- In benchmark tests, Claude Opus 4.1 scored 74.5%, surpassing Opus 4's 72.5%, indicating stronger coding capabilities [7].

Group 3: Google's Innovations
- Google introduced Genie 3, its first world model to support real-time interaction, building on Genie 1 and Genie 2 [8][9].
- Genie 3 can simulate complex environments and interactions, generating consistent visuals for several minutes, a significant improvement over Genie 2 [9][11].
- Despite these advances, Genie 3 still has limitations, such as a restricted action space and difficulty simulating multiple agents in shared environments [11].
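The reason a 117-billion-parameter MoE model fits a single 80GB GPU at inference time is that each token activates only a few "experts" rather than the full parameter set. As a rough illustration of the routing idea only (this is not OpenAI's implementation; the expert count, top-k value, and dimensions below are invented for the sketch), a minimal top-k MoE layer looks like:

```python
import numpy as np

def moe_layer(x, gate_w, experts, k=2):
    """Minimal mixture-of-experts layer: route a token to its top-k experts.

    x:       (d,) token activation
    gate_w:  (n_experts, d) router weights
    experts: list of callables, one per expert
    Only k experts run per token, so active parameters << total parameters.
    """
    logits = gate_w @ x                    # one router score per expert
    top = np.argsort(logits)[-k:]          # indices of the k highest-scoring experts
    weights = np.exp(logits[top])
    weights /= weights.sum()               # softmax over the selected experts only
    return sum(w * experts[i](x) for w, i in zip(weights, top))

# Toy usage: 4 experts, each a different linear map over a 3-dim token.
rng = np.random.default_rng(0)
experts = [lambda x, W=rng.normal(size=(3, 3)): W @ x for _ in range(4)]
gate_w = rng.normal(size=(4, 3))
y = moe_layer(np.ones(3), gate_w, experts, k=2)
```

With k=2 of 4 experts, half the expert parameters are skipped for this token; production MoE models apply the same trick per layer at much larger scale.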
X @Demis Hassabis
Demis Hassabis· 2025-08-05 15:21
RT Google DeepMind (@GoogleDeepMind): What if you could not only watch a generated video, but explore it too? 🌐 Genie 3 is our groundbreaking world model that creates interactive, playable environments from a single text prompt. From photorealistic landscapes to fantasy realms, the possibilities are endless. 🧵 ...
Google Genie 3 - The Most Advanced World Simulator Ever...
Matthew Berman· 2025-08-05 14:02
Google just announced Genie 3, their world model that is fully controllable like a video game and fully immersive. This is going to change movies, TV, video games, everything. And according to Google, it is a big leap toward AGI. Let me show you some demos and then I'm going to tell you all about it. All right, so check this one out: a gorilla wearing a fancy outfit walking through some buildings. And you can see on screen they're actually showing that this is fully controllable. Now, what I want you to look at ...
CAAI Embodied Intelligence Committee Chair Jiang Shuqiang: World Models Are a Key Basis for Agents' Decision-Making
机器人圈 (Robot Circle) · 2025-08-04 11:38
On embodied foundation models, Jiang Shuqiang argued that they generally need to be trained on fused visual, language, and action data, and that training them requires data, compute, and algorithms working in concert. The data is no longer just text or video; it includes actions, physical parameters, tactile signals, and other multimodal information, making it far more complex. "I think that in a specific scenario, training with a single type of embodiment is the more pragmatic choice. If all kinds of robot morphologies are trained together, things get very complicated." Accordingly, Jiang noted that generalization in real physical space, data complexity, and sensor differences remain challenges for embodied foundation models.

Jiang also observed that a world model is an abstract representation of the real world, covering 3D space, dynamic change, object relations, memory, and knowledge. Its goal is to understand and predict the state of the environment, which makes it a key basis for an agent's decision-making. A NIPS 2018 paper treated the world model as a modeling system tied to reasoning and the agent's role. Yet the relationship between world models and large models, and between world models and 3D space, both deserve further thought and exploration.

"We now have a single-arm robot that we ask to navigate autonomously to a spot and tidy up a desktop. In practice this is mostly an engineering effort without much theoretical method; our own research focuses mainly on the navigation part," Jiang explained. Current research makes heavy use of simulator-generated data, but the physical parameters of virtual environments may not be realistic enough, and aligning virtual with real environments remains a hard problem. "Embodied intelligence ...
Meta chief AI scientist Yann LeCun clarifies his role after the company hires another chief AI scientist
Business Insider· 2025-07-26 19:50
Core Insights
- Meta has appointed Shengjia Zhao, co-creator of ChatGPT and former lead scientist at OpenAI, as chief scientist of its Superintelligence Labs, signaling a strategic move in the AI talent acquisition landscape [1][2].

Group 1: Leadership and Structure
- Shengjia Zhao will set the research agenda and scientific direction for Meta's Superintelligence Labs, working closely with CEO Mark Zuckerberg and Chief AI Officer Alexandr Wang [2].
- The formalization of Zhao's leadership role comes as Meta reports successful recruitment efforts and team assembly [2].
- Yann LeCun, who has been with Meta since 2013 and serves as chief AI scientist of Meta's Fundamental AI Research (FAIR) group, clarified that his role remains unchanged despite Zhao's appointment [3].

Group 2: Research Focus
- Meta's FAIR, established over a decade ago, focuses on advancing AI technology and led to the release of the open-source large language model Llama in 2023 [8].
- The Superintelligence Labs will encompass FAIR and other teams, aiming to develop "personal superintelligence for everyone," as stated by Zuckerberg [9].
- LeCun is currently focused on developing a new model type, known as a world model, which could potentially replace large language models [8].

Group 3: Collaboration and Future Directions
- Zhao's expertise in pioneering new scaling paradigms in AI research is expected to guide the scientific direction of Meta's AI initiatives [10].
- LeCun expressed enthusiasm about collaborating with Zhao to enhance the integration of new research into Meta's advanced models [10].
On One Side, Graduation Means Unemployment; on the Other, Companies Can't Find People to Hire. It's Tough...
自动驾驶之心 (Autonomous Driving Heart) · 2025-07-23 09:56
Core Insights
- The autonomous driving industry is experiencing a paradox: job openings are abundant, yet companies struggle to find suitable talent. This is attributed to a shift in market expectations and a focus on sustainable business models rather than rapid expansion [2][3].

Industry Overview
- Companies in the autonomous driving sector are now more cautious with spending, prioritizing survival and viable business models over aggressive hiring and expansion. This shift is expected to drive significant industry adjustments within the next 1-3 years [2][3].

Talent Demand
- Demand for "top talent" and "highly compatible talent" in autonomous driving is unprecedented. Companies are not necessarily unwilling to hire; they are looking for candidates with exceptional skills and closely relevant experience [4][3].

Community and Resources
- The "Autonomous Driving Heart Knowledge Planet" is the largest community focused on autonomous driving technology in China, established to provide resources and networking opportunities for professionals. It has nearly 4,000 members and over 100 industry experts contributing to discussions and knowledge sharing [9][10].

Learning and Development
- The community offers comprehensive learning pathways covering subfields of autonomous driving technology, including perception, mapping, and AI model deployment, aimed at supporting both newcomers and experienced professionals [9][12][13].

Job Placement Support
- The community has established a direct-referral mechanism with numerous autonomous driving companies, streamlining the hiring process and connecting qualified candidates with potential employers [10][9].
Autonomous Driving Paper Express | World Models, End-to-End, VLM/VLA, Reinforcement Learning, and More
自动驾驶之心· 2025-07-21 04:14
Core Insights
- The article covers advances in autonomous driving research, focusing in particular on the Orbis model developed at the University of Freiburg, which significantly improves long-horizon prediction in driving world models [1][2].

Group 1: Orbis Model Contributions
- Orbis addresses the weakness of contemporary driving world models in long-horizon generation, particularly during complex maneuvers such as turns, and introduces a trajectory-distribution-based evaluation metric to quantify these issues [2].
- It employs a hybrid discrete-continuous tokenizer that allows fair comparison between discrete and continuous prediction methods, showing that continuous modeling (based on flow matching) outperforms discrete modeling (based on masked generation) on long-horizon prediction [2].
- The model reaches state-of-the-art (SOTA) performance with only 469 million parameters and 280 hours of monocular video data, excelling in complex driving scenarios such as turns and urban traffic [2].

Group 2: Experimental Results
- On 6-second rollouts from the nuPlan dataset, Orbis achieves a Fréchet Video Distance (FVD) of 132.25, significantly lower than Cosmos (291.80) and Vista (323.37), indicating superior trajectory prediction [6][7].
- In turn scenarios Orbis also leads, with an FVD of 231.88 versus 316.99 for Cosmos and 413.61 for Vista, showing its effectiveness in challenging driving conditions [6][7].

Group 3: LaViPlan Framework
- LaViPlan, developed by ETRI, uses reinforcement learning with verifiable rewards to address misalignment between the visual, language, and action components of autonomous driving, achieving a 19.91% reduction in Average Displacement Error (ADE) on easy scenarios and 14.67% on hard scenarios of the ROADWork dataset [12][14].
- It emphasizes the transition from linguistic fidelity to functional accuracy in trajectory outputs, revealing a trade-off between semantic similarity and task-specific reasoning [14].

Group 4: World Model-Based Scene Generation
- The University of Macau introduced a world-model-driven scene generation framework that enhances dynamic graph convolution networks, achieving 83.2% Average Precision (AP) and a 3.99-second mean Time to Anticipate (mTTA) on the DAD dataset, marking significant improvements [23][24].
- The framework combines scene generation with adaptive temporal reasoning to create high-resolution driving scenarios, addressing data scarcity and modeling limitations [24].

Group 5: ReAL-AD Framework
- ReAL-AD, proposed by Shanghai University of Science and Technology and the Chinese University of Hong Kong, integrates a three-layer human cognitive decision-making model into end-to-end autonomous driving, improving planning accuracy by 33% and reducing collision rates by 32% [33][34].
- Its three core modules enhance situational awareness and structured reasoning, yielding significant gains in trajectory planning accuracy and safety [34].
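The FVD scores cited above measure the Fréchet distance between Gaussian fits of video-feature statistics for real versus generated clips (the real metric extracts features with a pretrained I3D network; the random features below are stand-ins for illustration only). A minimal sketch of the distance itself:

```python
import numpy as np
from scipy.linalg import sqrtm

def frechet_distance(feats_a, feats_b):
    """Fréchet distance between Gaussians fitted to two feature sets.

    feats_a, feats_b: (n_samples, dim) arrays of clip features.
    FVD applies this to I3D features of real vs. generated video;
    lower means the generated distribution sits closer to the real one.
    """
    mu_a, mu_b = feats_a.mean(axis=0), feats_b.mean(axis=0)
    cov_a = np.cov(feats_a, rowvar=False)
    cov_b = np.cov(feats_b, rowvar=False)
    covmean = sqrtm(cov_a @ cov_b)
    if np.iscomplexobj(covmean):   # numerical noise can leave tiny imaginary parts
        covmean = covmean.real
    diff = mu_a - mu_b
    return diff @ diff + np.trace(cov_a + cov_b - 2.0 * covmean)

# Identical distributions give ~0; a mean shift inflates the distance.
rng = np.random.default_rng(0)
x = rng.normal(size=(1000, 4))
d_same = frechet_distance(x, x)
d_far = frechet_distance(x, x + 5.0)
```

This is why the table's ordering (Orbis 132.25 < Cosmos 291.80 < Vista 323.37) reads directly as "closer to the real video distribution".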
L4 Industry Chain Tracking Series, Part 3: Update on Leading Robotaxi Companies (Technical Focus)
2025-07-16 06:13
Summary of Conference Call

Company and Industry
- The call focuses on advancements in the autonomous driving industry, specifically a company working on Level 4 (L4) autonomous driving technology.

Key Points and Arguments
1. **Technological Framework**: The company's autonomous driving system has a modular architecture covering perception, prediction, control, and planning. The framework has evolved to incorporate advanced techniques such as reinforcement learning and world models, while the core structure remains intact [1][2][3].
2. **Transition to Large Models**: The industry is shifting from CNN architectures to transformer-based models. The company is gradually replacing its existing models with these new frameworks, which may take longer because its current systems already perform at a high baseline [3][4].
3. **Data Utilization**: Both real and simulated data matter for model training. Real data dominates today, with plans to incorporate more simulated data to address shortages, especially for control models [8][9][10].
4. **Learning Techniques**: Imitation learning covers scenarios where rule-based approaches fail, while reinforcement learning is applied in end-to-end (E2E) models; the share of reinforcement learning remains small, indicating a cautious rollout [11][12].
5. **Operational Deployment**: The company has deployed autonomous vehicles in major cities such as Beijing and Guangzhou, with expansion planned for Shenzhen and Shanghai. The current fleet consists of a few hundred vehicles [14][21].
6. **Cost Structure**: Vehicle cost includes hardware components such as multiple radars and cameras; estimates suggest total cost could be reduced to around 200,000 yuan [15][19].
7. **Computational Resources**: The company faces computational-capacity challenges, particularly in integrating various models across different chips, and is optimizing existing resources while planning future upgrades [19][20].
8. **Profitability Goals**: The company targets break-even with a fleet of over 10,000 vehicles by 2027 or 2028; current estimates suggest profitability may require a fleet closer to 100,000 vehicles [26].
9. **Market Positioning**: The company acknowledges competition in the autonomous driving space, particularly around regulatory approvals and operational capability, and aims to keep an edge through faster acquisition of commercial licenses [27][28].

Other Important Content
- The discussion highlights the ongoing evolution of the autonomous driving landscape, with a focus on balancing technological advancement against operational scalability. The company is committed to addressing challenges in data acquisition, model training, and fleet management to strengthen its market position [22][23][30].
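The modular stack described in point 1 is conventionally a fixed chain of stages passing structured state from perception through prediction to planning. As a purely illustrative sketch of that data flow (the module names, data fields, and thresholds below are assumptions for the example, not this company's interfaces):

```python
from dataclasses import dataclass, field

@dataclass
class Scene:
    """State handed between pipeline stages (illustrative fields only)."""
    obstacles: list = field(default_factory=list)   # filled by perception
    forecasts: list = field(default_factory=list)   # filled by prediction
    trajectory: list = field(default_factory=list)  # filled by planning

def perceive(sensor_frame, scene):
    # Keep only confident detections from the raw sensor frame.
    scene.obstacles = [o for o in sensor_frame if o.get("confidence", 0) > 0.5]
    return scene

def predict(scene, horizon_s=3.0):
    # Constant-velocity rollout as a placeholder for a learned predictor.
    scene.forecasts = [
        (o["x"] + o["vx"] * horizon_s, o["y"] + o["vy"] * horizon_s)
        for o in scene.obstacles
    ]
    return scene

def plan(scene):
    # Trivial rule: stop if anything is forecast in the lane ahead.
    blocked = any(abs(y) < 1.0 and 0 < x < 30 for x, y in scene.forecasts)
    scene.trajectory = ["stop"] if blocked else ["cruise"]
    return scene

# One tick of the pipeline: perception -> prediction -> planning.
frame = [{"x": 10.0, "y": 0.2, "vx": 0.0, "vy": 0.0, "confidence": 0.9}]
out = plan(predict(perceive(frame, Scene())))
```

The point of the fixed interface is the trade-off discussed in point 2: any single stage (here, the constant-velocity predictor) can be swapped for a learned transformer-based model without touching its neighbors.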
A Grad Student From a Non-985/211 ("双非") University, Feeling a Bit Lost in This Year's Job Search...
自动驾驶之心· 2025-07-14 14:04
Core Viewpoint
- The article emphasizes the importance of staying current with cutting-edge technologies in autonomous driving and embodied intelligence, highlighting the need for strong technical skills in advanced areas such as large models, reinforcement learning, and 3D graphics [4][5].

Group 1: Industry Trends
- Demand for talent in robotics and embodied intelligence is growing, with many startups receiving significant funding and showing rapid growth potential [4][5].
- Major companies are shifting their focus toward more advanced technologies, moving from traditional methods to end-to-end solutions and large models, indicating a technological evolution in the industry [4][5].
- The community aims to build a comprehensive ecosystem connecting academia, products, and recruitment, fostering a collaborative environment for knowledge sharing and job opportunities [6].

Group 2: Technical Directions
- The article outlines four key technical directions in the industry: visual large language models, world models, diffusion models, and end-to-end autonomous driving [9].
- It provides resources and summaries of research papers and datasets related to these technologies, indicating a strong emphasis on research and development [10][17][18][19][20][21][22][23][24][25][26][27][28][29][30][31][32][35][36][38].

Group 3: Community and Learning Resources
- The community offers a variety of learning materials, including video courses, hardware, and coding resources, aimed at equipping members with the skills the evolving job market requires [6].
- It focuses on creating a supportive environment for discussing the latest industry trends, technical challenges, and job opportunities, which is crucial for professionals advancing their careers [6].