World Models
AI generates an endless Minecraft that players control freely with keyboard and mouse! A Chinese-built interactive world model arrives
量子位· 2025-05-13 03:01
Core Viewpoint - The article discusses the launch of Matrix-Game, an interactive world modeling tool developed by Kunlun Wanwei, which allows users to create and explore virtual environments in a highly realistic manner using simple mouse and keyboard commands. This tool leverages AI to generate content in real-time, significantly lowering the barriers to entry for users and enhancing creative freedom while adhering to physical realism.
Group 1: Matrix-Game Overview
- Matrix-Game enables users to interact with and create detailed virtual content that aligns with real-world physics, offering a low operational threshold for users [10][41].
- The tool supports various environments, including forests, beaches, deserts, glaciers, rivers, and plains, and allows for basic and complex movements, perspective shifts, and actions like jumping and attacking [5][6][10].
- The Matrix-Game-MC dataset is a large-scale dataset that includes unlabelled Minecraft game videos and controllable video data, facilitating the model's learning of complex environmental dynamics and interaction patterns [14][15].
Group 2: Technical Implementation
- The main model framework is based on diffusion models, which include image-to-world modeling, autoregressive video generation, and controllable interaction design [18][20].
- The image-to-world modeling process generates interactive video content from a single image, integrating user actions without relying on language prompts [21].
- The autoregressive video generation ensures temporal consistency by generating video segments based on previous frames, while controllable interaction design enhances the model's responsiveness to user inputs [23][27].
Group 3: Evaluation and Performance
- The GameWorld Score evaluation system assesses the performance of interactive world generation models across four dimensions: visual quality, temporal quality, action controllability, and physical rule understanding [29][30].
- Matrix-Game outperforms existing models like Decart's Oasis and Microsoft's MineWorld in all evaluated dimensions, achieving a user preference rate of 96.3% in blind tests [36][39].
- In specific actions such as movement and attack, Matrix-Game maintains over 90% accuracy, demonstrating high precision in fine-grained control [39].
Group 4: Industry Implications
- Matrix-Game has potential applications in rapidly building virtual game worlds, producing content for film and the metaverse, training embodied agents, and generating data [41][42].
- The trend towards 3D AI-generated content (AIGC) is gaining traction, with major companies investing in this area, indicating a shift from 2D to 3D technologies [43][46].
- The advancements in 3D AIGC and world modeling are expected to provide new interactive experiences, making it a focal point for future AI developments [48][49].
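The autoregressive generation described under Technical Implementation (each new video segment is conditioned on previously generated frames and on the user's actions) can be illustrated with a toy loop. This is only a sketch of the control flow, not Matrix-Game's code: `toy_segment_model`, `rollout`, the `context_len` parameter, and the representation of a frame as a list of floats are all hypothetical stand-ins for a real diffusion model and real image tensors.

```python
from typing import Callable, List, Sequence

Frame = List[float]   # stand-in for an image; a real frame would be a tensor
Action = str          # e.g. "forward", "back", "jump", "attack"


def toy_segment_model(context: Sequence[Frame], actions: Sequence[Action]) -> List[Frame]:
    """Hypothetical stand-in for the generator: emits one new frame per action.

    Only the most recent context frame is used here; a real model would
    attend to the whole context window.
    """
    step = {"forward": 1.0, "back": -1.0}  # other actions leave the toy frame unchanged
    frames: List[Frame] = []
    last = list(context[-1])
    for action in actions:
        last = [value + step.get(action, 0.0) for value in last]
        frames.append(last)
    return frames


def rollout(
    first_frame: Frame,
    action_chunks: Sequence[Sequence[Action]],
    model: Callable[[Sequence[Frame], Sequence[Action]], List[Frame]] = toy_segment_model,
    context_len: int = 2,
) -> List[Frame]:
    """Generate a video segment by segment, feeding earlier output back in."""
    video: List[Frame] = [list(first_frame)]       # start from a single image
    for chunk in action_chunks:
        context = video[-context_len:]             # condition on the latest frames
        video.extend(model(context, chunk))        # append the newly generated segment
    return video
```

Calling `rollout([0.0], [["forward", "forward"], ["jump", "back"]])` produces one starting frame plus one frame per action. The point of the design is that each segment sees only frames the model itself produced, which is why temporal consistency over long rollouts is a challenge.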
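Good-looking generated video is not enough; it must also be freely explorable! Kunlun Wanwei open-sources Matrix-Game, building a game world from a single image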
机器之心· 2025-05-13 02:37
Core Viewpoint - The rapid advancement of world models, particularly with the introduction of interactive world models like Matrix-Game, signifies a pivotal moment in AI development, enabling more immersive and controllable virtual environments [4][50].
Group 1: Development of World Models
- The Oasis project marked the first real-time, interactive open-source world model, showcasing a significant leap in understanding physical and game rules [1].
- Microsoft's MineWorld further enhanced visual effects and action generation consistency in interactive world models [2].
- The recent launch of Matrix-Game by Kunlun Wanwei represents a major milestone in interactive world generation, being the first open-source model in the industry with over 10 billion parameters [10][50].
Group 2: Features of Matrix-Game
- Matrix-Game allows for fine-grained user interaction control, enabling players to experience seamless movement and environmental feedback in a game world [17].
- The model demonstrates high fidelity in visual and physical consistency, generating realistic interactions and maintaining visual coherence during gameplay [19][20].
- It exhibits multi-scene generalization capabilities, allowing for the generation of diverse environments beyond just Minecraft, including cities and historical buildings [25][26].
Group 3: Evaluation and Performance
- Kunlun Wanwei introduced a comprehensive evaluation framework called GameWorld Score, assessing visual quality, temporal consistency, controllability, and understanding of physical rules [29].
- In comparative assessments, Matrix-Game outperformed other models like Oasis and MineWorld across all evaluation dimensions [31].
- The model achieved over 90% accuracy in action control, demonstrating its robustness in responding to user inputs [35].
Group 4: Technological Innovations
- Matrix-Game's success is attributed to its innovative data collection and model architecture, utilizing a large dataset for training that includes both unlabelled and labelled data [41][42].
- The architecture focuses on image-to-world modeling, allowing the model to generate interactive video content based solely on visual inputs without relying on language prompts [44][45].
- The model's ability to maintain temporal coherence during video generation is a significant advancement, addressing previous challenges in long-sequence content generation [45].
Group 5: Broader Implications
- Matrix-Game's capabilities extend beyond gaming, impacting content production in various fields such as film, advertising, and XR [51].
- The development of spatial intelligence through models like Matrix-Game is crucial for advancing embodied intelligence and enhancing machine understanding of the three-dimensional world [49][50].
- Kunlun Wanwei aims to create a comprehensive AI creative ecosystem, facilitating innovation and expression in a new dimension of interaction [52].
21 Dialogue | Zhuoyu's Chen Xiaozhi: Achieving extreme performance with limited computing power is in our blood
21 Shi Ji Jing Ji Bao Dao · 2025-05-10 00:36
Core Insights - The article discusses the rise of intelligent driving technology in the automotive market, particularly focusing on Zhuoyu Technology's approach to providing cost-effective driving assistance solutions [1][2][3].
Group 1: Company Overview
- Zhuoyu Technology, formerly known as DJI Automotive, has transitioned from a team within DJI focused on intelligent driving technology to an independent entity, leveraging its expertise in sensors and computer vision from the drone industry [2].
- The company aims to provide high-performance driving assistance features at lower costs, utilizing its self-developed hardware and software [1][2].
Group 2: Product Development
- Zhuoyu's 7V (7 cameras) + 32 TOPS configuration has become standard in vehicles priced between 80,000 and 150,000 RMB, enabling features like urban memory navigation and highway driving [1].
- The company plans to launch the "Chengxing Platform" in November 2024, offering 7V and 9V solutions that reduce reliance on high-precision maps and LiDAR, thus lowering costs for advanced driving assistance [2].
Group 3: Market Position and Strategy
- The mid-to-low-end market is expected to grow significantly by 2025, which aligns with Zhuoyu's strengths [3].
- Zhuoyu has established partnerships with major automotive manufacturers, including FAW, Volkswagen, and BYD, with over 20 models already in production and more than 30 models set to launch soon [2].
Group 4: Technological Innovations
- The company is focusing on enhancing its capabilities through the introduction of the Thor platform, which offers higher computing power at a lower cost compared to existing solutions [3][6].
- Zhuoyu is also exploring the integration of reinforcement learning and world models to improve safety and decision-making in driving assistance systems [12][19].
Group 5: Future Directions
- The company is preparing to develop hardware for L3 and L4 autonomous driving, including necessary sensors and controllers, while emphasizing the importance of first perfecting L2 assistance before advancing to higher levels of automation [9][10].
- Zhuoyu aims to enhance user experience by implementing a more intuitive point-to-point navigation system that mimics human driving behavior [20].
MCP: The "universal socket" of the AI era and the focus of big-tech competition
36Kr · 2025-04-29 08:11
Core Insights
- The emergence of the Model Context Protocol (MCP) is reshaping the AI landscape, providing a standardized interface for large models and clients to efficiently access external data sources and tools, thus enhancing the capabilities of AI agents [1][16].
- Major tech companies like Baidu, Alibaba, Tencent, and ByteDance are actively developing the MCP ecosystem, which is transforming the development paradigm of AI applications and the competitive dynamics of the tech industry [1][16].
Group 1: MCP Overview
- MCP is likened to a universal connector for AI applications, enabling seamless integration with external tools and data sources, significantly improving development efficiency and reducing operational costs [2].
- The protocol allows for a modular approach to AI development, where developers can easily assemble complex functionalities by utilizing various external services [2].
Group 2: Company Strategies
- Baidu is rapidly advancing in the MCP space, launching multiple MCP servers for e-commerce and search functionalities, thereby enhancing developer capabilities and application intelligence [3][5].
- Alibaba is building a comprehensive MCP ecosystem through its Bailian MCP platform, offering over 50 pre-configured services and integrating core applications like Alipay and Gaode Map to create a robust collaborative environment [5].
- Tencent focuses on integrating MCP within its WeChat ecosystem, facilitating the incorporation of AI capabilities into social and payment applications, thus enhancing user experience [7].
- ByteDance's Coze Space is emerging as a strong player by leveraging MCP to create a powerful AI agent platform capable of automating complex tasks through external tool integration [9].
Group 3: Future of MCP
- The MCP ecosystem is still in its early stages, with competition among companies centered on ecosystem development, but differences in implementation may lead to fragmentation [16].
- As the standardization of MCP progresses and the demand for interoperability increases, there is potential for collaboration and integration among different MCP ecosystems [16].
- The evolution of MCP will likely incorporate new technologies such as quantum computing and blockchain, expanding its capabilities and applications [16].
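To make the "standardized interface" idea concrete: MCP messages are JSON-RPC 2.0 objects, and a client discovers and invokes a server's tools through the `tools/list` and `tools/call` methods. The sketch below only builds such request messages. It does not open a connection or implement a transport, and the tool name `search_products` and its `query` argument are hypothetical examples rather than any vendor's actual MCP server. The helper names (`make_request`, `call_tool_request`, and so on) are also illustrative.

```python
import json
from itertools import count
from typing import Any, Dict, Optional

_request_ids = count(1)  # JSON-RPC requests carry an id so replies can be matched


def make_request(method: str, params: Optional[Dict[str, Any]] = None) -> Dict[str, Any]:
    """Build a JSON-RPC 2.0 request object of the kind MCP exchanges."""
    message: Dict[str, Any] = {"jsonrpc": "2.0", "id": next(_request_ids), "method": method}
    if params is not None:
        message["params"] = params
    return message


def list_tools_request() -> Dict[str, Any]:
    """Ask a server which tools it exposes."""
    return make_request("tools/list")


def call_tool_request(name: str, arguments: Dict[str, Any]) -> Dict[str, Any]:
    """Ask a server to run one named tool with the given arguments."""
    return make_request("tools/call", {"name": name, "arguments": arguments})


def encode(message: Dict[str, Any]) -> str:
    """Serialize a message to the JSON text that would be sent over a transport."""
    return json.dumps(message)


if __name__ == "__main__":
    # Hypothetical tool name and argument, for illustration only.
    print(encode(list_tools_request()))
    print(encode(call_tool_request("search_products", {"query": "laptop"})))
```

Because every server answers the same two methods, a client written once can in principle work with any vendor's tools, which is the interoperability that the "universal socket" metaphor points at.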
2025 Shanghai Auto Show: When intelligent driving no longer excites, the covert battle over vehicle intelligence escalates
Xin Lang Cai Jing· 2025-04-29 07:10
Group 1: Industry Trends
- The 2025 Shanghai Auto Show reflects a shift towards balancing technology pursuit, commercial value, and social benefits in the automotive industry [1].
- The focus on L3 conditional autonomous driving is becoming a common goal among major Chinese automakers, with many aiming for commercial viability by 2025 [2][3].
- The automotive industry is transitioning from a marketing-driven approach to one centered on product and user needs, indicating a return to the essence of the industry [1][23].
Group 2: Technological Developments
- Huawei's ADS 4.0 system was introduced as a high-level intelligent driving solution, with expectations for L3 commercial capabilities by 2025 [2].
- The L3 level of automation allows vehicles to perform all driving tasks under specific conditions, marking a significant advancement from L2 [3][7].
- The integration of world models and reinforcement learning is seen as a new generation of intelligent driving technology, enhancing decision-making and safety [10][12].
Group 3: Regulatory Changes
- New regulations in Shenzhen and Beijing clarify accident liability in L3 autonomous driving scenarios, placing responsibility on manufacturers for system failures [16].
- These regulatory changes are expected to drive automakers to invest more in technology development and safety testing [16].
Group 4: Market Dynamics
- The automotive industry is experiencing a shift in consumer expectations, with users increasingly demanding vehicles that understand their needs rather than just perform tasks [18][19].
- Companies are focusing on self-research and development of core technologies to enhance brand competitiveness and reduce reliance on external suppliers [20][22].
- The competitive landscape is evolving, with traditional automakers needing to innovate while new entrants focus on applying the latest technologies [20][23].
Auto Show Observations: Safety, Going Global, and World Models
HTSC· 2025-04-24 09:38
Investment Rating - The industry investment rating is "Overweight" [5].
Core Insights
- The focus of the automotive industry has shifted from "smart driving" to safety, with companies emphasizing the importance of safety features in their products [2].
- The presence of international journalists and bloggers at the auto show indicates that "going global" is becoming a key strategy for Chinese electric vehicle manufacturers [3].
- The concept of "World Model" is emerging as a new technological trend in AI-assisted driving, highlighting the competition among companies to develop digital representations of the physical world [4].
Summary by Sections
Observation 1: Shift to Safety in Smart Driving
- Major companies have downplayed "smart driving" in their presentations, focusing instead on safety enhancements in their products, which is expected to benefit companies involved in lidar, smart driving chips, and algorithms [2].
Observation 2: Increased International Presence and Global Strategy
- The auto show saw a rise in international media coverage, reflecting the growing global competitiveness of Chinese smart electric vehicles, with "going global" likely to be a new development strategy for major manufacturers by 2025 [3].
Observation 3: Emergence of "World Model" in AI-Assisted Driving
- Companies are increasingly emphasizing the "World Model," which represents the digital understanding and predictive capabilities of smart driving systems, indicating a shift in competition towards cloud-based world model capabilities [4].
Wang Xiaogang: Physical world models are important for driver-assistance training
Xin Lang Cai Jing· 2025-04-24 09:04
Core Insights
- The Shanghai Auto Show, held on April 23, focuses on innovation and the future of the automotive industry, showcasing traditional fuel vehicles, new energy vehicles, smart driving, and supply chain technologies [1].
- The event highlights the rapid advancement of technologies such as high-level intelligent driving, AI models, and multi-modal perception, with many new technologies and products set to be unveiled [1].
Group 1: Industry Trends
- The ongoing price war in the automotive sector has extended to supply chain companies, prompting a need for balance between pricing and cost management [3].
- The consensus among industry leaders is shifting towards platformization in sensor design, which reduces the need for repetitive development and adaptation for specific vehicle models [4].
Group 2: Technological Innovations
- The development of generative intelligent driving is seen as a significant opportunity for the industry, addressing limitations of current end-to-end models that require vast amounts of high-quality data [5].
- The concept of a "world model" is introduced, allowing for the reconstruction of physical driving scenarios to enhance model training through simulation and reinforcement learning [5][6].
- Multi-modal large models are transforming user interaction within smart cabins, enabling more complex and engaging conversations rather than simple one-on-one interactions [6][10].
Group 3: Data Utilization
- It is noted that 99% of real user data may not be useful for training models, as most driving scenarios involve minimal information gain [7].
- The importance of high-quality data is emphasized, with a focus on capturing complex driving behaviors in challenging scenarios [7][8].
Group 4: Future Developments
- The emergence of proactive interaction capabilities in smart cabins is anticipated to significantly enhance user experience, allowing for multi-party conversations and engagement [10][12].
- The integration of AI with hardware is viewed as a trend that could lower costs and improve the overall ecosystem, with a focus on creating a robust software environment [13].
Shanghai Auto Show
数说新能源· 2025-04-24 06:29
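1. "Intelligent driving" is renamed as companies emphasize safety. In the wake of earlier incidents, the major companies played down "intelligent driving" in the product launches at today's press conferences. BYD, for example, now promotes its "God's Eye" (天神之眼) system as driving assistance. Li Auto and other companies also stressed in their launches that their products improve user safety in emergencies.
2. More foreign journalists and bloggers of every background are attending, and going global is the main theme for companies. Another impression from the show is that the major automakers released more new products in popular segments such as SUVs and MPVs, and product homogenization in the domestic market is intensifying. We also noticed more journalists and bloggers from Europe, the United States, Japan, the Middle East, Southeast Asia, and elsewhere, which reflects how competitive Chinese smart electric vehicles remain in overseas markets.
3. World models become the new technology trend. Xpeng, Li Auto, Huawei, Horizon Robotics, and other automakers and platform suppliers all emphasized "world models" in their exhibits. Competition in intelligent driving is shifting from on-vehicle computing power to the ability to build a world model in the cloud (a virtual world that understands the rules of the physical world).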
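SenseTime Jueying sets a new milestone for intelligent driving: the generative driving solution R-UniAD makes safety more certain and surpasses the limits of human driving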
Guan Cha Zhe Wang· 2025-04-24 01:18
Core Insights - The article discusses the advancements in autonomous driving technology by SenseTime's "绝影" (Jueying), particularly focusing on the R-UniAD technology framework that integrates reinforcement learning and world models to overcome existing limitations in end-to-end autonomous driving systems [1][2][3].
Group 1: Technology Advancements
- SenseTime has developed the R-UniAD technology solution, which incorporates reinforcement learning to enhance the interaction between end-to-end autonomous driving systems and the real world, thereby improving safety and reliability [2][3].
- The VLAR architecture, which combines "vision-language-action-reinforcement learning," is a key breakthrough in achieving generative autonomous driving capabilities [6][9].
- The R-UniAD framework consists of a three-stage process: initial training through imitation learning, reinforcement learning with world model interaction, and efficient distillation for deployment in vehicles [9].
Group 2: Safety and Performance Improvements
- The R-UniAD technology aims to significantly reduce the need for real-world data by generating virtual scenarios, thus lowering the requirement for high-quality corner case data by two orders of magnitude [9].
- The model's performance is designed to exceed human driving capabilities, with a reported reduction in collision rates by an order of magnitude compared to human drivers [9].
- The system's ability to handle complex scenarios, such as construction site interruptions, is enhanced through 4D simulation and reinforcement learning, allowing for better prediction and response to unforeseen obstacles [10][12][16].
Group 3: Commercialization and Partnerships
- SenseTime's autonomous driving solutions are currently in collaboration with four automotive manufacturers, with seven vehicle models already equipped with their technology [1][21].
- The company is accelerating the mass production of its autonomous driving solutions, with plans for further deployment in 2025, including partnerships with major automotive brands like Dongfeng and Chery [21][23].
- The R-UniAD technology has received certification from the China Automotive Technology and Research Center, marking it as a leading product in the field of autonomous driving [23].
Group 4: Future Developments
- The "绝影开悟" (Jueying Kaiwu) world model has been upgraded to version 2.0, enabling near real-time interaction and 4D scenario generation, which is crucial for training autonomous driving models [17][19][20].
- This upgraded model can generate diverse and complex driving scenarios, including extreme risk situations, which are essential for training robust autonomous systems [19][20].
- SenseTime aims to integrate its advanced AI technologies with the automotive industry to create a comprehensive ecosystem for intelligent driving, focusing on safety, adaptability, and user experience [24][25].
A Survey of Large-Model-Driven Spatial Intelligence: Advances in Embodied Agents, Smart Cities, and Earth Science
欧米伽未来研究所2025· 2025-04-20 14:32
" 欧米伽未来研究所 " 关注科技未来发展趋势,研究人类向欧米伽点演化过程中面临的重大机遇与挑战。将不定期推荐和发布世界范围重要科技研究进展和未 来趋势研究。( 点击这里查看欧米伽理论 ) 我们生活在一个由空间构成的世界中。从每天在家居、办公环境或城市街道中的移动,到规划一次跨越山海的旅行,乃至科学家们研究气候变迁的地理模 式、城市扩张的复杂格局,这一切都深刻地依赖于我们对空间的感知、理解和运用能力。这种核心能力,我们称之为"空间智能"。 长久以来,人类凭借自身的感官系统和发达的大脑,不断地探索、适应并改造着周遭的空间环境,演化出了独特的空间认知机制。而今,随着人工智能 (AI)技术的日新月异,特别是大语言模型(LLMs)的横空出世,机器也开始显露出令人瞩目的空间智能潜力。这场由大模型引领的技术浪潮,正以前 所未有的深度和广度,渗透到从微观尺度的机器人导航,到中观尺度的城市规划管理,再到宏观尺度的地球科学研究等诸多领域。 这部报告由清华大学和芬兰赫尔辛基大学共同发布,将带领读者一同深入探究,大模型是如何被赋予"空间感"的?它们在跨越不同尺度的空间智能任务中 扮演着怎样日益重要的角色?以及在迈向更高级空间智能的 ...