A Summary of 2025 Interviews with Several Autonomous Driving Companies
自动驾驶之心· 2026-01-22 09:07
Core Algorithm
- The industry has shifted toward end-to-end solutions, moving away from modular approaches, at least in public discourse [1]
- World models are widely adopted: some companies use them to generate training data, while others fold them into end-to-end models to improve performance [1][8]
- Opinions diverge on whether language models (VLA) are necessary for autonomous driving; some companies argue language is not essential for driving tasks [1][11]

Simulation and Infrastructure
- Closed-loop systems have evolved from data-driven pipelines to simulation-based testing and training loops [2]
- 3DGS is highlighted as a crucial technology for building simulation environments, as Tesla emphasized at CVPR 2025 [5]
- Infrastructure is critical; companies such as Xiaomi and Li Auto note its benefits for development efficiency [3][14]

Organizational Capability
- Organizational ability is vital, as large autonomous driving teams face significant management challenges [4]
- Team culture and collaboration are emphasized as essential for overcoming complex technical and management problems [5]

Technical Choices Comparison
- A comparison of companies' technical choices reveals differing approaches to core technologies and to the role of world models and simulation tools [9]
- Li Auto advocates a training loop that evolves from imitation to self-learning, while NVIDIA emphasizes interpretability and reasoning in AI [9]

Key Non-Core Factors
- R&D infrastructure and engineering efficiency are crucial to the success of autonomous driving technologies [14]
- Simulation and synthetic data are becoming essential for covering corner cases that real-world data cannot [14]
- Computing scale and chip adaptation are critical: autonomous driving is a hardware challenge as well as a software one [15]

User Experience and Safety
- User experience and safety are paramount; companies like Xiaomi stress balancing advanced technology with user concerns [17]
- A dual-stack safety mechanism is needed so that even aggressive end-to-end models can fall back to traditional rule-based systems [19]
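The dual-stack safety idea above (a learned end-to-end planner backed by a rule-based fallback) can be sketched in a few lines. This is a minimal illustration of the pattern, not any company's actual stack; all names and thresholds below are invented for the example.

```python
from dataclasses import dataclass

@dataclass
class Trajectory:
    points: list              # planned (x, y) waypoints
    max_lateral_accel: float  # m/s^2, checked by the rule stack

def e2e_plan(sensor_frame) -> Trajectory:
    """Stand-in for the learned end-to-end planner (a neural net in reality)."""
    return Trajectory(points=[(0.0, 0.0), (1.0, 0.1)], max_lateral_accel=2.5)

def rule_based_plan(sensor_frame) -> Trajectory:
    """Conservative fallback: keep the lane and brake gently."""
    return Trajectory(points=[(0.0, 0.0), (1.0, 0.0)], max_lateral_accel=0.5)

def passes_safety_rules(traj: Trajectory) -> bool:
    """Hard-coded envelope the learned output must satisfy."""
    return traj.max_lateral_accel <= 3.0 and len(traj.points) >= 2

def plan(sensor_frame) -> Trajectory:
    """Dual stack: prefer the learned plan, fall back to rules on violation."""
    candidate = e2e_plan(sensor_frame)
    if passes_safety_rules(candidate):
        return candidate
    return rule_based_plan(sensor_frame)
```

The point of the pattern is that the rule stack bounds worst-case behavior even when the learned planner is aggressive or wrong.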
More and More Students Have Been Asking About World Model Jobs Lately...
自动驾驶之心· 2026-01-22 00:51
Core Viewpoint
- Demand is growing for positions in autonomous driving, particularly around world models, end-to-end systems, and VLA, with practical experience and advanced knowledge in these domains increasingly important [2][4]

Course Overview
- The course on world models in autonomous driving is being launched in collaboration with industry experts, covering algorithms and applications including Tesla's world model and the Marble project from Fei-Fei Li's team [2][4]
- It aims to provide a comprehensive understanding of world models: their development history, current applications, and approaches such as pure simulation, simulation + planning, and generative sensor input [7]

Course Structure
- Chapter 1: Introduction to World Models reviews the relationship between world models and end-to-end autonomous driving, discussing the evolution and current applications of world models and the main streams within the field [7]
- Chapter 2: Background Knowledge covers foundations such as scene representation, Transformers, and BEV perception, which are prerequisites for the later chapters [8][12]
- Chapter 3: General World Model Exploration focuses on popular models such as Marble and Genie 3 and the latest discussions around VLA + world model algorithms, examining their core technologies and design philosophies [9]
- Chapter 4: Video Generation-Based World Models walks through video generation algorithms from notable works like GAIA-1 and GAIA-2 to recent advances, balancing classic and cutting-edge research [10]
- Chapter 5: OCC-Based World Models concentrates on occupancy (OCC) generation methods, discussing three major papers and a practical project, with applications in trajectory planning and end-to-end systems [11]
- Chapter 6: World Model Job Specialization shares the instructor's practical experience on industry applications, pain points, and interview preparation for related positions [12]

Learning Outcomes
- The course aims to bring participants to a level equivalent to one year of experience as a world model algorithm engineer, covering key technologies and enabling practical application in projects [15]
A PKU Associate Professor Obsessed with the Robot Brain Talks with Us About the Biggest "Bias" in Embodied AI
36氪· 2026-01-21 14:33
This article originally appeared on 智能涌现 (AIEmergence), a 36氪 account covering the industrial revolution emerging in the new AI era. Author: 富充; editor: 苏建勋; cover image: company official. "Soft-hard divergence." Soft is the model brain; hard is the robot body; divergence means different companies each play to their strengths and stay in their lanes. Before robots achieve large-scale deployment, "soft-hard integration" is a burden, not an advantage. How will embodied intelligence diverge in 2026? Lu Zongqing (卢宗青), associate professor at Peking University's School of Computer Science and CEO of 智在无界, offered us this judgment. 智在无界 is based in Beijing's Dinghao Building, a tower shared by a roster of star AI institutions including 智源研究院, 零一万物, and 银河通用; here, non-consensus views on artificial intelligence surface every day. Lu's position also runs counter to the current state of the embodied AI industry: today's highly valued embodied startups, whether unicorns like 智元机器人 and 银河通用 or strongly funded players like 星动纪元 and 星海图, all doggedly pursue one thing, full-stack soft-hard integration. Even so, Lu and 智在无界, which he founded in 2025, have chosen to go "against the tide" as a model company, developing only the robot brain and staying out of hardware manufacturing. 智能涌现 has exclusively learned that 智在无界 recently closed an angel round worth tens of millions of yuan, led by Lakala's 考拉基金, with ...
A Shanghai Art Museum Welcomes Its First Official AI Docent
第一财经· 2026-01-21 12:44
Core Viewpoint
- The collaboration between ByteDance's Doubao and the Shanghai Pudong Art Museum marks a significant step in integrating AI into everyday experiences, turning museum visits into immersive, interactive events through AI-guided tours [3][5]

Group 1: AI Integration in Museums
- Doubao has become the official AI guide for two international exhibitions at the Shanghai Pudong Art Museum; exclusive data collaboration and targeted search optimization improve the accuracy of its recognition and explanations [3][5]
- Users can ask Doubao about artworks along dimensions such as artistic style, historical context, and cultural significance, creating a more engaging experience [5]
- Keeping explanations accurate while users move and view artworks from different angles is a significant technical challenge [5][6]

Group 2: Technological Advancements
- The Seed1.8 model, released by ByteDance in December 2025, is designed for complex task execution and richer multi-modal interaction, marking a shift from simple information output to real-world task execution [6][7]
- Multi-modal AI is seen as a crucial step toward AGI (Artificial General Intelligence), with industry experts predicting 2025 as a year of adaptation for multi-modal technologies [7][8]
- World models are emerging as a core technology for multi-modal capability, enabling AI to understand and interact with both virtual and real-world environments [8][10]

Group 3: Industry Trends and Challenges
- Growing attention to world models reflects a broader industry shift toward modeling physical-world laws, aiming to integrate AI into real-world applications [10][11]
- Current trends point toward unifying understanding and generation in multi-modal models, though high costs and low commercialization rates persist [11][12]
- Experts highlight the lack of a unified technical route in multi-modal development as a significant barrier, with many models still separating understanding from generation [11]
AI Video Meets Its DeepSeek Moment
经济观察报· 2026-01-21 07:15
Core Viewpoint
- The article discusses the launch and significance of PixVerse R1, a universal real-time world model from Aishi Technology, presented as a transformative moment for the AI video industry, a "DeepSeek moment" for AI video [1][2]

Group 1: Product Features and Innovations
- PixVerse R1 lets users generate videos without entering prompts, creating content automatically in real time, a shift in video generation logic [2][6]
- The model uses an Omni native multimodal architecture that unifies text, images, audio, and video in a single processing framework [5]
- It employs autoregressive flow-based generation, so it can reference previously generated content and maintain a "long-term memory" of user inputs [6]
- Its instant-response engine compresses sampling from more than 50 steps to just 1-4, significantly improving computational efficiency and enabling real-time video generation [6]

Group 2: Market Impact and Collaborations
- Aishi Technology secured a strategic investment of $14.2 million from China Ruyi, which will facilitate copyright sharing and collaboration across film, streaming, and gaming [8]
- The partnership aims to explore innovative applications of AI in the film industry, pointing to PixVerse R1's potential to transform content creation [8][9]
- PixVerse R1's capabilities position Aishi Technology as a leader in real-time video generation and world models, with no other company having launched a comparable product [9][10]

Group 3: Future Prospects and Industry Implications
- Real-time generation could enable AI-native games and interactive films whose narratives evolve with user interaction [10][11]
- The boundaries between video production and consumption are blurring, with users becoming real-time co-creators of content [9][14]
- Aishi Technology's rapid growth and technological advances have placed it among the top players in the global AI video market, with significant user engagement and revenue growth [13][14]
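The generation loop described above (autoregressive conditioning on past frames plus few-step sampling) can be sketched schematically. Everything below is a toy illustration under assumed names: frames are scalars standing in for image latents, and `denoise_step` is a placeholder for a learned network, not PixVerse internals.

```python
import collections

CONTEXT_FRAMES = 8  # "long-term memory": how many past frames condition the next
DENOISE_STEPS = 4   # few-step sampling, vs. the 50+ steps of conventional models

def denoise_step(latent, context):
    """Placeholder for one pass of a learned denoiser conditioned on context."""
    target = sum(context) / len(context)
    return latent + 0.5 * (target - latent)  # pull the latent toward the context

def generate_next_frame(context):
    """Few-step sampling: refine a noise latent in DENOISE_STEPS passes."""
    latent = 0.0
    for _ in range(DENOISE_STEPS):
        latent = denoise_step(latent, context)
    return latent

def stream_video(num_frames, seed_frame=1.0):
    """Autoregressive loop: each generated frame feeds back into the context."""
    context = collections.deque([seed_frame], maxlen=CONTEXT_FRAMES)
    frames = []
    for _ in range(num_frames):
        frame = generate_next_frame(context)
        frames.append(frame)
        context.append(frame)  # feedback gives the model its "long-term memory"
    return frames
```

The bounded `deque` is the memory window, and cutting the sampling loop from 50+ steps to a handful is what makes per-frame latency low enough for real-time streaming.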
AI Video Meets Its DeepSeek Moment
Jing Ji Guan Cha Wang· 2026-01-21 06:39
Core Insights
- PixVerse R1, launched by Aishi Technology, represents a significant advance in AI video generation: users create videos in real time without needing prompts, a transformative moment for the AI video industry [1][2][4]

Group 1: Product Features
- PixVerse R1 generates video instantly and adapts to user commands with remarkable speed, creating an immersive digital world where user input directly shapes the narrative [1][3]
- The model's Omni native multimodal architecture integrates text, images, audio, and video into a unified processing framework, enhancing its generative capabilities [3][4]
- Its autoregressive flow-based generation remembers previous inputs and generates content with a "long-term memory," differentiating it from traditional video generation methods [4][7]

Group 2: Market Impact
- Aishi Technology secured a $14.2 million strategic investment from China Ruyi, which will facilitate collaboration across film, streaming, and gaming, indicating strong market interest in PixVerse R1 [5][6]
- The partnership aims to explore innovative applications of AI in the film industry, highlighting the potential for significant transformation in content creation [6][7]
- The product has already drawn attention from game companies, indicating its potential to reshape interactive media and gaming experiences [8][9]

Group 3: Competitive Landscape
- Aishi Technology is positioned as a leader in real-time video generation, with no other company having launched a similar product [7][9]
- The company has gained traction rapidly, with over 100 million global users and more than 16 million monthly active users [9][10]
- PixVerse R1 is recognized as the first universal real-time world model supporting up to 1080P resolution, setting a new industry standard [9][10]

Group 4: Future Prospects
- PixVerse R1 is expected to blur the line between video production and consumption, letting users generate and edit content in real time and redefining user engagement in media [7][11]
- The technology is anticipated to enable new forms of interactive storytelling and AI-native games whose narratives evolve with user interaction [7][8]
- Aishi Technology's founder frames PixVerse R1 as a new media form in which AI builds a continuously evolving world from user intent, marking the start of a new era of real-time content generation [11]
Mingming Hen Mang Opens Its IPO Today, Priced at No More Than HKD 236.6; Netflix Proposes All-Cash Acquisition of Warner Bros.
Sou Hu Cai Jing· 2026-01-21 02:06
Group 1
- Hunan Mingming Hen Mang Commercial Chain Co., Ltd. has officially launched its global offering and plans to list on the Hong Kong Stock Exchange on January 28, with a share price not exceeding HKD 236.6 [2]
- The company plans to issue 14.1011 million shares (about 12.6909 million for the international offering and 1.4102 million for the Hong Kong public offering), estimating net proceeds of roughly HKD 3.124 billion at the median price of HKD 233.10 per share [2]
- Netflix has revised its acquisition proposal for Warner Bros. to an all-cash offer of USD 82.7 billion, or USD 27.75 per share in cash, with unanimous support from Warner Bros. Discovery's board [2]

Group 2
- Yupan Intelligent has completed a Pre-IPO+ financing round of approximately RMB 513 million, with investors including Wenzhou Cangnan Shanhai Industrial Group and Crewstone International [2]
- Nature Select has closed a new financing round exceeding USD 30 million, with investment from Alibaba, Ant Group, and several venture capital firms [3]
- "Today Yixiu" has completed a seed round of tens of millions, with investors including Hillhouse Ventures and Yunjiu Capital, and plans to launch a series of hardware and software products later this year [4]

Group 3
- Potensic has launched the AI-powered ATOM drone series, featuring smart functions and compliance with global regulations, aimed at improving the flight and control experience [4]
- Tesla's second-generation humanoid robot figurine goes on sale January 21 at RMB 199; it consists of more than 40 independent parts and closely resembles the second-generation humanoid robot [5]
A Shanghai Art Museum Welcomes Its First Official AI Docent
Di Yi Cai Jing Zi Xun· 2026-01-20 13:17
Core Insights
- ByteDance's Doubao has partnered with the Shanghai Pudong Art Museum as the official AI guide for two international exhibitions, enhancing the visitor experience with interactive AI explanations [1][3]
- The collaboration exemplifies the practical application of AI in everyday life, showcasing the "perception-reasoning-action" capabilities of multimodal models [1][6]

Industry Trends
- AI in museum settings lets users engage with art along dimensions such as artistic style and historical context, creating a more immersive experience [3]
- The Seed 1.8 model, launched by ByteDance, focuses on bridging perception, reasoning, and action, enabling complex task execution beyond mere information output [4][10]
- Multimodal AI is seen as a critical step toward AGI, with industry experts predicting 2025 as a pivotal year for multimodal adaptation [6][10]

Technical Challenges
- Ensuring content accuracy in AI explanations is a significant challenge, particularly distinguishing similar artifacts and keeping recognition stable as viewers move [3][6]
- The development of world models is essential to advancing multimodal capability, as they serve as the foundational technology for processing diverse information types [8][9]

Future Directions
- The industry is increasingly focused on modeling physical-world laws through world models, which are expected to improve AI's interaction with the physical environment [10][11]
- There is a trend toward integrating multimodal understanding and generation, with models like Google's Gemini 3 demonstrating advanced image-editing capabilities [11]
Are Robot-Specific Chips a False Proposition? Intel's Song Jiqiang: The Market Is Too Small to Be Profitable for Now
Feng Huang Wang· 2026-01-20 13:07
Core Insights
- For embodied intelligence to be effectively integrated into factories and homes, it must overcome the challenge of reliability, which can be addressed with a "triple system" approach in robotics [1][3]

Group 1: Current Challenges in Embodied Intelligence
- Current embodied robots are likened to "gifted children": they perform well under ideal conditions but struggle with unexpected situations, a challenge common across the industry [1]
- Action-generation accuracy in robots based on vision-language-action (VLA) models is currently around 60-70%, with issues including hallucination, poor environmental adaptability, and weak long-horizon task planning [1][8]

Group 2: Proposed Solutions
- A reliable embodied system should consist of three layers: a primary system for decision-making, a safety system for monitoring, and a fallback system for emergency handling [3]
- The primary system uses neuro-symbolic AI, combining the generalization of neural networks with the reliability and interpretability of symbolic logic [3]
- The safety system continuously checks the robot's execution against preset safety rules and intervenes on deviation, while the fallback system guides the robot into a safe state in emergencies [3][4]

Group 3: Industry Outlook and Hardware Considerations
- The robotics market is still too small to make dedicated chips economical, so the industry mainly adapts existing chips from sectors such as mobile and automotive [6]
- Intel's long-standing position in industrial automation gives it a competitive edge, leveraging its expertise in high-precision motion control [6]
- The anticipated deployment model combines robot terminals with edge servers to enable low-latency operation [7]

Group 4: Bottlenecks and Future Projections
- Major bottlenecks include VLA's limited accuracy and weak grasp of physical relationships, driving a shift toward world models that incorporate physical laws [8]
- Data isolation remains critical: data requirements vary sharply across industries and robot types, complicating the establishment of unified data standards [8]
- The path to reliable embodied intelligence is projected to take two to three years, starting in semi-structured environments such as logistics and manufacturing, then broadening as reliability improves [10][11]

Group 5: Integration of Technologies
- Embodied intelligence will advance not through a single technological breakthrough but through integrating new AI models with established control technologies and safety engineering [12]
- The focus is a reliable solution that minimizes errors in real-world applications, emphasizing a robust foundational system for robotics [12]
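The three-layer architecture described above (primary decision-making, safety monitoring, emergency fallback) can be sketched as follows. This is a minimal illustration of the pattern under invented names and thresholds, not Intel's actual design.

```python
class PrimarySystem:
    """Decision layer: in a real system, a neuro-symbolic policy proposes actions."""
    def decide(self, observation):
        # Toy policy: echo the requested gripper force from the observation.
        return {"action": "grasp", "force": observation.get("force", 5.0)}

class SafetySystem:
    """Monitoring layer: checks each proposed action against preset safety rules."""
    MAX_FORCE = 10.0  # invented limit for the example

    def check(self, action):
        return action["force"] <= self.MAX_FORCE

class FallbackSystem:
    """Emergency layer: drives the robot into a known-safe state on violation."""
    def safe_action(self):
        return {"action": "halt", "force": 0.0}

def control_step(observation, primary, safety, fallback):
    """One tick of the triple system: propose, verify, then act or fall back."""
    action = primary.decide(observation)
    if safety.check(action):
        return action
    return fallback.safe_action()
```

Layering the checks this way means an imperfect learned primary system can fail without the robot ever leaving its safety envelope.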
Backed by Huawei's Hubble, Three Funding Rounds in Just Half a Year: How Did This Company Become a "World Model Dark Horse"?
Sou Hu Cai Jing· 2026-01-20 11:29
Core Insights
- Manifold AI, founded by Dr. Wu Wei, aims to redefine embodied intelligence through a world model that lets robots predict physical interactions rather than merely perceive the environment [1][4][14]
- The company raised more than RMB 300 million within seven months of its founding, indicating strong investor interest in the "physical AI" sector [5][11]

Funding and Growth
- Established in May 2025, Manifold AI has rapidly completed three financing rounds: a seed round led by Inno Fund, followed by two angel rounds totaling over RMB 300 million [4][5]
- The latest round included prominent investors such as Meihua Venture Capital, Junlian Capital, and Huawei's Hubble, underscoring the strategic importance of the company's technology [1][6]

Technology and Innovation
- The company's World Model Action (WMA) approach lets AI not only see but also simulate physical interactions learned from first-person-perspective video [7][10]
- Its models, including DriveScape, RoboScape, and AirScape, target applications such as autonomous driving and robotics, all built on the foundational WorldScape model [10][12]

Market Position and Future Goals
- The company aims to put its "Manifold Brain" into 10% of the robot market, pursuing product-driven development while commercializing sub-domain models [11][12]
- The long-term vision is to move world models from the experimental phase into warehouses, factories, and homes within the next three years [13][14]

Industry Context
- Growing interest in world models stems from their potential to give AI systems the long-missing "physical intuition" essential for real-world intelligent behavior [14][15]
- The entry of strategic investors like Huawei signals strong alignment between Manifold AI's technology and the future of industrial digitalization and robotics [6][10]