Pre-training Paradigm
NVIDIA's Jim Fan: "World Modeling" Is the Next-Generation Pre-training Paradigm
QbitAI (量子位) · 2026-02-05 04:10
Core Viewpoint
- The article discusses the emergence of "world modeling" as a new pre-training paradigm in AI, particularly in robotics and multimodal AI, and predicts that 2026 will be a pivotal year for its application [3][8][28].

Group 1: Definition and Transition
- World modeling is defined as predicting the next plausible state of the world given an action, marking a shift from the previous paradigm of next-word prediction [5][6][9].
- The current hype around world models is focused primarily on AI video applications, but the real breakthrough is expected in physical AI by 2026 [7][10].

Group 2: Implications for Robotics
- The article emphasizes that world models will serve as a foundation for robotics and multimodal AI, enabling a new form of reasoning grounded in visual space rather than language [10][25][45].
- Moving from pixel-based models to generating physical actions remains challenging and will require advances in both data and compute [41][42].

Group 3: Visual-Centric Reasoning
- Visual reasoning is highlighted as a crucial aspect: simulating geometry and motion can support reasoning without relying on language [43][46].
- The article draws a parallel with biological intelligence, suggesting that high dexterity in physical tasks does not necessarily depend on language skills, as primates demonstrate [19][21][46].

Group 4: Industry Developments
- Major players such as Google and NVIDIA are investing in world modeling technologies, and significant funding rounds have been reported for startups such as World Labs and AMI Labs [40][47].
- The article suggests that 2026 may mark a shift away from language models in robotics, toward building native systems that leverage visual capabilities [46].
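
To make the definition in Group 1 concrete, here is a minimal sketch of "predict the next state of the world given an action". Next-word prediction conditions only on earlier tokens; a world model conditions on a state and an action. Nothing here comes from the article: the `TabularWorldModel` class, the `step` dynamics, and the five-cell track are hypothetical illustrations, and a real system would learn from pixels with a neural network rather than a lookup table.

```python
from collections import Counter, defaultdict


class TabularWorldModel:
    """Toy world model: remembers which next state followed each (state, action)."""

    def __init__(self):
        # counts[(state, action)][next_state] = number of times observed
        self.counts = defaultdict(Counter)

    def update(self, state, action, next_state):
        """Record one observed transition."""
        self.counts[(state, action)][next_state] += 1

    def predict(self, state, action):
        """Return the most frequently observed next state, or None if unseen."""
        seen = self.counts.get((state, action))
        if not seen:
            return None
        return seen.most_common(1)[0][0]


def step(state, action):
    """Hypothetical ground-truth dynamics: a 1-D track with cells 0..4."""
    delta = {"left": -1, "right": 1}[action]
    return min(4, max(0, state + delta))


def rollout(model, state, actions):
    """Imagine a trajectory by chaining predictions, without calling step()."""
    trajectory = [state]
    for action in actions:
        state = model.predict(state, action)
        if state is None:
            break  # the model has no experience for this situation
        trajectory.append(state)
    return trajectory


# Collect experience by trying every action in every cell once.
model = TabularWorldModel()
for s in range(5):
    for a in ("left", "right"):
        model.update(s, a, step(s, a))

print(rollout(model, 2, ["right", "right", "right", "left"]))  # [2, 3, 4, 4, 3]
```

The `rollout` function is the part that matters for robotics: once the model has been fitted, a planner can evaluate candidate action sequences in imagination before committing to any of them in the physical world.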