Core Viewpoint
- Ant Group's Lingbo has released and open-sourced four embodied intelligence models, a significant step in its AGI strategy that extends it from the digital world into the physical world [1]

Group 1: Model Development and Features
- The newly released embodied world model LingBot-VA introduces an autoregressive video-action world modeling framework that integrates large-scale video generation with robot control [2]
- LingBot-World supports real-time interaction with generated worlds, providing a high-fidelity, dynamic, and controllable "digital rehearsal space" for embodied intelligence, autonomous driving, and game development [2]
- The embodied large model LingBot-VLA has been pre-trained on over 20,000 hours of real-robot data covering nine mainstream dual-arm robot configurations, and aims to move embodied intelligence into a reusable and scalable phase [3]

Group 2: Challenges and Solutions
- A major challenge in embodied intelligence is the lack of physical-world data, which leads to reliance on models pre-trained in the digital world [4]
- Ant Group's research indicates that augmenting digital pre-training with physical-world data significantly enhances the capabilities of embodied models [4]
- The team combines various pre-training methods from the digital world to address scene understanding and logical reasoning, and continues to explore the limits of embodied intelligence [4]

Group 3: Future Directions and Collaborations
- Ant Group emphasizes training foundation models for the embodied field rather than fine-tuning for specific scenarios, and uses real-world applications to drive model iteration [5]
- The company is collaborating with various industry partners, including a strategic partnership with Orbbec to launch a new generation of depth cameras based on the LingBot-Depth model [5]
- The goal is to achieve "one-shot" capabilities in embodied models, allowing them to complete tasks with high success rates after observing human demonstrations [6]

Group 4: AGI Landscape and Open Source Strategy
- AGI is a key competitive focus for global tech companies, with three main paths: language models, multimodal generation models, and embodied intelligence models [7]
- Ant Group has made significant strides in these areas, releasing a range of models, including the trillion-parameter thinking model Ring-1T and the trillion-parameter general language model Ling-1T, forming a comprehensive multimodal system [7]
- The open-sourcing of Lingbo's four models is a key step in building an inclusive AGI ecosystem, aimed at deep integration with real-world applications [8]
Ant Lingbo CEO Zhu Xing: Focusing on Foundation Model Training in the Embodied Field to Build Smarter Brains for Robots
Cailian Press (财联社) · 2026-02-04 12:11