It Can Present Slides and Follow Instructions! SenseTime's "WUNENG" Platform Lets Robots Master the Real World | Focus on the World Artificial Intelligence Conference
Guo Ji Jin Rong Bao·2025-07-27 19:20

Core Insights
- The evolution of AI has transitioned from perceptual intelligence to generative intelligence, with future breakthroughs dependent on AI's ability to actively explore and interact with the real world [1][3]

Group 1: AI Development Stages
- The development of human intelligence is rooted in continuous interaction with the physical world, while machine intelligence has been limited by the finite supply of human knowledge [3]
- The rise of deep learning algorithms such as CNN and ResNet from 2011 to 2012 led to explosive growth in perceptual AI, but these models are constrained by their reliance on manually labeled data [3]
- The introduction of the Transformer architecture around 2017 to 2018 enabled AI to extract knowledge from natural language, with models like GPT-3 processing text equivalent to 100,000 years of human creative output [3]

Group 2: Challenges in AI Evolution
- A looming crisis is anticipated as natural language data may be exhausted by 2027 to 2028, while visual data, although abundant, is difficult to extract knowledge from effectively [3]
- The production rate of visual data lags behind the growth of computational power, creating a mismatch with the data requirements of models [3]
- The next phase of AI development requires overcoming the scarcity of high-quality interactive data, drawing inspiration from how humans learn through interaction with the physical world [3]

Group 3: Solutions and Innovations
- The high cost of real-world interaction presents a significant barrier; traditional solutions rely on simulators that often fail to bridge the "Sim-to-Real Gap" [4]
- To address these challenges, the company has introduced the "KAIWU" world model, which generates high-quality simulated data while maintaining temporal and spatial consistency [4]
- The "WUNENG" embodied intelligence platform, powered by the company's world model, provides robust perception, visual navigation, and multimodal interaction capabilities for robots and smart devices [6]

Group 4: Practical Applications
- The "WUNENG" platform enables various hardware to achieve a deep understanding of the world, supporting integration into edge chips for enhanced adaptability [6]
- The embodied world model can generate multi-perspective videos while ensuring temporal and spatial consistency, allowing machines to understand, generate, and edit real-world scenarios [8]
- Users can interact with the embodied world model through simple prompts, enabling autonomous generation of poses, actions, and commands in real-world contexts [8]