VLM (Multimodal Large Models)
When Will Robots Have Their Own "DeepSeek Moment"?
Huxiu App · 2025-10-24 09:53
Core Viewpoint
- The article discusses the evolution of AI from "cognition" to "action," emphasizing the importance of experience-driven control in achieving practical applications in autonomous driving and robotics [5][6].

Group 1: Experience-Driven Control
- The transition from traditional mathematical modeling to experience-driven control is highlighted as essential for real-world applications in complex environments [9][10].
- Experience-driven control allows AI systems to learn from historical data, enabling effective decision-making without precise mathematical models [10][11].

Group 2: Embodied Intelligence
- Embodied intelligence is described as more complex and higher-dimensional than autonomous driving, requiring advanced understanding and generalization capabilities [12][14].
- The current state of embodied intelligence is framed in terms of a "DeepSeek moment": significant progress has been made, but a breakthrough akin to ChatGPT has not yet occurred [15][16].

Group 3: World Models
- World models are identified as crucial for enabling robots to understand and interact with the physical world, serving as a foundational element for embodied intelligence [21][25].
- The article outlines three primary uses of world models: facilitating a feedback loop with the robot's brain, generating trajectory data, and integrating physical understanding into robot operations [25][26].

Group 4: Future Directions
- The industry's need for world models is emphasized, particularly for enhancing the generalization capabilities of robots in complex environments [28][31].
- The article suggests that world models are still at an early stage, with ongoing development aimed at improving their application in robot training and task execution [29][30].
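Exclusive | A Conversation with Tang Jian, CTO of the Beijing Humanoid Robot Innovation Center: World Models May Bring Embodied Intelligence Its "DeepSeek Moment"
Huxiu · 2025-10-23 07:06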
Core Insights
- The article discusses the evolution of AI from "cognition" to "action," highlighting the transition of Tang Jian from academia to industry, particularly in the fields of autonomous driving and embodied intelligence [1][2].
- Tang Jian emphasizes the importance of experience-driven control methods over traditional mathematical modeling in complex environments, suggesting that AI systems can learn from historical data to make effective decisions [4][5].
- The concept of a "world model" is introduced as essential for embodied intelligence, enabling robots to understand and predict their environment, thus enhancing their operational capabilities [13][14].

Summary by Sections

Transition from Academia to Industry
- Tang Jian, a former tenured professor, shifted focus to practical applications of AI in industry, particularly in autonomous driving and robotics [1][3].
- His experience in various companies, including Didi and Midea, has informed his approach to AI-driven system control [3][6].

Experience-Driven Control
- The article outlines the difference between traditional control methods and experience-driven approaches, with the latter relying on data and historical experiences rather than precise mathematical models [4][5].
- This experience-driven philosophy is evident in autonomous driving applications, where end-to-end control merges perception, planning, and control into a single learning process [6][7].

Embodied Intelligence and World Models
- Tang Jian argues that embodied intelligence presents a higher complexity than autonomous driving, requiring robots to manage multiple joints and navigate dynamic environments [7][8].
- The world model is described as a critical component for robots to understand and interact with the physical world, enabling them to perform tasks that require nuanced understanding and adaptability [14][15].
- The article highlights the need for a world model to facilitate the development of robots that can generalize across various tasks and environments, which is crucial for their deployment in real-world scenarios [21][22].

Future Directions and Challenges
- The discussion includes the potential for world models to achieve a "DeepSeek moment" in embodied intelligence, drawing parallels to breakthroughs in AI performance under limited resources [9][10].
- Tang Jian acknowledges the current limitations in data and model architecture, indicating that further iterations and improvements are necessary for the field to progress [2][13].
- The article concludes with the assertion that the world model is not just a technical choice but a fundamental requirement for the advancement of embodied intelligence [13][22].