Zhao Hejuan's Exclusive Conversation with Fei-Fei Li: "What I Believe In Is Humanity, Not AI"
Xin Lang Cai Jing· 2025-12-22 05:27
Source: Barron's China. In the latest episode of "Zhao Hejuan Talk," Professor Fei-Fei Li argues that as AI moves from "language generation" to "world generation," spatial intelligence will see an application-level breakout within two years. Yet AI will always be only a tool, she says, and the power to choose should stay in human hands. By Zhao Hejuan. Everything has already moved much faster than anyone expected a year ago. In this TMTPost episode of "Zhao Hejuan Talk," Li also said that spatial intelligence, moving from "language generation" to "world generation," will see an application-level explosion within two years. As 2025 draws to a close, Stanford professor Fei-Fei Li, often called the "Godmother of AI," and the company she founded, World Labs, have delivered wave after wave of progress, including the launch of Marble, the first commercial "world model." This has finally made people realize that world models are not merely a concept but already real and useful. My first meeting with Professor Li goes back to 2017, in a teaching building at Stanford. That year, Chen Tianqiao, newly settled in Silicon Valley, introduced her to me and a few old friends, remarking that she was one of the most outstanding Chinese-American scientists in the United States. At the time, the ImageNet effort she had launched was in full swing, and in that first conversation I learned a new concept from her: why datasets ...
Robotics Industry Commentary: Google DeepMind Expands Its Robotics Push, Advancing Software and Hardware in Parallel
Industry: Machinery & Equipment | Industry Research / Industry Commentary | Securities Research Report
Analysts: Wang Ke (A0230521120002, wangke@swsresearch.com); Hu Shujie (A0230524070007, husj@swsresearch.com). Contact: Hu Shujie (A0230524070007, husj@swsresearch.com)
⚫ Google DeepMind is expanding its robotics efforts, advancing software and hardware in parallel. In an interview, DeepMind CEO Hassabis said he hopes to build a general AI system based on Gemini that can be fitted to a variety of physical forms, including humanoid, quadruped, and wheeled robots; in effect, making Gemini the Android of robotics. He predicts that AI-driven robotics will reach a breakthrough moment within the next few years. Robot software and hardware have not yet been decoupled, so software development cannot be separated from hardware; to address this, Google DeepMind hired former Boston Dynamics CTO Aaron Saunders as VP of hardware engineering. Saunders said he will focus on the foundational hardware problems of realizing AGI's full potential in the physical world.
⚫ DeepMind has already launched the Gemini Robotics project internally, ...
Tsinghua and Peking University Teams Release Motion Transfer: Robots Learn Skills End-to-End Directly from Human Data
具身智能之心· 2025-11-07 00:05
Core Insights
- The article discusses the release of Gemini Robotics 1.5 by Google DeepMind, highlighting its Motion Transfer (MT) mechanism for transferring skills between different robots without retraining [1][2]
- A collaborative team from Tsinghua University, Peking University, Wuhan University, and Shanghai Jiao Tong University has developed a new framework called MotionTrans, which enables zero-shot skill transfer from humans to robots using VR data [2][4]

MotionTrans Framework
- MotionTrans is an end-to-end, zero-shot, multi-task skill transfer framework that allows robots to learn human skills directly from VR data without prior robot demonstrations [4][7]
- The framework supports zero-shot transfer: robots can learn tasks such as pouring water and plugging/unplugging devices solely from human data collected via VR [7][16]
- It also allows fine-tuning with a small number of robot demonstrations (5-20), significantly improving success rates across 13 human skills [7][17]

Technical Details
- The MotionTrans framework is architecture-agnostic, allowing it to be integrated with popular models such as Diffusion Policy and VLA [7][10]
- The team developed a human data collection system that captures first-person video, head movement, wrist poses, and hand actions, which are then transformed into a format suitable for robots [9][10]
- The framework employs techniques such as coordinate transformation and hand retargeting to bridge the gap between human and robot actions [10][11]

Performance Evaluation
- In zero-shot evaluations, the robot achieved an average success rate of 20% across 13 tasks, with some tasks such as Pick-and-Place reaching success rates of 60%-80% [14][16]
- After fine-tuning with a small number of robot trajectories, the average success rate improved to approximately 50% with 5 trajectories and up to 80% with 20 trajectories [17][18]
- Even tasks with initially zero success rates showed that the model could learn the correct action direction, demonstrating the framework's ability to capture task semantics [14][22]

Conclusion
- MotionTrans shows that advanced end-to-end models can learn new skills with zero robot demonstrations, using only human VR data, shifting the role of human data from supplementary to primary in skill acquisition [22][23]
- The team has open-sourced all data, code, and models to support further research in this area [23]
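The coordinate-transformation step described above, re-expressing a VR-recorded human wrist pose in a robot's frame, can be sketched with a single homogeneous transform. This is a minimal illustration, not the paper's implementation; the calibration matrix and wrist position below are hypothetical values.

```python
import math

def transform_point(T, p):
    """Apply a 4x4 homogeneous transform (row-major nested lists) to a 3D point."""
    x = T[0][0] * p[0] + T[0][1] * p[1] + T[0][2] * p[2] + T[0][3]
    y = T[1][0] * p[0] + T[1][1] * p[1] + T[1][2] * p[2] + T[1][3]
    z = T[2][0] * p[0] + T[2][1] * p[1] + T[2][2] * p[2] + T[2][3]
    return (x, y, z)

# Hypothetical calibration: the robot base sits 0.5 m below and 0.3 m in
# front of the VR headset origin, with a 90-degree yaw between the frames.
c, s = math.cos(math.pi / 2), math.sin(math.pi / 2)
T_robot_from_headset = [
    [c,  -s,  0.0,  0.3],
    [s,   c,  0.0,  0.0],
    [0.0, 0.0, 1.0, -0.5],
    [0.0, 0.0, 0.0,  1.0],
]

# A wrist position recorded in the headset frame...
wrist_headset = (0.2, 0.1, 0.4)
# ...re-expressed in the robot base frame before policy training.
wrist_robot = transform_point(T_robot_from_headset, wrist_headset)
```

The same machinery, applied per-frame to wrist and finger poses, is what lets human and robot trajectories live in one shared action space.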
Tsinghua and Peking University Jointly Release Motion Transfer, Rivaling Gemini Robotics: Robots Learn Skills End-to-End Directly from Human Data
机器之心· 2025-11-05 04:15
Core Insights
- The article discusses the release of Gemini Robotics 1.5 by Google DeepMind, highlighting its Motion Transfer (MT) mechanism, which allows skill transfer between different robot embodiments without retraining [2]
- A collaborative team from Tsinghua University, Peking University, Wuhan University, and Shanghai Jiao Tong University has developed a new paradigm for zero-shot action transfer from humans to robots, releasing a comprehensive technical report and open-source code [3]

MotionTrans Framework
- MotionTrans is an end-to-end, zero-shot, RGB-to-action skill transfer framework that enables robots to learn human skills without prior robot demonstrations [8]
- The framework includes a self-developed human data collection system built on VR devices, capturing first-person video, head movements, wrist poses, and hand actions [9]

Implementation of MotionTrans
- The framework enables zero-shot transfer: robots learn tasks such as pouring water and unplugging devices from human VR data alone, achieving a 20% average success rate across 13 tasks [12][17]
- Fine-tuning with a small amount of robot data (5-20 samples) raises the success rate to approximately 50% and 80%, respectively [20]

Data and Training Techniques
- The team built a large-scale human-robot dataset with over 3,200 trajectories across 15 tasks, demonstrating the framework's ability to learn from human data alone [14][16]
- The approach includes techniques such as hand retargeting and unified action normalization to bridge the gap between human and robot actions [10][13]

Results and Contributions
- MotionTrans proves that advanced end-to-end models can unlock new skills under zero-robot-demonstration conditions, shifting the role of human data from supplementary to primary [25]
- The team has open-sourced all data, code, and models to support future research in this area [26]
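The "unified action normalization" mentioned above can be sketched as per-dimension z-score normalization over the pooled human and robot trajectories, so both embodiments share one action scale. This is a minimal sketch under that assumption; the paper's exact scheme may differ, and the trajectories below are toy values.

```python
def normalize_actions(trajectories):
    """Pool every action from every trajectory (human and robot alike) and
    apply per-dimension z-score normalization, so both embodiments share a
    single action scale during training."""
    flat = [a for traj in trajectories for a in traj]
    dims = len(flat[0])
    n = len(flat)
    mean = [sum(a[d] for a in flat) / n for d in range(dims)]
    # Guard: a constant dimension keeps scale 1.0 instead of dividing by zero.
    std = [(sum((a[d] - mean[d]) ** 2 for a in flat) / n) ** 0.5 or 1.0
           for d in range(dims)]
    normed = [[tuple((a[d] - mean[d]) / std[d] for d in range(dims))
               for a in traj] for traj in trajectories]
    return normed, mean, std

# Two toy 2-D action trajectories, e.g. one human-recorded, one robot-recorded.
normed, mean, std = normalize_actions([
    [(1.0, 10.0), (3.0, 10.0)],
    [(5.0, 10.0), (7.0, 10.0)],
])
```

At deployment the same mean and std are used to de-normalize the policy's outputs back into physical units.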
Humanoid Horizons: Big Tech 'Doing the Robot'... SoftBank-ABB, Apple, Meta, Optimus v3
2025-10-27 12:06
Summary of Key Points from the Conference Call

Industry Overview
- The focus is on the humanoid robotics and physical AI sector, with major players including SoftBank, ABB, Apple, Meta, Google, and Tesla [1][2][3][5][6]

Core Developments
1. **SoftBank's Acquisition of ABB Robotics**: SoftBank agreed to purchase ABB's robotics division for $5.4 billion, after ABB shifted from a previous plan to spin off the business amid competition from Chinese firms [5][39]. SoftBank founder Masayoshi Son emphasized that "SoftBank's next frontier is Physical AI," aiming to combine AI and robotics to drive innovation [5][39]
2. **Meta's Humanoid Robot Initiative**: Meta is developing a humanoid robot called 'Meta-Bot' and aims to become a software/AI provider for a range of hardware developers [5][39]. The company has formed a robotics team to build datasets and world models for enhanced robot capabilities [5][39]
3. **Google's Robotics Advancements**: Google DeepMind released the Gemini Robotics series, enhancing robots' ability to perform complex tasks through embodied reasoning [5][46]. Google and Meta are both building world models that let agents interact in simulations, with potential applications in robotics [5][6]
4. **Tesla's Optimus Robot**: Tesla plans to unveil the fully redesigned Optimus v3 in Q1 2026, with ambitious production goals of 1 million units for v3 and up to 100 million for future versions [7][53]. CEO Elon Musk highlighted the challenges of developing humanoid robots, particularly dexterous hands [7][53]
5. **China's Dominance in Industrial Robotics**: China accounted for 54% of global industrial robot installations in 2024, up sharply from 26% a decade ago [7][8]

Financial Insights
- The Humanoid 100 index has gained 27% since its inception on February 6, 2025, outperforming the S&P 500 and other indices [11]
- Tesla's stock is rated "Overweight" with a price target of $410, while its market cap stands at approximately $1.58 trillion [3][7]

Notable Partnerships and Funding
1. **Figure AI's Series C Funding**: Figure AI raised $1 billion in a Series C round at a $39 billion valuation, aimed at scaling humanoid robots for home and commercial use [29]
2. **Strategic Partnerships**: Figure AI partnered with Brookfield to build a real-world database for its Helix VLA model [35]. Telexistence and Seven-Eleven Japan are collaborating to deploy humanoid robots in stores by 2029 [35]
3. **Apple's Robotics Development**: Apple is reportedly collaborating with BYD to manufacture AI-enabled robots, with products expected to launch in 2026 and 2027 [7][39]

Emerging Trends and Future Outlook
- Humanoid robots are viewed as a significant opportunity, with many companies investing heavily in AI and robotics [5][6][39]
- Integrating AI with robotics is expected to drive advances across manufacturing, logistics, and consumer applications [5][39]

Conclusion
- The humanoid robotics and physical AI industry is evolving rapidly, with significant investments and developments from major tech companies. The competitive landscape is intensifying, particularly given China's growing weight in industrial robotics, and humanoid robots show promise across many sectors.
Google's Latest: Gemini Robotics 1.5, Breakthrough Progress Toward General-Purpose Robots
具身智能之心· 2025-10-16 00:03
Core Insights
- The article discusses the breakthrough advances in general-purpose robotics presented in Google DeepMind's "Gemini Robotics 1.5" report, highlighting the new models and their capabilities in perception, reasoning, and action [1][39]

Technical Architecture
- The core architecture of Gemini Robotics 1.5 is a "Coordinator + Action Model" design that closes the loop through multimodal data exchange [2]
- The Coordinator (Gemini Robotics-ER 1.5) processes user inputs and environmental feedback, controls the overall task flow, and breaks complex tasks into executable sub-steps [2]
- The Action Model (Gemini Robotics 1.5) translates natural-language sub-instructions into robot action trajectories and can directly control a variety of robot embodiments without additional adaptation [2][4]

Motion Transfer Mechanism
- The Motion Transfer (MT) mechanism addresses the "data silo" problem in traditional robotics by enabling skill generalization across robot embodiments, validated through experimental comparisons [5][7]
- Trained on mixed data from multiple robot types, the Gemini Robotics 1.5 model outperformed single-embodiment training approaches at skill transfer [7][8]

Performance Validation
- A "thinking VLA" mechanism introduces a two-step process for task execution, improving performance on multi-step tasks by breaking complex instructions into manageable sub-steps [8][11]
- Quantitative results show an improvement of approximately 21.8% in task-completion scores when thinking mode is enabled [11]
- The model's cross-embodiment skill generalization was evidenced by significant performance gains in scenarios with limited training data [13][28]

Safety Mechanisms
- The ER model incorporates safety mechanisms that assess risks and provide intervention strategies across scenarios, ensuring safe task execution [36][38]
- Performance comparisons indicate that ER 1.5 excels at risk identification and mitigation, with high accuracy in predicting potential hazards [36][38]

Conclusion and Future Directions
- Gemini Robotics 1.5 represents a significant advance toward universal control of multiple robots, lowering deployment costs and improving task-execution capability [39]
- The integration of reasoning and action is identified as critical for completing complex tasks, underscoring the importance of ER and VLA collaboration [39]
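The "Coordinator + Action Model" loop described above can be sketched as a simple orchestration pattern. This is a hypothetical mock, not Google's implementation: the hard-coded task decomposition and the class and method names stand in for what the ER model and the VLA would actually produce.

```python
class Coordinator:
    """Stand-in for the orchestrator role (Gemini Robotics-ER 1.5 in the
    article). A real coordinator would query a VLM; this mock hard-codes a
    hypothetical decomposition for illustration."""
    def plan(self, instruction, observation):
        return ["locate the cup", "grasp the cup", "place the cup in the sink"]

class ActionModel:
    """Stand-in for the action model that turns a natural-language sub-step
    into a motion trajectory."""
    def act(self, sub_instruction, observation):
        return f"trajectory<{sub_instruction}>"

class Env:
    """Minimal environment that records executed trajectories and returns an
    updated observation, closing the feedback loop."""
    def __init__(self):
        self.executed = []
    def observe(self):
        return {"steps_done": len(self.executed)}
    def execute(self, trajectory):
        self.executed.append(trajectory)
        return self.observe()

def run_task(instruction, coordinator, action_model, env):
    """Coordinator decomposes the task; the action model executes each
    sub-step; environment feedback flows back on every iteration."""
    obs = env.observe()
    for step in coordinator.plan(instruction, obs):
        obs = env.execute(action_model.act(step, obs))
    return obs

env = Env()
final = run_task("clean up the cup", Coordinator(), ActionModel(), env)
```

The point of the pattern is the separation of concerns: the coordinator owns task flow, the action model owns motor control, and neither needs to know the other's internals.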
Embodied Intelligence Robot Experimental Platform: Natural-Language Interaction Learning
Sou Hu Cai Jing· 2025-10-13 09:35
Core Insights
- The embodied intelligent robot experimental platform focuses on natural-language interaction learning, aiming to let robots understand natural-language commands and interact dynamically with the physical environment through multimodal fusion and deep learning [2][3]

Multimodal Perception and Fusion Technology
- The platform integrates sensors such as RGB-D cameras, LiDAR, microphones, and flexible sensors to carry out complex tasks like object sorting and massage with high precision [4]
- Tencent's multimodal neural SLAM model combines vision and language for environmental exploration, improving generalization by 20% on the ALFRED benchmark [4]

Natural-Language Interaction Framework
- The DialFRED benchmark developed by Tencent includes 53,000 manually annotated dialogues; its active-interaction model achieves a 33.6% success rate, well above the passive model's 18.3% [4]

Layered Decision-Making and Control
- Berkeley's LangWBC framework maps language commands to robot actions via conditional variational autoencoders, remaining robust even under external disturbances [4]

Large-Scale Multimodal Datasets
- The dataset released by the National Land and Resources Innovation Center covers 279 tasks and supports cross-embodiment policy transfer across a range of real-world scenarios [4]

Efficient Training Methods
- Models such as CLIP and PaLM-E leverage extensive multimodal pre-training to improve robustness and zero-shot task generalization [4]

Applications in Healthcare and Industry
- Huawei's CloudRobo platform enables remote surgery with a latency of only 38 ms, while its intelligent disinfection robots achieve 100% coverage with a 60% reduction in labor costs [4]
- Shanghai Jiao Tong University's dual-arm robot platform reaches 98% accuracy in industrial part sorting through video imitation learning [4]

Challenges and Frontiers
- Combining symbolic reasoning with neural networks is being explored to improve decision-making transparency and effectiveness on complex tasks [4]
- Ethical considerations and safety mechanisms are critical in sensitive fields such as healthcare, requiring ongoing improvements in compliance and operational safety [4]
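One common form of the multimodal fusion mentioned above is late fusion: each sensor stream is encoded separately and the embeddings are concatenated into one state vector for the policy. This is a deliberately minimal sketch, not the platform's actual method, and the per-modality embeddings below are made-up toy values.

```python
def fuse_modalities(features):
    """Late fusion by concatenation: each sensor stream (camera, LiDAR,
    microphone, ...) is encoded separately, then the embeddings are joined
    into a single state vector for the downstream policy. Sorting the keys
    keeps the vector layout stable across calls."""
    fused = []
    for name in sorted(features):
        fused.extend(features[name])
    return fused

# Hypothetical per-modality embeddings (real encoders would produce these).
state = fuse_modalities({
    "rgb_d": [0.1, 0.4],
    "lidar": [0.7],
    "audio": [0.2, 0.9],
})
```

More sophisticated schemes (cross-attention, learned gating) replace the concatenation but keep the same contract: many modality streams in, one state vector out.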
Seeing and Speaking Aren't Enough; Robots Also Need to Compute! Tool-Use + Reinforcement Learning: TIGeR Enables Precise Robot Manipulation
具身智能之心· 2025-10-11 16:02
Core Insights
- The article discusses the limitations of current vision-language models (VLMs) in accurately interpreting and executing spatial commands in robotics, emphasizing the need for precise geometric reasoning and tool integration [2][5]

Group 1: TIGeR Framework
- The Tool-Integrated Geometric Reasoning (TIGeR) framework augments VLMs with tool use and reinforcement learning, improving their ability to perform precise calculations in three-dimensional space [2][6]
- TIGeR moves AI models from qualitative perception to quantitative computation, addressing the core pain points of existing VLMs [2][7]

Group 2: Advantages of TIGeR
- TIGeR provides precise localization by combining depth information and camera parameters, converting commands like "10 centimeters above" into exact three-dimensional coordinates [7]
- The framework supports unified multi-view reasoning, merging information from different perspectives into a consistent world coordinate system [7]
- The model's reasoning process is transparent: the tools invoked, the parameters passed, and the results obtained are all visible, making it easier to debug and optimize [7]

Group 3: Training Process
- Training proceeds in two phases: supervised learning first teaches basic tool usage and reasoning chains, then reinforcement learning refines the model's tool-use skills through a hierarchical reward mechanism [8][10]
- The hierarchical reward evaluates not only the correctness of the final answer but also the accuracy of the process, including tool selection and parameter precision [8]

Group 4: Data Utilization
- The TIGeR-300K dataset, consisting of 300,000 samples, was created to train the model on geometric problems, ensuring both accuracy and diversity in the tasks covered [10][13]
- Dataset construction combined template-based generation with large-model rewriting to improve generalization and flexibility, so the model can handle complex real-world instructions [13]

Group 5: Performance Metrics
- TIGeR outperforms other leading VLMs on spatial-understanding benchmarks, scoring 93.85 on 2D-Rel and 96.33 on 3D-Depth [10][14]
- Its performance across spatial-reasoning tasks demonstrates that it can execute operations requiring precise three-dimensional positioning, where other models struggle [16]
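The kind of geometric "tool" described above, turning a pixel plus a depth reading into exact 3D coordinates, can be sketched with standard pinhole back-projection. This is an illustrative sketch, not TIGeR's actual tool API; the camera intrinsics and pixel values below are hypothetical.

```python
def pixel_to_camera(u, v, depth, fx, fy, cx, cy):
    """Back-project a pixel (u, v) with a metric depth reading into 3D
    camera coordinates using the pinhole model: the kind of deterministic
    geometric tool a framework like TIGeR lets the VLM invoke instead of
    guessing coordinates from the image alone."""
    x = (u - cx) * depth / fx
    y = (v - cy) * depth / fy
    return (x, y, depth)

# Hypothetical 640x480 camera: 500 px focal length, centered principal point.
point = pixel_to_camera(u=420, v=300, depth=0.8,
                        fx=500.0, fy=500.0, cx=320.0, cy=240.0)

# In a y-down camera frame, "10 centimeters above" this point is simply
# the same point with 0.10 m subtracted from its y coordinate.
target = (point[0], point[1] - 0.10, point[2])
```

Because the computation is a plain function call, its inputs and output are fully inspectable, which is exactly the transparency-for-debugging property the article attributes to TIGeR's tool calls.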
Overview: September Roundup of Global AI Developments
Xin Hua She· 2025-10-01 08:53
Core Insights
- The global AI sector is seeing significant advances in key technologies such as large models and chips, alongside evolving application technologies that are increasingly driving the intelligent and digital upgrade of industries [1]

Technological Advancements
- DeepSeek launched the open-source AI large model DeepSeek-R1, recognized as the first major language model to undergo peer review, trained with a "pure reinforcement learning" method [2]
- Google DeepMind introduced Gemini Robotics 1.5 and Gemini Robotics-ER 1.5, AI models designed for robots that enhance their ability to generate action commands and reason about the physical world [2]
- OpenAI released Sora 2, an upgraded audio-video generation model that improves accuracy and realism while introducing new features for interactive video experiences [3]
- The German Cancer Research Center developed Delphi-2M, an AI tool that predicts disease risk over 20 years from a range of health factors [3]
- NVIDIA announced a $100 billion investment in OpenAI to co-develop large-scale data centers, while a collaboration among OpenAI, Oracle, and SoftBank aims to establish five AI data centers in the U.S. [3]

Integration into Economic and Social Development
- AI deployment across sectors is transforming production and daily life, with significant partnerships emerging, such as ASML's collaboration with Mistral AI to enhance product development and operational efficiency [4]
- The Universal Postal Union introduced an AI agent for analyzing postal network data to improve coverage and reliability at the national level [5]
- AI technologies were highlighted at various exhibitions, showcasing practical applications and future trends, including automated jewelry photography and smart home appliances [5]

Ongoing International Cooperation
- A growing consensus among nations favors international cooperation in AI to foster innovation and ensure the technology benefits a wider population [6]
- The Shanghai Cooperation Organization emphasized AI's importance in international decision-making and infrastructure development [6]
- The China-ASEAN AI Ministerial Roundtable announced the establishment of an AI application cooperation center [6]
- A high-level UN meeting focused on establishing a global dialogue mechanism for AI governance, aiming to create a safe and reliable AI system grounded in international law and human rights [6]
Overview: September Roundup of Global AI Developments
Xin Hua She· 2025-10-01 05:02
Key Developments in AI Technology
- DeepSeek launched the open-source AI model DeepSeek-R1, recognized as the first major peer-reviewed large language model, trained with a "pure reinforcement learning" method [1]
- Google DeepMind introduced Gemini Robotics 1.5 and Gemini Robotics-ER 1.5, AI models designed for robots to enhance their ability to generate action commands and reason about the physical world [1]
- OpenAI released Sora 2, an upgraded audio-video generation model, which significantly improves accuracy and realism while adding features for synchronized dialogue and sound effects [2]

AI Applications in Various Industries
- AI is rapidly transforming various sectors, with companies like ASML collaborating with Mistral AI to integrate AI into product development and operations to shorten time-to-market [3]
- The Universal Postal Union launched an AI agent to analyze postal network data, providing insights for policy and operational improvements [3]
- AI technologies were highlighted at multiple exhibitions, showcasing practical applications and future trends across industries [4]

International Cooperation in AI Development
- The Shanghai Cooperation Organization emphasized the importance of international collaboration in AI, supporting the UN's role in AI decision-making and infrastructure development [5]
- The China-ASEAN AI Ministerial Roundtable announced the establishment of a cooperation center for AI applications [5]
- The UN initiated a global dialogue mechanism for AI governance, aiming to create a secure and reliable AI system based on international law and human rights [5]