Gemini Robotics 1.5
Tsinghua and Peking University launch Motion Transfer: robots learn skills end-to-end directly from human data
具身智能之心· 2025-11-07 00:05
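Author: 机器之心
The authors of this work are from Tsinghua University, Peking University, Wuhan University, and Shanghai Jiao Tong University. The lead authors are Yuan Chengbo, a master's student at Tsinghua University; Zhou Rui, an undergraduate at Wuhan University; and Liu Mengzhen, a PhD student at Peking University. The corresponding author is Gao Yang, an assistant professor at Tsinghua University's Institute for Interdisciplinary Information Sciences.
Google DeepMind recently released its new-generation embodied model, Gemini Robotics 1.5. One of its core highlights is an end-to-end motion transfer algorithm called the Motion Transfer Mechanism (MT), which can "move" skills learned on robots with different embodiments onto a given robot without retraining. However, the official technical report mentions MT only in passing, leaving its details a mystery.
While the industry was still speculating about what MT really looks like, a joint team from Tsinghua, Peking University, and other universities pushed the same idea a step further: applying motion transfer directly to human VR data! Even better, they promptly released the full technical report, training code, and weights, all open source and reproducible. Below is a quick breakdown of this new "human → robot" zero-shot motion transfer paradigm.
What is MotionT ...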
Tsinghua and Peking University jointly launch Motion Transfer, rivaling Gemini Robotics: robots learn skills end-to-end directly from human data
机器之心· 2025-11-05 04:15
Core Insights
- The article discusses Google DeepMind's release of Gemini Robotics 1.5, highlighting its Motion Transfer Mechanism (MT), which transfers skills between robots of different embodiments without retraining [2]
- A joint team from Tsinghua University, Peking University, Wuhan University, and Shanghai Jiao Tong University has developed a new paradigm for zero-shot action transfer from humans to robots and released a full technical report and open-source code [3]
MotionTrans Framework
- MotionTrans is an end-to-end, zero-shot RGB-to-Action skill transfer framework that lets robots learn human skills without any robot demonstrations for those tasks [8]
- The framework includes an in-house human data collection system built on VR devices that captures first-person video, head movement, wrist poses, and hand actions [9]
Implementation of MotionTrans
- In the zero-shot setting, robots learn tasks such as pouring water and unplugging devices from human VR data alone, reaching a 20% average success rate across 13 tasks [12][17]
- Fine-tuning with a small amount of robot data raises the success rate to roughly 50% with 5 robot samples and to roughly 80% with 20 [20]
Data and Training Techniques
- The team used a large-scale human-robot dataset with over 3,200 trajectories and 15 tasks, showing that the framework can learn skills from human data alone [14][16]
- The approach uses techniques such as hand redirection and unified action normalization to bridge the gap between human and robot actions [10][13]
Results and Contributions
- MotionTrans shows that even advanced end-to-end models can unlock new skills with zero robot demonstrations, shifting the view of human data from a supplementary source to a primary one [25]
- The team has open-sourced all data, code, and models to support future research in this area [26]
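The summary names hand redirection and unified action normalization but does not say how MotionTrans implements them. The Python sketch below illustrates one plausible reading under stated assumptions, and is not the authors' released code: a human hand frame is reduced to a wrist position plus a gripper width taken from the thumb-to-index fingertip gap (clipped to a hypothetical `MAX_GRIPPER_WIDTH` of 8 cm), and human and robot actions are then scaled into [-1, 1] using min/max statistics computed over the pooled data. All function names are hypothetical.

```python
import numpy as np

# Hypothetical maximum opening of a parallel-jaw gripper, in metres.
MAX_GRIPPER_WIDTH = 0.08


def redirect_hand(wrist_pos, thumb_tip, index_tip, max_width=MAX_GRIPPER_WIDTH):
    """Map one human hand frame to a robot action [x, y, z, gripper_width].

    Assumption: the thumb-index fingertip gap stands in for the gripper
    opening, clipped to what the gripper can physically reach.
    """
    gap = np.linalg.norm(np.asarray(thumb_tip, dtype=float) - np.asarray(index_tip, dtype=float))
    width = float(np.clip(gap, 0.0, max_width))
    return np.concatenate([np.asarray(wrist_pos, dtype=float), [width]])


def fit_unified_normalizer(*datasets):
    """Compute per-dimension min and span over the pooled human + robot actions."""
    pooled = np.concatenate([np.asarray(d, dtype=float) for d in datasets], axis=0)
    lo, hi = pooled.min(axis=0), pooled.max(axis=0)
    span = np.where(hi - lo > 1e-8, hi - lo, 1.0)  # avoid division by zero
    return lo, span


def normalize(actions, lo, span):
    """Scale actions into [-1, 1] using the shared (pooled) statistics."""
    return 2.0 * (np.asarray(actions, dtype=float) - lo) / span - 1.0


if __name__ == "__main__":
    human = np.stack([
        redirect_hand([0.40, 0.00, 0.20], [0.41, 0.02, 0.20], [0.41, -0.01, 0.20]),
        redirect_hand([0.45, 0.05, 0.25], [0.46, 0.10, 0.25], [0.46, 0.00, 0.25]),
    ])
    robot = np.array([[0.30, -0.10, 0.10, 0.00], [0.50, 0.10, 0.30, 0.08]])
    lo, span = fit_unified_normalizer(human, robot)
    print(normalize(human, lo, span))
```

Fitting the statistics on the pooled data, rather than separately per source, is what makes the normalization "unified" in this sketch: a given normalized value corresponds to the same physical action whether it came from a human or a robot.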
Humanoid Horizons: Big Tech 'Doing the Robot'... SoftBank/ABB, Apple, Meta, Optimus v3
2025-10-27 12:06
Summary of Key Points from the Conference Call
Industry Overview
- The focus is on the humanoid robotics and physical AI sector, with major players including SoftBank, ABB, Apple, Meta, Google, and Tesla [1][2][3][5][6].
Core Developments
1. **SoftBank's Acquisition of ABB Robotics**:
- SoftBank agreed to buy ABB's Robotics division for $5.4 billion, a shift from ABB's previous plan to spin off the business amid competition from Chinese firms [5][39].
- Masayoshi Son, SoftBank's founder, said that "SoftBank's next frontier is Physical AI," with the aim of combining AI and robotics to drive innovation [5][39].
2. **Meta's Humanoid Robot Initiative**:
- Meta is developing a humanoid robot called 'Meta-Bot' and aims to become a software/AI provider for other hardware developers [5][39].
- The company has formed a robotics team to build datasets and world models that improve robot capabilities [5][39].
3. **Google's Robotics Advancements**:
- Google DeepMind released the Gemini Robotics series, which improves robots' ability to carry out complex tasks through embodied reasoning [5][46].
- Google and Meta are both building world models in which agents can interact inside simulations, with potential applications in robotics [5][6].
4. **Tesla's Optimus Robot**:
- Tesla plans to unveil the fully redesigned Optimus v3 in Q1 2026, with ambitious production goals of 1 million units for v3 and up to 100 million for future versions [7][53].
- CEO Elon Musk highlighted the challenges of developing humanoid robots, particularly building dexterous hands [7][53].
5. **China's Dominance in Industrial Robotics**:
- China accounted for 54% of global industrial robot installations in 2024, up from 26% a decade earlier [7][8].
Financial Insights
- The Humanoid 100 index has risen 27% since its inception on February 6, 2025, outperforming the S&P 500 and other indices [11].
- The report rates Tesla stock "Overweight" with a price target of $410; Tesla's market cap stands at approximately $1.58 trillion [3][7].
Notable Partnerships and Funding
1. **Figure AI's Series C Funding**:
- Figure AI raised $1 billion in a Series C round at a $39 billion valuation, with the aim of scaling humanoid robots for home and commercial use [29].
2. **Strategic Partnerships**:
- Figure AI partnered with Brookfield to build a real-world database for its Helix VLA model [35].
- Telexistence and Seven-Eleven Japan are collaborating to deploy humanoid robots in stores by 2029 [35].
3. **Apple's Robotics Development**:
- Apple is reportedly collaborating with BYD to manufacture AI-enabled robots, with products expected to launch in 2026 and 2027 [7][39].
Emerging Trends and Future Outlook
- Humanoid robot development is seen as a significant opportunity, with many companies investing heavily in AI and robotics [5][6][39].
- The integration of AI and robotics is expected to drive advances in manufacturing, logistics, consumer applications, and other sectors [5][39].
Conclusion
- The humanoid robotics and physical AI industry is evolving rapidly, driven by significant investment from major tech companies. Competition is intensifying, particularly as China's influence in industrial robotics grows. The outlook for humanoid robots appears promising, with potential applications across many sectors.
Google's latest! Gemini Robotics 1.5: breakthrough progress in general-purpose robotics
具身智能之心· 2025-10-16 00:03
Core Insights
- The article discusses the breakthroughs in general-purpose robotics presented in Google DeepMind's "Gemini Robotics 1.5" report, highlighting the new models and their capabilities in perception, reasoning, and action [1][39].
Technical Architecture
- The core architecture of Gemini Robotics 1.5 is a "Coordinator + Action Model" framework that forms a functional closed loop through multimodal data exchange [2].
- The Coordinator (Gemini Robotics-ER 1.5) processes user inputs and environmental feedback, controls the overall task flow, and breaks complex tasks down into executable sub-steps [2].
- The Action Model (Gemini Robotics 1.5) translates natural-language sub-instructions into robot action trajectories and can directly control robots of different embodiments without additional adaptation [2][4].
Motion Transfer Mechanism
- The Motion Transfer (MT) mechanism addresses the "data silo" problem of traditional robotics by letting skills generalize across robot embodiments, which the report validates through experimental comparisons [5][7].
- Trained on mixed data from multiple robot types, Gemini Robotics 1.5 transferred skills better than models trained on a single embodiment [7][8].
Performance Validation
- A "thinking VLA" mechanism adds a two-step process to task execution, breaking complex instructions into manageable sub-steps before acting, which improves performance on multi-step tasks [8][11].
- Quantitative results show an improvement of approximately 21.8% in task-completion scores when thinking mode is enabled [11].
- Significant performance gains in scenarios with limited training data show that the model can generalize skills across robot embodiments [13][28].
Safety Mechanisms
- The ER model includes safety mechanisms that assess risks and propose interventions in a range of scenarios to keep task execution safe [36][38].
- In performance comparisons, ER 1.5 excels at identifying and mitigating risks, predicting potential hazards with high accuracy [36][38].
Conclusion and Future Directions
- Gemini Robotics 1.5 represents a significant advance in unified control across multiple robots, lowering deployment costs and improving task execution [39].
- The article identifies the integration of reasoning and action as critical to completing complex tasks, underscoring the importance of collaboration between the ER model and the VLA [39].
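To make the Coordinator + Action Model loop concrete, here is a minimal Python sketch of the control flow described in the architecture summary. Everything in it is a hypothetical stand-in, not DeepMind's API: `Coordinator.plan_fn` plays the role of the Gemini Robotics-ER 1.5 planner, `ActionModel` plays the role of the VLA and models "thinking" mode as logging a reasoning line before each action, and a failed sub-step triggers a fresh plan up to `max_replans` times.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Coordinator:
    """Hypothetical stand-in for the planner (Gemini Robotics-ER 1.5's role).

    plan_fn maps a user instruction to an ordered list of natural-language
    sub-steps; a real system would query a vision-language model here.
    """
    plan_fn: Callable[[str], List[str]]

    def decompose(self, instruction: str) -> List[str]:
        return self.plan_fn(instruction)


@dataclass
class ActionModel:
    """Hypothetical stand-in for the VLA (Gemini Robotics 1.5's role).

    With think=True it records a short reasoning line before each action,
    mimicking two-step "think, then act" execution. `succeeds` decides
    whether a sub-step works, so failures can be simulated.
    """
    think: bool = True
    succeeds: Callable[[str], bool] = lambda step: True
    log: List[str] = field(default_factory=list)

    def execute(self, sub_step: str) -> bool:
        if self.think:
            self.log.append(f"think: how to '{sub_step}'")
        self.log.append(f"act: {sub_step}")
        return self.succeeds(sub_step)


def run_task(instruction: str, coordinator: Coordinator,
             actor: ActionModel, max_replans: int = 1) -> bool:
    """Closed loop: plan, execute sub-steps in order, re-plan on failure."""
    for _ in range(max_replans + 1):
        plan = coordinator.decompose(instruction)
        # all() stops at the first sub-step that fails.
        if all(actor.execute(step) for step in plan):
            return True
        # A real coordinator would also receive the failure as feedback here.
    return False


if __name__ == "__main__":
    planner = Coordinator(lambda _: ["pick up the cup", "place the cup in the sink"])
    actor = ActionModel(think=True)
    print(run_task("clear the table", planner, actor))
    print("\n".join(actor.log))
```

Keeping planning and execution behind two small interfaces mirrors the division of labor in the summary: the planner never emits motor commands, and the action model only ever sees one natural-language sub-step at a time.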
Embodied intelligent robot experimental platform: natural-language interaction learning
Sou Hu Cai Jing· 2025-10-13 09:35
Core Insights
- The embodied intelligent robot experimental platform focuses on natural-language interaction learning, aiming to let robots understand natural-language commands and interact dynamically with the physical environment through multimodal fusion and deep learning [2][3].
Multimodal Perception and Fusion Technology
- The platform integrates sensors such as RGB-D cameras, LiDAR, microphones, and flexible sensors to perform complex tasks such as object sorting and massage with high precision [4].
- Tencent's multimodal neural SLAM model combines vision and language for environment exploration, improving generalization performance by 20% on the ALFRED benchmark [4].
Natural Language Interaction Framework
- The DialFRED benchmark developed by Tencent includes 53,000 manually annotated dialogues; an agent that actively asks questions reaches a 33.6% success rate, significantly higher than the 18.3% of a passive model [4].
Hierarchical Decision-Making and Control
- Berkeley's LangWBC framework maps language commands to robot actions using conditional variational autoencoders and remains robust even under external disturbances [4].
Large-Scale Multimodal Datasets
- The dataset released by the National Land and Resources Innovation Center includes 279 tasks, covers a variety of real-world scenarios, and supports cross-embodiment policy transfer [4].
Efficient Training Methods
- Models such as CLIP and PaLM-E rely on extensive pre-training on multimodal data to improve robustness and zero-shot task generalization [4].
Applications in Healthcare and Industry
- Huawei's CloudRobo platform enables remote surgery with a latency of only 38 ms, and its intelligent disinfection robots achieve 100% coverage while cutting labor costs by 60% [4].
- Shanghai Jiao Tong University's dual-arm robot platform achieves 98% accuracy in industrial part sorting through video imitation learning [4].
Challenges and Frontiers
- Researchers are exploring ways to combine symbolic reasoning with neural networks to make decision-making in complex tasks more transparent and effective [4].
- Ethical considerations and safety mechanisms are critical in sensitive fields such as healthcare, requiring continued improvements in compliance and operational safety [4].
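Being able to "see" and "speak" is not enough; robots must also "calculate"! Tool-Use + reinforcement learning: TIGeR enables precise robot manipulation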
具身智能之心· 2025-10-11 16:02
Core Insights
- The article discusses the limitations of current Vision-Language Models (VLMs) in accurately interpreting and executing spatial commands in robotics, emphasizing the need for precise geometric reasoning and tool integration [2][5].
Group 1: TIGeR Framework
- The Tool-Integrated Geometric Reasoning (TIGeR) framework combines tool use with reinforcement learning so that VLMs can perform precise calculations in three-dimensional space [2][6].
- TIGeR moves AI models from qualitative perception to quantitative computation, addressing a core weakness of existing VLMs [2][7].
Group 2: Advantages of TIGeR
- Precise localization: by integrating depth information and camera parameters, TIGeR can convert a command such as "10 centimeters above" into exact three-dimensional coordinates [7].
- Multi-view unified reasoning: information from different viewpoints is merged and reasoned about in a single, consistent world coordinate system [7].
- Transparent reasoning: the model shows which tools it used, which parameters it passed, and what results it obtained, which makes debugging and optimization easier [7].
Group 3: Training Process
- Training has two phases: supervised learning first teaches basic tool usage and reasoning chains, and reinforcement learning then refines the model's tool use through a hierarchical reward mechanism [8][10].
- The hierarchical reward evaluates not only whether the final answer is correct but also the accuracy of the process, including tool selection and parameter precision [8].
Group 4: Data Utilization
- The TIGeR-300K dataset, with 300,000 samples, was built to train the model on geometric problems, with attention to both accuracy and task diversity [10][13].
- The dataset was constructed through template-based generation followed by rewriting with a large model, which improves generalization and flexibility so that the model can handle complex real-world instructions [13].
Group 5: Performance Metrics
- TIGeR outperforms other leading VLMs on spatial-understanding benchmarks, scoring, for example, 93.85 on 2D-Rel and 96.33 on 3D-Depth [10][14].
- Its results across spatial reasoning tasks show that it can carry out operations requiring precise three-dimensional positioning, which other models struggle to achieve [16].
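The "precise localization" and hierarchical-reward ideas can be illustrated with a short Python sketch; it is not TIGeR's actual toolset or reward. It assumes a pinhole camera with intrinsic matrix `K`, a known 4x4 camera-to-world transform, and a world frame whose +z axis points up, so "10 centimeters above" becomes a +0.10 m shift along z. Expressing every view in the same world frame is also what makes multi-view reasoning possible. `hierarchical_reward` and its weights are illustrative assumptions: malformed output earns nothing, then separate credit is added for choosing the right tool and for a final answer within a distance tolerance.

```python
import numpy as np


def pixel_to_world(u, v, depth, K, T_world_cam):
    """Back-project pixel (u, v) at metric depth into world coordinates.

    K: 3x3 pinhole intrinsic matrix. T_world_cam: 4x4 camera-to-world transform.
    """
    K = np.asarray(K, dtype=float)
    fx, fy, cx, cy = K[0, 0], K[1, 1], K[0, 2], K[1, 2]
    p_cam = np.array([(u - cx) * depth / fx, (v - cy) * depth / fy, depth, 1.0])
    return (np.asarray(T_world_cam, dtype=float) @ p_cam)[:3]


def offset_above(point_world, meters, up=(0.0, 0.0, 1.0)):
    """Resolve 'N metres above' as a shift along the world up-axis (assumed +z)."""
    return np.asarray(point_world, dtype=float) + meters * np.asarray(up, dtype=float)


def hierarchical_reward(pred, gold, w_format=0.1, w_tool=0.3, w_answer=0.6, tol=0.02):
    """Illustrative process + outcome reward (the weights are assumptions).

    pred: dict with 'well_formed' (bool), 'tool' (str), 'answer' (xyz in metres).
    gold: dict with the reference 'tool' and 'answer'.
    """
    if not pred.get("well_formed", False):
        return 0.0  # malformed output: no further credit
    reward = w_format
    if pred.get("tool") == gold["tool"]:
        reward += w_tool  # process credit: picked the right tool
    error = np.linalg.norm(np.asarray(pred["answer"], dtype=float)
                           - np.asarray(gold["answer"], dtype=float))
    if error <= tol:
        reward += w_answer  # outcome credit: final point within tolerance
    return reward
```

For example, with the camera 1.5 m above the origin looking straight down (rotation `diag(1, -1, -1)`) and `K` having focal length 600 px and principal point (320, 240), the image center at 1.0 m depth back-projects to (0, 0, 0.5), and "10 cm above" it resolves to (0, 0, 0.6).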
Overview | A roundup of global AI developments in September
Xin Hua She· 2025-10-01 08:53
Core Insights
- The global AI sector is seeing significant advances in key technologies such as large models and chips, while application technologies keep evolving and increasingly drive the intelligent and digital upgrading of many industries [1]
Technological Advancements
- DeepSeek's open-source large model DeepSeek-R1 was recognized as the first major large language model to undergo peer review; it was trained with a "pure reinforcement learning" method [2]
- Google DeepMind introduced Gemini Robotics 1.5 and Gemini Robotics-ER 1.5, AI models that improve robots' ability to generate action commands and reason about the physical world [2]
- OpenAI released Sora 2, an upgraded audio-video generation model that improves accuracy and realism and adds new features for interactive video experiences [3]
- The German Cancer Research Center developed Delphi-2M, an AI tool that predicts disease risk over a 20-year horizon based on a range of health factors [3]
- NVIDIA announced plans to invest up to $100 billion in OpenAI to jointly build large-scale data centers, while OpenAI, Oracle, and SoftBank are collaborating to establish five AI data centers in the U.S. [3]
Integration into Economic and Social Development
- AI deployment across sectors is transforming how people work and live, with major partnerships emerging, such as ASML's collaboration with Mistral AI to improve product development and operational efficiency [4]
- The Universal Postal Union introduced an AI agent that analyzes postal network data to improve coverage and reliability at the national level [5]
- AI technologies were showcased at a number of exhibitions, highlighting practical applications and future trends, from automated jewelry photography to smart home appliances [5]
Ongoing International Cooperation
- Nations increasingly agree on the need to strengthen international cooperation in AI, both to foster innovation and to ensure that the technology benefits more people [6]
- The Shanghai Cooperation Organization emphasized the importance of AI in international decision-making and infrastructure development [6]
- The China-ASEAN AI Ministerial Roundtable announced the establishment of an AI application cooperation center [6]
- A high-level meeting at the UN focused on establishing a global dialogue mechanism for AI governance, aiming to build a safe and reliable AI system grounded in international law and human rights [6]
Overview | A roundup of global AI developments in September
Xin Hua She· 2025-10-01 05:02
Key Developments in AI Technology
- DeepSeek's open-source model DeepSeek-R1 was recognized as the first major peer-reviewed large language model; it was trained with a "pure reinforcement learning" method [1]
- Google DeepMind introduced Gemini Robotics 1.5 and Gemini Robotics-ER 1.5, AI models designed to improve robots' ability to generate action commands and reason about the physical world [1]
- OpenAI released Sora 2, an upgraded audio-video generation model that significantly improves accuracy and realism and adds synchronized dialogue and sound effects [2]
AI Applications in Various Industries
- AI is rapidly transforming many sectors; ASML, for example, is working with Mistral AI to integrate AI into product development and operations and shorten time-to-market [3]
- The Universal Postal Union launched an AI agent that analyzes postal network data, providing insights for policy and operational improvements [3]
- AI technologies were showcased at multiple exhibitions, highlighting practical applications and future trends across industries [4]
International Cooperation in AI Development
- The Shanghai Cooperation Organization emphasized the importance of international collaboration in AI and supported the UN's role in AI decision-making and infrastructure development [5]
- The China-ASEAN AI Ministerial Roundtable announced the establishment of a cooperation center for AI applications [5]
- The UN launched a global dialogue mechanism for AI governance, aiming to create a secure and reliable AI system based on international law and human rights [5]
Huaan Fund STAR Market ETF weekly: STAR Market ETFs mark their fifth anniversary, STAR Chip Index up 9.05%
Xin Lang Ji Jin· 2025-09-30 02:54
Group 1: Core Insights
- The investment ecosystem of the Sci-Tech Innovation Board (STAR Market) continues to mature: by September 26, 2025, there were 102 STAR Market ETFs, 61 of them established this year [1]
- The approval of the STAR Market IPO of Moore Threads, a full-function GPU company, highlights the STAR Market's support for "hard technology" and signals a new phase of deep integration between finance and technological innovation [1]
- The hard-technology sector is entering a critical phase of domestic substitution, with breakthroughs in fields such as chips and innovative drugs, driven by both policy and capital [1][2]
Group 2: Market Performance
- The STAR Market performed well overall over the past week: the STAR 50 Index rose 6.47%, the STAR Information Index 7.76%, and the STAR Chip Index 9.05% [3]
- The five largest industries on the STAR Market (electronics, biomedicine, computers, power equipment, and machinery) together account for 88.7% of its market capitalization [4]
Group 3: Sector Analysis
- The next-generation information technology sector, particularly the electronic chip industry, is performing strongly thanks to policy support, technological breakthroughs, and capital inflows [5]
- In storage chips, SSD and memory module prices are rising and DDR4 is in short supply, pointing to a recovery in the industry's inventory cycle [6]
- High-end equipment manufacturing is crucial to the overall competitiveness of China's manufacturing industry, with technological advances and capital investment ongoing [6]
- The pharmaceutical sector is in a downturn, but there are signs of recovery in medical device tenders and in overseas revenue growth at some companies [6]
Group 4: ETF Overview
- The Sci-Tech Information ETF (588260) reflects the performance of major next-generation information technology companies, including those in electronic core components and emerging software [7]
- The STAR 50 Index (000688) comprises 50 representative securities from the STAR Market and reflects the overall performance of its major sci-tech enterprises [9]
- The STAR Chip Index (000685) covers companies in semiconductor materials, design, manufacturing, and testing, reflecting the performance of the chip industry [10]
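[Morning Briefing] Growth-stabilization work plans issued for the petrochemical and non-ferrous metal industries; Moore Threads' STAR Market IPO approved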
财联社· 2025-09-28 23:14
Macro News
- The People's Bank of China emphasized making use of the swap facility for securities, fund, and insurance companies, as well as the relending facility for share repurchases and shareholding increases, to maintain capital market stability [3]
- In the first eight months, total profits of China's industrial enterprises above designated size reached 4,692.97 billion yuan (about 4.69 trillion yuan), up 0.9% year on year. In August, profits swung from a 1.5% decline in the previous month to 20.4% growth [3]
Industry News
- The Ministry of Industry and Information Technology and seven other departments issued a work plan for the non-ferrous metal industry, targeting average annual growth of around 5% in value added from 2025 to 2026 and average annual growth of 1.5% in the output of the ten major non-ferrous metals [4]
- The National Development and Reform Commission and six other departments released measures to strengthen the cultivation of innovative digital economy enterprises, including building a national integrated computing network [4]
- The 2025 classification ratings of securities firms were released: 53 firms were rated Class A, 43 Class B, and 11 Class C; 14 of the Class A firms received an AA rating [4]
- The Ministry of Commerce, the Ministry of Industry and Information Technology, and other authorities will require export licenses for pure electric passenger cars starting January 1, 2026, to promote the healthy development of new energy vehicle trade [4]
Company News
- Moore Threads' IPO was approved by the listing committee of the Shanghai Stock Exchange [6]
- Dalian Wanda Group and its legal representative Wang Jianlin were recently placed under a high-consumption restriction, which was attributed to possible information asymmetry in the enforcement process [6]
- Jin Hai Tong announced that its shareholder Xunuo Investment plans to reduce its stake by no more than 3% [6]