Robot Learning

BIGAI team wins the CoRL 2025 Outstanding Paper Award: UniFP cracks the force-position control challenge for legged robots, the first time an all-Chinese team has taken the honor
机器人大讲堂· 2025-10-12 02:08
The Conference on Robot Learning (CoRL) is one of the world's top academic conferences in artificial intelligence and robotics, and its research output directly drives the technical frontier of robot learning. In recent years, with the rise of the embodied-intelligence wave, attention to robot learning has climbed steadily, and CoRL has become a key stage for the world's leading teams to present their core breakthroughs. The 2025 edition received nearly one thousand submissions, of which 264 papers were accepted after rigorous review. Among them, the paper "UniFP: Learning a Unified Policy for Force and Position Control in Legged Loco-Manipulation" from the Beijing Institute for General Artificial Intelligence (BIGAI) stood out from the world's top research and won the conference's highest honor, the Outstanding Paper Award. The result is a milestone: it is the first time since CoRL was founded that the award has gone to a team made up entirely of Chinese researchers, a sign that China's research in robot learning and embodied intelligence has entered the global front rank.
▍ UniFP: the first unified force-position control algorithm for legged robots
The core contribution of this work is UniFP (Unified Fo ...
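The article breaks off here, but the paper title itself names the core idea: a single policy that handles both force and position control during legged loco-manipulation. As a purely illustrative, assumption-heavy sketch (PyTorch; the two-head layout, class name, and dimensions are hypothetical and not taken from the UniFP paper), such a unified interface could look like this:

```python
# Hypothetical sketch of a "unified force-position" policy interface: one
# network emits both a position command and a desired contact force.
# This is an illustration of the general idea, not the UniFP architecture.
import torch
import torch.nn as nn


class UnifiedForcePositionPolicy(nn.Module):
    def __init__(self, obs_dim: int = 48, act_dim: int = 12, force_dim: int = 3):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Linear(obs_dim, 256), nn.ELU(),
            nn.Linear(256, 256), nn.ELU(),
        )
        self.position_head = nn.Linear(256, act_dim)   # e.g. joint position targets
        self.force_head = nn.Linear(256, force_dim)    # e.g. desired end-effector force

    def forward(self, obs: torch.Tensor):
        h = self.backbone(obs)
        return self.position_head(h), self.force_head(h)


if __name__ == "__main__":
    policy = UnifiedForcePositionPolicy()
    obs = torch.randn(1, 48)                 # proprioception + task observation (placeholder)
    pos_cmd, force_cmd = policy(obs)
    print(pos_cmd.shape, force_cmd.shape)    # torch.Size([1, 12]) torch.Size([1, 3])
```

The point of the sketch is only the interface: a shared representation feeding both a position command and a force command, so a single policy can trade the two off instead of switching between separate position and force controllers.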
A major upgrade for robot perception: lightweight injection of geometric priors lifts success rates by 31%
36Ke· 2025-09-28 12:09
Core Insights
- The article discusses the challenge of getting AI to truly "understand" the 3D world, particularly for vision-language-action (VLA) models that rely on 2D image-text data [1][2].
Group 1: VLA Model Limitations
- Current VLA models, which build mainly on pre-trained vision-language models, lack the 3D spatial understanding needed for real-world operation [1].
- Existing enhancement methods based on explicit depth input suffer from deployment difficulties and depth-estimation noise [1].
Group 2: Evo-0 Model Introduction
- Shanghai Jiao Tong University and the University of Cambridge propose Evo-0, a lightweight method that strengthens the spatial understanding of VLA models by implicitly injecting 3D geometric priors, with no explicit depth input or additional sensors [2].
- Evo-0 uses the Visual Geometry Grounded Transformer (VGGT) to extract 3D structural information from multi-view RGB images, significantly improving spatial perception [2][3].
Group 3: Model Architecture and Training
- Evo-0 integrates VGGT as a spatial encoder, introducing 3D tokens that carry depth context and cross-view spatial correspondences [3].
- A cross-attention fusion module merges the 2D visual tokens with the 3D tokens, improving the model's grasp of spatial structure and object layout [3][6] (a minimal sketch of such a fusion block follows this summary).
- Training stays efficient because only the fusion module, the LoRA adaptation layers, and the action expert are fine-tuned, reducing computational cost [6].
Group 4: Experimental Results
- On RLBench simulation tasks, Evo-0 achieved an average success-rate improvement of more than 28.88% over baseline models, excelling in particular on tasks that require complex spatial relationships [10][11].
- Robustness was tested under five different interference conditions, with Evo-0 consistently outperforming the baseline model pi0 [12][15].
Group 5: Conclusion
- Evo-0's key innovation is extracting rich spatial semantics through VGGT, bypassing depth-estimation errors and sensor requirements and thereby strengthening the spatial modeling capability of VLA models [16].
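As a rough illustration of the fusion step described in Group 3, the following is a minimal, hypothetical sketch of a cross-attention block in which 2D visual tokens query 3D geometry tokens (PyTorch; the class name, dimensions, and residual/MLP layout are assumptions, not the Evo-0 implementation):

```python
# Hypothetical sketch of a cross-attention fusion block that merges 2D visual
# tokens with 3D geometry tokens (e.g. from a VGGT-style spatial encoder).
# Module and argument names are illustrative, not from the Evo-0 paper.
import torch
import torch.nn as nn


class CrossAttentionFusion(nn.Module):
    def __init__(self, dim: int = 768, num_heads: int = 8):
        super().__init__()
        # 2D visual tokens act as queries; 3D tokens provide keys and values.
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm_q = nn.LayerNorm(dim)
        self.norm_kv = nn.LayerNorm(dim)
        self.mlp = nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim))

    def forward(self, tokens_2d: torch.Tensor, tokens_3d: torch.Tensor) -> torch.Tensor:
        # tokens_2d: (B, N_2d, dim) from the VLM's visual backbone
        # tokens_3d: (B, N_3d, dim) projected geometry tokens with depth context
        q = self.norm_q(tokens_2d)
        kv = self.norm_kv(tokens_3d)
        fused, _ = self.attn(q, kv, kv)   # 2D queries attend to 3D geometry
        x = tokens_2d + fused             # residual keeps the original 2D semantics
        return x + self.mlp(x)            # feed-forward refinement


if __name__ == "__main__":
    fusion = CrossAttentionFusion()
    vis = torch.randn(2, 196, 768)   # e.g. 14x14 patch tokens
    geo = torch.randn(2, 64, 768)    # projected 3D tokens
    print(fusion(vis, geo).shape)    # torch.Size([2, 196, 768])
```

The residual connection is the key design choice in a sketch like this: the 2D stream is only refined by geometry rather than replaced by it, which is consistent with fine-tuning just the fusion pathway.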
Jointly trained direct-PhD student admissions at Ningbo Dongfang University of Technology! Directions include robot manipulation, embodied intelligence, and robot learning
自动驾驶之心· 2025-08-21 09:04
Core Viewpoint
- The article describes a joint doctoral program between Ningbo Dongfang University of Technology and prestigious institutions such as Shanghai Jiao Tong University and the University of Science and Technology of China, recruiting PhD students in robotics under a dual-mentorship model with a focus on cutting-edge robotics and AI research [1][2].
Group 1: Program Structure
- Students register at either Shanghai Jiao Tong University or the University of Science and Technology of China for the first year, then carry out research at Dongfang University under dual supervision [1].
- Graduates receive a doctoral degree and diploma from either Shanghai Jiao Tong University or the University of Science and Technology of China [1].
Group 2: Research Focus and Support
- Research areas span robotics, control, and AI, with specific topics including contact-rich manipulation, embodied intelligence, agile robot control, and robot learning [2].
- The lab provides ample research funding and administrative support, and encourages a balanced lifestyle for students, including physical exercise [2].
Group 3: Community and Networking
- The article promotes a knowledge-sharing community for embodied intelligence that aims to grow from 2,000 to 10,000 members within two years, hosting discussions on technical and career topics [3][5].
- The community offers technical roadmaps, job opportunities, and access to industry experts, strengthening networking and collaboration among members [5][18].
Group 4: Educational Resources
- The community has compiled extensive resources, including more than 30 technical roadmaps plus open-source projects and datasets relevant to embodied intelligence and robotics [17][21][31].
- Members can access a variety of learning materials, including books and research reports, to support their academic and professional development [27][24].
Latest from CMU: cross-embodiment world models enable few-shot robot learning
具身智能之心· 2025-08-12 00:03
Core Viewpoint
- The article presents a new approach to training visuomotor policies for robots that leverages existing low-cost data sources, greatly reducing the need for expensive real-world data collection [2][11].
Group 1: Methodology
- The proposed method rests on two key insights:
  1. Embodiment-agnostic world-model pretraining uses optical flow as the action representation, allowing training on cross-embodiment datasets followed by fine-tuning with only a small amount of target-embodiment data [3][12].
  2. The Latent Policy Steering (LPS) method improves the policy's outputs by searching for better action sequences in the latent space of the world model [3][12] (see the sketch after this summary).
Group 2: Experimental Results
- Real-world experiments showed that combining the policy with a world model pretrained on existing datasets led to significant performance gains: more than 50% relative improvement with 30 demonstrations and more than 20% relative improvement with 50 demonstrations [3][9].
Group 3: Challenges and Solutions
- The article highlights the embodiment gap that arises when pretraining models across different robots, and argues that world models are better suited to cross-embodiment pretraining followed by fine-tuning for a new embodiment [11][12].
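To make the Latent Policy Steering idea concrete, here is a minimal, hypothetical sketch (PyTorch): candidate action sequences are sampled around the policy's proposal, rolled out in a toy latent world model, scored with an assumed value head, and the best sequence is returned. The class names, the noise-based candidate sampling, and the value head are illustrative assumptions, not the authors' implementation.

```python
# Hypothetical sketch of "steering" a policy in the latent space of a learned
# world model: sample candidates around the policy's proposal, roll them out
# with the latent dynamics, score each rollout, and execute the best one.
import torch
import torch.nn as nn


class LatentWorldModel(nn.Module):
    """Toy latent dynamics z_{t+1} = f(z_t, a_t) with a scalar value head."""

    def __init__(self, z_dim: int = 64, a_dim: int = 7):
        super().__init__()
        self.dynamics = nn.Sequential(nn.Linear(z_dim + a_dim, 256), nn.ReLU(), nn.Linear(256, z_dim))
        self.value = nn.Linear(z_dim, 1)  # proxy score for how promising a state is

    def step(self, z: torch.Tensor, a: torch.Tensor) -> torch.Tensor:
        return self.dynamics(torch.cat([z, a], dim=-1))


@torch.no_grad()
def latent_policy_steering(world_model, policy_actions, z0, noise_std=0.05, num_candidates=32):
    """policy_actions: (H, a_dim) nominal action sequence proposed by the policy."""
    horizon, a_dim = policy_actions.shape
    # Perturb the policy's proposal to obtain a pool of candidate sequences.
    candidates = policy_actions.unsqueeze(0) + noise_std * torch.randn(num_candidates, horizon, a_dim)
    candidates[0] = policy_actions  # always keep the unperturbed proposal

    z = z0.unsqueeze(0).expand(num_candidates, -1).clone()
    scores = torch.zeros(num_candidates)
    for t in range(horizon):
        z = world_model.step(z, candidates[:, t])
        scores += world_model.value(z).squeeze(-1)  # accumulate predicted value

    return candidates[scores.argmax()]  # best action sequence under the world model


if __name__ == "__main__":
    wm = LatentWorldModel()
    nominal = torch.zeros(8, 7)   # 8-step action chunk from the policy
    z_init = torch.randn(64)      # encoded current observation
    best = latent_policy_steering(wm, nominal, z_init)
    print(best.shape)             # torch.Size([8, 7])
```

The policy still does the heavy lifting; the world model only re-ranks small variations of its proposal, which is why only a few target-embodiment demonstrations are needed for fine-tuning.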
Major market-moving events: after 10 years, the A-share margin trading balance returns to 2 trillion yuan; technical documents on computing-power pooling and compute-network security for the National Integrated Computing Network open for public comment
Mei Ri Jing Ji Xin Wen· 2025-08-07 00:05
Group 1
- The National Data Standardization Technical Committee has opened two technical documents related to the National Integrated Computing Network for public comment, marking the transition from planning to implementation [1]
- The Ministry of Industry and Information Technology expressed willingness to collaborate with APEC member economies to promote digital and AI innovation applications [2]
- The A-share market's margin trading balance has reached 2 trillion yuan, the highest since July 2015, indicating increased trading activity [3]
Group 2
- The Ministry of Transport, Ministry of Finance, and Ministry of Natural Resources have issued a new rural-road enhancement action plan, focusing on innovative financing models and encouraging participation from financial institutions [4][9]
- The National Development and Reform Commission has introduced a management method for central budget investment in training bases, emphasizing support for emerging fields with talent shortages and traditional industries with strong employment absorption [5]
- The China Photovoltaic Industry Association is collecting opinions on the draft amendment to the Price Law, aiming to reflect the demands of the photovoltaic industry [6]
Group 3
- Heilongjiang Province has implemented 20 policy measures to support the high-quality development of the high-end intelligent agricultural machinery industry [7]
- Shanghai's financial regulators have introduced measures to promote the development of commercial health insurance, including tax deductions and optimized financing [8]
10% of the training data beats 100%: robot learning sees a major breakthrough
机器之心· 2025-06-11 03:54
Core Viewpoint
- The ViSA-Flow framework represents a revolutionary approach to robot skill learning, substantially improving learning efficiency in data-scarce settings by extracting semantic action flows from large-scale human videos [4][36].
Group 1: Research Background and Challenges
- Traditional robot imitation learning requires large, meticulously curated datasets that are costly to collect, creating a bottleneck for robots meant to perform diverse real-world tasks [7].
- Humans, by contrast, can learn new skills remarkably well through observation, focusing on semantically relevant components while filtering out irrelevant background information [8].
Group 2: Key Innovations
- The core innovation of ViSA-Flow is the semantic action flow as an intermediate representation, capturing the essential spatiotemporal features of operator-object interactions while remaining unaffected by surface visual differences [11].
- Key components of the framework include (a rough code sketch of the flow encoding follows this summary):
  1. Semantic entity localization, using pre-trained vision-language models to describe and locate the operator and task-relevant objects [11].
  2. Hand-object interaction tracking, maintaining stable segmentation across frames [12].
  3. Flow-conditioned feature encoding, producing rich feature vectors while preserving visual context [13].
Group 3: Experimental Evaluation
- On the CALVIN benchmark, ViSA-Flow outperformed all baseline methods while using only 10% of the annotated robot trajectories (1,768), reaching a 31.4% success rate on completing five consecutive tasks, nearly double that of the next-best method [19].
- An average sequence length of 2.96 further demonstrates ViSA-Flow's effectiveness on long-horizon manipulation tasks [20].
Group 4: Ablation Studies
- Ablation studies show that removing semantic entity localization significantly reduces performance, while omitting the temporal tracking stage lowers the average success length [26].
- The full ViSA-Flow model achieved an 89.0% task-completion success rate, showcasing its robustness [21].
Group 5: Real-World Experiments
- Real-world evaluations covered both single-stage and long-horizon manipulation tasks, demonstrating that ViSA-Flow maintains performance as task complexity varies [23][30].
- The model's focus on the operator and task-relevant objects allows spatial support to shift smoothly as scenes change [31].
Group 6: Technical Advantages and Limitations
- Advantages include data efficiency, cross-domain generalization, long-horizon stability, and semantic consistency during task execution [40].
- Limitations include the absence of explicit 3D geometric modeling, reliance on pre-trained components, and potential difficulty with tasks requiring precise physical interaction [40].
Group 7: Future Directions
- Future work may integrate physical modeling, reduce reliance on pre-trained components, combine the framework with reinforcement learning algorithms, and expand the pretraining datasets [40].
Group 8: Significance and Outlook
- ViSA-Flow marks a significant breakthrough in robot learning, proving that semantic representations extracted from large-scale human videos can drive skill acquisition [36].
- The framework bridges the gap between observing human demonstrations and robot execution, paving the way for more intelligent and efficient robot learning systems [37].
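As a loose illustration of what a "semantic action flow" feature might contain, the sketch below (NumPy) converts tracked hand and object boxes into normalized trajectories, per-frame hand velocities, and hand-object offsets. The specific encoding, the `Track` dataclass, and the fixed image size are assumptions for illustration only, not the ViSA-Flow paper's representation.

```python
# Hypothetical sketch of turning tracked hand/object regions into a compact
# per-frame "semantic action flow" feature, in the spirit described above.
from dataclasses import dataclass
import numpy as np


@dataclass
class Track:
    """Per-frame bounding boxes (x1, y1, x2, y2) for one semantic entity."""
    name: str
    boxes: np.ndarray  # shape (T, 4)

    def centers(self) -> np.ndarray:
        # (T, 2) box centers, normalized against image size by the caller
        return np.stack([(self.boxes[:, 0] + self.boxes[:, 2]) / 2,
                         (self.boxes[:, 1] + self.boxes[:, 3]) / 2], axis=-1)


def semantic_action_flow(hand: Track, obj: Track, img_wh=(640, 480)) -> np.ndarray:
    """Encode hand motion, frame-to-frame velocity, and hand-object offset per frame."""
    wh = np.asarray(img_wh, dtype=np.float32)
    hand_c = hand.centers() / wh                              # (T, 2) normalized hand trajectory
    obj_c = obj.centers() / wh                                # (T, 2) normalized object trajectory
    hand_vel = np.diff(hand_c, axis=0, prepend=hand_c[:1])    # frame-to-frame hand motion
    offset = obj_c - hand_c                                   # relative hand-object geometry
    return np.concatenate([hand_c, hand_vel, offset], axis=-1)  # (T, 6)


if __name__ == "__main__":
    T = 16
    hand = Track("hand", np.tile([100, 100, 140, 140], (T, 1)) + np.arange(T)[:, None] * 5)
    cup = Track("cup", np.tile([300, 200, 360, 260], (T, 1)))
    flow = semantic_action_flow(hand, cup)
    print(flow.shape)  # (16, 6)
```

Such a representation depends only on where the operator and task-relevant objects are, not on what the background looks like, which is what makes it transferable from human videos to robot execution.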
Musk: the Optimus humanoid robot will walk on the Martian surface in 2027; Alibaba Cloud releases the Tongyi Lingma AI IDE, which can call more than 3,000 tools | AIGC Daily
创业邦· 2025-05-31 00:57
1. [Musk: the Optimus humanoid robot will walk on the Martian surface in 2027] News on May 30: Musk said that at the end of next year SpaceX will launch a Starship carrying Tesla's Optimus humanoid robot toward Mars; based on the orbital period, it would arrive in 2027, at which point Optimus would walk on the Martian surface. If all goes well, SpaceX will then attempt to send humans to Mars. (东方财富网)
4. [US Department of Energy teams up with NVIDIA and Dell to announce next-generation supercomputer] Lawrence Berkeley National Laboratory announced on the 29th that the US Department of Energy has signed a contract with Dell to build a new flagship supercomputer powered by NVIDIA chips. According to the laboratory's press release, US Energy Secretary Chris Wright visited the lab that day and announced the contract with Dell to develop the next-generation flagship supercomputer for the National Energy Research Scientific Computing Center (NERSC) under the Department of Energy. The Associated Press reported that Wright announced the collaboration together with Dell executives and NVIDIA CEO Jensen Huang. The press release says the new supercomputer will ...