HyperTASR
HKU team debuts a new paradigm for embodied representation, building a task-adaptive perception framework
具身智能之心· 2025-09-12 00:05
Edited by 机器之心

The co-first authors of this article are 孙力 and 吴杰枫, PhD students in the InfoBodied AI Lab at the University of Hong Kong; collaborators are 刘瑞哲 and 陈枫. The corresponding author is 杨言超, Assistant Professor at the Institute of Data Science and the Department of Electrical and Electronic Engineering, the University of Hong Kong. In recent years the InfoBodied AI Lab has published a number of representative works at top venues including CVPR, ICML, NeurIPS, and ICLR, and collaborates widely with well-known universities and research institutions at home and abroad.

Motivation and Research Background

In embodied intelligence, policy learning typically relies on a scene representation. However, in most existing multi-task manipulation methods the representation extraction is task-agnostic: whether the embodied agent is asked to "close a drawer" or "stack blocks", features are extracted in exactly the same way (using the same neural network parameters). Imagine a robot in a kitchen that must both precisely grasp a fragile egg and carry a heavy pot. Traditional methods make the robot look at these different task scenes with the same "eyes", so the scene representation ends up containing a large ...
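To make that limitation concrete, below is a minimal PyTorch-style sketch of a task-agnostic pipeline. All class names, layer sizes, and shapes are illustrative assumptions, not code from the paper.

```python
# Hypothetical sketch of a task-agnostic encoder: one fixed set of weights
# serves every task, so the task identity never influences feature extraction.
import torch
import torch.nn as nn

class TaskAgnosticEncoder(nn.Module):
    def __init__(self, feat_dim: int = 256):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(64, feat_dim),
        )

    def forward(self, rgb: torch.Tensor) -> torch.Tensor:
        # The same parameters process "close drawer" and "stack blocks" alike.
        return self.backbone(rgb)

obs = torch.randn(2, 3, 128, 128)      # batch of RGB observations
features = TaskAgnosticEncoder()(obs)  # identical extraction for every task
```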
CoRL 2025 | HKU InfoBodied AI team debuts a new paradigm for embodied representation, building a task-adaptive perception framework
机器之心· 2025-09-10 11:30
Core Viewpoint
The article introduces HyperTASR, a framework for task-aware scene representation in embodied intelligence that lets robots dynamically adjust their perception according to task relevance, akin to human cognitive processes [5][12].

Group 1: Research Background and Challenges
- In embodied intelligence, policy learning relies heavily on scene representation, yet existing methods typically use task-agnostic feature extraction, which is inefficient [4][18].
- Because traditional approaches do not adapt to the task at hand, task-irrelevant information leaks into policy learning, hurting both efficiency and generalization [18][20].

Group 2: Innovations and Contributions
- HyperTASR makes the scene representation task-aware, so the robot focuses on task-relevant environmental features during execution [8][12].
- It introduces a hypernetwork-driven representation transformation that dynamically generates adaptive parameters from the task specification and the current task progress (a minimal sketch appears after this summary) [9][20].
- It is compatible with different policy learning architectures and can be integrated without significant modifications while improving performance [10][26].

Group 3: Experimental Validation
- Significant improvements were observed both in simulation (RLBench) and in real-world environments, setting a new state of the art (SOTA) for single-view manipulation tasks [11][29].
- In simulation, integrating HyperTASR with GNFactor raised success rates by 27% over the baseline, and integrating it with 3D Diffuser Actor pushed single-view success above 80% [29][31].
- Real-world experiments demonstrated strong generalization, reaching a 51.1% success rate with only 15 demonstration samples per task [32][33].
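As a rough illustration of the hypernetwork idea summarized in Group 2, the sketch below conditions a representation transform on a task embedding and a task-progress scalar. The class names, shapes, and the specific linear parameterization are assumptions made for illustration, not the released HyperTASR implementation.

```python
# Hypothetical sketch: a small hypernetwork maps (task embedding, progress)
# to the parameters of a transform applied to existing backbone features.
import torch
import torch.nn as nn

class HyperRepTransform(nn.Module):
    def __init__(self, feat_dim: int = 256, task_dim: int = 64, hidden: int = 128):
        super().__init__()
        self.feat_dim = feat_dim
        # Hypernetwork: generates per-task, per-step transform parameters.
        self.hyper = nn.Sequential(
            nn.Linear(task_dim + 1, hidden), nn.ReLU(),
            nn.Linear(hidden, feat_dim * feat_dim + feat_dim),
        )

    def forward(self, scene_feat, task_emb, progress):
        # scene_feat: (B, D) features from any existing backbone
        # task_emb:   (B, T) embedding of the task specification
        # progress:   (B, 1) normalized task progress in [0, 1]
        params = self.hyper(torch.cat([task_emb, progress], dim=-1))
        W = params[:, : self.feat_dim ** 2].view(-1, self.feat_dim, self.feat_dim)
        b = params[:, self.feat_dim ** 2 :]
        # The generated transform reshapes the representation, not the policy.
        return torch.bmm(W, scene_feat.unsqueeze(-1)).squeeze(-1) + b

feat = torch.randn(4, 256)    # e.g. features from a GNFactor-style backbone
task = torch.randn(4, 64)     # encoded task specification
t = torch.full((4, 1), 0.3)   # 30% of the way through the task
task_aware_feat = HyperRepTransform()(feat, task, t)
```

The design point this illustrates is that the downstream policy head is left untouched; only the representation handed to it becomes task- and progress-dependent, which is consistent with the article's claim that HyperTASR plugs into existing architectures such as GNFactor and 3D Diffuser Actor without significant modifications.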