Non-prehensile Manipulation

Robots can do more than pick and place! PKU x Galbot's "World-Action Model" is here
自动驾驶之心 (Heart of Autonomous Driving) · 2025-08-04 07:31
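The author team is from Peking University and Galbot (银河通用). The first author, Jiangran Lyu (吕江燃), is a PhD student at the Center on Frontiers of Computing Studies, School of Computer Science, Peking University. His research centers on embodied AI, with a focus on world models and dexterous robotic manipulation, and he has published at top venues such as ICCV, TPAMI, RSS, CoRL, and RAL. The corresponding authors are Yizhou Wang (王亦洲), Professor at the School of Computer Science, Peking University, and He Wang (王鹤), Assistant Professor at Peking University and founder and CTO of Galbot. Although current robotic vision-language-action (VLA) models show some ability to generalize, they still operate mainly through quasi-static pick-and-place. Humans, by contrast, often manipulate objects in more flexible ways, such as pushing and flipping them. A robot that can only grasp will struggle with complex real-world tasks. For example, picking up a thin bank card usually requires first pushing it to the edge of the table, and grasping a wide box often requires first flipping it upright (as shown in Figure 1). These skills all belong to an important field: non-prehensile manipulation ...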
Robots can do more than pick and place! PKU x Galbot's "World-Action Model" enables fully generalizable non-prehensile skills
具身智能之心 (Heart of Embodied AI) · 2025-08-01 16:02
Robots can do more than pick and place! Peking University x Galbot's "World-Action Model" enables fully generalizable non-prehensile skills
机器之心 (Synced) · 2025-08-01 01:30
Core Viewpoint - The article discusses the development of a new model called Dynamics-adaptive World Action Model (DyWA) aimed at enhancing non-prehensile manipulation skills in robots, which are essential for performing complex tasks in real-world environments [3][10].

Group 1: Non-prehensile Manipulation
- Non-prehensile manipulation refers to actions that do not involve grasping, such as pushing or flipping objects, which are crucial for handling various shapes and sizes in complex environments [3][5].
- Current robot models primarily focus on pick-and-place operations, limiting their effectiveness in dynamic and intricate tasks [3][5].

Group 2: Challenges in Non-prehensile Manipulation
- The main challenges include complex contact modeling, where slight changes in friction can drastically alter movement trajectories, and the need for high-quality perception systems to understand object states and interactions [5][8].
- Traditional physical modeling methods struggle with real-world applications due to their reliance on precise object properties, which are often difficult to obtain [7][9].

Group 3: DyWA's Methodology
- DyWA employs a teacher-student framework to train a model that predicts future states based on actions, allowing robots to "imagine" the outcomes of their movements [11].
- It incorporates a dynamic adaptation mechanism that infers hidden physical properties from historical observations, enhancing the robot's ability to interact with various surfaces and object weights [12][13].
- The model is designed to work with single-view inputs, making it feasible for real-world deployment without the need for complex multi-camera setups [14].

Group 4: Performance and Generalization
- DyWA has demonstrated superior performance in simulations, achieving over 80% success rates in various scenarios, including known and unknown object states [17][18].
- In real-world tests, DyWA successfully adapted to different object shapes and surface frictions, achieving nearly 70% success in pushing unseen objects to target positions [20][24].
- The model's robust closed-loop adaptation allows it to learn from failures and improve its manipulation strategies over time [26].