Shanghai Jiao Tong University Team Gives AI Robots Visual Foresight

Core Viewpoint
- Researchers from a group of institutions including Shanghai Jiao Tong University and Bosch have developed a framework called Mantis, which improves robot learning by letting robots anticipate future scenes before acting, making them markedly more efficient and capable on complex tasks [3][4][8].

Group 1: Mantis Framework
- Mantis introduces a "decoupled visual foresight" capability: it separates "seeing the future" from "performing actions" so that each can be optimized on its own [4][9].
- The framework uses a technique called "latent action queries," which helps the robot infer the actions needed to get from the current scene to the predicted future scene, strengthening the learning process [4][6].

Group 2: Training Methodology
- Mantis is trained progressively: it first observes videos of humans performing tasks, then incorporates data from real robot operation, and finally adds language-understanding training [5][6].
- This lets the robot master simpler skills before moving on to more complex ones, much as children learn [5].

Group 3: Performance Metrics
- In tests on the LIBERO simulation platform, Mantis achieved a success rate of 96.7%, outperforming several advanced systems such as OpenVLA and π0 [6][9].
- Mantis also learned considerably faster, reaching strong results within a few training cycles where traditional methods need many more [6][9].

Group 4: Real-World Testing
- The research team ran real-world tests across three scenarios to validate Mantis, which showed superior performance in understanding world knowledge, basic reasoning, and intent comprehension [7].
- Mantis generalized well, particularly when handling novel instructions, where it outperformed the leading open-source model π0.5 [7].

Group 5: Future Implications
- Mantis points to a new direction in robotics that balances operational skills with language comprehension, a balance that is crucial for robots' future integration into human life [8].
- Robots with visual foresight could be applied across many sectors, including household chores, healthcare, manufacturing, and service industries, where they promise more precise and efficient assistance [8].