Goodbye to Robot "Memory Lapses": KAIST and UC Berkeley Teams Give VLA Models Memory, with Success Rates Doubling in Tests
机器人大讲堂· 2026-02-16 15:31
Core Insights
- The article discusses the limitations of existing Vision-Language-Action (VLA) models in robotics, particularly their lack of "historical memory," which hampers their ability to perform complex tasks that require context [1][4]
- A new framework called HAMLET has been introduced, which enhances VLA models by integrating a lightweight memory system, resulting in a significant increase in task success rates [3][17]

Group 1: Current Limitations of VLA Models
- Current VLA models, such as GR00T N1.5 and CogACT, rely solely on the current visual frame and text instructions, leading to poor performance in tasks requiring context [4]
- For example, in a task where a robot must cover a block with a cup, the lack of historical memory leaves GR00T N1.5 with a success rate of only 37.5%, causing the robot to repeat actions unnecessarily [4][14]
- Simply stacking historical frames is not an effective fix: it slows inference by 35% and increases peak memory usage by 3.6 times [4]

Group 2: HAMLET Framework
- HAMLET addresses the historical-memory gap by adding two core components: moment tokens and a lightweight memory module [5][9]
- Moment tokens compress and store the scene information of each time step, allowing the model to focus on dynamic changes relevant to the task [6][8]
- The memory module uses a two-layer Transformer architecture to filter and integrate these moment tokens, enabling the model to make more informed decisions based on historical context [9][11]

Group 3: Performance Improvements
- Extensive experiments show that HAMLET significantly improves success rates on long-horizon tasks, with an average success-rate increase of 47.2% over baseline models [12][14]
- On specific tasks, HAMLET raised the success rate from 12.5% to 66.7% on "Pick-and-Place Twice" and from 37.5% to 83.3% on "Swap Cubes" [14]
- HAMLET also remains efficient, with only a 7% increase in inference time and a 1x increase in memory usage, compared with frame-stacking approaches that drastically slow down performance [15]

Group 4: Cross-Task Transferability
- The memory module of HAMLET demonstrates cross-task transferability: it improves success rates even when applied to different datasets, indicating a generalizable capability for processing historical information [16]

Conclusion
- HAMLET effectively resolves the core issue of historical memory in VLA models without requiring extensive retraining or restructuring, marking a significant step toward more capable and versatile robotic systems [17]
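To make the moment-token idea concrete, below is a minimal NumPy sketch of the general pattern the article describes: each frame's features are compressed into a single per-step "moment token," and a small two-layer attention module integrates the token history into a memory readout for the current step. All function names, dimensions, and the pooling/attention details here are illustrative assumptions, not HAMLET's actual implementation.

```python
# Hedged sketch of a HAMLET-style memory (assumed design, not the paper's code):
# one compact "moment token" per timestep replaces raw frame stacking,
# and a two-layer attention module reads the token history for the policy.
import numpy as np

rng = np.random.default_rng(0)
D = 32  # feature dimension (arbitrary choice for the sketch)

def moment_token(frame_feats):
    """Compress one frame's patch features (P, D) into a single (D,) token."""
    return frame_feats.mean(axis=0)  # mean pooling stands in for a learned compressor

def attention(q, kv):
    """Single-head scaled dot-product attention; q: (1, D), kv: (T, D)."""
    scores = q @ kv.T / np.sqrt(q.shape[-1])
    w = np.exp(scores - scores.max(axis=-1, keepdims=True))
    w = w / w.sum(axis=-1, keepdims=True)  # softmax over the T past tokens
    return w @ kv

def memory_module(tokens, query, n_layers=2):
    """Two residual attention layers that filter past moment tokens for the query."""
    out = query
    for _ in range(n_layers):
        out = out + attention(out, tokens)
    return out

# Rollout: store one moment token per timestep instead of every raw frame.
memory = []
for t in range(10):
    frame = rng.normal(size=(49, D))      # e.g. 7x7 grid of patch features
    memory.append(moment_token(frame))

query = rng.normal(size=(1, D))           # current-step policy query
readout = memory_module(np.stack(memory), query)
print(readout.shape)                      # → (1, 32)
```

The key efficiency point the article makes falls out of this structure: the memory grows by one D-dimensional token per step rather than one full frame, which is why a memory module can stay cheap where naive frame stacking inflates both latency and peak memory.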