MemoryVLA

MemoryVLA: Giving Robots a Hippocampus to Support Long-Horizon Manipulation Tasks
具身智能之心· 2025-09-03 00:03
Core Viewpoint
- The article discusses MemoryVLA, a cognition-memory-action framework inspired by human memory systems and aimed at robotic manipulation tasks that require long-term temporal dependencies [3][7].

Group 1: Current Issues in VLA Models
- Existing Vision-Language-Action (VLA) models rely mainly on the current observation, which leads to poor performance on long-horizon, temporally dependent tasks [2][7].
- Cognitive science indicates that humans manage such tasks with a memory system combining working memory (transient neural activity) and hippocampal episodic memory [7].

Group 2: MemoryVLA Framework
- MemoryVLA builds an analogous memory system for robots, drawing on these human cognitive mechanisms [3][7].
- A pre-trained Vision-Language Model (VLM) encodes each observation into perceptual and cognitive tokens, which are stored in a Perceptual-Cognitive Memory Bank [3].
- Working memory retrieves relevant entries from the memory bank, merges them with the current tokens, and adaptively updates the bank [3]; a minimal sketch of this retrieve-and-merge step appears after this summary.

Group 3: Importance of Memory in Robotics
- The article emphasizes the necessity of memory for robotic tasks: it improves decision-making and action sequencing in complex, long-horizon environments [3][7].
- A memory-conditioned diffusion action expert uses the fused tokens to generate temporally aware action sequences [3]; a sketch of such a diffusion head follows the memory-bank example below.
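To make the retrieve-and-merge step in Group 2 concrete, here is a minimal sketch of a memory bank with attention-based retrieval and gated fusion. The class and method names (PerceptualCognitiveMemoryBank, retrieve_and_merge, write) are illustrative assumptions, not the authors' released code, and the gating scheme is one plausible way to realize the "adaptively update the memory" behavior described above.

```python
# Hypothetical sketch of a perceptual-cognitive memory bank with
# attention-based retrieval and gated consolidation (names assumed,
# not taken from the MemoryVLA codebase).
import torch
import torch.nn as nn
import torch.nn.functional as F


class PerceptualCognitiveMemoryBank(nn.Module):
    def __init__(self, dim: int, capacity: int = 64):
        super().__init__()
        self.dim = dim
        self.capacity = capacity
        # Learnable gate deciding how much retrieved history to merge
        # into the current token (adaptive consolidation).
        self.gate = nn.Linear(2 * dim, dim)
        self.register_buffer("bank", torch.zeros(0, dim))  # stored entries

    @torch.no_grad()
    def write(self, token: torch.Tensor) -> None:
        """Append the current token; drop the oldest entry when full."""
        self.bank = torch.cat([self.bank, token.detach().unsqueeze(0)], dim=0)
        if self.bank.shape[0] > self.capacity:
            self.bank = self.bank[-self.capacity:]

    def retrieve_and_merge(self, token: torch.Tensor) -> torch.Tensor:
        """Working-memory step: attend over the bank, then gate-merge."""
        if self.bank.shape[0] == 0:
            return token
        # Dot-product attention of the current token over stored entries.
        attn = F.softmax(self.bank @ token / self.dim ** 0.5, dim=0)
        retrieved = attn @ self.bank                      # (dim,)
        # Gated fusion of retrieved history with the current token.
        g = torch.sigmoid(self.gate(torch.cat([token, retrieved], dim=-1)))
        return g * token + (1.0 - g) * retrieved


# Usage: one decision step for a single (flattened) cognitive token.
bank = PerceptualCognitiveMemoryBank(dim=512)
current = torch.randn(512)            # token from the VLM encoder
fused = bank.retrieve_and_merge(current)
bank.write(fused)                     # adaptively update the memory
```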
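For the memory-conditioned diffusion action expert in Group 3, the sketch below shows how a denoising network could take the fused memory token as conditioning and produce an action chunk. The network, sampler, and the names MemoryConditionedDenoiser and sample_actions are simplified assumptions for illustration, not the paper's actual action expert or noise schedule.

```python
# Hypothetical sketch of a memory-conditioned diffusion action head:
# denoise an action chunk conditioned on the fused memory token.
import torch
import torch.nn as nn


class MemoryConditionedDenoiser(nn.Module):
    def __init__(self, action_dim: int, horizon: int, cond_dim: int):
        super().__init__()
        self.horizon, self.action_dim = horizon, action_dim
        self.net = nn.Sequential(
            nn.Linear(horizon * action_dim + cond_dim + 1, 512),
            nn.SiLU(),
            nn.Linear(512, horizon * action_dim),
        )

    def forward(self, noisy_actions, cond, t):
        # Predict the noise added to the action chunk, given the fused
        # memory token `cond` and the diffusion timestep `t`.
        x = torch.cat([noisy_actions.flatten(1), cond, t[:, None].float()], dim=-1)
        return self.net(x).view(-1, self.horizon, self.action_dim)


@torch.no_grad()
def sample_actions(model, cond, steps=10, horizon=8, action_dim=7):
    """Tiny DDPM-style loop: start from noise, denoise iteratively."""
    a = torch.randn(cond.shape[0], horizon, action_dim)
    for t in reversed(range(steps)):
        t_batch = torch.full((cond.shape[0],), t)
        eps = model(a, cond, t_batch)
        a = a - eps / steps   # crude update; a real sampler follows the noise schedule
    return a


model = MemoryConditionedDenoiser(action_dim=7, horizon=8, cond_dim=512)
cond = torch.randn(2, 512)             # fused perceptual-cognitive tokens
actions = sample_actions(model, cond)  # (2, 8, 7) temporally aware action sequence
```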