Confronting VLA's "Achilles' Heel": TeleAI Improves the Stability of Embodied Inference
具身智能之心· 2025-12-25 01:41
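Editor: 机器之心
In embodied intelligence for robotics, Vision-Language-Action (VLA) models are developing at remarkable speed. From RT-1 and Octo to the more recent π0 and GR00T N1, these systems, which combine large-scale vision-language models with robot control, show unprecedented generalization. Yet a long-overlooked problem is keeping VLA models from moving out of the lab and into the real world: instability at inference time.
Professor Li Xuelong (李学龙), CTO and Chief Scientist of China Telecom Group and Director of the China Telecom Institute of Artificial Intelligence (TeleAI), together with teams from Tsinghua University and the University of Science and Technology of China, takes on this challenge with a framework named TACO (Test-time Anti-exploration via pseudo-COunts). The work provides a solid theoretical basis and a practical solution for the instability of VLA inference, and validates the method through experiments on simulated benchmarks and a real robot platform. In real-robot experiments, TACO raised the average task success rate by ...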
Confronting VLA's "Achilles' Heel": TeleAI Uses "Anti-Exploration" to Improve the Stability of Embodied Inference
机器之心· 2025-12-24 07:40
Core Insights
- The article discusses the rapid development of Vision-Language-Action (VLA) models in embodied intelligence, highlighting their unprecedented generalization capabilities while addressing a critical issue: instability during the inference phase [2][3][4].
- A framework named TACO (Test-time Anti-exploration via pseudo-COunts) is introduced to tackle this inference instability, providing a solid theoretical foundation and a practical solution [2][8].
Group 1: VLA Model Challenges
- Despite impressive average performance, VLA models are extremely sensitive to the initial noise sampled at inference time: the same model's success rate can swing between 0% and 80% [4][6].
- The instability is attributed to two main factors: redundant action patterns retained from diverse pre-training data, and the multimodal nature of fine-tuning datasets, which may include suboptimal strategies [7][6].
Group 2: TACO Framework
- TACO draws on the "anti-exploration" principle from offline reinforcement learning, aiming to constrain generated actions to the successful patterns found in the fine-tuning dataset [9][11].
- The framework has three key components. The one described here is a Coupled Pseudo-Count Estimator, which reuses the VLA model's internal representation so that candidate actions can be validated efficiently without additional training [11][12].
Group 3: Implementation and Results
- TACO uses a two-stage inference process: it first generates diverse action candidates, then validates them with pseudo-counts computed by a trained CFN [17][18].
- A Shared Observation Key-Value Cache significantly reduces computational cost, allowing real-time operation with minimal added latency [20][21].
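The two-stage process described in Group 3 (generate candidates, then keep the one best supported by the data) can be illustrated with a small sketch. This is not TeleAI's implementation: the summary does not describe the CFN or the VLA features in detail, so the sketch swaps in a plain Gaussian-kernel count over raw 2-D actions as the pseudo-count, and `toy_policy`, `kernel_pseudo_count`, `select_action`, the bandwidth, and the candidate count are all hypothetical names and values chosen for illustration.

```python
import math
import random


def kernel_pseudo_count(candidate, dataset, bandwidth=0.5):
    """Soft count of dataset points near `candidate` (Gaussian kernel).

    Assumption: this stands in for the trained CFN. A higher value means
    the candidate is better supported by the fine-tuning data.
    """
    total = 0.0
    for point in dataset:
        sq_dist = sum((c - p) ** 2 for c, p in zip(candidate, point))
        total += math.exp(-sq_dist / (2.0 * bandwidth ** 2))
    return total


def select_action(sample_fn, dataset, num_candidates=8, seed=0):
    """Stage 1: sample candidates from different initial noise.
    Stage 2: score each one and keep the candidate with the highest count."""
    rng = random.Random(seed)
    candidates = [sample_fn(rng.gauss(0.0, 1.0)) for _ in range(num_candidates)]
    counts = [kernel_pseudo_count(c, dataset) for c in candidates]
    best = max(range(num_candidates), key=counts.__getitem__)
    return candidates[best], counts


def toy_policy(noise):
    """Hypothetical noise-sensitive policy: the sign of the noise picks a mode."""
    center = 1.0 if noise > 0 else -1.0
    return (center + 0.1 * noise, 0.0)


if __name__ == "__main__":
    data_rng = random.Random(1)
    # Toy "fine-tuning dataset": demonstrations clustered around the action (1, 0).
    dataset = [
        (1.0 + 0.1 * data_rng.gauss(0.0, 1.0), 0.1 * data_rng.gauss(0.0, 1.0))
        for _ in range(200)
    ]
    action, counts = select_action(toy_policy, dataset)
    print("selected action:", action)
    print("pseudo-counts:", [round(c, 2) for c in counts])
```

In this toy setup a single sample from `toy_policy` lands in the unsupported mode whenever its noise is negative, which mirrors the noise sensitivity described in Group 1. Selecting among several candidates by count filters those samples out. According to the summary, the actual framework scores candidates using the VLA model's internal representation rather than raw actions.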
Group 4: Experimental Validation
- Comprehensive evaluations across multiple simulated benchmarks and a dual-arm robot platform demonstrate TACO's effectiveness, with average success rates improving by 16% in real-world tasks [22][32].
- On specific tasks such as "organizing paper and pens," success rates rose by 25%, highlighting TACO's ability to filter out suboptimal behaviors [32][33].
Group 5: Future Directions
- Beyond addressing a practical challenge, TACO opens new perspectives for VLA research, suggesting extensions to more complex multi-task scenarios and integration with world models for stronger long-term planning [35].