Training-free! Using Bayesian methods to fine-tune VLMs, achieving SOTA on robot manipulation tasks!
具身智能之心·2025-12-03 03:47

Core Insights
- The article discusses advances in Vision-Language Models (VLMs) and introduces T²-VLM, a novel framework that generates temporally consistent rewards for robotic tasks without requiring training [2][5].

Group 1: VLM and T²-VLM Overview
- VLMs have significantly improved performance on embodied tasks such as goal decomposition and visual understanding, but providing precise rewards for robotic manipulation remains challenging due to the lack of domain-specific knowledge in pre-training datasets and high computational costs [2].
- T²-VLM tracks the state changes of sub-goals derived from the VLM to generate accurate rewards, enhancing long-horizon decision-making and improving failure-recovery performance through reinforcement learning [2].

Group 2: Methodology and Results
- T²-VLM queries the VLM before each interaction to establish spatially aware sub-goals and initial completion estimates, then uses a Bayesian tracking algorithm to dynamically update each sub-goal's completion state [2].
- Extensive experiments demonstrate that T²-VLM achieves state-of-the-art performance on two robotic manipulation benchmarks while reducing computational cost and exhibiting superior reward accuracy [2].

Group 3: Live Session Details
- A live session is scheduled for December 3rd, 19:30-20:30, covering the background of real-robot reinforcement learning, the current state of VLM-based reward generation research, and reflections on the T²-VLM method [5][6].
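The Bayesian tracking step described above (initial completion estimates from the VLM, updated as new observations arrive) can be sketched as a simple per-sub-goal belief update. This is a hypothetical illustration of the general idea, not the paper's actual code: all function names, probabilities, and the observation model are assumptions.

```python
# Hypothetical sketch of Bayesian sub-goal completion tracking.
# Names and numbers are illustrative assumptions, not T²-VLM's implementation.

def bayes_update(prior: float, lik_done: float, lik_not_done: float) -> float:
    """Posterior P(sub-goal complete) after one observation.

    prior:        current belief P(done)
    lik_done:     P(observation | sub-goal done)
    lik_not_done: P(observation | sub-goal not done)
    """
    num = lik_done * prior
    den = num + lik_not_done * (1.0 - prior)
    # Guard against an uninformative (zero-likelihood) observation.
    return num / den if den > 0 else prior

def track_subgoals(priors, observations):
    """Update each sub-goal's belief given one round of observations.

    priors:       list of P(done) values, e.g. initial VLM estimates
    observations: list of (lik_done, lik_not_done) pairs per sub-goal
    """
    return [bayes_update(p, ld, ln)
            for p, (ld, ln) in zip(priors, observations)]

if __name__ == "__main__":
    # Assumed initial completion estimates from a VLM query.
    priors = [0.2, 0.5]
    # One round of evidence: observation favors completion of sub-goal 0.
    obs = [(0.9, 0.1), (0.4, 0.6)]
    print(track_subgoals(priors, obs))
```

Under this sketch, a reward signal could then be derived from the change in completion beliefs between interactions, which is one plausible reading of how tracked sub-goal states yield temporally consistent rewards.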