PaliGemma 3B VLM

Evaluating the performance and limits of generalist policies such as π0 in complex real-world scenarios
具身智能之心 · 2025-08-16 16:03
Core Viewpoint
- The article discusses an evaluation of the π0-FAST-DROID model in real-world scenarios, highlighting its potential as a generalist model for robotic manipulation and the challenges it faces across a range of tasks [4][10][73].

Evaluation Method
- The evaluation used the π0-FAST-DROID model, a variant fine-tuned for the DROID robot platform, which consists of a Franka Panda robot equipped with cameras [5][10].
- The assessment covered more than 300 trials across a variety of manipulation tasks and relied on subjective evaluation similar to that used in natural language processing [11][10].

Key Findings
- The model has a strong prior toward sensible behavior, but that prior alone was often not enough to complete a task [11].
- Prompt engineering significantly influenced performance: changes in wording or camera angle led to substantial swings in success rate [12][56].
- The model showed impressive visual-language understanding and could reproduce continuous behaviors across different scenarios [13][27].

Performance in Complex Scenarios
- The model was robust at recognizing and manipulating transparent objects and objects camouflaged against complex backgrounds [19][20].
- It stayed focused on the task despite human activity in the background, indicating robustness to human movement [24].

Challenges and Limitations
- The model struggled with semantic ambiguity and has no memory, which led to premature task termination in multi-step operations [36][40].
- It struggled with precise spatial reasoning, often failing to lift objects high enough to avoid colliding with containers [46][48].
- Performance was sensitive to prompt quality, and unclear instructions led to failures [57][59].
Task-Specific Performance
- Progress and success rates varied across task categories, for example pouring (52.3% progress, 24% success) and manipulating articulated objects (37.8% progress, 28.5% success) [85][87].
- In human-robot interaction scenarios, the model reached 53.5% progress but only a 24% success rate, indicating room for improvement in safety and collaboration [102].

Conclusion
- The evaluation indicates that the π0 model shows promise as a generalist policy in unseen manipulation scenarios, but significant challenges remain in instruction following, fine manipulation, and performance under partial observability [73].
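
The gap between the progress and success figures reported under Task-Specific Performance is easier to read with a small worked example. The sketch below assumes each trial is scored with a partial-credit progress value between 0 and 1 plus a binary success flag; the summary does not describe the exact scoring procedure, so the `Trial` record, the `summarize` helper, and the sample trials are hypothetical and are not data from the evaluation.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Trial:
    category: str    # task category, e.g. "pouring"
    progress: float  # assumed partial-credit score in [0.0, 1.0]
    success: bool    # True only if the whole task was completed


def summarize(trials):
    """Return average progress and success rate (both in percent) per category."""
    grouped = defaultdict(list)
    for trial in trials:
        grouped[trial.category].append(trial)

    summary = {}
    for category, items in grouped.items():
        n = len(items)
        summary[category] = {
            "trials": n,
            "avg_progress_pct": 100.0 * sum(t.progress for t in items) / n,
            "success_rate_pct": 100.0 * sum(t.success for t in items) / n,
        }
    return summary


# Made-up trials for illustration only (not results from the article).
trials = [
    Trial("pouring", 1.0, True),
    Trial("pouring", 0.5, False),
    Trial("pouring", 0.25, False),
    Trial("pouring", 0.25, False),
]
summary = summarize(trials)
print(summary["pouring"])
```

Under this scheme a policy that often gets partway through a task accumulates progress without registering a success, which is one way average progress can sit well above the success rate, as in the pouring category.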