Unitree Robotics Announces Open-Source Release of UnifoLM-VLA-0, a Single Model with General Multi-Task Capability

Core Insights
- Unitree Robotics announced the open-source release of "UnifoLM-VLA-0," a model in its UnifoLM series designed for general-purpose humanoid robot manipulation. It aims to overcome the limitations of traditional vision-language models (VLMs) in physical interaction [1]
- The model shows stronger spatial reasoning and more reliable multimodal perception across a range of task scenarios, evolving from general "text-image understanding" into an "embodied brain" with physical common sense [1]
- In real-robot validation, the model completed 12 complex manipulation tasks with a single policy, demonstrating its ability to generalize across tasks [2]

Group 1
- UnifoLM-VLA-0 integrates an Action Head for action prediction, which allows a single model to handle multiple tasks [2]
- The model was trained on high-quality real-robot datasets covering 12 complex manipulation tasks and achieved near-optimal performance on the LIBERO simulation benchmark [2]
- Real-robot experiments indicate that the model executes tasks robustly and resists interference under external disturbances [2]

Group 2
- The model's spatial perception and understanding significantly outperform Qwen2.5-VL-7B and are comparable to Gemini-Robotics-ER 1.5 in "no thinking" mode [1]