Vision-Language-Action Large Models
Unitree Open-Sources the UnifoLM-VLA-0 Large Model
Bei Jing Shang Bao· 2026-01-29 14:14
Core Viewpoint
- Unitree announced the open-source release of UnifoLM-VLA-0, a vision-language-action (VLA) large model aimed at general humanoid robot operations, which seeks to overcome the limitations of traditional vision-language models (VLM) in physical interaction [1]

Group 1
- UnifoLM-VLA-0 is part of the UnifoLM series and focuses on enhancing robot operation capabilities through continued pre-training on robot operation data [1] (see the sketch after this item)
- The model represents an evolution from general "image-text understanding" to an "embodied brain" with physical common sense [1]
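As a rough illustration of what "continued pre-training on robot operation data" can look like in practice, the snippet below sketches a behavior-cloning-style training step that further optimizes an already-pretrained policy to imitate logged (observation, action) pairs from demonstrations. Everything in it (model, data shapes, loss choice) is an assumption made for the example; the report items above do not specify UnifoLM-VLA-0's actual training recipe.

```python
# Minimal sketch of "continued pre-training" on robot operation data, framed
# here as simple behavior cloning: a generic pretrained policy is further
# trained to imitate logged (observation, action) pairs from demonstrations.
# All shapes and names are illustrative assumptions, not Unitree's recipe.
import torch
import torch.nn as nn

# Stand-in for a policy whose weights already come from generic pre-training.
policy = nn.Sequential(nn.Linear(128, 256), nn.ReLU(), nn.Linear(256, 7))
optimizer = torch.optim.AdamW(policy.parameters(), lr=1e-4)


def continued_pretraining_step(observations: torch.Tensor, actions: torch.Tensor) -> float:
    """One imitation-learning gradient step on a batch of robot trajectories.

    observations: (B, 128) encoded robot observations (assumed shape)
    actions:      (B, 7)   demonstrated joint/end-effector commands (assumed shape)
    """
    predicted = policy(observations)
    loss = nn.functional.mse_loss(predicted, actions)  # regress demonstrated actions
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()


if __name__ == "__main__":
    obs = torch.randn(32, 128)   # a batch of encoded observations
    act = torch.randn(32, 7)     # the matching demonstrated actions
    for step in range(3):
        print(f"step {step}: loss = {continued_pretraining_step(obs, act):.4f}")
```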
Unitree Robotics Announces Open-Sourcing of UnifoLM-VLA-0, Capable of Handling Multiple Tasks with a Single Model
智通财经网· 2026-01-29 12:41
Core Insights
- Unitree Robotics announced the open-source release of the UnifoLM series model "UnifoLM-VLA-0," designed for general humanoid robot operations and aiming to overcome the limitations of traditional VLMs in physical interaction [1]
- The model demonstrates enhanced spatial reasoning and reliable multimodal perception across a range of task scenarios, evolving from general "image-text understanding" to an "embodied brain" with physical common sense [1]
- In real-robot validation, the model completes 12 complex manipulation tasks with a single policy, demonstrating its task-generalization ability [2]

Group 1
- UnifoLM-VLA-0 integrates an Action Head for action prediction, enabling a single model to handle multiple tasks [2] (a minimal architectural sketch follows this summary)
- The model was trained on high-quality real-robot datasets covering 12 complex manipulation tasks and achieves near-optimal performance on the LIBERO simulation benchmark [2]
- Real-robot experiments indicate that the model maintains robust execution and disturbance resistance under external interference [2]

Group 2
- The model's spatial perception and understanding capabilities significantly outperform Qwen2.5-VL-7B and are comparable to Gemini-Robotics-ER 1.5 in "no thinking" mode [1]
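For readers unfamiliar with the VLA pattern described above, the sketch below shows, in broad strokes, how a vision-language backbone can be paired with an action head so that one set of weights maps camera images and a language instruction to a short chunk of robot actions. It is a minimal toy under assumed class names, dimensions, and backbone choice, not the UnifoLM-VLA-0 implementation.

```python
# Minimal sketch of a VLA-style policy: a vision-language backbone produces a
# fused embedding of the camera image and the instruction, and a small
# "action head" regresses a chunk of robot actions from it.
# All names and dimensions are illustrative assumptions.
import torch
import torch.nn as nn


class ToyVLABackbone(nn.Module):
    """Stand-in for a pretrained VLM; returns one fused embedding per sample."""

    def __init__(self, embed_dim: int = 512):
        super().__init__()
        self.image_encoder = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=8, stride=8),  # 224x224 -> 28x28
            nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
            nn.Flatten(),
            nn.Linear(32, embed_dim),
        )
        self.text_encoder = nn.EmbeddingBag(num_embeddings=10_000, embedding_dim=embed_dim)
        self.fuse = nn.Linear(2 * embed_dim, embed_dim)

    def forward(self, image: torch.Tensor, token_ids: torch.Tensor) -> torch.Tensor:
        img = self.image_encoder(image)     # (B, D)
        txt = self.text_encoder(token_ids)  # (B, D)
        return self.fuse(torch.cat([img, txt], dim=-1))


class ActionHead(nn.Module):
    """Predicts a chunk of future joint-space actions from the fused embedding."""

    def __init__(self, embed_dim: int = 512, action_dim: int = 7, chunk: int = 8):
        super().__init__()
        self.chunk, self.action_dim = chunk, action_dim
        self.mlp = nn.Sequential(
            nn.Linear(embed_dim, 256),
            nn.ReLU(),
            nn.Linear(256, chunk * action_dim),
        )

    def forward(self, embedding: torch.Tensor) -> torch.Tensor:
        return self.mlp(embedding).view(-1, self.chunk, self.action_dim)


class ToyVLAPolicy(nn.Module):
    def __init__(self):
        super().__init__()
        self.backbone = ToyVLABackbone()
        self.action_head = ActionHead()

    def forward(self, image, token_ids):
        return self.action_head(self.backbone(image, token_ids))


if __name__ == "__main__":
    policy = ToyVLAPolicy()
    image = torch.randn(2, 3, 224, 224)            # batch of camera frames
    token_ids = torch.randint(0, 10_000, (2, 12))  # tokenized instructions
    print(policy(image, token_ids).shape)          # (2, 8, 7): 8-step chunk of 7-DoF actions
```

Under this framing, "one policy for 12 tasks" simply means the same weights are conditioned on different instructions; only the language input changes between tasks.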
Unitree Open-Sources UnifoLM-VLA-0
Mei Ri Jing Ji Xin Wen· 2026-01-29 12:38
Mei Ri Jing Ji Xin Wen AI Flash, January 29: Unitree announced the open-source release of UnifoLM-VLA-0. UnifoLM-VLA-0 is the vision-language-action (VLA) large model in the UnifoLM series aimed at general humanoid robot manipulation. The model seeks to break through the limitations of traditional VLMs in physical interaction; through continued pre-training on robot manipulation data, it has evolved from general "image-text understanding" into an "embodied brain" with physical common sense. ...
Unitree Open-Sources UnifoLM-VLA-0
Xin Lang Cai Jing· 2026-01-29 12:37
Renmin Caixun, January 29: Unitree announced on January 29 the open-source release of UnifoLM-VLA-0. UnifoLM-VLA-0 is the vision-language-action (VLA) large model in the UnifoLM series aimed at general humanoid robot manipulation. The model seeks to break through the limitations of traditional VLMs in physical interaction; through continued pre-training on robot manipulation data, it has evolved from general "image-text understanding" into an "embodied brain" with physical common sense. ...
In-Depth Report on Intelligent Driving: World Model and VLA Technology Routes Developing in Parallel
Guoyuan Securities· 2025-10-22 08:56
Investment Rating
- The report does not explicitly state an investment rating for the smart driving industry

Core Insights
- The smart driving industry is evolving rapidly, driven by the "end-to-end" and "smart driving equity" concepts, with significant growth in both new energy vehicle sales and smart driving functionality [3][4][9]
- The penetration rate of L2-level smart driving in new energy vehicles in China has increased from approximately 7% in 2019 to around 65% by the first half of 2025, indicating a strong correlation between new energy vehicle sales and the adoption of smart driving technologies [9][10]
- The smart driving market is projected to exceed 5 trillion yuan by 2030, with compound annual growth driven by technological advances and increasing consumer acceptance [15][16]

Summary by Sections
1. "Equity + End-to-End" Accelerating Smart Driving Evolution
- The significant increase in new energy vehicle sales has created a positive feedback loop for the adoption of smart driving technologies [9][10]
- The penetration of L2-level smart driving features in new energy vehicles has risen rapidly, reflecting growing consumer acceptance and market expansion [9][10]
2. End-to-End Smart Driving Review
- The evolution of end-to-end smart driving can be categorized into four main stages, with advances across perception, decision-making, and control [30][32]
- The introduction of the "occupancy network" has enhanced environmental perception, allowing more accurate and stable decision-making in complex driving scenarios [46][47] (an illustrative sketch follows at the end of this summary)
3. VLA Technology Route
- The VLA (Vision-Language-Action) model is emerging as a key driver of the paradigm shift in autonomous driving, integrating visual, linguistic, and action modalities into a cohesive framework [70][71]
- The VLA model's development is divided into four stages, with significant advances in task understanding and execution capability [76][77]
4. World Model Technology Route
- The world-model approach emphasizes physical reasoning and spatial understanding, representing a long-term evolution path for smart driving technologies [69][70]
- Integrating world models with cloud computing is expected to enhance the iterative optimization of end-to-end smart driving systems [65][66]
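To make the "occupancy network" idea in Section 2 concrete, the following is a minimal sketch of how a perception head might lift surround-camera features into a coarse 3D voxel grid and classify each voxel as occupied or free. It is an illustrative toy under assumed shapes and names (and a deliberately crude lifting step), not the architecture of any specific automaker or of the report itself.

```python
# Toy sketch of an occupancy-style perception head: pooled image features from
# surround cameras are broadcast into a 3D voxel volume, and a small 3D-conv
# head predicts a per-voxel occupancy probability.
# Shapes, names, and the lifting scheme are illustrative assumptions only;
# real systems use geometry-aware view transforms instead of broadcasting.
import torch
import torch.nn as nn


class ToyOccupancyHead(nn.Module):
    def __init__(self, feat_dim: int = 64, grid: tuple = (16, 100, 100)):
        super().__init__()
        self.grid = grid  # (Z, X, Y) voxels around the ego vehicle
        self.lift = nn.Linear(feat_dim, 8)
        self.refine = nn.Sequential(
            nn.Conv3d(8, 16, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.Conv3d(16, 1, kernel_size=1),  # 1 logit per voxel: occupied vs. free
        )

    def forward(self, cam_feats: torch.Tensor) -> torch.Tensor:
        # cam_feats: (B, num_cameras, feat_dim), e.g. pooled CNN features per camera
        pooled = cam_feats.mean(dim=1)          # (B, feat_dim)
        seed = self.lift(pooled)                # (B, 8)
        z, x, y = self.grid
        volume = seed.view(-1, 8, 1, 1, 1).expand(-1, -1, z, x, y).contiguous()
        logits = self.refine(volume)            # (B, 1, Z, X, Y)
        return torch.sigmoid(logits).squeeze(1) # per-voxel occupancy probability


if __name__ == "__main__":
    head = ToyOccupancyHead()
    cam_feats = torch.randn(1, 6, 64)  # 6 surround cameras, 64-dim features each
    occupancy = head(cam_feats)
    print(occupancy.shape)             # torch.Size([1, 16, 100, 100])
```

One commonly cited benefit of this representation is that free space and arbitrary obstacles are modeled uniformly in the voxel grid, rather than only objects drawn from a fixed detection vocabulary, which is consistent with the report's claim that occupancy improves perception in complex scenarios.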