Ultra-Large Embodied VLM Goes Open Source: First DPPO Training Paradigm, a Price-Performance Ceiling, from the Beijing Humanoid Robot Innovation Center (北京人形)
机器之心·2025-11-14 10:32

Core Insights
- The article covers the launch of Pelican-VL 1.0, an open-source embodied-intelligence vision-language model (VLM) billed as the largest of its kind in the industry, released at both 7B and 72B parameter scales [1][4].

Group 1: Model Performance and Training
- Pelican-VL achieves a 20.3% performance improvement over its baseline models and surpasses comparable open-source models by 10.6% [4][11].
- The model was trained on a cluster of more than 1,000 A800 GPUs; training a single checkpoint consumed over 50,000 A800 GPU-hours [4].
- Training uses a novel "Deliberate Practice Policy Optimization" (DPPO) paradigm, which mimics human metacognitive learning to enhance model capabilities [8][10].

Group 2: Capabilities and Applications
- Pelican-VL demonstrates significant advances in multimodal understanding and reasoning, processing visual and textual inputs to perform complex tasks [12][13].
- The model excels at spatial-temporal reasoning, allowing it to understand sequences of actions and make predictions in dynamic scenarios [13].
- It shows strong embodied-interaction capabilities, generating detailed action plans for robotic tasks such as object manipulation and navigation [13].

Group 3: Industry Implications
- Pelican-VL's open-source release lets other labs and companies customize training, accelerating the practical application of VLMs in robotics [23].
- Its development addresses the scarcity of high-quality embodied data and the lack of evaluation benchmarks, paving the way for future advances in the field [23].
- Pelican-VL represents a significant step toward robots that can not only recognize objects but also make informed decisions about how to interact with their environment [23][28].
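The article describes DPPO only at a high level: the model repeatedly assesses its own weaknesses and then trains on them, in the spirit of human "deliberate practice." As a caricature of that idea, here is a minimal, hypothetical sketch of such a loop; the skill names, scoring stub, and update rule are all illustrative assumptions, not details from the Pelican-VL release.

```python
# Hypothetical sketch of a "deliberate practice" loop, assuming DPPO
# alternates between (1) a metacognitive self-assessment step that finds
# the weakest skill and (2) targeted practice on that skill. Everything
# here is illustrative; it is NOT the Pelican-VL implementation.

SKILLS = ["spatial_reasoning", "temporal_ordering", "action_planning"]

def evaluate(model, skill):
    """Score one skill in [0, 1] (stubbed: read it from model state)."""
    return model.get(skill, 0.0)

def practice(model, skill, lr=0.2):
    """Targeted practice: close a fraction of the gap to mastery."""
    score = model.get(skill, 0.0)
    model[skill] = score + lr * (1.0 - score)

def deliberate_practice(model, rounds=20):
    for _ in range(rounds):
        # Metacognitive step: identify the weakest skill, then drill it.
        weakest = min(SKILLS, key=lambda s: evaluate(model, s))
        practice(model, weakest)
    return model

# Illustrative starting proficiencies.
model = {"spatial_reasoning": 0.3, "temporal_ordering": 0.1,
         "action_planning": 0.45}
trained = deliberate_practice(dict(model))
```

The design point the sketch isolates is the scheduling policy: compute is always spent on the current weakest capability, so scores equalize upward rather than over-training already-strong skills.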
