Core Insights
- The article reports a 300% increase in the performance of a vision-language-action (VLA) model, attributed primarily to training data generated by a world model, which now makes up 90% of the training set [1][3][4].

Group 1: Model Development and Performance
- GigaWorld-0, a world model developed by the Chinese company 极佳视界, supplies the generated training data, and models trained on it generalize substantially better to new textures, viewpoints, and object placements [3][4].
- GigaWorld-0 has two main components: GigaWorld-0-Video, which generates rich, realistic interaction data, and GigaWorld-0-3D, which keeps the generated data geometrically and physically accurate [5][6].

Group 2: Technical Innovations
- GigaWorld-0-Video uses a sparse attention mechanism and a mixture-of-experts (MoE) architecture to improve computational efficiency and content control, significantly reducing memory usage and inference latency [7][12][13].
- GigaWorld-0-3D combines generative and reconstruction techniques to improve scene modeling under sparse observations, and uses a differentiable physics engine for high-fidelity physical simulation [14][18].

Group 3: Training Framework and Efficiency
- The GigaTrain framework, which supports advanced training techniques, has been open-sourced to support community development and standardization in data generation for embodied intelligence [20][29].
- GigaWorld-0 is described as the first world model trained end to end in FP8 precision, balancing visual fidelity against computational cost [19].

Group 4: Competitive Performance
- In comparative evaluations against leading world models, GigaWorld-0, with only 2 billion parameters, achieved a higher overall quality score than larger models, demonstrating its effectiveness on embodied intelligence tasks [22][23][24].
- Its ability to generate high-quality video and 3D scenes positions it as a cost-effective option in the market [25].

Group 5: Company Background and Funding
- 极佳视界, founded in 2023, focuses on world models and embodied intelligence, aiming to bridge the gap between physical and virtual environments [27][28].
- The company recently completed a funding round of more than 100 million yuan, with investment from Huawei and other notable funds, which the article reads as a sign of strong market confidence [29].
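
The 90% generated-data share cited in the core insights can be illustrated with a small sketch. The article does not describe how the training mix is actually assembled, so everything here is an assumption for illustration: the function name `mix_dataset`, the `generated_fraction` parameter, and the choice to keep all real samples and top up with generated ones.

```python
import random


def mix_dataset(real, generated, generated_fraction=0.9, seed=0):
    """Return a shuffled list of (sample, source) pairs in which roughly
    `generated_fraction` of the samples come from `generated`.

    Hypothetical helper: keeps every real sample and adds enough generated
    samples to reach the requested share.
    """
    if not 0.0 <= generated_fraction < 1.0:
        raise ValueError("generated_fraction must be in [0, 1)")

    rng = random.Random(seed)

    # Solve n_gen / (n_real + n_gen) = generated_fraction for n_gen.
    n_real = len(real)
    n_gen = round(n_real * generated_fraction / (1.0 - generated_fraction))
    if n_gen > len(generated):
        raise ValueError("not enough generated samples for the requested share")

    picked = rng.sample(generated, n_gen)
    mixed = [(x, "real") for x in real] + [(x, "generated") for x in picked]
    rng.shuffle(mixed)
    return mixed


if __name__ == "__main__":
    real_episodes = [f"real_{i}" for i in range(10)]
    generated_episodes = [f"gen_{i}" for i in range(200)]
    mixed = mix_dataset(real_episodes, generated_episodes)
    share = sum(1 for _, src in mixed if src == "generated") / len(mixed)
    print(len(mixed), share)
```

Under this scheme the amount of real data sets the total size of the mix; a pipeline that fixed the total and subsampled the real data instead would be an equally plausible reading of the article.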
Latest breakthrough in world models and embodied brains: 90% generated data, VLA performance up 300% | Open source
量子位·2025-12-02 04:59