G0: A VLA End-to-End Foundation Model

Interview with Galaxea's Zhao Hang: A Flashy Demo Is Not Generalization, and Embodied Intelligence Will Still Be Won on Data Volume
36Kr · 2025-08-13 13:35
Core Viewpoint
- The article covers recent progress in embodied intelligence, focusing on the G0 model developed by Galaxea (星海图). G0 aims to improve how well robots generalize, through high-quality data collection and a structured training framework [5][8][15].

Group 1: Development of the G0 Model
- G0 is designed as a large-scale, generalizable embodied intelligence model, moving beyond smaller models that are limited in application [4][8].
- Training uses a three-stage framework: cross-embodiment pre-training, single-embodiment pre-training, and post-training. The result is a 20% performance improvement over the existing π0 model [25][26].
- The company stresses that high-quality data collection is essential to the model's performance and generalization [41][50].

Group 2: Data Collection and Open Source Initiative
- Galaxea plans to open-source a dataset of 500 hours of real-world data as an industry benchmark, making it easier to compare approaches and advance algorithm development [13][51].
- The dataset is meant to set a high standard for data quality and evaluation, which the robotics field currently lacks [52][54].
- The company collects diverse, realistic data from homes, hotels, factories, and restaurants to strengthen the model's training [49][50].

Group 3: Challenges in Embodied Intelligence
- Generalization remains hard in embodied intelligence: variations in objects, environments, and tasks complicate algorithm design [30][32].
- The company believes only large models can reach the necessary generalization, because smaller models struggle to scale [28][34].
- Embodied intelligence is still in a "non-consensus phase"; further research is needed to determine whether the scaling laws observed in language models carry over to robotics [10][36].

Group 4: Industrialization and Future Directions
- The company treats the VLA (Vision-Language-Action) paradigm as its primary approach for industrial applications, and is considering integrating tactile feedback in the future [60][62].
- It adopts a dual-system approach of "slow thinking" and "fast execution" to balance performance and efficiency in robotic tasks [64][66].
- World models are seen as a future direction but have not yet reached an industrialized stage [69][70].
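The "slow thinking, fast execution" split mentioned in the summary can be pictured as two loops running at different rates. The sketch below is only an illustration of that general idea, not G0's actual architecture: the class name, the `replan_every` ratio, and the string stand-ins for the planner and the policy are all assumptions made for this example.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class DualSystemController:
    """Toy dual-system loop. Every name here is hypothetical."""

    replan_every: int = 10  # assumed ratio: fast steps per slow planning step
    subgoal: str = ""
    log: List[str] = field(default_factory=list)

    def slow_plan(self, instruction: str, step: int) -> str:
        # Stand-in for a slow vision-language planner that emits a subgoal.
        return f"{instruction}/subgoal-{step // self.replan_every}"

    def fast_act(self, subgoal: str, step: int) -> str:
        # Stand-in for a fast action policy conditioned on the current subgoal.
        return f"action(step={step}, subgoal={subgoal})"

    def run(self, instruction: str, num_steps: int) -> List[str]:
        for step in range(num_steps):
            # The slow system runs only occasionally.
            if step % self.replan_every == 0:
                self.subgoal = self.slow_plan(instruction, step)
            # The fast system runs at every control step.
            self.log.append(self.fast_act(self.subgoal, step))
        return self.log


controller = DualSystemController(replan_every=10)
actions = controller.run("make the bed", num_steps=25)
plans_used = sorted({a.split("subgoal=")[1] for a in actions})
print(len(actions), len(plans_used))
```

With these assumed settings the fast policy acts at every step while the planner is consulted only on steps 0, 10, and 20, which is the efficiency argument behind the split.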
Interview with Galaxea's Zhao Hang: A Flashy Demo Is Not Generalization, and Embodied Intelligence Will Still Be Won on Data Volume
36Kr · 2025-08-13 03:37
Core Insights
- The robot's bed-making demonstration at the 2025 World Robot Conference (WRC) shows how complex a seemingly simple task is, and showcases the robot's capabilities in deformable-object manipulation and whole-body control [1][2][4].
- Galaxea's newly released G0 model aims to improve generalization in embodied intelligence, moving beyond earlier, smaller models that struggled to scale [2][4][11].
- The company emphasizes high-quality data collection and engineering processes as the basis for robust models, with a focus on real-world data [4][19][28].

Group 1: Technology and Model Development
- G0 uses a three-stage training framework and shows a 20% improvement in average metrics over the existing π0 model [9][10].
- The company plans to open-source a dataset of 500 hours of real-world data to establish a high-quality industry benchmark for comparing and validating algorithms [5][30].
- Data collection involves training personnel and solving a range of problems in real-time data acquisition; the company considers this work foundational for model training [19][22][24].

Group 2: Industry Context and Future Directions
- The company believes the scaling laws observed in large language models can also apply to embodied intelligence, which would allow significant advances in the field [14][16].
- The VLA paradigm is seen as the primary industrial path, while technologies such as tactile sensing and world models are being explored for future applications [32][39].
- Collaboration between academia and industry is viewed as mutually beneficial: academic insights can drive industrial progress and vice versa [45][46].
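The three-stage training framework can be read as a curriculum in which each stage starts from the checkpoint left by the one before it. The sketch below assumes that reading; the article does not describe the implementation. The stage order follows the first summary (pre-training across many robot bodies, pre-training on a single robot body, then post-training), while the dataset labels, function names, and the placeholder `fake_train` are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass(frozen=True)
class Stage:
    name: str
    data_sources: Tuple[str, ...]  # hypothetical dataset labels


# Stage order follows the summary; the dataset labels are illustrative only.
STAGES = (
    Stage("stage1_pretrain_many_robots", ("multi_robot_data",)),
    Stage("stage2_pretrain_single_robot", ("target_robot_data",)),
    Stage("stage3_post_training", ("target_task_demos",)),
)


def run_curriculum(
    stages: Tuple[Stage, ...],
    train_fn: Callable[[Stage, Optional[str]], str],
) -> List[str]:
    """Run the stages in order, handing each checkpoint to the next stage."""
    checkpoint: Optional[str] = None
    history: List[str] = []
    for stage in stages:
        checkpoint = train_fn(stage, checkpoint)
        history.append(checkpoint)
    return history


def fake_train(stage: Stage, init_checkpoint: Optional[str]) -> str:
    # Placeholder: records checkpoint lineage instead of training anything.
    parent = init_checkpoint or "random_init"
    return f"{parent}->{stage.name}"


history = run_curriculum(STAGES, fake_train)
print(history[-1])
```

The point of the structure is that broad data shapes the early stages and narrower, robot-specific data refines the later ones; in a real pipeline `fake_train` would be replaced by an actual training routine.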