Guotai Haitong | Machinery: ByteDance Launches the GR-3 Model with Significantly Improved Generalization
Core Viewpoint
- ByteDance has launched the GR-3 model, which executes complex, long-horizon tasks and shows markedly improved generalization; the report recommends paying attention to related industry-chain stocks [1][2].

Summary by Sections

Model Development
- The GR-3 model, released by ByteDance's Seed team on July 22, uses a vision-language-action (VLA) architecture that improves its ability to generalize to new objects and environments, understand abstract language instructions, and manipulate flexible objects precisely [2].
- Compared with the previous GR-2 model (released in October 2024), GR-3 performs better when operating in new environments and on new objects, and understands complex instructions with high accuracy [2].

Technical Innovations
- GR-3 adopts a MoT+DiT network structure, combining a vision-language module and an action-generation module into a single 4-billion-parameter end-to-end model, with RMSNorm added to strengthen dynamic instruction following [2]; a minimal architectural sketch appears after this summary.
- GR-3 is trained with a three-in-one data recipe: high-quality robot teleoperation data, low-cost human VR trajectory data (collected at up to 450 trajectories per hour), and publicly available image-text data, which together improve its generalization ability [2]; see the data-mixing sketch below.

Hardware Development
- To realize the full potential of GR-3, ByteDance has introduced ByteMini, a general-purpose dual-arm mobile robot designed specifically for the model [3].
- ByteMini offers 22 degrees of freedom, a wrist ball-joint design for human-like dexterity, a multi-camera system for comprehensive situational awareness, and a whole-body control system that generates smooth trajectories [3]; a generic trajectory-smoothing example follows at the end of this note.

Performance Comparison
- In comparative tests against π0, an industry-leading embodied model, GR-3 scored significantly higher on generalization and on success rates for complex tasks [4].
- On new-object manipulation, GR-3's success rate was 17.8% higher than π0's, and only 10 human VR trajectories were needed to lift the success rate from 60% to over 80% [4].
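To make the MoT+DiT description above concrete, here is a minimal sketch of one DiT-style action block with RMSNorm: noisy action tokens cross-attend to tokens from a separate vision-language branch, which is the essence of coupling two transformer streams. All class names, dimensions, and the single-block structure are hypothetical simplifications for illustration; they are not taken from the GR-3 release.

```python
# Illustrative sketch only: a DiT-style action block with RMSNorm.
# Names and sizes are hypothetical, not from the GR-3 release.
import torch
import torch.nn as nn


class RMSNorm(nn.Module):
    """Root-mean-square normalization (no mean-centering, no bias)."""

    def __init__(self, dim: int, eps: float = 1e-6):
        super().__init__()
        self.eps = eps
        self.weight = nn.Parameter(torch.ones(dim))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        rms = x.pow(2).mean(dim=-1, keepdim=True).add(self.eps).rsqrt()
        return x * rms * self.weight


class ActionDiTBlock(nn.Module):
    """One transformer block of a diffusion-style action head.

    Action tokens attend to tokens from a separate vision-language
    branch: two token streams with their own weights, coupled through
    attention (the "mixture of transformers" idea in miniature).
    """

    def __init__(self, dim: int = 512, n_heads: int = 8):
        super().__init__()
        self.norm1 = RMSNorm(dim)
        self.cross_attn = nn.MultiheadAttention(dim, n_heads, batch_first=True)
        self.norm2 = RMSNorm(dim)
        self.mlp = nn.Sequential(
            nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim)
        )

    def forward(self, actions: torch.Tensor, vl_tokens: torch.Tensor) -> torch.Tensor:
        # Cross-attention: action tokens query the vision-language stream.
        h = self.norm1(actions)
        attn_out, _ = self.cross_attn(h, vl_tokens, vl_tokens)
        actions = actions + attn_out
        # Position-wise MLP with a second RMSNorm, residual throughout.
        actions = actions + self.mlp(self.norm2(actions))
        return actions


# Toy forward pass: 8 action tokens conditioned on 64 vision-language tokens.
block = ActionDiTBlock()
actions = torch.randn(1, 8, 512)     # noisy action-chunk tokens
vl_tokens = torch.randn(1, 64, 512)  # output of the vision-language branch
print(block(actions, vl_tokens).shape)  # torch.Size([1, 8, 512])
```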
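The three-in-one training recipe amounts to sampling batches from heterogeneous sources with fixed mixture weights. The sketch below shows one generic way to do that; the source labels match the report, but the weights and the sampling scheme are hypothetical assumptions, since the report does not disclose GR-3's actual data proportions.

```python
# Illustrative sketch only: weighted sampling over the three data sources
# named in the report. The weights below are hypothetical.
import random

SOURCES = {
    "teleoperation": 0.5,     # high-quality robot teleoperation episodes
    "vr_trajectories": 0.3,   # low-cost human VR trajectories (~450/hour)
    "image_text": 0.2,        # public image-text pairs for generalization
}


def sample_source(rng: random.Random) -> str:
    """Draw one training source according to the mixture weights."""
    names, weights = zip(*SOURCES.items())
    return rng.choices(names, weights=weights, k=1)[0]


rng = random.Random(0)
counts = {name: 0 for name in SOURCES}
for _ in range(10_000):
    counts[sample_source(rng)] += 1
print(counts)  # counts come out roughly proportional to the weights
```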
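Finally, on the "smooth trajectory generation" attributed to ByteMini's whole-body controller: the report does not disclose the actual method, but a classic minimum-jerk profile is one common way controllers produce smooth joint motion with zero velocity and acceleration at both endpoints. The sketch below is a generic example, not ByteMini's implementation.

```python
# Illustrative sketch only: a minimum-jerk position profile, a generic
# example of smooth trajectory generation (not ByteMini's actual method).
def minimum_jerk(q0: float, qf: float, t: float, duration: float) -> float:
    """Position at time t on a minimum-jerk path from q0 to qf."""
    tau = min(max(t / duration, 0.0), 1.0)  # normalized time in [0, 1]
    s = 10 * tau**3 - 15 * tau**4 + 6 * tau**5  # zero vel/accel at endpoints
    return q0 + (qf - q0) * s


# Sample a 2-second move of one joint from 0.0 to 1.2 rad at 10 Hz.
samples = [round(minimum_jerk(0.0, 1.2, k * 0.1, 2.0), 3) for k in range(21)]
print(samples)
```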