Core Viewpoint
- The article covers the launch of RynnBrain, which Alibaba's DAMO Academy describes as the first embodied brain model with spatiotemporal memory. The model is meant to improve how embodied robots understand and interact with the physical world [7][9][76].

Group 1: RynnBrain Model Features
- RynnBrain is a family of seven models ranging from 2B to 30B parameters. It is designed to understand both "time" and "space," so it can remember past trajectories and predict future actions [7][9].
- Across 20 benchmarks it outperforms leading models such as Nvidia's Cosmos-reason2 and Google's Gemini Robotics ER 1.5, setting 16 state-of-the-art (SOTA) results [7].
- RynnBrain-30B-A3B, described as the first embodied model built on a mixture-of-experts (MoE) architecture, activates only 3B parameters at inference yet surpasses the performance of a 72B model [10][11].

Group 2: Training and Data Utilization
- The model was trained on more than 20 million high-quality data pairs drawn from a range of multimodal training datasets, to strengthen its understanding of physical space [19][20].
- A distinctive part of the training was the generation of 1 million egocentric (first-person) OCR question-answer pairs, which let the robot read labels and numbers in its environment [21][23].

Group 3: Functional Capabilities
- RynnBrain is flexible in both input and output: it processes images and videos of varying resolutions and can emit several output modalities, such as trajectories and poses [26][28].
- Its spatiotemporal memory lets it keep track of object locations and trajectories even after an interruption, which is crucial for long-horizon tasks [34][40].

Group 4: System Architecture and Scalability
- The model uses a "big brain-small brain" layered architecture: RynnBrain handles long-term planning and scene understanding, while a smaller execution layer handles motor control [54][56].
- This separation allows each layer to be iterated on independently and makes the system easier to adapt to tasks such as complex navigation and planning [57][58].

Group 5: Open Source and Industry Impact
- DAMO Academy has open-sourced RynnBrain together with the full training code and a new evaluation benchmark, RynnBrain-Bench, which assesses a model's understanding of video sequences and spatial positioning [60][62].
- The release aims to lower barriers in the industry by providing shared infrastructure for understanding physical concepts, improving system efficiency, and fostering healthy competition among teams [66][69].
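
The layered design in Group 4, combined with the spatiotemporal memory in Group 3, can be illustrated with a toy sketch. It is not RynnBrain's code or API: the class names (`SpatioTemporalMemory`, `Planner`, `Controller`), the grid world, and the frame format are all assumptions made up for illustration. It only shows the general idea of a planning layer that recalls where an object was last seen and a separate execution layer that moves toward it, even when the object is no longer in view.

```python
from dataclasses import dataclass, field


@dataclass
class SpatioTemporalMemory:
    """Hypothetical store: last known (step, position) for each object."""
    last_seen: dict = field(default_factory=dict)

    def observe(self, step, detections):
        # Record when and where each detected object was seen.
        for name, position in detections.items():
            self.last_seen[name] = (step, position)

    def recall(self, name):
        # Returns (step, position), or None if the object was never seen.
        return self.last_seen.get(name)


class Planner:
    """Stand-in for the 'big brain' layer: picks a goal from memory."""

    def __init__(self, memory):
        self.memory = memory

    def next_waypoint(self, target):
        record = self.memory.recall(target)
        return None if record is None else record[1]


class Controller:
    """Stand-in for the 'small brain' layer: moves one grid cell per call."""

    def step_towards(self, current, waypoint):
        def move(a, b):
            # Step by +1, -1, or 0 along one axis.
            return a + (b > a) - (b < a)

        return (move(current[0], waypoint[0]), move(current[1], waypoint[1]))


def run_episode(frames, target, start, max_steps=20):
    """frames[i] maps object names to grid positions visible at step i."""
    memory = SpatioTemporalMemory()
    planner = Planner(memory)
    controller = Controller()
    position = start
    for step in range(max_steps):
        detections = frames[step] if step < len(frames) else {}
        memory.observe(step, detections)
        waypoint = planner.next_waypoint(target)
        if waypoint is None:
            continue  # target never seen so far: wait for more frames
        if position == waypoint:
            break  # reached the remembered location
        position = controller.step_towards(position, waypoint)
    return position


if __name__ == "__main__":
    # The cup is visible only in the first frame; later frames are empty,
    # so the planner has to rely on memory to keep heading toward it.
    frames = [{"cup": (2, 3)}, {}, {}]
    print(run_episode(frames, "cup", start=(0, 0)))
```

Because the planner and controller only share a waypoint, either one could be swapped out without touching the other, which is the modularity argument the summary makes for the layered design.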
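Alibaba DAMO Academy open-sources an embodied brain foundation model: 3B active parameters outperform a 72B model, and there is hope for robots that forget things the moment they turn around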