General-Purpose Embodied Agents (通用具身智能体)

机器之心· 2025-12-21 04:21

Core Viewpoint
- The article introduces NitroGen, Nvidia's latest open-source model, which can play more than 1,000 different games by generating controller inputs itself, a notable advance in cross-game adaptability [5][6][8].

Group 1: Model Overview
- NitroGen plays a wide range of genres, including RPGs, platformers, and racing games, by processing game video frames directly and generating controller signals [6][8].
- The model supports fine-tuning on new games, so it can adapt quickly rather than train from scratch, which points to its potential for cross-game generalization [8].
- NitroGen's architecture is based on the GR00T N1.5 framework, which was originally designed for robotics and was adapted to games with minimal modifications [12].

Group 2: Key Components
- NitroGen consists of three core components: a multi-game agent, a universal simulator, and a large-scale dataset of gameplay videos [15][16][17].
- The multi-game agent generates controller commands from game observations, enabling zero-shot play across many titles [15].
- The universal simulator standardizes interaction with different games behind the Gymnasium API, which makes large-scale training and evaluation practical [16].
- The dataset comprises 40,000 hours of publicly available gameplay video covering more than 1,000 games, with automatically generated action labels [17][24].

Group 3: Data Collection and Processing
- Player actions were extracted from videos that show "input overlays," on-screen displays of the player's controller inputs in real time [18][19].
- The research team used key-point matching and segmentation to locate the controller display in each video, so that the model cannot "cheat" by reading the overlay [21].
- The dataset spans a diverse mix of game types: action RPGs account for 34.9% of total video duration, followed by platformers at 18.4% [26].

Group 4: Performance and Results
- NitroGen performs well across game types, including 3D action games and 2D platformers, and achieves non-trivial task completion rates [28][30].
- Fine-tuning NitroGen on a new game raised task success rates by up to 52% in relative terms compared with models trained from scratch [32].
- The research presents NitroGen as a foundational step toward general-purpose embodied agents that can interact with complex environments [35][36].
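
The role of the universal simulator in Group 2 can be illustrated with a minimal sketch. It is not NitroGen's code: `ToyGameEnv`, `dummy_policy`, `run_episode`, and the action dictionary layout are hypothetical names invented here, and the class only mirrors the Gymnasium calling convention (`reset` returns an observation and an info dict; `step` returns observation, reward, terminated, truncated, info) without importing the library. The point is that once every game exposes the same two methods, one agent loop can drive any of them.

```python
import random


class ToyGameEnv:
    """Hypothetical stand-in for one game behind a Gymnasium-style interface."""

    def __init__(self, width=4, height=3, max_steps=5):
        self.width = width
        self.height = height
        self.max_steps = max_steps
        self._rng = random.Random()
        self._t = 0

    def _frame(self):
        # Fake grayscale frame: `height` rows of `width` pixel values in 0..255.
        return [[self._rng.randrange(256) for _ in range(self.width)]
                for _ in range(self.height)]

    def reset(self, seed=None):
        # Gymnasium convention: reset returns (observation, info).
        if seed is not None:
            self._rng = random.Random(seed)
        self._t = 0
        return self._frame(), {}

    def step(self, action):
        # Gymnasium convention: step returns
        # (observation, reward, terminated, truncated, info).
        self._t += 1
        reward = 1.0 if action["buttons"]["A"] else 0.0  # toy reward rule
        terminated = False                    # this toy game never "ends" on its own
        truncated = self._t >= self.max_steps  # episode is cut off by a time limit
        return self._frame(), reward, terminated, truncated, {"t": self._t}


def dummy_policy(frame):
    """Assumed placeholder for the agent: maps a frame to a controller action."""
    pixels = [p for row in frame for p in row]
    bright = sum(pixels) / len(pixels) > 127.5
    return {"left_stick": (0.0, 0.0), "buttons": {"A": bright, "B": False}}


def run_episode(env, policy, seed=0):
    """Game-agnostic loop: works for any object with the same reset/step shape."""
    obs, _info = env.reset(seed=seed)
    total_reward, steps, done = 0.0, 0, False
    while not done:
        obs, reward, terminated, truncated, _info = env.step(policy(obs))
        total_reward += reward
        steps += 1
        done = terminated or truncated
    return total_reward, steps


if __name__ == "__main__":
    print(run_episode(ToyGameEnv(), dummy_policy))
```

Swapping in a different game would only require another class with the same `reset` and `step` signatures; `run_episode` and the policy stay unchanged, which is the property the summary attributes to the simulator.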

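Peking University releases ManualVLA: the first long-horizon unified "generation-understanding-action" model, which autonomously generates a manual from the final state and completes the manipulation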

机器之心· 2025-12-18 09:08

Core Insights
- The article discusses the limitations of existing VLA models on long-horizon tasks that require a precisely defined final state, such as LEGO assembly and object rearrangement, and argues for a more integrated approach [2][9].
- It introduces ManualVLA, a model that combines planning and action generation in a unified framework, improving the efficiency and effectiveness of robotic manipulation [3][5].

Group 1: Research Background and Challenges
- Recent advances in VLA models have contributed significantly to general embodied intelligence, but coordinating high-level planning with precise control on long-horizon tasks remains a challenge [9].
- Existing hierarchical methods generalize poorly to unseen final states and often depend on manually written instructions or human demonstration videos, which raises system complexity and deployment cost and limits generalization [9].

Group 2: ManualVLA Methodology
- ManualVLA generates its own instructions and then executes actions based on them, breaking a complex long-horizon task into manageable steps [10][12].
- The model uses a Mixture-of-Transformers (MoT) architecture that integrates a planning expert, which generates multimodal operation manuals, with an action expert, which executes the task based on those manuals [5][14].

Group 3: Experimental Results
- On real-world tasks, ManualVLA achieved an average success rate approximately 32% higher than the latest baseline methods [7][28].
- In experiments on 2D LEGO assembly, 3D LEGO assembly, and object rearrangement, the model produced high-quality intermediate images and kept the mean absolute error (MAE) of its predicted target object positions low [24][27].

Group 4: Training Phases
- Training consists of three phases: pre-training on a large dataset of robotic trajectories, using a digital twin tool for 3D reconstruction and for generating instruction-manual data, and fine-tuning on real-world expert demonstration trajectories [20][21][19].

Group 5: Generalization and Robustness
- ManualVLA generalizes robustly, maintaining high success rates under varying backgrounds, object shapes, and lighting conditions, and outperforming baseline models in these scenarios [33][37].
- Ablation studies confirm that both the explicit and the implicit reasoning paths are essential for optimal performance on long-horizon tasks [33].
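
For readers unfamiliar with the position metric cited in Group 3, the following is a minimal sketch of mean absolute error over predicted target positions. The summary does not say how the paper averages the error, so averaging over every coordinate of every point is an assumption made here, and `mean_absolute_error` and the sample coordinates are hypothetical illustrations rather than values from the experiments.

```python
def mean_absolute_error(predicted, target):
    """MAE over all coordinates of all points (assumed averaging scheme).

    `predicted` and `target` are equal-length sequences of coordinate tuples,
    for example (x, y) pixel positions of the object to be placed.
    """
    if not predicted or len(predicted) != len(target):
        raise ValueError("need two non-empty sequences of equal length")
    total, count = 0.0, 0
    for p, t in zip(predicted, target):
        if len(p) != len(t):
            raise ValueError("each point pair must have the same dimension")
        for pc, tc in zip(p, t):
            total += abs(pc - tc)  # absolute error of one coordinate
            count += 1
    return total / count


if __name__ == "__main__":
    # Made-up positions: per-coordinate errors are 2, 0, 2, 3, so MAE = 7 / 4.
    predicted = [(10.0, 20.0), (32.0, 41.0)]
    target = [(12.0, 20.0), (30.0, 44.0)]
    print(mean_absolute_error(predicted, target))  # prints 1.75
```

A lower value means the positions written into the generated manual lie closer to where the objects should actually end up.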

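Robot Era (星动纪元) unveils ERA-42, its end-to-end native robot foundation model, ushering embodied foundation models into the era of dexterous manipulation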

IPO早知道· 2024-12-24 02:56

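Notably, this is the first time in the industry that a single embodied foundation model has enabled a five-fingered dexterous hand to use a variety of tools the way a human hand does and to complete more than a hundred complex dexterous manipulation tasks. The model needs no pre-programmed skills: relying entirely on its strong generalization and adaptability, it can learn a new task in under 2 hours from a small amount of collected data. ERA-42 also continues to learn new skills rapidly. In addition, ERA-42 is the world's first true embodied foundation model for a five-fingered dexterous hand, opening the era of general-purpose dexterous manipulation for embodied foundation models. Thanks to its predictive capability, ERA-42, the end-to-end native robot foundation model built by Robot Era, has strong generalization, adaptation, and scaling capabilities. Combined with the new hardware platform that Robot Era built for AI, it allows embodied agents to reach hardware-software co-evolution and commercial deployment quickly.

1. Compared with a gripper, Robot Era's XHAND1 can already complete more than 100 fine-grained, intelligent, complex dexterous manipulation tasks.
2. Robot Era's native robot foundation model ERA-42 can understand the physical world and predict the future.
3. ERA-42 has stronger generalization ability.
4. ERA-42 has stronger adaptability.
5. ERA-42 shows early signs of a "scaling effect."

Of course, building a general-purpose embodied agent requires software and hardware to iterate together, just as the human brain and body grow in step from childhood ...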