具身智能之心
A 264 Million Yuan Order! A New Global Humanoid Robot Record
具身智能之心· 2025-11-26 04:00
Core Viewpoint
- The article highlights UBTECH's recent success in securing a significant humanoid-robot contract, indicating a strong market position and growth potential in the humanoid robotics sector [2][4].

Group 1: Contract and Financials
- UBTECH announced a contract worth 264 million yuan for a humanoid-robot data collection and testing center in Fangchenggang, Guangxi, with its latest humanoid model, the Walker S2, as the primary product [2].
- Total order value for the Walker S2 has reached 1.1 billion yuan this year, the largest single-product sales figure in the global humanoid robot market [4].

Group 2: Market Impact
- Delivery of the S2 has begun, targeting primarily the manufacturing and logistics sectors, and is expected to boost confidence in the global humanoid robot market [5].
Robots Plus Basketball: Any Potential? HKUST Unlocks the World's First Real-World Basketball Robot Demo!
具身智能之心· 2025-11-26 00:05
Edited by 量子位 (QbitAI)

A 1.3-meter "little spud" of a robot can now pull off a remarkably smooth three-step layup. Don't get the wrong idea: this Unitree G1 is not entering the NBA draft just yet, but with the "real-world basketball" skill it just unlocked, a starting spot in a village-league "村BA" game can't be far off. This is reportedly the world's first robot demo to complete basketball moves in a real-world setting, from a research team at the Hong Kong University of Science and Technology (HKUST). The team has not yet published full technical details, but given their earlier work on robot basketball, this demo is most likely a further refinement of that research. Let's take a closer look.

SkillMimic-v2

First is SkillMimic-V2: Learning Robust and Generalizable Interaction Skills from Sparse and Noisy Demonstrations, accepted at SIGGRAPH 2025. SkillMimic-V2 targets the core difficulty of reinforcement learning from interaction demonstrations (RLID): demonstration trajectories that are sparse, noisy, and insufficient in coverage. It does this by ...
Beyond 270,000 Hours of Real-World Manipulation Trajectories and GEN-0: What Else at Generalist AI Is Worth a Deep Dive?
具身智能之心· 2025-11-26 00:05
On November 4, Generalist AI released Gen-0, an embodied foundation model of unprecedented data scale. The embodied-AI unicorn, founded by former Google DeepMind senior research scientist Pete Florence, with Andrew Barry as CTO and Andy Zeng as chief scientist, has stunned the field twice in just a few months with results published on its website: first with four videos of dual-arm, long-horizon manipulation tasks of high difficulty and demanding precision, and now with Gen-0. GEN-0's strength rests on pretraining over Generalist AI's in-house robot dataset: 270,000 hours of real-world manipulation trajectories, the largest dataset in the embodied field today, including 300 million trajectories for clothing handling alone. By comparison, DROID contains seventy-odd thousand demonstration trajectories, Agibot World/Open X-Embodiment exceed one million, and π0.5 collected roughly 400 hours of real-robot data in mobile-manipulation environments. In terms of order of magnitude, Gener ...
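The dataset comparison above can be put in rough numbers. This is a back-of-envelope sketch using only the figures quoted in the article (trajectory counts are approximate as reported, not official statistics):

```python
# Order-of-magnitude comparison of the dataset sizes quoted above.
# All figures are as reported in the article; counts are approximate.
GEN0_HOURS = 270_000               # Generalist AI real-world manipulation hours
GEN0_CLOTHING_TRAJS = 300_000_000  # clothing-handling trajectories alone
DROID_TRAJS = 70_000               # DROID, "seventy-odd thousand" (approx.)
OXE_TRAJS = 1_000_000              # Agibot World / Open X-Embodiment ("1M+")
PI05_HOURS = 400                   # pi 0.5 mobile-manipulation real-robot hours

print(f"Gen-0 vs pi0.5, hours of data: {GEN0_HOURS / PI05_HOURS:.0f}x")
print(f"Clothing trajectories vs DROID: {GEN0_CLOTHING_TRAJS / DROID_TRAJS:.0f}x")
print(f"Clothing trajectories vs OXE:   {GEN0_CLOTHING_TRAJS / OXE_TRAJS:.0f}x")
```

Even against the next-largest public trajectory collections, the gap is two to four orders of magnitude, which is the point the article is making.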
ActDistill: Tongji University's Action-Guided Distillation Framework Speeds Up Robot Inference by 1.67x
具身智能之心· 2025-11-26 00:05
Group 1
- The article discusses the challenges of deploying Vision-Language-Action (VLA) models in real-time or resource-constrained robotic systems due to high computational costs and inference delays [2][3].
- Existing efficient VLA strategies often prioritize visual-language model optimizations, leading to loss of key information and incoherent action semantics [2][3].

Group 2
- The proposed ActDistill framework addresses these issues with an action-prediction-oriented distillation framework that balances efficiency and fidelity while preserving action-prediction accuracy [3][4].
- ActDistill consists of two core modules, Graph-Structured Encapsulation and Action-Guided Self-Derived Distillation, which together model action semantics and guide knowledge distillation [4][8].

Group 3
- The Graph-Structured Encapsulation module explicitly models the hierarchical evolution of action semantics and separates task-related interactions from redundant background signals [6].
- The Action-Guided Self-Derived Distillation module uses a lightweight student model that mirrors the teacher's structure at reduced depth, with dynamic routing that adaptively predicts layer-gating scores [8][11].

Group 4
- Experimental results show that ActDistill achieves a 73.95% success rate with a 1.59x speed-up and a 50.5% reduction in computational load compared to full models [9][12].
- The framework demonstrates significant efficiency and performance gains across benchmarks, including LIBERO and SIMPLER [12][13].

Group 5
- The article highlights the importance of the Graph-Structured Encapsulation module: replacing it with a simpler architecture led to a significant drop in performance [13].
- The framework maintains trajectory stability and focuses attention on action-relevant areas, showing its effectiveness in practical applications [16][17].

Group 6
- ActDistill represents a novel action-centered approach to compressing VLA models, cutting computational load by over 50% while maintaining task success rates [24].
- Future directions include teacher-free or reinforcement-learning-guided variants and integrating long-horizon temporal reasoning into the routing mechanism for greater adaptability [24].
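To make the "dynamic routing with layer-gating scores" idea concrete, here is a minimal numpy sketch of a student network where a lightweight router scores each layer and low-scoring layers are skipped at inference. All names, dimensions, and the residual-MLP stand-in for transformer blocks are hypothetical; this is not ActDistill's actual implementation.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

class GatedStudent:
    """Hypothetical sketch of dynamic layer routing: a small router predicts
    one gating score per layer; layers scoring below a threshold are skipped,
    trading a little fidelity for less compute."""

    def __init__(self, d_model=32, n_layers=6, threshold=0.5, seed=0):
        rng = np.random.default_rng(seed)
        # Each "layer" is a residual MLP block standing in for a transformer
        # block in a real student model.
        self.weights = [rng.standard_normal((d_model, d_model)) * 0.05
                        for _ in range(n_layers)]
        # Lightweight router: pooled input features -> one score per layer.
        self.router_w = rng.standard_normal((d_model, n_layers)) * 0.1
        self.threshold = threshold

    def forward(self, x):
        # x: (seq_len, d_model); route on the mean-pooled input features.
        scores = sigmoid(x.mean(axis=0) @ self.router_w)  # (n_layers,)
        executed = 0
        for w, s in zip(self.weights, scores):
            if s > self.threshold:          # skip layers gated off
                x = x + np.tanh(x @ w)      # residual block
                executed += 1
        return x, scores, executed

x = np.random.default_rng(1).standard_normal((10, 32))
out, scores, executed = GatedStudent().forward(x)
print(out.shape, f"{executed}/6 layers executed")
```

In a real distillation setup, the router would be trained jointly so that the executed layers' features align with the teacher's; here the weights are random purely to show the control flow.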
Nearly 3,000 Members: This Embodied-AI Community Has Real Substance
具身智能之心· 2025-11-26 00:05
We have recently been consolidating several key modules of embodied-AI research for our members: industry content, robot hardware (embodiments), algorithms, and deployment approaches, all gathered inside our community.

XLerobot has some mobility, though limited; it suits entry-level research and individual developers and can handle some mobile-manipulation tasks. Other development platforms cost more and require real financial investment; consider the models from 方舟无限, 星海图, and Unitree.

We have also mapped out the companies working on embodied "brains" and robot hardware (it turns out even the hardware race is getting crowded...), along with the more active embodied-AI labs, to help members with graduate-school and career decisions. On top of that, there are many industry research reports for judging the field's development and its cycles.

On hardware, several products suit research work: the SO-100 series, OpenArm series, and XLerobot series. The SO-100 and its upgraded versions can run some VA and VLA algorithms, and common functionality is achievable. OpenArm is a dual-arm task framework now being produced by several companies; it lacks mobility, but tasks like folding clothes and pick-and-place are within reach. For data collection, though, the VR version is more comfortable.

The above is what we share in our embodied-AI community, and students looking to get started or level up are welcome to join us. Over nearly a year of building, the community has developed sections for technical roadmaps, livestreams, Q&A, job hunting, competitions, and more. Here ...
Meta Unveils WorldGen: "Build" a 50×50-Meter City from a Single Sentence
具身智能之心· 2025-11-25 00:03
Core Insights
- Meta has introduced a research project called WorldGen that generates fully navigable, interactive 3D worlds from simple text prompts [12][22][30].
- The technology combines advanced procedural reasoning, diffusion models, and object-oriented scene decomposition to create coherent, visually rich 3D environments [13][19][29].

Group 1: Technology Overview
- WorldGen creates a 3D world from a simple prompt, such as "a medieval village in cartoon style," producing a consistent, themed environment [5][12].
- The generated worlds are not static images: they are interactive and allow free movement within the space, maintaining structural integrity and connectivity between areas [9][12].
- Unlike existing methods that often degrade in quality when viewed from different angles, WorldGen maintains high-quality textures and geometry across a 50 x 50 meter area [19][29].

Group 2: Development and Future Plans
- WorldGen is still in the research phase and not yet available to developers, but its output is compatible with major game engines such as Unity and Unreal without additional conversion steps [22][31].
- Future iterations are expected to support larger-scale world generation and lower generation latency [20][22].
- WorldGen signals a shift in 3D content creation, making it accessible to non-experts and potentially reshaping workflows across industries [22][30].
Latest from DAMO Academy! RynnVLA-002: Unifying VLA and World Models
具身智能之心· 2025-11-25 00:03
Core Insights
- The article discusses RynnVLA-002, which enhances robot control by integrating Vision-Language-Action (VLA) models with world models to improve action generation, environmental understanding, and future prediction [3][4][37].
- RynnVLA-002 achieves a 97.4% success rate in simulated environments and a 50% improvement on real-world robot tasks, demonstrating its effectiveness in bridging perception, understanding, action, and prediction [19][20][37].

Summary by Sections

Introduction to RynnVLA-002
- RynnVLA-002 addresses the limitations of existing VLA models and world models through a dual-enhancement framework that enables better action generation and scene prediction [4][7].

Key Components
- The model employs a unified multimodal encoding scheme that integrates visual, textual, and action data into a single vocabulary, facilitating cross-modal understanding and generation [8][10].
- A dual-enhancement architecture lets the VLA and world models mutually improve each other's performance [10][11].
- A mixed action-generation mechanism tackles error accumulation and generalization issues in traditional action generation [12][17].

Experimental Results
- In simulation, RynnVLA-002 achieved average success rates of 97.4% for continuous actions and 93.3% for discrete actions, outperforming pre-trained baselines [20][19].
- On real-world tasks, the model reached a 90% success rate placing blocks and 80% placing strawberries, showing robustness in complex scenarios [23][24].

Ablation Studies
- Integrating world models significantly improved VLA performance: discrete-action success rose from 62.8% to 67.2% and continuous-action success from 91.6% to 94.6% [27][28].
- The action attention-mask strategy improved long-sequence action-generation success rates by over 30% [34].

Conclusion and Future Directions
- RynnVLA-002 establishes a closed-loop ecosystem for robot control, effectively addressing the challenges of perception, understanding, action, and prediction [37][40].
- Future enhancements may add modalities such as touch and sound and further optimize the model for complex environments [40].
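The "single vocabulary" idea for unifying modalities can be sketched as disjoint ID ranges in one token space. All sizes below (text vocabulary, visual codebook, action bins, action range) are hypothetical placeholders, not RynnVLA-002's actual configuration:

```python
# Minimal sketch of a unified multimodal vocabulary: text tokens, visual
# codebook indices, and discretized action bins are offset into disjoint
# ID ranges of one shared vocabulary. Sizes are illustrative assumptions.
TEXT_VOCAB = 32_000      # assumed text tokenizer size
VISION_CODES = 8_192     # assumed visual codebook size
ACTION_BINS = 256        # assumed bins per action dimension

VISION_OFFSET = TEXT_VOCAB
ACTION_OFFSET = TEXT_VOCAB + VISION_CODES
VOCAB_SIZE = TEXT_VOCAB + VISION_CODES + ACTION_BINS

def action_to_token(value, lo=-1.0, hi=1.0):
    """Discretize a continuous action value in [lo, hi] to a shared-vocab ID."""
    bin_idx = min(int((value - lo) / (hi - lo) * ACTION_BINS), ACTION_BINS - 1)
    return ACTION_OFFSET + bin_idx

def token_to_action(token, lo=-1.0, hi=1.0):
    """Map a shared-vocab action token back to its bin-center value."""
    bin_idx = token - ACTION_OFFSET
    return lo + (bin_idx + 0.5) * (hi - lo) / ACTION_BINS

tok = action_to_token(0.25)
print(tok, round(token_to_action(tok), 4))  # 40352 0.2539
```

Because all three modalities live in one ID space, a single autoregressive decoder can emit text, predicted frames (as codebook indices), or actions without separate heads, which is what makes the mutual VLA/world-model enhancement possible.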
NUS Proposes VLA-4D: A 4D-Aware VLA Model for Spatiotemporally Coherent Robot Manipulation
具身智能之心· 2025-11-25 00:03
Core Concept
- The article introduces VLA-4D, a 4D-aware VLA model that improves the spatial and temporal coherence of robot manipulation by integrating spatial and temporal information, thereby improving both visual reasoning and action planning [2][4].

Group 1: Model Design and Technical Details
- VLA-4D innovates through dual spatial-temporal fusion: embedding 4D (3D space + 1D time) information into visual representations for reasoning, and incorporating time variables into action representations for planning [5].
- 2D VLA models rely on single-frame image input, leading to coarse visual reasoning and spatial inaccuracies, while 3D VLA models lack explicit temporal modeling, resulting in jerky motion [6].
- A "4D embedding + cross-attention fusion" representation addresses the lack of spatial-temporal precision in visual reasoning [7][10].

Group 2: Dataset and Training Process
- Existing VLA datasets lack temporal action annotations, so the authors extended the LIBERO dataset to 40 sub-tasks and 150,000 vision-language-action samples [15][16].
- A two-stage training process significantly improves task success rates and reduces execution times compared with single-stage fine-tuning [17][18].

Group 3: Experimental Validation and Key Findings
- On the LIBERO benchmark, VLA-4D outperforms state-of-the-art models with a 97.4% success rate and an average completion time of 5.8 seconds across tasks [19][21].
- The model shows superior generalization on zero-shot tasks, maintaining higher success rates and shorter execution times [20].
- Ablation studies confirm the necessity of the visual-representation modules: combining spatial and temporal embeddings raises success rates and lowers completion times [24][27].
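The "4D embedding + cross-attention fusion" step can be illustrated with a small numpy sketch: visual tokens act as queries against embeddings of their (x, y, z, t) coordinates, and the attended result is fused back residually. The layout, dimensions, and single-head attention are illustrative assumptions, not the paper's exact module:

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def fuse_4d(visual, coords_4d, w_q, w_k, w_v):
    """Sketch of 4D cross-attention fusion: visual features query projected
    (x, y, z, t) coordinate embeddings; the result is added residually."""
    d = w_q.shape[1]
    q = visual @ w_q                    # queries from visual features
    k = coords_4d @ w_k                 # keys from 4D coordinates
    v = coords_4d @ w_v                 # values from 4D coordinates
    attn = softmax(q @ k.T / np.sqrt(d))
    return visual + attn @ v            # residual fusion into visual tokens

rng = np.random.default_rng(0)
n, d_vis, d = 16, 32, 32
visual = rng.standard_normal((n, d_vis))
coords = rng.standard_normal((n, 4))    # (x, y, z, t) per visual token
w_q = rng.standard_normal((d_vis, d)) * 0.1
w_k = rng.standard_normal((4, d)) * 0.1
w_v = rng.standard_normal((4, d_vis)) * 0.1
fused = fuse_4d(visual, coords, w_q, w_k, w_v)
print(fused.shape)  # (16, 32)
```

The point of the design is that spatial position and time enter the visual representation jointly, rather than time being bolted on afterwards, which is what 3D-only VLA models lack.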
Making Embodied Development Easy: Digua Robotics' S600 and One-Stop Platform Officially Debut
具身智能之心· 2025-11-25 00:03
Core Insights
- The article highlights Digua Robotics' advances in embodied intelligence, centered on the launch of the S600 high-performance development platform and a one-stop development platform to accelerate the deployment and commercialization of embodied intelligent robots [1][2][29].

Group 1: Product Launches and Features
- The S600 is a flagship embodied-intelligence robot development platform delivering 560 TOPS (INT8) of compute, built on an efficient "brain" architecture that supports a range of large-model algorithms [8][9].
- The one-stop development platform offers three main services: a data closed-loop system for efficient data generation and annotation, an embodied-intelligence training ground for comprehensive support, and agent-development services that simplify robot development [11][19].

Group 2: Strategic Partnerships and Ecosystem
- The company announced strategic partnerships with industry leaders including Fourier and GAC Group, the first global customers of the S600 platform [20][22].
- Digua Robotics is collaborating with over 60 partners across the industry chain on integrated hardware-software solutions, significantly lowering development barriers and deployment costs [24][27].

Group 3: Vision and Future Directions
- Digua Robotics' CEO stated that embodied intelligence will reshape efficiency across industries, and the company aims to provide foundational infrastructure that helps developers overcome common challenges in robot development [2][4].
- The company focuses on three areas: enhancing existing robot products, accelerating robot deployment in diverse scenarios, and laying the groundwork for general embodied intelligent robots [24][29].
Not Sure Which Platform to Pick for Embodied Research? Others Have Already Deployed π0.5 on It...
具身智能之心· 2025-11-24 10:02
Core Viewpoint
- The article highlights the launch of the Imeta-Y1, a lightweight, cost-effective robotic arm designed for beginners and researchers in embodied intelligence, emphasizing its open-source tools and user-friendly features [3][4][6].

Product Features
- The Imeta-Y1 targets newcomers and researchers, offering a high-performance robotic arm at an affordable price [3].
- It ships with a complete open-source toolchain and code examples, enabling a seamless workflow from data collection to model deployment [4][18].
- It supports dual-language interfaces (Python/C++) and is compatible with ROS1/ROS2, letting users adapt quickly regardless of programming background [4][19].
- A compact structure and modular interfaces make it suitable for embedded-AI and robot-learning platform development [7].

Technical Specifications
- The arm weighs 4.2 kg, has a rated load of 3 kg and 6 degrees of freedom, a working radius of 612.5 mm, and a repeatability of ±0.1 mm [9][20].
- It runs on a 24 V supply and communicates over CAN; control methods include trajectory tracking, teaching, and an API [9][20].
- Joint motion ranges and maximum speeds are specified, ensuring precise control across applications [22].

Development and Support
- A comprehensive open-source SDK includes drivers, API interfaces, sample code, and documentation, supporting rapid application development [31][30].
- Multi-modal data-fusion capabilities, compatible with mainstream frameworks such as TensorFlow and PyTorch, enable end-to-end intelligent algorithms [37][18].
- After-sales support promises a 24-hour response time for customer inquiries [20][49].

Testing and Reliability
- Rigorous hardware testing validates the arm's accuracy, durability, load performance, and stability across application scenarios [40][44].
- The product carries a six-month warranty against non-human damage, with timely delivery and support services [50][49].
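As a small worked example of using the published specs, here is a pre-motion sanity check against the arm's nominal workspace and rated load. The function is illustrative and not part of the vendor SDK; only the numeric constants (612.5 mm working radius, 3 kg rated load, ±0.1 mm repeatability) come from the article:

```python
import math

# Published Imeta-Y1 specs (from the article); the check itself is hypothetical.
WORK_RADIUS_MM = 612.5
RATED_LOAD_KG = 3.0
REPEATABILITY_MM = 0.1

def check_target(x_mm, y_mm, z_mm, payload_kg):
    """Return True if the Cartesian target lies inside the nominal workspace
    sphere (measured from the arm base) and the payload is within rated load."""
    reach = math.sqrt(x_mm**2 + y_mm**2 + z_mm**2)
    return reach <= WORK_RADIUS_MM and payload_kg <= RATED_LOAD_KG

print(check_target(300, 200, 400, 2.5))  # True: reach ~538.5 mm, 2.5 kg OK
print(check_target(500, 300, 200, 2.5))  # False: reach ~616.4 mm > 612.5 mm
```

A real deployment would also check per-joint limits and self-collision, which this spherical-workspace approximation ignores.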