具身智能之心
ICCV 2025 Highlight | UnrealZoo: A Large-Scale Embodied Simulation Platform
具身智能之心· 2025-11-13 02:05
Core Insights
- The article introduces UnrealZoo, a high-fidelity virtual environment platform designed to advance research in embodied AI by providing over 100 diverse and realistic 3D scenes [5][12][72]
- UnrealZoo aims to address the limitations of existing simulators by offering a flexible and rich training environment that supports various tasks and enhances the adaptability of AI agents in complex, dynamic settings [7][8][72]

Summary by Sections

Introduction to UnrealZoo
- UnrealZoo is developed on Unreal Engine and includes over 100 high-quality, realistic scenes, ranging from indoor settings to large-scale industrial environments [5][12]
- The platform features 66 customizable embodied entities, including humans, animals, and vehicles, allowing for diverse interactions and training scenarios [5][12]

Purpose and Necessity
- The rapid development of embodied AI calls for a platform that can simulate diverse, high-fidelity environments to improve the adaptability and generalization of AI agents [7][8]
- Existing simulators often restrict AI training to specific tasks, hindering the development of agents capable of functioning in unpredictable real-world scenarios [7][8]

Features of UnrealZoo
- UnrealZoo provides a comprehensive set of tools, including an optimized Python API and enhanced communication protocols, to facilitate data collection, environment customization, and multi-agent interaction [5][48]
- The platform supports tasks such as visual navigation and active target tracking, demonstrating the importance of diverse training environments for improving model generalization [5][72]

Experimental Results
- Experiments conducted with UnrealZoo highlight the significant impact of environment diversity on the performance and robustness of AI agents, particularly in complex navigation and social-interaction tasks [72]
- Results indicate that while reinforcement-learning methods show promise, a substantial gap remains between AI agents and human performance when navigating intricate environments [72]

Future Directions
- Ongoing development of UnrealZoo will focus on expanding the variety of scenes, entities, and interaction tasks to further enhance the capabilities of embodied AI in real-world applications [72]
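The summary mentions an optimized Python API for data collection and multi-agent interaction but does not show it. As a rough illustration of what a gym-style interaction loop with such a platform typically looks like, here is a minimal sketch; the `gym_unrealzoo` module name and the UnrealZoo environment id in the comments are assumptions for illustration, not the platform's confirmed API (the default id used is a standard Gymnasium environment so the loop itself runs as-is).

```python
# Minimal sketch of a gym-style rollout loop. The UnrealZoo wrapper and env id
# mentioned below are hypothetical placeholders, not the platform's real API.
import gymnasium as gym

# import gym_unrealzoo  # hypothetical: would register UnrealZoo environments


def random_rollout(env_id: str = "CartPole-v1", max_steps: int = 200) -> float:
    """Run one episode with random actions and return the episode return.

    Swap env_id for an UnrealZoo id (e.g. "UnrealZoo-Track-v0", hypothetical)
    once the platform's gym wrapper is installed.
    """
    env = gym.make(env_id)
    obs, info = env.reset(seed=0)
    episode_return = 0.0
    for _ in range(max_steps):
        action = env.action_space.sample()  # placeholder policy
        obs, reward, terminated, truncated, info = env.step(action)
        episode_return += float(reward)
        if terminated or truncated:
            break
    env.close()
    return episode_return


if __name__ == "__main__":
    print("episode return:", random_rollout())
```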
First Humanoid Robot Falls Flat on Its Face
具身智能之心· 2025-11-12 09:30
Core Viewpoint
- The article covers the unveiling of Russia's first domestically produced humanoid robot, "Aidol," highlighting its advertised features and the mishap during its presentation [2].

Group 1: Product Features
- "Aidol" is built primarily with Russian-made components and is presented as an advanced example of humanoid robotics [2].
- The robot is capable of dialogue and emotion recognition and can operate offline, with all voice processing performed on-device [2].

Group 2: Event Highlights
- During the launch event, the robot lost its balance and fell; a small black cloth was then draped over it, bringing the presentation to an awkward, comical end [3].

Group 3: Industry Comparison
- The article notes that Chinese manufacturers are already well ahead in humanoid robotics, having progressed from motion control to increasingly human-like capabilities and thus moving closer to the definition of embodied intelligence [6].
Lightweight VLA Model Evo-1: SOTA with Only 0.77B Parameters, Tackling Low-Cost Training and Real-Time Deployment
具身智能之心· 2025-11-12 04:00
Vision-Language-Action (VLA) models unify perception, language, and control, enabling robots to perform diverse tasks through multimodal understanding. However, current VLA models typically carry massive parameter counts and depend heavily on large-scale pre-training on robot data, which makes training computationally expensive and limits real-time deployment. In addition, most training paradigms degrade the perceptual representations of the vision-language backbone, causing overfitting and weakening generalization to downstream tasks.

Paper: Evo-1: Lightweight Vision-Language-Action Model with Preserved Semantic Alignment
Paper link: https://arxiv.org/abs/2511.04555

A team from Shanghai Jiao Tong University, CMU, and the University of Cambridge proposes Evo-1, a lightweight VLA model that lowers computational cost and improves deployment efficiency without any robot-data pre-training, while retaining strong performance. Evo-1 builds on a native multimodal vision-language model (VLM) and combines a novel cross-modulated diffusion transformer with an optimized integration module to form an efficient architecture. A two-stage training paradigm is further introduced that progressively aligns action with perception and fully preserves the VLM's representational capability. ...
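The summary names a cross-modulated diffusion transformer conditioned on a frozen VLM but gives no internals. The PyTorch sketch below only illustrates the general idea under stated assumptions: action tokens modulated FiLM-style by a pooled vision-language feature inside a transformer block. The layer sizes and the modulation scheme are illustrative guesses, not Evo-1's actual architecture.

```python
# Illustrative sketch (not Evo-1's actual design): a small action-head block in
# which a pooled vision-language feature modulates action tokens, FiLM-style.
import torch
import torch.nn as nn


class CrossModulatedBlock(nn.Module):
    def __init__(self, dim: int = 256, heads: int = 4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm1 = nn.LayerNorm(dim)
        self.norm2 = nn.LayerNorm(dim)
        self.mlp = nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim))
        # Map the pooled VLM feature to per-channel scale and shift (FiLM).
        self.to_scale_shift = nn.Linear(dim, 2 * dim)

    def forward(self, action_tokens: torch.Tensor, vlm_feature: torch.Tensor) -> torch.Tensor:
        # action_tokens: (B, T, dim); vlm_feature: (B, dim)
        scale, shift = self.to_scale_shift(vlm_feature).chunk(2, dim=-1)
        x = self.norm1(action_tokens) * (1 + scale.unsqueeze(1)) + shift.unsqueeze(1)
        x = action_tokens + self.attn(x, x, x, need_weights=False)[0]
        x = x + self.mlp(self.norm2(x))
        return x


if __name__ == "__main__":
    block = CrossModulatedBlock()
    actions = torch.randn(2, 8, 256)   # 8 noisy action tokens per sample
    vlm = torch.randn(2, 256)          # pooled vision-language feature
    print(block(actions, vlm).shape)   # torch.Size([2, 8, 256])
```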
Recruiting a Few Students for Tutoring in the VLA Direction
具身智能之心· 2025-11-12 04:00
With less than two months left in 2025, some students have barely finished the CVPR deadline before rushing off to prepare for other conferences. 具身智能之心 has also mentored several students this year; their papers have been submitted one after another, and we hope for good results.

We are now recruiting 3 students working on VLA for paper tutoring. To guarantee quality, places are limited. Main directions: VLA models, lightweight VLA, VLA + tactile sensing, VLA + world models, VLA + RL, and related topics.

Interested students are welcome to contact the assistant on WeChat: AIDriver005, with the note "具身论文辅导咨询" (embodied paper tutoring inquiry). ...
Professor Ji Xiaoqiang's Lab at CUHK-Shenzhen Offers Fully Funded PhD and Postdoc Positions
具身智能之心· 2025-11-12 00:03
Core Viewpoint
- The article emphasizes the importance of interdisciplinary research in embodied intelligence, highlighting opportunities for doctoral and postdoctoral candidates in deep learning and artificial intelligence, with access to high-level research platforms and international collaboration [2][10].

Research Content
- Research directions include theories and algorithms for deep learning and artificial intelligence [2].
- Candidates are expected to have a strong understanding of and interest in the core research areas, with the ability to conduct independent theoretical innovation and experimental validation [8].

Candidate Requirements
- Candidates should hold relevant degrees in computer science, data science, automation, applied mathematics, or artificial intelligence from reputable institutions [8].
- Experience publishing in top international journals or conferences is preferred, as evidence of strong research potential [9].

Skills and Qualifications
- Familiarity with multimodal large models such as CLIP, BLIP, and LLaVA is essential [3].
- Proficiency with classic models such as VAE, Transformer, and BERT, along with strong algorithm-design and programming skills, particularly in high-performance languages like C++ or Rust, is advantageous [4][5].
- Understanding of large language model architectures and hands-on experience with unsupervised pre-training, SFT, and RLHF is a plus [6].

Professor's Profile
- Professor Ji Xiaoqiang, who holds a PhD from Columbia University, leads a research lab focused on intelligent control systems and has published over 50 papers in top-tier journals and conferences [10].
- The lab aims to integrate control theory, artificial intelligence, robotics, high-performance computing, and big data to conduct foundational and original research on intelligent systems [11].

Benefits and Compensation
- Postdoctoral candidates may receive a pre-tax living allowance of 210,000 CNY per year, plus additional university and mentor-specific compensation [12].
- Doctoral students can receive full or half scholarships covering tuition and living stipends, with top candidates eligible for a president's scholarship [13].
- Research master's students have opportunities to transition to the PhD program and may receive additional living stipends [14].

Application Materials
- Applicants must submit a complete CV in both Chinese and English, along with published papers and other materials demonstrating their research ability [15].
NVIDIA's Latest | The Successor to Isaac Gym Is Here! Tackling the Efficiency and Fidelity Pain Points of Traditional Simulation (GPU-Accelerated)
具身智能之心· 2025-11-12 00:03
Author: NVIDIA team

Isaac Lab, the successor to Isaac Gym, is built around GPU-native simulation and combines a high-fidelity physics engine, photorealistic rendering, and a modular architecture into a one-stop platform for large-scale multimodal robot learning. It addresses the efficiency, fidelity, and scalability pain points of traditional simulation, integrates the full pipeline of perception, control, and data-generation tools, and provides a complete path from simulated training to real-world deployment. Its generality and efficiency have already been validated across locomotion, manipulation, navigation, and other domains.

Why is a new generation of robot simulation frameworks needed? Traditional robotics development faces three core problems: real-world data is hard to collect, testing extreme cases is risky, and algorithm iteration is slow. Existing simulation tools struggle to deliver high fidelity, large scale, and multimodality at the same time. Isaac Lab targets these problems directly: through end-to-end GPU acceleration, standardized data formats, and a modular architecture, it achieves the core goals of efficient simulation, flexible extension, and seamless transfer ...
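The article's central point is GPU-native, vectorized simulation: thousands of environment instances stepped as one batched tensor operation instead of one Python loop per robot. The sketch below is a conceptual toy (a batch of point-mass "robots" integrated on the GPU), not Isaac Lab's actual API; the dynamics and tensor layout are assumptions chosen only to show why batching on the device scales.

```python
# Conceptual toy of GPU-vectorized simulation (not the Isaac Lab API): a batch
# of point-mass "robots" is integrated in parallel, one tensor op per step.
import torch


@torch.no_grad()
def batched_rollout(num_envs: int = 4096, steps: int = 240, dt: float = 1.0 / 120.0) -> torch.Tensor:
    device = "cuda" if torch.cuda.is_available() else "cpu"
    pos = torch.zeros(num_envs, 3, device=device)  # (N, xyz) positions
    vel = torch.zeros(num_envs, 3, device=device)  # (N, xyz) velocities
    for _ in range(steps):
        actions = torch.randn(num_envs, 3, device=device)  # placeholder policy output
        vel = vel + dt * actions                             # semi-implicit Euler step
        pos = pos + dt * vel                                 # applied to all envs at once
    return pos


if __name__ == "__main__":
    final_positions = batched_rollout()
    print(final_positions.shape)  # torch.Size([4096, 3])
```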
Deploy pi0 on Your Own Robot Arm from Scratch!
具身智能之心· 2025-11-12 00:03
A lightweight, cost-effective robot arm built for embodied-AI research.

Still struggling to choose hardware for embodied intelligence work? It now supports pi0 deployment: we recently got the pi0 task pipeline running end to end, and the code will be officially open-sourced to customers to help accelerate embodied research. Interested readers are welcome to follow along.

Expensive arms are out of budget, while cheap ones are hard to use and hard to learn. Don't worry — the Imeta-Y1 is here: a lightweight, cost-effective arm designed for beginners and early-stage researchers. Whether you are a student, an educator, or a developer just entering robotics, Imeta-Y1 helps you complete algorithm validation and project development at low cost and high efficiency.

It is especially friendly to newcomers:
✅ A full open-source toolchain with code examples, covering everything from data collection to model deployment;
✅ Python / C++ dual-language interfaces, so you can get started quickly in whichever language you prefer;
✅ Compatibility with ROS1 / ROS2 plus a URDF model, for seamless switching between simulation and the real robot;
✅ 24-hour after-sales response, so you never get stuck while learning.

The arm combines high-precision motion control, a low-power design, and an open software/hardware architecture, supports seamless debugging from simulation to the real machine, and ships with a fully open-source SDK and toolchain to help users move quickly through algorithm validation, data collection, model training, and deployment. Its compact structure and modular interfaces make it particularly suitable for developing embedded AI and robot-learning platforms ...
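Since the copy highlights a Python interface and a data-collection-to-deployment toolchain, here is a tiny sketch of the kind of control-and-logging loop such an SDK usually exposes; the `imeta_y1` module, the `Arm` object, and every method name are purely hypothetical placeholders, not the vendor's documented API.

```python
# Hypothetical usage sketch: the `imeta_y1` module, `Arm` class, and method
# names below are placeholders for whatever the vendor SDK actually exposes.
# from imeta_y1 import Arm   # hypothetical import


def collect_one_trajectory(arm, target_joints, steps: int = 50):
    """Move toward a joint-space target while logging (state, action) pairs."""
    trajectory = []
    for _ in range(steps):
        state = arm.get_joint_positions()          # hypothetical: read current joints
        action = [t - s for t, s in zip(target_joints, state)]
        arm.set_joint_velocities(action)           # hypothetical: velocity command
        trajectory.append((state, action))
    arm.stop()                                     # hypothetical: halt motion
    return trajectory
```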
Meituan's All-Around Breakthrough: RoboTron-Mani + RoboData Enable General-Purpose Robot Manipulation
具身智能之心· 2025-11-12 00:03
Core Insights
- The article discusses the challenges in robot manipulation, particularly the dual bottleneck of missing 3D perception and inefficient data utilization, which hinders the development of versatile robotic systems [2][3][21]
- The introduction of RoboTron-Mani, a model that adds 3D perception and multi-modal fusion, together with the RoboData dataset, aims to overcome these challenges and achieve universal operation across different robots and scenarios [1][3][21]

Group 1: Challenges in Current Robot Operation Models
- Existing solutions are either limited to 2D visual understanding or rely on single datasets, making them ineffective across diverse physical environments [2][3]
- Traditional multi-modal models focus on 2D image understanding and lack 3D spatial awareness, which results in low accuracy in physical interactions [2]
- Training on a single dataset leads to weak generalization, requiring retraining for each new robot or scenario, which is costly and time-consuming [2][3]

Group 2: RoboTron-Mani and RoboData Overview
- RoboTron-Mani is designed to provide a complete solution by integrating 3D perception and multi-modal fusion, supported by a unified dataset [3][21]
- The model architecture includes a visual encoder, 3D perception adapter, feature fusion decoder, and multi-modal decoder, enabling it to process varied input types and produce accurate outputs [7][9][10]
- RoboData consolidates multiple public datasets and addresses key issues such as modality completion and spatial alignment, which are critical for effective 3D perception training [11][12][15][16]

Group 3: Experimental Results and Performance
- RoboTron-Mani surpasses expert models on multiple benchmarks, achieving success rates of 91.7% on the LIBERO dataset and 93.8% on the CALVIN dataset [17][18]
- The model shows an average improvement of 14.8%-19.6% in success rate over existing general models across multiple datasets [18]
- Ablation studies confirm the importance of key components; the 3D perception adapter in particular significantly improves spatial understanding and task completion rates [19][22]

Group 4: Future Directions
- The article suggests potential future enhancements, including integrating additional modalities such as touch and force feedback, optimizing model efficiency, and expanding real-world data to narrow the gap between simulation and real-world applications [21][23]
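The summary names four components (visual encoder, 3D perception adapter, feature fusion decoder, multi-modal decoder) without giving their internals. The PyTorch sketch below only wires placeholder modules in that order to make the data flow concrete; every layer size and the way depth is injected are assumptions, not the paper's actual design.

```python
# Data-flow sketch of the four components named in the summary; all internals
# are placeholder assumptions, not RoboTron-Mani's actual design.
import torch
import torch.nn as nn


class RoboManiSketch(nn.Module):
    def __init__(self, dim: int = 256, action_dim: int = 7):
        super().__init__()
        self.visual_encoder = nn.Sequential(          # 2D image features
            nn.Conv2d(3, dim, kernel_size=16, stride=16), nn.Flatten(2))
        self.adapter_3d = nn.Sequential(              # inject depth / 3D cues
            nn.Conv2d(1, dim, kernel_size=16, stride=16), nn.Flatten(2))
        self.fusion_decoder = nn.TransformerEncoder(  # fuse image + 3D tokens
            nn.TransformerEncoderLayer(dim, nhead=4, batch_first=True), num_layers=2)
        self.action_head = nn.Linear(dim, action_dim)  # multi-modal decoder (action branch)

    def forward(self, rgb: torch.Tensor, depth: torch.Tensor) -> torch.Tensor:
        img_tokens = self.visual_encoder(rgb).transpose(1, 2)   # (B, N, dim)
        geo_tokens = self.adapter_3d(depth).transpose(1, 2)     # (B, N, dim)
        fused = self.fusion_decoder(torch.cat([img_tokens, geo_tokens], dim=1))
        return self.action_head(fused.mean(dim=1))              # (B, action_dim)


if __name__ == "__main__":
    model = RoboManiSketch()
    out = model(torch.randn(2, 3, 224, 224), torch.randn(2, 1, 224, 224))
    print(out.shape)  # torch.Size([2, 7])
```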
Meituan's All-Around Breakthrough: RoboTron-Mani + RoboData Enable General-Purpose Robot Manipulation
具身智能之心· 2025-11-11 03:48
Core Insights
- The article discusses the development of RoboTron-Mani, a universal robot manipulation policy that overcomes the limitations of existing models by integrating 3D perception and multi-modal fusion, enabling cross-platform and cross-scenario operation [1][3][21].

Group 1: Challenges in Robotic Operations
- Current robot manipulation solutions face a "dual bottleneck": they either lack 3D perception or suffer from dataset issues that prevent cross-platform training [2][3].
- Traditional multi-modal models focus on 2D image understanding, which limits their ability to interact accurately with the physical world [2][3].
- Training on a single dataset leads to weak generalization, requiring retraining for different robots or scenarios and raising data-collection costs [2][3].

Group 2: RoboTron-Mani and RoboData
- RoboTron-Mani is designed to address the challenges of 3D perception and missing data modalities, achieving full-pipeline optimization from data to model [3][21].
- The architecture of RoboTron-Mani includes a visual encoder, 3D perception adapter, feature fusion decoder, and multi-modal decoder, allowing it to process varied input types and produce multi-modal outputs [5][7][9][10].
- RoboData integrates nine mainstream public datasets, containing 70,000 task sequences and 7 million samples, and addresses key pain points of traditional datasets by completing missing modalities and aligning spatial and action representations [11][12][15][16].

Group 3: Experimental Results and Performance
- RoboTron-Mani delivers superior performance across multiple datasets, achieving a 91.7% success rate on the LIBERO dataset and surpassing the best expert model [18][21].
- The model shows an average improvement of 14.8%-19.6% in success rate over the general model RoboFlamingo across four simulated datasets [18][21].
- Ablation studies confirm the necessity of key components; removing the 3D perception adapter significantly reduces success rates [19][22].

Group 4: Future Directions
- Future work may integrate additional modalities such as touch and force feedback to improve adaptability in complex scenarios [23].
- There is room to optimize model efficiency, as the current 4-billion-parameter model requires 50 hours of training [23].
- Expanding real-world data will help narrow the domain gap when transferring from simulation to real-world applications [23].
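Both write-ups credit RoboData with aligning spatial and action representations across nine source datasets. As a simplified illustration of what such alignment can involve, the sketch below normalizes heterogeneous per-dataset action conventions into a single delta end-effector format; the field names and the two example conventions are assumptions, not RoboData's actual schema.

```python
# Simplified illustration of cross-dataset action alignment (not RoboData's
# actual schema): map per-dataset action conventions into one unified
# "delta end-effector pose + gripper" format. Field names are assumptions.
import numpy as np


def to_unified_action(sample: dict, source: str) -> np.ndarray:
    """Return a 7-D action: (dx, dy, dz, droll, dpitch, dyaw, gripper)."""
    if source == "absolute_pose":
        # Hypothetical convention: absolute target pose -> convert to a delta.
        delta = np.asarray(sample["target_pose"], dtype=np.float64) - np.asarray(
            sample["current_pose"], dtype=np.float64)
        gripper = np.array([sample["gripper_open"]], dtype=np.float64)
        return np.concatenate([delta, gripper])
    if source == "delta_pose":
        # Hypothetical convention: already a delta, but gripper is in [-1, 1].
        delta = np.asarray(sample["delta_pose"], dtype=np.float64)
        gripper = np.array([(sample["gripper"] + 1.0) / 2.0])  # rescale to [0, 1]
        return np.concatenate([delta, gripper])
    raise ValueError(f"unknown source convention: {source}")


if __name__ == "__main__":
    sample = {"delta_pose": [0.01, 0.0, -0.02, 0.0, 0.0, 0.1], "gripper": -1.0}
    print(to_unified_action(sample, "delta_pose"))  # 7 values, gripper rescaled to 0.0
```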
Recruiting Partners in the VLA + RL Direction!
具身智能之心· 2025-11-11 03:48
Core Viewpoint
- The company is seeking to recruit a lecturer for an online course on VLA (Vision-Language-Action) models and RL (Reinforcement Learning) to deepen understanding in these areas [1].

Group 1
- The company plans to develop an online course in the VLA and RL domain in response to community interest [1].
- The ideal candidate for the lecturer position holds a PhD or is a doctoral student researching VLA and RL, with publications at top conferences [2].
- The company is recognized as the first full-stack technology communication community in China focused on embodied intelligence and has gathered many people interested in VLA and RL [3].

Group 2
- The company offers compensation above the industry average and provides access to extensive industry resources for the lecturer position [4].
- For more details, interested candidates are encouraged to make contact via WeChat [5].