具身智能之心
IROS 2025 AIR4S Workshop: AI + Robotics Are Reshaping the Future of Science
具身智能之心· 2025-10-21 07:20
Core Insights
- The article discusses the integration of embodied AI and robotics in scientific research, highlighting the shift from human-led exploration to collaboration between AI and robots in scientific discovery [4][5].

Group 1: Workshop Overview
- The IROS 2025 AIR4S Workshop, themed "Embodied AI and Robotics for Future Scientific Discovery," is scheduled for October 24, 2025, in Hangzhou, China [6][20].
- The workshop aims to explore how AI and robots can participate in the entire research process, from literature review to hypothesis generation, experimental execution, data analysis, and publication [5][6].

Group 2: Expert Participation
- Notable experts from academia and industry will participate, including representatives from Unitree Robotics, MIT, Stanford University, Tencent Robotics X, and the University of Tokyo, discussing various aspects of embodied AI and its applications in scientific discovery [7][9].

Group 3: Research Papers and Innovations
- The workshop received 17 papers covering cutting-edge topics such as AI for Science, robot scientists, and laboratory automation; a novel AI Review mechanism will be introduced to assist the paper review process [13][14].
- Integrating AI into scientific evaluation aims to make research assessment more efficient and intelligent [14].

Group 4: Community and Support
- The workshop is supported by organizations including NOKOV, Frontiers in Robotics and AI, and Lumina, promoting the cross-disciplinary development of embodied AI and research automation [15].
- Scholars, students, and industry partners interested in the intersection of AI, robotics, and scientific discovery are encouraged to join the IROS 2025 AIR4S community for updates and discussions [17].
See You in Hangzhou! 具身智能之心 Sponsors IROS for the First Time and Presents Awards On Site
具身智能之心· 2025-10-21 01:30
Core Viewpoint
- The RoboSense Challenge 2025 aims to systematically evaluate robots' perception and understanding capabilities in real-world scenarios, addressing the difficulties traditional perception algorithms face in complex environments [1].

Group 1: Event Overview
- The challenge is organized by multiple prestigious institutions, including the National University of Singapore, Nanyang Technological University, and the University of Michigan, among others [4][5].
- It is an officially recognized competition of the IROS 2025 conference, which will take place in Hangzhou, China [5].

Group 2: Challenge Objectives
- The primary goal is to develop socially intelligent autonomous navigation robots that can move safely and efficiently through dynamic indoor environments without disrupting human activities [8][10].
- The challenge focuses on perception and navigation systems based on RGBD vision and odometry, requiring robots to operate without maps or privileged information [9].

Group 3: Challenge Difficulties
- Key challenges include dynamic behavior modeling, social rule encoding, and uncertainty handling in unpredictable environments [12].
- Evaluation considers not only success rates and path efficiency but also social compliance indicators and collision statistics [12].

Group 4: Recommended Directions
- Suggested approaches include transformer-based social trajectory prediction modules, behavior classifiers for risk assessment, and graph neural networks for multi-target structural modeling [15].
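The evaluation criteria mentioned above (success, path efficiency, social compliance, collisions) can be sketched as simple trajectory metrics. This is an illustrative sketch, not the challenge's official scoring code: the SPL formula follows the common embodied-navigation convention, and the 0.5 m personal-space radius is an assumed comfort threshold.

```python
import math

def spl(success: bool, shortest_len: float, actual_len: float) -> float:
    """Success weighted by Path Length: 1.0 for an optimal successful path,
    discounted as the executed path grows longer, 0.0 on failure."""
    if not success:
        return 0.0
    return shortest_len / max(shortest_len, actual_len)

def personal_space_violations(robot_traj, human_trajs, radius=0.5):
    """Count timesteps where the robot enters any human's personal space.
    `radius` (metres) is an assumed comfort threshold, not an official value."""
    count = 0
    for t, (rx, ry) in enumerate(robot_traj):
        for traj in human_trajs:
            hx, hy = traj[t]
            if math.hypot(rx - hx, ry - hy) < radius:
                count += 1
                break  # count each timestep at most once
    return count

# Toy episode: robot moves along x; one human stands near (2.0, 0.3).
robot = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (3.0, 0.0)]
humans = [[(2.0, 0.3)] * 4]
print(spl(True, 3.0, 3.0))                      # 1.0 (optimal path)
print(personal_space_violations(robot, humans))  # 1 (timestep t=2)
```

A real submission would be scored on many such episodes, combining these numbers with collision statistics.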
The Value of Open Source for Robotics Far Exceeds Expectations | Tang Wenbin in a Deep-Dive Conversation with a Hugging Face Co-Founder
具身智能之心· 2025-10-21 00:03
Core Insights
- The article discusses the gap between simulation and real-world application in robotics and introduces RoboChallenge.ai, a standardized evaluation platform for embodied intelligence [2][42][51].

Group 1: Current Challenges in Robotics
- Many models perform well in simulation but fail in real-world scenarios, a significant pain point in robotics research [2][42].
- A unified, open, and reproducible evaluation system for robotics is needed, since current benchmarks are primarily simulation-based [44][50].

Group 2: Introduction of RoboChallenge.ai
- RoboChallenge.ai is an open, standardized platform for evaluating robotic models in real-world environments, allowing researchers to remotely test their models on physical robots [6][51].
- The platform lets users control local models through an API, enabling remote testing without uploading the models themselves [8][53].

Group 3: Importance of Open Source in Robotics
- Open source is a crucial driver of progress in AI and robotics, enabling collaboration and innovation across global teams [10][19].
- The article argues that open source may matter even more in robotics than in large language models (LLMs), because applying models requires accessible hardware [20][22].

Group 4: Future Directions and Community Involvement
- The next three to five years are expected to bring significant evolution in embodied intelligence research, with robots executing longer and more complex tasks [82].
- Community participation is encouraged, with diverse contributions expected to improve data availability and model robustness [66][68].
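The "control local models through an API" workflow described above amounts to a simple observe-act loop: the model stays on the researcher's machine and only observations and actions cross the wire. The sketch below illustrates that shape with an in-process stub server; RoboChallenge.ai's actual endpoint names and payload fields are not documented here, so everything below is invented for illustration.

```python
# Stub standing in for the remote robot service. The real platform exposes
# this over the network; the method names and payloads here are assumptions.
class FakeRobotServer:
    def __init__(self, horizon=3):
        self.t, self.horizon = 0, horizon

    def get_observation(self):
        return {"rgb": f"frame_{self.t}", "done": self.t >= self.horizon}

    def send_action(self, action):
        self.t += 1
        return {"accepted": True}

def local_policy(obs):
    """The user's model runs locally; only its actions are transmitted."""
    return {"delta_xyz": [0.0, 0.0, -0.01], "gripper": "close"}

def run_episode(server):
    """Poll observations, query the local model, send actions until done."""
    steps = 0
    while True:
        obs = server.get_observation()
        if obs["done"]:
            break
        server.send_action(local_policy(obs))
        steps += 1
    return steps

print(run_episode(FakeRobotServer()))  # 3
```

The key design point this illustrates is that model weights never leave the researcher's machine, which is what makes remote evaluation practical for proprietary models.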
No Retraining Required! HKU Team Proposes GPC, a Framework for Robot "Policy Composition"
具身智能之心· 2025-10-21 00:03
Editor: 机器之心

First author Cao Jiahang is a PhD student at the University of Hong Kong and a former intern at the Beijing Humanoid Robot Innovation Center; co-first author Huang Yize is an undergraduate at Shanghai Jiao Tong University; the corresponding advisor is Andrew F. Luo, Assistant Professor at the University of Hong Kong.

In robot learning, improving the performance of generative-model-based control policies usually means investing heavily in additional data collection and model training, which severely limits how quickly robot capabilities can iterate and improve. Facing this performance bottleneck, how can the potential of existing policies be further unlocked and enhanced without adding any training burden?

The University of Hong Kong team proposes the GPC (General Policy Composition) framework, a novel training-free solution to this challenge. By composing multiple pre-trained models at test time, the framework produces a "composed policy" whose performance exceeds that of any single parent policy. As a plug-and-play general framework, GPC can flexibly fuse different architectures (e.g., Diffusion-base ...
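The test-time composition idea described above can be sketched in a few lines: combine the parent policies' denoising (score) predictions with a convex weighting, without touching any weights. This is a toy numpy sketch of the general mechanism only; the two linear "policies," the equal weights, and the fixed step size are stand-ins, not GPC's actual models or schedule.

```python
import numpy as np

rng = np.random.default_rng(0)

# Two stand-in "parent policies": each predicts the noise to remove from
# a noisy action sample. Real parents would be trained diffusion/flow models.
def eps_policy_a(x):
    return 0.9 * x

def eps_policy_b(x):
    return 0.8 * x + 0.05

def composed_denoise_step(x, weights=(0.5, 0.5), step=1.0):
    """One test-time denoising step using a convex combination of the
    parents' noise predictions -- the core of training-free composition."""
    eps = weights[0] * eps_policy_a(x) + weights[1] * eps_policy_b(x)
    return x - step * eps

# Iteratively denoise a random action sample with the composed policy.
x = rng.normal(size=3)
for _ in range(50):
    x = composed_denoise_step(x)
print(np.round(x, 3))
```

Because only inference-time predictions are mixed, the same loop works across heterogeneous parent architectures, which is what "plug-and-play" refers to.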
Farewell to "Expert Monopoly"! AdaMoE Resolves the Efficiency-Accuracy Dilemma of VLA Models
具身智能之心· 2025-10-21 00:03
Core Viewpoint
- The article discusses the AdaMoE architecture, which enhances the performance of Vision-Language-Action (VLA) models in robotic control by decoupling expert selection from weight distribution, leading to improved success rates in both simulation and real-world tasks [1][24].

Research Background: The Three Dilemmas of VLA Models
- Traditional VLA models face three main dilemmas:
  1. Performance is hard to improve because training costs are high, as collecting precise robotic data is resource-intensive [2].
  2. Real-time control is challenging, since dense models require all parameters to be activated, slowing response times [3].
  3. Naive Mixture of Experts (MoE) is inefficient due to conflicts among experts, which hinders effective task execution [5].

Core Design: The Decoupling Magic of AdaMoE
- AdaMoE's innovation lies in separating expert selection from contribution weighting, allowing each component to focus on its strength rather than trying to solve all problems simultaneously [6].

Key Designs of AdaMoE
- Design 1: Utilizes pre-trained weights to significantly reduce training costs, focusing on fine-tuning specialized skills rather than relearning basic actions [8].
- Design 2: Implements sparse activation and dual-module decoupling to balance capacity and efficiency while preventing conflicts among experts [9][10].

Key Findings: Advantages of Decoupling
- Extensive experiments revealed four key conclusions highlighting the superiority of AdaMoE:
  1. Experts can effectively specialize in their tasks without interference, improving performance [13].
  2. Decoupling responsibilities outperforms traditional coupled routing [15].
  3. Fewer, more specialized experts yield better results than a larger number of overlapping experts [19].
  4. Real-world scenarios benefit more from decoupling than simulated environments, with significant improvements in task success rates [22].

Experimental Results: Validation of AdaMoE
- AdaMoE demonstrated superior performance across various benchmarks, achieving an average success rate of 96.0% and outperforming traditional models and other architectures [23].

Conclusion: The Breakthrough Significance of AdaMoE
- AdaMoE not only improves performance but also provides a pathway for VLA models to operate effectively without excessive resource demands, emphasizing the importance of clear task specialization [24][26].
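The decoupling idea described above can be made concrete with a small routing sketch: one head decides *which* experts fire (top-k selection), while a separate head decides *how much* each chosen expert contributes. This is an illustrative numpy sketch of decoupled routing in general, not AdaMoE's exact parameterisation; the layer names, shapes, and toy experts are assumptions.

```python
import numpy as np

def softmax(z):
    z = z - z.max(-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(-1, keepdims=True)

def decoupled_moe(x, W_router, W_scale, experts, k=2):
    """Decoupled MoE routing: the router head picks the active experts
    (top-k over its logits); an independent scale head assigns their
    mixing weights, renormalised over the active set only."""
    router_logits = x @ W_router            # which experts to activate
    scale_logits = x @ W_scale              # how much each contributes
    topk = np.argsort(router_logits)[-k:]   # indices of active experts
    weights = softmax(scale_logits[topk])   # weights over active set
    return sum(w * experts[i](x) for w, i in zip(weights, topk))

rng = np.random.default_rng(0)
d, n_experts = 4, 4
x = rng.normal(size=d)
W_router = rng.normal(size=(d, n_experts))
W_scale = rng.normal(size=(d, n_experts))
# Toy experts: simple scalings standing in for expert FFNs.
experts = [lambda v, s=s: s * v for s in (1.0, 2.0, 3.0, 4.0)]
y = decoupled_moe(x, W_router, W_scale, experts, k=2)
print(y.shape)  # (4,)
```

In a coupled router, `router_logits` would serve both roles, so a strongly selected expert is forced to dominate the mixture; splitting the two decisions is the conflict-avoidance mechanism the summary refers to.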
Last Spot Left! Applying Reinforcement Learning to Humanoids, Quadrupeds, Robotic Arms, and More
具身智能之心· 2025-10-21 00:03
Core Insights
- Reinforcement Learning (RL) remains a significant field, with growing applications in robotics, including humanoid and quadrupedal robots, as well as in product optimization across various industries [1][2][3].
- The complexity of RL poses challenges for newcomers, making it difficult to produce publishable research papers without a structured learning system [5][6][9].

Group 1: Importance of Reinforcement Learning
- RL is crucial for tasks such as gait control in embodied intelligent robots, which is essential for achieving general-purpose capabilities [2].
- Companies like Unitree and Zhiyuan use RL so humanoid robots can perform complex actions like climbing stairs, running, and dancing, enabling applications in rescue and hazardous environments [2][8].

Group 2: Challenges in Learning and Research
- The extensive and intricate nature of RL makes it hard for beginners to enter the field, often leading to frustration and abandoned learning [5][9].
- Producing a paper that survives peer review requires proficiency in methodology, experimental results, and writing; any misstep can draw low scores from reviewers [5][6].

Group 3: Educational Initiatives
- To lower the entry barriers to RL research, a specialized 1v6 mentoring course has been launched, targeting graduate students and others who need guidance in paper writing [6][7].
- The course includes weekly live sessions, project implementation, experimental guidance, and writing refinement, aiming to help participants produce a draft suitable for submission to top conferences and journals [7][9][15].

Group 4: Course Structure and Content
- The course spans 14 weeks of intensive online training followed by 8 weeks of follow-up support, covering various aspects of RL and robotics [9][15].
- Key topics include foundational RL concepts, simulation environments, sim2real techniques, and writing guidance, with a structured approach to ensure participants reach measurable milestones [15][19][20].
原力灵机 Proposes ManiAgent! It Can "Act," "Think," and Even "Collect Data"!
具身智能之心· 2025-10-20 10:00
Core Insights
- The article introduces ManiAgent, an innovative agentic framework for general robotic manipulation that addresses the limitations of existing Vision-Language-Action (VLA) models in complex reasoning and long-horizon task planning [1][2][26].

Group 1: Framework Overview
- ManiAgent consists of multiple agents that collaboratively handle environment perception, sub-task decomposition, and action generation, enabling efficient responses to complex manipulation scenarios [2][10].
- The framework employs four key technologies: tool invocation, context engineering, real-time optimization, and automated data collection, forming a complete pipeline from perception to action execution [8][12].

Group 2: Performance Metrics
- In the SimplerEnv benchmark, ManiAgent achieved a task success rate of 86.8%; in real-world pick-and-place tasks, the success rate reached 95.8% [2][10][28].
- These high success rates indicate that ManiAgent can serve as an effective automated data collection tool, generating training data that matches the performance of models trained on manually annotated datasets [2][10].

Group 3: Methodology
- The framework includes four types of agents:
  1. A scene perception agent, which generates task-relevant scene descriptions using vision-language models [11].
  2. A reasoning agent, which evaluates task states and proposes achievable sub-tasks using large language models [11].
  3. An object-level perception agent, which identifies target objects and extracts the detailed information needed for action generation [11].
  4. A controller agent, which generates executable action sequences from sub-task descriptions and object details [11].

Group 4: Data Collection and Optimization
- The automated data collection system operates with minimal human intervention, significantly reducing labor costs while ensuring high-quality data for VLA model training [12][21].
- The framework incorporates a context processing mechanism to enhance task relevance and information effectiveness, alongside a caching mechanism that reduces action-generation latency [12][17].

Group 5: Experimental Results
- In the SimplerEnv simulation environment, tasks averaged an 86.8% success rate, with individual tasks reaching as high as 95.8% [22][28].
- Real-world experiments with the WidowX 250S robotic arm showed success across a range of tasks, indicating the framework's versatility in different operational contexts [25][28].
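The four-agent pipeline listed above can be sketched as a simple chain of calls. In the real system each agent is backed by a VLM or LLM; here each is replaced by canned stub logic, and all names, outputs, and the action format are invented for illustration.

```python
# Stub agents mirroring the four roles described above. Real agents would
# call vision-language / large language models; these return canned data.

def scene_agent(image):
    """Scene perception: describe the task-relevant parts of the scene."""
    return "a red cup is on the table, left of a plate"

def reasoning_agent(scene, goal):
    """Reasoning: decompose the goal into achievable sub-tasks."""
    return ["locate the red cup", "grasp the red cup", "place it on the plate"]

def object_agent(scene, subtask):
    """Object-level perception: extract details needed to act."""
    return {"name": "red cup", "grasp_point": (0.42, -0.10, 0.05)}

def controller_agent(subtask, obj):
    """Controller: emit an executable action sequence."""
    return [("move_to", obj["grasp_point"]), ("close_gripper", None)]

def maniagent_step(image, goal):
    """One pass through the pipeline: perceive, plan, localise, act."""
    scene = scene_agent(image)
    subtasks = reasoning_agent(scene, goal)
    obj = object_agent(scene, subtasks[0])
    return controller_agent(subtasks[1], obj)

actions = maniagent_step(image=None, goal="put the red cup on the plate")
print(actions[0][0])  # move_to
```

The same loop, run to completion and logged, is what turns the framework into an automated data collector: every executed action sequence becomes a training trajectory.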
The 具身智能之心 Technical Exchange Groups Are Live! Covering VLA, RL, Navigation, Data Collection, and More
具身智能之心· 2025-10-20 10:00
Group 1
- The establishment of a technical exchange group focused on embodied intelligence has been announced, inviting participation from various stakeholders in the field [1].
- The group encompasses nearly 20 sub-directions, indicating a broad scope of interest and expertise within the embodied intelligence domain [1].
- Participants are encouraged to engage in discussions on humanoid robots, quadrupeds, robotic arms, and advanced technologies such as VLA, large models, VLN, reinforcement learning, mobile manipulation, multi-modal perception, simulation, and data collection [1].
Our Embodied AI Community Has Just Gained Many New Modules
具身智能之心· 2025-10-20 03:29
With many new modules added, our embodied intelligence community is now even more complete! Throughout September and October we have been expanding the community's sections, focusing on tasks such as VLA, real2sim2real, mobile manipulation, world models, and domain adaptation, along with many high-quality livestreams.

Beyond that, we are currently adding open-source solutions and hardware references, and we plan to build shared sessions on top of them so that every member can complete their own project.

After nearly a year of building, the community now covers technical roadmaps, livestreams, Q&A, job hunting, competitions, and more, closing the loop across industry, academia, job seeking, and community discussion.

1) Ongoing livestreams: the community hosts many roundtables and live sessions covering everything from robot hardware to data to algorithms, gradually unpacking what is actually happening in the embodied AI industry and which problems remain unsolved.

2) A complete technical roadmap: for newcomers, we have compiled many beginner-friendly learning stacks and entry roadmaps.

3) Industry and project resources: for those already doing related research, we provide valuable industry frameworks and project plans.

4) Referrals and job hunting: the community has established referral channels with multiple embodied AI companies; feel free to mention us at any time and we will get your résumé into the hands of your target company right away.

Even better: whether you are looking for a benchmark, a survey, or a beginner's learning path, the community can dramatically cut your search time. ...
The MuJoCo Tutorial Is Here! From Zero Basics to Reinforcement Learning to Sim2Real
具身智能之心· 2025-10-20 00:03
Core Insights
- The article emphasizes that the field of AI is at a pivotal moment, transitioning from early symbolic reasoning to deep learning breakthroughs and now to the rise of embodied intelligence, which is redefining human-machine relationships [1][3].

Group 1: Embodied Intelligence
- Embodied intelligence is characterized by machines that can understand language commands, navigate complex environments, and make intelligent decisions in real time, moving beyond virtual space [1].
- Major tech companies like Tesla, Boston Dynamics, OpenAI, and Google are actively developing technologies in this disruptive field, indicating a competitive landscape [1][3].
- The potential impact of embodied intelligence spans industries including manufacturing, healthcare, and space exploration, suggesting a transformative effect on the economy and society [1].

Group 2: Technical Challenges and Solutions
- Achieving true embodied intelligence presents unprecedented technical challenges, requiring advances in algorithms, physical simulation, robot control, and perception fusion [3].
- MuJoCo (Multi-Joint dynamics with Contact) is highlighted as a critical technology for embodied intelligence, serving as a high-fidelity simulation engine that connects virtual and real-world environments [4][6].
- MuJoCo allows researchers to conduct millions of trials in simulation, significantly accelerating learning while minimizing the risks associated with physical hardware [6][8].

Group 3: MuJoCo's Advantages
- MuJoCo's advanced contact dynamics algorithms enable precise simulation of complex interactions between robots and their environments, making it a standard tool in both academia and industry [4][8].
- The engine supports high parallelization, allowing thousands of simulations to run simultaneously, which improves the efficiency of training AI systems [4][6].
- Its stability and numerical accuracy ensure reliable long-term simulations, making it a preferred choice for leading tech companies [4][6].

Group 4: Educational Initiatives
- A comprehensive MuJoCo development tutorial has been created, focusing on practical applications and theoretical foundations in the context of embodied intelligence [9][11].
- The course is structured into six modules, each with specific learning objectives and practical projects, ensuring a thorough understanding of the technology stack [15][17].
- Participants engage in hands-on projects ranging from basic robotic arm control to complex multi-agent systems, building both theoretical knowledge and practical skills [19][29].

Group 5: Target Audience and Outcomes
- The course is designed for individuals with programming or algorithm backgrounds looking to enter embodied robotics, as well as students and professionals seeking to strengthen their practical capabilities [32][33].
- Upon completion, participants will possess a complete embodied-intelligence skill set, including proficiency in MuJoCo, reinforcement learning, and real-world application of simulation techniques [32][33].
- The program aims to cultivate a combination of technical, engineering, and innovative skills, preparing participants to tackle complex problems in the field [33].
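To give a flavour of what such a tutorial works with: MuJoCo models are written in the MJCF XML format, and even a few lines define a complete simulable system. Below is a minimal single-hinge pendulum with an actuator, loadable via the Python bindings with `mujoco.MjModel.from_xml_string(...)`; the geometry and timestep values are arbitrary illustrative choices.

```xml
<mujoco model="minimal-pendulum">
  <option timestep="0.002" gravity="0 0 -9.81"/>
  <worldbody>
    <light pos="0 0 3"/>
    <geom type="plane" size="1 1 0.1"/>
    <body pos="0 0 1">
      <joint name="hinge" type="hinge" axis="0 1 0"/>
      <geom type="capsule" fromto="0 0 0 0 0 -0.4" size="0.04"/>
    </body>
  </worldbody>
  <actuator>
    <motor joint="hinge" gear="1"/>
  </actuator>
</mujoco>
```

From a model like this, a typical RL workflow steps the simulation in a loop (`mujoco.mj_step`), reads joint states from `data.qpos`/`data.qvel`, and writes torques to `data.ctrl`, which is exactly the control interface the course's reinforcement-learning and sim2real modules build on.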