具身智能之心
Stanford proposes the RTR framework: a robotic arm assists real-robot training of humanoid robots
具身智能之心· 2025-08-28 01:20
Core Insights
- The article discusses the emerging focus on motion control of humanoid robots as a key application area for reinforcement learning (RL) algorithms, emphasizing the "Sim-to-Real" paradigm and the challenges of transferring learned behaviors from simulation to real-world environments [1][2].

Group 1: Current Challenges and Innovations
- Current methods primarily rely on domain randomization to train general control models across diverse simulated environments, aiming for zero-shot transfer to real-world dynamics [1][2].
- Recent efforts have begun to explore fine-tuning models with limited real-world data after simulation pre-training, with notable contributions from institutions such as NVIDIA and CMU [2].
- The inherent instability of humanoid robots poses significant risks during real-world training, making direct reinforcement learning in these environments a longstanding challenge [2].

Group 2: Proposed Solutions
- The article introduces an approach inspired by human learning, in which a "teacher" robotic arm guides a "student" humanoid robot through online reinforcement learning [3][5].
- The teacher arm serves multiple roles: providing safety, assisting with resets after failures, collecting training data, and structuring the learning process through curriculum learning [5][7].

Group 3: RTR System Overview
- The proposed system, named RTR (Robot-Trains-Robot), highlights the importance of physical assistance from the teacher robot for effective real-world learning [7][9].
- To address the high cost of real-world data collection, a novel RL algorithm is introduced that optimizes a low-dimensional latent variable related to environmental dynamics, significantly improving sample efficiency [7][9].

Group 4: Methodology and Experimental Validation
- The RTR system comprises hardware and algorithmic components, featuring a UR5 robotic arm as the teacher and a ToddlerBot humanoid as the student [9][10].
- The Sim-to-Real process is divided into three stages: training adaptable policies in simulation, optimizing a general latent variable, and performing online fine-tuning in the real world [10][12].
- Experimental results demonstrate the effectiveness of the RTR system on tasks such as walking and swinging, showing significant improvements in learning efficiency and performance over traditional methods [14][18].

Group 5: Future Implications
- The RTR framework not only addresses current limitations in humanoid robot training but also introduces a new paradigm of physical assistance that could extend to larger humanoid robots and other complex robotic systems [16][19].
- The findings suggest that integrating teacher robots can make the learning process more efficient and stable, which is crucial for advancing real-world applications of humanoid robotics [16][17].
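The low-dimensional latent-variable idea above can be illustrated with a toy sketch: rather than fine-tuning all policy weights on scarce real-robot rollouts, only a small latent vector conditioning the policy is searched, for example with the cross-entropy method. The reward function, latent dimension, and hyperparameters below are illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np

def real_world_return(z):
    # Toy stand-in for an expensive real-robot rollout: reward is higher
    # the closer z is to an (unknown) true dynamics latent.
    true_z = np.array([0.5, -0.2, 0.1])
    return -np.sum((z - true_z) ** 2)

def cem_search(dim=3, iters=20, pop=50, elite_frac=0.2, seed=0):
    """Cross-entropy method over a low-dimensional dynamics latent."""
    rng = np.random.default_rng(seed)
    mu, sigma = np.zeros(dim), np.ones(dim)
    n_elite = int(pop * elite_frac)
    for _ in range(iters):
        # Sample candidate latents and score each with one "rollout".
        zs = rng.normal(mu, sigma, size=(pop, dim))
        scores = np.array([real_world_return(z) for z in zs])
        elites = zs[np.argsort(scores)[-n_elite:]]
        # Refit the sampling distribution to the elite candidates.
        mu, sigma = elites.mean(axis=0), elites.std(axis=0) + 1e-3
    return mu

best_z = cem_search()
print(best_z)  # should approach the true latent [0.5, -0.2, 0.1]
```

Because only a handful of latent dimensions are optimized, each iteration needs far fewer real-world rollouts than full-parameter fine-tuning would, which is the sample-efficiency argument the summary makes.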
Recruitment is open! The Bund Summit robot professional skills performance competition awaits your challenge
具身智能之心· 2025-08-28 01:20
Core Viewpoint
- The article emphasizes the importance of embodied intelligence for human safety and operational efficiency in hazardous environments, highlighting the potential of robots to perform tasks in dangerous situations such as deep mining, firefighting, and emergency rescue [2][4].

Group 1: Industry Events and Competitions
- The "Artificial Intelligence Hardware Innovation Competition" will feature a live robot skills performance event, inviting partners in the embodied intelligence industry to participate and gain media exposure and collaboration opportunities [4][5].
- The competition will include challenge areas such as hazardous-environment navigation, precision tasks, and emergency rescue operations, with evaluation criteria based on task difficulty, accuracy, efficiency, and autonomy [5].

Group 2: Community and Educational Resources
- The "Embodied Intelligence Heart" community offers comprehensive support for academic and research work, including guidance for top-tier conferences and journals and assistance with thesis and competition preparation [7].
- The community serves as a platform for developers and researchers in embodied intelligence, covering technical areas such as datasets, simulation platforms, and advanced learning models, with resources including over 30 learning paths and 40 open-source projects [7][10].
NVIDIA's general-purpose robot chip is here: AI compute up 7.5x, already adopted by Unitree and Galaxy General
具身智能之心· 2025-08-27 00:04
Core Viewpoint
- NVIDIA has launched its new robot-specific chip, Jetson Thor, which significantly enhances computing power for humanoid robots and other form factors, aiming to support advanced embodied intelligence algorithms [3][11].

Group 1: Product Features
- Jetson Thor features a GPU with AI compute of up to 2070 FP4 TFLOPS, 7.5 times that of its predecessor Jetson Orin, at a power consumption of 130W and with 3.5 times better energy efficiency [3][7].
- Memory capacity has doubled to 128GB, with a memory bandwidth of 273GB/s [3][7].
- The chip is designed for generative AI model inference, supporting next-generation "physical AI" agents that run in real time at the edge, minimizing reliance on cloud computing [7][10].

Group 2: Software and Ecosystem
- Jetson Thor supports all major generative AI frameworks and inference models, enabling developers to experiment locally and run inference efficiently [8][10].
- The product line includes a developer kit priced at $3,499 (approximately 25,000 RMB) and a production-grade module priced at $2,999 (approximately 21,400 RMB) for bulk orders [11].

Group 3: Market Impact and Partnerships
- Major robotics companies, including Unitree (Yushu Technology) and Galaxy General Robotics, have announced plans to integrate Jetson Thor into their products, highlighting its significance to the robotics industry [13][14].
- NVIDIA's strategy focuses on supporting the robotics and autonomous driving markets, projected to be worth trillions of dollars, while continuing to provide foundational AI infrastructure [17][18].
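The headline multiples can be sanity-checked with simple arithmetic. The Jetson Orin comparison figures used below (275 TOPS at a 60 W module power ceiling) are assumptions taken from NVIDIA's public AGX Orin specs, and FP4 TFLOPS versus INT8 TOPS are not strictly comparable precisions, so this is only a rough consistency check of the article's numbers.

```python
# Rough sanity check of the "7.5x compute, 3.5x efficiency" claims,
# assuming published Jetson AGX Orin figures (275 TOPS at 60 W).
thor_tflops, thor_watts = 2070, 130
orin_tops, orin_watts = 275, 60

compute_ratio = thor_tflops / orin_tops                              # raw compute multiple
efficiency_ratio = (thor_tflops / thor_watts) / (orin_tops / orin_watts)  # perf-per-watt multiple

print(round(compute_ratio, 1))     # 7.5
print(round(efficiency_ratio, 1))  # 3.5
```

Both multiples line up with the article's claims under these assumed baseline figures.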
Career change: landed an offer for an embodied intelligence role!
具身智能之心· 2025-08-27 00:04
Recently, more and more students have been sending Feng Ge good news: verbal offers from the autumn recruiting season, and successful mid-career moves from autonomous driving into embodied intelligence.

Beyond that, many embodied-robotics companies have commissioned us to develop additional tutorials and features around their EDU-edition hardware. This work is already in preparation, and we will gradually publish these tutorials in our embodied community to help advance the industry.

The "Embodied Intelligence Heart Knowledge Planet" combines video, articles, learning roadmaps, Q&A, and job-hunting exchange in one comprehensive embodied community of nearly 2,000 members. We aim to grow to nearly 10,000 within the next two years, building a hub for exchange and technical sharing that beginners and advanced learners alike visit often.

The community also regularly answers practical questions: How do you operate the equipment? How do you collect data effectively? How do you deploy VA and VLA models? Is the capture background too complex, or is the data too dirty? Quick answers make it easy to apply solutions to your own projects.

A community that solves problems when people need help most is undoubtedly valuable. The Embodied Intelligence Heart Knowledge Planet (China's first full-stack embodied technology community) has already closed the loop across industry, academia, job hunting, and Q&A. Whatever problem comes up, a solution gets shared; wherever research is most cutting-edge, a steady stream of ideas is provided; and job openings are passed along first-hand. Beyond the above, we have also organized many ...
3x speedup, CoT reasoning boosts VLA! ECoT-Lite: several mechanisms for improving policies by integrating embodied robot reasoning
具身智能之心· 2025-08-27 00:04
Core Insights
- The article discusses efficient training strategies for embodied reasoning in robotics, focusing on the ECoT (Embodied Chain-of-Thought) framework and its lightweight variant, ECoT-Lite, which enhances policy generalization without extensive additional data collection [3][8][30].

Group 1: Motivation and Background
- Generalization across diverse real-world scenarios has long been a focus in robotics, with architectures such as RT-X and RT-1 showing improved generalization through training on large, diverse datasets [2].
- Traditional approaches to improving policy generalization involve collecting more robot data, often through tedious human teleoperation [3].

Group 2: ECoT Framework
- ECoT improves policy performance by decomposing robot action prediction into a series of reasoning steps, such as identifying object locations and planning sub-tasks, which significantly enhances generalization to new scenes and tasks without requiring additional demonstration data [3][4][5].
- Despite its promise, ECoT incurs significant costs: training data must carry detailed reasoning annotations, and inference is slower because of the extended reasoning steps [3][5].

Group 3: ECoT-Lite Development
- ECoT-Lite introduces simpler, lighter alternatives to ECoT, focusing on better representation learning, improved learning dynamics, and enhanced expressiveness while avoiding the drawbacks of conventional chain-of-thought reasoning [6][8].
- ECoT-Lite achieves state-of-the-art performance on widely used benchmarks such as LIBERO, surpassing traditional VLA models by 10-19% while increasing inference speed from 1-1.2Hz to over 3.5Hz [8].

Group 4: Experimental Results
- Experiments show that ECoT-Lite significantly improves performance across tasks, reaching approximately 90% accuracy on the LIBERO-90 benchmark, higher than previous state-of-the-art results [54][56].
- Reasoning dropout and reasoning pre-training proved particularly effective, with reasoning dropout providing a speed advantage while maintaining high performance [58][92].

Group 5: Implications and Recommendations
- While full ECoT is the most performant method, it is also the slowest, making the ECoT-Lite variants more practical for real-time applications [90].
- The authors recommend full ECoT for maximum performance, reasoning dropout when task domains are few, and reasoning pre-training for more diverse tasks or when unpaired reasoning data is available [92].
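Reasoning dropout, one of the strategies highlighted above, can be sketched as randomly omitting the reasoning segment from some training examples, so the policy learns to act both with and without the chain of thought and can skip it at inference time for speed. The sequence layout, field names, and dropout rate below are illustrative assumptions, not the paper's exact recipe.

```python
import random

def apply_reasoning_dropout(example, p_drop=0.5, rng=random):
    """With probability p_drop, strip the reasoning segment so the model
    is also trained to map observation+instruction directly to actions."""
    if rng.random() < p_drop:
        return {"prompt": example["prompt"],
                "target": example["action"]}  # act without reasoning
    return {"prompt": example["prompt"],
            "target": example["reasoning"] + " " + example["action"]}

batch = [
    {"prompt": "pick up the red block",
     "reasoning": "PLAN: locate block; move gripper; close gripper.",
     "action": "ACTION: [0.1, -0.2, 0.05, 1.0]"},
]
rng = random.Random(0)
out = [apply_reasoning_dropout(ex, 0.5, rng) for ex in batch * 4]
n_dropped = sum(1 for o in out if not o["target"].startswith("PLAN"))
print(n_dropped)  # roughly half of the 4 copies are stripped
```

The speed advantage the summary mentions follows directly: a model trained this way can be prompted without the reasoning prefix, emitting far fewer tokens per control step.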
3 months! Master embodied "brain + cerebellum" algorithms
具身智能之心· 2025-08-27 00:04
Core Viewpoint
- The exploration of Artificial General Intelligence (AGI) is increasingly focusing on embodied intelligence, which emphasizes the interaction and adaptation of intelligent agents within physical environments, enabling them to perceive, understand tasks, execute actions, and learn from feedback [1].

Industry Analysis
- In the past two years, numerous star teams in the field of embodied intelligence have emerged, establishing valuable companies such as Xinghaitu, Galaxy General, and Zhujidongli, driving advances in embodied intelligence technologies [3].
- Major domestic companies such as Huawei are launching initiatives like the "Global Embodied Intelligence Industry Innovation Center," in collaboration with firms such as Leju Robotics and Dazhu Robotics, to develop key embodied intelligence technologies [5].
- JD.com has been investing in companies such as Zhiyuan Robotics and Qianxun Intelligent since May 2025 to enhance efficiency and service capabilities in logistics and home-service scenarios [5].
- Internationally, companies like Tesla and Figure AI are advancing industrial and logistics robotics applications, while U.S. investment firms are backing companies such as Wayve and Apptronik in autonomous driving and warehouse robotics [5].

Technological Evolution
- Embodied intelligence has progressed through several stages, from low-level perception to high-level task understanding and generalization, aiming to enhance robots' capabilities in real-world environments [6].
- The first stage focused on grasp pose detection, enabling robots to predict suitable end-effector poses for static object manipulation, but lacked context modeling for complex tasks [6].
- The second stage introduced behavior cloning, allowing robots to learn from expert demonstrations, yet faced challenges in generalization and performance in multi-target scenarios [6].
- The third stage, emerging in 2023, used Diffusion Policy methods to improve stability and generalization by modeling action trajectories [7].
- The fourth stage, starting in 2025, explores the integration of VLA models with reinforcement learning and tactile sensing to overcome limitations in feedback and future-prediction capabilities [8].

Product and Market Development
- The evolution from grasp pose detection to behavior cloning and VLA models marks a shift toward intelligent agents capable of handling general tasks in open environments, leading to products such as humanoid robots and robotic arms across industries like healthcare and logistics [9].
- The demand for engineering and system capabilities is increasing as embodied intelligence moves from research to deployment, requiring higher engineering standards [12].
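The behavior-cloning stage described above reduces to supervised regression from observations to expert actions. A minimal linear version with assumed toy demonstration data (real systems use deep networks and image observations, not a linear map):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy expert demonstrations: observations -> actions via an unknown map.
W_true = np.array([[0.5, -1.0],
                   [2.0,  0.3]])
obs = rng.normal(size=(200, 2))       # 200 demo observations
actions = obs @ W_true.T              # corresponding expert actions

# Behavior cloning: fit a policy to the demos by least squares.
W_hat, *_ = np.linalg.lstsq(obs, actions, rcond=None)
policy = lambda o: o @ W_hat

test_obs = np.array([1.0, 1.0])
print(policy(test_obs))  # matches the expert action [-0.5, 2.3]
```

The generalization weakness the summary notes also shows up here: the cloned policy is only reliable near the demonstrated observation distribution, since nothing constrains it elsewhere.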
If I had published a few more papers in my second year of grad school, it wouldn't have come to this...
具身智能之心· 2025-08-26 04:45
Core Viewpoint
- The article emphasizes the importance of high-quality research papers for graduate students, especially those facing challenges in job hunting or pursuing doctoral studies, and suggests that students seek professional assistance to strengthen their research capabilities and output [1].

Group 1: Research Challenges and Solutions
- Many graduate students struggle to produce satisfactory research papers due to a lack of guidance from their advisors, leading to confusion in topic selection and paper structure [1].
- The article introduces a professional tutoring service aimed at helping students navigate the research process and improve their paper-writing skills [1][8].

Group 2: Tutoring Service Overview
- The tutoring service is backed by a team of over 300 experts in fields such as autonomous driving and embodied intelligence, with a 96% acceptance rate for the students they have guided [5].
- The structured 12-week program covers defining research topics, literature review, experimental design, drafting, and submission [4].

Group 3: Target Audience and Benefits
- The service is designed for students who lack guidance, have unclear research frameworks, or wish to strengthen their academic profiles for job applications or further studies [9][10].
- Successful participants may receive recommendations from prestigious institutions and internship opportunities at leading tech companies [15].
25,000 RMB! NVIDIA unveils the "strongest brain" for robots: AI compute up 750% with 128GB of memory, and Unitree is already using it
具身智能之心· 2025-08-26 04:45
Core Viewpoint
- NVIDIA has launched Jetson Thor, a new robotic computing platform that significantly enhances AI computing power and efficiency, marking a leap toward the era of physical AI and general-purpose robotics [1][6][22].

Group 1: Product Features
- Jetson Thor delivers 2070 TFLOPS of AI compute, 7.5 times that of its predecessor Jetson Orin, while achieving a 3.5 times improvement in energy efficiency [1][5].
- The platform includes 128GB of memory, an unprecedented configuration for edge computing devices [2].
- It can run multiple AI models simultaneously on edge devices, enhancing robots' ability to interact with and even change the physical world [5][6].

Group 2: Technical Specifications
- The GPU is based on the Blackwell architecture, featuring up to 2560 CUDA cores and fifth-generation Tensor Cores, with support for Multi-Instance GPU (MIG) technology [16].
- The CPU is a 14-core Arm Neoverse V3AE, designed for real-time control and task management, with significant performance improvements over previous generations [16].
- Memory and bandwidth are upgraded to 128GB of 256-bit LPDDR5X with 273GB/s bandwidth, supporting large Transformer inference and high-concurrency video encoding [16].

Group 3: Market Adoption
- Numerous Chinese companies, including Union Medical, Wanji Technology, and UBTECH, are among the first to adopt the Jetson Thor platform [19].
- Boston Dynamics is integrating Jetson Thor into its Atlas humanoid robot, giving it computing power previously available only in servers [20].
- Agility Robotics plans to use Jetson Thor as the core computing unit of its sixth-generation Digit robot, targeting logistics tasks in warehouse and manufacturing environments [21].

Group 4: Development and Simulation
- NVIDIA emphasizes a three-computer system for achieving physical AI: a DGX system for training, the Omniverse platform for simulation, and Jetson Thor as the robot's "brain" [22].
- Continuous train-simulate-deploy cycles allow a robot's capabilities to keep improving even after deployment [24].
How do VLA models built on large VLMs advance robotic manipulation, step by step?
具身智能之心· 2025-08-26 00:03
Core Viewpoint
- The article discusses the transformative impact of large Vision-Language Models (VLMs) on robotic manipulation, enabling robots to understand and execute complex tasks from natural language instructions and visual cues [3][4][5].

Group 1: VLA Model Development
- The emergence of Vision-Language-Action (VLA) models, driven by large VLMs, allows robots to interpret visual details and human instructions and convert this understanding into executable actions [4][5].
- The article traces the evolution of VLA models, categorizing them into monolithic and hierarchical architectures, and identifies key challenges and future directions for the field [9][10][11].

Group 2: Research Contributions
- The research from Harbin Institute of Technology (Shenzhen) provides a comprehensive survey of VLA models, detailing their definitions, core architectures, and integration with reinforcement learning and learning from human video [5][9][10].
- The survey aims to unify terminology and modeling assumptions in the VLA field, addressing fragmentation across robotics, computer vision, and natural language processing [17][18].

Group 3: Technical Advancements
- VLA models leverage the capabilities of large VLMs, including open-world generalization, hierarchical task planning, knowledge-enhanced reasoning, and rich multimodal integration [13][64].
- The article outlines the limitations of traditional robotic methods and how VLA models overcome them, enabling robots to handle unstructured environments and vague instructions effectively [16][24].

Group 4: Future Directions
- The article emphasizes the need for advances in 4D perception and memory mechanisms to support long-horizon task execution [5][16].
- It also discusses the importance of developing unified frameworks for VLA models to improve their adaptability across tasks and environments [17][66].
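The hierarchical architecture the survey describes, a high-level VLM planner that decomposes an instruction into subgoals and a low-level controller that turns each subgoal into motor commands, can be sketched schematically. The planner and controller below are placeholder stand-ins with assumed interfaces, not any specific model's API.

```python
from dataclasses import dataclass

@dataclass
class Subgoal:
    description: str
    target_object: str

def vlm_planner(instruction: str, scene_objects: list) -> list:
    """Stand-in for the high-level VLM: decompose an instruction into
    subgoals. A real system would prompt a VLM with image + text."""
    target = next(o for o in scene_objects if o in instruction)
    return [Subgoal("move gripper above", target),
            Subgoal("grasp", target),
            Subgoal("lift", target)]

def low_level_controller(subgoal: Subgoal) -> list:
    """Stand-in for the action head: map a subgoal to a motor command
    [z_offset, x_offset, gripper_close]."""
    z_offsets = {"move gripper above": 0.10, "grasp": 0.0, "lift": 0.20}
    close = 1.0 if subgoal.description == "grasp" else 0.0
    return [z_offsets[subgoal.description], 0.0, close]

def run_episode(instruction, scene_objects):
    plan = vlm_planner(instruction, scene_objects)          # slow, runs once
    return [low_level_controller(g) for g in plan]          # fast, per step

actions = run_episode("pick up the red cup", ["red cup", "blue plate"])
print(len(actions))  # one motor command per subgoal
```

The split illustrates why the hierarchy helps: the expensive VLM reasoning runs at plan rate while the controller runs at control rate, which is the same division of labor the survey contrasts with monolithic VLA models.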
The VLA and VLN technical discussion groups are here!
具身智能之心· 2025-08-26 00:03
Group 1
- The company has established multiple VLA- and VLN-related communities to facilitate discussion of developments in academia, industry, and product deployment [1].
- Those interested in VLA/VLN are encouraged to join the community by adding the designated assistant on WeChat [2].