π0-FAST officially integrated into LeRobot! The PyTorch version is here
具身智能之心· 2026-01-15 00:32
Core Viewpoint
- The article introduces π0-FAST, a model from Physical Intelligence (the π team) that combines vision-language model capabilities with FAST (Frequency-space Action Sequence Tokenization) action encoding, significantly improving training speed and precision on complex robotic tasks [1][4].

Group 1
- π0-FAST speeds up training for high-precision manipulation tasks, training up to 5 times faster than traditional diffusion-based methods [1].
- The model addresses the limitations of traditional action encoding methods, which struggle with complex dexterous-skill tasks that demand precise control and high-frequency responses [3].
- The π0-FAST implementation has been integrated into the LeRobot framework, which now supports multiple models including π0, π0.5, and π0-FAST, as well as the Chinese-developed WALL-OSS model [2][7].

Group 2
- The original π0-FAST implementation was built on JAX; it has been reimplemented in PyTorch, incorporating a cross-entropy loss objective, the FAST tokenization scheme, and inference optimizations such as KV caching [6].
- FAST compresses action sequences into dense action token sequences that are predicted autoregressively, aligning action prediction with language-token prediction and addressing the shortcomings of earlier approaches [4].
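For intuition, here is a minimal, hedged sketch of the frequency-space tokenization idea behind FAST: an action chunk is transformed with a DCT, quantized, and flattened into discrete tokens that an autoregressive model can then predict with a cross-entropy loss. The chunk size, quantization scale, and omission of the byte-pair-encoding step are simplifying assumptions; this is not the actual π0-FAST implementation.

```python
import numpy as np
from scipy.fft import dct, idct

def fast_style_tokenize(actions: np.ndarray, scale: float = 10.0) -> np.ndarray:
    """Toy frequency-space tokenizer: DCT per action dimension, then round to ints.

    actions: (T, D) chunk of continuous actions (T timesteps, D action dims).
    Returns a flat sequence of integer tokens. Real FAST additionally applies
    byte-pair encoding to compress the token stream; that step is omitted here.
    """
    coeffs = dct(actions, axis=0, norm="ortho")          # frequency-space representation
    tokens = np.round(coeffs * scale).astype(np.int64)   # coarse scalar quantization
    return tokens.flatten()

def fast_style_detokenize(tokens: np.ndarray, T: int, D: int, scale: float = 10.0) -> np.ndarray:
    """Inverse of the toy tokenizer: dequantize and apply the inverse DCT."""
    coeffs = tokens.reshape(T, D).astype(np.float64) / scale
    return idct(coeffs, axis=0, norm="ortho")

# Usage: round-trip a random 50-step, 7-DoF action chunk.
chunk = np.random.randn(50, 7) * 0.1
recovered = fast_style_detokenize(fast_style_tokenize(chunk), T=50, D=7)
print(np.max(np.abs(chunk - recovered)))  # small quantization-only reconstruction error
```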
A recently open-sourced framework for training your VLA model with a range of SOTA techniques
具身智能之心· 2026-01-12 03:36
Core Viewpoint
- The article introduces OpenTau, an open-source training toolchain for VLA models aimed at improving reproducibility, usability, and scalability in model training [1].

Group 1: Industry Pain Points
- Existing VLA training tools such as OpenPi and LeRobot lack a one-stop solution; key capabilities are missing, and they fall short of advanced VLA training needs [3].
- Mixed-data training is a gap: OpenPi and LeRobot do not support joint training on heterogeneous datasets with adjustable mixing ratios, discrete action training, or knowledge isolation between the VLM and the action decoder [3][4].

Group 2: OpenTau Framework Enhancements
- OpenTau builds on LeRobot (a PyTorch framework) and stays fully compatible with the LeRobot ecosystem, so compliant policies and datasets can be reused [5].
- The framework addresses a limitation of OpenPi by providing native PyTorch support for the Dropout layer, which was previously available only in JAX [5][6].
- OpenTau improves checkpoint completeness by supplying the text embeddings missing from LeRobot checkpoints, preserving full model functionality [7].

Group 3: Key Features and Modules
- OpenTau supports joint training on heterogeneous datasets with adjustable mixing ratios [8].
- New features include discrete action training, knowledge isolation between the VLM backbone and the action decoder, and a Dropout layer to reduce the risk of overfitting [12].
- The framework ships with a built-in reinforcement learning pipeline, supports multi-node and multi-GPU distributed training, and is compatible with simulation environments for model evaluation [12].
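The "adjustable mixing ratio" feature can be illustrated with standard PyTorch utilities: the sketch below samples batches from two datasets at a chosen 70/30 split regardless of their raw sizes. The dataset names and the split are hypothetical placeholders, and this is not OpenTau's actual API.

```python
import torch
from torch.utils.data import ConcatDataset, DataLoader, TensorDataset, WeightedRandomSampler

# Two stand-in "heterogeneous" datasets (e.g., in-house teleop demos vs. open-source episodes).
robot_demos = TensorDataset(torch.randn(1000, 7))   # hypothetical in-house demos
open_data   = TensorDataset(torch.randn(5000, 7))   # hypothetical open-source corpus

mixed = ConcatDataset([robot_demos, open_data])

# Adjustable mixing ratio: ~70% of sampled items from robot_demos, ~30% from open_data,
# independent of how many raw samples each dataset contains.
ratio = (0.7, 0.3)
weights = torch.cat([
    torch.full((len(robot_demos),), ratio[0] / len(robot_demos)),
    torch.full((len(open_data),),   ratio[1] / len(open_data)),
])
sampler = WeightedRandomSampler(weights, num_samples=len(mixed), replacement=True)

loader = DataLoader(mixed, batch_size=64, sampler=sampler)
batch = next(iter(loader))  # batches now follow the requested mixing ratio in expectation
```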
Reproducing the most popular VLA tasks on GitHub at low cost
具身智能之心· 2026-01-11 03:02
Core Viewpoint
- The article discusses the barriers beginners face in VLA (Vision-Language-Action) work, namely high hardware costs and the complexity of data collection and model training, and introduces a comprehensive course designed to address these issues and build practical skills for aspiring practitioners [3][5][9].

Group 1: Challenges in VLA Tasks
- Many beginners are frustrated by the cost of robotic arms and sensors, which can exceed 15,000 yuan, making it hard for self-learners or those without equipment to work on VLA tasks [3].
- Open-source low-cost robotic arms exist, but many beginners still struggle to get good results because data collection and model training are difficult [4].
- Beginners waste significant time on common pitfalls, particularly with models like π0 and π0.5, which require specific tricks for data collection and training [5].

Group 2: Course Offerings
- The "Embodied Intelligence Heart" platform has reproduced methods such as ACT, GR00T, π0, and π0.5 on the SO-100 arm with LeRobot, aiming to help learners who lack access to expensive equipment [8].
- A new practical course, "VLA Small Class for Practice and Job-Seeking", was developed with VLA experts to help learners apply VLA technologies effectively [9].
- The course covers robotic-arm hardware, data collection, VLA algorithms, evaluation, simulation, deployment of mainstream VLA models, and a range of real-robot experiments [14].

Group 3: Course Details and Requirements
- The course targets people who want hands-on experience and projects in the VLA field, including students at various academic levels and those transitioning from fields such as computer vision and robotics [25].
- Participants receive an SO-100 robotic arm as part of the course package, including both the teaching and execution arms [18].
- On completion, the course aims to bring learners to a skill level comparable to 1-2 years of experience as an algorithm engineer [27].
VLA work is growing explosively...
具身智能之心· 2025-12-20 16:03
Core Viewpoint
- The article discusses the rapid development of VLA (Vision-Language-Action) models in embodied intelligence and the challenges that come with it, highlighting the importance of real-world data collection and the difficulties newcomers face in the field [2][3][4].

Group 1: VLA Development and Challenges
- VLA work is growing explosively, with various frameworks and tools, such as reinforcement learning (RL), enhancing its performance [2].
- Data collection methods are diversifying, and millions of open-source data samples are becoming available, pointing toward industrialization [2].
- Many practitioners find tuning VLA models and collecting data frustratingly difficult, particularly those new to the field [3][5].

Group 2: Data Collection and Training
- Data collection for VLA relies mainly on imitation learning and reinforcement learning, with a focus on teleoperation and VR control of robotic arms [13].
- Simulation and real-to-sim-to-real (real2sim2real) techniques are crucial for training VLA models, especially when real data is insufficient [14].
- Training technique matters: many practitioners struggle to get good results because models like π0 and π0.5 are complex and demand close attention to detail [14][10].

Group 3: Model Deployment
- After training, VLA models need optimization to shrink their parameter footprint for deployment, which is essential for edge-computing applications [15].
- Techniques such as quantization and distillation are needed to preserve performance while minimizing model size [15].

Group 4: Educational Initiatives
- The article introduces a practical course that teaches VLA end to end, covering hardware, data collection, algorithm deployment, and real-robot experiments [17][20].
- The course is designed to shorten the learning curve for newcomers and provide practical experience that strengthens resumes [18][31].
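As a concrete illustration of the deployment-side compression mentioned above, here is a minimal sketch of post-training dynamic quantization using standard PyTorch APIs. The toy policy network is an assumption for demonstration and is not tied to any specific VLA model.

```python
import torch
import torch.nn as nn

# Hypothetical stand-in for a (much larger) VLA action head.
policy = nn.Sequential(
    nn.Linear(512, 1024), nn.ReLU(),
    nn.Linear(1024, 1024), nn.ReLU(),
    nn.Linear(1024, 7),          # 7-DoF action output
)

# Post-training dynamic quantization: Linear weights are stored in int8 and
# activations are quantized on the fly, shrinking the checkpoint and speeding up CPU inference.
quantized_policy = torch.quantization.quantize_dynamic(
    policy, {nn.Linear}, dtype=torch.qint8
)

obs_features = torch.randn(1, 512)
with torch.no_grad():
    action = quantized_policy(obs_features)
print(action.shape)  # torch.Size([1, 7])
```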
Everyone is talking about VLA, yet many students can't even get a demo running properly...
具身智能之心· 2025-12-03 10:00
Core Viewpoint
- The article discusses the challenges and advances in VLA (Vision-Language-Action) models, emphasizing the importance of real-robot data and practical application in robotics and embodied intelligence.

Group 1: Challenges in VLA Implementation
- Many students struggle to move from theory to practice and find it hard to get satisfactory results without hands-on experience [2][6].
- Effective training and deployment of VLA models relies on real-robot data; simulation data alone has clear limitations [2][8].

Group 2: Data Collection and Training
- Data collection for VLA includes imitation learning and reinforcement learning, with particular emphasis on teleoperation and VR techniques [8].
- Training VLA models requires careful tuning and optimization, with specific challenges noted for models like π0 and π0.5, which demand considerable expertise [10][12].

Group 3: Deployment and Optimization
- After training, VLA models often need optimization techniques such as quantization and distillation to reduce parameter count while maintaining performance [12].
- Deploying VLA models on edge devices is challenging because of their typically large parameter counts [12].

Group 4: Educational Initiatives
- The article introduces a practical course on VLA, covering hardware, data collection, algorithm implementation, and real-world applications [14][30].
- The course targets a diverse audience, including students and professionals looking to move into embodied intelligence [27][30].
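Complementing the quantization sketch above, the distillation approach mentioned here can be sketched as training a small student network to match a larger teacher's action outputs. The network sizes, loss, and random stand-in data below are illustrative assumptions, not any published VLA recipe.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Hypothetical teacher (large) and student (small) action heads.
teacher = nn.Sequential(nn.Linear(512, 2048), nn.ReLU(), nn.Linear(2048, 7)).eval()
student = nn.Sequential(nn.Linear(512, 256), nn.ReLU(), nn.Linear(256, 7))

optimizer = torch.optim.Adam(student.parameters(), lr=1e-3)

for step in range(100):                       # toy training loop on random features
    obs = torch.randn(64, 512)                # stand-in for encoded observations
    with torch.no_grad():
        teacher_actions = teacher(obs)        # targets come from the frozen teacher
    student_actions = student(obs)
    loss = F.mse_loss(student_actions, teacher_actions)  # student imitates teacher outputs
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

print(f"final distillation loss: {loss.item():.4f}")
```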
HuggingFace and Oxford University open-source a SOTA resource library with a new tutorial!
具身智能之心· 2025-10-27 00:02
Core Viewpoint
- The article highlights the significant advances in robotics, particularly robot learning, driven by large models and multi-modal AI, which have shifted traditional robotics toward a learning-based paradigm [3][4].

Group 1: Introduction to Robot Learning
- The article introduces a comprehensive tutorial on modern robot learning, covering the foundations of reinforcement learning and imitation learning and leading up to general-purpose, language-conditioned models [4][12].
- HuggingFace and Oxford University researchers have created a valuable, accessible guide to robot learning for newcomers to the field [3][4].

Group 2: Classic Robotics
- Classic robotics relies on explicit modeling through kinematics, planning, and control, whereas learning-based methods use deep reinforcement learning and expert demonstrations for implicit modeling [15].
- Traditional robotic systems follow a modular pipeline of perception, state estimation, planning, and control [16].

Group 3: Learning-Based Robotics
- Learning-based robotics couples perception and control more tightly, adapts to tasks and embodiments, and reduces the need for expert modeling [26].
- The tutorial highlights the safety and efficiency challenges of real-world training, particularly in the early phases, and discusses techniques such as simulation training and domain randomization to mitigate the risks [34][35].

Group 4: Reinforcement Learning
- Reinforcement learning lets robots learn optimal behavior autonomously through trial and error, showing significant potential across scenarios [28].
- The tutorial discusses the complexity of integrating many system components and the limitations of traditional physics-based models, which often oversimplify real-world phenomena [30].

Group 5: Imitation Learning
- Imitation learning offers a more direct path: the robot replicates expert actions through behavior cloning, avoiding complex reward-function design [41].
- The tutorial addresses challenges such as compounding errors and handling multi-modal behavior in expert demonstrations [41][42].

Group 6: Advanced Techniques in Imitation Learning
- The article introduces generative-model-based imitation learning methods such as Action Chunking with Transformers (ACT) and Diffusion Policy, which model multi-modal data effectively [43][45].
- Diffusion Policy performs strongly across tasks with little demonstration data, requiring only 50-150 demonstrations for training [45].

Group 7: General Robot Policies
- The tutorial envisions general robot policies that operate across tasks and devices, enabled by large-scale open robot datasets and powerful vision-language models [52][53].
- Two cutting-edge vision-language-action (VLA) models, π₀ and SmolVLA, are highlighted for their ability to follow visual and language instructions and generate precise control commands [53][56].

Group 8: Model Efficiency
- SmolVLA reflects a trend toward smaller, open-source models, reaching strong performance with far fewer parameters and much lower memory consumption than π₀ [56][58].
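The "action chunking" idea behind ACT, mentioned in Group 6, can be illustrated with a toy supervised setup: instead of predicting one action per step, the policy regresses a short chunk of future actions from the current observation. The dimensions, chunk length, and the plain MLP (in place of ACT's transformer and CVAE machinery) are simplifying assumptions, not the tutorial's code.

```python
import torch
import torch.nn as nn

H, D_OBS, D_ACT = 10, 64, 7        # chunk length, observation dim, action dim (assumed)

# Toy chunking policy: one observation in, a whole chunk of H future actions out.
policy = nn.Sequential(
    nn.Linear(D_OBS, 256), nn.ReLU(),
    nn.Linear(256, H * D_ACT),
)

obs = torch.randn(512, D_OBS)               # stand-in encoded observations
expert_chunks = torch.randn(512, H, D_ACT)  # stand-in expert action chunks

optimizer = torch.optim.Adam(policy.parameters(), lr=1e-3)
for epoch in range(20):
    pred_chunks = policy(obs).view(-1, H, D_ACT)
    loss = nn.functional.mse_loss(pred_chunks, expert_chunks)  # behavior cloning on chunks
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

# At inference, the robot executes the chunk (or a smoothed blend of overlapping chunks).
with torch.no_grad():
    next_actions = policy(obs[:1]).view(H, D_ACT)
print(next_actions.shape)  # torch.Size([10, 7])
```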
A hands-on introduction to robot learning: HuggingFace and Oxford University open-source a SOTA resource library with a new tutorial
机器之心· 2025-10-26 07:00
Core Viewpoint
- The article emphasizes the major advances in robotics, particularly robot learning, driven by AI technologies such as large models and multi-modal models. This shift has moved traditional robotics toward a learning-based paradigm and opened new potential for autonomous decision-making robots [2].

Group 1: Introduction to Robot Learning
- The article highlights the evolution from explicit to implicit modeling, a fundamental change in how motion is generated: traditional robotics relied on explicit models, while learning-based methods use deep reinforcement learning and learning from expert demonstrations [15].
- A comprehensive tutorial from HuggingFace and Oxford University researchers serves as a valuable resource for newcomers to modern robot learning, covering the foundations of reinforcement learning and imitation learning [3][4].

Group 2: Learning-Based Robotics
- Learning-based robotics simplifies the path from perception to action by training a unified high-level controller that handles high-dimensional, unstructured perception-motion information directly, without relying on a dynamics model [33].
- The tutorial addresses real-world challenges such as safety and efficiency during early training and the high cost of trial and error in physical environments, introducing techniques like simulator training and domain randomization to mitigate these risks [34][35].

Group 3: Reinforcement Learning
- Reinforcement learning lets robots learn optimal behavior autonomously through trial and error, showing significant potential across scenarios [28].
- The tutorial discusses the "offline-to-online" reinforcement learning framework, which improves sample efficiency and safety by exploiting pre-collected expert data. The HIL-SERL method exemplifies this approach, enabling robots to master complex real-world tasks with near-100% success rates after only 1-2 hours of training [36][39].

Group 4: Imitation Learning
- Imitation learning offers a more direct path: the robot replicates expert actions through behavior cloning, avoiding complex reward-function design and keeping training safe [41].
- The tutorial presents generative-model-based imitation learning methods such as Action Chunking with Transformers (ACT) and Diffusion Policy, which model multi-modal data effectively by learning the latent distribution of expert behavior [42][43].

Group 5: Universal Robot Policies
- The article sees the future of robotics in universal robot policies that operate across tasks and devices, enabled by large-scale open robot datasets and powerful vision-language models (VLMs) [52].
- Two cutting-edge VLA models, π₀ and SmolVLA, are highlighted for their ability to follow visual and language instructions and generate precise robot control commands, with SmolVLA being a compact open-source model that significantly lowers the barrier to application [53][56].
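The offline-to-online idea mentioned above (seeding learning with pre-collected expert data, then continuing with the robot's own experience) can be sketched as a replay buffer that mixes both sources. The buffer sizes, 50/50 sampling split, and transition format are illustrative assumptions and do not reflect the HIL-SERL implementation.

```python
import random
from collections import deque

class OfflineToOnlineBuffer:
    """Toy replay buffer mixing pre-collected expert transitions with online ones."""

    def __init__(self, expert_transitions, online_capacity=10_000, expert_fraction=0.5):
        self.expert = list(expert_transitions)        # fixed offline expert data
        self.online = deque(maxlen=online_capacity)   # grows as the robot collects experience
        self.expert_fraction = expert_fraction

    def add_online(self, transition):
        self.online.append(transition)

    def sample(self, batch_size):
        # Draw a fixed fraction from expert data, the rest from online experience.
        n_online = int(batch_size * (1 - self.expert_fraction)) if self.online else 0
        n_expert = batch_size - n_online
        batch = random.sample(self.expert, min(n_expert, len(self.expert)))
        if n_online:
            batch += random.sample(list(self.online), min(n_online, len(self.online)))
        return batch

# Usage: seed with 100 fake expert transitions, then add online experience as it arrives.
expert = [("obs", "act", 1.0, "next_obs") for _ in range(100)]
buffer = OfflineToOnlineBuffer(expert)
buffer.add_online(("obs", "act", 0.0, "next_obs"))
print(len(buffer.sample(32)))
```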
Starting at $250 and open source: Hugging Face releases its most affordable humanoid robots ever
机器之心· 2025-05-31 04:00
Core Viewpoint
- Hugging Face has officially open-sourced two humanoid robots, HopeJR and Reachy Mini, moving a step closer to Elon Musk's prediction of 10 billion humanoid robots by 2040 [1][31].

Group 1: Robot Specifications
- HopeJR is a full-sized humanoid robot with 66 degrees of freedom, capable of walking and moving its arms [3].
- Reachy Mini is a desktop robot that can move its head, speak, and listen, designed for testing AI applications [5][20].

Group 2: Pricing and Availability
- HopeJR is priced at roughly $3,000, while Reachy Mini costs between $250 and $300, depending on tariffs [7].
- The company plans to ship the first batch of robots by the end of the year, and a waiting list is already open [7].

Group 3: Open Source and Community Impact
- Open-sourcing the robots lets anyone assemble them and understand how they work, democratizing access to robotics technology [7][28].
- Hugging Face aims to build an open-source robotics ecosystem, lowering the barriers to knowledge and technology and making robotics accessible to a wider audience [28][30].

Group 4: Development and Features
- HopeJR requires developers to control it manually and record actions, which are then used to train imitation-learning algorithms [10][12].
- Reachy Mini is designed to help develop AI applications, allowing testing before deployment in real-world scenarios [20].

Group 5: Previous Initiatives
- This is not Hugging Face's first robotics venture; it previously launched the LeRobot project and the SO-100 robotic arm design [26][28].
Express | Hugging Face goes all in on AI robotics: two open-source humanoid robots released, starting at just $250
Z Potentials· 2025-05-30 03:23
Core Viewpoint
- Hugging Face has launched two new humanoid robots, HopeJR and Reachy Mini, as part of its push into robotics, with an emphasis on open-source technology and affordability [1][3].

Group 1: Product Launch
- The company introduced HopeJR, a full-sized humanoid robot with 66 degrees of freedom that can walk and move its arms, and Reachy Mini, a desktop robot that can rotate its head, speak, and listen [1].
- HopeJR is estimated at around $3,000, while Reachy Mini is priced between $250 and $300, depending on tariff policy [3].

Group 2: Open Source and Accessibility
- The robots' open-source nature means anyone can assemble, modify, and understand how they work, preventing the technology from being monopolized by a few large companies [3].

Group 3: Strategic Acquisitions
- The launch is partly enabled by the acquisition of Pollen Robotics, which brought new capabilities for developing these humanoid robots [4].

Group 4: Future Developments
- Hugging Face has been moving steadily into robotics, having launched LeRobot in 2024, a resource collection of open-source AI models, datasets, and tools for building robotic systems [6].
- In 2025, the company released the SO-101, an upgraded version of its 3D-printable programmable robotic arm, developed in collaboration with The Robot Studio [6].