LeRobot
π0-FAST Officially Integrated into LeRobot! The PyTorch Version Is Here
具身智能之心· 2026-01-14 09:00
LeRobot now supports π0, π0.5, and π0-FAST, and the domestically developed model WALL-OSS has also been integrated. π0-FAST compresses action sequences with signal-processing methods to produce dense action-token sequences that can be predicted autoregressively: prediction works exactly as it does for language tokens, which resolves the long-standing difficulty of applying autoregressive models to continuous actions. The original pi0-FAST implementation supported only the JAX framework; this release is a PyTorch rewrite that includes the cross-entropy training objective, the FAST tokenization scheme, and inference optimizations such as KV caching. With this, the pi series of models is now largely supported. π0-FAST, newly released by the pi team, combines the capabilities of vision-language models with FAST (frequency-domain action sequence tokenization) action encoding. The approach lets autoregressive VLA models train on high-precision manipulation tasks, something traditional methods could not achieve, and trains up to 5x faster than diffusion-based methods such as π0. It is now integrated into the LeRobot framework. Why integrate this work? Traditional robot action-encoding methods typically use simple per- ...
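To make the FAST idea concrete, here is a minimal sketch of frequency-domain action tokenization, assuming actions arrive as a (horizon, dim) array. This is not the LeRobot or pi implementation: the function names are hypothetical, the scalar quantization is simplified, and the byte-pair-encoding stage the real pipeline runs over the quantized coefficients is omitted.

```python
# Illustrative sketch of FAST-style action tokenization (not the LeRobot code).
# Assumptions: actions are a (horizon, dim) float array; we apply a DCT along
# the time axis, quantize the coefficients, and emit integer tokens. The real
# FAST pipeline additionally runs BPE over the quantized coefficients.
import numpy as np
from scipy.fft import dct, idct

def tokenize_actions(actions: np.ndarray, scale: float = 10.0) -> np.ndarray:
    """Compress an action chunk into integer tokens via DCT + rounding."""
    coeffs = dct(actions, axis=0, norm="ortho")       # frequency-domain view
    return np.round(coeffs * scale).astype(np.int64)  # coarse scalar quantization

def detokenize_actions(tokens: np.ndarray, scale: float = 10.0) -> np.ndarray:
    """Invert the quantization and the DCT to recover a dense action chunk."""
    return idct(tokens.astype(np.float64) / scale, axis=0, norm="ortho")

chunk = np.cumsum(np.random.randn(50, 7) * 0.01, axis=0)  # smooth 7-DoF trajectory
tokens = tokenize_actions(chunk)
recon = detokenize_actions(tokens)
print("max reconstruction error:", np.abs(recon - chunk).max())
```

Because smooth trajectories concentrate energy in low-frequency DCT coefficients, most tokens are near zero and compress well, which is what makes the resulting sequences short enough for autoregressive prediction.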
A Recently Open-Sourced Framework That Trains Your VLA Model with a Range of SOTA Techniques
具身智能之心· 2026-01-12 03:36
I recently came across OpenTau, a VLA framework. OpenTau is Tensor's open-source training toolchain for frontier VLA models, designed to make training reproducible, easy to use, and scalable. Interested readers can take a look. Link: https://github.com/TensorAuto/OpenTau. Despite the flood of VLA work, the industry still has plenty of pain points. Existing VLA training tools (OpenPi, LeRobot) are not one-stop solutions: core capabilities are clearly missing, so they cannot meet the needs of frontier VLA model training. Another issue is mixed-data training: neither OpenPi nor LeRobot supports co-training heterogeneous datasets with adjustable mixing ratios, discrete-action training, or knowledge isolation between the VLM and the action decoder, nor do they provide a reinforcement-learning pipeline. Beyond data, a cross-platform common language is also very important. OpenPi's Pa ...
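Adjustable mixing ratios over heterogeneous datasets, one of the gaps called out above, can be expressed in a few lines of PyTorch. The sketch below is illustrative only and is not OpenTau's API; the datasets, sizes, and ratios are stand-ins.

```python
# Sketch of co-training on heterogeneous datasets with adjustable mixing ratios.
# NOT OpenTau's API; datasets and weights here are hypothetical stand-ins.
import torch
from torch.utils.data import ConcatDataset, DataLoader, TensorDataset, WeightedRandomSampler

sim_data = TensorDataset(torch.randn(8000, 16))    # stand-in for a simulation dataset
real_data = TensorDataset(torch.randn(2000, 16))   # stand-in for a real-robot dataset
mixed = ConcatDataset([sim_data, real_data])

# Target mixture: 30% sim, 70% real, regardless of raw dataset sizes.
weights = torch.cat([
    torch.full((len(sim_data),), 0.3 / len(sim_data)),
    torch.full((len(real_data),), 0.7 / len(real_data)),
])
sampler = WeightedRandomSampler(weights, num_samples=10_000, replacement=True)
loader = DataLoader(mixed, batch_size=64, sampler=sampler)

batch, = next(iter(loader))
print(batch.shape)  # torch.Size([64, 16])
```

Because the per-sample weights are normalized within each dataset, the batch-level mixture tracks the target ratio even when one dataset is much larger than the other.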
Reproduce the Most Popular VLA Tasks on GitHub at Low Cost
具身智能之心· 2026-01-11 03:02
Recently, many students have wanted to reproduce various VLA tasks but keep complaining that the cost is too high. A robot arm that is actually usable starts at roughly 15,000 RMB, and cameras and other sensors are not cheap either, which is a hard blocker for self-learners and anyone without lab equipment. Can the open-source low-cost arms be used? They can, but most beginners cannot tune them to get results: the data comes out poorly, training fails, and the actions always look strange. Getting the whole pipeline working end to end, data, VLA model, training optimization, and deployment, is very hard for beginners, especially with models like π0, π0.5, and GR00T, where both data collection and model training involve many tricks, and a lot of students burn most of their time on pitfalls. Many students want to complete VLA tasks at low cost and get started even on a small budget. That is exactly what we have done! The project can go on a resume, the tricks can serve as interview answers, and above all it saves a great deal of trial-and-error time. For more details, add the assistant on WeChat: AIDriver005.
VLA Work Is Growing Explosively...
具身智能之心· 2025-12-20 16:03
Core Viewpoint - The article discusses the rapid development and challenges of VLA (Vision-Language-Action) models in the field of embodied intelligence, highlighting the importance of real data collection and the difficulties faced by newcomers to the field [2][3][4].

Group 1: VLA Development and Challenges
- VLA models are experiencing explosive growth, with various frameworks and tools, such as reinforcement learning (RL), enhancing their performance [2].
- Data collection methods are diversifying, with millions of open-source samples becoming available, pointing toward industrialization [2].
- Many practitioners express frustration with the challenges of tuning VLA models and the complexities of data collection, particularly those new to the field [3][5].

Group 2: Data Collection and Training
- Data collection for VLA relies mainly on imitation learning and reinforcement learning, with a focus on teleoperation and VR for robot arms [13].
- Simulation and real-to-sim-to-real (real2sim2real) techniques are crucial for training VLA models, especially when real data is insufficient [14].
- Training technique is critical: many practitioners struggle to achieve good results because of the complexity of models like π0 and π0.5, which demand close attention to detail [14][10].

Group 3: Model Deployment
- After training, VLA models require optimization to reduce their parameter size for deployment, which is essential for edge computing applications [15].
- Techniques such as quantization and distillation are necessary to maintain performance while minimizing model size (a quantization sketch follows this list) [15].

Group 4: Educational Initiatives
- The article introduces a practical course aimed at helping individuals learn VLA effectively, covering hardware, data collection, algorithm deployment, and real-world experiments [17][20].
- The course is designed to save time and flatten the learning curve for newcomers, providing practical experience that can strengthen a resume [18][31].
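For the quantization technique mentioned under Model Deployment, a minimal example of post-training dynamic quantization in PyTorch looks like the following. It shows the generic technique on a toy policy head, not the deployment path of any specific VLA model.

```python
# Minimal post-training dynamic quantization in PyTorch -- a generic deployment
# technique, demonstrated on a toy policy head rather than a real VLA model.
import torch
import torch.nn as nn

policy_head = nn.Sequential(
    nn.Linear(512, 256), nn.ReLU(),
    nn.Linear(256, 7),  # e.g. a 7-DoF action output
)

# Quantize the Linear layers' weights to int8; activations stay in float.
quantized = torch.ao.quantization.quantize_dynamic(
    policy_head, {nn.Linear}, dtype=torch.qint8
)

x = torch.randn(1, 512)
print(quantized(x).shape)  # torch.Size([1, 7])
```

Dynamic quantization roughly quarters the weight memory of the affected layers; for tighter edge budgets, static quantization or distillation into a smaller student model goes further, at the cost of a calibration or retraining step.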
Everyone Is Talking About VLA, Yet Many Students Can't Even Get a Demo Running Properly...
具身智能之心· 2025-12-03 10:00
Core Viewpoint - The article discusses the challenges and advancements in the field of VLA (Vision-Language-Action) models, emphasizing the importance of real-robot data and practical applications in robotics and embodied intelligence.

Group 1: Challenges in VLA Implementation
- Many students struggle with the transition from theoretical knowledge to practical application, often finding it difficult to achieve satisfactory results without hands-on experience [2][6].
- Effective training and deployment of VLA models rely on real-robot data; simulation data alone has clear limitations [2][8].

Group 2: Data Collection and Training
- Data collection methods for VLA include imitation learning and reinforcement learning, with particular emphasis on teleoperation and VR techniques (a minimal behavior-cloning step is sketched after this list) [8].
- Training VLA models requires careful tuning and optimization, with specific challenges noted for models like π0 and π0.5, which demand a high level of expertise [10][12].

Group 3: Deployment and Optimization
- Post-training, VLA models often require optimization techniques such as quantization and distillation to reduce parameter size while maintaining performance [12].
- Deploying VLA models on edge devices is challenging because of their typically large parameter counts [12].

Group 4: Educational Initiatives
- The article introduces a practical course aimed at helping individuals learn about VLA, covering hardware, data collection, algorithm implementation, and real-world applications [14][30].
- The course is designed for a diverse audience, including students and professionals looking to move into embodied intelligence [27][30].
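The imitation-learning recipe referenced above reduces, in its simplest form, to a supervised regression step over teleoperated demonstrations. Below is a minimal sketch with toy shapes; it is hypothetical and is not the training code of π0 or any other named model.

```python
# Minimal behavior-cloning step: regress teleoperated expert actions from
# observations with an MSE loss. Toy shapes; not any specific VLA's trainer.
import torch
import torch.nn as nn

policy = nn.Sequential(nn.Linear(64, 128), nn.ReLU(), nn.Linear(128, 7))
optimizer = torch.optim.Adam(policy.parameters(), lr=1e-4)

obs = torch.randn(32, 64)            # batch of (encoded) observations
expert_actions = torch.randn(32, 7)  # matching teleoperated actions

pred = policy(obs)
loss = nn.functional.mse_loss(pred, expert_actions)
optimizer.zero_grad()
loss.backward()
optimizer.step()
print(f"bc loss: {loss.item():.4f}")
```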
HuggingFace and Oxford University Release a New Tutorial with an Open-Source SOTA Resource Library!
具身智能之心· 2025-10-27 00:02
Core Viewpoint - The article emphasizes the significant advancements in robotics, particularly in robot learning, driven by the development of large models and multi-modal AI technologies, which have transformed traditional robotics into a more learning-based paradigm [3][4].

Group 1: Introduction to Robot Learning
- The article introduces a comprehensive tutorial on modern robot learning, covering foundational principles of reinforcement learning and imitation learning and leading up to general-purpose, language-conditioned models [4][12].
- HuggingFace and Oxford University researchers have created a valuable resource for newcomers to the field, providing an accessible guide to robot learning [3][4].

Group 2: Classic Robotics
- Classic robotics relies on explicit modeling through kinematics and control planning, while learning-based methods use deep reinforcement learning and expert demonstrations for implicit modeling [15].
- Traditional robotic systems follow a modular pipeline of perception, state estimation, planning, and control [16].

Group 3: Learning-Based Robotics
- Learning-based robotics integrates perception and control more closely, adapts to tasks and embodiments, and reduces the need for expert modeling [26].
- The tutorial highlights the safety and efficiency challenges of real-world training, particularly during the initial phases, and discusses techniques like simulation training and domain randomization to mitigate the risks [34][35].

Group 4: Reinforcement Learning
- Reinforcement learning allows robots to autonomously learn optimal behavior strategies through trial and error, showing significant potential across scenarios [28].
- The tutorial discusses the complexity of integrating multiple system components and the limitations of traditional physics-based models, which often oversimplify real-world phenomena [30].

Group 5: Imitation Learning
- Imitation learning offers a more direct learning path by replicating expert actions through behavior cloning, avoiding complex reward-function design [41].
- The tutorial addresses challenges such as compounding errors and handling multi-modal behavior in expert demonstrations [41][42].

Group 6: Advanced Techniques in Imitation Learning
- The article introduces advanced imitation-learning methods based on generative models, such as Action Chunking with Transformers (ACT) and Diffusion Policy, which model multi-modal data effectively (an ACT-style ensembling sketch follows this list) [43][45].
- Diffusion Policy performs strongly across tasks with minimal demonstration data, requiring only 50-150 demonstrations for training [45].

Group 7: General Robot Policies
- The tutorial envisions the development of general robot policies capable of operating across tasks and devices, inspired by large-scale open robot datasets and powerful vision-language models [52][53].
- Two cutting-edge vision-language-action (VLA) models, π₀ and SmolVLA, are highlighted for their ability to understand visual and language instructions and generate precise control commands [53][56].

Group 8: Model Efficiency
- SmolVLA represents a trend toward model miniaturization and open-sourcing, achieving high performance with significantly lower parameter counts and memory consumption than π₀ [56][58].
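ACT, mentioned in Group 6, predicts a chunk of future actions per forward pass and blends overlapping chunks at inference time with exponential temporal-ensembling weights. The sketch below illustrates only the ensembling step on toy tensors; it is not the reference ACT code, and the chunk size, action dimension, and smoothing constant m are assumptions.

```python
# ACT-style temporal ensembling (sketch): each past timestep produced a chunk
# of CHUNK future actions; all predictions covering the current timestep t are
# blended with weights w_i = exp(-m * i), i = 0 being the oldest prediction,
# as in the ACT paper's weighting scheme.
import torch

CHUNK = 10  # actions predicted per forward pass
DIM = 7     # action dimension

def temporal_ensemble(chunks: list[torch.Tensor], t: int, m: float = 0.1) -> torch.Tensor:
    """Blend every previously predicted chunk that covers timestep t."""
    preds = []
    for start, chunk in enumerate(chunks):   # chunk predicted at timestep `start`
        if start <= t < start + CHUNK:
            preds.append(chunk[t - start])   # this chunk's prediction for time t
    # preds is ordered oldest-first; older predictions get the larger weights.
    w = torch.exp(-m * torch.arange(len(preds), dtype=torch.float32))
    w = w / w.sum()
    return (torch.stack(preds) * w.unsqueeze(-1)).sum(dim=0)

chunks = [torch.randn(CHUNK, DIM) for _ in range(5)]  # one chunk per past step
print(temporal_ensemble(chunks, t=4).shape)           # torch.Size([7])
```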
A Hands-On Introduction to Robot Learning: HuggingFace and Oxford University Release a New Tutorial and Open-Source SOTA Resource Library
机器之心· 2025-10-26 07:00
Core Viewpoint - The article emphasizes the significant advancements in robotics, particularly in robot learning, driven by AI technologies such as large models and multi-modal models. This shift has transformed traditional robotics into a learning-based paradigm, opening new potential for autonomous decision-making robots [2].

Group 1: Introduction to Robot Learning
- The article highlights the evolution of robotics from explicit to implicit modeling, a fundamental change in how motion is generated: traditional robotics relied on explicit models, while learning-based methods use deep reinforcement learning and expert-demonstration learning [15].
- A comprehensive tutorial from HuggingFace and Oxford University researchers serves as a valuable resource for newcomers to modern robot learning, covering foundational principles of reinforcement learning and imitation learning [3][4].

Group 2: Learning-Based Robotics
- Learning-based robotics simplifies the perception-to-action pipeline by training a unified high-level controller that directly handles high-dimensional, unstructured perception-motion information without relying on a dynamics model [33].
- The tutorial addresses real-world challenges such as safety and efficiency during initial training and the high cost of trial and error in physical environments, introducing techniques like simulator training and domain randomization to mitigate these risks [34][35].

Group 3: Reinforcement Learning
- Reinforcement learning allows robots to autonomously learn optimal behavior strategies through trial and error, showing significant potential across scenarios [28].
- The tutorial discusses the "Offline-to-Online" reinforcement-learning framework, which improves sample efficiency and safety by exploiting pre-collected expert data. The HIL-SERL method exemplifies this approach, enabling robots to master complex real-world tasks with near-100% success rates after just 1-2 hours of training [36][39].

Group 4: Imitation Learning
- Imitation learning offers a more direct learning path by replicating expert actions through behavior cloning, avoiding complex reward-function design and keeping training safe [41].
- The tutorial presents advanced imitation-learning methods based on generative models, such as Action Chunking with Transformers (ACT) and Diffusion Policy, which model multi-modal data by learning the latent distribution of expert behavior (a schematic denoising loop follows this list) [42][43].

Group 5: Universal Robot Policies
- The article envisions the future of robotics in universal robot policies capable of operating across tasks and devices, inspired by the emergence of large-scale open robot datasets and powerful vision-language models (VLMs) [52].
- Two cutting-edge VLA models, π₀ and SmolVLA, are highlighted for understanding visual and language instructions and generating precise robot control commands, with SmolVLA being a compact, open-source model that significantly lowers the barrier to application [53][56].
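Diffusion Policy, highlighted in Group 4, generates an action sequence by iteratively denoising Gaussian noise conditioned on the observation. The following is a schematic DDPM-style reverse loop with a toy noise-prediction network; it is illustrative only, and the horizon, step count, and conditioning architecture are assumptions, not the actual method's details.

```python
# Schematic Diffusion Policy inference: start from Gaussian noise and
# iteratively denoise it into an action sequence, conditioned on an
# observation embedding. Toy network, simplified DDPM update.
import torch
import torch.nn as nn

HORIZON, DIM, STEPS = 16, 7, 50
noise_pred = nn.Sequential(
    nn.Linear(HORIZON * DIM + 64 + 1, 256), nn.ReLU(),
    nn.Linear(256, HORIZON * DIM),
)

betas = torch.linspace(1e-4, 0.02, STEPS)
alphas = 1.0 - betas
alpha_bars = torch.cumprod(alphas, dim=0)

obs = torch.randn(64)           # observation embedding
x = torch.randn(HORIZON * DIM)  # pure noise to start
with torch.no_grad():
    for t in reversed(range(STEPS)):
        t_feat = torch.tensor([t / STEPS])
        eps = noise_pred(torch.cat([x, obs, t_feat]))
        # Standard DDPM posterior-mean step; noise term omitted at t == 0.
        x = (x - betas[t] / torch.sqrt(1 - alpha_bars[t]) * eps) / torch.sqrt(alphas[t])
        if t > 0:
            x = x + torch.sqrt(betas[t]) * torch.randn_like(x)

actions = x.view(HORIZON, DIM)  # denoised action sequence
print(actions.shape)            # torch.Size([16, 7])
```

Because the sampler draws fresh noise each run, the same observation can yield different valid trajectories, which is how the method captures the multi-modal expert behavior the tutorial highlights.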
Starting at $250, and Open Source: Hugging Face Releases Its Most Affordable Humanoid Robots Ever
机器之心· 2025-05-31 04:00
Core Viewpoint - Hugging Face has officially open-sourced two humanoid robots, HopeJR and Reachy Mini, moving closer to Elon Musk's prediction of 10 billion humanoid robots by 2040 [1][31].

Group 1: Robot Specifications
- HopeJR is a full-sized humanoid robot with 66 degrees of freedom, capable of walking and arm movement [3].
- Reachy Mini is a desktop robot that can move its head, speak, and listen, designed for testing AI applications [5][20].

Group 2: Pricing and Availability
- HopeJR is priced at approximately $3,000, while Reachy Mini costs between $250 and $300, depending on tariffs [7].
- The company plans to start shipping the first batch of robots by the end of the year, with a waiting list already open [7].

Group 3: Open Source and Community Impact
- The open-sourcing of these robots allows anyone to assemble and understand their workings, democratizing access to robotic technology [7][28].
- Hugging Face aims to build an open-source robotics ecosystem, breaking down barriers to knowledge and technology and making robotics accessible to a wider audience [28][30].

Group 4: Development and Features
- HopeJR requires developers to manually control it and record actions for training through imitation learning (a recording-loop sketch follows this list) [10][12].
- Reachy Mini is designed to help develop AI applications, allowing for testing before deployment in real-world scenarios [20].

Group 5: Previous Initiatives
- This is not Hugging Face's first venture into robotics; they previously launched the LeRobot project and the SO-100 robotic arm design [26][28].
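The record-then-imitate workflow described for HopeJR boils down to logging synchronized observation-action pairs while a human teleoperates. The sketch below is not Hugging Face's recording tooling; `read_camera` and `read_teleop_action` are hypothetical placeholders for whatever hardware interface the robot exposes.

```python
# Sketch of demonstration recording for imitation learning: log synchronized
# (observation, action) pairs during teleoperation, then save the episode.
# `read_camera` and `read_teleop_action` are hypothetical hardware hooks.
import time
import numpy as np

def read_camera() -> np.ndarray:         # placeholder: grab a camera frame
    return np.zeros((224, 224, 3), dtype=np.uint8)

def read_teleop_action() -> np.ndarray:  # placeholder: current operator command
    return np.zeros(7, dtype=np.float32)

def record_episode(seconds: float = 5.0, hz: float = 30.0) -> dict:
    frames, actions = [], []
    for _ in range(int(seconds * hz)):
        frames.append(read_camera())
        actions.append(read_teleop_action())
        time.sleep(1.0 / hz)             # fixed-rate logging keeps pairs aligned
    return {"frames": np.stack(frames), "actions": np.stack(actions)}

episode = record_episode(seconds=1.0)
np.savez("episode_000.npz", **episode)
print(episode["frames"].shape, episode["actions"].shape)
```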
News Brief | Hugging Face Goes All In on AI Robotics: Two Open-Source Humanoid Robots Released, Starting at Just $250
Z Potentials· 2025-05-30 03:23
Core Viewpoint - Hugging Face has launched two new humanoid robots, HopeJR and Reachy Mini, as part of its expansion into the robotics sector, emphasizing open-source technology and affordability [1][3].

Group 1: Product Launch
- The company introduced HopeJR, a full-sized humanoid robot with 66 degrees of freedom, capable of walking and arm movements, and Reachy Mini, a desktop robot that can rotate its head, speak, and listen [1].
- The estimated price for HopeJR is around $3,000, while Reachy Mini is priced between $250 and $300, depending on tariff policies [3].

Group 2: Open Source and Accessibility
- The open-source nature of these robots allows anyone to assemble, reconstruct, and understand their operation, preventing monopolization by a few large companies [3].

Group 3: Strategic Acquisitions
- The launch of these robots is partly attributed to the acquisition of Pollen Robotics, which provided new capabilities for the development of these humanoid robots [4].

Group 4: Future Developments
- Hugging Face has been actively entering the robotics industry, with plans to launch LeRobot in 2024, a resource collection that includes open-source AI models, datasets, and tools for building robotic systems [6].
- In 2025, the company released an upgraded version of its 3D-printable programmable robotic arm SO-101, developed in collaboration with The Robot Studio [6].