具身智能之心
VLA direction: hoping to mentor a few more students toward a top-tier (A-rank) embodied AI conference......
具身智能之心· 2025-11-10 10:00
Group 1
- The article highlights the urgency for students to prepare for upcoming conferences after CVPR, reflecting the competitive academic environment in embodied intelligence [2]
- The organization is recruiting three students for mentorship in the VLA (Vision-Language-Action) direction, emphasizing the quality of guidance [2]
- Key research areas for the VLA direction include VLA models, lightweight models, VLA combined with tactile feedback, VLA with world models, and VLA with reinforcement learning [2]
How does online reinforcement learning fine-tune π0 and π0.5? Why can performance improve by more than 50%?
具身智能之心· 2025-11-10 03:30
Core Viewpoint
- The article discusses the introduction of the πRL framework, which enhances flow-based vision-language-action (VLA) models through online reinforcement learning (RL) fine-tuning, significantly improving their performance and generalization capabilities [5][7].

Group 1: Introduction to VLA Models
- VLA models enable robots to understand and execute complex tasks through multimodal inputs, but large-scale RL applications face challenges due to the difficulty of handling action log-likelihoods during the iterative denoising process [5].

Group 2: πRL Framework
- The πRL framework, developed by teams from Tsinghua University and Peking University, addresses the challenges of applying large-scale RL to flow-based VLA models by training them in parallel simulations [6].

Group 3: RL Algorithms in πRL
- πRL implements two RL algorithms:
  1. FlowNoise models the denoising process as a discrete-time Markov Decision Process (MDP), using a learnable noise network for precise log-likelihood calculations [7].
  2. Flow-SDE combines the denoising process with agent-environment interaction, constructing a dual-layer MDP that converts the ODE into an SDE for efficient RL exploration [7].

Group 4: Performance Evaluation
- In benchmark tests, πRL significantly improved few-shot SFT models π0 and π0.5 on the LIBERO dataset, from 57.6% to 97.6% and from 77.1% to 98.3%, respectively [7].
- On the ManiSkill benchmark, πRL demonstrated scalable multi-task RL across 4,352 grasping and placing tasks using 320 parallel environments [7].

Group 5: Conclusion
- Overall, πRL shows substantial performance gains and stronger generalization than SFT models, validating the effectiveness of online RL for flow-based VLA models [7].
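The ODE-to-SDE idea behind tractable log-likelihoods can be sketched in miniature: injecting Gaussian noise into each Euler denoising step turns every transition into a Gaussian with a closed-form log-density, which is exactly what policy-gradient RL needs. The velocity field, noise scale, and step count below are toy stand-ins, not πRL's actual implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

def velocity(x, t):
    # Toy stand-in for a learned flow-matching velocity field.
    return -x * (1.0 - t)

def sample_with_logprob(x0, n_steps=10, sigma=0.1):
    """Euler-Maruyama rollout; returns the final action and summed log-prob."""
    dt = 1.0 / n_steps
    x, logp = x0, 0.0
    for k in range(n_steps):
        t = k * dt
        mean = x + velocity(x, t) * dt        # deterministic Euler step
        var = (sigma ** 2) * dt               # variance of the injected noise
        x_next = mean + rng.normal(0.0, np.sqrt(var), size=x.shape)
        # Each transition is Gaussian, so its log-density is closed form.
        logp += np.sum(-0.5 * ((x_next - mean) ** 2 / var
                               + np.log(2.0 * np.pi * var)))
        x = x_next
    return x, logp

action, logp = sample_with_logprob(rng.normal(size=4))
```

With a deterministic ODE sampler the per-step density is degenerate; the stochastic perturbation is what makes `logp` well defined for an RL objective.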
Robot training: Beijing college students have found a new way to play
具身智能之心· 2025-11-10 00:02
Author | 量子位 (QbitAI)

College students really do know how to have fun (doge)! While cruising the internet, people unexpectedly discovered that a male college student has found himself a robot teammate, and a rather clingy one at that (bushi~).

During his daytime supermarket shift, it insists on tagging along; the moment it sees the goods packed up, it cheerfully helps pull the cart, bustling up and down the stairs. At noon, when he works his cafeteria job, it volunteers to push the meal cart and goes exactly where it is told (a pat on the head tells it to stop). Even after a full day's work, it joins him at the gym; since it came all this way, the student figures: might as well train!

Honestly, you could shoot a vlog from the robot's point of view, titled "A Day in the Life of a High-Energy Robot".

Joking aside, notice that the student and his robot partner communicate entirely through pats on the head and tugs on the body, with no remote control and no voice commands. That is worth a closer look: the vast majority of today's robots are driven by external sensors (cameras, LiDAR, etc.) and teleoperation, while these students have proposed an entirely new approach, interacting with the outside world through "proprioception" alone.

Well then, it turns out what they ...
Galaxy General's new model unifies robot navigation tasks; the 7B-parameter model supports real-time deployment
具身智能之心· 2025-11-10 00:02
Core Insights
- The article discusses the development of NavFoM, a foundational model for embodied navigation that aims to unify navigation tasks across different robots and scenarios, marking a significant technological leap from specialized to general-purpose navigation [1][29].

Group 1: Unified Navigation Paradigm
- NavFoM is built on a fundamental idea: unify different robot navigation tasks into a common paradigm in which streaming video input from robots, combined with natural-language navigation instructions, is used to predict action trajectories [3].
- The model supports multiple tasks such as vision-language navigation, target search, target following, and autonomous driving, across indoor and outdoor environments, and applies to different robot embodiments including quadrupeds, wheeled robots, humanoids, drones, and cars [3][29].

Group 2: Model Structure and Efficiency
- The model features TVI Tokens, a scalable method for understanding images under different tasks and camera settings that enhances the model's adaptability [5].
- To enable real-time deployment of the 7B-parameter navigation model, the team introduced the Budget-Aware Token Sampling strategy (BATS), which adaptively samples key frames under computational constraints, maintaining performance while ensuring efficient operation on real robots [6][11].

Group 3: Training Data and Performance
- The team trained NavFoM on 8 million navigation data entries covering various tasks and robot types, plus 4 million entries of open-world question-answering data, effectively doubling the training volume compared to previous work [12][15].
- NavFoM achieved state-of-the-art (SOTA) and SOTA-comparable results across multiple public benchmarks without task-specific fine-tuning, demonstrating its versatility and effectiveness [16][29].
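Budget-aware key-frame sampling can be illustrated with a toy greedy selector: always keep the most recent frame, then admit the highest-scoring history frames while the token budget allows. The scores, per-frame token cost, and greedy rule here are illustrative assumptions, not NavFoM's actual BATS algorithm.

```python
def sample_frames(scores, tokens_per_frame, budget):
    """Pick frame indices to keep, in temporal order, under a token budget."""
    n = len(scores)
    keep = {n - 1}                        # always keep the current frame
    budget -= tokens_per_frame
    # Rank history frames by importance score; admit while budget allows.
    for idx in sorted(range(n - 1), key=lambda i: scores[i], reverse=True):
        if budget >= tokens_per_frame:
            keep.add(idx)
            budget -= tokens_per_frame
    return sorted(keep)

# Five frames, 64 tokens each, 200-token budget -> room for 3 frames.
frames = sample_frames([0.9, 0.2, 0.7, 0.1, 0.5],
                       tokens_per_frame=64, budget=200)
print(frames)  # -> [0, 2, 4]
```

Returning indices in sorted order preserves the temporal structure of the video stream, which matters when the kept frames are fed back to a sequence model.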
Group 4: Future Implications
- The development of NavFoM signifies a move toward generalization in embodied navigation models, enabling cross-industry applications and fostering further research in intelligent navigation technologies [29].
- The team aims to inspire new technologies, datasets, and benchmarks in embodied navigation, accelerating innovation in intelligent services and production capabilities [29].
The embodied "brain and cerebellum" technical routes, all in one place......
具身智能之心· 2025-11-10 00:02
Core Insights
- The exploration toward Artificial General Intelligence (AGI) highlights embodied intelligence as a key direction, focusing on how intelligent agents interact with and adapt to physical environments [1]
- The development of embodied intelligence is marked by the evolution of its core components, the "brain" and "cerebellum", which are essential for perception, task understanding, and action execution [1]

Industry Analysis
- In the past two years, numerous star teams in embodied intelligence have emerged, establishing highly valued companies such as Xinghaitu, Galaxy General, and Zhujidongli, driving advances in embodied intelligence technologies [3]
- Major domestic companies like Huawei, JD, Tencent, and Ant Group are actively investing and collaborating to build a robust embodied intelligence ecosystem, while international players like Tesla and Wayve focus on industrial applications and autonomous driving [5]

Technological Evolution
- Embodied intelligence technology has progressed through several stages, from low-level perception to high-level task understanding and generalization [6]
- The first stage focused on grasp pose detection, while the second introduced behavior cloning, allowing robots to learn from expert demonstrations [6][7]
- The introduction of Diffusion Policy methods in 2023 marked a significant advance, improving stability and generalization in task execution [6][9]
- The current phase emphasizes Vision-Language-Action (VLA) models, which enable robots to understand human instructions and perform complex tasks [7][9]

Future Directions
- The industry is exploring the fusion of VLA models with reinforcement learning, world models, and tactile sensing to overcome existing limitations [9][11]
- This integration aims to enhance robots' capabilities in long-horizon tasks, future prediction, and multimodal perception, expanding their operational boundaries [11][12]

Educational Initiatives
- There is growing demand for engineering and systems capabilities in embodied intelligence, prompting the development of comprehensive educational programs [17]
- These programs aim to equip participants with practical skills in simulation, model training, and the deployment of advanced embodied intelligence architectures [17][20]
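The behavior-cloning stage mentioned above boils down to supervised regression on expert (observation, action) pairs. A minimal sketch, using a toy linear policy and synthetic "expert" data (both assumptions made here for illustration, not any cited system):

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic expert demonstrations: actions are a linear function of observations.
obs = rng.normal(size=(256, 4))            # 256 expert observations, 4-dim
expert_W = rng.normal(size=(4, 2))
actions = obs @ expert_W                   # corresponding 2-dim expert actions

# Behavior cloning as least-squares regression (closed form for a linear policy).
W_hat, *_ = np.linalg.lstsq(obs, actions, rcond=None)
mse = np.mean((obs @ W_hat - actions) ** 2)
```

Real systems replace the linear map with a deep network and minimize the same imitation loss by gradient descent; Diffusion Policy later replaced the direct regression target with a learned denoising process over action sequences.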
Iterating models and accumulating data is the right answer! Lingqiao Intelligent unveils its full software-hardware platform lineup at the 2025 World Internet Conference
具身智能之心· 2025-11-10 00:02
Core Insights
- The article discusses the significance of the 2025 World Internet Conference in Wuzhen, highlighting the focus on embodied intelligence and robotics, particularly the insights shared by industry leader Lingqiao Intelligent [2].

Group 1: Company Overview
- Lingqiao Intelligent is a leading company in dexterous manipulation, aiming to advance humanoid and industrial robots through advanced technology and a strong research team [7].
- The company has launched three models of dexterous hands within the past 1.5 years, combining high flexibility with industrial-grade reliability, with prices starting below 10,000 yuan [8].

Group 2: Product Development
- The DexHand021 series includes models designed for different applications, with the Pro version featuring 16 active joints and advanced tactile feedback [8].
- Lingqiao Intelligent has developed cost-effective composite robotic platforms that integrate dexterous hands, enabling a wide range of tasks such as inspection, logistics sorting, and service [13].

Group 3: Data Challenges
- The lack of high-quality, large-scale data is a significant bottleneck for the practical application of embodied intelligence [16][26].
- Lingqiao Intelligent has introduced DexCanvas, an open-source dataset aimed at addressing the scarcity of dexterous-operation data, including comprehensive multi-finger force and contact annotations [20].

Group 4: Data Collection Systems
- The company has developed the DexCap data collection system, which offers high accuracy and efficiency for gathering dexterous-operation data [23].
- The integration of DexCanvas and DexCap is expected to facilitate the deployment of dexterous manipulation technologies across industries [27].

Group 5: Industry Outlook
- Successful implementation of embodied intelligence relies on continuous iteration and optimization of both hardware and data collection processes [31].
- The conference theme emphasizes collaboration among industry partners to achieve sustainable development in embodied intelligence [32].
The largest and most diverse real-world manipulation dataset in history! The scaling law for embodied AI has arrived~
具身智能之心· 2025-11-09 14:08
Core Insights
- The article discusses the introduction of GEN-0, a new type of embodied foundation model designed for multimodal training on high-fidelity physical interactions, aiming to advance robotic intelligence through real-world data [5][9].

Group 1: Model Characteristics
- GEN-0 is built to capture human-level reflexes and physical common sense, featuring a core capability called "harmonic reasoning" that allows thinking and acting to be trained seamlessly together [5].
- The model has surpassed the critical threshold of 7 billion parameters, exhibiting a phase transition: smaller models stagnate while larger models continue to improve [6][11].
- GEN-0 demonstrates a strong scaling law: more pre-training data and compute predictably improve the model's performance across multiple tasks [6][11].

Group 2: Data Utilization
- The model is pre-trained on over 270,000 hours of real-world heterogeneous manipulation data, with the dataset growing by more than 10,000 hours per week [22].
- The data comes from diverse operational scenarios across thousands of households, warehouses, and workplaces, aiming to cover all conceivable manipulation tasks [24].

Group 3: Implications for Robotics
- GEN-0 signals a new era for embodied foundation models, in which capability grows predictably with real physical-interaction data rather than relying solely on text, images, or simulated data [9].
- The findings show that smaller models struggle to process complex sensorimotor data during pre-training, while models above the 7-billion-parameter threshold can internalize large-scale pre-training data and adapt quickly to downstream tasks with minimal fine-tuning [11][15].
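Scaling-law claims of this kind are typically checked by fitting a power law L = a · D^(-b), which is linear in log-log space. The sketch below fits such a curve to made-up (data-hours, loss) pairs; the numbers are illustrative placeholders, not GEN-0's actual measurements.

```python
import numpy as np

# Hypothetical (pre-training hours, downstream validation loss) pairs.
hours = np.array([1e3, 1e4, 1e5, 2.7e5])
loss = np.array([1.80, 1.20, 0.80, 0.66])

# Power law L = a * D^slope  <=>  log L = slope * log D + log a.
slope, intercept = np.polyfit(np.log(hours), np.log(loss), 1)
a = np.exp(intercept)
predicted = a * hours ** slope   # fitted curve at the observed points
```

A negative fitted `slope` is the quantitative signature of "more data predictably lowers loss"; extrapolating the fitted curve is how such analyses forecast returns on additional data collection.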
Latest from Westlake University! RobustVLA: robustness-aware reinforcement post-training for VLA models (outperforms SOTA approaches)
具身智能之心· 2025-11-08 04:00
Core Insights
- The article discusses the development of RobustVLA, a lightweight online reinforcement learning post-training method aimed at enhancing the robustness of Vision-Language-Action (VLA) models in the face of environmental uncertainty [1][5][20]
- It highlights the limitations of existing methods that focus primarily on reward maximization without addressing the model's sensitivity to disturbances, which can cause significant performance drops in real-world scenarios [5][20]

Design Logic of RobustVLA
- RobustVLA incorporates two key regularization terms: a Jacobian regularizer that reduces sensitivity to observation noise, and a smoothness regularizer that stabilizes policies under action disturbances [4][7][8]
- The method positions robustness-aware reinforcement post-training as a critical step toward improving the reliability of VLA models [1][5]

Robustness Analysis
- The article outlines a theoretical robustness analysis, establishing error-amplification bounds, reward-drift control, and robust-stability guarantees [4][11][18]
- It shows that Jacobian sensitivity directly drives error amplification, so reducing this sensitivity effectively bounds the performance loss [12][18]

Experimental Results
- Under observation perturbations, RobustVLA achieved an average success rate of 82.5%, outperforming previous models such as OpenVLA-OFT and RIPT-VLA [20][21]
- Under action perturbations, RobustVLA achieved an average success rate of 54.8%, exceeding OpenVLA-OFT's 53.5% [22]
- Under combined disturbances, RobustVLA-C achieved an average success rate of 82.1%, showcasing the synergy of autonomous interaction and dual regularization [23]

Transfer Learning and Ablation Studies
- Transfer learning experiments showed that RobustVLA improved out-of-distribution adaptability by 8.0% and 16.0% on specific tasks compared to zero-shot transfer [25]
- Ablation studies confirmed that removing either the Jacobian or the smoothness regularizer degrades performance, underscoring that both regularization strategies are necessary for robustness [27]
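The idea behind the two regularizers can be illustrated on a toy policy: estimate the observation Jacobian's squared Frobenius norm by finite differences, and measure action drift under small input perturbations. This is a sketch of the general concept only; the linear policy, `jacobian_penalty`, and `drift_penalty` below are hypothetical and do not reproduce RobustVLA's actual loss definitions (in particular, its smoothness term is formulated over action disturbances).

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.normal(size=(3, 5)) * 0.5    # toy linear "policy": action = W @ obs

def policy(obs):
    return W @ obs

def jacobian_penalty(obs, eps=1e-4):
    """Finite-difference estimate of ||d policy / d obs||_F^2."""
    base = policy(obs)
    sq = 0.0
    for i in range(obs.size):
        d = np.zeros_like(obs)
        d[i] = eps
        sq += np.sum(((policy(obs + d) - base) / eps) ** 2)
    return sq

def drift_penalty(obs, sigma=0.01, n=8):
    """Mean squared action drift under small Gaussian input perturbations."""
    base = policy(obs)
    drifts = [np.sum((policy(obs + rng.normal(0, sigma, obs.shape)) - base) ** 2)
              for _ in range(n)]
    return float(np.mean(drifts))

obs = rng.normal(size=5)
total_reg = jacobian_penalty(obs) + 0.1 * drift_penalty(obs)
```

For a linear policy the finite-difference Jacobian penalty equals ||W||_F^2, which makes the sketch easy to sanity-check; in a real VLA model the same quantity would be computed by automatic differentiation and added to the RL objective with a weighting coefficient.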
Tonight's major roundtable: make high-quality data collection easy on your robot platform!
具身智能之心· 2025-11-08 00:03
Core Insights
- The article discusses the challenges of acquiring high-quality data in embodied intelligence, emphasizing that the difficulty of data collection is the central issue [2]
- It highlights the importance of various data sources, including teleoperation and motion-capture systems, and the trade-off between precision and freedom in data collection [2]
- It outlines a roundtable discussion featuring industry experts analyzing the underlying logic, technical bottlenecks, and innovative solutions for data acquisition in embodied intelligence [2]

Data Collection Challenges
- High-quality embodied data is hard to find because data collection is inherently difficult [2]
- Teleoperation is a core data source, and motion-capture systems face challenges balancing precision and freedom [2]
- Benchmark projects like ALOHA reveal an end-to-end data-loop paradigm, showing how different data types (internet video, synthetic data, and real-robot data) can be integrated into a cohesive framework [2]

Expert Contributions
- The roundtable features notable guests, including:
  - Cui Hanqing, CEO of Beijing Moxianfei Technology Co., with a background in computer science and experience at Microsoft [3]
  - Feng Qian, co-founder and technical lead at Amio, with a PhD from the Technical University of Munich [4]
  - Ding Zhezhang, co-founder of AIO Intelligent, with a master's degree from Peking University [4]
  - Mu Shilong, CEO of Xspark AI, a young entrepreneur focusing on robotic dexterous manipulation [4]
  - Ding Yan, co-CTO of Luming Robotics, with prior experience at Yixing Robotics [5]

Future Discussions
- A reaction segment will cover recent hot topics, including 1X Technologies' "NEO" and Xiaopeng's IRON, reflecting ongoing developments in the industry [22]
Embodied AI illustrated guides + Q&A + roadmaps + videos + research reports are here!
具身智能之心· 2025-11-08 00:03
Core Insights
- The article focuses on the development and research of embodied intelligence, highlighting key companies and laboratories in the field [2][10][21]
- It recommends suitable robotic platforms for research, including the SO-100, Openarm, and XLerobot series [3][5][7]
- It discusses algorithmic approaches and deployment strategies, emphasizing cloud-based inference and edge-computing solutions [10][20]

Industry Overview
- The article identifies active companies and laboratories developing embodied brains and robots, noting a competitive landscape [2][21]
- Interest in embodied intelligence research is growing, with a community established for knowledge sharing and collaboration [2][14][20]

Product Recommendations
- Recommended robotic platforms include:
  - The SO-100 series, which supports various algorithms [3]
  - The Openarm series, designed for dual-arm tasks but lacking mobility [5]
  - The XLerobot series, suitable for entry-level research and personal development [7]
- Other higher-cost development platforms are also mentioned, such as Ark Infinite and Xinghai Map [9]

Algorithmic and Deployment Insights
- The article outlines several algorithmic directions, including VLA (Vision-Language-Action), VLN (Vision-Language Navigation), and control strategies [10]
- Deployment strategies focus primarily on cloud inference, with some companies like Xiaopeng completing deployments on self-developed chips [10]

Community and Educational Resources
- The community offers resources for newcomers, including technology stacks and learning routes [16][21]
- It provides job-referral mechanisms and opportunities to network with industry professionals [20][24]

Research and Development Focus
- The article highlights ongoing research in data collection, dexterous manipulation, and multi-sensor fusion perception [15][21]
- It emphasizes the importance of collaboration between academia and industry to advance embodied intelligence [21][24]