For real robots: a hands-on tutorial covering VLA algorithm deployment, quantization, and world models
具身智能之心· 2025-12-05 00:02
Core Viewpoint
- The article discusses the challenges and advancements in the VLA (Vision-Language-Action) field, emphasizing the importance of real-robot data for effective model training and deployment, as well as the need for practical learning resources in this rapidly evolving area [2][4][14].

Group 1: Data Collection
- Data collection methods in VLA primarily include imitation learning and reinforcement learning, with teleoperation, VR, and full-body motion capture being key techniques [8].
- Ensuring high-quality data collection is crucial, and methods like real2sim2real are highlighted as important for effective data utilization [8].

Group 2: VLA Training
- Before deploying models on real robots, simulation debugging is essential, especially when real-robot data is insufficient, using frameworks like MuJoCo and Isaac Gym [10].
- Training technique matters: fine-tuning models and achieving good results with limited data are common difficulties for learners [10][11].
- Some algorithms, such as ACT, are comparatively easy to train, while others like π0 and π0.5 require more intricate techniques and experience [11].

Group 3: VLA Deployment
- After training, models often require optimization to reduce their size: VLA models typically have large parameter counts, which poses challenges for deployment on edge devices [13].
- Techniques such as quantization and distillation are needed to shrink the parameter count while maintaining performance [13].

Group 4: Educational Resources
- The article introduces a practical course aimed at helping learners navigate the complexities of VLA, covering hardware, data collection, algorithms, and deployment [14][16].
- The course targets a range of audiences, including those entering the field, advancing their skills, or transitioning from related areas such as traditional computer vision or robotics [24].
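The quantization mentioned under VLA Deployment can be illustrated with a minimal, dependency-free sketch of symmetric int8 post-training quantization. The weight values below are made up for illustration, and a real VLA deployment would use a framework's quantization toolchain rather than this hand-rolled version; the sketch only shows why int8 storage shrinks a model roughly 4x versus float32 at a small, bounded accuracy cost.

```python
def quantize_int8(weights):
    """Symmetric per-tensor int8 quantization: w ~ q * scale,
    with q an integer in [-127, 127]."""
    scale = max(abs(w) for w in weights) / 127.0
    q = [round(w / scale) for w in weights]
    return q, scale

def dequantize(q, scale):
    """Recover approximate float weights from int8 codes."""
    return [v * scale for v in q]

# Illustrative weights, not from any real model.
weights = [0.52, -1.27, 0.003, 0.9]
q, scale = quantize_int8(weights)
restored = dequantize(q, scale)

# Reconstruction error is bounded by half a quantization step (scale / 2).
max_err = max(abs(w - r) for w, r in zip(weights, restored))
print(q, round(scale, 5), round(max_err, 5))
```

Each weight now needs one byte instead of four; the single float `scale` is shared by the whole tensor, which is what makes the approach cheap for very large parameter counts.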
A new breakthrough for humanoid robots: agility and stability without compromise
具身智能之心· 2025-12-05 00:02
Core Idea
- The article discusses the AMS (Agility Meets Stability) framework developed by a joint research team from the University of Hong Kong, NVIDIA, and Tsinghua University, which successfully integrates dynamic motion tracking and extreme balance control in humanoid robots using a single policy [3].

Group 1: Key Innovations of AMS
- Heterogeneous data sources: AMS generates scalable balance data by sampling directly from the robot's action space, overcoming the limitations of human motion data and alleviating long-tail distribution issues [2][17].
- Hybrid reward mechanism: AMS applies balance-prior rewards selectively, providing precise balance guidance without sacrificing agility and resolving conflicts between optimization objectives [4][21].
- Adaptive learning strategy: the framework dynamically adjusts sampling probabilities and tailors learning for each motion, enabling efficient adaptive learning [4][23].

Group 2: Challenges in Humanoid Robotics
- Humanoid robots face a dilemma: they need both agile dynamic movement and precise balance control to perform tasks in human environments [5][6].
- Existing research primarily focuses on either dynamic motion tracking or balance control, making it difficult to achieve both capabilities within a unified framework [8][10].

Group 3: Experimental Results
- AMS was validated on the Unitree G1 humanoid robot, demonstrating excellent dynamic motion tracking in activities such as shuttle runs and basketball dribbling [24].
- AMS also showcased precise balance control, effectively managing extreme balance poses [26].
- The framework supports various real-time teleoperation modes, highlighting its practical value as a foundational control model [29].

Group 4: Conclusion
- AMS represents a significant advancement in humanoid robot control, combining heterogeneous data sources, a hybrid reward mechanism, and an adaptive learning strategy to achieve both dynamic agility and robust balance, laying a crucial foundation for humanoid robots in human environments [33].
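The adaptive learning strategy in Group 1 can be sketched as error-proportional motion sampling: motions the policy currently tracks poorly get drawn more often. This is a hypothetical illustration only; the summary does not specify AMS's actual sampling rule, and the softmax form, temperature, and motion names below are all assumptions.

```python
import math
import random

def adaptive_sampling_probs(tracking_errors, temperature=1.0):
    """Softmax over per-motion tracking error: motions with larger
    error (harder for the current policy) get higher probability.
    Assumed rule for illustration, not the one AMS actually uses."""
    exps = [math.exp(e / temperature) for e in tracking_errors]
    total = sum(exps)
    return [x / total for x in exps]

def sample_motion(motion_ids, probs, rng=None):
    """Draw one motion id according to the adaptive distribution."""
    rng = rng or random.Random(0)
    return rng.choices(motion_ids, weights=probs, k=1)[0]

# Illustrative motions and errors (the activity names echo the article).
errors = {"shuttle_run": 0.8, "dribble": 0.5, "single_leg_balance": 0.1}
probs = adaptive_sampling_probs(list(errors.values()))
print(dict(zip(errors, (round(p, 3) for p in probs))))
```

As tracking errors shrink during training, the distribution flattens on its own, which is one simple way a curriculum can adapt without manual scheduling.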
Some students are already folding towels, while others are still debugging hardware...
具身智能之心· 2025-12-04 09:53
Core Viewpoint
- The article introduces the Imeta-Y1 robotic arm, designed for embodied-intelligence research, emphasizing its affordability, ease of use, and comprehensive support for algorithm deployment and development [6][10].

Group 1: Product Features
- Imeta-Y1 is a lightweight, cost-effective robotic arm tailored for beginners and researchers, enabling low-cost, efficient algorithm validation and project development [6][10].
- The arm supports a full-process open-source toolchain, including data collection, model training, and inference deployment, compatible with mainstream frameworks such as TensorFlow and PyTorch [22][41].
- It features a compact structure and modular interfaces, making it suitable for embedded AI and robotics learning platforms [11].

Group 2: Technical Specifications
- Weight 4.2 kg, rated load 3 kg, 6 degrees of freedom, working radius 612.5 mm, repeat positioning accuracy ±0.1 mm [13][24].
- 24 V power supply and CAN communication; control methods include trajectory tracking, teaching, and an API [13][24].
- Joint ranges: J1 −165° to 165°, J2 −180° to 0°, J3 0° to 180°; maximum speeds of 180°/s for J1–J3 and 220°/s for J4–J6 [26].

Group 3: User Support and Accessibility
- A complete open-source SDK is provided, including drivers, API interfaces, example code, and documentation, with Python and C++ support for rapid application development [35][41].
- Simulation and real-robot operation transition seamlessly, so users can validate algorithm logic in simulation before deploying to physical hardware [27].
- The company ensures 24-hour after-sales response, enhancing the user and learning experience [8][24].
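The joint ranges and speed caps listed in Group 2 can be encoded as a small validity check before sending commands to the arm. The limit table below copies the documented J1–J3 ranges and per-joint speed limits; the helper functions themselves are illustrative and are not part of the Imeta-Y1 SDK.

```python
# Documented Imeta-Y1 limits (degrees and degrees/second); J4-J6
# position ranges are not given in the summary, so only their speed
# caps are encoded here.
JOINT_LIMITS = {
    "J1": (-165.0, 165.0),
    "J2": (-180.0, 0.0),
    "J3": (0.0, 180.0),
}
MAX_SPEED = {"J1": 180.0, "J2": 180.0, "J3": 180.0,
             "J4": 220.0, "J5": 220.0, "J6": 220.0}

def within_limits(joint, angle_deg):
    """True if a target angle is inside the documented range."""
    lo, hi = JOINT_LIMITS[joint]
    return lo <= angle_deg <= hi

def speed_ok(joint, speed_deg_s):
    """True if a commanded speed respects the per-joint cap."""
    return 0.0 <= speed_deg_s <= MAX_SPEED[joint]

print(within_limits("J2", -90.0))   # inside J2's -180..0 range
print(within_limits("J3", -10.0))   # outside J3's 0..180 range
```

Checks like this are cheap insurance when validating algorithm output in simulation before it ever reaches the physical arm.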
The generalization ability of VLA models exceeds your imagination: inference with a new camera and viewpoint is handled with ease!
具身智能之心· 2025-12-04 03:10
Author: Weiqi Li et al. | Editor: 具身智能之心

VLA models perform well on in-distribution tasks, but their performance drops sharply under new camera viewpoints and visual perturbations. The study shows this brittleness stems mainly from alignment bias in spatial modeling, not from physical modeling. To address it, researchers from Sun Yat-sen University and other institutions propose a one-shot adaptation framework that recalibrates visual representations through lightweight learnable parameter updates. The first method, Feature Token Modulation (FTM), applies a global affine transformation to the visual tokens and raises viewpoint accuracy on the Libero dataset from 48.5% to 87.1% with only 4K parameters. Building on this, Feature Linear Adaptation (FLA) further introduces low-rank updates into the ViT encoder, reaching a 90.8% success rate with 4.7M parameters, matching LoRA-scale fine-tuning at far lower cost. These results suggest that pretrained VLA models hold substantial untapped robustness, and that targeted, minimal visual adaptation is enough to restore viewpoint generalization. Generalization of VLA models ...
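The FTM idea described above, a single global affine transform applied to every visual token, can be sketched in a few lines. The token and channel dimensions below are toy values; with a per-channel scale and shift the parameter count is 2·d, which for a hidden width around 2048 lands near the ~4K figure quoted, though the paper's exact parameterization is not given in this summary.

```python
def feature_token_modulation(tokens, gamma, beta):
    """Global affine transform over visual tokens: every token shares
    one learnable per-channel scale (gamma) and shift (beta).
    Sketch of the FTM concept, not the authors' implementation."""
    return [[g * x + b for x, g, b in zip(tok, gamma, beta)]
            for tok in tokens]

# Two tokens with a 4-dim feature channel (toy sizes).
tokens = [[1.0, 2.0, 3.0, 4.0], [0.5, -1.0, 0.0, 2.0]]

# Identity initialisation: gamma = 1, beta = 0 leaves features unchanged,
# so adaptation starts from the pretrained model's behaviour.
gamma, beta = [1.0] * 4, [0.0] * 4
print(feature_token_modulation(tokens, gamma, beta) == tokens)
```

The appeal of this form is that the update is global (no per-token parameters), so the cost stays constant no matter how many tokens the camera produces.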
具身智能之心 is recruiting partners~
具身智能之心· 2025-12-04 03:10
Group 1
- The article emphasizes the importance of community support in operating a platform that brings continuous value to the industry [1].
- The company invites influential figures in the field to collaborate on initiatives including course development, paper guidance, consulting services, corporate training, joint curriculum building, and hardware development [1].

Group 2
- The company aims to develop courses that benefit beginners and advance the industry, targeting individual learners, corporate training, and higher-education curriculum development [3].
- The goal is to create an affordable, user-friendly research platform for developers and beginners in the field [5].

Group 3
- The company offers consulting and training services for both B-end and C-end clients in areas such as embodied data, robot embodiment, algorithms, and deployment, supporting industry upgrades and talent development [7].
- Personal privacy is protected for collaborators currently employed in the industry [7].

Group 4
- The company provides industry-competitive compensation and access to its resources for collaborators [8].
LatBot: a Chinese Academy of Sciences team proposes latent action distillation to improve few-shot transfer efficiency for robot VLA models
具身智能之心· 2025-12-04 00:04
Author: Zuolei Li et al. | Editor: 具身智能之心

1. Research Background and Challenges
Latent action learning is an important research direction for Vision-Language-Action (VLA) models. Its core is to extract compressed motion semantics from consecutive frames, forming a general representation independent of the robot embodiment, so that large-scale human video can expand the training data and overcome the diversity and generalization limits of traditional robot datasets.
Existing latent action models (LAMs) have three key problems: first, they lack task-instruction guidance and cannot capture task-relevant changes; second, they under-use multi-frame information, making latent action representations imprecise and poor at capturing motion dynamics; third, they over-attend to visual appearance changes and lack physical awareness, leaving a semantic gap between latent representations and actually executable actions, which severely hurts transfer to downstream tasks.

2. Core Method Design
2.1 Decoupled latent action representation
The latent action is decomposed into two complementary learnable tokens, explicitly separating the robot's active motion from passive environmental change. By introducing a pretrained vision-language model (VLM) and combining task instructions with multi-frame input, the two learnable tokens ([CP ...
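The decoupled representation in Section 2.1 can be sketched as a simple container holding the two complementary tokens. The field names, dimensions, and the concatenation used here are illustrative assumptions; the truncated summary does not give the actual token shapes, the VLM conditioning mechanism, or how downstream policies consume the tokens.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class DecoupledLatentAction:
    """Two complementary latent tokens, per the decomposition above:
    one for the robot's active motion, one for passive scene change.
    Illustrative sketch only -- not the paper's actual data structure."""
    robot_motion: List[float]   # embodiment-agnostic motion semantics
    env_change: List[float]     # passive environment dynamics

    def combined(self) -> List[float]:
        # One plausible interface: downstream policies consume the
        # concatenation of both tokens (an assumption, not confirmed).
        return self.robot_motion + self.env_change

z = DecoupledLatentAction(robot_motion=[0.1, -0.2], env_change=[0.0, 0.3])
print(len(z.combined()))
```

Keeping the two tokens separate is what lets the model attribute frame-to-frame change to the robot or to the environment, rather than entangling both in one code.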
China Mobile makes a hundred-million-RMB strategic investment to seize the "must-win ground" of tactile sensing for embodied intelligence
具身智能之心· 2025-12-04 00:04
Core Viewpoint
- The article emphasizes the growing importance of tactile sensors in robotics, highlighting their role in enabling precise manipulation and interaction with diverse materials, which is crucial for the advancement of embodied intelligence [3][4][10].

Market Overview
- The global tactile sensor market is projected to grow from $15.33 billion in 2024 to $35.59 billion by 2031, a compound annual growth rate (CAGR) of approximately 12.8% [4].
- The market is currently dominated by leading overseas companies, but domestic manufacturers are gaining competitiveness through innovation and market expansion [4].

Company Spotlight: Daimeng Robotics
- Daimeng Robotics has recently closed a series of funding rounds, raising significant capital to advance its tactile sensing technology and maintain its leading position in embodied intelligence [6][7].
- The company set a record for angel-stage financing in the tactile sensor sector, accumulating several hundred million RMB over four rounds in two years [6][7].
- The latest round, led by the China Mobile Chain Long Fund, is expected to accelerate Daimeng's technological breakthroughs and global expansion [8][9].

Technological Innovations
- Daimeng focuses on a monochromatic optical tactile sensing approach, which differs from the traditional three-color GelSight method and allows better engineering feasibility and mass production [15][17].
- The company has launched a new generation of tactile sensing products, including the DM-Tac series, targeting various applications with an emphasis on high performance and reliability [17][19].

Product Development and Market Position
- Daimeng's products start at 1,299 RMB, significantly below many international competitors, which typically range from 6,000 to 7,000 RMB [19].
- The company has achieved mass production of its tactile sensors, with certifications and orders from international clients, a significant milestone in this early-stage market [19][20].

Future Directions
- Integrating tactile sensors into robotic systems is essential for enhancing their operational capabilities, and Daimeng is positioning itself as a comprehensive tactile perception company spanning hardware and software solutions [22][25].
- The industry is still exploring the optimal technical pathways for tactile sensors; practical application and iterative feedback are crucial for advancing the technology [23][25].
Why does fitting a robot with expensive tactile sensors make it dumber?
具身智能之心· 2025-12-04 00:04
Core Insights
- The article discusses a new approach to multi-modal robot learning that addresses the limitations of traditional feature concatenation, which often fails in tasks requiring tactile feedback [3][5][33].
- The proposed solution uses compositional policies: each sensory modality is trained as a separate expert, allowing more flexible and robust integration of sensory data [9][12][33].

Limitations of Current Methods
- Traditional multi-modal robot learning typically relies on feature concatenation, combining all sensor embeddings into a single vector, which leads to significant performance drops on tasks requiring tactile information [5][16].
- Feature concatenation treats rare but critical tactile signals as noise: adding tactile data caused success rates to collapse from 35% to 5% [3][16].

Proposed Solutions
- Separate expert policies are trained per modality, enabling independent learning and reducing interference between modalities [9][12].
- The modular design allows sensors to be added or removed without retraining the entire system, lowering retraining costs and enhancing robustness [13][16].

Performance Results
- The proposed method achieved an average success rate of 66% across four RLBench simulation tasks, outperforming single-modality policies (49%) and feature concatenation (56%) [29].
- On occluded marker picking, it reached 65%, versus 35% for RGB-only and 5% for the concatenation method [34].

Robustness and Adaptability
- The system is robust to runtime disturbances, such as sudden object removal, adapting by reallocating weights to the remaining functional sensors [21][23].
- Performance remains stable under simulated sensor failures, demonstrating the routing network's effectiveness at managing consensus weights [23][27].
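The weighted consensus behind the compositional policy, including the reallocation that happens when a sensor drops out, can be sketched as follows. The modality names, action dimensions, and renormalization rule here are assumptions for illustration, not the paper's exact formulation.

```python
def fuse_actions(proposals, weights):
    """Weighted consensus over per-modality expert action proposals.
    Experts whose sensor is missing (absent from `proposals`) are
    dropped and the remaining weights are renormalized, which is how
    the system can keep acting after a sensor failure."""
    active = {m: w for m, w in weights.items() if m in proposals and w > 0}
    total = sum(active.values())
    norm = {m: w / total for m, w in active.items()}
    dim = len(next(iter(proposals.values())))
    return [sum(norm[m] * proposals[m][i] for m in norm) for i in range(dim)]

# Toy 2-D action proposals from two expert policies.
proposals = {"rgb": [0.2, 0.0], "tactile": [0.6, 0.4]}
weights = {"rgb": 0.25, "tactile": 0.75}   # e.g. from a routing network
print(fuse_actions(proposals, weights))    # approximately [0.5, 0.3]

# Sensor failure: the tactile expert disappears; weight shifts to rgb.
print(fuse_actions({"rgb": [0.2, 0.0]}, weights))
```

Because each expert is self-contained, removing one only changes the normalization, not the other experts' parameters, which is what keeps retraining costs low when the sensor suite changes.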
A Zhejiang University-rooted embodied AI company makes another run at the Hong Kong Stock Exchange: focused on industrial scenarios, taking in 1,000,000 RMB a day
具身智能之心· 2025-12-04 00:04
Core Viewpoint
- The article discusses the recent developments of XianGong Intelligent, a company focused on robotic control systems, as it prepares for its IPO on the Hong Kong Stock Exchange. Despite growing revenues, the company has posted continuous losses and faces cash-flow management challenges that may affect its market position and growth potential [2][4][8][66].

Revenue Growth
- XianGong Intelligent has shown consistent revenue growth over the past three years: 184 million RMB in 2022, 249 million RMB in 2023, and a projected 339 million RMB in 2024, a compound annual growth rate (CAGR) of 35.7% [5][40].
- The company generates nearly 1 million RMB in revenue daily [6].

Financial Performance
- Despite revenue growth, the company has not reached profitability, accumulating losses of 122 million RMB over three years: 32.26 million RMB in 2022, 47.70 million RMB in 2023, and 42.31 million RMB in 2024 [8][53].
- Gross profit margins have remained relatively stable at 46.8%, 49.2%, and 45.9% from 2022 to 2024 [45].

Product Offering
- The company provides solutions for industrial applications rather than consumer-facing robots, with a product matrix that includes controllers, software, robots, and accessories [9][12][30].
- The in-house SRC series controllers serve as the "brain" of the robots, enabling them to operate autonomously [15][16].

Market Position
- XianGong Intelligent serves over 1,600 integrators and end customers across more than 35 countries, including notable clients such as Philips and Schneider Electric [34][36].
- The company holds a leading position in the global robotic controller market, with a 23.6% market share in 2024 [37].

Challenges
- Cash flow is under pressure: the accounts receivable turnover period has extended from 48 days in 2022 to 116 days in 2025 [66].
- High research and development costs, 39.3 million RMB in 2022 and a projected 71.3 million RMB in 2024, contribute to the ongoing losses [57].

Management and Team
- The founding team of experienced professionals from Zhejiang University has been instrumental in the company's technological advancements and strategic direction [76][78][84].
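The reported 35.7% CAGR can be checked directly from the revenue figures above: 184 million RMB in 2022 growing to 339 million RMB in 2024 spans two compounding periods.

```python
def cagr(begin, end, years):
    """Compound annual growth rate between two values over `years` periods."""
    return (end / begin) ** (1 / years) - 1

# Revenue figures from the prospectus summary above (million RMB).
revenue = {2022: 184, 2023: 249, 2024: 339}

rate = cagr(revenue[2022], revenue[2024], years=2)
print(f"{rate:.1%}")  # 35.7%
```

The figure matches: (339/184)^(1/2) − 1 ≈ 0.357, confirming the 35.7% stated in the filing summary.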
Everyone is talking about VLA, yet many students can't even get a demo running properly...
具身智能之心· 2025-12-03 10:00
Core Viewpoint
- The article discusses the challenges and advancements in the field of VLA (Vision-Language-Action) models, emphasizing the importance of real-robot data and practical applications in robotics and embodied intelligence.

Group 1: Challenges in VLA Implementation
- Many students struggle to move from theoretical knowledge to practical application, often finding it difficult to achieve satisfactory results without hands-on experience [2][6].
- The reliance on real-robot data for effective training and deployment of VLA models is highlighted, with a focus on the limitations of simulation data [2][8].

Group 2: Data Collection and Training
- Data collection methods for VLA include imitation learning and reinforcement learning, with particular emphasis on teleoperation and VR techniques [8].
- Training VLA models requires careful tuning and optimization, with specific challenges noted for models like π0 and π0.5, which demand a high level of expertise [10][12].

Group 3: Deployment and Optimization
- After training, VLA models often require optimization techniques such as quantization and distillation to reduce parameter size while maintaining performance [12].
- Deploying VLA models on edge devices presents significant challenges due to their typically large parameter counts [12].

Group 4: Educational Initiatives
- The article introduces a practical course on VLA covering hardware, data collection, algorithm implementation, and real-world applications [14][30].
- The course is designed for a diverse audience, including students and professionals looking to transition into embodied intelligence [27][30].
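The distillation mentioned in Group 3 usually means training a small student model to match a large teacher's temperature-softened output distribution. Below is a minimal sketch of the classic temperature-scaled KL loss (Hinton-style knowledge distillation), used here as a generic stand-in since the course's exact recipe is not given; the logits are toy values.

```python
import math

def softmax(logits, T=1.0):
    """Temperature-scaled softmax: higher T flattens the distribution."""
    exps = [math.exp(z / T) for z in logits]
    s = sum(exps)
    return [e / s for e in exps]

def distillation_loss(student_logits, teacher_logits, T=2.0):
    """KL(teacher || student) on softened distributions, scaled by T^2
    so gradients keep a consistent magnitude across temperatures."""
    p = softmax(teacher_logits, T)
    q = softmax(student_logits, T)
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))
    return (T ** 2) * kl

# Toy logits: the student roughly tracks the teacher but not exactly.
teacher = [2.0, 0.5, -1.0]
student = [1.5, 0.7, -0.8]
print(round(distillation_loss(student, teacher), 4))
```

The loss is zero only when the student reproduces the teacher's distribution exactly, so minimizing it transfers the large model's behaviour into a smaller, edge-deployable one.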