具身智能之心
NUDT open-sources RoboBPP, an embodied online bin-packing benchmark built on real data and physics simulation
具身智能之心· 2025-12-20 01:02
Core Viewpoint
- The article introduces RoboBPP, a comprehensive benchmarking system for robotic online bin packing that integrates real industrial data, physics-based simulation, and embodied execution evaluation, addressing the limitations of existing research in the field [4][28]

Group 1: RoboBPP Overview
- RoboBPP was developed in collaboration between National University of Defense Technology, the Institute of Industrial Artificial Intelligence, Wuhan University, and Shenzhen University [2][4]
- It features a highly realistic physics-based simulation environment to assess the physical feasibility and embodied executability of online bin-packing algorithms [4][10]
- The system includes three large-scale, diverse datasets derived from real industrial processes, which are essential for systematic benchmarking [4][13]

Group 2: Testing and Evaluation Framework
- The project employs a multi-level testing setup that progresses from purely mathematical evaluation to physically constrained simulation and finally to robotic execution [15][16]
- Three distinct testing settings are established: Math Pack (pure geometric placement), Physics Pack (introducing physical effects), and Execution Pack (full robotic execution) [16][18]
- A multi-dimensional evaluation metric and a normalized scoring system provide a comprehensive analysis of algorithm performance across scenarios [19][20]

Group 3: Experimental Results
- The team conducted extensive experiments across the three testing settings and three datasets, ranking algorithms by overall performance score [22][23]
- Algorithms such as PCT and TAP-Net++ excel in highly repetitive production environments, while transformer-based reinforcement-learning policies are effective in diverse logistics scenarios [24][29]
- Analysis of individual metrics such as Occupancy, Trajectory Length, and Collapsed Placement reveals performance characteristics not captured by overall scores, guiding algorithm selection for practical packing tasks [24][30]

Group 4: Practical Implications
- The findings suggest that algorithms prioritizing compact, efficient space utilization tend to achieve higher occupancy rates [26]
- Stability-related metrics are evaluated for their effectiveness in guiding learning-based methods toward more robust and physically feasible strategies [27][30]
- RoboBPP provides a reproducible, scalable foundation for future research and industrial applications in robotic online bin packing [28]
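Of the metrics mentioned, Occupancy is the most directly computable. A minimal sketch of how such a volume-utilization metric is typically defined in bin packing (function and variable names are my own illustration, not RoboBPP's code):

```python
# Hypothetical sketch of an Occupancy metric as commonly defined in bin packing
# (not RoboBPP's exact implementation): the fraction of container volume filled
# by the boxes that were successfully placed.

def occupancy(placed_boxes, bin_dims):
    """placed_boxes: list of (w, d, h) box dimensions; bin_dims: (W, D, H) of the container."""
    bin_volume = bin_dims[0] * bin_dims[1] * bin_dims[2]
    used = sum(w * d * h for (w, d, h) in placed_boxes)
    return used / bin_volume

boxes = [(2, 2, 2), (4, 2, 1), (1, 1, 3)]    # three boxes already placed
print(occupancy(boxes, (10, 10, 10)))        # 0.019
```

A full benchmark would additionally verify that placements are collision-free and inside the bin before counting their volume; this sketch assumes that check has already passed.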
Don't let vision drag down action in VLA!
具身智能之心· 2025-12-20 01:02
Core Insights
- The article discusses challenges and advances in Vision-Language-Action (VLA) models for robotics, focusing on a limitation of existing models: low-dimensional, sparse action signals supervise high-dimensional, dense visual inputs, restricting overall performance [6][9]

Research Background
- VLA models have made significant progress but still suffer from the mismatch between action supervision signals and visual inputs, leaving the model's representation capacity underutilized [6]
- A visual prediction mechanism is proposed to enhance action generation by predicting future visual states, although high-dimensional visual states often contain redundant information that complicates training [8]

Proposed Solutions
- Decoupled Visual Forecasting (DVF) is introduced to relieve the backbone network by automatically capturing implicit actions and enhancing explicit action generation [7]
- A progressive pre-training approach gradually integrates the different modalities, introducing language supervision to retain the understanding and reasoning capabilities of the VLA backbone [7]
- Adaptive Temporal Ensemble (ATE) dynamically adjusts the ensembling strength during inference, reducing computational cost while maintaining action stability [14]

Architecture Design
- The DVF method uses implicit action queries and a separate diffusion DVF head, letting the model focus on frame-to-frame differences rather than predicting complete future frames [10]
- A progressive training scheme introduces visual, language, and action information in phases to avoid competition between modalities and achieve stable optimization [10]

Experimental Analysis
- Mantis, the proposed model, outperforms existing baselines on three of the four task suites in the LIBERO benchmark and achieves the highest average success rate, 96.7% [16][18]
- Mantis converges significantly faster than traditional visual-prediction methods such as UnifiedVLA [20]
- Experiments confirm the effectiveness of language supervision in retaining the backbone's capabilities, with Mantis leading on both in-domain and out-of-domain instruction tasks [20]

Team Introduction
- The research team, SJTU Deng Lab, focuses on generative models and large language models, collaborates with renowned institutions, and maintains a strong publication record in top-tier journals and conferences [23]
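The ATE idea summarized above builds on temporal ensembling of overlapping action chunks. A minimal sketch of that base mechanism, assuming exponential weights and a hand-picked strength parameter `m` (how ATE adapts this strength at inference time is the paper's contribution and is not reproduced here):

```python
import numpy as np

# Illustrative sketch of temporal ensembling over overlapping action chunks.
# The adaptive strength selection described for ATE is NOT implemented here;
# `m` is a fixed assumption for demonstration.

def temporal_ensemble(predictions, m=0.1):
    """predictions: actions predicted for the SAME timestep by successive
    chunks (oldest first). Exponential weights w_i = exp(-m * i) favor older
    predictions; m near 0 approaches uniform averaging."""
    preds = np.asarray(predictions, dtype=float)
    weights = np.exp(-m * np.arange(len(preds)))
    weights /= weights.sum()                      # normalize to sum to 1
    return (weights[:, None] * preds).sum(axis=0)

# Three chunks each predicted a 2-DoF action for the current timestep;
# with m=0 the result is the plain mean (approximately [0.2, 0.8]):
print(temporal_ensemble([[0.0, 1.0], [0.2, 0.8], [0.4, 0.6]], m=0.0))
```

Larger `m` biases the ensemble toward the oldest chunk's prediction, trading responsiveness for smoothness; dynamically tuning that trade-off per step is what the summary above attributes to ATE.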
$3 billion, surpassing Unitree and AgiBot! This embodied AI company sets a new record valuation for humanoid robotics
具身智能之心· 2025-12-19 03:00
Core Viewpoint
- Galaxy General Robotics has reached a valuation of $3 billion, setting a new record for humanoid robotics in terms of funding and valuation [3][2]

Funding and Valuation
- Galaxy General Robotics recently completed a new financing round of $300 million, bringing total funding to approximately $800 million [2]
- The company's latest valuation has reached $3 billion [3]
- The company was established in May 2023 and quickly secured seed financing [4]
- In June 2024, Galaxy completed an angel round of 700 million yuan [5]
- In November 2024, the company secured 500 million yuan in strategic financing [6]
- By June 2025, a further round exceeded 1.1 billion yuan, with notable investors including CATL [7]

Investment Interest
- The recent $300 million round attracted international investment from institutions in Singapore and the Middle East [8]
- Capital is drawn to Galaxy General Robotics for its strong foundation in large models and its pioneering deployments of humanoid robots in real-world applications [8]
- Deep strategic partnerships with major players such as CATL, Toyota, Hyundai, and SAIC enhance its potential across diverse application scenarios [8]

Application Areas
- In smart-city services, Galaxy's space capsule has been trialed at locations including the Summer Palace and Wangfujing in Beijing and Chunxi Road in Chengdu [9]
- The company is also advancing in warehousing, with robots deployed in dozens of retail warehouses, alongside applications in the medical field [10]
Faster and stronger than LoRA: new framework LoFA launches, adapting large models in seconds
具身智能之心· 2025-12-19 00:05
Core Insights
- The article discusses the limitations of existing visual generative models in meeting personalized user demands, particularly in generating precise outputs from fine-grained instructions [5][6]
- It introduces LoFA, a framework that rapidly adapts large models to personalized tasks without lengthy optimization, achieving results comparable to traditional methods [24]

Group 1: Background and Challenges
- Demand for creative media and visual content has produced powerful visual generative models trained on large datasets, but these models struggle to follow specific user instructions [5][6]
- Traditional methods such as parameter-efficient fine-tuning (PEFT) require extensive optimization for each personalized task, making them impractical for real-time applications [6][10]

Group 2: LoFA Framework
- LoFA is designed to predict personalized LoRA parameters directly from diverse user instructions, enabling fast adaptation of visual generative models [8][10]
- The framework incorporates a novel guiding mechanism within a hypernetwork to predict complete, uncompressed LoRA weights, avoiding information loss [11][12]

Group 3: Methodology
- Learning proceeds in two phases: first predicting a simplified response map, then using this knowledge to guide the final LoRA weight prediction [10][11]
- This structured approach lets the network focus on key adaptation areas, improving stability and efficiency [11]

Group 4: Experimental Analysis
- LoFA was evaluated in systematic experiments on video and image generation tasks, demonstrating superior performance to baseline methods [13][14]
- In video generation, LoFA was tested on personalized human-action video generation and style transfer; in image generation, it focused on ID personalization [13][14]

Group 5: Conclusion and Future Outlook
- LoFA overcomes key limitations of existing personalization techniques by eliminating lengthy optimization while achieving performance comparable or superior to individually optimized models [24]
- Future work aims at a unified hypernetwork capable of zero-shot adaptation to various specific instructions, broadening the framework's applicability [24]
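The core idea of a hypernetwork that maps an instruction embedding to full LoRA weights can be sketched as follows. All layer names, sizes, and the single-target-layer setup are illustrative assumptions, not LoFA's actual architecture (which also includes the two-phase response-map guidance described above):

```python
import torch
import torch.nn as nn

# Minimal sketch: a hypernetwork predicts a full LoRA update (B @ A) for one
# target linear layer from an instruction embedding. Dimensions and names are
# hypothetical; LoFA's real design is more elaborate.

class LoRAHyperNet(nn.Module):
    def __init__(self, instr_dim=512, hidden=1024, in_f=768, out_f=768, rank=8):
        super().__init__()
        self.in_f, self.out_f, self.rank = in_f, out_f, rank
        self.mlp = nn.Sequential(
            nn.Linear(instr_dim, hidden), nn.GELU(),
            nn.Linear(hidden, rank * (in_f + out_f)),  # emits A and B jointly
        )

    def forward(self, instr_emb):
        flat = self.mlp(instr_emb)                       # (batch, r*(in+out))
        A = flat[:, : self.rank * self.in_f].view(-1, self.rank, self.in_f)
        B = flat[:, self.rank * self.in_f :].view(-1, self.out_f, self.rank)
        return B @ A                                     # low-rank delta-W

net = LoRAHyperNet()
delta_w = net(torch.randn(2, 512))   # two instruction embeddings in a batch
print(delta_w.shape)                 # torch.Size([2, 768, 768])
```

A single forward pass of such a network replaces the per-task optimization loop of standard LoRA fine-tuning, which is what makes "seconds-level" adaptation plausible.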
Behind Google's new work: the robot evaluation paradigm is changing
具身智能之心· 2025-12-19 00:05
Source: 具身纪元 (author and editor: 具身纪元)

In his essay "The Second Half," Shunyu Yao argues that in the second half of AI, technical recipes are largely mature and the bottleneck has shifted to evaluation. In the second half of embodied intelligence, model evaluation matters even more, and is even more complex. Fully evaluating even a single policy is hard in itself. Traditional evaluation requires testing on real hardware, and the difficulties pile up:
- High cost: large-scale testing on real hardware is time- and labor-consuming, especially when comparing multiple policy versions. Raising test throughput means deploying more hardware, an extra expense; controlling evaluation variables carries hidden costs too, such as scheduling runs under matching lighting conditions.
- Limited coverage: evaluation should vary the conditions to check whether a model still performs well, but real scenes can never exhaust all realistic cases, such as distractors, cluttered tabletops, and lighting changes.
- Safety risk: testing a robot's safety often means having the robot attempt ...
The field's first RL+VLA survey: how can reinforcement learning push VLA into the real world?
具身智能之心· 2025-12-19 00:05
Author: Haoyuan Deng et al. Editor: 具身智能之心. This article is shared for academic purposes only; contact us for removal in case of infringement.

Vision-Language-Action (VLA) models fuse vision, language, and action to give robots strong zero-shot and cross-task generalization. But VLA trained purely by imitation learning remains brittle in real-world out-of-distribution (OOD) scenarios, lacking failure recovery, autonomous exploration, and closed-loop error correction. Reinforcement learning (RL) is becoming the key bridge between VLA pre-training and real-world deployment.

Jointly produced by Nanyang Technological University, Beijing University of Posts and Telecommunications, and Tsinghua University, this survey systematically reviews the core methods and challenges of RL-VLA across the full "learning, optimization, deployment" lifecycle, building a complete technical picture along four dimensions: architecture, training paradigm, real-world deployment, and evaluation.

1. RL-VLA architecture: from open-loop inference to closed-loop optimization. Through reward-driven policy updates, RL shifts VLA from "reproducing demonstrations" to outcome-oriented, closed-loop decision making: action modeling A... Paper link (updated monthly): https://doi.org/10.362 ...
Stacking blocks: this robotic arm smoothly runs pi0 and pi0.5 and supports the Lerobot framework
具身智能之心· 2025-12-19 00:05
If you want to move algorithms quickly from research to practice, take a look at our robotic arm! It now supports Lerobot, with precise hands-on operation that even beginners can unlock. Following its pi0 and pi0.5 tasks, the lightweight Imeta-Y1 arm has been adapted to Lerobot, smoothly grasping blocks and placing them precisely into a tape ring; the accompanying code will also be officially open-sourced. From recognition and grasping, to stable transport, to aligned placement, every step reflects continuous algorithm iteration and stable arm execution. Imeta-Y1 brings research closer to practice and lets ideas be validated faster, helping you go steadier and further on the road of embodied intelligence.

A lightweight, cost-effective robotic arm built for embodied AI research. Still struggling with hardware choices for embodied intelligence? Expensive arms are unaffordable, while cheap ones are hard to use and hard to learn. Don't worry: Imeta-Y1 is a lightweight, cost-effective arm designed for newcomers and research beginners. Whether you are a student, an educator, or a developer new to robotics, Imeta-Y1 helps you complete algorithm validation and project development at low cost and high efficiency. Especially beginner-friendly:
✅ A full open-source toolchain plus code examples, from data collection to model deployment;
✅ Python / C++ dual-language interfaces, so you can get started quickly in either;
✅ Compatible with ROS1 / ROS2, ...
Create value with us! 具身智能之心 is recruiting operations and sales staff (full-time & internship)
具身智能之心· 2025-12-18 09:30
Group 1
- The company is recruiting for four positions in embodied intelligence and AI, including two full-time and two internship roles [1]
- The roles include a social-media operations position managing platforms covering autonomous driving, embodied intelligence, robotics, and large models [1][2]
- The sales position focuses on selling educational products in the same technical areas, including courses and hardware [2][3]

Group 2
- The operations role requires experience running platforms such as WeChat, with tasks including growing followers and engagement [2]
- The sales role requires experience in online product sales and maintaining customer relationships [3]
- The company offers flexible salary options for interested candidates [4]
VLA work is growing explosively...
具身智能之心· 2025-12-18 09:30
Core Viewpoint
- The article discusses the rapid growth and potential of Vision-Language-Action (VLA) algorithms in embodied intelligence, highlighting the increasing availability of diverse data sources and standardized evaluation metrics, which may soon lead to industrialization [2][12]

Group 1: VLA Development and Challenges
- VLA algorithms are growing explosively, supported by frameworks and tools such as reinforcement learning (RL) that enhance their generalization performance [2]
- Despite the promising direction, many practitioners struggle with VLA, including difficulties in tuning and data collection, which frustrate newcomers to the field [3][10]
- Real-data collection is essential and often requires hardware setups such as teleoperation and VR, but the quality of real-world data can be suboptimal, complicating training [5][11]

Group 2: VLA Implementation Modules
- Implementing VLA involves several key modules, including data-collection methods based on imitation learning and reinforcement learning, with a focus on ensuring high-quality data [13]
- Training VLA models typically requires simulation debugging, especially when real-world data is insufficient; frameworks such as Mujoco and Isaac Gym are crucial for this process [14]
- After training, VLA models undergo a "slimming" step that reduces parameter count for deployment, using techniques such as quantization and distillation to preserve performance while minimizing resource usage [15]

Group 3: Educational Initiatives
- To address the learning curve of VLA technologies, a specialized course has been developed, focusing on practical skills and project experience in embodied intelligence [16][19]
- The course covers a comprehensive curriculum including hardware, data collection, VLA algorithms, evaluation, simulation, and real-world experiments, equipping participants with the skills the industry needs [21][36]
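Of the "slimming" techniques mentioned, distillation is the easiest to show in a few lines. A toy sketch of logit distillation from a large teacher policy to a small student; the networks, temperature, and training data are illustrative assumptions, not any specific VLA recipe:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Toy distillation sketch: a small student network learns to match a larger
# teacher's output distribution. Everything here (sizes, temperature, data)
# is a stand-in for illustration only.

teacher = nn.Sequential(nn.Linear(128, 512), nn.ReLU(), nn.Linear(512, 7))
student = nn.Sequential(nn.Linear(128, 64), nn.ReLU(), nn.Linear(64, 7))

def distill_loss(obs, T=2.0):
    with torch.no_grad():
        t_logits = teacher(obs)                  # teacher is frozen
    s_logits = student(obs)
    # KL divergence between softened distributions, scaled by T^2
    return F.kl_div(
        F.log_softmax(s_logits / T, dim=-1),
        F.softmax(t_logits / T, dim=-1),
        reduction="batchmean",
    ) * T * T

opt = torch.optim.Adam(student.parameters(), lr=1e-3)
obs = torch.randn(32, 128)
for _ in range(5):                               # a few toy gradient steps
    opt.zero_grad()
    loss = distill_loss(obs)
    loss.backward()
    opt.step()
print(f"final distillation loss: {loss.item():.4f}")
```

Quantization, the other technique named above, would instead convert the trained weights to lower-precision types (e.g. int8) without retraining; the two are often combined in deployment pipelines.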
A world first! Embodied intelligent robots achieve large-scale deployment on CATL's battery production line
具身智能之心· 2025-12-18 04:00
Core Viewpoint
- The successful deployment of the humanoid robot "Xiao Mo" on CATL's production line marks a significant milestone for embodied intelligence in smart manufacturing, particularly in the production of new-energy power-battery PACKs [2][5]

Group 1: Technological Advancements
- "Xiao Mo" accurately performs complex tasks such as connecting battery connectors, previously reliant on manual labor because of their multi-variety, small-batch, high-flexibility characteristics [2]
- The robot uses an advanced end-to-end vision-language-action (VLA) model, enabling it to adapt in real time to uncertainties such as material-position deviations and connection-point changes [2]
- Its insertion success rate stays above 99%, with operational efficiency matching that of skilled human workers [2]

Group 2: Operational Efficiency
- "Xiao Mo" autonomously detects the status of wire connections and reports anomalies, effectively reducing the product defect rate [4]
- The robot handles three times the daily workload while maintaining excellent consistency and stability across continuous production runs of multiple battery models [4]

Group 3: Collaborative Development
- "Xiao Mo" was developed by Qianxun Intelligent Robot Company, a CATL subsidiary, and is powered by CATL's self-developed batteries, showcasing industry-chain collaboration [5]
- CATL worked with multiple departments on in-depth research into production-line needs, producing a forward-looking, feasible plan for the large-scale deployment of embodied intelligent robots [5]

Group 4: Future Prospects
- The successful rollout of "Xiao Mo" is a starting point for CATL to raise the automation and intelligence level of the PACK line, further deepening the synergy between smart manufacturing and green power [5]
- CATL aims to extend embodied intelligent large models to broader scenarios, contributing to global carbon-neutrality goals [5]