具身智能之心
The first RL paradigm for text-to-3D generation is here, tackling geometric and physical plausibility
具身智能之心· 2025-12-20 16:03
Paper: https://arxiv.org/pdf/2512.10949 Code: https://github.com/Ivan-Tang-3D/3DGen-R1

Can reinforcement learning be used for text-to-3D generation, to strengthen the step-by-step reasoning and generation process of 3D autoregressive models?

Editor: 量子位

In large language models and text-to-image generation, reinforcement learning (RL) has become a key method for improving a model's chain-of-thought and generation quality. But does it still work when we turn to the far more complex task of text-to-3D generation?

Recently, a study conducted jointly by Northwestern Polytechnical University, Peking University, The Chinese University of Hong Kong, Shanghai AI Laboratory, and The Hong Kong University of Science and Technology systematically explored this important question.

In LLM reasoning and 2D text-to-image generation, RL has been shown to significantly improve CoT reasoning and generation quality. But 3D object representations are longer, denser, and more geometrically constrained. Work in this direction therefore commonly faces these questions:

1. How can a reward simultaneously capture semantic alignment, geometric consistency, and visual quality? (see the sketch below)
2. Are existing RL algorithms suited to autoregressive ...
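The first question is commonly answered with a weighted combination of scorers. A minimal sketch of such a composite reward, assuming caller-supplied scoring functions; this is an illustration of the general pattern, not 3DGen-R1's actual reward:

```python
from typing import Callable

def composite_reward(
    score_semantic: Callable[[str, object], float],   # e.g. text-render similarity
    score_geometry: Callable[[object], float],        # e.g. mesh watertightness check
    score_visual: Callable[[object], float],          # e.g. aesthetic score of renders
    weights: tuple[float, float, float] = (0.4, 0.4, 0.2),  # illustrative weights
) -> Callable[[str, object], float]:
    """Combine three scorers into one scalar reward for RL fine-tuning."""
    w_sem, w_geo, w_vis = weights

    def reward(prompt: str, generated_3d: object) -> float:
        return (
            w_sem * score_semantic(prompt, generated_3d)
            + w_geo * score_geometry(generated_3d)
            + w_vis * score_visual(generated_3d)
        )

    return reward
```

Keeping the scorers pluggable matters here: semantic, geometric, and visual quality are measured by entirely different tools, and the RL loop only ever sees the blended scalar.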
The state of robot learning! A Physical Intelligence insider's take (from data collection to VLA to RL)
具身智能之心· 2025-12-20 16:03
Editor: 具身智能之心. This article is shared for academic purposes only; contact us for removal in case of infringement.

Original link: https://vedder.io/misc/state_of_robot_learning_dec_2025.html

This post studies a blog written by someone inside PI (Physical Intelligence). It surveys the current state of robot learning from genuine front-line experience; practitioners will recognize much of it. It is candid, high quality, and worth a careful read — the views on IL, DAgger, and RL are all first-hand. Enjoy.

Basically, as of December 2025, essentially all robot learning systems are pure behavior cloning (BC, also called imitation learning) systems. Humans provide (near-)optimal task demonstrations, and a machine learning model tries to imitate those actions. Formally, policy training is supervised: given the robot's state (e.g., camera images, robot joint angles, and possibly a task-description text), the policy predicts the demonstrated action a, usually an action chunk ...
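A minimal sketch of the supervised objective just described — behavior cloning over action chunks. The network, dimensions, and MSE loss are illustrative assumptions for a generic BC setup, not PI's architecture:

```python
import torch
import torch.nn as nn

class ChunkPolicy(nn.Module):
    """Maps a fused state encoding to a chunk of H future actions."""
    def __init__(self, state_dim: int = 512, action_dim: int = 7, horizon: int = 16):
        super().__init__()
        self.horizon, self.action_dim = horizon, action_dim
        self.net = nn.Sequential(
            nn.Linear(state_dim, 1024), nn.ReLU(),
            nn.Linear(1024, horizon * action_dim),
        )

    def forward(self, state: torch.Tensor) -> torch.Tensor:
        return self.net(state).view(-1, self.horizon, self.action_dim)

policy = ChunkPolicy()
optimizer = torch.optim.AdamW(policy.parameters(), lr=3e-4)

# One supervised step: regress the demonstrated action chunk from the state.
state = torch.randn(32, 512)          # stand-in for image/proprioception features
demo_chunk = torch.randn(32, 16, 7)   # H=16 demonstrated future actions per sample
optimizer.zero_grad()
loss = nn.functional.mse_loss(policy(state), demo_chunk)
loss.backward()
optimizer.step()
```

Real systems replace the MLP with a vision-language backbone and often a diffusion or flow-matching action head, but the supervised chunk-regression structure is the same.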
VLA work is growing explosively .......
具身智能之心· 2025-12-20 16:03
Core Viewpoint - The article discusses the rapid development and challenges of VLA (Vision-Language-Action) models in the field of embodied intelligence, highlighting the importance of real data collection and the difficulties faced by newcomers to the field [2][3][4].

Group 1: VLA Development and Challenges
- VLA models are experiencing explosive growth, with various frameworks and tools, such as reinforcement learning (RL), enhancing their performance [2].
- Data collection methods are diversifying, with millions of open-source samples becoming available, indicating potential for industrialization [2].
- Many practitioners express frustration with the difficulty of tuning VLA models and the complexity of data collection, particularly those new to the field [3][5].

Group 2: Data Collection and Training
- Data collection for VLA primarily serves imitation learning and reinforcement learning, with a focus on teleoperation and VR for robotic arms [13].
- Simulation and real-to-sim-to-real (real2sim2real) techniques are crucial for training VLA models, especially when real data is insufficient [14].
- Training technique is critical: many practitioners struggle to achieve good results because models like π0 and π0.5 demand close attention to detail [14][10].

Group 3: Model Deployment
- After training, VLA models must be optimized to reduce parameter size for deployment, which is essential for edge computing applications [15].
- Techniques such as quantization and distillation are needed to maintain performance while minimizing model size (see the sketch below) [15].

Group 4: Educational Initiatives
- The article introduces a practical course aimed at helping individuals learn VLA effectively, covering hardware, data collection, algorithm deployment, and real-world experiments [17][20].
- The course is designed to save time and flatten the learning curve for newcomers, providing practical experience that can strengthen resumes [18][31].
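Group 3's point about shrinking a trained policy for edge deployment is often realized with post-training quantization. A minimal PyTorch sketch using dynamic int8 quantization of the linear layers; the `policy_head` module is an illustrative stand-in, not an actual π0 component:

```python
import torch
import torch.nn as nn

# Stand-in for a trained VLA action head (illustrative only).
policy_head = nn.Sequential(
    nn.Linear(2048, 1024), nn.ReLU(),
    nn.Linear(1024, 7),
)

# Dynamic quantization: weights stored as int8, activations quantized at runtime.
quantized = torch.quantization.quantize_dynamic(
    policy_head, {nn.Linear}, dtype=torch.qint8
)

obs = torch.randn(1, 2048)
action = quantized(obs)  # same interface, roughly 4x smaller Linear weights
print(action.shape)
```

Distillation goes further — training a smaller student to match the full model's action outputs — but quantization alone is often the first step because it requires no retraining.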
Cracking the embodied-simulation bottleneck! 地瓜机器人 generates high-fidelity 3D tabletop scenes in one click!
具身智能之心· 2025-12-20 16:03
Editor: RoboX

In recent years, embodied intelligence has raised new demands on simulation data: 3D scenes must not only be photorealistic, but every instance in the scene must also be physically interactive, so that robot policies can be trained in simulation.

Within such environments, tabletop scenes (Tabletops) are the "last mile" — the basic stage for most fine-grained interaction and complex robot manipulation tasks.

Automatically generating high-fidelity, interactive tabletop scenes at scale is therefore critical to advancing embodied manipulation policy learning.

Against this backdrop, 地瓜机器人, together with the University of Chinese Academy of Sciences, Horizon Robotics (地平线), the Institute of Automation of the Chinese Academy of Sciences, and others, released a key research result of the year — TabletopGen: a unified, training-free tabletop scene generation framework.

3D simulation still falls seriously short

According to the authors, existing simulation methods remain seriously deficient:

1. Limitations of text-driven methods: Holodeck [1], for example, uses a large language model (LLM) to generate 3D layouts directly, or first generates scene graphs or spatial constraints and then optimizes layout feasibility. However, both of these routes usually start only from fixed ...
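The layout-feasibility optimization mentioned above typically reduces to rejecting placements whose footprints overlap or fall off the table. A minimal sketch of such a check, assuming axis-aligned 2D footprints; the names and rejection criteria are illustrative, not TabletopGen's algorithm:

```python
from dataclasses import dataclass

@dataclass
class Footprint:
    x: float  # center x on the table plane
    y: float  # center y
    w: float  # width
    d: float  # depth

def overlaps(a: Footprint, b: Footprint) -> bool:
    """Axis-aligned overlap test between two object footprints."""
    return abs(a.x - b.x) < (a.w + b.w) / 2 and abs(a.y - b.y) < (a.d + b.d) / 2

def feasible(layout: list[Footprint], table_w: float, table_d: float) -> bool:
    """A layout is feasible if every object lies on the table and none collide."""
    for i, obj in enumerate(layout):
        on_table = (abs(obj.x) + obj.w / 2 <= table_w / 2
                    and abs(obj.y) + obj.d / 2 <= table_d / 2)
        if not on_table:
            return False
        if any(overlaps(obj, other) for other in layout[i + 1:]):
            return False
    return True

# Example: two mugs on a 1.2 m x 0.8 m table.
print(feasible([Footprint(0.1, 0.0, 0.1, 0.1),
                Footprint(-0.2, 0.1, 0.1, 0.1)], 1.2, 0.8))
```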
Pioneering the ACE embodied R&D paradigm, DaXiao Robotics (大晓机器人) builds a new open ecosystem for embodied intelligence
具身智能之心· 2025-12-20 01:02
Core Viewpoint - The article discusses DaXiao Robotics' launch of several innovations — the ACE embodied research paradigm, the open-source Kairos 3.0 world model, and the embodied super-brain module A1 — aimed at building a self-controlled, open, win-win industrial ecosystem for embodied intelligence [1][3][25].

Group 1: ACE Research Paradigm and World Model
- DaXiao Robotics' ACE embodied research paradigm fundamentally rethinks the research path for embodied intelligence, centering on human-centric data collection and interaction with the physical world [12][14].
- The Kairos 3.0 world model amplifies the value of real data, reaching a scale of over 100 million hours of data, which is crucial for the development of embodied intelligence [12][18].
- The ACE paradigm enables collection of over 10 million hours of data annually through environmental data-collection techniques that integrate visual, tactile, auditory, and other modalities [12][14].

Group 2: Technological Innovations and Applications
- The embodied super-brain module A1 lets robots operate in complex, dynamic environments without pre-collected high-precision maps, demonstrating robust path-generation capabilities [25][27].
- The A1 module can interpret natural-language commands and execute tasks with high precision, making it suitable for applications across multiple industries [27][29].
- DaXiao Robotics has partnered with leading chip manufacturers and cloud service providers to boost the performance of its technologies and speed the development of customized embodied-intelligence products [23][32][34].

Group 3: Industry Collaboration and Ecosystem Development
- The company emphasizes ecosystem collaboration, working with partners across the supply chain — hardware, chips, and cloud services — to build a comprehensive, self-controlled ecosystem for embodied intelligence [30][31].
- DaXiao Robotics aims to lead the scaling of China's embodied-intelligence industry through continuous technological innovation and collaboration with industry partners [31][33].
This embodied-AI community has again added a lot of new content recently ......
具身智能之心· 2025-12-20 01:02
Recently the community has shared a great deal of industry content, including company financing, mass production, product design, model generalization, and deployment.

★ Financing: in the second half of the year, beyond a few star companies, financing rounds for robot-body component companies grew in both size and number;
★ Model generalization: RL-based optimization approaches are steadily strengthening model generalization; the related toolboxes are maturing, and real-robot deployment is becoming more convenient.
★ Deployment: 地瓜机器人 launched the S600 to support edge-side deployment; Thor is beginning to be used in humanoid robots and mobile manipulation; compute of 2000T and above is gradually becoming the reference configuration ......
★ Mass production: pilot programs at several companies are slowly ramping up, many startups are raising funds with orders in hand, and leading humanoid-robot companies are beginning to explore deployment of industrial-grade products;
★ Product design: robot-arm form factors are gradually converging, while mobile manipulation and humanoids are still innovating on structure and size; everyone is pushing costs down, and supply-chain management largely determines later competitiveness. Leading embodied-AI companies are actively investing in component suppliers. Multi-form-factor robots are slowly appearing in all kinds of scenarios ......

The community has also been actively preparing industry research reports, and we welcome students who want to enter or advance in the embodied field to join us. Over nearly a year of building, the community has completed sections on technical-roadmap sharing, livestreams, Q&A, job hunting, and competitions — closing the loop across industry, academia, job seeking, and discussion. We are committed to ...
Built on real data and physics simulation, NUDT (国防科大) open-sources RoboBPP, an embodied online bin-packing benchmark
具身智能之心· 2025-12-20 01:02
Editor: 机器之心

In modern industrial logistics and robotic automation, the physical feasibility and embodied executability of the 3D bin packing problem (3D-BPP) are the key factors that decide whether an algorithm can actually be deployed. As industrial automation keeps advancing, the "online bin packing" problem is drawing ever more attention. Yet existing studies differ enormously in problem settings, test data, and evaluation metrics, and many state-of-the-art algorithms are not open source, leaving the research community without a unified benchmark for fair, systematic evaluation of algorithm performance and real-world usability.

Direct evaluation on real hardware is costly and slow, so simulation environments are the natural choice for verifying physical feasibility. But most existing work still treats 3D-BPP as a mathematical optimization problem, emphasizing only compactness metrics such as space utilization while ignoring key physical factors like gravity, friction, and collision — so an algorithm may fail as soon as it is deployed in a real scenario.

Embodied executability ultimately comes down to the robot's interaction with every single box: whether the robot's end effector can reach the target pose, whether a collision-free motion path exists for the pick-and-place process, and whether the end effector's grasping constraints are satisfied.

Moreover, many ...
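A toy illustration of the gap the benchmark targets: a purely geometric online placement heuristic, plus the minimal support check a physics-aware variant would add. The heightmap grid, scoring order, and support threshold are illustrative assumptions, not RoboBPP's evaluation logic:

```python
import numpy as np

def support_ratio(heightmap: np.ndarray, x: int, y: int, w: int, d: int) -> float:
    """Fraction of the box footprint resting on the current top surface."""
    region = heightmap[x:x + w, y:y + d]
    return float((region == region.max()).mean())

def place_greedy(heightmap: np.ndarray, w: int, d: int, min_support: float = 0.8):
    """Deepest-bottom-left placement that also demands enough support under the box."""
    best = None
    X, Y = heightmap.shape
    for x in range(X - w + 1):
        for y in range(Y - d + 1):
            z = heightmap[x:x + w, y:y + d].max()  # box rests on the tallest cell
            if support_ratio(heightmap, x, y, w, d) < min_support:
                continue  # geometrically fine, but physically likely to topple
            if best is None or (z, x, y) < best:
                best = (z, x, y)
    if best is None:
        return None
    z, x, y = best
    heightmap[x:x + w, y:y + d] = z + 1  # unit-height box for simplicity
    return x, y, z

bin_map = np.zeros((10, 10), dtype=int)
print(place_greedy(bin_map, w=4, d=3))  # -> (0, 0, 0)
```

A compactness-only solver would drop the `min_support` test entirely — exactly the kind of placement that scores well on space utilization but collapses once gravity and friction are simulated.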
Don't let vision drag down action in VLA!
具身智能之心· 2025-12-20 01:02
Core Insights
- The article discusses the challenges and advances in Vision-Language-Action (VLA) models for robotics, focusing on a limitation of existing models: low-dimensional, sparse action signals are used to supervise high-dimensional, dense visual inputs, which restricts overall performance [6][9].

Research Background
- VLA models have made significant progress but still suffer from the mismatch between action supervision signals and visual inputs, leaving the model's representation capacity underutilized [6].
- A visual prediction mechanism is proposed to enhance action generation by predicting future visual states, although high-dimensional visual states often contain redundant information that complicates training [8].

Proposed Solutions
- Decoupled Visual Forecasting (DVF) is introduced to relieve the backbone network, automatically capturing implicit actions and enhancing explicit action generation [7].
- A progressive pre-training approach gradually integrates the different modalities, introducing language supervision to retain the understanding and reasoning capabilities of the VLA backbone [7].
- Adaptive Temporal Ensemble (ATE) dynamically adjusts the ensembling strength during inference, reducing computational cost while keeping actions stable (a toy sketch follows below) [14].

Architecture Design
- The DVF method uses implicit action queries and a separate diffusion DVF head, letting the model focus on frame-to-frame differences rather than predicting complete future frames [10].
- A progressive training scheme introduces visual, language, and action information in phases to avoid competition between modalities and achieve stable optimization [10].

Experimental Analysis
- Mantis, the proposed model, outperforms existing baselines on three of the four LIBERO benchmark task suites, achieving the highest average success rate of 96.7% [16][18].
- Mantis converges significantly faster than traditional visual prediction methods such as UnifiedVLA [20].
- Experiments confirm that language supervision preserves the backbone's capabilities, with Mantis leading on both in-domain and out-of-domain instruction tasks [20].

Team Introduction
- The research team, SJTU Deng Lab, focuses on generative models and large language models, collaborates with renowned institutions, and maintains a strong record of publications in top journals and conferences [23].
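The ATE idea builds on standard temporal ensembling of overlapping action chunks. A minimal sketch with exponential weighting whose `strength` parameter stands in for the adaptive adjustment; this is an illustration of the base mechanism, not Mantis's actual update rule:

```python
import numpy as np

def temporal_ensemble(predictions: list[np.ndarray], t: int, strength: float) -> np.ndarray:
    """Blend every chunk's prediction for timestep t, weighting recent chunks more.

    predictions[i] is the action chunk emitted at timestep i, shape (H, action_dim).
    strength controls how sharply older chunks are discounted; an adaptive scheme
    like ATE would tune this value online instead of fixing it.
    """
    actions, weights = [], []
    for i, chunk in enumerate(predictions):
        offset = t - i  # how far into chunk i timestep t falls
        if 0 <= offset < len(chunk):
            actions.append(chunk[offset])
            weights.append(np.exp(-strength * offset))
    w = np.asarray(weights) / np.sum(weights)
    return (np.asarray(actions) * w[:, None]).sum(axis=0)

# Two overlapping chunks (H=4, 2-DoF); blend their predictions for timestep t=2.
chunks = [np.ones((4, 2)), np.zeros((4, 2))]
print(temporal_ensemble(chunks, t=2, strength=0.5))
```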
$3 billion, surpassing Unitree (宇树) and AgiBot (智元)! This embodied-AI company has set a new record valuation for humanoid robots .......
具身智能之心· 2025-12-19 03:00
Betting heavily on foundation models, Galaxy General (银河通用) has been first in China to put humanoid robots into real application scenarios on a long-term, fully autonomous, large scale. CATL (宁德时代), Toyota, Hyundai, SAIC, and others have formed deep strategic partnerships with Galaxy; with its diverse deployment scenarios, it is very likely to be first to market.

Recently, Galaxy General Robotics completed a new $300 million funding round (Middle Eastern money joined in too), setting a new record for a single round in the embodied space and bringing Galaxy's total funding to roughly $800 million. People familiar with the matter say Galaxy General's latest valuation has reached $3 billion.

Capital scrambled into every round, but the tickets were hard to get:
Founded in May 2023, Wang He (王鹤) quickly closed a seed round.
In June 2024, Galaxy General completed a 700 million RMB angel round.
In November 2024, Galaxy closed a 500 million RMB strategic round.
In June 2025, a new round exceeded 1.1 billion RMB (this time CATL and others came in).
In the current $300 million round, international investors from Singapore and the Middle East also piled in!

Why does capital favor them? In smart-city services, Galaxy's space-capsule stores are already piloting at the Summer Palace and Wangfujing in Beijing, Chunxi Road in Chengdu, and other locations. Warehousing is ramping up too, with deployments in dozens of retail warehouses, plus healthcare and other fields.
Faster and stronger than LoRA: the new LoFA framework arrives, adapting large models in seconds
具身智能之心· 2025-12-19 00:05
Core Insights
- The article discusses the limitations of existing visual generative models in meeting personalized user demands, particularly in generating precise outputs from fine-grained instructions [5][6].
- It introduces a new framework, LoFA, which adapts large models to personalized tasks rapidly, without lengthy optimization, achieving results comparable to traditional methods [24].

Group 1: Background and Challenges
- Demand for creative media and visual content has driven powerful visual generative models trained on large datasets, but these models struggle with specific user instructions [5][6].
- Traditional methods such as parameter-efficient fine-tuning (PEFT) require extensive optimization for each personalized task, making them impractical for real-time applications [6][10].

Group 2: LoFA Framework
- LoFA predicts personalized LoRA parameters directly from diverse user instructions, enabling fast adaptation of visual generative models (a minimal sketch follows below) [8][10].
- The framework builds a novel guiding mechanism into a hypernetwork to predict complete, uncompressed LoRA weights, avoiding information loss [11][12].

Group 3: Methodology
- Learning in LoFA proceeds in two phases: first predicting a simplified response map, then using that knowledge to guide the final LoRA weight prediction [10][11].
- This structured approach lets the network focus on the key adaptation regions, improving stability and efficiency [11].

Group 4: Experimental Analysis
- The framework was evaluated in systematic experiments on video and image generation tasks, outperforming baseline methods [13][14].
- In video generation, LoFA was tested on personalized human-action video generation and style transfer; in image generation, it focused on ID personalization [13][14].

Group 5: Conclusion and Future Outlook
- LoFA removes key limitations of existing personalization techniques by eliminating lengthy optimization while matching or exceeding individually optimized models [24].
- Future work aims at a unified hypernetwork capable of zero-shot adaptation to arbitrary specific instructions, broadening the framework's applicability [24].
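A minimal sketch of the core mechanism summarized above — a hypernetwork that maps an instruction embedding to uncompressed LoRA A/B matrices for one target layer. Dimensions, rank, and module names are illustrative assumptions, and the two-phase response-map guidance is omitted; this is not LoFA's actual architecture:

```python
import torch
import torch.nn as nn

class LoRAHyperNet(nn.Module):
    """Predicts per-layer LoRA factors (A, B) from an instruction embedding."""
    def __init__(self, instr_dim: int = 768, in_f: int = 1024,
                 out_f: int = 1024, rank: int = 8):
        super().__init__()
        self.in_f, self.out_f, self.rank = in_f, out_f, rank
        self.trunk = nn.Sequential(nn.Linear(instr_dim, 512), nn.GELU())
        self.head_a = nn.Linear(512, rank * in_f)   # emits the full A matrix
        self.head_b = nn.Linear(512, out_f * rank)  # emits the full B matrix

    def forward(self, instr_emb: torch.Tensor):
        h = self.trunk(instr_emb)
        A = self.head_a(h).view(-1, self.rank, self.in_f)
        B = self.head_b(h).view(-1, self.out_f, self.rank)
        return A, B

hyper = LoRAHyperNet()
instr = torch.randn(1, 768)      # embedding of the user's instruction
A, B = hyper(instr)
delta_w = B @ A                  # (1, out_f, in_f) low-rank weight update
base_w = torch.randn(1024, 1024)
adapted_w = base_w + delta_w[0]  # adapter applied with no per-task fine-tuning
print(adapted_w.shape)
```

The contrast with LoRA is in where the optimization happens: LoRA gradient-descends A and B for each new task, while a hypernetwork like this amortizes that cost into one training run and then produces adapters in a single forward pass.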