具身智能之心
Zhiyuan (BAAI) Releases an Embodied-Data Innovation Base, Joining Hands with Industry to Build Infrastructure for Physical AGI
具身智能之心· 2025-12-03 03:47
Core Insights
- The article discusses the launch of the RoboXstudio platform and the RoboCOIN dataset by the Beijing Zhiyuan Artificial Intelligence Research Institute (BAAI), aimed at addressing challenges in embodied data production and enhancing research efficiency in embodied intelligence [6][19].

Group 1: Challenges in Embodied Data
- Embodied data faces three main challenges: data silos, lack of quality control, and high costs [7][8].
- Data silos arise from non-standardized formats and isolated data tools, complicating data processing [7].
- Quality-control issues include frame loss, stuttering, and timestamp misalignment, leading to unreliable data records [8].
- The cost of generating embodied data remains high due to reliance on manual operations and the absence of mature platforms for scaling [8].

Group 2: CoRobot Software Framework
- The CoRobot framework was developed to standardize operations, improve quality, and enhance efficiency in embodied data management [10].
- It consists of five components: data-collection tools, format-conversion tools, data-processing tools, data-management tools, and model-training tools [10].

Group 3: RoboCOIN Dataset
- The RoboCOIN dataset, a collaboration involving multiple companies and universities, is designed to be the global benchmark for dual-arm robot data [14][16].
- It features the largest number of dual-arm embodiments, with 180,000 data entries covering more than ten scenarios, including industrial and retail applications [16].
- The dataset is noted for its fine-grained labeling and ease of use, facilitated by the CoRobot framework [16].

Group 4: RoboXstudio Platform
- The RoboXstudio platform aims to streamline the entire pipeline of data collection, annotation, management, model training, evaluation, and deployment [19][22].
- It supports diverse robot types and tasks, ensuring comprehensive data collection without gaps [22].
- The platform integrates open-source frameworks and multimodal models to reduce operational costs and improve accessibility [22].

Group 5: Open Source and Collaboration
- The Zhiyuan Institute emphasizes collaborative innovation in advancing artificial intelligence, with its open-source models seeing a significant number of downloads [23].
- The RoboCOIN dataset and CoRobot framework are made publicly available to foster industry-wide collaboration and innovation [23][25].
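Neither CoRobot's data schema nor its tool APIs are published in the summary above. Purely as an illustration of the two problems it targets (non-standardized formats and timestamp-quality issues), here is a minimal sketch of a standardized episode record plus a timestamp quality-control pass; every field and function name is hypothetical:

```python
def make_episode_record(robot_id, task, frames):
    """Bundle raw frames into one self-describing record so downstream
    tools (conversion, QC, training) can share a common format.
    Hypothetical schema for illustration only."""
    return {
        "schema_version": "0.1",
        "robot_id": robot_id,
        "task": task,
        "num_frames": len(frames),
        "frames": frames,  # each frame: timestamp "t" + state "q" + action "a"
    }

def check_timestamps(record, max_gap_s=0.1):
    """Toy QC pass for the frame-loss / timestamp issues the article
    mentions: flag consecutive-frame gaps larger than max_gap_s."""
    ts = [f["t"] for f in record["frames"]]
    return [(a, b) for a, b in zip(ts, ts[1:]) if b - a > max_gap_s]

frames = [{"t": 0.00, "q": [0.0], "a": [0.1]},
          {"t": 0.03, "q": [0.1], "a": [0.1]},
          {"t": 0.30, "q": [0.2], "a": [0.0]}]  # 0.27 s gap -> flagged
rec = make_episode_record("arm-01", "pick_place", frames)
print(check_timestamps(rec))  # -> [(0.03, 0.3)]
```

A real pipeline would run such a check at ingest time, before episodes enter the training pool, so that stuttering or dropped frames are caught rather than silently learned from.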
VLMs Can "Self-Evolve" Too! The Self-Evolving RL Framework VisPlay Tackles Visual Reasoning
具身智能之心· 2025-12-02 09:30
Editor: 机器之心

The new study VisPlay proposes, for the first time, a self-evolving reinforcement-learning framework that lets a VLM evolve and improve using nothing but large amounts of unlabeled image data. VisPlay decomposes a base VLM into two roles, a "Questioner" and a "Reasoner," which co-evolve through an iterative self-evolution mechanism; combined with the GRPO algorithm and novel diversity and difficulty rewards, this balances question complexity against answer quality.

Title: VisPlay: Self-Evolving Vision-Language Models from Images

Experiments show that VisPlay achieves consistent performance gains on mainstream models such as Qwen2.5-VL and MiMo-VL, with especially strong results in visual reasoning, compositional generalization, and hallucination reduction, pointing to a scalable, low-cost path toward evolving multimodal intelligence.

Introduction: In the vision-language-model field, improving complex reasoning usually relies on costly human-annotated data or heuristic rewards, which is both expensive and hard to scale. ...
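The summary names the GRPO algorithm without showing its mechanics. At GRPO's core is a group-relative advantage that needs no learned value network: rewards for a group of responses to the same prompt are normalized by the group's own mean and standard deviation. A minimal sketch (a toy illustration, not VisPlay's actual code; the population-vs-sample standard deviation choice is an assumption here):

```python
import statistics

def group_relative_advantages(rewards, eps=1e-8):
    """GRPO-style advantage: normalize each reward in a group by the
    group's mean and standard deviation, so responses are scored only
    relative to their siblings, with no critic network."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]

# Toy group of 4 sampled answers to one Questioner-generated question:
# two correct (reward 1.0), two wrong (reward 0.0).
advs = group_relative_advantages([1.0, 0.0, 0.0, 1.0])
print([round(a, 2) for a in advs])  # -> [1.0, -1.0, -1.0, 1.0]
```

Because the advantages are centered within each group, a question whose answers are all correct (or all wrong) yields zero advantage everywhere, which is one reason frameworks like VisPlay add difficulty rewards that push the Questioner toward questions of intermediate hardness.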
SJTU & Shanghai AI Lab Jointly Propose MM-ACT: A Unified VLA Model for Efficient Perception-Planning-Execution Coordination
具身智能之心· 2025-12-02 09:30
Editor: 具身智能之心. This article is shared for academic purposes only; contact us for removal in case of infringement.

In robotic manipulation, balancing "generality" against "efficiency" remains the core challenge: existing approaches either lack dynamics-modeling ability and struggle with complex environment interaction, or infer too slowly to meet real-time control requirements.

MM-ACT, jointly proposed by teams from Shanghai AI Laboratory, Shanghai Jiao Tong University, and collaborators, is built around a "unified multimodal representation + parallel decoding architecture" and introduces a "context-shared multimodal learning" paradigm for the joint generation of text, images, and actions. It combines precise semantic understanding and environment prediction with efficient action output, outperforming existing approaches in both simulated and real-world settings.

Why does the vision-language-action (VLA) model architecture need rethinking? Current VLA models are caught in a "triple contradiction": semantic understanding and dynamics modeling are hard to reconcile, multimodal generation is inefficient, and training objectives are misaligned. The core problem comes down to the inability to achieve efficient "perception-planning-execution" coordination within a single framework:

| Approach type | Representative idea | | Core ...
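The efficiency argument for parallel action decoding can be made concrete with toy arithmetic: an autoregressive decoder spends one forward pass per generated token, while a parallel decoder of the kind described for MM-ACT emits a whole action chunk per pass. The chunk sizes below are hypothetical, for illustration only:

```python
import math

def decoder_passes(num_action_tokens, tokens_per_pass):
    """Forward passes needed to emit one action chunk: autoregressive
    decoding emits a single token per pass, while a parallel decoder
    emits `tokens_per_pass` tokens at once."""
    return math.ceil(num_action_tokens / tokens_per_pass)

# Hypothetical numbers: a 50-step action chunk with 7-D actions,
# one token per action dimension.
n_tokens = 50 * 7
print(decoder_passes(n_tokens, 1))         # autoregressive: 350 passes
print(decoder_passes(n_tokens, n_tokens))  # fully parallel: 1 pass
```

Since each forward pass carries roughly fixed latency, collapsing hundreds of passes into one is what makes real-time control rates reachable, at the cost of giving up token-by-token conditioning within the chunk.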
Tsinghua Establishes an Institute for Embodied Intelligence and Robotics
具身智能之心· 2025-12-02 09:30
Editor: 量子位

Embodied intelligence is almost absurdly hot right now. Just yesterday, Tsinghua University announced the founding of its Institute for Embodied Intelligence and Robotics.

This is Tsinghua's latest major move around embodied intelligence since it opened the Beijing Key Laboratory of Embodied Intelligent Systems at the end of March this year. Nor is it an isolated case: Tsinghua's move lands exactly on a moment when Chinese universities across the board are accelerating their embodied-intelligence plans.

Over the past year, domestic universities have almost uniformly played the "embodied intelligence" card: institutes, laboratories, research centers, even undergraduate majors; from dense demo showcases to one unveiling ceremony, industry dialogue, academic forum, and topical conference after another. It is no exaggeration to say that embodied intelligence has covered in a single year the ground that large models took three years to travel.

Tsinghua establishes the Institute for Embodied Intelligence and Robotics

On November 30, Tsinghua University's Institute for Embodied Intelligence and Robotics was formally unveiled. Tsinghua President and academician Li Luming and other university leaders attended the ceremony, and Professor Liu Yunhao, a newly elected academician of the Chinese Academy of Sciences at Tsinghua, delivered remarks at the event. The new institute will be headed by Professor Zhang Tao, chair of Tsinghua's Department of Automation and vice dean of the School of Information Science and Technology. Deputy director of the State Key Laboratory of Intelligent Technology and Systems ...
Worried You Can Afford a Robot Arm but Can't Handle the Code? Beginner-Friendly: Your First Research Robot Arm
具身智能之心· 2025-12-02 09:30
Reclaim your time: from "hardware debugging" back to "algorithm thinking"

"Bought a robot arm, only to get stuck at the very first debugging step"? Either the toolchain is incomplete, or simulation and the real machine are disconnected, and nobody helps when problems come up — hands-on research can't withstand that much friction.

Without the Imeta-Y1, your embodied-research practice may look like this: 70% of your time spent debugging hardware communication and calibrating sensors; manually porting code between simulation and the real machine, with painful adaptation; waiting days to see real-world results from a single algorithm iteration.

With the Imeta-Y1, your workflow becomes: rapidly validate algorithm logic in Gazebo simulation; deploy the validated program to the real machine with one click for fine-grained tuning; use the full-pipeline toolchain to iterate efficiently from "idea" to "physical action."

We have redefined "lightweight": not just a light physical structure, but a lighter R&D burden.

A lightweight, cost-effective robot arm built for embodied research. Still struggling over hardware choices in embodied intelligence? Arms that are too expensive to afford, or cheap ones too clumsy to use? Don't worry — here comes the Imeta-Y1, a lightweight, cost-effective arm designed specifically for newcomers and early-stage researchers. Whether you are a student, an educator, or a developer just entering robotics, the Imeta-Y1 helps you complete algorithm validation and project development at low cost and high efficiency.

Especially friendly to beginners: ✅ full-pipeline open-source toolchain + code examples ...
Racing to the Robot "Mother Port"! The First Embodied-Intelligence Expo of 2026 Gathers in Hangzhou This March!
具身智能之心· 2025-12-02 03:03
Core Insights
- The article emphasizes the rapid growth and potential of the embodied-intelligence market in China, predicting a market size of nearly 5.3 billion yuan by 2025 and over 20 billion yuan by 2026, with a global market projection of 870 billion yuan by 2030 [3][5].

Market Potential
- The embodied-intelligence market in China is expected to approach 5.3 billion yuan by 2025, accounting for over 25% of the global market share [3].
- By 2026, the market is projected to exceed 20 billion yuan, a significant growth trajectory [3].

Industry Ecosystem
- Hangzhou is identified as a "global innovation mother port," housing over 700 key enterprises across the entire supply chain, from research and development to manufacturing [4][6].
- The city benefits from a complete industrial ecosystem, concentrated innovation resources, and proactive policy legislation, positioning it as a leader in the embodied-intelligence sector [6].

Upcoming Events
- The 2026 Third China Embodied Intelligent Robot Industry Conference and Exhibition will take place March 11-13, 2026, at the Hangzhou International Expo Center, featuring over 500 exhibitors and 30,000 professional attendees [8][9].
- The event aims to create a comprehensive industrial ecosystem integrating conferences, exhibitions, technology, and trends [8][14].

Exhibition Focus
- The exhibition will cover a wide range of products, including complete embodied intelligent robots, power systems, industrial robots, control and computing systems, and various application solutions [16][20].
- It aims to connect all segments of the supply chain, facilitating high-quality supply and application of innovative technologies [14].

Awards and Recognition
- The 2026 China Embodied Intelligence Industry Annual Awards Ceremony will recognize significant achievements and benchmark forces in the sector, with awards in categories such as "Top Ten Outstanding Complete Machine Brands" and "Top Ten Innovative Enterprises" [38][46].
- The awards are intended as a catalyst for industry development and a gathering point for key stakeholders [46].

Industry Collaboration
- The conference will feature discussions on technological breakthroughs and practical applications across various industries, fostering collaboration between academia, industry leaders, and investors [29][52].
- It aims to provide a comprehensive picture of the sector's future landscape, enabling participants to seize opportunities in a changing environment [29].
IPO Counseling Complete! The A-Share Market's First Humanoid-Robot Listing Officially Begins Its Sprint
具身智能之心· 2025-12-02 03:03
Since 2020, Unitree Robotics (宇树科技) has been on a profitable track, with no sustained losses or sharp performance swings that would compromise its financial health. Key accounting treatments such as the capitalization of R&D spending and revenue recognition strictly follow the relevant accounting standards; the financial data are true and accurate, and no additional retrospective adjustments are needed.

A clearly focused business model: the company's core business centers on quadruped robots, with a clear customer segmentation spanning B-end industrial applications and the C-end consumer market, and a stable, sustainable revenue stream. Meanwhile, its humanoid-robot business is in active R&D and small-batch trial production. Overall, Unitree's business footprint is concentrated and clearly bounded, with no resource dilution from blind cross-sector expansion or diversification.

Final remarks

Editor: 具身智能之心. This article is shared for academic purposes only; contact us for removal in case of infringement.

On November 29, 2025, Unitree submitted an updated IPO counseling progress report to the Zhejiang Securities Regulatory Bureau, and its status formally changed to "counseling complete." This marks Unitree's successful passage through the CSRC's preliminary compliance review for an A-share listing; it is about to take the key step of filing a prospectus, and stands to become the "A-share humanoid-robot ...
The Best Time to Move into Embodied AI Was Yesterday; the Second Best Is Now...
具身智能之心· 2025-12-01 10:00
Openarm is a dual-arm task framework, and several companies have begun producing compatible robot bodies. It lacks mobility, but tasks such as folding clothes and pick-and-place are well within reach. For data collection, though, the VR version is more comfortable to use.

XLerobot has some mobility, though not much; it suits entry-level research and individual developers, and can be adapted to some mobile-manipulation tasks.

We have recently been consolidating the key modules of embodied research — industry landscape, robot-body form factors, algorithms, and deployment options — all now summarized inside our community.

So far we have mapped out the companies working on embodied "brains" and robot bodies (it turns out the body market is getting crowded too...), along with the more active embodied-intelligence labs, to help readers make career and graduate-school decisions. Beyond that, there are many industry research reports for judging the field's development and cycles.

On hardware, a few research-friendly products: the SO-100 series, Openarm series, and XLerobot series. The SO-100 and its upgraded versions can run some VA and VLA algorithms, and the common functions are achievable. Other development platforms cost more and require real financial investment; for reference, see the robot bodies from 方舟无限, 星海图, and 宇树 (Unitree).

On algorithms, we have gathered material on VLA (training, training-free methods, VLA+RL, VLA + world models, VLA lightweighting, deployment, etc.), VLN (temporal language, object navigation, point navigation, etc.), and motion control (reinforcement ...
PolyU, Tsinghua, et al. Publish the First Survey on Embodied Procedural Tasks: Teaching Robots Steps, Error Correction, and Question Answering from a First-Person View
具身智能之心· 2025-12-01 10:00
Core Viewpoint
- The article presents a comprehensive overview of the concept of an Egocentric Procedural AI Assistant (EgoProceAssist), which aims to help people perform daily procedural tasks from a first-person perspective. It identifies three core technical tasks for such an assistant: Egocentric Procedural Error Detection, Egocentric Procedural Learning, and Egocentric Procedural Question Answering [6][32].

Summary by Sections

Motivation
- The article emphasizes the prevalence of procedural tasks in daily life, which require a specific sequence of steps to achieve a desired outcome, and highlights the potential of an AI assistant to improve safety and efficiency, especially in high-risk scenarios [6][8].

New Classification System
- A novel taxonomy is introduced that categorizes the three core tasks of the AI assistant and summarizes the existing methods, datasets, and evaluation metrics relevant to each [2][6].

Egocentric Procedural Error Detection
- This section surveys key technologies for detecting procedural errors from a first-person perspective, distinguishing methods that require only video data from those that use multimodal data, and emphasizing how procedural error detection differs from general anomaly detection [9][11][12].

Egocentric Procedural Learning
- Approaches to procedural learning are categorized by supervision level: unsupervised, weakly supervised, and self-supervised. Identifying the key steps of a procedural task is highlighted as central to improving error detection and planning [14][16].

Egocentric Procedural Question Answering
- This section summarizes current technologies for answering procedural questions from a first-person perspective, noting the challenges posed by occlusion and scene changes, and emphasizes that models need strong understanding and memory capabilities to respond effectively to user queries [17][20].

Supplementary Experiments
- Supplementary experiments evaluate the performance of existing VLMs and AI agents on procedural error detection and learning tasks; the results reveal significant limitations in their ability to assist with first-person procedural tasks [23][25].

Challenges
- Key challenges in developing the EgoProceAssist include data scarcity, limited understanding of long-horizon procedural activities, and heavy reliance on manual annotation, all of which hinder real-time assistance [29][30][31].

Conclusion
- The research reiterates the significance of the proposed assistant and its core tasks, addresses the remaining challenges and limitations, and aims to lay a foundation for future research directions in egocentric AI applications [32].
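The survey catalogs methods for egocentric procedural error detection rather than prescribing one. As a minimal, hypothetical illustration of the task itself, one can check a recognized step sequence against a reference procedure; in a real system the observed steps would be inferred from egocentric video, while here they are simply given:

```python
def detect_step_errors(reference, observed):
    """Toy procedural-error detector: walk the observed step sequence
    against the reference order, reporting unexpected, out-of-order,
    and missing steps. Illustration only, not a method from the survey."""
    errors = []
    expect = 0  # index of the next reference step we expect to reach
    for step in observed:
        if step not in reference:
            errors.append(("unexpected", step))
        elif reference.index(step) < expect:
            errors.append(("out_of_order", step))
        else:
            expect = reference.index(step) + 1
    errors.extend(("missing", s) for s in reference if s not in observed)
    return errors

ref = ["boil water", "add coffee", "pour water", "stir"]
obs = ["boil water", "pour water", "add coffee", "stir"]
print(detect_step_errors(ref, obs))  # -> [('out_of_order', 'add coffee')]
```

Even this toy shows why the survey separates procedural error detection from general anomaly detection: each frame of the observed sequence is perfectly normal in isolation, and the error only exists relative to the procedure's step order.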
Hardware Included! The Most Complete Hands-On VLA Tutorial Is Here
具身智能之心· 2025-12-01 03:12
Core Viewpoint
- The article discusses the challenges and advances in the VLA (Vision-Language-Action) field, emphasizing the importance of real-machine data collection and the complexities of training and deploying VLA models.

Group 1: Data Collection
- Real-machine data collection is crucial for VLA models, with methods including teleoperation, VR, and full-body motion capture [2][8].
- Choosing effective collection methods and ensuring high data quality are significant challenges, particularly in the context of real-to-sim-to-real transfer [8].

Group 2: VLA Training
- Training VLA models typically requires simulation debugging before real-machine deployment, especially when real-machine data is scarce [10].
- Techniques for fine-tuning models and for getting good results from small datasets are critical, as many students struggle to train models effectively [10].

Group 3: VLA Model Deployment
- After training, VLA models often need "slimming" because of their large parameter counts, which complicates deployment on edge chips [12].
- Lightweighting operations such as quantization and distillation are essential to shrink the parameter footprint while maintaining performance [12].

Group 4: Educational Initiatives
- The article introduces a hands-on course designed to help students learn VLA effectively, covering hardware, data collection, algorithms, and deployment [14][16].
- The course targets several audiences, including job seekers in the field, beginners looking to level up, and embodied-intelligence researchers [27].
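The course's quantization material is not shown in the summary. As a toolchain-independent sketch of the underlying idea, symmetric per-tensor int8 quantization maps float weights onto the integer range [-127, 127], cutting weight memory roughly 4x versus float32 (a minimal illustration, not the course's code; real toolchains work per-channel on tensors, not Python lists):

```python
def quantize_int8(weights):
    """Symmetric per-tensor int8 quantization: scale so the largest
    |weight| maps to 127, then round every weight to an integer."""
    scale = max(abs(w) for w in weights) / 127.0
    q = [round(w / scale) for w in weights]
    return q, scale

def dequantize(q, scale):
    """Recover approximate float weights from int8 values."""
    return [v * scale for v in q]

w = [0.5, -1.27, 0.0, 0.8]
q, s = quantize_int8(w)
print(q)  # -> [50, -127, 0, 80]
# Rounding error is bounded by scale/2 per weight:
print(max(abs(a - b) for a, b in zip(w, dequantize(q, s))))
```

Distillation attacks the same deployment problem from the other side: instead of shrinking each weight's precision, it trains a smaller student model to match the large model's outputs, and the two are often combined.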