具身智能之心
From CMU and collaborators! A new robot memory architecture: object-centric state modeling enables long-horizon manipulation!
具身智能之心· 2025-11-18 00:46
Author: Nhat Chung et al. | Editor: 具身智能之心

To address robots' lack of object-level memory in non-Markovian settings, a research team from the University of Arkansas, Carnegie Mellon University, and other institutions proposes the LIBERO-Mem benchmark suite and the Embodied-SlotSSM model, which combine structured object memory with temporal sequence modeling to enable robust manipulation decisions in long-horizon, partially observable environments.

Core contributions

LIBERO-Mem benchmark: evaluating non-Markovian robot manipulation
- Design goal: focus on object-level partially observable, non-Markovian scenarios. By introducing ambiguity in object identity, position, and relational history, the benchmark forces models to rely on temporal reasoning rather than current visual information alone.
- Key features: four task types (Figure 1) covering different memory dimensions.

In real-world manipulation, task success depends on the history of object interactions (e.g., "has this object already been manipulated", "where was the object before") rather than on the current observation alone. Most existing vision-language-action models follow the Markov assumption, relying only on immediate sensory input and lacking an object-level memory mechanism, so they struggle with repeated manipulation and visually similar ...
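The excerpt above only sketches the idea of structured object memory, so here is a minimal, illustrative PyTorch sketch of an object-centric recurrent memory in the spirit of Embodied-SlotSSM (an assumption for intuition, not the authors' architecture; the module names, slot count, and dimensions are invented): each object slot reads evidence from the current frame via attention and keeps its own recurrent state, so information about occluded or previously manipulated objects persists over time.

```python
# Minimal sketch (not the authors' code): an object-centric recurrent memory.
# Each object "slot" keeps its own hidden state and is updated from the current
# frame's features via attention, so information about objects that leave the
# field of view persists across time. All names and sizes are illustrative.
import torch
import torch.nn as nn


class SlotMemory(nn.Module):
    def __init__(self, num_slots: int = 8, dim: int = 128):
        super().__init__()
        self.slots0 = nn.Parameter(torch.randn(num_slots, dim) * 0.02)  # learned initial slots
        self.attn = nn.MultiheadAttention(dim, num_heads=4, batch_first=True)
        self.update = nn.GRUCell(dim, dim)  # per-slot recurrent (state-space-like) update

    def forward(self, obs_tokens: torch.Tensor) -> torch.Tensor:
        """obs_tokens: (T, N, dim) visual tokens per frame; returns final slot states (S, dim)."""
        slots = self.slots0
        for t in range(obs_tokens.shape[0]):
            # Each slot queries the current frame for evidence about "its" object.
            read, _ = self.attn(slots.unsqueeze(0),
                                obs_tokens[t].unsqueeze(0),
                                obs_tokens[t].unsqueeze(0))
            # Gated recurrent update keeps history when the object is occluded or unchanged.
            slots = self.update(read.squeeze(0), slots)
        return slots


if __name__ == "__main__":
    frames = torch.randn(10, 64, 128)  # 10 timesteps, 64 visual tokens, 128-dim features
    memory = SlotMemory()
    print(memory(frames).shape)  # torch.Size([8, 128])
```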
Unbelievable: 21% of ICLR 2026 reviews were AI-generated? The official response is here
具身智能之心· 2025-11-18 00:46
Core Insights
- The article discusses the prevalence of AI-generated content in the review process for ICLR 2026, highlighting significant statistics regarding the extent of AI involvement in both submissions and reviews [2][11].

Group 1: AI Usage in Submissions
- A total of 39% of submitted papers utilized AI in some capacity as a writing assistant, with a notable correlation between higher AI usage and lower average scores [8].
- The breakdown of AI content in submissions shows that 61% of papers had 0-10% AI content, averaging a score of 4.36, while only 1% of papers with 90-100% AI content averaged a score of 2.9 [9].

Group 2: AI Usage in Review Comments
- The analysis revealed that 21% of review comments were fully generated by AI, with these comments averaging 0.3 points higher than those written by humans [11].
- Review comments generated by AI were found to be 26% longer than those written by humans, indicating a trend towards more verbose AI-generated feedback [11].

Group 3: Statistical Analysis by Pangram Labs
- Pangram Labs conducted a detailed analysis of AI usage in the ICLR 2026 review process, employing advanced models to quantify AI involvement [5][10].
- The study found that the average confidence level for fully AI-generated reviews was slightly higher, although the difference was minimal and should be interpreted cautiously [18].

Group 4: Community Response and Official Actions
- The ICLR 2026 organizing committee acknowledged the issue of low-quality reviews generated by AI and is considering appropriate measures to address it [25].
- Suggestions from the community included removing poor reviews and automatically flagging reviewers who fail to meet their responsibilities [26].
New at 3DV 2026 | GaussianArt: Tsinghua and BAAI use Gaussian models to tackle a key problem in robot manipulation simulation
具身智能之心· 2025-11-17 10:01
Author: Licheng Shen et al. | Editor: 具身智能之心

Overview: Simulating articulated objects is an important problem in computer vision and robot manipulation simulation. Existing methods typically follow a two-stage pipeline (first model the object in its different states, then infer the joint motion), which complicates the workflow and limits scalability. We propose GaussianArt, a single-stage training framework that unifies motion and appearance modeling through articulated 3D Gaussians. The method supports complex objects with up to 20 parts and integrates a robust part-segmentation module to precisely decompose joint-level motion. Compared with prior work evaluated on only 19 objects, we conduct a large-scale evaluation on 90 articulated objects covering a wide range of motion combinations and geometries. GaussianArt achieves state-of-the-art results in geometric modeling, visual reconstruction, and motion estimation, and supports downstream applications such as manipulation simulation. The work was recently accepted to 3DV 2026, a conference dedicated to 3D computer vision.

Paper: https://arxiv.org/abs/2508.14891
Code repository: ...
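To make the "unified motion and appearance via articulated 3D Gaussians" idea concrete, below is a small illustrative sketch (not the GaussianArt implementation; the soft part assignment, the revolute-joint parameterization, and all names are assumptions) showing how Gaussian centers assigned to parts can be moved by per-part joint transforms.

```python
# Minimal sketch (assumptions, not GaussianArt): articulated 3D Gaussians where each
# Gaussian is softly assigned to a part, and each part moves under its own joint
# parameters (here a revolute joint about an axis). Rendering and appearance
# attributes are omitted; only the pose update of Gaussian centers is illustrated.
import numpy as np


def axis_angle_to_matrix(axis: np.ndarray, angle: float) -> np.ndarray:
    """Rodrigues' formula for a rotation about a unit axis."""
    axis = axis / np.linalg.norm(axis)
    K = np.array([[0, -axis[2], axis[1]],
                  [axis[2], 0, -axis[0]],
                  [-axis[1], axis[0], 0]])
    return np.eye(3) + np.sin(angle) * K + (1 - np.cos(angle)) * (K @ K)


def articulate(means: np.ndarray, part_logits: np.ndarray,
               axes: np.ndarray, pivots: np.ndarray, angles: np.ndarray) -> np.ndarray:
    """means: (N, 3) Gaussian centers; part_logits: (N, P) soft part assignment;
    axes, pivots: (P, 3) per-part revolute joints; angles: (P,) joint states."""
    weights = np.exp(part_logits - part_logits.max(axis=1, keepdims=True))
    weights /= weights.sum(axis=1, keepdims=True)             # softmax over parts
    moved = np.zeros_like(means)
    for p in range(axes.shape[0]):
        R = axis_angle_to_matrix(axes[p], angles[p])
        transformed = (means - pivots[p]) @ R.T + pivots[p]   # rotate about the joint pivot
        moved += weights[:, p:p + 1] * transformed            # blend by part weights
    return moved


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    means = rng.normal(size=(1000, 3))
    logits = rng.normal(size=(1000, 2))                        # two parts, e.g. door + frame
    axes = np.array([[0.0, 0.0, 1.0], [0.0, 0.0, 1.0]])
    pivots = np.zeros((2, 3))
    print(articulate(means, logits, axes, pivots, np.array([0.0, np.pi / 4])).shape)
```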
具身智能之心 is recruiting collaborators in the VLA + RL direction
具身智能之心· 2025-11-17 10:01
Group 1
- The article announces the recruitment of a lecturer for an online course focused on VLA (vision-language-action) models and RL (reinforcement learning) [1][2]
- The ideal candidate should hold a PhD or higher, or have hands-on industry experience, particularly with debugging on real robots [2]
- The community, known as "Embodied Intelligence Heart" (具身智能之心), is China's first full-stack embodied-intelligence technology exchange platform and gathers many people interested in VLA and RL [3]

Group 2
- The recruited lecturer is offered compensation above the industry average along with abundant industry resources [4]
- Interested candidates are encouraged to add the specified WeChat contact for detailed information [5]
The two most influential PhDs in embodied AI have started a company!
具身智能之心· 2025-11-17 04:00
Core Insights
- The article highlights the entrepreneurial venture of two influential figures in embodied intelligence, Tony Z. Zhao and Cheng Chi, who have recently co-founded a company named Sunday Robotics [2][4].

Group 1: Key Individuals
- Tony Z. Zhao, who left his PhD program at Stanford University, is known for his contributions to ALOHA, ALOHA 2, and Mobile ALOHA during his academic tenure [4][5].
- Cheng Chi, who holds a PhD from Columbia University and studied under Shuran Song at Stanford, is recognized for his work on the Universal Manipulation Interface (UMI) and Diffusion Policy, the latter a finalist for Best Systems Paper at RSS 2024 [10].

Group 2: Company Overview
- Sunday Robotics is the new venture launched by Tony Z. Zhao and Cheng Chi, marking a significant step in the development of embodied intelligence technologies [2].
Published in Science Robotics! Learning 1,000 tasks in a day: the internal-combustion-engine trend has finally reached robots
具身智能之心· 2025-11-17 00:47
Editor: 具身智能之心

In robot manipulation, efficient learning remains the core challenge: existing imitation-learning methods often need hundreds or even thousands of demonstrations to master a single task, and scaling to a thousand everyday tasks demands enormous data and resources. Multi-Task Trajectory Transfer (MT3), proposed by the Robot Learning Lab at Imperial College London, breaks this deadlock with a new recipe: decompose each trajectory into an alignment stage and an interaction stage, and generalize via retrieval. A single demonstration is enough to teach the robot one task; with less than 24 hours of human demonstration time, MT3 masters 1,000 different everyday manipulation tasks while also generalizing to entirely new object instances, raising the efficiency ceiling of robot imitation learning.

Alignment stage: solving the "where to act" localization problem

Why rethink the robot imitation-learning paradigm? Current mainstream imitation-learning approaches are stuck in a "data-efficiency dilemma": they either rely on a single-stage monolithic policy, which is complex to learn and data-hungry, or they generalize weakly and cannot ...
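A hedged sketch of the "align then interact, generalize by retrieval" recipe described above follows (this is not the MT3 code; the embedding, pose handling, and data fields are placeholders): retrieve the nearest stored demonstration for a new scene, anchor its trajectory at the new object's pose, and replay the object-relative interaction.

```python
# Minimal sketch (assumptions, not MT3): retrieval-based two-stage trajectory transfer.
# A new observation retrieves the nearest stored demonstration by feature similarity;
# the alignment stage anchors the demo at the detected object; the interaction stage
# replays the demo's object-relative motion from that anchor.
from dataclasses import dataclass
import numpy as np


@dataclass
class Demo:
    embedding: np.ndarray        # visual descriptor of the demonstrated object/scene
    rel_waypoints: np.ndarray    # (T, 3) interaction waypoints relative to the object frame


def retrieve(demos: list[Demo], query_embedding: np.ndarray) -> Demo:
    """Pick the stored demonstration whose descriptor is closest to the new scene."""
    sims = [float(query_embedding @ d.embedding /
                  (np.linalg.norm(query_embedding) * np.linalg.norm(d.embedding)))
            for d in demos]
    return demos[int(np.argmax(sims))]


def transfer(demo: Demo, object_position: np.ndarray) -> np.ndarray:
    """Alignment: anchor the trajectory at the new object's position.
    Interaction: replay the demo's object-relative waypoints from that anchor."""
    return object_position + demo.rel_waypoints


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    demos = [Demo(rng.normal(size=16), rng.normal(scale=0.05, size=(20, 3))) for _ in range(5)]
    query = demos[2].embedding + rng.normal(scale=0.01, size=16)   # a scene similar to demo 2
    plan = transfer(retrieve(demos, query), object_position=np.array([0.4, 0.1, 0.05]))
    print(plan.shape)  # (20, 3) end-effector waypoints in the new scene
```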
Is 3D vision over-engineered? ByteDance's Depth Anything 3 is here, and Saining Xie gives it a thumbs-up
具身智能之心· 2025-11-17 00:47
Editor: 机器之心

All it takes now is a simple Transformer trained with a depth-ray representation. The work argues that much of today's 3D vision research is over-engineered.

This Friday, the hottest topic in the AI community was a new paper on 3D modeling. After more than a year of exploration, a team from ByteDance released Depth Anything 3 (DA3), which extends monocular depth estimation to arbitrary-view scenarios and gives computers spatial perception comparable to that of humans.

Paper: https://arxiv.org/abs/2511.10647
Project page: https://depth-anything-3.github.io

In pursuit of minimal modeling, the DA3 work arrives at two key insights. The resulting approach improves on the current state of the art (SOTA) by 44% in pose estimation and by 25% in geometry estimation.

Is 3D vision really this simple? Saining Xie, assistant professor of computer science at New York University and a well-known AI researcher, commented that the paper reads a bit like a movie: ...
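As an illustration of how "a plain Transformer plus per-pixel depth along camera rays" might look, here is a minimal sketch (an assumption for intuition only, not DA3's actual architecture or its depth-ray representation; the patch size, dimensions, and intrinsics handling are placeholders).

```python
# Minimal sketch (illustrative assumption, not DA3): a plain Transformer encoder over
# image patches with a per-pixel depth head, where 3D points are recovered by pushing
# each pixel's camera ray out to its predicted depth.
import torch
import torch.nn as nn


class PlainDepthTransformer(nn.Module):
    def __init__(self, patch: int = 16, dim: int = 256, layers: int = 4):
        super().__init__()
        self.patch = patch
        self.embed = nn.Conv2d(3, dim, kernel_size=patch, stride=patch)   # patchify
        layer = nn.TransformerEncoderLayer(dim, nhead=8, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=layers)
        self.head = nn.Linear(dim, patch * patch)                         # per-pixel depth inside each patch

    def forward(self, img: torch.Tensor) -> torch.Tensor:
        """img: (B, 3, H, W) -> depth map (B, H, W)."""
        b, _, h, w = img.shape
        tokens = self.embed(img).flatten(2).transpose(1, 2)                # (B, N, dim)
        tokens = self.encoder(tokens)
        d = self.head(tokens)                                              # (B, N, patch*patch)
        d = d.view(b, h // self.patch, w // self.patch, self.patch, self.patch)
        return d.permute(0, 1, 3, 2, 4).reshape(b, h, w).relu()


def unproject(depth: torch.Tensor, fx: float, fy: float, cx: float, cy: float) -> torch.Tensor:
    """Lift a depth map to 3D points along the camera rays defined by the intrinsics."""
    h, w = depth.shape[-2:]
    v, u = torch.meshgrid(torch.arange(h), torch.arange(w), indexing="ij")
    rays = torch.stack([(u - cx) / fx, (v - cy) / fy, torch.ones(h, w)], dim=-1)
    return depth.unsqueeze(-1) * rays                                      # (..., H, W, 3)


if __name__ == "__main__":
    model = PlainDepthTransformer()
    depth = model(torch.randn(1, 3, 128, 128))
    points = unproject(depth[0], fx=100.0, fy=100.0, cx=64.0, cy=64.0)
    print(depth.shape, points.shape)  # (1, 128, 128) and (128, 128, 3)
```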
Four Megvii prodigies' embodied-AI startup raises nearly 1 billion RMB, with Alibaba as the sole investor drawing attention
具身智能之心· 2025-11-17 00:47
Core Insights
- The article highlights the rapid growth and investment in the embodied intelligence sector, particularly focusing on the company Dexmal, which has raised nearly 1 billion RMB in funding [2][6].

Group 1: Company Overview
- Dexmal, established in March 2025, specializes in the research and development of embodied intelligence hardware and software technologies [10].
- The company's mission is to create intelligent, useful, and trustworthy robots to enhance quality of life [11].
- The founding team consists of members with extensive backgrounds in AI and practical experience in scaling AI-native products, primarily from Megvii Technology [12][13].

Group 2: Recent Funding and Investment
- Dexmal recently completed a significant A+ round of financing, with Alibaba as the exclusive investor, marking a notable endorsement from a major player in the tech industry [3][5].
- In just over two months, Dexmal has secured nearly 1 billion RMB across three funding rounds, indicating strong investor confidence and market interest [6][9].

Group 3: Technological Advancements
- Dexmal has published more than ten papers at top conferences related to AI and embodied intelligence, showcasing its research capabilities [16].
- The company has developed two frameworks, Real-time VLA and MemoryVLA, aimed at optimizing robot performance for real-time and long-horizon tasks [17].
- Dexmal has also launched an open-source VLA toolbox called Dexbotic, designed to facilitate embodied-intelligence research by addressing the fragmentation of research ecosystems [20][22].

Group 4: Product Development
- Alongside software, Dexmal has introduced a hardware product, DOS-W1, which serves as a modular and open experimental platform for data collection and robotics research [23][25].
- The DOS-W1 is designed to lower the barriers to entry for embodied-intelligence research while enhancing data-collection efficiency [28].

Group 5: Competitive Edge and Achievements
- Dexmal's team has earned significant recognition in robotics, winning gold medals in global competitions such as challenges held at ICRA 2025 and CVPR 2025 [33][34].
- The team's track record and accolades serve as a mark of quality for its products, positioning Dexmal as a formidable player in the embodied intelligence landscape [35].

Group 6: Founding Team Background
- The founding team includes notable figures such as Tang Wenbin, Fan Haoqiang, Zhou Yuzhen, and Wang Tiancai, all of whom have extensive experience in AI and have previously contributed to major advancements in the field [36][78].
- Their collective expertise and past successes in AI competitions and research provide a strong foundation for Dexmal's future work in embodied intelligence [78].
Outperforming GPT and Google: Beijing Humanoid Robot Innovation Center open-sources the world's strongest embodied VLM
具身智能之心· 2025-11-17 00:47
Author: 咖啡不加糖 | Editor: 焉知机器人

On November 14, 2025, the Beijing Embodied-Intelligence Robot Innovation Center officially released Pelican-VL 1.0, an embodied vision-language model (VLM). It not only claims performance beyond comparable GPT-5 models and Google's Gemini series, but also bills itself as "the world's largest open-source embodied multimodal large model," showcasing China's technical strength in embodied intelligence.

Embodied intelligence, simply put, is the technology that lets robots perceive the world, make decisions, and execute actions the way humans do; the vision-language model (VLM) acts as the robot's "eyes" and "central brain," turning the images it sees into understandable language and then planning concrete action steps.

Figure: Pelican-VL 1.0 (the name means "pelican") can be downloaded from both Hugging Face and ModelScope.

Pelican-VL 1.0 is described as a "vision-language brain," and its open-source release is a strong push for progress in embodied-intelligence technology.

1. The Beijing Humanoid Robot Innovation Center and Pelican-VL ...
Microsoft & HKUST compare multiple transfer techniques! How exactly can VLA models effectively inherit the rich visual-semantic priors in a VLM?
具身智能之心· 2025-11-15 16:03
Core Insights
- The article discusses the introduction of the GrinningFace benchmark, which aims to address the challenges in knowledge transfer from vision-language models (VLM) to vision-language-action models (VLA) by using emoji-based tasks as a testing ground [1][2][4].

Group 1: Challenges in VLA Training
- VLA training relies heavily on VLM initialization but faces three main challenges: unclear transfer effects, the risk of catastrophic forgetting, and lack of standardized comparison for different transfer techniques [2][4].
- Existing datasets have low overlap with VLM pre-training data, making it difficult to isolate contributions from "robotic action skills" and "VLM prior knowledge" [2].

Group 2: GrinningFace Benchmark Design
- The GrinningFace benchmark uses emojis as a bridge to separate action execution from semantic recognition, allowing for precise measurement of knowledge transfer effects [4][5].
- The benchmark includes a standardized task where a robotic arm must place a cube on an emoji card based on language instructions [4].

Group 3: Evaluation Metrics
- The evaluation framework consists of two core metrics: execution success rate (SR) and recognition SR, which quantify the robot's ability to perform actions and recognize semantic cues, respectively [5][8].
- The study found that different fine-tuning strategies have varying impacts on knowledge transfer, with a focus on retaining VLM prior knowledge while adapting to specific tasks [5][11].

Group 4: Key Findings on Transfer Techniques
- The research highlights that co-training, latent action prediction, and diverse pre-training data are critical for effective knowledge transfer [7][19].
- The balance between retaining VLM prior knowledge and adapting robotic actions is identified as a core principle in VLA design [19].

Group 5: Future Directions
- Future work should focus on optimizing parameter-efficient fine-tuning techniques, enhancing knowledge transfer efficiency, and designing complex tasks that reflect real-world applications [19].
- Exploring multimodal prior fusion, including tactile and auditory information, could improve VLA's adaptability to various environments [19].
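To make the two metrics concrete, here is a small sketch of how execution SR and recognition SR could be scored over evaluation episodes (the field names and episode structure are assumptions, not the GrinningFace benchmark code).

```python
# Minimal sketch (field names are assumptions, not the benchmark code): scoring the
# two metrics described above. "Execution SR" counts episodes where the cube was
# placed on some card; "recognition SR" counts episodes where the chosen card matches
# the emoji named in the instruction, isolating how much VLM semantic prior survived.
from dataclasses import dataclass


@dataclass
class Episode:
    placed_on_card: bool   # did the arm complete a placement at all?
    chosen_emoji: str      # emoji on the card the cube ended up on ("" if none)
    target_emoji: str      # emoji named in the language instruction


def execution_sr(episodes: list[Episode]) -> float:
    return sum(e.placed_on_card for e in episodes) / len(episodes)


def recognition_sr(episodes: list[Episode]) -> float:
    return sum(e.placed_on_card and e.chosen_emoji == e.target_emoji
               for e in episodes) / len(episodes)


if __name__ == "__main__":
    eps = [
        Episode(True, "😀", "😀"),   # correct card
        Episode(True, "😎", "😀"),   # placed, but wrong emoji: execution ok, recognition fails
        Episode(False, "", "😀"),    # never completed the placement
    ]
    print(f"execution SR = {execution_sr(eps):.2f}, recognition SR = {recognition_sr(eps):.2f}")
```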