A Step-by-Step Hands-On Guide to Embodied Intelligence: From Zero to Reinforcement Learning and Sim2Real
具身智能之心· 2025-06-27 08:36
Core Viewpoint
- The article discusses an unprecedented turning point in AI development, highlighting the rise of embodied intelligence and its potential to revolutionize industries including manufacturing, healthcare, and space exploration [1].

Group 1: Embodied Intelligence
- Embodied intelligence is defined as AI systems that not only possess a "brain" but also have the capability to perceive and interact with the physical world [1].
- Major tech companies such as Tesla, Boston Dynamics, OpenAI, and Google are actively investing in this transformative field [1].

Group 2: Technical Challenges
- Achieving true embodied intelligence presents significant technical challenges, requiring advanced algorithms and a deep understanding of physical simulation, robot control, and perception fusion [2].

Group 3: MuJoCo's Role
- MuJoCo (Multi-Joint dynamics with Contact) is identified as a critical technology for embodied intelligence, serving as a high-fidelity training environment for robot learning (a minimal simulation loop is sketched after this summary) [4].
- It allows researchers to conduct millions of trials in a virtual environment, significantly speeding up the learning process and reducing the costs associated with physical hardware [6].

Group 4: MuJoCo's Advantages
- MuJoCo features advanced contact dynamics algorithms, supports parallel computation, and provides a variety of sensor models, making it a standard tool in both academia and industry [6][7].
- Major tech companies use MuJoCo for their robot research, underscoring its importance in the field [7].

Group 5: Practical Training
- A comprehensive MuJoCo development course is offered, focusing on practical applications and theoretical foundations, covering topics from physical simulation to deep reinforcement learning [8][9].
- The course is structured into six modules, each with specific learning objectives and practical projects, ensuring a solid grasp of embodied intelligence technologies [10][12].

Group 6: Project Examples
- The course includes projects such as intelligent robotic arm control, vision-guided grasping systems, and multi-robot collaboration, allowing participants to apply their knowledge in real-world scenarios [14][21].

Group 7: Target Audience and Outcomes
- The course is suitable for individuals with programming or algorithm backgrounds looking to enter the field of embodied robotics, as well as graduate and undergraduate students focused on robotics and reinforcement learning [27].
- Upon completion, participants will have a complete skill set in embodied intelligence, spanning technical, engineering, and innovation capabilities [28].
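To make the simulation workflow above concrete, here is a minimal sketch using the official `mujoco` Python bindings; the inline XML model is illustrative only and is not taken from the course.

```python
import mujoco

# A toy model: a free-floating box dropped onto a plane.
XML = """
<mujoco>
  <worldbody>
    <light pos="0 0 3"/>
    <geom type="plane" size="1 1 0.1"/>
    <body name="box" pos="0 0 0.5">
      <freejoint/>
      <geom type="box" size="0.05 0.05 0.05"/>
    </body>
  </worldbody>
</mujoco>
"""

model = mujoco.MjModel.from_xml_string(XML)  # parse the model description
data = mujoco.MjData(model)                  # allocate the simulation state

for _ in range(1000):                        # advance the contact dynamics
    mujoco.mj_step(model, data)

# For a free joint, qpos[0:3] is the body position; the box should settle
# on the plane at roughly half its edge length above the ground.
print("final box height:", data.qpos[2])
```

The same loop structure underlies reinforcement-learning setups: a policy reads `data` (joint positions, velocities, sensor readings), writes `data.ctrl`, and the simulator is stepped in between.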
Latest Survey from Tsinghua University! Multi-Sensor Fusion Perception in Embodied AI: Background, Methods, and Challenges
具身智能之心· 2025-06-27 08:36
Core Insights
- The article emphasizes the significance of embodied AI and multi-sensor fusion perception (MSFP) as a critical pathway toward general artificial intelligence (AGI) through real-time environmental perception and autonomous decision-making [3][4].

Group 1: Importance of Embodied AI and Multi-Sensor Fusion
- Embodied AI is a form of intelligence that operates through physical entities, enabling autonomous decision-making and action in dynamic environments, with applications in autonomous driving and robotic swarm intelligence [3].
- Multi-sensor fusion is essential for robust perception and accurate decision-making in embodied AI systems, integrating data from sensors such as cameras, LiDAR, and radar to achieve comprehensive environmental awareness [3][4].

Group 2: Limitations of Current Research
- Existing AI-based MSFP methods have succeeded in fields like autonomous driving but face inherent challenges in embodied AI applications, such as the heterogeneity of cross-modal data and temporal asynchrony between different sensors [4][7].
- Current reviews often focus on a single task or research area, limiting their applicability to researchers in related fields [7][8].

Group 3: Structure and Contributions of the Research
- The article organizes MSFP research from multiple technical perspectives, covering perception tasks, sensor data types, popular datasets, and evaluation standards [8].
- It reviews point-level, voxel-level, region-level, and multi-level fusion methods, with attention to collaborative perception among multiple embodied agents and infrastructure [8][21].

Group 4: Sensor Data and Datasets
- Various sensor types are discussed, including camera data, LiDAR, and radar, each with unique advantages and challenges for environmental perception [10][12].
- The article presents several datasets used in MSFP research, such as KITTI, nuScenes, and Waymo Open, detailing their modalities, scenarios, and number of frames [12][13][14].

Group 5: Perception Tasks
- Key perception tasks include object detection, semantic segmentation, depth estimation, and occupancy prediction, each contributing to overall understanding of the environment [16][17].

Group 6: Multi-Modal Fusion Methods
- The article categorizes multi-modal fusion methods into point-level, voxel-level, region-level, and multi-level fusion, each with specific techniques to enhance perception robustness (a point-level example is sketched after this summary) [21][22][23][24][28].

Group 7: Multi-Agent Fusion Methods
- Collaborative perception techniques are highlighted as essential for integrating data from multiple agents and infrastructure, addressing challenges such as occlusion and sensor failures [35][36].

Group 8: Time Series Fusion
- Time-series fusion is identified as a key component of MSFP systems, enhancing perception continuity across time and space through various query-based fusion methods [38][39].

Group 9: Multi-Modal Large Language Model (LLM) Fusion
- The integration of multi-modal data with LLMs is explored, showcasing advances in tasks such as image description and cross-modal retrieval, with new datasets designed to enhance embodied AI capabilities [47][50].
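As a rough illustration of what "point-level" fusion means in practice, the sketch below projects LiDAR points into a camera image and attaches sampled RGB values to each visible point. It is a generic example under assumed inputs (extrinsics `T_cam_lidar`, intrinsics `K`), not a method from the survey.

```python
import numpy as np

def point_level_fusion(points_lidar, image, T_cam_lidar, K):
    """Attach camera RGB values to LiDAR points (illustrative point-level fusion).

    points_lidar: (N, 3) points in the LiDAR frame
    image:        (H, W, 3) RGB image
    T_cam_lidar:  (4, 4) extrinsic transform, LiDAR frame -> camera frame
    K:            (3, 3) camera intrinsics
    """
    # Transform points into the camera frame.
    pts_h = np.hstack([points_lidar, np.ones((points_lidar.shape[0], 1))])
    pts_cam = (T_cam_lidar @ pts_h.T).T[:, :3]

    # Keep points in front of the camera and project with the pinhole model.
    in_front = pts_cam[:, 2] > 0.1
    pts_cam = pts_cam[in_front]
    uv = (K @ pts_cam.T).T
    uv = uv[:, :2] / uv[:, 2:3]

    # Keep projections that land inside the image.
    h, w = image.shape[:2]
    valid = (uv[:, 0] >= 0) & (uv[:, 0] < w) & (uv[:, 1] >= 0) & (uv[:, 1] < h)
    uv = uv[valid].astype(int)
    rgb = image[uv[:, 1], uv[:, 0]]  # sample per-point image features

    # Fused representation: xyz plus sampled RGB for each visible point.
    return np.hstack([pts_cam[valid], rgb.astype(np.float32) / 255.0])
```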
A Step-by-Step Walkthrough! ALOHA: The Classic Work Combining a Low-Cost Bimanual Robot with Imitation Learning
具身智能之心· 2025-06-27 08:36
Core Viewpoint
- The article discusses the ALOHA system, a low-cost open-source hardware system for bimanual teleoperation, emphasizing its ability to perform precise manipulation tasks using affordable components and advanced learning algorithms [4][5][8].

Group 1: ALOHA System Overview
- ALOHA costs less than $20,000 and is designed to enable precise manipulation using two low-cost robotic arms and 3D-printed components [7][8].
- The system uses end-to-end imitation learning, collecting real demonstrations through a custom teleoperation interface [8][10].

Group 2: Challenges in Imitation Learning
- Imitation learning faces challenges such as compounding errors, where small prediction errors accumulate and lead to significant deviation from expert behavior [9][12].
- The article highlights the difficulty of modeling complex physical interactions, arguing that learning policies directly from demonstrations is more effective than modeling the entire environment [9][12].

Group 3: Action Chunking with Transformers (ACT)
- The ACT algorithm addresses compounding errors by predicting sequences of actions rather than single steps, improving performance on highly complex tasks (a temporal-ensembling sketch follows this summary) [12][13].
- The algorithm has demonstrated an 80-90% success rate on tasks with only 10 minutes of demonstration data [12].

Group 4: Hardware Specifications
- The ALOHA system is built on principles of low cost, versatility, user-friendliness, repairability, and ease of construction, using ViperX 6-DoF robotic arms [17][18].
- The system is designed to perform a range of tasks, including precise, contact-rich, and dynamic operations [20][22].

Group 5: Data Collection and Training
- The system collects human demonstrations to train the policy, recording the leader robot's joint positions to capture the operator's intent and force feedback [23][25].
- Training uses a conditional variational autoencoder (CVAE) to model human data and improve learning from noisy demonstrations [33][55].

Group 6: Experimental Results
- Experimental results show that action chunking and temporal ensembling significantly enhance the performance of the ACT algorithm [52][54].
- The necessity of high-frequency control is emphasized, with findings indicating that a 50 Hz control frequency allows more precise and agile task execution [56].
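The snippet below is a minimal sketch of temporal ensembling as commonly described for ACT: overlapping action-chunk predictions for the current timestep are averaged with exponential weights that favor older predictions. It is an illustration of the idea, not the authors' reference implementation, and the data layout is an assumption.

```python
import numpy as np

def temporal_ensemble(chunk_predictions, t, m=0.01):
    """Combine overlapping action-chunk predictions for the current timestep t.

    chunk_predictions: dict {t_pred: array of shape (chunk_size, action_dim)},
                       one chunk per past timestep at which the policy was queried.
    m: decay rate of the exponential weighting (a hyperparameter).
    """
    candidates = []
    for t_pred, chunk in chunk_predictions.items():
        offset = t - t_pred                       # how far into this chunk t falls
        if 0 <= offset < chunk.shape[0]:
            candidates.append((offset, chunk[offset]))
    if not candidates:
        raise ValueError("no chunk covers timestep t")

    # Sort from oldest prediction (largest offset) to newest, then apply
    # w_i = exp(-m * i) with i = 0 for the oldest prediction, so older
    # predictions receive slightly more weight, as described for ACT.
    candidates.sort(key=lambda c: -c[0])
    actions = np.stack([a for _, a in candidates])
    weights = np.exp(-m * np.arange(len(candidates)))
    return np.average(actions, axis=0, weights=weights)
```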
A New Paradigm for 3D VLA! CVPR Champion Solution BridgeVLA Improves Real-Robot Performance by 32%
具身智能之心· 2025-06-26 14:19
Core Viewpoint
- The article discusses the BridgeVLA model developed by the Institute of Automation, Chinese Academy of Sciences, which efficiently projects 3D inputs into 2D images for action prediction, achieving high performance and data efficiency in 3D robotic manipulation learning [4][6].

Group 1: Model Performance
- BridgeVLA achieves a 96.8% task success rate with only 3 trajectories in the basic setting and outperforms baseline models across generalization settings, with a 32% performance improvement [6][25].
- In simulation benchmarks such as RLBench, COLOSSEUM, and GemBench, BridgeVLA outperforms mainstream 3D robotic manipulation baselines, reaching an 88.2% success rate on RLBench, a 7.3% improvement on COLOSSEUM, and a 50% success rate on GemBench [20][25].

Group 2: Model Design and Training
- BridgeVLA's training consists of two phases: 2D heatmap pre-training to strengthen spatial perception, and 3D action fine-tuning to learn specific robotic manipulation strategies [15][17].
- The model uses heatmap pre-training to predict a probability heatmap of target object locations from textual instructions, enhancing its spatial awareness (a generic heatmap-decoding sketch follows this summary) [16][25].

Group 3: Generalization and Data Efficiency
- BridgeVLA demonstrates strong generalization, handling disturbances such as unseen objects, lighting conditions, and object types, thanks to the rich visual and linguistic prior knowledge embedded in the pre-trained multimodal model [20][25].
- Its data efficiency is highlighted by achieving nearly the same performance with only 3 trajectories as with 10, making it suitable for deployment on real robotic systems [25][26].
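The sketch below shows one generic way a predicted 2D heatmap can be decoded into an image-plane target location via a soft argmax. It is purely an illustration of heatmap-based spatial grounding, not BridgeVLA's actual decoding step.

```python
import numpy as np

def soft_argmax_2d(heatmap, temperature=1.0):
    """Convert a predicted 2D heatmap into an expected (u, v) image location.

    heatmap: (H, W) array of unnormalized scores for where the target lies.
    Returns the probability-weighted pixel coordinate (a soft argmax).
    """
    h, w = heatmap.shape
    logits = heatmap.reshape(-1) / temperature
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()                        # softmax over all pixels

    vs, us = np.meshgrid(np.arange(h), np.arange(w), indexing="ij")
    u = float((probs * us.reshape(-1)).sum())   # expected column
    v = float((probs * vs.reshape(-1)).sum())   # expected row
    return u, v
```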
What Exactly Is This Year's Hot Topic, Goal-Oriented Navigation? What Are the Technical Routes from Target Search to Target Reaching?
具身智能之心· 2025-06-26 14:19
Core Viewpoint
- Goal-oriented navigation enables robots to autonomously complete navigation tasks from a goal description alone, marking a significant shift from traditional visual-language navigation systems [2][3].

Group 1: Technology Overview
- Embodied navigation is a core area of embodied intelligence, resting on three technical pillars: language understanding, environmental perception, and path planning [2].
- Goal-oriented navigation requires robots to explore and plan paths in unfamiliar 3D environments using only goal descriptions such as coordinates, images, or natural language [2].
- The technology has been industrialized in verticals including delivery, healthcare, and hospitality, improving service efficiency [3].

Group 2: Technological Evolution
- The evolution of goal-oriented navigation can be divided into three generations:
  - First generation: end-to-end methods centered on reinforcement learning and imitation learning, achieving breakthroughs in point navigation and closed-set image navigation tasks [5].
  - Second generation: modular methods that explicitly construct semantic maps, decomposing the task into exploration and goal localization (a schematic modular loop is sketched after this summary) [5].
  - Third generation: integration of large language models (LLMs) and visual language models (VLMs) to enhance knowledge reasoning and open-vocabulary target matching [7].

Group 3: Challenges and Learning Path
- The complexity of embodied navigation, particularly goal-oriented navigation, requires knowledge from multiple fields, making it difficult for newcomers to enter the domain [9].
- A new course has been developed to address these challenges, focusing on rapid onboarding, building a research framework, and combining theory with practice [10][11][12].

Group 4: Course Structure
- The course covers the theoretical foundations and technical lineage of goal-oriented navigation, including task definitions and evaluation benchmarks [15].
- It also delves into the Habitat simulation ecosystem, end-to-end navigation methodologies, modular navigation architectures, and LLM/VLM-driven navigation systems [16][18][20][22].
- A capstone project focuses on reproducing the VLFM algorithm and deploying it in real-world scenarios [24].
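For a rough picture of the modular pipeline that second-generation methods follow (map, then explore, then plan), here is a schematic episode loop. All class and function names are hypothetical placeholders, not Habitat or course APIs.

```python
def run_modular_navigation_episode(env, perception, mapper, explorer, planner, goal):
    """Schematic loop for a modular goal-oriented navigation agent.

    env, perception, mapper, explorer, and planner are hypothetical components
    standing in for a simulator, a semantic segmentation model, a semantic
    map builder, an exploration policy, and a local path planner.
    """
    obs = env.reset()
    done = False
    info = {}
    while not done:
        semantics = perception(obs["rgb"], obs["depth"])    # per-pixel labels
        semantic_map = mapper.update(semantics, obs["pose"])

        target = semantic_map.locate(goal)                  # goal localization
        if target is None:
            target = explorer.next_frontier(semantic_map)   # keep exploring

        action = planner.step_toward(target, obs["pose"])   # low-level action
        obs, done, info = env.step(action)
    return info
```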
Slamtec Releases the First Consumer-Grade Underwater LiDAR Category: RPLIDAR U1
具身智能之心· 2025-06-26 14:19
Core Viewpoint
- The company has launched the RPLIDAR U1, the first consumer-grade underwater LiDAR, marking the start of high-precision laser SLAM navigation in underwater environments [1][4].

Group 1: Product Features
- RPLIDAR U1 is compact, comparable in size to a ping-pong ball, making it suitable for a wide range of devices [4].
- Its innovative design keeps RPLIDAR U1 cost-effective for consumer applications [6].
- The system is engineered to overcome underwater challenges such as reduced detection range and noise, which have historically limited the use of traditional LiDAR in water [7][10].

Group 2: Performance and Testing
- RPLIDAR U1 achieves a maximum underwater detection range of 5 meters and meets the IPX8 waterproof standard (a generic scan-conversion sketch follows this summary) [8].
- The product has undergone extensive testing across varied underwater conditions, including different water qualities and surface materials [11][18].

Group 3: Applications
- RPLIDAR U1 is paired with the SLAMKIT underwater mapping and navigation solution, enabling efficient mapping and navigation without odometry support [22].
- Potential applications include nearshore vegetation surveying, pool cleaning, and seabed exploration [24][26][29].

Group 4: Availability
- RPLIDAR U1 is now open for sample requests and product reservations from industry clients [32].
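For readers unfamiliar with how a single-plane scanning LiDAR such as the RPLIDAR series is typically consumed downstream, the sketch below converts one polar scan into Cartesian points and drops returns beyond a maximum range. The 5 m figure is taken from the article; the code is generic and is not Slamtec's SDK.

```python
import math

def scan_to_points(scan, max_range=5.0):
    """Convert one 2D LiDAR scan into Cartesian points in the sensor frame.

    scan: iterable of (angle_rad, range_m) measurements from one revolution.
    max_range: drop returns beyond this distance (e.g. the roughly 5 m
               underwater detection range quoted for RPLIDAR U1).
    """
    points = []
    for angle, r in scan:
        if 0.0 < r <= max_range:                # filter dropouts and far noise
            points.append((r * math.cos(angle), r * math.sin(angle)))
    return points
```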
ICCV 2025 Results Are Out! With a 24% Acceptance Rate, Did You Grab Your Ticket to Hawaii?
具身智能之心· 2025-06-26 14:19
Core Viewpoint
- The article discusses the significant increase in submissions to ICCV 2025, reflecting rapid growth in the computer vision field and the strain the high submission volume places on the peer review process [3][26][31].

Submission and Acceptance Data
- ICCV 2025 received 11,239 valid submissions and accepted 2,699 papers, an acceptance rate of 24% [3][4].
- In comparison, ICCV 2023 had 8,260 submissions and accepted 2,160 papers, an acceptance rate of approximately 26.15% [6].
- Historical data show ICCV 2021 had 6,152 submissions with a 26.20% acceptance rate, and ICCV 2019 had 4,323 submissions with a 25% acceptance rate [6].

Peer Review Challenges
- Despite the increase in submissions, the acceptance rate has remained relatively stable, hovering around 25% to 26% [4].
- ICCV 2025 implemented a new policy to strengthen accountability and integrity, identifying 25 irresponsible reviewers and rejecting 29 associated papers [4][5].
- The article highlights the growing strain on peer review as submission volumes exceed 10,000, with NeurIPS expected to surpass 30,000 submissions [31].

Recommendations for the Peer Review System
- The article advocates a two-way feedback loop in which authors evaluate review quality while reviewers receive formal recognition [34][38].
- It suggests a systematic reviewer reward mechanism to incentivize high-quality reviews [38].
- Reforms to the peer review system are called for to address issues of fairness and accountability [36][37].
RoboSense 2025 Machine Perception Challenge Officially Launched
具身智能之心· 2025-06-25 13:52
Core Viewpoint
- The RoboSense Challenge 2025 aims to systematically evaluate the perception and understanding capabilities of robots in real-world scenarios, addressing the limitations of traditional perception algorithms in complex environments [1][44].

Group 1: Challenge Overview
- The challenge is organized by multiple prestigious institutions, including the National University of Singapore and the University of Michigan, and is officially recognized as part of IROS 2025 [5].
- The competition will take place in Hangzhou, China, with key dates including registration opening in June 2025 and award decisions on October 19, 2025 [3][46].

Group 2: Challenge Tasks
- The challenge comprises five real-world tasks covering language-driven autonomous driving, social navigation, sensor placement optimization, cross-modal drone navigation, and cross-platform 3D object detection [6][9].
- Each task is designed to test the robustness and adaptability of robotic systems under different conditions, emphasizing the need for innovative solutions in perception and understanding [44].

Group 3: Technical Features
- The tasks require end-to-end multimodal models that integrate visual sequences with natural language instructions, aiming for deep coupling between language, perception, and planning [7].
- The challenge emphasizes robust performance in dynamic environments, including the ability to handle sensor placement variations and social interactions with humans [20][28].

Group 4: Evaluation Metrics
- The evaluation framework spans multiple dimensions, including perception accuracy, understanding via visual question answering (VQA), trajectory prediction, and planning consistency with language commands (a displacement-error sketch for the trajectory dimension follows this summary) [9][22].
- Baseline models and their performance metrics are provided for each task, indicating the expected computational resources and training requirements [13][19][39].

Group 5: Awards and Incentives
- The challenge offers a total prize pool exceeding $10,000, with awards for first, second, and third places, as well as innovation awards for outstanding contributions in each task [40][41].
- All teams with valid submissions will receive official participation certificates, encouraging broad engagement in the competition [41].
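A common way to score the trajectory-prediction dimension is via displacement errors, sketched below as a generic illustration; the challenge's exact metrics and weighting are defined by the organizers, not here.

```python
import numpy as np

def displacement_errors(pred, gt):
    """Average and final displacement error between two trajectories.

    pred, gt: (T, 2) arrays of predicted and ground-truth xy waypoints.
    Returns (ADE, FDE): mean per-step error and error at the last step.
    """
    dists = np.linalg.norm(pred - gt, axis=-1)   # per-timestep Euclidean error
    return float(dists.mean()), float(dists[-1])
```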
New from Tongji University! A Comprehensive Survey of Multimodal Perception for Embodied Navigation
具身智能之心· 2025-06-25 13:52
Core Insights
- The article presents a comprehensive analysis of multimodal navigation methods, emphasizing the integration of sensory modalities such as vision, audio, and language to enhance navigation capabilities [4][32].

Group 1: Research Background
- Goal-oriented navigation is a fundamental challenge in autonomous systems, requiring agents to navigate complex environments to reach specified targets. Over the past decade, navigation technology has evolved from simple geometric path planning to complex multimodal reasoning [7][8].
- The article categorizes goal-oriented navigation methods by reasoning domain, revealing commonalities and differences among tasks and providing a unified framework for understanding navigation methods [4].

Group 2: Navigation Tasks
- Navigation tasks have grown in complexity, evolving from simple point navigation (PointNav) to more complex multimodal paradigms such as ObjectNav, ImageNav, and AudioGoalNav, each requiring different levels of semantic understanding and reasoning [8][12].
- Navigation is formally defined as a decision-making process in which agents must reach specified goals in unknown environments through a sequence of actions [8].

Group 3: Datasets and Evaluation
- The Habitat-Matterport 3D (HM3D) dataset is highlighted as the largest collection, encompassing 1,000 reconstructed buildings and 112.5k square meters of navigable area, with varying complexity across other datasets such as Gibson and Matterport3D [9].
- Evaluation metrics include success rate (SR), Success weighted by Path Length (SPL), and distance-based metrics, which assess the efficiency and effectiveness of navigation strategies (the SPL formula is sketched after this summary) [14].

Group 4: Methodologies
- Explicit-representation methods, such as ANM and LSP-UNet, construct and maintain environmental representations to support path planning, while implicit-representation methods, like DD-PPO and IMN-RPG, encode spatial understanding without explicit mapping [15][16].
- Object navigation is approached modularly, breaking the task into mapping, strategy, and path planning, with methods like Sem-EXP and PEANUT focusing on semantic understanding [17].

Group 5: Challenges and Future Work
- Current challenges in multimodal navigation include the effective integration of sensory modalities, transfer from simulation to real-world deployment, and the development of robust multimodal representation learning methods [31][32].
- Future work is suggested to focus on enhancing human-robot interaction, developing balanced multimodal representation learning methods, and improving the computational efficiency of navigation systems [32].
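For reference, SPL (Success weighted by Path Length) is conventionally computed as below; this is a sketch of the standard definition, not code from the survey.

```python
def spl(successes, shortest_lengths, actual_lengths):
    """Success weighted by Path Length over N episodes.

    successes:        list of 0/1 flags S_i indicating whether episode i succeeded.
    shortest_lengths: list of shortest-path lengths l_i from start to goal.
    actual_lengths:   list of path lengths p_i actually traveled by the agent.
    SPL = (1/N) * sum_i S_i * l_i / max(p_i, l_i)
    """
    n = len(successes)
    return sum(
        s * l / max(p, l)
        for s, l, p in zip(successes, shortest_lengths, actual_lengths)
    ) / n
```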
Major Release! A0: The First General-Purpose Hierarchical Robot Model Based on Spatial Affordance Perception
具身智能之心· 2025-06-25 13:52
The A0 model, released by the 无界智慧 (Spatialtemporal AI) team, is the first general-purpose hierarchical diffusion model for robots based on spatial affordance perception. Through an Embodiment-Agnostic Affordance Representation, it achieves general manipulation capability across platforms, and the model framework and code have been open-sourced.

Paper: https://arxiv.org/abs/2504.12636
Project page: https://a-embodied.github.io/A0/

Core Challenges in Robotic Manipulation
Despite rapid progress in robotics, generalizable manipulation remains the key bottleneck holding the field back. Imagine asking a robot to "wipe the whiteboard clean": it must understand precisely where to apply force ("where") and how to move the eraser ("how"). This is the core challenge facing robotic manipulation today: insufficient perception and understanding of spatial affordances.

Existing approaches fall into two categories: modular methods and end-to-end vision-language-action (VLA) large models. The former can leverage vision foundation models for spatial understanding but capture object affordances only to a limited degree; the latter can generate actions directly but lack spatial ...
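To make the "where"/"how" split described above concrete, here is a purely hypothetical two-stage interface in the spirit of an affordance-conditioned hierarchical policy; none of these class or method names come from the A0 release.

```python
from dataclasses import dataclass
import numpy as np

@dataclass
class Affordance:
    contact_point: np.ndarray  # (3,) where to act, e.g. a point on the whiteboard
    direction: np.ndarray      # (3,) rough direction of the intended motion

class HierarchicalManipulationPolicy:
    """Hypothetical two-stage policy: decide 'where' first, then 'how'."""

    def __init__(self, affordance_model, action_model):
        self.affordance_model = affordance_model  # high level: spatial affordance
        self.action_model = action_model          # low level: action generation

    def act(self, rgb, depth, instruction):
        # Stage 1 ("where"): predict an embodiment-agnostic affordance,
        # e.g. the point where force should be applied.
        affordance: Affordance = self.affordance_model(rgb, depth, instruction)

        # Stage 2 ("how"): condition a low-level generator (e.g. a diffusion
        # policy) on the affordance to produce an executable trajectory.
        trajectory = self.action_model(rgb, affordance, instruction)
        return trajectory
```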