How Is It Done? 20 Minutes of Real-Robot Data Enables Cross-Embodiment Generalization of Dual-Arm Tasks
具身智能之心· 2025-08-11 00:14
Core Insights
- Vidar represents a significant breakthrough in embodied intelligence: it is the first model worldwide to transfer the understanding capabilities of general video models into a physical decision-making system [2]
- The model innovatively builds a multi-view video prediction framework that supports collaborative dual-arm robot tasks, achieving state-of-the-art performance while showing strong few-shot learning advantages [2]
- Only 20 minutes of real-robot data are needed to generalize quickly to a new robot embodiment, sharply reducing data requirements compared with industry-leading models [2][6]

Group 1
- Vidar is built on a general video model and achieves a systematic transfer of video understanding capabilities [2]
- Its data requirement is roughly one eighth that of the leading RDT model and one twelve-hundredth that of π0.5, greatly lowering the barrier to large-scale generalization in robotics [2]
- After fine-tuning, the model performs multi-view dual-arm tasks effectively, executing commands as instructed [2]

Group 2
- The Tsinghua University team proposed a new paradigm for embodied intelligence that decomposes tasks into "prediction + execution" [6]
- The approach uses a visual generative model (Vidar) to learn goal prediction from vast amounts of internet video, while a task-agnostic inverse dynamics model (AnyPos) handles action execution [6]
- This significantly reduces dependence on large-scale paired action-instruction data, requiring only 20 minutes of task data to achieve strong generalization [6]

Group 3
- The presentation includes an overview and a demonstration video, discussing the rationale for using video modalities and the idea of embodied video base models [8]
- It covers the training of Vidar and the concept of task-agnostic actions with AnyPos [8]
- The speaker, Hengkai Tan, is a PhD student at Tsinghua University focusing on the integration of embodied large models and multi-modal large models [11]
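The "prediction + execution" decomposition described in Group 2 can be sketched as two decoupled modules: a video predictor that imagines future observations from an instruction, and a task-agnostic inverse dynamics model that recovers the action connecting each pair of consecutive frames. The following is a minimal illustrative sketch; all function names, shapes, and placeholder logic are assumptions for illustration, not the actual Vidar or AnyPos interfaces.

```python
import numpy as np

def predict_future_frames(instruction, current_frames, horizon=8):
    """Stand-in for a video prediction model (Vidar's role): given an
    instruction and recent observations, imagine future frames.
    Here we simply repeat the last frame; a real model would generate
    instruction-conditioned future observations."""
    return [current_frames[-1].copy() for _ in range(horizon)]

def inverse_dynamics(frame_t, frame_t1):
    """Stand-in for a task-agnostic inverse dynamics model (AnyPos's role):
    map two consecutive frames to the action connecting them.
    A real model regresses joint/EE commands; we return a zero action."""
    return np.zeros(14)  # e.g. 7 DoF per arm for a dual-arm robot

def act(instruction, current_frames):
    """Decompose control into prediction + execution: first imagine the
    future, then recover one action per predicted transition. No
    task-specific action labels are needed at this stage."""
    goals = predict_future_frames(instruction, current_frames)
    frames = [current_frames[-1]] + goals
    return [inverse_dynamics(a, b) for a, b in zip(frames, frames[1:])]

obs = [np.zeros((64, 64, 3))]  # dummy camera frame
actions = act("place the cup on the shelf", obs)
print(len(actions), actions[0].shape)
```

The design point this sketch captures is why the paradigm is data-efficient: the predictor learns from action-free internet video, so only the small inverse dynamics module ever needs robot action data.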
Recommending a Few Hidden Gems for Embodied Intelligence and Robotics!
具身智能之心· 2025-08-10 06:54
Core Viewpoint
- The embodied intelligence and autonomous driving industries are experiencing significant growth in production, financing, and recruitment, with a strong emphasis on practical technology and skilled talent acquisition [1][2]

Group 1: Industry Trends
- The autonomous driving sector is seeing a surge of companies scaling up production and hiring, signaling a competitive job market where positions are hard to secure due to high skill requirements [1]
- The emergence of high-level autonomous driving demonstration zones, such as in Beijing, is fostering innovation in policy, technology, and commercialization [1]

Group 2: Learning and Community Resources
- Several influential communities focused on embodied intelligence, autonomous driving, computer vision, and AI are recommended for systematic learning and skill building [1]
- The "Autonomous Driving Heart" community is the largest developer community in China focused on the technical aspects of autonomous driving, attracting significant attention from industry professionals [2]
- The "Computer Vision Research Institute" shares the latest research and practical applications in AI, emphasizing technology research and implementation [5]
- The "Embodied Intelligence Heart" community is the first full-stack technical exchange platform of its kind in China, covering a wide range of topics related to embodied intelligence [8]
Astribot Suite: A Full-Body Manipulation Framework for Diverse Real-World Environments
具身智能之心· 2025-08-09 00:48
Core Viewpoint
- The article presents Astribot Suite, a comprehensive robot learning suite aimed at enabling robots to perform a wide range of daily tasks through human-like interaction with, and learning from, the environment [3][4]

Group 1: Challenges in Robotic Control
- Achieving full-body autonomous control faces three main challenges: designing safe and capable hardware, building intuitive data-collection systems, and creating efficient algorithms that learn from human demonstrations [6]
- A unified framework is proposed to address these challenges, consisting of a high-performance robot platform, a full-body teleoperation system, and a full-body visuomotor policy [6]

Group 2: High-Performance Robot Platform
- The platform is designed to be high-performance, durable, and capable of safe mobile manipulation, using an innovative rope-driven design that mimics human muscle for precise motion and force application [7]
- Its lightweight structure, low-friction transmission, and soft cushioning enable the high-resolution force control essential for AI-driven tasks [7]

Group 3: Full-Body Teleoperation
- An intuitive, cost-effective teleoperation system, consisting of a VR headset and handheld controllers, lets non-experts collect data efficiently across a variety of tasks [9]
- The system supports first-person and third-person control modes, optimized for different task types, with low transmission latency [9]

Group 4: Full-Body Motion Operation Model (DuoCore-WB)
- DuoCore-WB is a simple yet effective imitation learning algorithm for full-body actions, emphasizing RGB-based visual perception and real-time trajectory generation [10][12]
- The model achieves an average success rate of 80% across tasks, with a peak of 100%, indicating its effectiveness in real-world applications [12]

Group 5: Evaluation of Astribot Suite
- Astribot Suite was evaluated on six representative real-world tasks (delivering drinks, storing cat food, throwing away trash, organizing shoes, throwing toys, and picking up toys), showcasing complex coordination and dynamic stability [12][23]
- Success rates varied across tasks, with detailed per-subtask metrics highlighting the system's robustness and adaptability [23]

Group 6: Key Findings on Motion Representation
- End-effector (EE) space action representation reduces error accumulation and improves task performance compared with joint-space representation [25]
- Incremental action representation improves trajectory smoothness and execution stability, especially in high-frequency control scenarios [25]
- Relative trajectory representation in the end-effector's own coordinate frame improves visual-action alignment and generalization [28]
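Group 6 contrasts absolute joint-space targets with incremental end-effector-space actions. A minimal sketch of the incremental idea, assuming a policy that emits small EE displacements at each control step (illustrative only, not the DuoCore-WB implementation):

```python
import numpy as np

def apply_incremental_actions(ee_start, deltas):
    """Accumulate incremental end-effector (EE) position deltas into
    absolute target poses: the policy predicts small displacements
    relative to the current EE pose rather than absolute joint
    configurations, which keeps each step smooth and bounded."""
    poses = [np.asarray(ee_start, dtype=float)]
    for d in deltas:
        poses.append(poses[-1] + np.asarray(d, dtype=float))
    return poses[1:]  # the executed trajectory, one target per step

# A high-frequency controller consuming 1 cm steps along x:
start = [0.3, 0.0, 0.5]            # hypothetical EE position (m)
deltas = [[0.01, 0.0, 0.0]] * 5    # five incremental actions
targets = apply_incremental_actions(start, deltas)
print(np.round(targets[-1], 3))
```

The design intuition from Group 6: because each command is a small delta in the EE frame, errors do not compound into large absolute jumps, and the trajectory stays smooth under high-frequency control.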
AI Glasses That "Grab Objects from Afar": Put Them On and Select Any Object in the Real World at Will
具身智能之心· 2025-08-09 00:48
Core Viewpoint
- The article introduces Reality Proxy, a technology that enhances human-computer interaction by letting users seamlessly select and manipulate real-world objects through a mixed reality interface, overcoming limitations of traditional XR devices [10][13][14]

Group 1: Technology Overview
- A Reality Proxy is a digital representation of a real-world object that users can interact with, unhindered by physical constraints such as distance or size [14][16]
- The interaction process involves three main steps: activating the proxy, generating the proxy, and interacting with the proxy [17][19][24]
- The system captures the semantic structure of the environment and creates proxies that preserve spatial relationships, enabling intuitive manipulation [20][22]

Group 2: Interaction Features
- Users can browse object previews, select multiple objects, filter objects by attribute, and exploit physical features for interaction [30][31][32][34]
- The technology supports semantic and custom grouping, letting users organize and manipulate objects efficiently [36][40]

Group 3: Practical Applications
- Reality Proxy can be applied in scenarios such as quickly locating specific books in an office or interacting with kitchen appliances [41][43]
- It enables efficient navigation and interaction in large buildings and dynamic control of real-world objects such as drones [45][47]

Group 4: User Feedback and Evaluation
- Study participants found Reality Proxy practical and effective for interacting with distant or hard-to-reach objects [53]
- The system was praised for its speed and reduced physical fatigue, though some users noted a learning curve and the need for more accurate proxy positioning [54][55]
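The proxy pipeline described above (capture semantic structure, generate proxies that keep spatial relationships, then interact, e.g. filter by attribute) can be illustrated with a toy data structure. Everything here is a hypothetical sketch; the class and function names are not the actual Reality Proxy API.

```python
from dataclasses import dataclass

@dataclass
class Proxy:
    """A digital stand-in for a real-world object, keeping its semantic
    attributes and spatial position so it can be manipulated remotely."""
    name: str
    attributes: dict
    position: tuple  # (x, y, z) in the room, preserved in the proxy layout

def generate_proxies(scene):
    """Generation step: turn the captured semantic scene description
    into a set of proxies."""
    return [Proxy(o["name"], o.get("attrs", {}), o["pos"]) for o in scene]

def filter_by_attribute(proxies, key, value):
    """One interaction step: select proxies by attribute,
    e.g. every red object regardless of where it sits in the room."""
    return [p for p in proxies if p.attributes.get(key) == value]

scene = [
    {"name": "book_a", "attrs": {"color": "red"}, "pos": (1.0, 2.0, 1.2)},
    {"name": "book_b", "attrs": {"color": "blue"}, "pos": (1.1, 2.0, 1.2)},
    {"name": "kettle", "attrs": {"color": "red"}, "pos": (4.0, 0.5, 0.9)},
]
proxies = generate_proxies(scene)
red = filter_by_attribute(proxies, "color", "red")
print([p.name for p in red])
```

The point of the sketch is the decoupling the article describes: once objects exist as proxies, selection operates on attributes and groups rather than on physical reach or distance.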
具身智能之心 Is Recruiting Operations Interns! 1-on-1 Mentorship from a Partner
具身智能之心· 2025-08-09 00:48
Group 1
- The company aims to connect academia and industry through technical content, focusing on cutting-edge AI fields such as autonomous driving, embodied intelligence, and large models [1]
- The team has established deep collaborations with mainstream companies and relevant universities in autonomous driving and embodied intelligence, while rapidly building partnerships in the large model sector [1]
- The company provides a variety of content, including academic paper interpretation, industry production solutions, large model evaluation, business news, industry recruitment, and open-source projects [1]

Group 2
- The company is looking for interns to assist with academic paper selection, interpretation, and summarization in large models, autonomous driving, and embodied intelligence [3]
- Interns are expected to have a strong passion for researching and sharing technological advances and events [3]
- The internship offers salary, one-on-one mentorship, industry resource recommendations, and internal job referrals [5]
Nearly 2,000 Members: What Does This "Whampoa Academy" of the Embodied Intelligence Field Have to Offer?
具身智能之心· 2025-08-08 16:02
Core Viewpoint
- The article emphasizes the value of a community that provides solutions to problems in embodied intelligence, facilitating knowledge sharing and job opportunities across robotics- and AI-related sectors [3][17]

Group 1: Community and Resources
- The community has closed the loop across industry, academia, job seeking, and Q&A exchange, providing timely solutions and research insights [3][5]
- It offers a comprehensive collection of over 30 technical routes, benchmarks, and learning paths to help members quickly find relevant information [5][12]
- Industry experts are invited to answer questions and share insights through roundtable forums and live broadcasts, covering topics from data to algorithms [5][18]

Group 2: Job Opportunities and Networking
- A job-referral mechanism with multiple leading embodied intelligence companies connects job seekers directly with employers [11][18]
- Members can share resumes and receive job recommendations in real time, improving their chances of finding suitable positions [11][18]

Group 3: Educational Support
- For beginners, the community provides structured technical stacks and learning paths to ease entry into the field [12][14]
- For those already engaged in research, valuable industry frameworks and project proposals support their work [14][18]

Group 4: Research and Development
- The community has compiled extensive resources, including open-source projects, datasets, and research reports on embodied intelligence, aiding the development and application of new technologies [17][24][31]
- It covers various research directions and provides insights into the latest advances, helping members stay current on industry trends [21][24][37]
The NavA³ Framework: Understand Any Instruction, Navigate Anywhere, Find Any Target (Tsinghua University)
具身智能之心· 2025-08-08 00:08
Core Insights
- The article introduces embodied navigation and highlights the gap between current research and the complex, open-ended navigation tasks humans perform in real environments [3][4]
- A new long-range navigation task is proposed, requiring agents to understand high-level human instructions and navigate in real-world settings, leading to the hierarchical framework NavA³ [4][6]

Research Background and Motivation
- Embodied navigation is essential for agents to move and interact within physical environments, but existing studies focus on predefined object navigation or instruction following, which falls short of the nuanced demands of human navigation [3]

Key Contributions
- A challenging long-range navigation task is introduced, requiring agents to comprehend high-level human instructions and locate objects with complex spatial relationships in indoor environments [6]
- The NavA³ framework combines global and local strategies for understanding diverse high-level instructions, cross-region navigation, and object localization [11]
- A dataset of 1 million spatial-perception object-affordance samples is constructed to train the NaviAfford model, enabling it to understand complex spatial relationships and point at objects precisely [11]

Methodology Framework: NavA³
- NavA³ employs a "global to local" hierarchical strategy, integrating semantic reasoning with precise spatial localization to tackle long-range navigation tasks [9]
- The global strategy parses instructions and determines target areas using a Reasoning-VLM model, translating high-level human instructions into executable navigation goals [12]
- The local strategy explores within the target area and localizes objects precisely, using the NaviAfford model trained on the spatial-perception dataset [17]

Experimental Validation
- Experiments across five scenarios with 50 tasks evaluated performance via navigation error (NE) and success rate (SR), with NavA³ outperforming existing methods [22]
- NavA³ achieved an average success rate of 66.4%, far above the best baseline method, MapNav, at 25.2% [23]

Ablation Studies
- Annotations had a significant impact: complete annotations improved success rates in specific areas by 28.0% and 36.0% [26]
- The Reasoning-VLM model showed a substantial gain in average success rate when using advanced reasoning capabilities compared with open-source models [27]

Qualitative Analysis
- NavA³ effectively understands spatial relationships, navigates from complex instructions, and adapts across different robot platforms [34]
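The "global to local" hierarchy described above can be sketched as two chained policies: a global stage that maps a high-level instruction to a target region (Reasoning-VLM's role) and a local stage that localizes the object within that region (NaviAfford's role). Both stages below are mocked with trivial logic; the function names and keyword matching are hypothetical illustrations, not the paper's models.

```python
def global_policy(instruction, regions):
    """Global stage (Reasoning-VLM's role, mocked): map a high-level
    instruction to the region most likely to contain the target.
    A toy keyword match stands in for the reasoning VLM."""
    for region in regions:
        if region in instruction:
            return region
    return regions[0]  # fall back to the first region

def local_policy(region, object_map, target):
    """Local stage (NaviAfford's role, mocked): search within the chosen
    region and return the target object's location, or None."""
    return object_map.get(region, {}).get(target)

def nav_a3(instruction, target, regions, object_map):
    """Hierarchical 'global to local' navigation: first pick a region,
    then localize the object inside it."""
    region = global_policy(instruction, regions)
    return region, local_policy(region, object_map, target)

regions = ["kitchen", "office", "meeting room"]
object_map = {"office": {"whiteboard marker": (3.2, 1.1)}}
print(nav_a3("go to the office and find the marker", "whiteboard marker",
             regions, object_map))
```

The hierarchy is what makes long-range tasks tractable: the global stage never needs metric precision, and the local stage never needs to reason over the whole building.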
A 10,000-Word Deep Dive into the "Growth History" of Embodied Intelligence: Which Mountains and Seas Has It Crossed, and Where Is It Headed Next?
具身智能之心· 2025-08-08 00:08
This article is based on the "智启具身论坛" (Intelligence-Inspired Embodiment Forum) hosted by 智元机器人 (AgiBot) on July 27, 2025, under the theme "New Opportunities for Embodied Intelligence from a Global Perspective." As one of the key summit forums of the 2025 World Artificial Intelligence Conference (WAIC 2025), it brought together the "strongest minds" in embodied intelligence worldwide, from PI, Intrinsic, Tsinghua University, Sanctuary AI, NVIDIA, Amazon, and others, focusing on key directions such as generalization of robot foundation models and high-performance manipulation.

Forum speakers:
- 罗剑岚: Chief Scientist at 智元机器人 / Director of its Embodied Research Center / Associate Professor at Shanghai Innovation Institute (上海创智学院)
- Sergey Levine: Co-founder of Physical Intelligence (PI) / Associate Professor at UC Berkeley
- Stefan Schaal: Head of Science and AI at Intrinsic (Alphabet)
- 苏航: Associate Researcher, Department of Computer Science, Tsinghua University / Editorial Board Member, IEEE TPAMI
- 陈曦: Head of Applied Science, Frontier AI and Robotics at Amazon
- 姚卯青: Partner at 智元机器人 / President of the Embodied Business Unit

Foreword

We are living in an exciting era in which robotics is flourishing and public attention to the field has reached unprecedented heights. Alongside these advances, AI practitioners ...
This 2,000-Member Embodied Intelligence Community Has Helped Solve All Kinds of Hard Problems!
具身智能之心· 2025-08-08 00:08
Questions like these have come up many times in our embodied intelligence community: how do you use the equipment? How do you collect data effectively? How do you deploy VA and VLA models? Is the capture background too complex, or is the data too dirty? We quickly gave this member the relevant answers, and they were put to use in the project right away.

A community that can solve problems at the moment people most need help is clearly valuable. The 具身智能之心 Knowledge Planet (the first full-stack embodied intelligence technical community in China) has already closed the loop across industry, academia, job seeking, and Q&A exchange. Whatever problem comes up, a solution gets shared; wherever research is most cutting-edge, we keep supplying ideas; and job openings are passed along to members first. Beyond the questions above, we have also organized plenty of other content:

- Which platforms exist for robot simulation and data collection?
- How do humanoid robots do imitation learning? Why is VLA hard to do?
- How is VLA used in robot grasping and planning tasks?
- How is VLA+RL done, and why does it work?
- What if sim2real performs poorly? How does real2sim2real work?
- How is hierarchical decision-making usually done, and what are its pros and cons versus end-to-end?
- Which research reports cover embodied robots? A roundup of 30
- Job openings shared from several leading embodied robotics companies
- In embodied intelligence, how do you choose a research direction? Which direction yields results most easily?
- ......

Even better: inside the community we have organized nearly 30+ technical ...
具身智能之心 Is Recruiting an Operations Intern! 1-on-1 Mentorship from a Partner (Only One Opening)
具身智能之心· 2025-08-07 12:00
Hello everyone, we are the 自动驾驶之心 / 具身智能 / 大模型之心 Tech team. We're delighted to meet you here. If you also believe that technical content can change the world, you may be exactly who we're looking for!

Requirements:
1. Research background related to autonomous driving, large models, or embodied intelligence; bachelor's degree or above, master's preferred;
2. Strong enthusiasm for researching and sharing cutting-edge technical progress and events;
3. Strong execution, efficiency awareness, and communication skills;
4. Solid writing skills, with clear logic and fluent expression;
5. Strong learning ability and knowledge-organization skills;
6. Bonus points:
- Technical background: able to independently interpret academic papers, run and deploy open-source projects, and write code demos;
- Product background: able to deeply experience and deconstruct AI products and distill their core value;
- Operations background: has independently run an original tech self-media account.

What do we do?

We hope to connect academia and industry through technical content, serving as a bridge between companies and universities, and reaching hundreds of thousands of AI developers and entrepreneurs. We are committed to bringing you the latest and most authoritative technical information across the web. The team focuses on frontier AI fields such as autonomous driving, embodied intelligence, and large models, covering academic paper interpretation, industry mass-production solution analysis, large model evaluation, business news, industry recruitment, and open-source projects, and shares content, engages with followers, and connects with companies via our official account, communities, video channel, Zhihu, Xiaohongshu, Bilibili, and other platforms.

At present, in the two directions of autonomous driving and embodied intelligence, we have already ...