具身智能之心
IGL-Nav: Image-Goal Navigation via Incremental 3D Gaussian Localization (ICCV'25)
具身智能之心· 2025-09-22 00:03
Author丨Wenxuan Guo et al. Editor丨视觉语言导航 More resources: join 具身智能之心知识星球, China's first full-stack embodied-intelligence learning community.

Main Contributions
- Proposes the IGL-Nav framework, which incrementally updates a 3D Gaussian Splatting (3DGS) representation to enable efficient, 3D-aware image-goal navigation, significantly outperforming existing methods.
- Designs a coarse-to-fine goal localization strategy: geometric information first drives discrete-space matching for coarse localization, after which differentiable-rendering optimization solves for the precise pose, effectively taming the large search space of 6-DoF camera pose estimation.
- IGL-Nav handles the more challenging free-view image-goal setting and can be deployed on a real robot platform, where goal images taken with a phone at arbitrary poses guide the robot's navigation.

Research Background
- Image-goal navigation requires an agent to navigate in an unknown environment to the position and orientation specified by an image, placing high demands on the agent's spatial understanding and its ability to explore a scene from past observations.
- Traditional methods either rely on end-to-end reinforcement learning or on modular policies that use topological graphs or bird's-eye-view maps as memory, but neither fully models the geometric relationship between the explored 3D environment and the goal image. Although recent methods based on renderable neural radiance maps (e.g., RN ...
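The coarse-to-fine localization strategy above can be sketched in a few lines: a discrete grid search over candidate poses seeds the estimate, then gradient descent refines it. The `loss` below is a toy quadratic surrogate (an assumption for illustration), standing in for IGL-Nav's actual matching objective through the differentiable 3DGS renderer, and the numeric gradient stands in for backpropagation.

```python
import numpy as np

# Toy stand-in for the localization objective: how well a view rendered from
# pose (x, y, yaw) matches the goal image. IGL-Nav scores this through a
# differentiable 3DGS render; a smooth quadratic surrogate is assumed here.
GOAL = np.array([2.3, -1.7, 0.4])  # hidden ground-truth goal pose

def loss(pose):
    return float(np.sum((pose - GOAL) ** 2))

# Stage 1 (coarse): discrete matching over a grid of candidate poses.
candidates = np.array([[x, y, w]
                       for x in np.linspace(-5, 5, 11)
                       for y in np.linspace(-5, 5, 11)
                       for w in np.linspace(-np.pi, np.pi, 8)])
coarse = candidates[np.argmin([loss(c) for c in candidates])]

# Stage 2 (fine): gradient descent from the coarse pose. A numeric gradient
# stands in for backprop through the differentiable renderer.
pose, lr, eps = coarse.copy(), 0.1, 1e-5
for _ in range(200):
    grad = np.array([(loss(pose + eps * e) - loss(pose - eps * e)) / (2 * eps)
                     for e in np.eye(3)])
    pose -= lr * grad
```

The two-stage split is the point: the grid search keeps the 6-DoF search space tractable, and the gradient refinement recovers precision the grid cannot.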
When Robots Learn to "Imitate" Humans: How Does RynnVLA-001 Break the Manipulation-Data Scarcity Bottleneck?
具身智能之心· 2025-09-22 00:03
Core Insights
- The article discusses the development of a new VLA model, RynnVLA-001, by Alibaba DAMO Academy, which addresses the scarcity of high-quality manipulation data in robotics by utilizing human demonstration data [1][5][35].

Group 1: Model Overview
- RynnVLA-001 leverages 12 million egocentric human manipulation videos and a two-stage pre-training strategy to teach robots human operational logic and action trajectories [1][2][5].
- The model achieves an average success rate of 90.6% across tasks, significantly outperforming existing models such as GR00T N1.5 and Pi0, which achieve lower success rates [2][15].

Group 2: Methodology
- Training consists of three core stages: egocentric video generation pre-training, human-centric trajectory-aware modeling, and robot-centric vision-language-action modeling [7][10][11].
- The introduction of ActionVAE optimizes action representation by compressing action sequences into compact latent embeddings, enhancing the model's ability to predict smooth, coherent actions [6][13][24].

Group 3: Experimental Results
- RynnVLA-001 demonstrates superior performance across multiple tasks, with success rates of 90.0% for picking and placing green blocks, 91.7% for strawberries, and 90.0% for pen placement [15][17].
- In complex scenarios with distractors, RynnVLA-001 maintains a 91.7% success rate, showcasing its robustness in instruction-following tasks [18][19].

Group 4: Pre-training Effectiveness
- Ablation studies validate the two-stage pre-training: models without video pre-training perform poorly, while those with it show significant gains in task success rates [19][20][21].
- The model's ability to predict human trajectories effectively bridges visual prediction and action generation, leading to enhanced performance [21][22].
Group 5: Limitations and Future Directions
- Testing is currently limited to the LeRobot SO100 robotic arm, indicating a need for broader applicability across robotic platforms [41].
- Future work should improve environmental generalization and explore dynamic camera viewpoints to enhance robustness [41].
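The ActionVAE idea from Group 2, compressing a chunk of low-level actions into one compact latent embedding, can be sketched as a minimal VAE forward pass. All dimensions and the random linear encoder/decoder weights below are placeholder assumptions; RynnVLA-001's actual ActionVAE is a trained network.

```python
import numpy as np

rng = np.random.default_rng(0)

# Minimal ActionVAE-style sketch: compress a chunk of T low-level actions
# (7-DoF each) into a single compact latent. Dimensions and the random linear
# weights are placeholder assumptions, not trained parameters.
T, DOF, LATENT = 8, 7, 4
W_enc = rng.normal(0.0, 0.1, (T * DOF, 2 * LATENT))  # outputs [mu, log_var]
W_dec = rng.normal(0.0, 0.1, (LATENT, T * DOF))

def encode(actions):
    h = actions.reshape(-1) @ W_enc
    mu, log_var = h[:LATENT], h[LATENT:]
    z = mu + np.exp(0.5 * log_var) * rng.normal(size=LATENT)  # reparameterization
    return z, mu, log_var

def decode(z):
    return (z @ W_dec).reshape(T, DOF)  # one latent -> a whole action chunk

chunk = rng.normal(size=(T, DOF))      # a chunk of demonstrated actions
z, mu, log_var = encode(chunk)
recon = decode(z)
kl = 0.5 * np.sum(np.exp(log_var) + mu**2 - 1.0 - log_var)  # KL(q || N(0, I))
```

Because the policy predicts one latent per chunk rather than T separate actions, decoded trajectories inherit the smoothness of the latent space, which is the benefit the article attributes to ActionVAE.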
Zuckerberg Has Poached the Head of Musk's Robot Program
具身智能之心· 2025-09-22 00:03
Core Insights
- The article discusses recent departures of key personnel from Tesla's Optimus AI team, highlighting the competitive landscape in the AI sector, particularly between Tesla and Meta [1][5][15].

Group 1: Key Personnel Departures
- Ashish Kumar, head of the Optimus AI team, has left Tesla to join Meta as a research scientist, emphasizing the importance of AI in unlocking humanoid robotics [4][6].
- Milan Kovac, another significant figure in the Optimus project, announced his departure earlier this year, having played a crucial role in developing Tesla's humanoid robot from concept to a fully functional model [10][12].

Group 2: Competitive Dynamics
- Meta's aggressive hiring strategy, led by Mark Zuckerberg, has raised questions about the financial backing for such moves, with speculation about a $1 billion budget for talent acquisition [5].
- The departures raise concerns about Tesla's ability to maintain momentum in its robotics initiatives, especially given Musk's prediction that 80% of Tesla's future value could derive from the Optimus project [14][15].

Group 3: Internal Conflicts
- Several executives at xAI, a company associated with Musk, have reportedly left due to conflicts with Musk's close advisors over management and financial concerns, suggesting potential organizational restructuring may be necessary [16][19].
PhysicalAgent: A Foundational World-Model Framework Toward General Cognitive Robots
具身智能之心· 2025-09-22 00:03
Core Viewpoint
- The article discusses PhysicalAgent, a robotic control framework designed to overcome key limitations in robot manipulation, specifically the robustness and generalizability of vision-language-action (VLA) models and world-model-based methods [2][3].

Group 1: Key Bottlenecks and Solutions
- Current VLA models require task-specific fine-tuning, so robustness drops sharply when the robot or environment changes [2].
- World-model-based methods depend on specially trained predictive models and carefully curated training data, limiting their generalizability [2].
- PhysicalAgent integrates iterative reasoning, diffusion video generation, and closed-loop execution to achieve cross-modal, cross-task manipulation [2].

Group 2: Framework Design Principles
- Perception and reasoning modules remain independent of the specific robot embodiment; only lightweight skeletal detection models are needed per robot [3].
- Video generation models, pre-trained on vast multimodal datasets, have inherent advantages and can be integrated quickly without local training [5].
- The framework mirrors human-like reasoning, generating visual representations of actions from textual instructions alone [5].
- The architecture demonstrates cross-modal adaptability, generating manipulation plans for different robot embodiments without retraining [5].

Group 3: VLM as the Cognitive Core
- A VLM serves as the cognitive core of the framework, driving a multi-step process of instruction, environment interaction, and execution [6].
- Action generation is redefined as conditional video synthesis rather than direct control-policy learning [6].
- The robot adaptation layer, which converts generated action videos into motor commands, is the only robot-specific component [6].
Group 4: Experimental Validation
- Two experiments validated the framework's cross-modal generalization and the robustness of iterative execution [8].
- The first verified performance against task-specific baselines and generalization across robot embodiments [9].
- The second assessed iterative execution on physical robots, demonstrating the effectiveness of the "Perceive→Plan→Reason→Act" pipeline [12].

Group 5: Key Results
- The framework achieved an 80% final success rate across tasks on both the bimanual UR3 and the humanoid G1 [13][16].
- First-attempt success rates were 30% for the UR3 and 20% for the G1, with an average of 2.25 and 2.75 iterations to success, respectively [16].
- Iterative correction sharply reduced the proportion of unfinished tasks within the first few iterations, significantly improving completion rates [16].
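The closed-loop pipeline and the Group 5 numbers can be connected with a small simulation, assuming a fixed ~30% per-attempt success probability (the reported UR3 first-attempt rate) and up to 10 retries. All three stage functions are stubs, not the paper's actual modules, which use a VLM for reasoning and a diffusion model to synthesize the action video.

```python
import random

random.seed(7)

# Sketch of the closed-loop "Perceive -> Plan -> Reason -> Act" retry pipeline.
def perceive():
    return {"scene": "tabletop"}            # stub for the perception module

def plan(observation, goal):
    return f"action video for: {goal}"      # stub: VLM reasons, video model imagines

def act(action_video):
    return random.random() < 0.3            # assumed ~30% per-attempt success

def run_task(goal, max_iters=10):
    for attempt in range(1, max_iters + 1):
        observation = perceive()             # re-observe after every attempt
        video = plan(observation, goal)      # re-plan from the new observation
        if act(video):                       # execute via the adaptation layer
            return attempt                   # iterations needed for success
    return None                              # task left unfinished

results = [run_task("pick up the cup") for _ in range(1000)]
solved = [r for r in results if r is not None]
success_rate = len(solved) / len(results)
avg_iters = sum(solved) / len(solved)
```

Under this independence assumption the final success rate climbs well above 90% and the mean iteration count lands around 3, in the same ballpark as the paper's 2.25–2.75; the real 80% final rate is lower because failures on hard instances are correlated rather than independent draws.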
Recruiting Partners! Come Run This Embodied-Intelligence Community with Us~
具身智能之心· 2025-09-21 10:00
Core Viewpoint
- The article emphasizes collaboration in embodied intelligence, aiming to build a platform that adds real value to the industry rather than serving merely as a media outlet [1].

Group 1: Course Development
- The company invites partners to develop courses that benefit beginners and advance the industry, covering both consumer and enterprise training as well as academic curricula [2][3].

Group 2: Hardware Development
- The goal is an affordable, user-friendly research platform for embodied intelligence, accessible to developers and easy for beginners [4].

Group 3: Open Source Projects
- The company seeks to build globally influential open-source projects with collaborators in the field [5][6].

Group 4: Consulting Services
- Partners are invited to provide B2B and B2C consulting on embodied data, robot hardware, algorithms, and deployment to support industry upgrades and talent development [7][8].

Group 5: Job Opportunities
- The company is looking for individuals with engineering experience in the field or a PhD and above, offering competitive compensation and access to industry resources for both full-time and part-time roles [9][10].
Lingyu Intelligent's TeleAvatar Teleoperation Robot Begins Delivery!
具身智能之心· 2025-09-21 04:01
Core Insights
- Lingyu Intelligent has reached a significant milestone in AI and robotics with the delivery of its first TeleAvatar robot, marking a breakthrough in product commercialization and market expansion [2][3].

Group 1: Product Delivery
- The first TeleAvatar robot (unit 001) was officially delivered to the Xiganghu Robotics Institute, with multiple strategic clients set to receive their first batches in the coming weeks [2][3].
- The delivery event was attended by key representatives, including Dr. Xu Qinhua, technical director of the Xiganghu Robotics Institute, underscoring the importance of this collaboration [3].

Group 2: Technological Innovation
- TeleAvatar combines high-precision motion control, multi-modal perception fusion, and low-latency teleoperation [5][6].
- Operating precision reaches the sub-millimeter level, with end-to-end operation latency under 30 milliseconds for real-time responsiveness [6].

Group 3: Market Potential
- Application areas include scientific data collection, smart manufacturing, medical services, research exploration, and emergency response, providing robust technical support for industry transformation [7].
- Priced from 79,900 yuan, TeleAvatar is positioned as a cost-effective solution in the market [6].

Group 4: Company Background
- Lingyu Intelligent was founded by a top team from Tsinghua University's Department of Automation, focusing on robot planning, control, and human-machine interaction [10].
- Its mission is to set practical benchmarks for embodied intelligence, freeing humans from dangerous, heavy, and tedious tasks through a full-stack self-developed solution spanning hardware, software, and data platforms [10].
Large-Model Foundations for Embodied AI: It's All Here......
具身智能之心· 2025-09-20 16:03
Core Viewpoint
- The article emphasizes the importance of a comprehensive community for learning and sharing knowledge about large models, particularly in embodied AI and autonomous driving, highlighting the establishment of the "Large Model Heart Tech Knowledge Planet" as a platform for collaboration and technical exchange [1][3].

Group 1: Community and Learning Resources
- The "Large Model Heart Tech" community provides a platform for technical exchange on large models, inviting experts from renowned universities and leading companies in the field [3][67].
- It offers a detailed learning roadmap covering RAG, AI Agents, and multimodal models, suitable for beginners and advanced learners alike [4][43].
- Members can access academic progress, industrial applications, job recommendations, and networking opportunities with industry leaders [7][70].

Group 2: Technical Roadmaps
- The community outlines learning paths for RAG, AI Agents, and multimodal large models, detailing subfields and applications for systematic study [9][43].
- For RAG, resources cover subfields such as Graph RAG, Knowledge-Oriented RAG, and applications in AIGC [10][23].
- The AI Agent section includes comprehensive introductions, evaluations, and advances in areas such as multi-agent systems and self-evolving agents [25][39].

Group 3: Future Plans and Engagement
- The community plans live sessions with industry experts, letting members engage with leading figures in academia and industry [66].
- Job sharing and recruitment information will empower members in their career pursuits within the large-model domain [70].
PhysicalAgent: A Foundational World-Model Framework Toward General Cognitive Robots
具身智能之心· 2025-09-20 16:03
Core Viewpoint
- The article discusses a new robotic control framework, PhysicalAgent, which aims to overcome existing limitations in robot manipulation by integrating iterative reasoning, diffusion video generation, and closed-loop execution [2][4].

Group 1: Key Challenges in Robotics
- Current mainstream vision-language-action (VLA) models require task-specific fine-tuning, so robustness drops sharply when the robot or environment changes [2].
- World-model-based methods depend on specially trained predictive models and carefully curated training data, limiting their generalizability [2].

Group 2: Framework Design and Principles
- PhysicalAgent separates perception and reasoning from the specific robot embodiment, requiring only lightweight skeletal detection models per robot, which minimizes computational cost and data requirements [4].
- It leverages pre-trained video generation models that understand physical processes and object interactions, enabling quick integration without local training [4].
- It aligns with human-like reasoning by generating visual representations of actions from textual instructions, facilitating intuitive robot control [4].

Group 3: The VLM's Grounded Reasoning Role
- The VLM serves as the cognitive core, achieving grounding through multiple calls in an "instruction-environment-execution" loop rather than a single planning step [6].
- Action generation is innovatively recast as conditional video synthesis, moving away from traditional direct control-policy learning [6].

Group 4: Execution Process and Adaptation
- The robot adaptation layer translates generated action videos into motor commands and is the only part requiring robot-specific adaptation [6].
- The process includes task decomposition, contextual scene description, execution monitoring, and model independence, allowing flexibility in model selection [6].

Group 5: Experimental Validation
- Experiments validate cross-embodiment and perception-modality generalization, as well as the robustness of iterative execution [8].
- The first experiment showed the framework significantly outperforming task-specific baselines in success rate across robotic platforms [12].
- The second confirmed the robustness of the iterative "Perceive→Plan→Reason→Act" pipeline, achieving an 80% success rate on physical robots [13].
Latest Valuations/Market Caps of Leading Embodied-Intelligence Humanoid Robotics Companies
具身智能之心· 2025-09-20 06:12
Editor丨具身智能之心

Latest valuations or market capitalizations of leading embodied-intelligence humanoid robotics companies. Apart from publicly listed companies, all figures reflect completed or currently closing deals; valuations without an actual closing or confirmed transaction are excluded. Figures are in RMB. Note that these companies vary widely in founding date and funding stage, so a higher valuation does not simply equate to stronger technology or commercialization. The numbers below are for reference only; corrections and additions are welcome via comments.

- Figure AI: 273.6B
- 优必选 (UBTech): 55.5B
- Skild AI: 32.4B
- Physical Intelligence: 17B
- 宇树科技 (Unitree): 16B
- 智元机器人 (AgiBot): 15B
- Apptronik: 14.4B
- Field AI: 14.4B
- Agility Robotics: 12.6B
- 云深处机器人 (Deep Robotics): 8B
- 傅利叶机器人 (Fourier): 8B
- 乐聚机器人 (Leju Robotics): 8B
- World Labs: 7B
- Sanctuary AI: 7B
- Boston Dynamics: 7B
- 银河通用 (Galbot): 7B
- 星海图 (Galaxea): 7B
- 自变量: 6B ...
NVIDIA Invests $5 Billion in Intel and Will Release Combined CPU+GPU Chips: Is This the Endgame?
具身智能之心· 2025-09-19 16:04
Editor丨机器之心

The two companies jointly announced plans to combine the PC's CPU and GPU into a super SoC.

On Thursday evening, news that NVIDIA would acquire a $5 billion stake in Intel set the tech world abuzz.

On September 18, both companies simultaneously announced a long-term strategic partnership: NVIDIA will invest $5 billion in Intel common stock, and under the new collaboration the two will jointly develop multiple generations of custom data-center and PC products.

Concretely, the companies will focus on seamlessly connecting the NVIDIA and Intel architectures via NVIDIA NVLink, combining NVIDIA's strengths in AI and accelerated computing with Intel's leading CPUs and x86 ecosystem to deliver top-tier solutions to customers.

For data centers, Intel will build NVIDIA-custom x86 CPUs, which NVIDIA will integrate into its AI infrastructure platforms and bring to market.

In personal computing, Intel will build and market x86 system-on-chips (SoCs) that integrate RTX GPU chiplets. These new x86 RTX SoCs will power a wide range of ...