具身智能之心
VLA and VLN Technical Exchange Groups Are Here!
具身智能之心· 2025-08-06 08:30
Group 1
- The company has established multiple VLA- and VLN-related communities to facilitate discussion of developments in academia, industry, and product implementations [1]
- Anyone interested in VLA/VLN is encouraged to join the community by adding the designated assistant on WeChat [2]
具身智能之心 Is Recruiting Research Mentors! Calling All Academic Experts~
具身智能之心· 2025-08-06 08:30
具身智能之心 is recruiting research mentors! If you work in embodied intelligence and hold multiple top-conference or top-journal publications, we welcome you to help drive the field forward.

Directions (including but not limited to): VLA, VLN, teleoperation, Diffusion Policy, reinforcement learning, VLA+RL, sim2real, multimodal large models, simulation, motion control, and object navigation.

Requirements: PhD or above (current students included), at least 2 papers at A-tier conferences or in Q1 journals/conferences; mentoring experience preferred.

Compensation: includes but is not limited to industry resource sharing, paper co-authorship, and cash incentives. For details, consult the assistant on WeChat: oooops-life. ...
ICCV 2025 Embodied Workshop & Challenge Call for Papers! Multiple Directions in Human-Robot Scene Interaction and Collaboration
具身智能之心· 2025-08-06 03:37
The ICCV 2025 "Human-Robot Scene Interaction and Collaboration" workshop & challenge will be held on the afternoon of October 20, 2025 (UTC-10) in Honolulu, Hawaii.

Workshop Overview

Intelligent robots are entering our everyday environments at an unprecedented pace, from homes and hospitals to factories and schools, and they will gradually become partners and assistants to humans. How can these embodied agents collaborate with humans more safely, more intelligently, and more naturally, while adapting to constantly changing environments? This is the core question the workshop hopes to explore with you.

The workshop focuses on the following frontier directions, and your participation is warmly invited:
✨ Knowledge transfer innovation: transferring knowledge from human-human and human-scene interaction and collaboration to inform the development of humanoid and other embodied agents (e.g., via retargeting).
✨ Visual representation breakthroughs: exploring methods for extracting visual representations that capture object attributes, dynamics, and affordances relevant to human-robot collaboration.
✨ Intention prediction: studying methods for modeling and predicting human intent so robots can anticipate actions and respond safely.
✨ Scene integration practice: integrating robots into interactive environments to enable seamless and effective teamwork.
✨ Evaluation system construction: establishing meaningful benchmarks and metrics to measure hu ...
Embodied Intelligence Data Collection: An Overview of Full-Body Motion Capture Work
具身智能之心· 2025-08-06 00:19
Core Viewpoint
- The article surveys several advanced full-body motion capture solutions, highlighting their technical complexities and potential applications in robotics and teleoperation [1][3]

Group 1: OpenWBC Project
- OpenWBC enables full-body control of the Unitree G1 robot, using Apple Vision Pro for upper-body teleoperation and the OpenHomie algorithm for lower-body movement, and supports full-body data collection [3][5]

Group 2: TWIST Project
- TWIST, developed by Stanford University, represents a significant advance in humanoid robot teleoperation, using full-body motion imitation to achieve coordinated actions through a single neural network controller [6][11]

Group 3: AMO Project
- The Adaptive Motion Optimization (AMO) framework from UC San Diego combines reinforcement learning and trajectory optimization for real-time adaptive full-body control, demonstrating superior stability and an expanded workspace [9][11]

Group 4: R²S² Framework
- The Real-world-Ready Skill Space (R²S²) framework builds a skill library for humanoid robots, targeting optimal performance and robust transfer from simulation to real-world tasks [16]

Group 5: CLONE Project
- The CLONE system from Beijing Institute of Technology addresses the challenges of humanoid robot teleoperation with a closed-loop correction system, achieving high-fidelity full-body operation with minimal drift [20]

Group 6: Community and Resources
- The article promotes the "Embodied Intelligence Knowledge Planet," a community for sharing cutting-edge academic content, open-source projects, and job opportunities in embodied intelligence [23][25][32]
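The closed-loop correction idea behind systems like CLONE can be illustrated with a minimal sketch: each control step, the operator's tracked root pose is compared against the robot's estimated root pose, and the residual is fed back into the motion target so drift cannot accumulate open-loop. All names and values below (`TeleopCorrector`, the gain) are illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np

class TeleopCorrector:
    """Minimal closed-loop drift correction for teleoperation (illustrative sketch)."""

    def __init__(self, gain: float = 0.2):
        self.gain = gain  # feedback gain; 1.0 would apply full correction per step

    def corrected_target(
        self,
        operator_root: np.ndarray,   # (3,) operator root position from mocap
        robot_root_est: np.ndarray,  # (3,) robot root position from state estimation
        raw_target: np.ndarray,      # (3,) retargeted position command
    ) -> np.ndarray:
        # Residual between where the operator is and where the robot thinks it is;
        # feeding a fraction of it back each step makes the tracking error decay.
        drift = operator_root - robot_root_est
        return raw_target + self.gain * drift

# Usage: a 0.2 m drift along x is partially corrected this step.
corrector = TeleopCorrector(gain=0.2)
target = corrector.corrected_target(
    operator_root=np.array([1.0, 0.0, 0.9]),
    robot_root_est=np.array([0.8, 0.1, 0.9]),
    raw_target=np.array([1.0, 0.0, 0.9]),
)
print(target)  # [1.04, -0.02, 0.9]
```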
Harbin Institute of Technology Proposes UAV-ON: An Open-World Object Navigation Benchmark for Aerial Agents
具身智能之心· 2025-08-05 00:03
Research Background and Motivation
- Drones are increasingly applied in fields such as cargo transport, emergency rescue, and environmental monitoring, necessitating autonomous navigation in complex, dynamic environments [2]
- Existing research relies primarily on vision-language navigation (VLN) methods, which require detailed step-by-step language instructions, limiting scalability and autonomy in open-world scenarios [2]
- Object navigation (ObjectNav) is proposed as an alternative that relies on semantic cues for target localization rather than dense instruction sequences, yet it remains underexplored in large-scale, unstructured outdoor environments [2]

UAV-ON Benchmark Overview
- UAV-ON is the first large-scale benchmark for instance-level object navigation by drones in open-world settings, featuring diverse environments built on Unreal Engine that cover urban, forest, mountainous, and aquatic scenes, with a total area of approximately 9 million square units [4]
- The benchmark defines 1,270 annotated targets, each associated with an instance-level semantic instruction, introducing real-world ambiguity and reasoning challenges for drones [4]

Task Setup
- In each episode the drone is randomly placed in the environment and navigates using only RGB-D sensor data, performing autonomous obstacle avoidance and path planning without global maps or external information [6]
- An episode terminates when the drone issues a stop command, collides with an obstacle, or reaches the 150-step limit; success is defined as stopping within 20 units of the target [6]

Sensor and Action Space
- The drone carries four synchronized RGB-D cameras capturing images from different orientations, relying entirely on first-person perception and memory for navigation [9]
- The action space consists of parameterized continuous actions for translation and rotation that must be physically executed, which improves realism over existing benchmarks [9]

Dataset and Evaluation Metrics
- The training set comprises 10 environments and 10,000 navigation episodes; the test set includes 1,000 episodes across both familiar and unseen environments to assess generalization [10]
- Evaluation metrics include success rate (SR), oracle success rate (OSR), distance to success (DTS), and success-weighted path length (SPL) [10]

Baseline Methods and Experimental Results
- Four baseline methods were implemented to compare strategies, including a random strategy, CLIP-based heuristic exploration, and aerial object navigation agents (AOA) [11][13]
- Results show significant performance differences among methods: AOA-V achieves the highest OSR (26.30%) but low SR (4.20%) and SPL (0.87%), highlighting the difficulty of simultaneous semantic understanding and motion planning [14][16]
- Collision rates exceeded 30% for all methods, indicating deficiencies in obstacle avoidance and robust control [15]

Conclusion
- The UAV-ON benchmark provides a comprehensive framework for advancing drone navigation research in open-world environments, addressing the limitations of existing methods and paving the way for future work in autonomous navigation [2][4][10]
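For concreteness, the two headline metrics can be computed as below. This is a generic sketch of the standard SR/SPL definitions used across ObjectNav-style benchmarks (SPL follows Anderson et al., 2018), not code from the UAV-ON release, and the episode field names are assumptions.

```python
from dataclasses import dataclass

@dataclass
class Episode:
    success: bool          # stopped within the success radius (20 units)
    shortest_path: float   # geodesic distance from start to target
    agent_path: float      # length of the path the agent actually flew

def success_rate(episodes: list[Episode]) -> float:
    """SR: fraction of episodes that end within the success radius."""
    return sum(e.success for e in episodes) / len(episodes)

def spl(episodes: list[Episode]) -> float:
    """Success weighted by Path Length:
    SPL = (1/N) * sum_i S_i * l_i / max(p_i, l_i),
    where l_i is the shortest-path length and p_i the agent's path length.
    """
    total = 0.0
    for e in episodes:
        if e.success:
            # Successful but inefficient flights are discounted by l_i / p_i.
            total += e.shortest_path / max(e.agent_path, e.shortest_path)
    return total / len(episodes)

# Usage: one efficient failure, one successful flight twice the optimal length.
eps = [
    Episode(success=True, shortest_path=50.0, agent_path=100.0),
    Episode(success=False, shortest_path=80.0, agent_path=150.0),
]
print(success_rate(eps))  # 0.5
print(spl(eps))           # 0.25
```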
Tencent Enters Embodied Intelligence; Unitree Is Among the First to Use Its "Brain"
具身智能之心· 2025-08-05 00:03
Core Viewpoint
- Tencent is entering the embodied intelligence field by developing a modular external "brain" for robots, rather than engaging in hardware production or commercialization [2][8][22]

Group 1: Tairos Platform Overview
- The Tairos platform serves as a general external brain for robots, allowing manufacturers to access specific capabilities without having to build them in-house [3][10]
- It integrates Tencent's software capabilities in embodied intelligence, including multi-modal perception, planning, and action models [6][12]
- The platform is modular, so robot manufacturers can select and use only the components that fit their needs [16][20]

Group 2: Model and Cloud Services
- The core of the Tairos platform consists of three main models: a multi-modal perception model, a planning model, and a perception-action joint model, each analogous to a part of the human brain [12][13]
- The platform provides cloud services as a tool platform for developing, training, testing, and deploying robotic applications [14][17]
- Integration is flexible: manufacturers can adopt specific Tairos models to enhance their robots' capabilities [18][19]

Group 3: Strategic Focus and Industry Position
- Tencent emphasizes its "three no" principles: no hardware production, no mass production, and no commercialization push [22][23]
- The company focuses on foundational research and exploration in embodied intelligence rather than hardware development [34][35]
- By collaborating with a range of robot manufacturers, Tencent gains insight into industry trends and challenges, strengthening its software offerings [37][39]

Group 4: Future Directions
- Future development of embodied intelligence will revolve around the "IDEAS" framework: virtual-real integration, lowering technical barriers, intelligent evolution, agentification, and perception expansion [39][40]
- Tencent's goal is to accelerate the industry toward a pivotal moment in embodied intelligence, akin to the introduction of the iPhone [40][41]
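The modular "external brain" pattern the article describes can be sketched as a pipeline of swappable components. Every interface and name below (`PerceptionModel`, `ExternalBrain`, etc.) is a hypothetical illustration of the pattern only; Tairos's actual API is not published in the article.

```python
from typing import Protocol

class PerceptionModel(Protocol):
    def perceive(self, rgb, depth, audio) -> dict: ...

class PlanningModel(Protocol):
    def plan(self, scene: dict, instruction: str) -> list[str]: ...

class ActionModel(Protocol):
    def act(self, scene: dict, subtask: str) -> list[float]: ...

class ExternalBrain:
    """A robot 'brain' assembled from swappable hosted models (illustrative).

    A manufacturer keeps its own hardware and low-level control stack and
    plugs in only the modules it needs, e.g. its own planner plus a hosted
    perception model.
    """

    def __init__(self, perception: PerceptionModel,
                 planner: PlanningModel, action: ActionModel):
        self.perception = perception
        self.planner = planner
        self.action = action

    def step(self, rgb, depth, audio, instruction: str) -> list[float]:
        scene = self.perception.perceive(rgb, depth, audio)   # what is around me?
        subtasks = self.planner.plan(scene, instruction)      # what should I do?
        return self.action.act(scene, subtasks[0])            # how do I move now?
```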
无界智慧 Is Recruiting for Manipulation Algorithms, Navigation Algorithms, Motion Control, and More (Full-Time + Internship)
具身智能之心· 2025-08-05 00:03
Group 1
- The core focus is the company's spatial-temporal AI technology, which aims to build intelligent systems integrating multi-modal perception, autonomous cognition, decision-making, and precise task execution [1]
- The company is developing a "growing digital family" companion robot for healthcare and elderly-care scenarios, leveraging spatial perception, environmental understanding, and behavioral decision-making [1]
- The team includes members from institutions such as PKU, Tsinghua University, CASIA, CMU, and MBZUAI, indicating a strong academic and research foundation [1]

Group 2
- The article introduces the "Embodied Intelligence Heart Knowledge Planet" community, a platform for technical exchange in embodied intelligence [4][19]
- The community closes the loop across industry, academia, job seeking, and Q&A exchange, providing resources and networking opportunities [6][19]
- Members can access more than 30 technical roadmaps, open-source projects, and job recommendations, supporting both entry-level and advanced learning [7][19][25]

Group 3
- The community runs a job-referral mechanism with multiple embodied intelligence companies, allowing members to submit resumes for potential opportunities [13]
- Learning paths and resources are available both for beginners and for those already engaged in related research, helping them build skills in the field [14][16]
- The community also hosts discussions on robotics and embodied intelligence topics, giving members a place to seek advice and share experience [20][78]
Interleave-VLA: The First VLA Framework Supporting Interleaved Image-Text Instructions, with 2-3x Better Cross-Domain Generalization
具身智能之心· 2025-08-05 00:03
Core Viewpoint
- The article introduces the Interleave-VLA framework, which improves robot manipulation by using interleaved image-text instructions and demonstrates significant performance gains over existing models [2][3][7]

Group 1: Interleave-VLA Framework
- Interleave-VLA is the first framework able to understand interleaved image-text instructions and generate continuous action sequences in the physical world [2]
- The framework is model-agnostic, requires only minimal modifications to current state-of-the-art VLA models, and provides strong zero-shot generalization [2][3]

Group 2: Dataset Development
- A major obstacle was the lack of a large-scale interleaved embodied dataset; to address it, an automated pipeline converts pure-text instructions from the Open X-Embodiment dataset into interleaved image-text instructions [2]
- The resulting dataset contains 210,000 interaction data points and 13 million image frames, the first large-scale real-world interleaved embodied dataset [2]

Group 3: Performance Evaluation
- Comprehensive evaluation on simulation benchmarks and real-robot experiments shows that Interleave-VLA improves cross-domain generalization by 2-3x over state-of-the-art baselines [3]
- The framework supports flexible task interfaces and handles diverse user-provided image instructions, such as hand-drawn sketches, in a zero-shot manner [3]

Group 4: Advantages of Interleaved Instructions
- The interleaved instruction paradigm makes effective use of heterogeneous datasets and diverse instruction images, including images sourced from the internet, demonstrating substantial scalability potential [3][7]
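The core data transformation, turning a plain text instruction into an interleaved image-text instruction, can be sketched as below. The structure and names (`make_interleaved`, `ImageToken`, the crop source) are assumptions for illustration; the paper's actual conversion pipeline is only described at a high level here.

```python
from dataclasses import dataclass
from typing import Union

@dataclass
class ImageToken:
    """Placeholder for an image segment, e.g. a crop of the referred object."""
    pixels: object  # e.g. a numpy array or PIL.Image

# An interleaved instruction is a sequence mixing text and image segments.
InterleavedInstruction = list[Union[str, ImageToken]]

def make_interleaved(text: str, object_phrase: str,
                     object_crop) -> InterleavedInstruction:
    """Replace the object's noun phrase in a text instruction with an image
    crop of that object, producing an interleaved image-text instruction.

    E.g. "pick up the red cup" -> ["pick up the ", <image of red cup>].
    """
    before, _, after = text.partition(object_phrase)
    parts: InterleavedInstruction = []
    if before:
        parts.append(before)
    parts.append(ImageToken(pixels=object_crop))
    if after:
        parts.append(after)
    return parts

# Usage (None stands in for real image pixels in this sketch):
instr = make_interleaved("pick up the red cup", "red cup", object_crop=None)
print(instr)  # ['pick up the ', ImageToken(pixels=None)]
```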