Embodied AI
VLN-PE: A Physically Realistic VLN Platform Supporting Humanoid, Quadruped, and Wheeled Robots (ICCV'25)
具身智能之心· 2025-07-21 08:42
Core Insights
- The article introduces VLN-PE, a physically realistic platform for Vision-Language Navigation (VLN) that addresses the gap between simulated models and real-world deployment [3][10][15]
- The study highlights the significant performance drop (34%) when transferring existing VLN models from simulation to physical environments, underscoring the need for improved adaptability [15][30]
- The research identifies how factors such as robot type, environmental conditions, and the use of physical controllers affect model performance [15][32][38]

Background
- VLN has emerged as a critical task in embodied AI, requiring agents to navigate complex environments based on natural language instructions [6][8]
- Previous models relied on idealized simulations that do not account for the physical constraints and challenges faced by real robots [9][10]

VLN-PE Platform
- VLN-PE is built on GRUTopia, supporting multiple robot types and integrating high-quality synthetic and 3D-rendered environments for comprehensive evaluation [10][13]
- The platform allows seamless integration of new scenes, broadening the scope of VLN research and assessment [10][14]

Experimental Findings
- Existing models show a 34% drop in success rate when moving from simulated to physical environments, indicating a significant performance gap [15][30]
- The study emphasizes multi-modal robustness: RGB-D models perform better under low-light conditions than RGB-only models [15][38]
- Training on diverse datasets improves the generalization of VLN models across different environments [29][39]

Methodologies
- The article evaluates single-step discrete action classification models and multi-step continuous prediction methods, highlighting the potential of diffusion policies in VLN [20][21]
- The research also explores map-based zero-shot large language models (LLMs) for navigation tasks, demonstrating their potential in VLN applications [24][25]

Performance Metrics
- The study employs standard VLN evaluation metrics, including trajectory length, navigation error, and success rate [18][19]
- Additional metrics such as fall rate and stuck rate are introduced to account for physical realism, which is critical for evaluating robots in real-world scenarios (a toy aggregation sketch follows this summary) [18][19]

Cross-Embodiment Training
- Cross-embodiment training can enhance model performance, allowing a unified model to generalize across different robot types [36][39]
- Using data from multiple robot types during training improves adaptability and performance across environments [36][39]
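The extended metric set is easy to picture as a per-episode aggregation. Below is a minimal, hypothetical sketch of how success rate, fall rate, and stuck rate might be computed from rollout logs; the `Episode` fields and the 3 m success radius are illustrative assumptions, not VLN-PE's actual API.

```python
from dataclasses import dataclass

@dataclass
class Episode:
    """Hypothetical per-episode log from a physical VLN rollout."""
    nav_error: float  # final distance to goal, in meters
    fell: bool        # robot lost balance during the episode
    stuck: bool       # robot stopped making progress (e.g., wedged on geometry)

def summarize(episodes: list[Episode], success_radius: float = 3.0) -> dict:
    """Aggregate standard and physically grounded VLN metrics."""
    n = len(episodes)
    return {
        # classic VLN metric: fraction of episodes ending within the goal radius
        "success_rate": sum(e.nav_error <= success_radius for e in episodes) / n,
        # physical-realism metrics of the kind VLN-PE adds
        "fall_rate": sum(e.fell for e in episodes) / n,
        "stuck_rate": sum(e.stuck for e in episodes) / n,
    }

# Toy usage: one success, one fall.
print(summarize([Episode(1.2, False, False), Episode(5.0, True, False)]))
```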
HKU's Reinforcement-Learning-Driven Method for Embodied Navigation in Continuous Environments: VLN-R1
具身智能之心· 2025-07-04 09:48
Core Viewpoint
- The article presents the VLN-R1 framework, which uses large vision-language models (LVLMs) for continuous navigation in real-world environments, addressing the limitations of earlier discrete navigation methods [5][15]

Research Background
- VLN-R1 processes first-person video streams to generate continuous navigation actions, making navigation tasks more realistic [5]
- The VLN-Ego dataset is constructed with the Habitat simulator, providing rich visual and language information for training LVLMs [5][6]
- Vision-language navigation (VLN) is emphasized as a core challenge in embodied AI, requiring real-time decision-making based on natural language instructions [5]

Methodology
- The VLN-Ego dataset pairs natural language navigation instructions with historical frames and future action sequences, designed to balance local detail and overall context [6]
- Training proceeds in two phases: supervised fine-tuning (SFT) aligns action predictions with expert demonstrations, followed by reinforcement fine-tuning (RFT) to optimize model performance (see the sketch after this summary) [7][9]

Experimental Results
- On the R2R task, VLN-R1 achieved a 30.2% success rate (SR) with the 7B model, significantly outperforming traditional models that lack depth maps or navigation maps [11]
- The model showed strong cross-domain adaptability, outperforming fully supervised models on the RxR task while using only 10K samples for RFT [12]
- Predicting future actions proved crucial for performance, with the best results obtained when predicting six future actions [14]

Conclusion and Future Work
- VLN-R1 integrates LVLMs with reinforcement-learning fine-tuning, achieving state-of-the-art performance in simulated environments and showing that small models can match larger ones [15]
- Future research will validate the model's generalization in real-world settings and explore applications in other embodied AI tasks [15]
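As a rough illustration of the two-phase recipe, the sketch below casts SFT as cross-entropy against expert action tokens and RFT as a reward-weighted policy-gradient update over sampled rollouts. The model, data shapes, and `reward_fn` are placeholders for illustration, not the paper's implementation.

```python
import torch
import torch.nn.functional as F

def sft_step(model, frames, instr, expert_actions, optimizer):
    """Phase 1 (SFT): imitate expert action tokens with cross-entropy."""
    logits = model(frames, instr)                   # (T, num_actions)
    loss = F.cross_entropy(logits, expert_actions)  # align with demonstrations
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

def rft_step(model, frames, instr, optimizer, reward_fn, k=4):
    """Phase 2 (RFT): sample K action sequences, reinforce the better ones."""
    logits = model(frames, instr)                   # (T, num_actions)
    dist = torch.distributions.Categorical(logits=logits)
    samples = dist.sample((k,))                     # K candidate sequences, (K, T)
    rewards = torch.tensor([reward_fn(s) for s in samples])
    advantages = rewards - rewards.mean()           # baseline: group mean reward
    logp = dist.log_prob(samples).sum(dim=-1)       # sequence log-probability, (K,)
    loss = -(advantages * logp).mean()              # policy-gradient-style update
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```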
Robot Vision-Language Navigation Enters the R1 Era! HKU and Shanghai AI Lab Propose a New Embodied Intelligence Framework
量子位· 2025-06-25 00:33
Core Insights
- The article discusses advances in vision-language navigation, specifically the VLN-R1 model from the University of Hong Kong and Shanghai AI Lab, which enables robots to navigate complex environments from natural language instructions without relying on discrete maps [1][3]

Group 1: Performance and Efficiency
- VLN-R1 performs strongly on the VLN-CE benchmark; after RFT training, a 2-billion-parameter model surpasses the results of larger models [2]
- In long-distance navigation, VLN-R1 shows cross-domain transfer: after pre-training on R2R, it achieves superior performance with only 10,000 RxR samples, highlighting its data efficiency [2][15]

Group 2: Innovation in Navigation
- The core challenge of vision-language navigation (VLN) is enabling agents to complete navigation tasks autonomously from natural language commands while integrating real-time visual perception [3]
- Traditional navigation systems rely on discrete topological maps, which limits their adaptability to complex environments and dynamic changes [4][5]

Group 3: Training Mechanisms
- VLN-R1 employs a two-stage training approach combining supervised fine-tuning (SFT) and reinforcement fine-tuning (RFT) to strengthen decision-making [7]
- The model uses group relative policy optimization (GRPO) to generate multiple action plans for the same instruction and optimize the policy based on their relative performance [7]
- A time decay reward (TDR) mechanism prioritizes immediate actions, ensuring the model deals with current obstacles before planning future steps (a hedged sketch follows this summary) [8][9]

Group 4: Dataset and Memory Management
- The VLN-Ego dataset, created with the Habitat simulator, includes 630,000 R2R and 1.2 million RxR training samples, emphasizing first-person perspectives and real-time decision-making [12]
- A long-short-term memory sampling strategy balances recent experience with long-term memory, allowing the model to respond effectively to sudden environmental changes [14]

Group 5: Future Implications
- The research argues that the key to embodied intelligence is a closed-loop learning system that mimics human perception, decision-making, and action [16]
- Open availability of the VLN-Ego dataset and training methods improves reproducibility and scalability, promoting AI's transition from "digital intelligence" to "embodied cognition" across applications [16]
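To make the GRPO-plus-TDR combination concrete, here is a hedged toy sketch: each candidate plan is scored with a geometrically decaying reward so near-term action matches count more, then advantages are formed relative to the group rather than an absolute baseline. The decay form and all names are assumptions for illustration, not the paper's exact formulation.

```python
import torch

def time_decay_reward(pred, target, gamma=0.9):
    """TDR-style reward: correct near-term actions weigh more than distant ones."""
    match = (pred == target).float()  # 1 where the predicted action is correct
    weights = gamma ** torch.arange(len(target), dtype=torch.float)
    return ((weights * match).sum() / weights.sum()).item()

def grpo_advantages(rewards):
    """Group-relative advantages: score each plan against its peers, not an absolute baseline."""
    r = torch.tensor(rewards)
    return (r - r.mean()) / (r.std() + 1e-8)

# Toy usage: three candidate 4-step plans scored against one expert plan.
expert = torch.tensor([0, 2, 2, 1])
plans = [torch.tensor([0, 2, 1, 1]),   # right at first, drifts later
         torch.tensor([0, 2, 2, 1]),   # exact match, highest reward
         torch.tensor([3, 0, 0, 0])]   # wrong immediately, heavily penalized
rewards = [time_decay_reward(p, expert) for p in plans]
print(grpo_advantages(rewards))
```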
博原资本 Partners with 银河通用 to Establish "博银合创", Accelerating Embodied AI for Industrial Automation
投中网· 2025-06-18 02:21
Core Viewpoint
- The establishment of 博银合创 marks a significant step toward the industrialization of embodied artificial intelligence in China, aiming to advance global smart manufacturing through collaboration and innovation [1][22]

Group 1: Company Formation and Objectives
- Bosch Group's investment platform, 博原资本, has partnered with leading Chinese embodied-intelligence company 银河通用 to form a joint venture named 博银合创 [1]
- The new company will focus on complex assembly and intelligent quality inspection, developing agile robots to drive large-scale adoption of embodied AI in industrial settings [1][9]
- 博银合创 aims to build a complete growth path from early incubation to independent financing and commercialization, creating a globally competitive smart-manufacturing enterprise [9][15]

Group 2: Market Potential and Technological Advancements
- According to the International Federation of Robotics (IFR), the global industrial robot market is expected to exceed $80 billion by 2025, with embodied-intelligence-driven collaborative robots likely to capture over half of it [5]
- Embodied AI integrates perception, cognition, and action, enabling robots to make autonomous decisions and execute tasks accurately in dynamic environments, driving flexible, intelligent manufacturing [5][12]

Group 3: Strategic Collaborations and Innovations
- 博银合创 has signed a strategic cooperation memorandum with UAES to establish a joint laboratory, RoboFab, focusing on pilot applications of embodied AI in manufacturing [19]
- The collaboration aims to bridge foundational research and industrial practice, accelerating the development of reliable, efficient smart-robot solutions [20]
- 博原资本's 博原启世 platform will support the joint venture by facilitating resource integration and market expansion [14][22]

Group 4: Future Directions and Global Strategy
- 博银合创 is positioned to explore a "global design, local manufacturing" paradigm in smart manufacturing, with plans for localized deployment in key markets such as Europe, North America, and Southeast Asia [22]
- The company will continue to work with industry partners to build an open, efficient industrial cooperation system, promoting large-scale deployment of embodied AI in global manufacturing [22]
博原资本 Establishes Wholly-Owned Platform "博原启世": Joint Venture "博银合创" Already Founded with 银河通用
IPO早知道· 2025-06-18 01:26
Core Viewpoint
- The establishment of 博银合创 marks a significant step toward industrializing embodied artificial intelligence, focusing on complex manufacturing processes and developing agile robots to raise automation in the manufacturing sector [2][4][23]

Group 1: Company Initiatives
- 博原资本 has launched a wholly-owned platform, 博原启世, to strategically incubate and rebuild the ecosystem of embodied artificial intelligence [2][12]
- A joint venture, 博银合创, has been formed with 银河通用 to focus on core manufacturing scenarios such as complex assembly and intelligent quality inspection [2][8]
- 博银合创 aims to create a complete growth path from early incubation to independent financing and commercialization, building a globally competitive intelligent-manufacturing enterprise [9][14]

Group 2: Technological Advancements
- The global industrial robot market is projected to exceed $80 billion by 2025, with embodied-intelligence-driven collaborative robots expected to capture over half of it [4]
- 博银合创 will leverage 银河通用's self-developed simulation training and synthetic-data technology to build a standardized, modular training and deployment system for rapid iteration and large-scale rollout of robotic products [8][12]
- The company targets key gaps in traditional automation, focusing on high-complexity manufacturing processes that require flexible, precise solutions [8][11]

Group 3: Strategic Collaborations
- 博银合创 has signed a strategic cooperation memorandum with UAES to establish a joint laboratory, RoboFab, focusing on pilot applications of embodied artificial intelligence in typical manufacturing processes [19][20]
- 博原启世 will connect cutting-edge technology companies with industrial resources, expanding collaborative practice into a tailored network for embodied artificial intelligence [15][21]
- The OpenBosch innovation platform will play a key role in 博原启世's global collaboration system, providing scenario matching and pilot support for incubation projects [21]

Group 4: Future Outlook
- 博原资本 plans to deepen its work on technology standards, production-line modules, and data systems to promote localized deployment of embodied robots in major manufacturing markets such as Europe, North America, and Southeast Asia [23][24]
- The future strategy includes building an open, efficient industrial cooperation system to enable large-scale deployment of embodied artificial intelligence in global manufacturing [24]
17 Visuo-Tactile Sensors and 70% Surface Tactile Coverage! 北大 × 北通院 Publish the F-TAC Hand in Nature Machine Intelligence, Offering a Fresh Approach to Dexterous Hands
机器人大讲堂· 2025-06-15 04:41
Core Viewpoint
- The F-TAC Hand represents a significant advance in tactile embodied intelligence, addressing the limitations of existing robotic hands in dynamic environments and improving their adaptability and performance in complex tasks [2][6][39]

Group 1: Technological Innovations
- The F-TAC Hand integrates 17 high-resolution tactile sensors with 0.1 mm spatial resolution covering 70% of its surface, achieving near-biological tactile perception while preserving natural hand-movement characteristics [3][12]
- A novel humanoid grasp-generation algorithm efficiently processes high-dimensional tactile data, forming a complete closed-loop tactile control system that addresses key challenges in multi-modal perception and motion coordination (a toy closed-loop sketch follows this summary) [3][6]

Group 2: Performance Validation
- In 600 multi-object grasping experiments, the F-TAC Hand adapted better to dynamic real-world conditions than traditional non-tactile approaches, particularly under noise and dynamic interference (p<0.0001) [5][32]
- The system completed the Kapandji test, achieving all 10 specified contact points between the thumb and the other fingers, and executed 33 typical human grasp types, demonstrating high dexterity [33][35]

Group 3: Practical Applications
- By combining high degrees of freedom with extensive tactile coverage, the F-TAC Hand breaks through traditional limits of robotic-hand design, making it suitable for prosthetics, teleoperation systems, collaborative robots, and human-robot interaction [39][45]
- The modular design of the tactile sensors allows effective integration into the hand structure, improving the system's practicality and usability in real-world scenarios [39][42]
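The closed-loop tactile idea (sense contact, compare to a target grip, adjust) can be sketched in a few lines. The proportional controller below is purely illustrative, under assumed names and units, and is not the paper's control system.

```python
import numpy as np

def tactile_grip_step(pressure_map: np.ndarray, target_force: float,
                      gain: float = 0.05) -> float:
    """One step of a proportional closed-loop grip adjustment.

    pressure_map: 2D readings from one high-resolution tactile patch.
    Returns a torque correction: positive tightens the grip, negative loosens it.
    """
    sensed = pressure_map.sum()      # crude total contact-force estimate
    error = target_force - sensed    # under-gripping gives a positive error
    return gain * error              # proportional correction term

# Toy usage: a 40x40 patch (a 0.1 mm pitch would resolve sub-millimeter contact detail).
patch = np.random.rand(40, 40) * 0.001
print(tactile_grip_step(patch, target_force=2.0))
```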
10% of the Training Data Beats 100% Performance: A Major Breakthrough in Robot Learning
机器之心· 2025-06-11 03:54
Core Viewpoint
- The ViSA-Flow framework is a new approach to robot skill learning that significantly improves learning efficiency in data-scarce settings by extracting semantic action flows from large-scale human videos [4][36]

Group 1: Research Background and Challenges
- Traditional robot imitation learning requires large, carefully curated datasets that are costly to collect, a bottleneck for robots meant to handle diverse real-world tasks [7]
- Humans learn new skills remarkably well through observation, focusing on semantically relevant components while filtering out irrelevant background information [8]

Group 2: Key Innovations
- The core innovation is Semantic Action Flow as an intermediate representation, capturing the essential spatiotemporal features of operator-object interactions unaffected by surface visual differences [11]
- Key components of the framework (see the sketch after this summary):
  1. Semantic entity localization, using pre-trained vision-language models to describe and locate the operator and task-relevant objects [11]
  2. Hand-object interaction tracking, which keeps segmentation stable across frames [12]
  3. Flow-conditioned feature encoding, which generates rich feature vectors while preserving visual context [13]

Group 3: Experimental Evaluation
- On the CALVIN benchmark, ViSA-Flow outperformed all baselines using only 10% of annotated robot trajectories (1,768), reaching a 31.4% success rate on five consecutive tasks, nearly double that of the next-best method [19]
- An average sequence length of 2.96 further demonstrates ViSA-Flow's effectiveness on long-horizon tasks [20]

Group 4: Ablation Studies
- Removing semantic entity localization significantly reduces performance, and omitting the temporal tracking stage lowers the average success sequence length [26]
- The full ViSA-Flow model reached an 89.0% task-completion success rate, demonstrating its robustness [21]

Group 5: Real-World Experiments
- Real-world evaluations covered single-stage and long-horizon tasks, showing that performance holds across varying task complexity [23][30]
- The model's focus on the operator and task-relevant objects supports smooth transitions in spatial support as scenes change [31]

Group 6: Technical Advantages and Limitations
- Advantages: data efficiency, cross-domain generalization, long-horizon stability, and semantic consistency in task execution [40]
- Limitations: no explicit 3D geometric modeling, reliance on pre-trained components, and potential difficulty with tasks requiring precise physical interaction [40]

Group 7: Future Directions
- Future work may integrate physical modeling, reduce reliance on pre-trained components, combine the framework with reinforcement learning, and expand pre-training datasets [40]

Group 8: Significance and Outlook
- ViSA-Flow demonstrates that semantic representations extracted from large-scale human video can drive skill acquisition, a significant breakthrough for robot learning [36]
- The framework bridges human demonstration and robot execution, paving the way for more intelligent and efficient robot learning systems [37]
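As a toy paraphrase of the pipeline, the sketch below reduces a tracked hand-object interaction to a simple per-frame overlap signal, standing in for the paper's richer flow-conditioned features. The `EntityTrack` structure and the overlap metric are assumptions for illustration only.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class EntityTrack:
    """Per-frame masks for the operator (hand) and the task-relevant object."""
    hand_mask: set
    object_mask: set

def semantic_action_flow(tracks: List[EntityTrack]) -> List[float]:
    """Reduce a tracked hand-object interaction to a 1-D 'flow' signal.

    Here the signal is simply mask overlap per frame, a stand-in for the
    spatiotemporal interaction features the framework actually encodes.
    """
    return [len(t.hand_mask & t.object_mask) / max(len(t.object_mask), 1)
            for t in tracks]

# Toy usage: the hand gradually closes over the object across three frames.
tracks = [
    EntityTrack(hand_mask={1, 2}, object_mask={5, 6, 7}),
    EntityTrack(hand_mask={4, 5}, object_mask={5, 6, 7}),
    EntityTrack(hand_mask={5, 6, 7}, object_mask={5, 6, 7}),
]
print(semantic_action_flow(tracks))  # rises toward 1.0 as contact increases
```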
"AI Godmother" Fei-Fei Li Demystifies "World Models": Getting AI to Understand Three-Dimensional Space the Way Humans Do
36Kr · 2025-06-06 12:31
Core Insights
- The conversation highlighted the vision and research direction of World Labs, founded by renowned AI expert Fei-Fei Li, centered on "world models" that let AI systems understand and reason about both textual and physical realities [2][4][6]

Group 1: Company Vision and Goals
- World Labs aims at unprecedented deep-technology challenges, particularly AI systems with spatial intelligence, which is crucial for understanding the three-dimensional physical world and virtual environments [2][4]
- Fei-Fei Li emphasizes the need for a "perfect partner" who understands computer science and AI as well as market dynamics, to help steer the company toward its goals [4][5]

Group 2: Limitations of Current AI Models
- The discussion began with the limits of large language models (LLMs); Li argues that while language is a powerful tool, it is not the best medium for describing the complexity of the three-dimensional physical world [6][10]
- Many capabilities lie beyond the scope of language, and understanding the world requires building human-like spatial models [11][12]

Group 3: Applications of World Models
- Successful world models would have vast applications, including creative work in design, film, and architecture, and robotics, where machines must understand and adapt to three-dimensional environments [12][13]
- Li envisions a future in which world-model advances let humans live in "multiverses," expanding the boundaries of imagination and creativity [13]

Group 4: Importance of Spatial Intelligence
- Spatial intelligence is identified as a core AI capability, essential for understanding and interacting with the three-dimensional world and a fundamental aspect of human evolution [10][11]
- Li shares personal experiences to illustrate the significance of three-dimensional perception and the challenges faced by AI systems that lack it [14]
AI Agents: From Tools to Partners | 2025 HongShan AI Day (Part 1)
红杉汇· 2025-05-30 06:40
In his opening remarks, HongShan partner 周逵 shared his thinking on AI's current development and future direction across several dimensions: the evolution of AI technology, the characteristics of AI products and AI companies, AI business models, and the competitive dynamics and outcomes among future intelligent companies. He said AI is a new milestone in human technological progress, and that "embodiment" essentially means the opportunity to give every kind of entity in real life a "brain." In his words: "Whether it is a 'hard' robot or a soft 'Agent,' the common trait is the ability not only to take in information but to deliver on it. Whether an enterprise targets Level 2 or Level 4 intelligence leads to very different capabilities and business outcomes." He is especially eager to see major progress on "world models" and the arrival of the next Aha Moment in AI.

Genesis founder and CEO 周衔 held a remote dialogue with HongShan partner 公元. 周衔 said that embodied AI is unlikely to hit a sudden inflection point. People may first see robots gradually penetrate certain To-B application scenarios, where for now they need no complex interaction with humans. As the technology is refined and upgraded year after year, capabilities will improve steadily and robots will move step by step into the home, becoming capable assistants in daily life. On an optimistic view, robotics could reach a key breakthrough in roughly three years, arriving at a true commercialization inflection point.

In his speech, HongShan partner 郑庆生 noted that at present, ...
News Briefs | China's Self-Developed, World-First Intelligent Equipment for Deepwater Pipeline Laying Completes Sea Trials; MIT Develops a High-Speed, High-Precision Table Tennis Robot; Persona AI Raises $27 Million; and More
机器人大讲堂· 2025-05-19 13:12
Group 1: Deepwater Pipeline Installation Technology
- China's self-developed intelligent monitoring equipment for deepwater pipeline laying, the "Haiwei" system, has successfully completed sea trials, a significant breakthrough in intelligent, unmanned deepwater oil-and-gas equipment [1]
- The "Haiwei" system incorporates innovations including high-resilience unmanned surface vessels, autonomous underwater robots, repeaters, and optical communication, and is designed for operations at depths of up to 1,500 meters [1]
- The system includes China's first 18-meter unmanned vessel, "Guardian," for surface monitoring and the 1,500-meter autonomous underwater robot "Navigator," which can autonomously identify and track pipeline touchdown points while transmitting data in real time [1]

Group 2: Robotics and AI Innovations
- Persona AI Inc. has raised $27 million in seed funding to accelerate the development of humanoid robots designed for shipbuilding and manufacturing tasks [2][4]
- The company is led by experienced robotics professionals, including CEO Nic Radford, formerly of NASA and Nauticus Robotics [4]
- MIT engineers have developed a lightweight, high-precision ping-pong robot capable of returning balls at speeds up to 19 meters per second, closely matching top human players [5][7]
- Ground Control Robotics (GCR) has launched the first commercial bionic multi-legged robot for complex agricultural terrain, capable of autonomous navigation and weed removal [8][10]

Group 3: Material Science Breakthroughs
- Researchers from AMOLF and ARCNL in the Netherlands have developed a counterintuitive "countersnapping" metamaterial that contracts when stretched, challenging traditional material mechanics [11][13]
- The discovery opens new avenues in soft robotics, smart wearables, and earthquake-resistant technologies, showcasing three major breakthroughs: unpowered unidirectional movement, dynamic stiffness adjustment, and self-damping vibration control [13]