Embodied AI
CoRL 2025 | LaDi-WM, a latent-space diffusion world model, substantially improves the success rate and cross-scene generalization of robot manipulation policies
机器之心· 2025-08-17 04:28
Core Viewpoint
- The article introduces LaDi-WM (Latent Diffusion-based World Model), a world model that uses latent-space diffusion to improve robot manipulation performance through predictive policy optimization [2][28].

Group 1: Innovation Points
- LaDi-WM builds its latent-space representation from pre-trained vision foundation models, integrating geometric features (from DINOv2) and semantic features (from Siglip), which strengthens generalization across robotic manipulation tasks [5][10].
- The framework includes a diffusion policy that iteratively refines output actions by incorporating states predicted by the world model, yielding more consistent and accurate actions [6][12].

Group 2: Framework Structure
- The framework consists of two main phases: world model learning and policy learning [9].
- **World Model Learning**: extracts geometric and semantic representations from observation images and runs a diffusion process in which the two representations interact, improving dynamic-prediction accuracy [10].
- **Policy Model Training and Iterative Optimization**: uses the world model's future predictions to guide policy learning, allowing multiple rounds of action refinement, which lowers the entropy of the output distribution and improves action-prediction accuracy [12][18].

Group 3: Experimental Results
- In extensive experiments on virtual benchmarks (LIBERO-LONG, CALVIN D-D), LaDi-WM significantly raised success rates on robotic tasks, achieving a 27.9% improvement on LIBERO-LONG and reaching a 68.7% success rate with minimal training data [15][16].
- Scalability was also validated: increasing training data and model parameters consistently improved success rates [18][20].

Group 4: Real-World Application
- The framework was tested in real-world scenarios, including stacking bowls and opening drawers, where LaDi-WM improved the success rate of the original imitation-learning policy by 20% [24][25].
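The predict-and-refine loop described above can be sketched as a toy example. This is a minimal illustration, not the authors' implementation: `world_model` and `policy` are linear/tanh stand-ins, and the latents are plain NumPy vectors rather than DINOv2/Siglip features.

```python
import numpy as np

rng = np.random.default_rng(0)

def world_model(latent, action):
    """Toy stand-in for the latent diffusion world model: predicts the
    next latent state. The real LaDi-WM runs a diffusion process over
    geometric (DINOv2) and semantic (Siglip) feature spaces."""
    return 0.9 * latent + 0.1 * action

def policy(latent, predicted_future):
    """Toy stand-in for the diffusion policy: proposes an action
    conditioned on the current latent and the predicted future state."""
    return np.tanh(predicted_future - latent)

latent = rng.normal(size=4)   # current latent observation (toy)
action = np.zeros(4)          # initial action proposal

# The paper's iterative optimization: feed the world model's
# prediction back into the policy and refine the action.
for _ in range(5):
    predicted = world_model(latent, action)
    action = policy(latent, predicted)
```

In the real system, each refinement round conditions the diffusion policy on progressively better world-model rollouts, which is what the paper credits for the reduced output entropy.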
Qunhe Technology, one of Hangzhou's "Six Little Dragons," sees its IPO prospectus officially lapse
Jin Rong Jie· 2025-08-14 09:48
Core Viewpoint
- The IPO application of Qunhe Technology has officially lapsed due to the expiration of the six-month validity period, requiring the company to prepare new listing materials [1][2].

Company Overview
- Qunhe Technology, established in 2011, focuses on spatial design software and uses AI technology along with specialized graphics-processing-unit clusters to build a physically accurate world simulator [1].
- The company is the largest spatial design platform globally by average monthly active users in 2023 and holds approximately 22.2% market share in China by revenue, leading the industry [1].

Financial Performance
- Qunhe Technology operates on a subscription model, with revenue increasing from 601 million yuan in 2022 to 664 million yuan in 2023, a growth of 10.5% [1].
- For the first nine months of 2024, the company reported revenue of 553 million yuan, a year-on-year increase of 13.8% [1].
- Gross profit margin improved from 72.7% in 2022 to 76.8% in 2023 [1].
- The annual loss narrowed from 704 million yuan in 2022 to 646 million yuan in 2023, an 8.2% reduction [1].

Product Portfolio
- Offerings include the spatial design software KuJiaLe and its overseas version Coohom, along with a virtual-environment training platform serving AI-generated content, embodied AI, AR/VR, and robotics [1].

Market Context
- Qunhe Technology is one of the "Hangzhou Six Little Dragons," a group of innovative tech companies from Hangzhou, Zhejiang, noted for strong innovation in their respective fields [2].
- The lapse means Qunhe Technology must re-submit an updated prospectus if it intends to pursue a listing again [2].
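The growth figures quoted above are internally consistent; a quick arithmetic check (values in millions of yuan, taken directly from the summary):

```python
# Revenue growth 2022 -> 2023 (reported as 10.5%)
revenue_growth = (664 - 601) / 601
print(round(revenue_growth * 100, 1))  # 10.5

# Annual loss reduction 2022 -> 2023 (reported as 8.2%)
loss_reduction = (704 - 646) / 704
print(round(loss_reduction * 100, 1))  # 8.2
```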
Hinton's perfunctory appearance is a corruption of science
Guan Cha Zhe Wang· 2025-08-04 06:24
Core Viewpoint
- The article discusses contrasting perspectives on artificial intelligence (AI), highlighting the divide between optimistic and pessimistic views of AI's capabilities and understanding [1][2][4].

Group 1: Perspectives on AI
- There is a debate between AI optimists, like Geoffrey Hinton, and pessimists who argue that AI lacks true understanding and relies on statistical methods [1][2].
- The notion that "intelligence is based on reasoning" is criticized as overly simplistic and reflective of Western rationalism, which may overlook the complexities of human understanding [2][7].
- The article calls for a scientific approach to AI: a realistic assessment of its capabilities rather than subjective interpretation [2][4].

Group 2: Scientific Foundations of AI
- AI currently lacks a foundational scientific theory, functioning more as a craft built on empirical methods than on established scientific principles [8][9].
- On the historical context of scientific breakthroughs, the article notes that foundational theory in modern science has stagnated since the mid-20th century [8].
- It argues that the Nobel Prizes awarded to AI researchers do not signify a theoretical breakthrough in AI but rather highlight a broader stagnation in scientific understanding [8].

Group 3: Technical Principles and Applications
- AI is described as a statistical method that has gained prominence through practical applications rather than theoretical advances [9][12].
- AI is positioned as a subset of the broader information-technology landscape, whose aim is to enhance human capabilities [12][14].
- While AI can surpass human performance on specific tasks, this does not equate to achieving human-like consciousness or understanding [14][16].

Group 4: Historical Context and Future Implications
- The evolution of technology from craft-based methods to modern scientific approaches is discussed, emphasizing the limitations of purely empirical methods [15].
- The article warns against misinformation and exaggerated claims about AI's capabilities, which may distract from genuine scientific inquiry [16][18].
- It closes with a caution about entering a "post-scientific" era in which the integrity of scientific discourse is compromised by unsubstantiated claims [18].
Xie Yun: Nobel laureate Hinton's perfunctory appearance is a corruption of science
Hu Xiu· 2025-08-04 05:57
Group 1
- The article discusses contrasting views on artificial intelligence (AI), highlighting a divide between pessimistic and optimistic perspectives among experts [2][3][5].
- It emphasizes that while AI can perform certain tasks, it lacks true understanding and reasoning, relying instead on statistical methods [7][8][10].
- It critiques the notion that AI's intelligence is akin to human intelligence, arguing that the two differ fundamentally in understanding and reasoning [11][12][24].

Group 2
- The lack of a solid scientific foundation for AI is noted, with Turing's historical work described as subjective and short of scientific standards [10][12][14].
- AI's reliance on statistical methods has led to practical applications but does not amount to a theoretical breakthrough in science [15][17].
- AI is presented as merely one part of the broader information-technology landscape, whose aim is to enhance human capabilities rather than replace them [19][20][21].

Group 3
- On the history of technological development, the article argues that reliance on empirical craftsmanship has limits that scientific advancement does not [22][24].
- It warns against misinformation and the dilution of scientific rigor in the discourse surrounding AI, especially as society enters a "post-science" era [24][25].
- It concludes that the aspiration to build machines with human-like consciousness remains unattainable without a deeper scientific understanding of consciousness itself [23][24].
Ingdan Innovations (00400): with "Nvidia Jetson" edge-AI computing as its cornerstone, empowering the humanoid robot sector
智通财经网· 2025-07-28 11:55
Group 1
- Nvidia and Ingdan Innovations held an online seminar focused on humanoid robots and their integrated hardware-software solutions [1].
- The upcoming flagship platform, Jetson Thor, is set to launch in August, emphasizing edge-AI computing for humanoid robots [1].
- Nvidia's three computing platforms, DGX, Jetson, and Omniverse, together cover training, simulation optimization, and deployment of embodied robots [1].

Group 2
- Humanoid robots are seen as a key hardware node for breakthroughs in embodied artificial intelligence, with global spending in the robotics sector projected to approach $370 billion by 2028, growing at a CAGR of 13.2% [2].
- Ingdan Innovations is a core supplier in the AI computing supply chain, representing major brands such as Nvidia, Intel, and Microsoft, and is focusing on the Jetson series for edge-AI applications [2].
- Its performance is expected to benefit from the leadership of Nvidia Jetson products in the edge-AI field, reinforcing its core position in the AI computing supply chain [3].
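The compound-growth claim can be made concrete with the CAGR formula, future = present × (1 + r)^n. As a sketch, assuming the ~$370 billion figure is the 2028 endpoint and the 13.2% CAGR spans the four preceding years (neither the base year nor the span is stated in the summary):

```python
cagr = 0.132
spending_2028 = 370.0  # billions USD, per the IFR projection cited above

# Implied spending four years earlier, if growth compounds at the CAGR.
implied_base = spending_2028 / (1 + cagr) ** 4
print(round(implied_base, 1))  # ≈ 225.3
```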
VLN-PE: a physically realistic VLN platform supporting humanoid, quadruped, and wheeled robots (ICCV'25)
具身智能之心· 2025-07-21 08:42
Core Insights
- The article introduces VLN-PE, a physically realistic platform for Vision-Language Navigation (VLN), addressing the gap between simulated models and real-world deployment [3][10][15].
- The study highlights a significant performance drop (34%) when transferring existing VLN models from simulation to physical environments, underscoring the need for better adaptability [15][30].
- The research identifies the impact of robot type, environmental conditions, and the use of physical controllers on model performance [15][32][38].

Background
- VLN has emerged as a critical task in embodied AI, requiring agents to navigate complex environments from natural language instructions [6][8].
- Previous models relied on idealized simulations that ignore the physical constraints and challenges faced by real robots [9][10].

VLN-PE Platform
- VLN-PE is built on GRUTopia, supports multiple robot types, and integrates high-quality synthetic and 3D-rendered environments for comprehensive evaluation [10][13].
- The platform allows seamless integration of new scenes, widening the scope of VLN research and assessment [10][14].

Experimental Findings
- Existing models show a 34% drop in success rate when moving from simulated to physical environments, indicating a significant performance gap [15][30].
- Multi-modal robustness matters: RGB-D models perform better under low-light conditions than RGB-only models [15][38].
- Training on diverse datasets improves the generalization of VLN models across environments [29][39].

Methodologies
- The article evaluates single-step discrete action classification models and multi-step continuous prediction methods, highlighting the potential of diffusion strategies for VLN [20][21].
- It also explores the effectiveness of map-based zero-shot large language models (LLMs) for navigation tasks, demonstrating their potential in VLN applications [24][25].

Performance Metrics
- The study uses standard VLN evaluation metrics, including trajectory length, navigation error, and success rate [18][19].
- Additional metrics account for physical realism, such as fall rate and stuck rate, which are critical for evaluating robots in real-world scenarios [18][19].

Cross-Embodiment Training
- Cross-embodiment training can enhance performance, allowing a unified model to generalize across different robot types [36][39].
- Using data from multiple robot types during training improves adaptability and performance across environments [36][39].
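The standard and physically grounded metrics mentioned above can be aggregated from per-episode logs along these lines. This is a minimal sketch; the field names (`success`, `fell`, `stuck`, `nav_error`) are illustrative, not taken from the VLN-PE codebase.

```python
def vln_metrics(episodes):
    """Aggregate standard VLN metrics plus the physical-realism
    metrics (fall rate, stuck rate) over a list of episode records."""
    n = len(episodes)
    return {
        "success_rate": sum(e["success"] for e in episodes) / n,
        "avg_nav_error_m": sum(e["nav_error"] for e in episodes) / n,
        "fall_rate": sum(e["fell"] for e in episodes) / n,
        "stuck_rate": sum(e["stuck"] for e in episodes) / n,
    }

# Four toy episodes: two successes, one fall, one stuck.
episodes = [
    {"success": 1, "nav_error": 0.8, "fell": 0, "stuck": 0},
    {"success": 0, "nav_error": 4.2, "fell": 1, "stuck": 0},
    {"success": 0, "nav_error": 3.0, "fell": 0, "stuck": 1},
    {"success": 1, "nav_error": 1.0, "fell": 0, "stuck": 0},
]
print(vln_metrics(episodes)["success_rate"])  # 0.5
```

Fall and stuck rates are what distinguish a physically grounded evaluation like VLN-PE's from the idealized simulators, where these failure modes cannot occur.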
HKU's reinforcement-learning-driven embodied navigation method for continuous environments: VLN-R1
具身智能之心· 2025-07-04 09:48
Core Viewpoint
- The article presents VLN-R1, a framework that uses large vision-language models (LVLMs) for continuous navigation in real-world environments, addressing the limitations of earlier discrete navigation methods [5][15].

Research Background
- The VLN-R1 framework processes first-person video streams to generate continuous navigation actions, enhancing the realism of navigation tasks [5].
- The VLN-Ego dataset is constructed with the Habitat simulator, providing rich visual and language information for training LVLMs [5][6].
- Vision-language navigation (VLN) is emphasized as a core challenge in embodied AI, requiring real-time decision-making from natural language instructions [5].

Methodology
- The VLN-Ego dataset includes natural language navigation instructions, historical frames, and future action sequences, designed to balance local detail with overall context [6].
- Training proceeds in two phases: supervised fine-tuning (SFT) to align action predictions with expert demonstrations, followed by reinforcement fine-tuning (RFT) to optimize model performance [7][9].

Experimental Results
- In the R2R task, VLN-R1 achieved a success rate (SR) of 30.2% with the 7B model, significantly outperforming traditional models that lack depth maps or navigation maps [11].
- The model demonstrated strong cross-domain adaptability, outperforming fully supervised models on the RxR task with only 10K samples used for RFT [12].
- Predicting future actions proved crucial for performance; the best results came from predicting six future actions [14].

Conclusion and Future Work
- VLN-R1 integrates LVLMs with reinforcement-learning fine-tuning, achieving state-of-the-art performance in simulated environments and showing that small models can match larger ones [15].
- Future research will validate the model's generalization in real-world settings and explore applications to other embodied AI tasks [15].
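The two-phase recipe above can be written as a training-loop skeleton. This is an illustrative outline only: the function names, the `rollout` interface, and the toy stand-ins below are assumptions, not VLN-R1's actual API.

```python
def supervised_finetune(model, expert_batches, loss_fn, opt_step):
    """Phase 1 (SFT): align the model's predicted action sequence
    with expert demonstrations via a supervised loss."""
    for obs, expert_actions in expert_batches:
        opt_step(loss_fn(model(obs), expert_actions))

def reinforcement_finetune(model, rollout, reward_fn, opt_step, iters=3):
    """Phase 2 (RFT): roll out the fine-tuned model and optimize it
    against a task reward rather than an imitation loss."""
    for _ in range(iters):
        obs, actions = rollout(model)
        opt_step(-reward_fn(obs, actions))  # step toward higher reward

# Toy stand-ins so the skeleton runs end to end.
model = lambda obs: list(obs)                                 # echo "policy"
loss_fn = lambda pred, tgt: sum((p - t) ** 2 for p, t in zip(pred, tgt))
log = []
opt_step = log.append                                         # record, don't optimize

supervised_finetune(model, [([1.0, 2.0], [1.0, 2.0])], loss_fn, opt_step)
print(log)  # [0.0]
```

The key design point carried over from the article is the ordering: imitation first to bootstrap sensible action sequences, reward optimization second to refine them.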
Robot vision-language navigation enters the R1 era! HKU and Shanghai AI Lab propose a new embodied-intelligence framework
量子位· 2025-06-25 00:33
Core Insights
- The article discusses advances in vision-language navigation, specifically the VLN-R1 model developed by the University of Hong Kong and Shanghai AI Lab, which lets robots navigate complex environments from natural language instructions without relying on discrete maps [1][3].

Group 1: Performance and Efficiency
- VLN-R1 performs strongly on the VLN-CE benchmark, surpassing larger models with only a 2-billion-parameter model after RFT training [2].
- In long-distance navigation, VLN-R1 shows "cross-domain transfer": after pre-training on R2R, it achieves superior performance with only 10,000 RxR samples, highlighting its data efficiency [2][15].

Group 2: Innovation in Navigation
- The core challenge of vision-language navigation (VLN) is enabling agents to complete navigation tasks autonomously from natural language commands while integrating real-time visual perception [3].
- Traditional navigation systems rely on discrete topological maps, limiting their adaptability to complex environments and dynamic change [4][5].

Group 3: Training Mechanisms
- VLN-R1 employs a two-stage training approach combining supervised fine-tuning (SFT) and reinforcement fine-tuning (RFT) to strengthen decision-making [7].
- The model uses group relative policy optimization (GRPO) to generate multiple action plans for the same instruction, optimizing strategies by relative performance [7].
- A time-decay reward (TDR) mechanism prioritizes immediate actions, ensuring the model handles current obstacles before planning future steps [8][9].

Group 4: Dataset and Memory Management
- The VLN-Ego dataset, built with the Habitat simulator, includes 630,000 R2R and 1.2 million RxR training samples, emphasizing first-person perspectives and real-time decision-making [12].
- A long-short-term memory sampling strategy balances recent experience with long-term memory, letting the model respond effectively to sudden environmental change [14].

Group 5: Future Implications
- The research indicates that the key to embodied intelligence is a closed-loop learning system that mimics human perception, decision-making, and action [16].
- With the VLN-Ego dataset and training methods openly available, the framework's reproducibility and scalability improve, promoting AI's transition from "digital intelligence" to "embodied cognition" across applications [16].
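The time-decay reward idea, earlier actions in the predicted sequence counting more than later ones, can be sketched as a discounted sum. The decay factor 0.8 and the per-step reward shape are illustrative assumptions; the article does not give the exact formula.

```python
def time_decay_reward(step_rewards, decay=0.8):
    """Discounted sum that makes immediate actions dominate:
    total = sum_i decay**i * r_i."""
    return sum(decay ** i * r for i, r in enumerate(step_rewards))

# Identical raw rewards per step, but earlier steps carry more weight:
# 1.0 + 0.8 + 0.64 = 2.44
print(round(time_decay_reward([1.0, 1.0, 1.0]), 2))  # 2.44
```

Because later terms are discounted, a policy trained under such a reward is pushed to get the immediate action right before optimizing the rest of the plan, which is the behavior the article attributes to TDR.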
博原资本 partners with 银河通用 to establish "博银合创," accelerating embodied AI empowerment of industrial automation
投中网· 2025-06-18 02:21
Core Viewpoint
- The establishment of "博银合创" marks a significant step toward the industrialization of embodied artificial intelligence in China, aiming to advance global smart manufacturing through collaboration and innovation [1][22].

Group 1: Company Formation and Objectives
- Bosch Group's investment platform, 博原资本, has partnered with leading Chinese embodied-intelligence company 银河通用 to form a joint venture named "博银合创" [1].
- The new company will focus on complex assembly and intelligent quality inspection, developing agile robots to drive large-scale adoption of embodied AI in industrial settings [1][9].
- 博银合创 aims to build a complete growth path from early incubation to independent financing and commercialization, becoming a globally competitive smart-manufacturing enterprise [9][15].

Group 2: Market Potential and Technological Advancements
- According to the International Federation of Robotics (IFR), the global industrial robot market is expected to exceed $80 billion by 2025, with embodied-intelligence-driven collaborative robots likely to capture over half of it [5].
- Embodied AI integrates perception, cognition, and action, enabling robots to make autonomous decisions and execute tasks accurately in dynamic environments, driving flexible, intelligent manufacturing [5][12].

Group 3: Strategic Collaborations and Innovations
- 博银合创 has signed a strategic cooperation memorandum with UAES to establish a joint laboratory, "RoboFab," focused on pilot applications of embodied AI in manufacturing [19].
- The collaboration aims to bridge foundational research and industrial practice, accelerating reliable, efficient smart-robot solutions [20].
- 博原资本's "博原启世" platform will support the joint venture by facilitating resource integration and market expansion [14][22].

Group 4: Future Directions and Global Strategy
- 博银合创 is positioned to explore a "global design, local manufacturing" paradigm, with plans for localized deployment in key manufacturing markets such as Europe, North America, and Southeast Asia [22].
- The company will continue working with industry partners to build an open, efficient industrial cooperation system, promoting large-scale deployment of embodied AI in global manufacturing [22].
博原资本 sets up wholly-owned platform "博原启世," having partnered with 银河通用 to found "博银合创"
IPO早知道· 2025-06-18 01:26
Core Viewpoint
- The establishment of "博银合创" marks a significant step toward the industrialization of embodied artificial intelligence, focusing on complex manufacturing processes and the development of agile robots to advance automation in the manufacturing sector [2][4][23].

Group 1: Company Initiatives
- 博原资本 has launched a wholly-owned platform, "博原启世," to strategically incubate and rebuild the ecosystem of embodied artificial intelligence [2][12].
- A joint venture, "博银合创," has been formed with 银河通用 to focus on core manufacturing scenarios such as complex assembly and intelligent quality inspection [2][8].
- 博银合创 aims to build a complete growth path from early incubation to independent financing and commercialization, becoming a globally competitive intelligent-manufacturing enterprise [9][14].

Group 2: Technological Advancements
- The global industrial robot market is projected to exceed $80 billion by 2025, with embodied-intelligence-driven collaborative robots expected to capture over half of it [4].
- 博银合创 will leverage 银河通用's self-developed simulation training and synthetic-data technology to build a standardized, modular training and deployment system for rapid iteration and large-scale rollout of robotic products [8][12].
- The company targets key gaps in traditional automation: high-complexity manufacturing processes that require flexible, precise solutions [8][11].

Group 3: Strategic Collaborations
- 博银合创 has signed a strategic cooperation memorandum with UAES to establish a joint laboratory, "RoboFab," focused on pilot applications of embodied artificial intelligence in typical manufacturing processes [19][20].
- 博原启世 will connect cutting-edge technology companies with industrial resources, expanding collaborative practice into a tailored network for embodied artificial intelligence [15][21].
- The OpenBosch innovation platform will play a crucial role in 博原启世's global collaboration system, providing scenario matching and pilot support for incubation projects [21].

Group 4: Future Outlook
- 博原资本 plans to deepen its work on technology standards, production-line modules, and data systems to promote localized deployment of embodied robots in major manufacturing markets such as Europe, North America, and Southeast Asia [23][24].
- The future strategy includes building an open, efficient industrial cooperation system to enable large-scale deployment of embodied artificial intelligence in global manufacturing [24].