Embodied Multimodal Large Models

Tracing the technical development path of embodied intelligence through nearly 1,000 works!
具身智能之心 · 2025-09-05 00:45
Core Insights
- The article discusses the evolution and challenges of embodied intelligence, emphasizing the need for a comprehensive understanding of its development, the issues it faces, and future directions [3][4].

Group 1: Robotic Manipulation
- The survey on robotic manipulation highlights the transition from mechanical programming to embodied intelligence, focusing on the evolution from simple grippers to dexterous multi-fingered hands [5][6].
- Key challenges in dexterous manipulation include data collection methods such as simulation, human demonstration, and teleoperation, as well as skill learning frameworks like imitation learning and reinforcement learning [5][6].

Group 2: Navigation and Manipulation
- The discussion on robotic navigation emphasizes the importance of physics simulators in addressing the high costs and data scarcity of real-world training, with a focus on Sim-to-Real transfer challenges [9][15].
- The evolution of navigation techniques is outlined, transitioning from explicit memory to implicit memory, and the role of various simulators in narrowing the Sim-to-Real gap is analyzed [15][16].

Group 3: Multimodal Large Models
- The exploration of embodied multimodal large models (EMLMs) reveals their potential to bridge gaps between perception, cognition, and action, driven by advances in large model technologies [17][19].
- Challenges identified include cross-modal alignment difficulties, high computational resource demands, and weak domain generalization [19].

Group 4: Teleoperation and Data Collection
- The survey on teleoperation of humanoid robots discusses the integration of human cognition with robotic capabilities, particularly in hazardous environments, while addressing challenges such as high degrees of freedom and communication limitations [29][30].
- Key components of teleoperation systems include human state measurement, motion retargeting, and multimodal feedback mechanisms [30][33].
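As a concrete illustration of one teleoperation component named above, motion retargeting maps measured human joint angles into a robot's joint space. The sketch below is a minimal, hypothetical example: the joint limits, the 3-DoF arm, and the linear rescaling scheme are all illustrative assumptions, not taken from any of the surveyed systems (real retargeting typically also handles kinematic differences, not just range differences).

```python
import numpy as np

# Hypothetical joint limits for a toy 3-DoF arm (radians); a real robot's
# limits come from its URDF or spec sheet.
ROBOT_LOWER = np.array([-1.5, -2.0, -2.5])
ROBOT_UPPER = np.array([1.5, 2.0, 2.5])
HUMAN_LOWER = np.array([-3.0, -3.0, -3.0])
HUMAN_UPPER = np.array([3.0, 3.0, 3.0])

def retarget(human_angles: np.ndarray) -> np.ndarray:
    """Map measured human joint angles into the robot's joint range.

    Normalizes each human angle to [0, 1] within the human range,
    rescales into the robot range, then clips to hard joint limits.
    """
    normalized = (human_angles - HUMAN_LOWER) / (HUMAN_UPPER - HUMAN_LOWER)
    robot = ROBOT_LOWER + normalized * (ROBOT_UPPER - ROBOT_LOWER)
    return np.clip(robot, ROBOT_LOWER, ROBOT_UPPER)

# A pose at the middle of the human range maps to the middle of the
# robot range; out-of-range measurements are clipped at the limits.
mid = retarget(np.array([0.0, 0.0, 0.0]))
print(mid)  # → [0. 0. 0.]
```

Clipping is the design choice worth noting: it guarantees the commanded pose never violates the robot's hard limits even when human state measurement is noisy.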
Group 5: Vision-Language-Action Models
- The analysis of Vision-Language-Action (VLA) models covers their evolution from cross-modal learning architectures to the integration of visual language models and action planners [33][36].
- The article identifies core challenges in real-time control, multimodal action representation, and system scalability, while proposing future directions for adaptive AI and cross-entity generalization [36][41].
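The coarse VLA structure described above (a vision encoder and a language encoder feeding a shared representation, decoded by an action head) can be sketched as a toy forward pass. Everything here is an illustrative assumption: the tiny feature dimensions, the random linear layers, and the tanh action head stand in for pretrained vision-language backbones and learned action planners with billions of parameters.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy dimensions: 8-d visual features, 4-d language embedding,
# 16-d fused hidden state, and a 7-DoF continuous arm action.
VIS_DIM, LANG_DIM, HID_DIM, ACT_DIM = 8, 4, 16, 7

W_vis = rng.normal(size=(VIS_DIM, HID_DIM))
W_lang = rng.normal(size=(LANG_DIM, HID_DIM))
W_act = rng.normal(size=(HID_DIM, ACT_DIM))

def vla_policy(image_feat: np.ndarray, lang_emb: np.ndarray) -> np.ndarray:
    """Fuse visual and language features, then decode one action.

    Projects both modalities into a shared hidden space (the cross-modal
    alignment step) and applies an action head; tanh squashes each joint
    command into [-1, 1].
    """
    fused = np.tanh(image_feat @ W_vis + lang_emb @ W_lang)
    return np.tanh(fused @ W_act)

action = vla_policy(rng.normal(size=VIS_DIM), rng.normal(size=LANG_DIM))
print(action.shape)  # → (7,)
```

The real-time control challenge the article raises shows up even in this sketch: every control tick requires a full forward pass through the fused model, which is why practical VLA systems care about inference latency and action-chunking strategies.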
Shenwan Hongyuan's Galaxy General investment project sets a new financing record
Shenwan Hongyuan Securities, Shanghai Beijing West Road Branch · 2025-07-09 02:45
Group 1
- The core viewpoint of the article highlights the successful financing of Beijing Galaxy General Robot Co., Ltd., which raised 1.1 billion RMB, setting records in the field of embodied large model robots [1].
- The financing round was led by CATL and Puxuan Capital, attracting major domestic state-owned investment platforms, strategic and industrial investors, and internationally renowned investment institutions [1].
- Since its establishment in May 2023, Galaxy General has accumulated over 2.4 billion RMB in financing, receiving high recognition from market-oriented investment institutions, industrial capital, research institution funds, and state-owned investment platforms [1].

Group 2
- Galaxy General focuses on the research and innovation of embodied multimodal large model general-purpose robots [1].
- The company launched the world's first humanoid robot smart pharmacy solution in March 2024, achieving full automation of drug inventory, replenishment, delivery, and packaging processes, with orders from 100 stores already received [1].
- In the industrial sector, Galaxy General has collaborated with internationally renowned automotive companies to execute tasks such as sunroof glass handling and real-time anomaly processing, all based on visual guidance without relying on QR codes [1][2].