Multimodal Perception

Beijing to Significantly Raise the Level of Cultural Relics Research, Interpretation and Smart Display
Xin Hua She· 2025-08-24 14:14
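Xinhua News Agency, Beijing, August 24. Reporters learned from the Beijing Municipal Cultural Heritage Bureau on the 24th that, to step up scientific and technological innovation and improve the quality and efficiency of cultural relics protection, inheritance and utilization, the bureau recently issued the Beijing Cultural Relics Science and Technology Innovation Development Plan (2025-2035). The plan proposes to significantly raise the level of cultural relics research, interpretation and smart display by 2035, and to empower Beijing's development as a "City of Museums" through the deep integration of cultural relics and technology.
According to the bureau, Beijing's cultural relics work still has shortcomings in research system building and technological innovation, mainly insufficient innovation capacity, a low level of technology application, and incomplete infrastructure. Specifically, in the protection of immovable cultural relics, technical means that fit traditional conservation principles are not innovative enough; museums undersupply offerings that use 5G, big data, artificial intelligence and other new technologies to advance cultural experiences, build application scenarios and promote cultural consumption; and cultural heritage and museum institutions lack high-level scientific and technological talent and industry leaders.
Under the plan, Beijing will continue to advance innovative applications of frontier technologies such as artificial intelligence, big data, virtual reality and multimodal perception in the cultural heritage and museum field. It will make full use of the results of the fourth national cultural relics census and pool data from previous censuses to form a data foundation for immovable cultural relics. It will step up 3D data collection for immovable cultural relics, build a digital resource library of ancient buildings, and develop new materials and techniques suited to the conservation, restoration and maintenance of timber-structure ancient buildings.
The plan proposes strengthening science-based protection of museum collections, with a focus on fragile and easily damaged materials such as silk textiles, calligraphy and paintings ...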
Humanoid Occupancy: The First Multimodal Perception System for Humanoid Robots, Tackling Kinematic Interference and Occlusion
具身智能之心· 2025-08-21 00:03
Core Viewpoint
- The article discusses the rapid development of humanoid robot technology, emphasizing the introduction of a generalized multimodal occupancy perception system called Humanoid Occupancy, which enhances environmental understanding for humanoid robots [2][3][6].
Group 1: Humanoid Robot Technology
- Humanoid robots are considered the most complex form of robots, embodying aspirations for advanced robotics and artificial intelligence [6].
- The technology is at a critical breakthrough stage, with ongoing iterations in motion control and autonomous perception [6].
Group 2: Humanoid Occupancy System
- The Humanoid Occupancy system integrates hardware, software components, data collection devices, and a specialized labeling process to provide comprehensive environmental understanding [3].
- It utilizes advanced multimodal fusion technology to generate grid-based occupancy outputs that encode spatial occupancy states and semantic labels [3].
- The system addresses unique challenges such as kinematic interference and occlusion, establishing effective sensor layout strategies [3].
Group 3: Research and Development
- A panoramic occupancy dataset specifically designed for humanoid robots has been developed, providing valuable benchmarks and resources for future research [3].
- The network architecture combines multimodal features and temporal information to ensure robust perception capabilities [3].
Group 4: Live Broadcast and Expert Insights
- A live broadcast is scheduled to discuss humanoid robot motion control, multimodal perception systems, autonomous movement, and operational data [6][8].
- The session will feature insights from Zhang Qiang, the academic committee director at the Beijing Humanoid Robot Innovation Center [8].
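To make the "grid-based occupancy output" idea concrete, here is a minimal sketch of a voxel grid in which each cell stores an occupancy state plus a semantic label. It is not the Humanoid Occupancy implementation: the class names, the label set, and the sparse dictionary storage are all assumptions chosen for illustration, and a real system would more likely emit a dense tensor from a fusion network.

```python
import math
from dataclasses import dataclass
from enum import IntEnum
from typing import Dict, Tuple


class Semantic(IntEnum):
    """Hypothetical label set; the article does not list the real classes."""
    UNKNOWN = 0
    FLOOR = 1
    OBSTACLE = 2
    PERSON = 3


@dataclass(frozen=True)
class Voxel:
    occupied: bool
    label: Semantic


Index = Tuple[int, int, int]
Point = Tuple[float, float, float]


class OccupancyGrid:
    """Sparse voxel grid: only cells that were marked are stored."""

    def __init__(self, voxel_size_m: float) -> None:
        if voxel_size_m <= 0:
            raise ValueError("voxel_size_m must be positive")
        self.voxel_size_m = voxel_size_m
        self._cells: Dict[Index, Voxel] = {}

    def index_of(self, point: Point) -> Index:
        """Map a metric point (x, y, z) to the integer index of its voxel."""
        s = self.voxel_size_m
        x, y, z = point
        return (math.floor(x / s), math.floor(y / s), math.floor(z / s))

    def mark(self, point: Point, label: Semantic) -> None:
        """Record that the voxel containing `point` is occupied by `label`."""
        self._cells[self.index_of(point)] = Voxel(True, label)

    def query(self, point: Point) -> Voxel:
        """Return the stored voxel, or an unoccupied UNKNOWN voxel if unseen."""
        return self._cells.get(self.index_of(point), Voxel(False, Semantic.UNKNOWN))


grid = OccupancyGrid(voxel_size_m=0.5)
grid.mark((1.2, 0.3, 0.0), Semantic.OBSTACLE)
print(grid.query((1.4, 0.1, 0.2)))  # same 0.5 m voxel as the marked point
print(grid.query((3.0, 3.0, 0.0)))  # never observed
```

A planner could then treat any voxel with `occupied=True` as blocked and use the label to decide, for example, whether to wait for a person or route around a static obstacle.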
Heart of Autonomous Driving (自动驾驶之心) Project and Paper Guidance Is Here
自动驾驶之心· 2025-08-07 12:00
Core Viewpoint
- The article announces the launch of the "Heart of Autonomous Driving" project and paper guidance, aimed at assisting students facing challenges in their research and development efforts in the field of autonomous driving [1].
Group 1: Project and Guidance Overview
- The project aims to provide support for students who encounter difficulties in their research, such as environment configuration issues and debugging challenges [1].
- Last year's outcomes were positive, with several students successfully publishing papers in top conferences like CVPR and ICRA [1].
Group 2: Guidance Directions
- **Direction 1**: Focus on multi-modal perception and computer vision, end-to-end autonomous driving, large models, and BEV perception. The guiding teacher has published over 30 papers in top AI conferences with a citation count exceeding 6000 [3].
- **Direction 2**: Emphasis on 3D Object Detection, Semantic Segmentation, Occupancy Prediction, and multi-task learning based on images or point clouds. The guiding teacher is a top-tier PhD with multiple publications in ECCV and CVPR [5].
- **Direction 3**: Concentration on end-to-end autonomous driving, OCC, BEV, and world model directions. The guiding teacher is also a top-tier PhD with contributions to several mainstream perception solutions [6].
- **Direction 4**: Focus on NeRF / 3D GS neural rendering and 3D reconstruction. The guiding teacher has published four CCF-A class papers, including two in CVPR and two in IEEE Transactions [7].
The AI Evolution of Assisted Driving: At the Historic Turning Point of a Generational Leap in Capability
2025-08-05 03:15
Summary of Key Points from the Conference Call
Industry Overview
- The autonomous driving industry is at a pivotal point, transitioning from L2 to L3 commercialization, with manufacturers that do full-stack in-house R&D and third-party suppliers gaining a competitive edge [1][4]
- Major players in the autonomous driving sector include Tesla, Xpeng, Li Auto, NIO, and third-party suppliers like Momenta and Yunrong Qixing [1][5]
Core Insights and Arguments
- The development of cloud-based intelligent computing centers and mass production of high-performance chips are crucial drivers for the industry [1]
- Companies are investing heavily in R&D, with Tesla's HW5.0 featuring 4D millimeter-wave radar and Li Auto's L series equipped with lidar [6][10]
- Regulatory policies significantly impact the industry, with L2 standardization and multiple regions opening L4 commercialization pilot projects [8]
Technological Developments
- Xpeng is shifting to a pure vision solution to enhance visual perception and reduce hardware costs, while Huawei's ADS 4.0 supports high-speed L3 commercialization [3][12]
- The VLA (vision-language-action) model integrates visual, language, and behavioral modules to optimize vehicle decision-making [3]
- The industry is witnessing a shift towards data-driven development, with companies showcasing their cloud-based world models and parameter scales [29]
Competitive Landscape
- Leading companies in autonomous driving include Tesla, Xpeng, Li Auto, NIO, and Xiaomi, with significant contributions from domestic suppliers like SUTENG, Hesai Technology, and others [5][26]
- Traditional manufacturers are increasingly opting for third-party solutions to shorten product cycles and reduce time costs [17]
R&D and Investment Trends
- Companies like NIO have invested over 10 billion yuan in R&D for three consecutive years, but face challenges in achieving commercial breakthroughs [14]
- Xiaomi's growth in the autonomous driving sector is driven by its potential rather than current capabilities, with expectations for its models to feature lidar [16]
Consumer Perception and Market Trends
- The development of intelligent driving technology includes advancements in features like high-speed NOA and parking functionalities [32]
- Safety features are evolving, with the introduction of proactive avoidance systems to enhance driving experience [33]
Investment Opportunities
- Investors should focus on leading autonomous driving solution providers and manufacturers with full-stack in-house R&D, especially as regulatory frameworks evolve [36]
Chinese Institute of Electronics: China's Humanoid Robots Rank in the Global First Tier Overall
Xin Lang Cai Jing· 2025-08-02 13:55
Core Insights
- The article highlights significant advancements in humanoid robotics in China, positioning the country among the global leaders in this field [1]
Technological Innovation
- Major breakthroughs have been achieved in core technologies such as large robot models, intelligent collaborative control, human-machine interaction, and multimodal perception [1]
- The motion capabilities of domestic robots have been significantly enhanced, with AI control algorithms optimized to achieve millisecond-level action response, improving stability, flexibility, and coherence [1]
Industry Development
- The sales volume of industrial robots in China has increased from 70,000 units in 2015 to 302,000 units in 2024, maintaining its status as the largest industrial robot market globally for 12 consecutive years [1]
- China is the world's largest robot producer, with industrial robot output rising from 33,000 units in 2015 to 556,000 units in 2024 [1]
From Tech Showcase to Real Breakthrough: Decoding the Core Value of WAIC 2025
3 6 Ke· 2025-08-01 03:49
Core Insights
- The World Artificial Intelligence Conference (WAIC) 2025 showcases the transition of AI from laboratory experiments to practical applications in various industries and daily life, emphasizing its potential to change societal dynamics rather than just demonstrating capabilities [1][3][21]
- The event highlights the importance of understanding how these technologies can integrate into everyday life, serving as a driving force for progress [3][19]
Technological Breakthroughs
- AI technologies are evolving from simple mechanical responses to more complex interactions, with robots now capable of understanding human emotions and actions, as demonstrated by the GR-3 humanoid robot designed for companionship and care [4][7]
- The introduction of advanced AI systems, such as Baidu's NOVA digital human technology, allows for rapid cloning and collaborative content creation, breaking traditional boundaries in content production [6][10]
Industry Empowerment
- AI is moving beyond experimental stages to become integral in sectors like entertainment, education, and healthcare, enhancing user experiences and creating new business models [10][11]
- In the entertainment industry, AI-driven virtual characters are revolutionizing content creation, significantly reducing production costs and time [11][13]
- The education sector is witnessing a shift where AI acts as a personalized learning partner, adapting to student needs and enhancing engagement through interactive methods [14][17]
- In healthcare, AI innovations are optimizing drug development and improving diagnostic processes, showcasing a transformative impact on medical services [16][19]
Emotional AI and Market Growth
- The emotional computing and human-like interaction market is projected to grow at an annual rate of 35%, with significant potential in healthcare, education, and customer service sectors [17]
- The integration of emotional AI into daily life is expected to redefine human-machine interactions, making AI a more relatable and supportive presence [9][19]
Social Impact and Future Directions
- The AI Empowerment for Sustainable Development Initiative emphasizes the role of AI in addressing global challenges such as green transformation and equitable healthcare and education [19][22]
- The advancements in AI are not just about efficiency but also about fostering social equity and enhancing the quality of life, positioning AI as a true collaborator in human civilization [21][22]
Mastering Industrial Challenges! Lingxin Qiaoshou to Launch an All-New "Industrial Master" Dexterous Hand
机器人大讲堂· 2025-07-11 10:35
Core Viewpoint
- The article highlights the imminent mass production of humanoid robots in 2025 and the significant advancements in dexterous hands, particularly the Linker Hand series by Lingxin Qiaoshou, which is set to revolutionize the industrial dexterous hand market with high degrees of freedom and precision [1][2].
Group 1: Product Development
- The Linker Hand series, including models L10, L20, and L30, features over 20 degrees of freedom, showcasing excellent precision and performance, enabling the completion of complex tasks in various industrial settings [1].
- Lingxin Qiaoshou is set to launch two new high-performance dexterous hands, Linker Hand L6 and L20 industrial versions, designed specifically for industrial applications, with L6 having 6 active degrees of freedom and L20 having 17 [2].
Group 2: Technological Advancements
- The new "super strong electric cylinder" drive module in the industrial dexterous hands achieves a drive efficiency of over 90%, which is more than double that of traditional products, with a thrust capacity of 200N and fingertip force of 20N, meeting high load requirements in industrial environments [4].
- The super strong electric cylinder has a lifespan exceeding one million cycles, which is 2-3 times that of competitors, ensuring efficient operation in high-frequency repetitive tasks [4].
Group 3: Material and Reliability
- Lingxin Qiaoshou employs innovative smart materials that are lightweight, strong, and durable, achieving industrial-grade quality to withstand the rigors of production environments [5].
Group 4: Market Potential and Future Trends
- The industrial environment's characteristics, such as clear physical boundaries and standardized workflows, make it suitable for the application of dexterous hands, which can adapt to various disturbances through multi-modal sensing capabilities [7].
- The value of dexterous hands is evolving from merely mimicking human fingers to becoming decision-making execution terminals in flexible manufacturing, reflecting a shift in industrial evolution towards "flexibility as competitiveness" [7].
[In-Depth] Dexterous Hands Keep Iterating: Watch for Marginal Gains as Technical Routes Converge
东吴汽车黄细里团队· 2025-06-27 15:44
Core Viewpoint
- The dexterous hand market is expected to grow significantly, reaching $1.706 billion in 2024 and projected to increase to $1.921 billion in 2025 and $3.036 billion by 2030, driven by the demand for humanoid robots that require more advanced dexterous hands with higher degrees of freedom [2][11].
Market Overview
- The dexterous hand market is anticipated to reach 760,100 units in 2024, with projections of 861,800 units in 2025 and 1,412,100 units by 2030, reflecting 2025-2030 compound annual growth rates (CAGR) of 10.38% in unit volume and 9.59% in market value [28][29].
Driving Solutions
- The mainstream driving solutions include underactuated, external/mixed, and electric drives, with a shift from coreless (hollow cup) motors to brushless gear motors. Underactuated designs sacrifice precision for cost reduction and faster deployment, while electric drives are favored for their modular design and high precision [3][11][45].
- Tesla's third-generation dexterous hand has replaced some coreless motors with brushless gear motors, indicating a potential shift in motor solutions [3][11].
Transmission Solutions
- Transmission solutions encompass gear/worm gear, linkages, screws, and tendon-driven systems, each with its advantages and disadvantages. The tendon + screw composite transmission can enhance transmission precision while maintaining flexibility, exemplified by Tesla's third-generation dexterous hand [4][5][51].
Perception Solutions
- Multi-modal perception is a defined trend, with force/torque sensors evolving towards strain gauge types and flexible sensors focusing on enhancing sensitivity and stability. MEMS pressure sensors, particularly resistive types, are becoming more prevalent in dexterous hand applications [6][66][74].
Industry Trends
- Both domestic and international products are increasingly pursuing high degrees of freedom and multi-modal perception, highlighting the industry's development trends. Investment recommendations include companies involved in reducers and screw chains, such as Fuda Co., Zhejiang Rongtai, and Wuzhou Xinchun [8][11].
Future Outlook
- The iteration of Tesla's dexterous hand clearly indicates a mainstream shift towards tendon-driven systems, achieving a doubling of degrees of freedom, transmission upgrades, drive switching, and breakthroughs in multi-modal perception [7][11].
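The growth rates quoted in the market overview can be checked against the forecasts given in the same summary. The sketch below assumes the standard definition CAGR = (end / start)^(1 / years) - 1 and a five-year window from 2025 to 2030; that window is an inference from the numbers, since the summary does not state it. `cagr` is an illustrative helper name.

```python
def cagr(start_value: float, end_value: float, years: int) -> float:
    """Compound annual growth rate between two values `years` apart."""
    if start_value <= 0 or end_value <= 0 or years <= 0:
        raise ValueError("values and years must be positive")
    return (end_value / start_value) ** (1 / years) - 1


# Forecasts quoted in the summary above.
units_2025, units_2030 = 861_800, 1_412_100
value_2025, value_2030 = 1.921, 3.036  # USD billions

print(f"unit volume CAGR 2025-2030: {cagr(units_2025, units_2030, 5):.2%}")   # 10.38%
print(f"market value CAGR 2025-2030: {cagr(value_2025, value_2030, 5):.2%}")  # 9.59%
```

Under that assumption, 10.38% lines up with unit shipments and 9.59% with market value, both over 2025-2030.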
Humanoid Robot Industry In-Depth Report: Dexterous Hands Keep Iterating; Watch for Marginal Gains as Technical Routes Converge
Soochow Securities· 2025-06-27 07:32
Investment Rating
- The report recommends "Buy" for companies involved in the reduction gear and screw chain sectors, specifically highlighting 福达股份 (Fuda Co.), and suggests attention to micro screw chain companies like 浙江荣泰 (Zhejiang Rongtai), 五洲新春 (Wuzhou Xinchun), and 震裕科技 (Zhenyu Technology) [90][92].
Core Insights
- The downstream scenarios are driving the evolution of dexterous hands towards humanoid hands, with a broad market outlook. The dexterous hand market is expected to reach USD 1.706 billion in 2024, growing to USD 1.921 billion in 2025 and USD 3.036 billion by 2030 [2][20].
- The report identifies the main driving solutions as underactuated, external/mixed, and electric drives, with a shift from coreless (hollow cup) motors to brushless gear motors [2][35].
- The transmission solutions include gear/worm gear, linkages, screws, and tendon-driven systems, each with its advantages and disadvantages, with a trend towards tendon and screw combinations for improved flexibility and precision [2][39][49].
- Multi-modal perception is established as a trend, with advancements in force/torque sensors, flexible sensors, and MEMS pressure sensors [2][59][65].
Summary by Sections
1. Dexterous Hands: The Interface Between Humanoid Robots and the External World
- Dexterous hands are a type of end effector that replaces traditional tools with claws, evolving from two-fingered to five-fingered humanoid designs to meet complex application requirements [11][12].
2. Diverse Dexterous Hand Solutions, Routes Still Unconsolidated
- Dexterous hands can be categorized by degrees of freedom, drive structure, and sensing technology, with underactuated designs being more prevalent due to lower costs and broader applications [17][24][30].
3. Future Trends from Tesla's Dexterous Hand Iteration
- Tesla's third-generation dexterous hand has doubled its degrees of freedom to 22, with significant changes in motor and transmission solutions, indicating a trend towards higher flexibility and precision [84][87].
New from Tongji University! A Comprehensive Survey of Embodied Navigation with Multimodal Perception
具身智能之心· 2025-06-25 13:52
Core Insights
- The article presents a comprehensive analysis of multimodal navigation methods, emphasizing the integration of various sensory modalities such as visual, audio, and language processing to enhance navigation capabilities [4][32].
Group 1: Research Background
- Goal-oriented navigation is a fundamental challenge in autonomous systems, requiring agents to navigate complex environments to reach specified targets. Over the past decade, navigation technology has evolved from simple geometric path planning to complex multimodal reasoning [7][8].
- The article categorizes goal-oriented navigation methods based on reasoning domains, revealing commonalities and differences among various tasks, thus providing a unified framework for understanding navigation methods [4].
Group 2: Navigation Tasks
- Navigation tasks have increased in complexity, evolving from simple point navigation (PointNav) to more complex multimodal paradigms such as ObjectNav, ImageNav, and AudioGoalNav, each requiring different levels of semantic understanding and reasoning [8][12].
- The formal definition of navigation tasks is framed as a decision-making process where agents must reach specified goals in unknown environments through a series of actions [8].
Group 3: Datasets and Evaluation
- The Habitat-Matterport 3D (HM3D) dataset is highlighted as the largest collection, encompassing 1,000 reconstructed buildings and covering 112.5k square meters of navigable area, with varying complexities across other datasets like Gibson and Matterport3D [9].
- Evaluation metrics for navigation tasks include success rate (SR), success weighted by path length (SPL), and distance-related metrics, which assess the efficiency and effectiveness of navigation strategies [14].
Group 4: Methodologies
- Explicit representation methods, such as ANM and LSP-UNet, construct and maintain environmental representations to support path planning, while implicit representation methods, like DD-PPO and IMN-RPG, encode spatial understanding without explicit mapping [15][16].
- Object navigation tasks are modularly approached, breaking down the task into mapping, strategy, and path planning, with methods like Sem-EXP and PEANUT focusing on semantic understanding [17].
Group 5: Challenges and Future Work
- Current challenges in multimodal navigation include the effective integration of sensory modalities, the transfer from simulation to real-world applications, and the development of robust multimodal representation learning methods [31][32].
- Future work is suggested to focus on enhancing human-robot interaction, developing balanced multimodal representation learning methods, and addressing the computational efficiency of navigation systems [32].
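The SR and SPL metrics mentioned under Datasets and Evaluation are simple to compute once each episode records whether the agent succeeded, how long the shortest path was, and how far the agent actually travelled. The sketch below uses the commonly cited definition SPL = (1/N) Σ S_i · l_i / max(p_i, l_i), where S_i is the success flag, l_i the shortest-path length and p_i the agent's path length. The summary does not spell out the formula, so treat this as the usual convention and not necessarily the survey's exact wording; `Episode` and the field names are hypothetical.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Episode:
    success: bool
    shortest_path_m: float  # l_i: shortest-path distance from start to goal
    agent_path_m: float     # p_i: length of the path the agent actually took

    def __post_init__(self) -> None:
        if self.shortest_path_m <= 0 or self.agent_path_m < 0:
            raise ValueError("path lengths must be positive / non-negative")


def success_rate(episodes: Sequence[Episode]) -> float:
    """SR: fraction of episodes in which the agent reached the goal."""
    if not episodes:
        raise ValueError("need at least one episode")
    return sum(e.success for e in episodes) / len(episodes)


def spl(episodes: Sequence[Episode]) -> float:
    """SPL: success weighted by the ratio of shortest to actual path length."""
    if not episodes:
        raise ValueError("need at least one episode")
    total = 0.0
    for e in episodes:
        if e.success:
            total += e.shortest_path_m / max(e.agent_path_m, e.shortest_path_m)
    return total / len(episodes)


episodes = [
    Episode(success=True, shortest_path_m=10.0, agent_path_m=10.0),   # optimal
    Episode(success=True, shortest_path_m=10.0, agent_path_m=20.0),   # detour
    Episode(success=False, shortest_path_m=10.0, agent_path_m=5.0),   # gave up
]
print(f"SR  = {success_rate(episodes):.3f}")  # 0.667
print(f"SPL = {spl(episodes):.3f}")           # 0.500
```

The example shows why SPL is stricter than SR: the second episode counts as a full success for SR but contributes only 0.5 to SPL because the agent walked twice the shortest distance.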