LiDAR

ICML'25 | Unified Multi-Modal 3D Panoptic Segmentation: How Do Images and LiDAR Align and Complement Each Other?
自动驾驶之心· 2025-07-16 11:11
Core Insights
- The article discusses the IAL (Image-Assists-LiDAR) framework, which enhances multi-modal 3D panoptic segmentation by effectively combining LiDAR and camera data [2][3].

Technical Innovations
- IAL introduces three core technological breakthroughs [7]:
  1. An end-to-end framework that directly outputs panoptic segmentation results without complex post-processing.
  2. A novel PieAug paradigm for synchronized cross-modal data augmentation, improving training efficiency and generalization.
  3. Precise feature fusion through Geometric-guided Token Fusion (GTF) and Prior-driven Query Generation (PQG), achieving accurate alignment and complementarity between LiDAR and image features.

Problem Identification and Solutions
- Existing multi-modal segmentation methods often augment only the LiDAR data, leading to misalignment with the camera images and degrading feature fusion [9].
- The "cake-cutting" strategy partitions scenes into fan-shaped slices along the angle and height axes, creating paired point-cloud and multi-view image units (see the sketch after this summary) [9].
- The PieAug strategy is compatible with existing LiDAR-only augmentation methods while achieving cross-modal alignment [9].

Feature Fusion Module
- The GTF feature fusion module aggregates image features accurately by projecting physical points, addressing the significant positional bias of voxel-level projections [10].
- Traditional methods overlook the receptive-field differences between sensors, limiting feature expressiveness [10].

Query Initialization
- The PQG query initialization employs a three-pronged query-generation mechanism to improve recall for distant small objects [12].
- This mechanism combines geometric-prior queries, texture-prior queries, and no-prior queries to improve detection of challenging samples [12].

Model Performance
- IAL achieved state-of-the-art (SOTA) performance on the nuScenes and SemanticKITTI datasets, surpassing previous methods by up to 5.1% in PQ [16].
- The model's metrics include a PQ of 82.0, RQ of 91.6, and mIoU of 79.9, a significant improvement over competitors [14].

Visualization Results
- IAL shows notable gains in distinguishing adjacent targets, detecting distant targets, and reducing false positives and false negatives [17].
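The "cake-cutting" idea behind PieAug can be illustrated with a short sketch: partition a point cloud into fan-shaped slices by azimuth angle and height, then mix matching slices between two scenes. The snippet below is a minimal, illustrative sketch only; the bin counts, the slice-swap rule, and the function names are assumptions, not the paper's implementation.

```python
import numpy as np

def pie_slices(points, num_azimuth_bins=8, num_height_bins=2):
    """Assign every LiDAR point (rows of an (N, 3+) array) to a fan-shaped
    slice indexed by azimuth angle and height bin."""
    azimuth = np.arctan2(points[:, 1], points[:, 0])                      # in [-pi, pi]
    az_bin = np.floor((azimuth + np.pi) / (2 * np.pi) * num_azimuth_bins)
    az_bin = np.clip(az_bin, 0, num_azimuth_bins - 1).astype(int)

    z = points[:, 2]
    z_edges = np.linspace(z.min(), z.max() + 1e-6, num_height_bins + 1)
    h_bin = np.clip(np.digitize(z, z_edges) - 1, 0, num_height_bins - 1)

    return az_bin * num_height_bins + h_bin                               # slice id per point

# Toy mixing step: replace one slice of scene A with the matching slice of scene B
# (real PieAug would apply the same slicing to the paired multi-view images).
scene_a = np.random.randn(1000, 4)   # x, y, z, intensity
scene_b = np.random.randn(1000, 4)
ids_a, ids_b = pie_slices(scene_a), pie_slices(scene_b)
mixed = np.concatenate([scene_a[ids_a != 3], scene_b[ids_b == 3]], axis=0)
```

Because the slice boundaries are defined geometrically, the same partition can be applied to the corresponding camera views, which is what keeps the two modalities aligned during augmentation.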
Latest Survey from Tsinghua University! Multi-Sensor Fusion Perception in Embodied AI: Background, Methods, and Challenges
具身智能之心· 2025-06-27 08:36
Core Insights
- The article emphasizes the significance of embodied AI and multi-sensor fusion perception (MSFP) as a critical pathway toward general artificial intelligence (AGI) through real-time environmental perception and autonomous decision-making [3][4].

Group 1: Importance of Embodied AI and Multi-Sensor Fusion
- Embodied AI is intelligence that operates through physical entities, enabling autonomous decision-making and action in dynamic environments, with applications in autonomous driving and robotic swarm intelligence [3].
- Multi-sensor fusion is essential for robust perception and accurate decision-making in embodied AI systems, integrating data from sensors such as cameras, LiDAR, and radar to achieve comprehensive environmental awareness [3][4].

Group 2: Limitations of Current Research
- Existing AI-based MSFP methods have succeeded in fields like autonomous driving but face inherent challenges in embodied AI applications, such as the heterogeneity of cross-modal data and temporal asynchrony between sensors [4][7].
- Current reviews often focus on a single task or research area, limiting their applicability to researchers in related fields [7][8].

Group 3: Structure and Contributions of the Research
- The article organizes MSFP research from multiple technical perspectives, covering perception tasks, sensor data types, popular datasets, and evaluation standards [8].
- It reviews point-level, voxel-level, region-level, and multi-level fusion methods, with attention to collaborative perception among multiple embodied agents and infrastructure [8][21].

Group 4: Sensor Data and Datasets
- The sensor types discussed include camera, LiDAR, and radar data, each with distinct advantages and challenges for environmental perception [10][12].
- Datasets used in MSFP research, such as KITTI, nuScenes, and Waymo Open, are presented with their modalities, scenarios, and frame counts [12][13][14].

Group 5: Perception Tasks
- Key perception tasks include object detection, semantic segmentation, depth estimation, and occupancy prediction, each contributing to the overall understanding of the environment [16][17].

Group 6: Multi-Modal Fusion Methods
- Multi-modal fusion methods are categorized into point-level, voxel-level, region-level, and multi-level fusion, each with specific techniques to improve perception robustness (a point-level example is sketched after this summary) [21][22][23][24][28].

Group 7: Multi-Agent Fusion Methods
- Collaborative perception techniques integrate data from multiple agents and infrastructure, addressing challenges such as occlusion and sensor failures [35][36].

Group 8: Time Series Fusion
- Time-series fusion is a key component of MSFP systems, enhancing perception continuity across time and space through query-based fusion methods [38][39].

Group 9: Multi-Modal Large Language Model (LLM) Fusion
- The integration of multi-modal data with LLMs is explored, covering tasks such as image description and cross-modal retrieval, with new datasets designed to enhance embodied AI capabilities [47][50].
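As a concrete reference for the point-level fusion category, the sketch below projects LiDAR points into a camera image with an extrinsic/intrinsic pair and appends the nearest-pixel image feature to each point feature. It is a generic illustration of the idea surveyed here, under assumed array shapes and function names, not any specific method from the paper.

```python
import numpy as np

def project_points_to_image(points_lidar, T_cam_from_lidar, K, image_hw):
    """Project LiDAR points (N, 3) into a camera image. Returns pixel
    coordinates (N, 2) and a boolean mask of points with positive depth
    that land inside the image. T_cam_from_lidar is a 4x4 extrinsic
    matrix and K a 3x3 intrinsic matrix."""
    pts_h = np.concatenate([points_lidar, np.ones((len(points_lidar), 1))], axis=1)
    pts_cam = (T_cam_from_lidar @ pts_h.T).T[:, :3]
    depth = pts_cam[:, 2]
    uv = (K @ pts_cam.T).T[:, :2] / np.clip(depth[:, None], 1e-6, None)
    h, w = image_hw
    valid = (depth > 0) & (uv[:, 0] >= 0) & (uv[:, 0] < w) & (uv[:, 1] >= 0) & (uv[:, 1] < h)
    return uv, valid

def point_level_fusion(point_feats, image_feats, uv, valid):
    """Append the nearest-pixel image feature (H, W, C) to each valid point
    feature; points that fall outside the image keep a zero image feature."""
    h, w, c = image_feats.shape
    gathered = np.zeros((len(point_feats), c), dtype=image_feats.dtype)
    u = np.clip(uv[valid, 0].astype(int), 0, w - 1)
    v = np.clip(uv[valid, 1].astype(int), 0, h - 1)
    gathered[valid] = image_feats[v, u]
    return np.concatenate([point_feats, gathered], axis=1)
```

Voxel-level and region-level fusion follow the same calibration-driven projection step but aggregate features per voxel or per proposal instead of per point.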
Latest Survey from Tsinghua University! How Is Multi-Sensor Fusion in Intelligent Driving Evolving Today?
自动驾驶之心· 2025-06-26 12:56
Group 1: Importance of Embodied AI and Multi-Sensor Fusion Perception
- Embodied AI is a crucial direction in AI development, enabling autonomous decision-making and action through real-time perception in dynamic environments, with applications in autonomous driving and robotics [2][3]
- Multi-sensor fusion perception (MSFP) is essential for robust perception and accurate decision-making in embodied AI, integrating data from sensors such as cameras, LiDAR, and radar to achieve comprehensive environmental awareness [2][3]

Group 2: Limitations of Current Research
- Existing AI-based MSFP methods have succeeded in fields like autonomous driving but face inherent challenges in embodied AI, such as the heterogeneity of cross-modal data and temporal asynchrony between sensors [3][4]
- Current reviews on MSFP often focus on a single task or research area, limiting their applicability to researchers in related fields [4]

Group 3: Overview of MSFP Research
- The paper covers the background of MSFP, including perception tasks, sensor data types, popular datasets, and evaluation standards [5]
- It reviews multi-modal fusion methods at different levels: point-level, voxel-level, region-level, and multi-level fusion [5]

Group 4: Sensor Data and Datasets
- Camera, LiDAR, and radar data are all critical for perception tasks, each with distinct advantages and limitations [7][10]
- Datasets used in MSFP research, such as KITTI, nuScenes, and Waymo Open, are presented with their characteristics and the types of data they provide [12][13][14]

Group 5: Perception Tasks
- Key perception tasks include object detection, semantic segmentation, depth estimation, and occupancy prediction, each contributing to the overall understanding of the environment [16][17]

Group 6: Multi-Modal Fusion Methods
- Multi-modal fusion methods are categorized into point-level, voxel-level, region-level, and multi-level fusion, each with specific techniques to improve perception robustness [20][21][22][27]

Group 7: Multi-Agent Fusion Methods
- Collaborative perception techniques integrate data from multiple agents and infrastructure, addressing challenges such as occlusion and sensor failures in complex environments [32][34]

Group 8: Time Series Fusion
- Time-series fusion is a key component of MSFP systems, enhancing perception continuity across time and space, with methods categorized into dense, sparse, and hybrid queries; the ego-motion alignment they all rely on is sketched after this summary [40][41]

Group 9: Multi-Modal Large Language Model (MM-LLM) Fusion
- MM-LLM fusion combines visual and textual data for complex tasks, with various methods designed to integrate perception, reasoning, and planning capabilities [53][54][57][59]
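Whether a time-series fusion method uses dense, sparse, or hybrid queries, it presupposes that observations from different timestamps are first brought into a common coordinate frame. The snippet below is a minimal sketch of that ego-motion alignment step for raw LiDAR sweeps, assuming 4x4 sensor-to-world pose matrices; it is illustrative only and not a method described in the survey.

```python
import numpy as np

def accumulate_sweeps(sweeps, poses, target_idx=-1):
    """Warp a list of LiDAR sweeps (each an (N_i, 3) array) into the
    coordinate frame of the target sweep using 4x4 sensor-to-world poses,
    then concatenate them: align across time first, fuse afterwards."""
    T_target_from_world = np.linalg.inv(poses[target_idx])
    aligned = []
    for pts, T_world_from_src in zip(sweeps, poses):
        T = T_target_from_world @ T_world_from_src          # src frame -> target frame
        pts_h = np.concatenate([pts, np.ones((len(pts), 1))], axis=1)
        aligned.append((T @ pts_h.T).T[:, :3])
    return np.concatenate(aligned, axis=0)
```

Query-based temporal methods perform an analogous warp on BEV or feature maps before attending across frames, which is what gives them continuity across time and space.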
Secretly Filed a Hong Kong Listing Application? A Shanxi-Born Post-80s Prodigy "Quietly Doing Big Things"
Sou Hu Cai Jing· 2025-05-19 15:31
Core Viewpoint
- Hesai Technology, a leading Chinese LiDAR manufacturer, has secretly submitted an application for a Hong Kong IPO, aiming to capitalize on its recent success and the growing market demand for LiDAR technology [1][6][13]

Company Overview
- Founded by Li Yifan and his team, Hesai Technology specializes in LiDAR for autonomous driving, robotics, and industrial automation [2][4]
- The company initially focused on laser gas-measurement systems before pivoting to the LiDAR market in 2016, competing against established players such as Velodyne [4][6]

Recent Developments
- Hesai Technology went public on NASDAQ in February 2023, raising $190 million at a market valuation of approximately $2.4 billion (around 16 billion RMB) [6][10]
- The company has secured significant contracts with major automotive manufacturers, including Baidu and BYD, and has established partnerships with 22 domestic and international automakers covering 120 vehicle models [7][10]

Financial Performance
- For 2024, Hesai Technology reported revenue of 2.08 billion RMB, a year-on-year increase of 10.7%, and achieved its first annual profit, with a net profit of approximately 137 million RMB [10][11]
- The company anticipates revenue of 3 billion to 3.5 billion RMB in 2025, with a gross margin of around 40% [11]

Market Outlook
- The global automotive LiDAR market is expected to grow significantly, with a projected market size of $6.92 billion in 2024, driven by increasing demand for autonomous driving technologies [11][13]
- Chinese brands are expected to capture 92% of the market share, with Hesai targeting a long-term market share of over 40% domestically and nearly 50% internationally [13]