Visual Localization Technology
ICCV'25! Baidu U-ViLAR: multi-task SOTA in visual localization, painlessly compatible with end-to-end frameworks
自动驾驶之心· 2025-07-14 11:30
Core Insights
- The article discusses the U-ViLAR framework developed by Baidu, which focuses on uncertainty-aware visual localization for autonomous driving, addressing the challenges posed by GNSS signal interference in urban environments [2][26].

Group 1: Importance of Visual Localization
- In urban settings, GNSS signals can be unreliable due to obstructions such as buildings and tunnels, making visual localization technology crucial [2].
- Traditional methods rely on feature matching between images and 3D maps, which are sensitive to changes in perspective and lighting, and are costly to construct at scale [2].

Group 2: U-ViLAR Framework
- U-ViLAR models perception and localization uncertainties separately, improving performance in both large-scale re-localization and fine localization tasks [2][26].
- The framework consists of two key modules: PU-Guided Association, which uses perception uncertainty to guide the association of visual and map features, and LU-Guided Registration, which utilizes localization uncertainty for precise registration [4].

Group 3: Technical Implementation
- The framework employs a shared backbone network (e.g., ResNet) for feature extraction from multi-view images, projecting the features into BEV (Bird's Eye View) space [6].
- It supports both HD maps and navigation maps, extracting BEV features from map elements using a U-Net structure [7].
- Cross-modal fusion is achieved through alternating self-attention and cross-attention mechanisms that enhance the visual and map BEV features [8].

Group 4: Experimental Results
- U-ViLAR demonstrated superior performance in fine-grained localization tasks on the nuScenes and SRoad datasets, significantly reducing localization errors [20].
- In large-scale re-localization tasks, it outperformed existing methods on KITTI, nuScenes, and SRoad, showing robustness in both coarse and fine localization [20].
- The framework runs at 28 frames per second on NVIDIA V100 GPUs and 15 frames per second on optimized NVIDIA Orin platforms [20].

Group 5: Ablation Studies
- Ablation studies confirmed the effectiveness of key components such as perception-uncertainty-guided association and localization-uncertainty-guided registration; removing any component leads to performance degradation [21].

Group 6: Future Directions
- Future work will focus on improving localization accuracy in challenging scenarios and enhancing the model's generalization to support various datasets and map types [26].
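The cross-modal fusion described above, alternating self-attention and cross-attention over visual and map BEV features, can be sketched as follows. This is a minimal single-head illustration, not the paper's architecture: the layer count, feature sizes, and the omission of learned projections and normalization are all simplifying assumptions.

```python
import numpy as np

def attention(q, k, v):
    # Scaled dot-product attention: softmax(q k^T / sqrt(d)) v
    d = q.shape[-1]
    scores = q @ k.T / np.sqrt(d)
    scores -= scores.max(axis=-1, keepdims=True)  # numerical stability
    w = np.exp(scores)
    w /= w.sum(axis=-1, keepdims=True)
    return w @ v

def fuse_bev_features(visual, map_feat, num_layers=2):
    """Alternate self-attention and cross-attention between two
    flattened BEV feature sets (tokens x channels), with residuals."""
    for _ in range(num_layers):
        # Self-attention refines each modality independently.
        visual = visual + attention(visual, visual, visual)
        map_feat = map_feat + attention(map_feat, map_feat, map_feat)
        # Cross-attention lets each modality query the other.
        visual, map_feat = (visual + attention(visual, map_feat, map_feat),
                            map_feat + attention(map_feat, visual, visual))
    return visual, map_feat

rng = np.random.default_rng(0)
vis = rng.standard_normal((100, 32))  # 100 BEV tokens, 32 channels (toy sizes)
mp = rng.standard_normal((100, 32))
vis_f, map_f = fuse_bev_features(vis, mp)
print(vis_f.shape, map_f.shape)  # (100, 32) (100, 32)
```

In the real model each attention layer would carry learned query/key/value projections and multiple heads; the alternation pattern, self-attention within a modality followed by cross-attention between modalities, is the point being illustrated.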
Hikvision applies for a patent on a positioning method, improving the accuracy of positioning a target device
Jin Rong Jie· 2025-06-28 09:05
Group 1
- The core viewpoint of the news is that Hangzhou Hikvision Digital Technology Co., Ltd. has applied for a patent on a positioning method and device that aims to improve the accuracy of positioning technology [1]
- The patent application, titled "A positioning method, device, electronic equipment, and storage medium," was published under number CN120219481A and filed in December 2023 [1]
- The method obtains a global map of the target scene, a target image captured by a camera, and the latest positioning state of the target device, from which a temporary map is generated to improve positioning accuracy [1]

Group 2
- Hangzhou Hikvision Digital Technology Co., Ltd. was established in 2001 and is located in Hangzhou, primarily engaged in the manufacturing of computers, communications equipment, and other electronic devices [2]
- The company has a registered capital of 923,319,832.6 RMB, has invested in 68 enterprises, participated in 5,000 bidding projects, and holds 5,000 patent records [2]
- Additionally, the company has 833 trademark records and 571 administrative licenses [2]
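The patent abstract above only outlines the flow: crop a temporary map from the global map around the last known positioning state, then refine the pose against it. The sketch below is a hypothetical toy version of that flow; every function name, the landmark representation, and the averaging-based refinement are illustrative assumptions, since the patent's actual algorithm is not described in detail.

```python
import numpy as np

def crop_temporary_map(global_map, last_pose, radius=20.0):
    """Build a 'temporary map': global-map landmarks (N x 2 of x, y)
    within `radius` of the last known pose. (Hypothetical illustration.)"""
    dists = np.linalg.norm(global_map - last_pose[:2], axis=1)
    return global_map[dists <= radius]

def update_pose(last_pose, observed_offsets, temp_map):
    """Refine the pose from landmark observations: each row of
    `observed_offsets` is where a temporary-map landmark appears relative
    to the device, so pose ~= mean(landmark - offset)."""
    estimates = temp_map - observed_offsets
    xy = estimates.mean(axis=0)
    return np.array([xy[0], xy[1], last_pose[2]])

gmap = np.array([[0.0, 0.0], [5.0, 5.0], [40.0, 40.0]])
last = np.array([1.0, 1.0, 0.0])        # x, y, heading
tmap = crop_temporary_map(gmap, last)   # drops the far landmark
offs = tmap - np.array([2.0, 1.0])      # device truly at (2, 1)
pose = update_pose(last, offs, tmap)
print(pose[:2])  # [2. 1.]
```

A real system would match image features against the temporary map and solve for the pose robustly (e.g., with outlier rejection) rather than averaging noiseless offsets; the point here is only the temporary-map-then-refine structure the abstract describes.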