Computer Vision

10 Papers from the Lab Accepted at ICCV 2025
自动驾驶之心· 2025-07-02 13:54
Core Insights
- The article reports the acceptance of 10 papers from the laboratory at ICCV 2025, the 20th International Conference on Computer Vision, highlighting advances in 3D vision and related technologies [25]

Paper Summaries
- Paper 1: Domain-aware Category-level Geometry Learning Segmentation for 3D Point Clouds - addresses domain generalization in 3D scene segmentation, proposing a framework that couples geometric embedding with semantic learning to enhance model generalization [1]
- Paper 2: Hierarchical Variational Test-Time Prompt Generation for Zero-Shot Generalization - introduces a hierarchical variational method for dynamic prompt generation during inference, significantly improving the zero-shot generalization of vision-language models [3]
- Paper 3: Knowledge-Guided Part Segmentation - proposes a framework that uses structural knowledge to enhance segmentation of fine-grained object parts, improving the understanding of complex structures [5][6]
- Paper 4: TopicGeo: An Efficient Unified Framework for Geolocation - presents a unified geolocation framework that improves computational efficiency and accuracy by directly matching query images with reference images [9]
- Paper 5: Vision-Language Interactive Relation Mining for Open-Vocabulary Scene Graph Generation - explores a model that strengthens relationship understanding in open-vocabulary scene graph generation through multimodal interaction learning [11]
- Paper 6: VGMamba: Attribute-to-Location Clue Reasoning for Quantity-Agnostic 3D Visual Grounding - proposes a mechanism that combines attribute and spatial information to improve the accuracy of 3D visual grounding [13]
- Paper 7: Meta-Learning Dynamic Center Distance: Hard Sample Mining for Learning with Noisy Labels - introduces a new metric, Dynamic Center Distance, that improves learning in the presence of noisy labels by focusing on hard samples [15]
- Paper 8: Learning Separable Fine-Grained Representation via Dendrogram Construction from Coarse Labels for Fine-grained Visual Recognition - presents a method for learning fine-grained representations from coarse labels without predefined category numbers, improving adaptability to dynamic semantic structures [17]
- Paper 9: Category-Specific Selective Feature Enhancement for Long-Tailed Multi-Label Image Classification - addresses label imbalance in multi-label image classification by enhancing feature sensitivity for underrepresented categories [19]
- Paper 10: Partially Matching Submap Helps: Uncertainty Modeling and Propagation for Text to Point Cloud Localization - redefines text-to-point-cloud localization by allowing partial spatial matches, improving the model's ability to handle real-world ambiguities [21]
Geek+: Full-Stack Technology Builds a Moat in the Golden Track of Warehouse Automation
Sou Hu Cai Jing· 2025-07-02 09:30
Company Overview
- Beijing Geek+ Technology Co., Ltd. ("Geek+") launched its IPO on July 2, running through July 4, 2025, with plans to list on the Hong Kong Stock Exchange on July 9, 2025 [2]
- The company plans to issue 140,353,000 H-shares, raising approximately HKD 2.358 billion at an issue price of HKD 16.80 per share [2]
- Geek+ has attracted four cornerstone investors, collectively subscribing USD 91.3 million (approximately HKD 716.7 million) [2]

Technology and Innovation
- Geek+ has built a comprehensive technology stack covering hardware, software, and algorithms, creating a significant technological moat [3]
- The company introduced laser-vision fusion SLAM technology, achieving an industry-leading average positioning accuracy within ±10 mm [4]
- The Hyper+ core algorithm platform is among the most advanced in the AMR market, optimizing resource allocation and maximizing cost efficiency [5]
- Geek+ created Robot Matrix, billed as the world's first universal robot technology platform, improving R&D efficiency by over 30% [6][7]
- The company had filed over 2,000 patents by 2024, and its PopPick solution leads globally in compatibility and throughput efficiency [8]

Market Landscape
- The global AMR market is projected to grow from CNY 38.7 billion in 2024 to CNY 162.1 billion by 2029, a CAGR of 33.1% [10]
- AMR penetration in warehouse automation is expected to rise from 4.4% in 2020 to 20.2% in 2029 [10]
- Key growth drivers include the booming e-commerce sector, rising demand for logistics automation, and the need for manufacturing efficiency [13]
- AMR robots have diverse applications across industries including logistics, manufacturing, healthcare, and food service [14]

Competitive Advantages
- Geek+ has established a global service network and collaborates with partners such as Bosch Rexroth and Mujin, creating a complete ecosystem from hardware to systems [18]
- The company has received strategic investments from firms including Warburg Pincus, Ant Group, and Intel, with net IPO proceeds of approximately HKD 2.206 billion allocated to R&D and market expansion [19]
- Geek+ holds a leading market share in the AMR sector, with revenue rising from CNY 790 million in 2021 to CNY 2.41 billion in 2024, a CAGR of 45% [23]
- The company's customer repurchase rate of 74.6% indicates strong client retention and satisfaction [24]

Industry Outlook
- The intelligent logistics automation industry is growing rapidly, with favorable policies supporting technological innovation and application promotion [15]
- Advances in AI, machine learning, computer vision, and IoT are improving AMR robot performance and functionality [16]
- The global labor shortage and the decline of China's demographic dividend are driving the shift toward automation, with Geek+ solutions reducing labor needs by 65% [17]
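The growth figures above imply compound annual growth rates that can be checked directly; a minimal sketch of the round-trip calculation, using the revenue and market-size figures quoted in the article:

```python
def cagr(start: float, end: float, years: int) -> float:
    """Compound annual growth rate as a fraction, e.g. 0.45 for 45%."""
    return (end / start) ** (1 / years) - 1

# Geek+ revenue: CNY 790M (2021) -> CNY 2,410M (2024), over 3 years
print(f"Geek+ revenue CAGR: {cagr(790, 2410, 3):.0%}")  # ~45%, as reported

# Global AMR market: CNY 38.7B (2024) -> CNY 162.1B (2029), over 5 years
print(f"AMR market CAGR: {cagr(38.7, 162.1, 5):.1%}")   # close to the reported 33.1%
```

Both figures round-trip to the rates stated in the article, within rounding of the underlying amounts.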
Major Livestream! Tsinghua & Bosch Open-Source a SOTA-Performance Pure VLA: Impromptu-VLA Says Goodbye to Dual Systems
自动驾驶之心· 2025-07-01 12:58
Core Viewpoint
- The article discusses the advances and challenges of autonomous driving systems, particularly in unstructured environments, and introduces the Impromptu VLA framework developed by Tsinghua AIR and the Bosch Research Institute to close data gaps in these scenarios [1]

Group 1: Advances in Autonomous Driving
- Current autonomous driving systems have made significant progress in structured environments such as cities and highways, but face challenges in unstructured scenarios such as rural roads and construction zones [1]
- Existing large-scale autonomous driving datasets focus primarily on conventional traffic conditions, leaving a shortage of specialized, large-scale, finely annotated data for complex unstructured environments [1]

Group 2: Impromptu VLA Framework
- The Impromptu VLA framework aims to provide an open-weight, open-data driving vision-language-action model: a fully end-to-end system that extracts multimodal features directly from driving video segments [1]
- Impromptu VLA generates driving commands in natural language without manually designed perception modules or intermediate representations [1]
- In the NeuroNCAP closed-loop safety evaluation, Impromptu VLA demonstrates strong decision robustness and generalization, significantly outperforming the recent BridgeAD system proposed at CVPR 2025 (2.15 vs. 1.60) [1]
Compete This Summer! The PRCV 2025 Spatial Intelligence and Embodied Intelligence Visual Perception Challenge Officially Launches
自动驾驶之心· 2025-06-30 12:51
Core Viewpoint
- The competition aims to advance research in spatial intelligence and embodied intelligence, with visual perception as a key supporting technology for applications in autonomous driving, smart cities, and robotics [2][4]

Group 1: Competition Purpose and Significance
- Visual perception is crucial for achieving spatial and embodied intelligence, with significant applications across many fields [2]
- The competition seeks to promote high-efficiency, high-quality research in spatial and embodied intelligence technologies [4]
- It aims to explore innovations in cutting-edge methods such as reinforcement learning, computer vision, and graphics [4]

Group 2: Competition Organization
- The competition is organized by experts from institutions including the University of Science and Technology Beijing, Tsinghua University, and the Chinese Academy of Sciences [5]
- It is sponsored by Beijing Jiuzhang Yunjing Technology Co., Ltd., which also provides technical support [5]

Group 3: Competition Data and Resources
- Participants will have access to real and simulated datasets, including multi-view drone aerial images and task-specific simulation environments [11]
- The sponsor will provide free computing resources, including H800 GPU compute, for validating and testing submitted algorithms [12][13]

Group 4: Task Settings
- The competition consists of two tracks, Spatial Intelligence and Embodied Intelligence, each with its own tasks and evaluation methods [17]
- The Spatial Intelligence track requires building a 3D reconstruction model from multi-view aerial images, while the Embodied Intelligence track involves completing tasks in dynamic occlusion scenarios [17]

Group 5: Evaluation Methods
- Spatial Intelligence evaluation covers rendering quality and geometric accuracy, with scores based on PSNR and F1-Score metrics [19][20]
- Embodied Intelligence evaluation focuses on task completion and execution efficiency, with metrics such as success rate and average pose error [23][21]

Group 6: Submission and Awards
- Results must be submitted in a specified format, and top-ranking teams will have their results reproduced for verification [24]
- Each track offers cash prizes and computing vouchers, with a total of 12 awards distributed among the top teams [25]
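PSNR, used for the rendering-quality score in the Spatial Intelligence track, is a standard image-fidelity metric. A minimal NumPy sketch of the formula (the 8-bit peak value and the random test images are illustrative assumptions, not competition code):

```python
import numpy as np

def psnr(img_a: np.ndarray, img_b: np.ndarray, peak: float = 255.0) -> float:
    """Peak signal-to-noise ratio in dB: 10 * log10(peak^2 / MSE)."""
    mse = np.mean((img_a.astype(np.float64) - img_b.astype(np.float64)) ** 2)
    if mse == 0:
        return float("inf")  # identical images: infinite PSNR
    return 10.0 * np.log10(peak ** 2 / mse)

# Illustrative check: a synthetic "ground truth" vs. a mildly noisy render
rng = np.random.default_rng(0)
gt = rng.integers(0, 256, size=(64, 64, 3))
noisy = np.clip(gt + rng.normal(0, 5, gt.shape), 0, 255)
print(f"PSNR: {psnr(gt, noisy):.1f} dB")
```

Higher is better; Gaussian noise with standard deviation 5 on 8-bit images lands around the mid-30s dB, which is why rendering benchmarks typically report PSNR in the 20-40 dB range.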
ICCV 2025 Results Are Out! 24% Acceptance Rate. Did You Get Your Ticket to Hawaii?
机器之心· 2025-06-26 06:10
机器之心 report, editor: +0

ICCV 2025 will be held October 19-25 in Hawaii, USA. ICCV has just sent submitters the notifications of this year's paper acceptance decisions.

The data show that the conference received 11,239 valid submissions this year, all of which entered the review process. The program committee recommended 2,699 papers for acceptance, for a final acceptance rate of 24%.

Compared with previous editions, the 2025 submission count is nearly three times that of 2019, reflecting the rapid expansion of the computer vision field and the increasing activity of academic research.

Despite the sharp rise in submissions, ICCV's acceptance rate has remained relatively stable in recent years, generally staying within the 25%-26% range.

Following CVPR 2025, ICCV 2025 also implemented a new policy aimed at strengthening accountability and integrity. The program chairs identified 25 highly irresponsible reviewers and issued desk rejections for the 29 papers associated with them. Twelve of these rejected papers would otherwise have been accepted, which has sparked controversy.

ICCV 2023: 8,260 submissions, 2,160 accepted, an acceptance rate of about 26.15%. ICCV 2021: 6,152 submissions, 1,612 accepted, an acceptance rate of 26.20%. ICCV 2019: 43 ...
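The acceptance-rate figures quoted above follow directly from the submission and acceptance counts; a quick sketch of the arithmetic:

```python
# (submitted, accepted) counts as reported in the article
stats = {
    "ICCV 2025": (11239, 2699),
    "ICCV 2023": (8260, 2160),
    "ICCV 2021": (6152, 1612),
}

for venue, (submitted, accepted) in stats.items():
    rate = accepted / submitted
    print(f"{venue}: {accepted}/{submitted} = {rate:.2%}")
```

The computed rates (24.01%, 26.15%, 26.20%) match the rounded figures in the article, and the 2025 submission count is indeed close to triple the pre-2020 volume.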
MIT Tenured Professor Kaiming He Joins Google
量子位· 2025-06-26 02:11
Core Viewpoint
- Kaiming He, a prominent figure in computer vision, has joined Google DeepMind as a part-time distinguished scientist after obtaining tenure at MIT, signaling a strategic collaboration between academia and industry in AI research [1][2][5][7]

Group 1: Kaiming He's Career and Achievements
- Kaiming He is regarded as a legendary figure in computer vision; he received his undergraduate degree from Tsinghua University and his PhD from the Chinese University of Hong Kong under the supervision of Xiaoou Tang [9][10]
- He co-authored the award-winning paper "Single Image Haze Removal Using Dark Channel Prior" in 2009, a landmark achievement for Asian researchers at CVPR [10]
- After completing his PhD in 2011, he worked at Microsoft Research Asia and later joined Facebook AI Research (FAIR), where he developed the influential ResNet architecture, now cited over 280,000 times [11][12][15]
- His other notable contributions include Faster R-CNN and Mask R-CNN, the latter winning the best paper award at ICCV 2017 [15][18]

Group 2: Recent Developments and Collaborations
- Kaiming He joined MIT's EECS department in 2023, a return to academia after a long industry tenure that drew wide attention and speculation about Meta's loss [16][18]
- His recent research focuses on model performance optimization, including advances in image generation techniques and highly compressed tokenizers for text generation [20]
- He has collaborated with Google DeepMind on several projects, including the paper "Fractal Generative Models," which introduced a new paradigm for generating high-resolution images [22][23]
- The DeepMind collaboration is ongoing, with earlier joint work addressing challenges in visual autoregressive models and proposing solutions for scaling them [25][27]
Homography Computation Accelerated Tens of Times with 95% Fewer Operations! Geometry-Based SKS and ACA Matrix Decompositions Proposed
机器之心· 2025-06-19 03:50
Group 1
- A research team from Donghua University, Shanghai Jiao Tong University, and the Chinese Academy of Sciences has proposed two geometry-based homography decomposition methods that reduce the computational load of solving a homography from four points by over 95% compared with conventional sparse linear equation methods [3][4]
- The paper, "Fast and Interpretable 2D Homography Decomposition: Similarity-Kernel-Similarity and Affine-Core-Affine Transformations," has been accepted by the IEEE T-PAMI journal [5][4]
- The proposed methods are expected to apply to many visual applications, including QR code scanning, projective geometry, computer vision, and graphics problems [3]

Group 2
- The traditional Direct Linear Transformation (DLT) method constructs a sparse linear equation system for homography solving, typically requiring around 2,000 floating-point operations [7]
- Improved methods have reduced the load to approximately 1,800 operations for SVD decomposition and 220 operations for a customized Gaussian elimination [7]
- The new methods, SKS and ACA, reduce floating-point operations dramatically, with ACA requiring only 29 operations in special cases such as square templates [18][22]

Group 3
- The SKS transformation decomposes the homography matrix into multiple sub-transformations, exploiting the hierarchical nature of geometric transformations [9][10]
- The ACA transformation similarly computes affine transformations from three corresponding points, yielding an efficient homography matrix decomposition [15]
- The average time for a single four-point homography using the ACA method is reported at only 17 nanoseconds, for speedups of 29x and 43x over previous methods [22]
Group 4
- The methods can be integrated into various visual processing applications as replacements for traditional homography algorithms, particularly QR code scanning, estimated to reach billions of scans daily in China [24]
- The research team is also exploring applications in deep learning for estimating geometric parameters, P3P pose estimation based on planar homography, and N-dimensional homography matrix decomposition [25]
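For context on what SKS and ACA are replacing: the conventional DLT baseline builds a linear system from four point correspondences and extracts the homography as its null-space vector via SVD. A minimal NumPy sketch of that baseline (illustrative only; this is the textbook DLT, not the authors' SKS/ACA code, and the example points are made up):

```python
import numpy as np

def homography_dlt(src: np.ndarray, dst: np.ndarray) -> np.ndarray:
    """Estimate the 3x3 homography H from 4+ correspondences via DLT:
    stack two rows per point into A (2N x 9), then take the right
    singular vector of the smallest singular value as the solution."""
    rows = []
    for (x, y), (u, v) in zip(src, dst):
        rows.append([-x, -y, -1, 0, 0, 0, u * x, u * y, u])
        rows.append([0, 0, 0, -x, -y, -1, v * x, v * y, v])
    A = np.array(rows, dtype=np.float64)
    _, _, vt = np.linalg.svd(A)
    H = vt[-1].reshape(3, 3)
    return H / H[2, 2]  # normalize so H[2,2] = 1

# Map the unit square (a "square template", as in the ACA special case)
src = np.array([[0, 0], [1, 0], [1, 1], [0, 1]], dtype=np.float64)
dst = np.array([[10, 20], [110, 25], [105, 130], [8, 125]], dtype=np.float64)
H = homography_dlt(src, dst)
p = H @ np.array([1.0, 1.0, 1.0])
print(p[:2] / p[2])  # recovers the third destination corner (105, 130)
```

The SVD on the 8x9 system is where the roughly 1,800-operation cost cited above comes from; SKS/ACA avoid it by composing a few closed-form geometric sub-transformations instead.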
Just In: CVPR 2025 Awards Announced. Oxford & Meta PhD Student Jianyuan Wang Wins Best Paper; Saining Xie Takes the Young Researcher Award
机器之心· 2025-06-13 15:45
Core Insights
- The CVPR 2025 conference in Nashville, Tennessee, awarded five papers: one best paper and four honorable mentions, plus one best student paper and one honorable mention for student papers [1][2]

Submission and Acceptance Statistics
- This year, over 40,000 authors submitted 13,008 papers, a 13% increase from last year's 11,532 submissions. A total of 2,872 papers were accepted, an overall acceptance rate of approximately 22.1%; of the accepted papers, 96 were oral presentations (3.3%) and 387 were highlights (13.7%) [3][5]

Conference Attendance
- The conference attracted over 9,000 attendees from more than 70 countries and regions [7]

Paper Acceptance by Field
- Image and video generation had the most accepted papers, while the highest acceptance rates were in 3D from multi-view and sensor data and in single-image 3D [8]

Best Paper Award
- The best paper, "VGGT: Visual Geometry Grounded Transformer," came from researchers at the University of Oxford and Meta AI. It introduces a general-purpose 3D vision model based on a pure feedforward Transformer architecture, capable of inferring core geometric information from one or more images [13][14]

Notable Research Contributions
- The best paper demonstrated significant performance gains over traditional optimization methods and existing state-of-the-art models across various 3D tasks, achieving inference in seconds without post-processing optimization [17]

Best Student Paper
- The best student paper, "Neural Inverse Rendering from Propagating Light," proposed a physics-based multi-view dynamic light-propagation neural inverse rendering system, achieving state-of-the-art 3D reconstruction under strong indirect lighting [53][55]

Awards and Recognitions
- Two Young Researcher Awards went to Hao Su and Saining Xie for their outstanding contributions to computer vision research [68][72]. The Longuet-Higgins Award was presented to two papers of lasting influence on the field: the Inception architecture and fully convolutional networks for semantic segmentation [75][78][80]
WiMi Hologram Cloud Rises 5.13% to $2.46 per Share, with a Market Cap of $24.16 Million
Jin Rong Jie· 2025-06-11 13:50
Group 1
- The core viewpoint of the articles: WiMi Hologram Cloud delivered significant net-profit growth despite lower total revenue, indicating strong operational efficiency and potential for future growth [1][2]
- For the fiscal year ended December 31, 2024, WiMi's total revenue was 542 million RMB, a year-on-year decrease of 7.42%, while net profit attributable to the parent company reached 71.64 million RMB, a year-on-year increase of 117.01% [1]
- WiMi Hologram Cloud is positioned as a leading provider of holographic cloud technology solutions in China, focusing on holographic AR technology including AI synthesis, visual presentation, and advertising [1][2]

Group 2
- The company has made significant breakthroughs in holographic application fields such as advertising, entertainment, education, and 5G communications, aiming to deepen the research and market application of holographic 3D computer vision [2]
- WiMi's stated vision is to become a creator of the holographic ecosystem in China, emphasizing a strong, open service platform that bridges holographic technology applications and computer vision presentation [2]
Express | Buildots Closes $45M Series D, Using AI Models + Computer Vision to Solve the Construction Industry's "Information Disconnect" Problem
Z Potentials· 2025-05-30 03:23
Core Viewpoint
- Buildots aims to transform the construction industry by using artificial intelligence and computer vision to bridge the gap between management and on-site reality [3][4]

Group 1: Company Overview
- Buildots is a Chicago-based startup founded in 2018 by Roy Danon, Aviv Leibovici, and Yakir Sudry, focused on tracking construction progress through images captured by 360-degree cameras mounted on hard hats [3]
- The company has raised a total of $166 million, including $45 million in a Series D round led by Qumra Capital [3]

Group 2: Technology and Innovation
- The Buildots system not only monitors construction progress but also predicts potential delays and issues, letting teams make data-driven decisions rather than relying on fragmented information [4]
- The platform supports project-status queries through an AI chatbot and raises alerts for possible risks, helping avoid costly problems [4]

Group 3: Market Position and Competition
- Buildots serves clients including Intel and around 50 construction companies, positioning itself as a significant player in the construction technology sector [4]
- Competitors include BeamUp, which develops AI design platforms, and Versatile, which analyzes construction-site data to present project progress [4]

Group 4: Future Plans
- The new funding will primarily be used to expand Buildots' product offerings to cover more stages of the construction lifecycle and to enhance its AI models using historical data [4][5]
- Buildots plans to grow its research and development team and expand its presence in North America [4]