Visual Object Detection
He Kaiming's NeurIPS 2025 Talk in Review: Thirty Years of Visual Object Detection
机器之心 · 2025-12-11 10:00
Core Insights
- The article highlights the "Test of Time Award" given to the paper "Faster R-CNN," co-authored by He Kaiming and colleagues, in recognition of its impact on computer vision since its publication in 2015 [1][5][25]
- He Kaiming's presentation at NeurIPS 2025 traces the evolution of visual object detection over the past 30 years, covering the milestones and influential works that shaped the field [6][31]

Historical Development
- Early face detection work in the 1990s relied on handcrafted features and statistical methods, which limited both adaptability and speed [12]
- AlexNet (2012) demonstrated that deep learning extracts far stronger features than handcrafted pipelines, paving the way for its use in object detection [15]
- R-CNN (2014) brought CNNs into object detection for feature extraction and classification, although it was initially held back by its computational cost [17][18]

Technological Advancements
- Faster R-CNN (2015) removed the speed bottleneck of region proposal by introducing the Region Proposal Network (RPN), enabling end-to-end training and near real-time detection [25]
- Single-stage detectors such as YOLO and SSD (2016) increased speed further by predicting object locations and categories directly [32]
- Mask R-CNN (2017) added instance segmentation, and DETR (2020) reformulated detection around the Transformer architecture [32][34]

Future Directions
- The article closes with reflections on the continuing exploration in computer vision, stressing that as components become bottlenecks, new models are needed to replace them [35][36]