Computer Vision
Compete over the summer! The PRCV 2025 Spatial Intelligence and Embodied Intelligence Visual Perception Challenge officially launches
自动驾驶之心· 2025-06-30 12:51
Core Viewpoint
- The competition aims to advance research in spatial intelligence and embodied intelligence, focusing on visual perception as a key supporting technology for applications in autonomous driving, smart cities, and robotics [2][4].

Group 1: Competition Purpose and Significance
- Visual perception is crucial for achieving spatial and embodied intelligence, with significant applications across many fields [2].
- The competition seeks to promote high-efficiency, high-quality research in spatial and embodied intelligence technologies [4].
- It aims to explore innovations in cutting-edge methods such as reinforcement learning, computer vision, and graphics [4].

Group 2: Competition Organization
- The competition is organized by a team of experts from institutions including Beijing University of Science and Technology, Tsinghua University, and the Chinese Academy of Sciences [5].
- It is sponsored by Beijing Jiuzhang Yunjing Technology Co., Ltd., which also provides technical support [5].

Group 3: Competition Data and Resources
- Participants will have access to real and simulated datasets, including multi-view drone aerial images and task-specific simulation environments [11].
- The sponsor will provide free computing resources, including H800 GPU power, for validating and testing submitted algorithms [12][13].

Group 4: Task Settings
- The competition consists of two tracks, Spatial Intelligence and Embodied Intelligence, each with its own tasks and evaluation methods [17].
- The Spatial Intelligence track requires building a 3D reconstruction model from multi-view aerial images, while the Embodied Intelligence track involves completing tasks in dynamically occluded scenarios [17].

Group 5: Evaluation Methods
- Evaluation for Spatial Intelligence covers rendering quality and geometric accuracy, with scores based on PSNR and F1-Score metrics [19][20].
- For Embodied Intelligence, evaluation focuses on task completion and execution efficiency, with metrics such as success rate and average pose error [23][21].

Group 6: Submission and Awards
- Results must be submitted in a specified format, and top-ranking teams will have their results reproduced for verification [24].
- Awards for each track include cash prizes and computing vouchers, with a total of 12 awards distributed among the top teams [25].
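The Spatial Intelligence track scores rendering quality with PSNR, which is computed from the per-pixel mean squared error between a rendered view and a ground-truth image. A minimal NumPy sketch (the competition's actual evaluation script is not public here, so the pixel scale and averaging details are assumptions):

```python
import numpy as np

def psnr(rendered: np.ndarray, reference: np.ndarray, max_val: float = 255.0) -> float:
    """Peak signal-to-noise ratio between a rendered view and a reference image."""
    mse = np.mean((rendered.astype(np.float64) - reference.astype(np.float64)) ** 2)
    if mse == 0:
        return float("inf")  # identical images
    return 10.0 * np.log10(max_val ** 2 / mse)
```

The F1-Score used for geometric accuracy would analogously compare reconstructed 3D points against ground truth within a distance threshold, combining precision and recall.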
ICCV 2025 results are out! A 24% acceptance rate — did you grab your ticket to Hawaii?
机器之心· 2025-06-26 06:10
Core Insights
- The ICCV 2025 conference will take place from October 19 to 25 in Hawaii, USA, with a significant increase in paper submissions reflecting the rapid expansion of the computer vision field [2][4].
- A total of 11,239 valid submissions were received this year, with 2,699 papers recommended for acceptance, resulting in an acceptance rate of 24% [3][5].
- The acceptance rate has remained relatively stable in recent years, consistently hovering around 25% to 26% [5][8].
- The conference has introduced new policies to strengthen accountability and integrity, identifying 25 irresponsible reviewers and rejecting 29 associated papers [6][7].

Submission Trends
- The number of submissions for ICCV 2025 is nearly three times that of 2019, indicating a surge in academic activity in the computer vision domain [4].
- Previous years' data show a steady increase: ICCV 2023 had 8,260 submissions with a 26.15% acceptance rate, ICCV 2021 had 6,152 submissions with a 26.20% acceptance rate, and ICCV 2019 had 4,323 submissions with a 25% acceptance rate [8].

Challenges in Peer Review
- The rapid growth in submissions poses unprecedented challenges for peer review, with major AI conferences now receiving more than 10,000 papers each [35].
- Concerns about review quality and reviewer accountability have grown, prompting discussions about reforming the traditional one-way review system into a two-way feedback loop [38][39].
- One proposed solution is a dual feedback system that lets authors evaluate review quality while giving reviewers formal recognition, aiming to build a sustainable, high-quality peer review system [38][40].
Kaiming He, tenured MIT professor, has joined Google
量子位· 2025-06-26 02:11
Core Viewpoint
- Kaiming He, a prominent figure in computer vision, has joined Google DeepMind as a part-time distinguished scientist after obtaining tenure at MIT, signaling a strategic collaboration between academia and industry in AI research [1][2][5][7].

Group 1: Kaiming He's Career and Achievements
- Kaiming He is regarded as a legendary figure in the computer vision field, having received his undergraduate degree from Tsinghua University and his PhD from the Chinese University of Hong Kong under the supervision of Xiaoou Tang [9][10].
- He co-authored the award-winning paper "Single Image Haze Removal Using Dark Channel Prior" in 2009, a landmark achievement for Asian researchers at the CVPR conference [10].
- After completing his PhD in 2011, he worked at Microsoft Research Asia and later joined Facebook AI Research (FAIR), where he developed the influential ResNet architecture, now cited over 280,000 times [11][12][15].
- His research contributions include notable works such as Faster R-CNN and Mask R-CNN, the latter winning the best paper award at ICCV 2017 [15][18].

Group 2: Recent Developments and Collaborations
- Kaiming He joined MIT's EECS department in 2023, a return to academia after a long industry tenure that drew attention and speculation about Meta's loss [16][18].
- His recent research focuses on model performance optimization, including advances in image generation techniques and the development of highly compressed tokenizers for text generation [20].
- He has collaborated with Google DeepMind on several projects, including the paper "Fractal Generative Models," which introduced a new paradigm for generating high-resolution images [22][23].
- The collaboration with DeepMind is ongoing, with earlier joint work addressing challenges in visual autoregressive models and proposing solutions for scaling them [25][27].
Homography computation accelerated by tens of times with 95% fewer operations: geometry-based SKS and ACA matrix decompositions proposed
机器之心· 2025-06-19 03:50
Group 1
- A research team from Donghua University, Shanghai Jiao Tong University, and the Chinese Academy of Sciences has proposed two geometry-based homography decomposition methods that cut the computational load of solving a homography from four points by over 95% compared with conventional sparse linear equation methods [3][4].
- The paper, "Fast and Interpretable 2D Homography Decomposition: Similarity-Kernel-Similarity and Affine-Core-Affine Transformations," has been accepted by the IEEE T-PAMI journal [5][4].
- The proposed methods are expected to be applicable in a variety of visual applications, including QR code scanning, projective geometry, computer vision, and graphics problems [3].

Group 2
- The traditional Direct Linear Transformation (DLT) method constructs a sparse linear equation system for homography solving, typically requiring around 2,000 floating-point operations [7].
- Improved methods reduce the load to approximately 1,800 operations for SVD decomposition and 220 operations for a customized Gaussian elimination [7].
- The new methods, SKS and ACA, reduce floating-point operations dramatically; ACA needs only 29 operations in special cases such as square templates [18][22].

Group 3
- The SKS transformation decomposes the homography matrix into multiple sub-transformations, exploiting the hierarchical nature of geometric transformations [9][10].
- The ACA transformation likewise computes affine transformations from three corresponding points, yielding an efficient homography matrix decomposition [15].
- The average time for a single four-point homography computation with the ACA method is reported to be only 17 nanoseconds, giving speedups of 29x and 43x over previous methods [22].
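For context on the operation counts being compared, the conventional DLT baseline stacks two linear equations per point correspondence into an 8x8 system and solves for the eight unknowns of H (with H[2,2] fixed to 1). A standard textbook sketch in NumPy — this is the classical method the paper improves on, not the authors' SKS/ACA implementation:

```python
import numpy as np

def dlt_homography(src: np.ndarray, dst: np.ndarray) -> np.ndarray:
    """Classical DLT: estimate the 3x3 homography H (H[2,2] = 1) mapping src to dst.

    src, dst: (4, 2) arrays of corresponding 2D points.
    """
    A, b = [], []
    for (x, y), (u, v) in zip(src, dst):
        # Each correspondence (x, y) -> (u, v) contributes two rows.
        A.append([x, y, 1, 0, 0, 0, -u * x, -u * y]); b.append(u)
        A.append([0, 0, 0, x, y, 1, -v * x, -v * y]); b.append(v)
    h = np.linalg.solve(np.array(A, dtype=float), np.array(b, dtype=float))
    return np.append(h, 1.0).reshape(3, 3)
```

Constructing and solving this dense-in-practice linear system is where the baseline's floating-point operations go; SKS and ACA instead compose closed-form geometric sub-transformations and avoid the linear solve entirely.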
Group 4
- The methods can be integrated into various visual processing applications as replacements for traditional homography algorithms, particularly QR code scanning, estimated to reach billions of scans daily in China [24].
- The research team is also exploring further applications in deep learning for estimating geometric parameters, P3P pose estimation based on planar homography, and N-dimensional homography matrix decomposition [25].
Just in: CVPR 2025 awards announced — Oxford & Meta PhD student Jianyuan Wang wins Best Paper, Saining Xie takes the Young Researcher Award
机器之心· 2025-06-13 15:45
Core Insights
- The CVPR 2025 conference in Nashville, Tennessee, awarded five papers, including one best paper and four honorable mentions, along with one best student paper and one honorable mention for student papers [1][2].

Submission and Acceptance Statistics
- This year, over 40,000 authors submitted 13,008 papers, a 13% increase over last year's 11,532 submissions. A total of 2,872 papers were accepted, for an overall acceptance rate of approximately 22.1%; of these, 96 were oral presentations (3.3%) and 387 were highlights (13.7%) [3][5].

Conference Attendance
- The conference attracted over 9,000 attendees from more than 70 countries and regions [7].

Paper Acceptance by Field
- Image and video generation had the most accepted papers, while the highest acceptance rates were in 3D from multi-view and sensor data, as well as single-image 3D [8].

Best Paper Award
- The best paper, "VGGT: Visual Geometry Grounded Transformer," was presented by researchers from the University of Oxford and Meta AI. It introduces a universal 3D vision model based on a pure feedforward Transformer architecture, capable of inferring core geometric information from one or more images [13][14].

Notable Research Contributions
- The best paper demonstrated significant performance gains over traditional optimization methods and existing state-of-the-art models across a range of 3D tasks, achieving inference in seconds without requiring post-processing optimization [17].

Best Student Paper
- The best student paper, "Neural Inverse Rendering from Propagating Light," proposed a physics-based multi-view dynamic light propagation neural inverse rendering system, achieving state-of-the-art 3D reconstruction under strong indirect lighting [53][55].

Awards and Recognitions
- Two Young Researcher Awards went to Hao Su and Saining Xie for their outstanding contributions to computer vision research [68][72]. The Longuet-Higgins Award was presented to two papers that have profoundly influenced the field: the Inception architecture and fully convolutional networks for semantic segmentation [75][78][80].
WiMi Hologram Cloud rises 5.13% to $2.46 per share, with a total market cap of $24.16 million
Jin Rong Jie· 2025-06-11 13:50
Group 1
- The core viewpoint of the articles highlights WiMi Hologram Cloud's significant growth in net profit despite a decrease in total revenue, indicating strong operational efficiency and potential for future growth [1][2].
- As of December 31, 2024, WiMi's total revenue is projected to be 542 million RMB, a year-on-year decrease of 7.42%, while net profit attributable to the parent company is expected to reach 71.64 million RMB, a year-on-year increase of 117.01% [1].
- WiMi Hologram Cloud is positioned as a leading provider of holographic cloud technology solutions in China, focusing on holographic AR technology, including AI synthesis, visual presentation, and advertising [1][2].

Group 2
- The company has made significant breakthroughs in holographic application fields such as advertising, entertainment, education, and 5G communications, aiming to deepen research and market application in holographic 3D computer vision [2].
- WiMi's vision is to become a creator of the holographic ecosystem in China, emphasizing a strong, open service platform that bridges holographic technology applications and computer vision presentation [2].
Express | Buildots closes a $45 million Series D, using AI models and computer vision to crack the construction industry's "information disconnect" problem
Z Potentials· 2025-05-30 03:23
Core Viewpoint
- Buildots aims to revolutionize the construction industry by using artificial intelligence and computer vision to bridge the gap between management and on-site realities [3][4].

Group 1: Company Overview
- Buildots is a Chicago-based startup founded in 2018 by Roy Danon, Aviv Leibovici, and Yakir Sudry, focused on tracking construction progress through images captured by 360-degree cameras mounted on hard hats [3].
- The company has raised a total of $166 million, including $45 million in a Series D round led by Qumra Capital [3].

Group 2: Technology and Innovation
- The Buildots system not only monitors construction progress but also predicts potential delays and issues, letting teams make data-driven decisions rather than rely on fragmented information [4].
- The platform supports project status queries through an AI chatbot and issues alerts for possible risks, helping avoid costly problems [4].

Group 3: Market Position and Competition
- Buildots serves clients including Intel and around 50 construction companies, positioning it as a significant player in the construction technology sector [4].
- Competitors include BeamUp, which develops AI design platforms, and Versatile, which analyzes construction site data to present project progress [4].

Group 4: Future Plans
- The new funding will primarily be used to extend Buildots' product to cover more stages of the construction lifecycle and to improve its AI models using historical data [4][5].
- Buildots plans to expand its research and development team and grow its presence in North America [4].
Over 40,000 authors scramble for a spot: CVPR 2025 officially reveals its three hottest topics — are you working in the right direction?
机器之心· 2025-05-28 03:02
Core Insights
- The article discusses the latest trends in computer vision, highlighting three major research directions gaining traction as of 2025 [3][4].

Group 1: Major Research Directions
- The three prominent areas identified are:
  1. Multi-view and sensor 3D technology, which has evolved from 2D rendering to more complex 3D evaluation, strongly influenced by the introduction of NeRF in 2020 [5].
  2. Image and video synthesis, now a focal point for presenting environmental information more accurately, reflecting advances in analyzing and generating multimedia content [6].
  3. Multimodal learning, which integrates visual, linguistic, and reasoning capabilities, pointing toward more interactive and comprehensive AI systems [7][8].

Group 2: Conference Insights
- CVPR 2025 saw a 13% increase in paper submissions, with 13,008 submissions in total and an acceptance rate of 22.1%, indicating a highly competitive environment [3].
- The conference emphasizes the importance of diverse voices in the research community, ensuring that every paper, regardless of the author's affiliation, receives equal consideration [8].
New progress in optoelectronic synaptic device research at the Changchun Institute of Optics, Fine Mechanics and Physics
Huan Qiu Wang Zi Xun· 2025-05-10 09:18
Core Insights
- The research team at the Changchun Institute of Optics, Fine Mechanics and Physics has developed a novel ultraviolet optoelectronic synaptic device and an optoelectronic synaptic transistor, providing new technological pathways for advanced artificial vision systems and neuromorphic computing vision [1][2].

Group 1: Ultraviolet Optoelectronic Synaptic Device
- The ultraviolet optoelectronic synaptic device exploits the ferroelectric polarization of AlScN and the excellent optoelectronic properties of GaN, and was successfully constructed based on the trapping and detrapping of holes at the heterojunction [1].
- The device exhibits outstanding non-volatile storage characteristics and can emulate synaptic functions of biological visual systems, enabling multi-state modulation such as long-term potentiation (LTP), paired-pulse facilitation (PPF), and learning-forgetting-relearning processes [1][2].

Group 2: Optoelectronic Synaptic Transistor
- The optoelectronic synaptic transistor achieves broad-spectrum, high optoelectronic conversion efficiency and long data retention from the ultraviolet to the near-infrared through a gas-adsorption-assisted persistent photoconductivity strategy [1][2].
- It shows excellent photodetection performance across a broad spectrum from 375 nm to 1310 nm, with a paired-pulse facilitation index reaching 158%, significantly improving the precision and efficiency of neural networks in processing visual information [2].

Group 3: Implications for Computing and Vision Systems
- These devices provide efficient, biomimetic solutions for the hardware implementation of multispectral neuromorphic vision systems, emulating how human retinal cells perceive and recognize multispectral signals [2].
- Neuromorphic vision systems, which mimic the structure of neurons and synapses in the human brain, can process multiple information streams simultaneously, significantly reducing power consumption and improving data processing speed compared with traditional serial-processing visual systems [2].
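The paired-pulse facilitation index quoted for the synaptic transistor is conventionally defined from the amplitudes of two successive light-evoked postsynaptic currents (the notation below follows the standard definition in the synaptic-device literature; the paper's exact formula is an assumption):

```latex
\mathrm{PPF\ index} = \frac{A_2}{A_1} \times 100\%
```

where A1 and A2 are the current amplitudes triggered by the first and second light pulses. A PPF index of 158% therefore means the second response is 1.58 times the first — the short-term facilitation behavior the device uses to mimic biological synaptic plasticity.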
"Computer vision has been killed off by GPT-4o" (tongue-in-cheek)
量子位· 2025-03-29 07:46
Core Viewpoint
- The article discusses the advances in computer vision (CV) and image generation brought by the new GPT-4o model, highlighting its potential to disrupt existing tools and methods in the field [1][2].

Group 1: Technological Advancements
- GPT-4o introduces native multimodal image generation, expanding the functionality of AI tools beyond traditional applications [2][12].
- GPT-4o's image generation is based on an autoregressive model, unlike the diffusion model used in DALL·E, which allows better adherence to instructions and stronger image editing capabilities [15][19].
- Observations suggest the generation may involve a multi-scale autoregressive scheme, in which a rough image is produced first and details are filled in while the rough shape evolves [17][19].

Group 2: Industry Impact
- GPT-4o's new capabilities have raised concerns among designers and computer vision researchers, signaling a significant shift in the competitive landscape of AI tools [6][10].
- OpenAI's approach of scaling foundation models to achieve these capabilities surprised many in the industry, suggesting a new trend in AI development [12][19].
- GPT-4o's potential to enhance autonomous driving applications has also been noted, with implications for future development in that sector [10].

Group 3: Community Engagement
- The article invites community members to share their experiences and creative uses of GPT-4o, fostering a collaborative environment for exploring AI applications [26].
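The article contrasts GPT-4o's token-by-token image generation with DALL·E's diffusion approach: an autoregressive model emits discrete image tokens one at a time, each conditioned on everything produced so far, whereas diffusion refines a full noisy image jointly over many steps. A toy sketch of raster-order sampling (the "model" here is a hypothetical stand-in; GPT-4o's actual architecture is not public):

```python
import random

def toy_autoregressive_image(height: int, width: int, vocab_size: int = 16, seed: int = 0):
    """Generate a grid of discrete image tokens in raster order.

    Each token is sampled conditioned on the tokens generated so far; a real
    system would score p(next_token | history) with a transformer, while this
    stand-in just biases toward the previous token to mimic local coherence.
    """
    rng = random.Random(seed)
    tokens = []
    for _ in range(height * width):
        if tokens and rng.random() < 0.7:
            nxt = tokens[-1]          # repeat neighbor: cheap local coherence
        else:
            nxt = rng.randrange(vocab_size)
        tokens.append(nxt)
    return [tokens[r * width:(r + 1) * width] for r in range(height)]
```

The "rough image first, details later" behavior observed in GPT-4o would correspond to a multi-scale variant of this loop, generating a coarse token grid before refining it at higher resolutions.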