Computer Vision
Kaiming He, tenured MIT professor, has joined Google
量子位· 2025-06-26 02:11
Core Viewpoint
- Kaiming He, a prominent figure in computer vision, has joined Google DeepMind as a part-time distinguished scientist after obtaining tenure at MIT, signaling a strategic collaboration between academia and industry in AI research [1][2][5][7].

Group 1: Kaiming He's Career and Achievements
- Kaiming He is recognized as a legendary figure in the computer vision field, having received his undergraduate degree from Tsinghua University and his PhD from the Chinese University of Hong Kong under the supervision of Xiaoou Tang [9][10].
- He co-authored the paper "Single Image Haze Removal Using Dark Channel Prior," which won the CVPR 2009 best paper award, a landmark achievement for researchers from Asia at the conference [10].
- After completing his PhD in 2011, he worked at Microsoft Research Asia, where he developed the influential ResNet architecture, cited over 280,000 times; he later joined Facebook AI Research (FAIR) [11][12][15].
- His research contributions include notable works such as Faster R-CNN and Mask R-CNN, the latter winning the best paper award at ICCV 2017 (a minimal sketch of ResNet's residual block follows this summary) [15][18].

Group 2: Recent Developments and Collaborations
- Kaiming He joined MIT's EECS department in 2024, marking a return to academia after a long stint in industry, which drew attention and speculation about Meta's loss [16][18].
- His recent research focuses on model performance optimization, including advances in image generation techniques and the development of highly compressed tokenizers for generation [20].
- He has collaborated with Google DeepMind on several projects, including the paper "Fractal Generative Models," which introduced a new paradigm for generating high-resolution images [22][23].
- The collaboration with DeepMind has been ongoing, with earlier joint work addressing challenges in visual autoregressive models and proposing solutions for scaling them [25][27].
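For readers wondering why ResNet is cited so heavily: its core idea is the residual block, in which stacked layers learn an additive correction to their own input, so very deep networks remain trainable. A minimal PyTorch sketch of that idea (an illustrative simplification, not the published architecture):

```python
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    """A basic residual block: output = ReLU(F(x) + x)."""
    def __init__(self, channels: int):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(channels)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        identity = x                           # shortcut branch
        out = torch.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        return torch.relu(out + identity)      # skip connection keeps gradients flowing

x = torch.randn(1, 64, 32, 32)
print(ResidualBlock(64)(x).shape)  # torch.Size([1, 64, 32, 32])
```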
Homography computation sped up by tens of times with a 95% reduction in computation: geometry-based SKS and ACA matrix decompositions proposed
机器之心· 2025-06-19 03:50
Group 1
- A research team from Donghua University, Shanghai Jiao Tong University, and the Chinese Academy of Sciences has proposed two geometry-based homography decomposition methods that reduce the computation of solving a homography from four point correspondences by over 95% compared with conventional sparse linear-equation methods [3][4].
- The paper, "Fast and Interpretable 2D Homography Decomposition: Similarity-Kernel-Similarity and Affine-Core-Affine Transformations," has been accepted by the IEEE T-PAMI journal [5][4].
- The proposed methods are expected to apply to a range of visual applications, including QR code scanning, projective geometry, computer vision, and graphics problems [3].

Group 2
- The traditional Direct Linear Transformation (DLT) method constructs a sparse linear system for homography solving, typically requiring around 2,000 floating-point operations (a minimal sketch of this baseline follows the summary) [7].
- Improved methods reduce the load to approximately 1,800 operations for SVD-based solving and 220 operations for a customized Gaussian elimination [7].
- The new methods, SKS and ACA, cut floating-point operations dramatically, with ACA requiring only 29 operations in special cases such as square templates [18][22].

Group 3
- The SKS transformation decomposes the homography matrix into multiple sub-transformations, exploiting the hierarchical nature of geometric transformations [9][10].
- The ACA transformation similarly computes affine transformations from three corresponding points, yielding an efficient homography matrix decomposition [15].
- The average time for a single four-point homography calculation with the ACA method is reported to be only 17 nanoseconds, an acceleration of 29 times and 43 times over previous methods [22].

Group 4
- The methods can be integrated into various visual-processing applications as replacements for traditional homography algorithms, notably QR code scanning, estimated to reach billions of scans daily in China [24].
- The team is also exploring applications in deep learning for estimating geometric parameters, P3P pose estimation based on planar homography, and N-dimensional homography matrix decomposition [25].
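As a concrete reference point for the operation counts above, here is a minimal NumPy sketch of the conventional SVD-based DLT baseline that SKS and ACA are compared against (an illustrative sketch, not the authors' code; the function name and test points are invented for the example):

```python
import numpy as np

def homography_dlt(src: np.ndarray, dst: np.ndarray) -> np.ndarray:
    """Estimate H (3x3, up to scale) from 4+ point correspondences via DLT."""
    rows = []
    for (x, y), (u, v) in zip(src, dst):
        # Each correspondence contributes two rows of the sparse system A h = 0.
        rows.append([-x, -y, -1, 0, 0, 0, u * x, u * y, u])
        rows.append([0, 0, 0, -x, -y, -1, v * x, v * y, v])
    A = np.asarray(rows)
    _, _, Vt = np.linalg.svd(A)        # null-space vector of A gives h
    H = Vt[-1].reshape(3, 3)
    return H / H[2, 2]                 # normalize so H[2, 2] == 1

src = np.array([[0, 0], [1, 0], [1, 1], [0, 1]], float)   # unit square template
dst = np.array([[10, 10], [210, 40], [190, 230], [20, 200]], float)
H = homography_dlt(src, dst)
p = H @ np.array([1, 1, 1.0])          # map src corner (1, 1)
print(p[:2] / p[2])                    # ~[190, 230], the matching dst corner
```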
Just in: CVPR 2025 awards are out, with Oxford & Meta PhD student Jianyuan Wang winning Best Paper and Saining Xie taking a Young Researcher Award
机器之心· 2025-06-13 15:45
Core Insights
- The CVPR 2025 conference in Nashville, Tennessee, awarded five papers: one best paper and four honorable mentions, along with one best student paper and one honorable mention for student papers [1][2].

Submission and Acceptance Statistics
- This year, over 40,000 authors submitted 13,008 papers, a 13% increase from last year's 11,532 submissions. A total of 2,872 papers were accepted, an overall acceptance rate of approximately 22.1%. Among the accepted papers, 96 were oral presentations (3.3%) and 387 were highlights (13.7%) [3][5].

Conference Attendance
- The conference attracted over 9,000 attendees from more than 70 countries and regions [7].

Paper Acceptance by Field
- Image and video generation had the highest number of accepted papers, while the highest acceptance rates were in 3D from multi-view and sensor data, as well as single-image 3D [8].

Best Paper Award
- The best paper, "VGGT: Visual Geometry Grounded Transformer," was presented by researchers from the University of Oxford and Meta AI. It introduces a universal 3D vision model based on a pure feedforward Transformer architecture, capable of inferring core geometric information from one or more images [13][14].

Notable Research Contributions
- The best paper demonstrated significant performance improvements over traditional optimization methods and existing state-of-the-art models across various 3D tasks, achieving inference in seconds without post-processing optimization [17].

Best Student Paper
- The best student paper, "Neural Inverse Rendering from Propagating Light," proposed a physics-based, multi-view neural inverse rendering system for propagating light, achieving state-of-the-art 3D reconstruction under strong indirect lighting [53][55].

Awards and Recognitions
- Two Young Researcher Awards went to Hao Su and Saining Xie for their outstanding contributions to computer vision research [68][72]. The Longuet-Higgins Award was presented to two papers that have significantly influenced the field: the Inception architecture and fully convolutional networks for semantic segmentation [75][78][80].
WiMi Hologram Cloud rose 5.13% to $2.46 per share, for a total market value of $24.16 million
Jin Rong Jie· 2025-06-11 13:50
Group 1
- The core viewpoint of the articles highlights WiMi Hologram Cloud's significant growth in net profit despite a decline in total revenue, indicating strong operational efficiency and potential for future growth [1][2].
- For the year ended December 31, 2024, WiMi reported total revenue of 542 million RMB, a year-on-year decrease of 7.42%, while net profit attributable to the parent company reached 71.64 million RMB, a year-on-year increase of 117.01% [1].
- WiMi Hologram Cloud is positioned as a leading provider of holographic cloud technology solutions in China, focusing on holographic AR technology, including AI synthesis, visual presentation, and advertising [1][2].

Group 2
- The company has made significant breakthroughs in holographic application fields such as advertising, entertainment, education, and 5G communications, aiming to deepen both research and market application of holographic 3D computer vision [2].
- WiMi's vision is to become a creator of the holographic ecosystem in China, emphasizing a strong, open service platform that bridges holographic technology applications and computer vision presentation [2].
Express | Buildots raises a $45 million Series D, using AI models plus computer vision to crack the construction industry's "information disconnect" problem
Z Potentials· 2025-05-30 03:23
Core Viewpoint
- Buildots aims to revolutionize the construction industry by using artificial intelligence and computer vision to bridge the gap between management and on-site reality [3][4].

Group 1: Company Overview
- Buildots is a Chicago-based startup founded in 2018 by Roy Danon, Aviv Leibovici, and Yakir Sudry, focusing on tracking construction progress through images captured by 360-degree cameras mounted on hard hats [3].
- The company has raised a total of $166 million, including $45 million from a Series D round led by Qumra Capital [3].

Group 2: Technology and Innovation
- The Buildots system not only monitors construction progress but also predicts potential delays and issues, allowing teams to make data-driven decisions rather than rely on fragmented information [4].
- The platform supports project status queries through an AI chatbot and issues alerts for possible risks, helping avoid costly problems [4].

Group 3: Market Position and Competition
- Buildots serves clients including Intel and around 50 construction companies, positioning itself as a significant player in the construction technology sector [4].
- Competitors include BeamUp, which develops AI design platforms, and Versatile, which analyzes construction-site data to present project progress [4].

Group 4: Future Plans
- The recent funding will primarily be used to expand Buildots' product coverage to more stages of the construction lifecycle and to enhance its AI models with historical data [4][5].
- Buildots plans to grow its research and development team and expand its presence in North America [4].
Over 40,000 authors scrambling for a spot: CVPR 2025 officially reveals its three hottest topics. Are you working in the right direction?
机器之心· 2025-05-28 03:02
Core Insights
- The article discusses the latest trends in computer vision, highlighting three major research directions gaining traction as of 2025 [3][4].

Group 1: Major Research Directions
- The three prominent areas identified are:
  1. Multi-view and sensor-based 3D technology, which has evolved from 2D rendering toward more complex 3D evaluation, significantly influenced by the introduction of NeRF in 2020 [5].
  2. Image and video synthesis, which has become a focal point for presenting environmental information more accurately, reflecting advances in analyzing and generating multimedia content [6].
  3. Multimodal learning, which integrates visual, linguistic, and reasoning capabilities, pointing toward more interactive and comprehensive AI systems [7][8].

Group 2: Conference Insights
- CVPR 2025 saw a 13% increase in paper submissions, with 13,008 submissions total and an acceptance rate of 22.1%, indicating a highly competitive environment [3].
- The conference emphasizes the importance of diverse voices in the research community, ensuring that every paper, regardless of the author's affiliation, receives equal consideration [8].
Changchun Institute of Optics, Fine Mechanics and Physics makes new progress on optoelectronic synaptic devices
Huan Qiu Wang Zi Xun· 2025-05-10 09:18
Core Insights
- The research team at the Changchun Institute of Optics, Fine Mechanics and Physics has developed a novel ultraviolet optoelectronic synaptic device and an optoelectronic synaptic transistor, providing new technological pathways for advanced artificial vision systems and neuromorphic vision computing [1][2].

Group 1: Ultraviolet Optoelectronic Synaptic Device
- The device exploits the ferroelectric polarization of AlScN and the excellent optoelectronic properties of GaN, and was constructed based on the trapping and detrapping of holes at the heterojunction [1].
- It exhibits outstanding non-volatile storage characteristics and can emulate synaptic functions of biological visual systems, enabling multi-state modulation such as long-term potentiation (LTP), paired-pulse facilitation (PPF), and learning-forgetting-relearning processes [1][2].

Group 2: Optoelectronic Synaptic Transistor
- Through a gas-adsorption-assisted persistent photoconductivity strategy, the transistor achieves high optoelectronic conversion efficiency across a wide spectrum and long data retention from the ultraviolet to the near-infrared [1][2].
- It demonstrates excellent photodetection performance across a broad spectrum from 375 nm to 1310 nm, with a paired-pulse facilitation index reaching 158% (see the note following this summary), significantly enhancing the precision and efficiency of neural networks in processing visual information [2].

Group 3: Implications for Computing and Vision Systems
- These devices provide efficient, biomimetic solutions for the hardware implementation of multispectral neuromorphic vision systems, emulating how human retinal cells perceive and recognize multispectral signals [2].
- Neuromorphic vision systems, which mimic the structure of neurons and synapses in the human brain, can process multiple information streams simultaneously, significantly reducing power consumption and improving data-processing speed compared with traditional serial visual pipelines [2].
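For context on the 158% figure: the paired-pulse facilitation index is conventionally defined as PPF = (A2 / A1) × 100%, where A1 and A2 are the amplitudes of the device's responses to the first and second of two closely spaced light pulses. A PPF index of 158% therefore means the second response is 1.58 times the first; this reading assumes the paper follows the standard definition.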
"Computer vision has been terminated by GPT-4o" (doge)
量子位· 2025-03-29 07:46
Core Viewpoint
- The article discusses the advancements in computer vision (CV) and image generation brought by the new GPT-4o model, highlighting its potential to disrupt existing tools and methodologies in the field [1][2].

Group 1: Technological Advancements
- GPT-4o introduces native multimodal image generation, expanding the functionality of AI tools beyond traditional applications [2][12].
- GPT-4o's image generation is based on an autoregressive model, unlike the diffusion model used in DALL·E, which allows better instruction following and stronger image-editing capabilities [15][19].
- Observations suggest the generation may involve a multi-scale autoregressive scheme, in which a rough image is produced first and details are filled in as the rough shape evolves (a toy sketch of this idea follows the summary) [17][19].

Group 2: Industry Impact
- These capabilities have raised concerns among designers and computer vision researchers, signaling a significant shift in the competitive landscape of AI tools [6][10].
- OpenAI's approach of scaling foundational models to reach these capabilities surprised many in the industry, suggesting a new trend in AI development [12][19].
- The potential for GPT-4o to enhance applications such as autonomous driving has been noted, with implications for future developments in that sector [10].

Group 3: Community Engagement
- The article encourages community members to share their experiences and innovative uses of GPT-4o, fostering a collaborative environment for exploring AI applications [26].
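To make the "rough image first, details later" observation concrete, here is a toy, purely illustrative sketch of scale-wise autoregressive generation, with random noise standing in for a learned conditional model; nothing here reflects GPT-4o's actual implementation, which OpenAI has not disclosed:

```python
import numpy as np

rng = np.random.default_rng(0)

def refine(coarse: np.ndarray) -> np.ndarray:
    """Upsample 2x and add 'detail' conditioned on the coarse image
    (a stand-in for a learned scale-wise autoregressive model)."""
    up = coarse.repeat(2, axis=0).repeat(2, axis=1)
    return up + rng.normal(scale=0.05, size=up.shape)

# Scale-wise autoregression: each resolution is generated conditioned
# on all coarser ones, so a rough image appears first and details are
# filled in afterwards -- matching the behavior the article describes.
image = rng.normal(size=(4, 4))      # coarsest "sketch" of the image
for _ in range(3):                   # 4x4 -> 8x8 -> 16x16 -> 32x32
    image = refine(image)
print(image.shape)                   # (32, 32)
```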
After a 13-year wait, AlexNet goes open source: the original code handwritten by Hinton's team, annotations and all
36Kr· 2025-03-24 11:38
Core Insights
- AlexNet's original source code has been open-sourced after 13 years, giving AI developers and deep learning enthusiasts access to the foundational code that revolutionized computer vision [1][10][11].
- The release is the original 2012 version written by Geoffrey Hinton's team, complete with annotations, offering insight into how early deep learning models were developed [1][11].

Group 1: Historical Context
- AlexNet emerged in 2012 at the ImageNet Large Scale Visual Recognition Challenge (ILSVRC), cutting the Top-5 error rate from 26.2% to 15.3% and marking a pivotal moment in computer vision [2][3].
- Before AlexNet, neural networks faced skepticism and were largely overlooked due to limits on computational power and data availability; an earlier resurgence had followed the rediscovery of the backpropagation algorithm in the 1980s [4][6].

Group 2: Technical Aspects
- AlexNet consists of 5 convolutional layers and 3 fully connected layers, totaling about 60 million parameters and 650,000 neurons, and used GPU acceleration for training (a compact sketch of the layout follows this summary) [2][3].
- Its success was enabled by the ImageNet dataset, crowdsourced into the largest image dataset of its time, and by advances in GPU technology, particularly NVIDIA's CUDA programming system [5][6].

Group 3: Development and Impact
- Open-sourcing the code was a collaborative effort between the Computer History Museum and Google, taking five years to navigate licensing complexities [10][11].
- The AlexNet paper has accumulated over 170,000 citations, establishing it as a seminal work in deep learning that shaped subsequent research and development in AI [7][10].
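For reference, the 5-conv + 3-FC layout described above can be written compactly in modern PyTorch. This is a present-day rendition for illustration, not the released 2012 CUDA code, and it omits period details such as the two-GPU split and local response normalization:

```python
import torch
import torch.nn as nn

class AlexNet(nn.Module):
    """The 5-conv + 3-FC layout described in the article."""
    def __init__(self, num_classes: int = 1000):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 96, kernel_size=11, stride=4), nn.ReLU(inplace=True),
            nn.MaxPool2d(3, stride=2),
            nn.Conv2d(96, 256, kernel_size=5, padding=2), nn.ReLU(inplace=True),
            nn.MaxPool2d(3, stride=2),
            nn.Conv2d(256, 384, kernel_size=3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(384, 384, kernel_size=3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(384, 256, kernel_size=3, padding=1), nn.ReLU(inplace=True),
            nn.MaxPool2d(3, stride=2),
        )
        self.classifier = nn.Sequential(
            nn.Dropout(), nn.Linear(256 * 6 * 6, 4096), nn.ReLU(inplace=True),
            nn.Dropout(), nn.Linear(4096, 4096), nn.ReLU(inplace=True),
            nn.Linear(4096, num_classes),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = self.features(x)
        return self.classifier(torch.flatten(x, 1))

model = AlexNet()
print(sum(p.numel() for p in model.parameters()))  # ~62M; the paper cites ~60M
print(model(torch.randn(1, 3, 227, 227)).shape)    # torch.Size([1, 1000])
```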
[Computers] On-device AI reaches a major inflection point; computer vision leader ArcSoft poised for new growth: AI Industry Tracking Report No. 60 (Liu Kai / Bai Yue)
光大证券研究· 2025-03-06 09:25
Report summary
- The rapid development of on-device AI technology is driving intelligent transformation across many sectors.
- Risk warnings: AI technology progress falling short of expectations; AI glasses adoption falling short of expectations; downstream demand falling short of expectations.
- Publication date: 2025-03-06
- According to WellsennXR data, the global eyewear market is enormous: about 1.56 billion pairs were sold in 2023, a market of roughly $150 billion, and sales are expected to reach 2 billion pairs within the next decade, with the market approaching $200 billion. Against this backdrop, AI smart glasses show huge potential and are expected to begin rapidly penetrating the traditional eyewear market in 2025, with sales potentially reaching 1.4 billion pairs by 2035. Meta, RayNeo, Baidu, Rokid, and other companies have already launched multiple AI glasses products, driving technology iteration and market maturation. As of Q2 2024, shipments of Ray-Ban Meta, the Meta and Ray-Ban co-branded product, had exceeded 1 million units; the RayNeo V3 AI camera glasses ...