Computer Vision
NeurIPS Spotlight | GHAP: Turning 3DGS "Pruning" into "Reconstructing a Smaller Gaussian World"
机器之心· 2025-11-14 09:30
Core Viewpoint
- The article presents a novel approach to 3D Gaussian Splatting (3DGS) compression that frames it as Gaussian mixture model simplification, effectively reducing redundancy while preserving geometric detail [4][28].

Summary by Sections

Introduction
- 3DGS is a popular method for 3D scene modeling that uses a large number of Gaussian primitives to build high-quality 3D representations. However, redundancy among these Gaussians limits storage efficiency and rendering speed [4].

Methodology
- The proposed Gaussian-Herding-across-Pens (GHAP) method treats the entire 3DGS as a Gaussian mixture model and globally reconstructs a smaller mixture, maintaining the geometric structure while reducing the number of Gaussians [8][9].
- GHAP employs a two-stage process: it first simplifies geometric information (position/covariance), then refines appearance attributes (opacity/color). This decoupling improves stability (a toy sketch of the two-stage idea follows this summary) [9][19].

Experimental Results
- GHAP was compared with a range of pruning-based and end-to-end compression methods. It consistently outperforms the pruning baselines while coming close to the performance of fully trained end-to-end methods [20][24].
- At a 10% retention rate, GHAP maintains high visual fidelity across different models and scenes, demonstrating its effectiveness in preserving the original geometric structure [23][24].

Conclusion
- GHAP offers a new perspective on 3DGS compression, focusing on Gaussian mixture model simplification to retain geometric detail. It is designed to scale to large 3DGS scenes and is compatible with most existing 3DGS frameworks [28].
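The digest does not reproduce GHAP's actual herding algorithm, so the following is only a minimal sketch of the "reconstruct a smaller mixture, geometry first, appearance second" idea it describes, using scikit-learn's off-the-shelf GaussianMixture as a stand-in simplifier. The point counts, dimensionality, and the color-averaging step are all illustrative assumptions, not the paper's method.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
big_means = rng.normal(size=(5_000, 3))   # stand-in for the centers of a large 3DGS
colors = rng.uniform(size=(5_000, 3))     # stand-in for per-Gaussian appearance

# Stage 1 (geometry): fit a much smaller mixture to the large Gaussian cloud,
# recovering fewer means/covariances that still cover the scene globally.
small = GaussianMixture(n_components=100, covariance_type="full",
                        max_iter=50, random_state=0).fit(big_means)

# Stage 2 (appearance): with geometry frozen, assign each original Gaussian to
# its new component and average its appearance, decoupled from the geometry fit.
assign = small.predict(big_means)
new_colors = np.stack([
    colors[assign == k].mean(axis=0) if np.any(assign == k) else np.zeros(3)
    for k in range(small.n_components)
])
print(small.means_.shape, new_colors.shape)  # (100, 3) (100, 3)
```

The two-stage split mirrors the decoupling the digest highlights: the expensive mixture fit touches only positions and covariances, and appearance is re-estimated afterwards against the frozen geometry.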
Feed-Forward 3D Survey: How 3D Vision Achieves Results "in One Step"
机器之心· 2025-11-06 08:58
Core Insights
- The article discusses advancements in 3D vision, particularly the transition from traditional methods to Feed-Forward 3D approaches, which improve efficiency and generalization [2][4].

Summary by Sections

Overview of Feed-Forward 3D
- The article traces the evolution of 3D reconstruction techniques from Structure-from-Motion (SfM) to Neural Radiance Fields (NeRF) and 3D Gaussian Splatting (3DGS), emphasizing the shift toward Feed-Forward 3D methods that eliminate per-scene optimization [2][6].

Key Technological Branches
- Five main architectural categories of Feed-Forward 3D methods are identified, each contributing significantly to the field's progress [6][7].
- Neural Radiance Fields (NeRF) introduced a differentiable framework for volume rendering but suffered efficiency issues due to scene-specific optimization; conditional NeRF variants have since emerged that predict radiance fields directly [7][9].
- PointMap Models, led by DUSt3R, predict pixel-aligned 3D point clouds directly within a Transformer framework, improving efficiency and memory capability [9][10].
- 3D Gaussian Splatting (3DGS) represents scenes as Gaussian point clouds, balancing rendering quality and speed; recent advances allow direct feed-forward output of Gaussian parameters [10][12].
- Mesh, Occupancy, and SDF Models integrate traditional geometric modeling with modern techniques, enabling high-precision surface modeling [14][19].

Applications and Benchmarking
- The paper surveys applications of Feed-Forward models across tasks including camera pose estimation, point map estimation, and single-image view synthesis, and compiles a benchmark spanning more than 30 common 3D datasets [16][18][22].
- Standard evaluation metrics such as PSNR, SSIM, and Chamfer Distance are established to facilitate model comparison (a minimal PSNR sketch follows this summary) [18][23].

Future Challenges and Trends
- The article identifies four major open questions for future research, including the integration of Diffusion Transformers, scalable 4D memory mechanisms, and the construction of multimodal large-scale datasets [27][28].
- Remaining challenges include the predominance of RGB-only data, the need for improved reconstruction accuracy, and difficulties in free-viewpoint rendering [29].
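PSNR is the most mechanical of the metrics the survey lists. As a reference point, here is a minimal NumPy sketch of it; the [0, 1] normalization convention is an assumption, since the survey does not specify one.

```python
import numpy as np

def psnr(pred: np.ndarray, target: np.ndarray, max_val: float = 1.0) -> float:
    """Peak signal-to-noise ratio between two images valued in [0, max_val]."""
    mse = np.mean((pred.astype(np.float64) - target.astype(np.float64)) ** 2)
    if mse == 0:
        return float("inf")   # identical images
    return 10.0 * np.log10(max_val ** 2 / mse)

# Toy usage: a noisy render scored against its ground-truth view.
rng = np.random.default_rng(0)
gt = rng.uniform(size=(64, 64, 3))
render = np.clip(gt + rng.normal(scale=0.05, size=gt.shape), 0.0, 1.0)
print(f"PSNR: {psnr(render, gt):.2f} dB")
```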
Feed-Forward 3D Survey: 3D Vision Enters the "One-Step" Era
自动驾驶之心· 2025-10-31 16:03
Core Insights
- The article discusses the evolution of 3D vision technologies, highlighting the transition from traditional methods like Structure-from-Motion (SfM) to advanced techniques such as Neural Radiance Fields (NeRF) and 3D Gaussian Splatting (3DGS), and emphasizing the emergence of Feed-Forward 3D as a new paradigm in the AI-driven era [2][6].

Summary by Categories

1. Technological Evolution
- The article outlines the historical progression of 3D vision, noting that previous methods often required per-scene optimization, which was slow and generalized poorly [2][6].
- Feed-Forward 3D is introduced as a new paradigm that aims to overcome these limitations, enabling faster and more generalizable 3D understanding [2].

2. Classification of Feed-Forward 3D Methods
- The article categorizes Feed-Forward 3D methods into five main architectures, each contributing to significant advances in the field:
  1. **NeRF-based Models**: These models utilize a differentiable framework for volume rendering but face efficiency issues due to scene-specific optimization; conditional NeRF approaches have emerged to allow direct prediction of radiance fields [8].
  2. **PointMap Models**: Led by DUSt3R, these models predict pixel-aligned 3D point clouds directly within a Transformer framework, eliminating the need for camera pose input [10].
  3. **3D Gaussian Splatting (3DGS)**: This representation uses Gaussian point clouds to balance rendering quality and speed, with recent advances allowing direct output of Gaussian parameters [11][13].
  4. **Mesh / Occupancy / SDF Models**: These methods combine traditional geometric modeling with modern techniques like Transformers and Diffusion models [14].
  5. **3D-Free Models**: These models learn mappings from multi-view inputs to new viewpoints without relying on explicit 3D representations [15].

3. Applications and Tasks
- The article highlights diverse applications of Feed-Forward models, including:
  - Pose-Free Reconstruction & View Synthesis
  - Dynamic 4D Reconstruction & Video Diffusion
  - SLAM and visual localization
  - 3D-aware image and video generation
  - Digital human modeling
  - Robotic manipulation and world modeling [19]

4. Benchmarking and Evaluation Metrics
- The article covers more than 30 commonly used 3D datasets spanning various scene types and modalities, and summarizes standard evaluation metrics such as PSNR, SSIM, and Chamfer Distance for future model comparison (a Chamfer Distance sketch follows this summary) [20][21].

5. Future Challenges and Trends
- The article identifies four major open questions for future research: the need for multi-modal data, improvements in reconstruction accuracy, challenges in free-viewpoint rendering, and the limitations of long-context reasoning over extensive frame sequences [25][26].
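Chamfer Distance, the geometry metric in the survey's list, compares two point clouds by nearest-neighbor distances in both directions. Below is a minimal brute-force NumPy sketch; conventions vary between papers (squared vs. unsquared distances, sum vs. mean), so the symmetric-mean form here is one common choice, not the survey's prescribed one. At realistic cloud sizes a KD-tree should replace the pairwise matrix.

```python
import numpy as np

def chamfer_distance(a: np.ndarray, b: np.ndarray) -> float:
    """Symmetric Chamfer distance between point clouds a: (N, 3), b: (M, 3)."""
    # Pairwise squared distances; fine for small clouds, use a KD-tree at scale.
    d2 = np.sum((a[:, None, :] - b[None, :, :]) ** 2, axis=-1)
    return float(d2.min(axis=1).mean() + d2.min(axis=0).mean())

rng = np.random.default_rng(0)
gt_points = rng.uniform(size=(500, 3))
pred_points = gt_points + rng.normal(scale=0.01, size=gt_points.shape)
print(f"Chamfer distance: {chamfer_distance(pred_points, gt_points):.6f}")
```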
3D Reconstruction Survey: The Evolution from Multi-View Geometry to NeRF and 3DGS
自动驾驶之心· 2025-09-22 23:34
Core Viewpoint
- 3D reconstruction sits at a critical intersection of computer vision and graphics, serving as the digital foundation for cutting-edge applications such as virtual reality, augmented reality, autonomous driving, and digital twins. Recent advances in novel view synthesis, represented by Neural Radiance Fields (NeRF) and 3D Gaussian Splatting (3DGS), have significantly improved reconstruction quality, speed, and dynamic adaptability [5][6].

Group 1: Introduction and Demand
- The resurgence of interest in 3D reconstruction is driven by new application demands across fields: city-scale digital twins requiring kilometer-level coverage with centimeter-level accuracy, autonomous-driving simulation needing dynamic traffic flow and real-time semantics, and AR/VR social applications demanding over 90 FPS at photo-realistic quality [6].
- Traditional reconstruction pipelines are inadequate for these requirements, prompting the integration of geometry, texture, and lighting through differentiable rendering techniques [6].

Group 2: Traditional Multi-View Geometry Reconstruction
- The traditional multi-view geometry pipeline (SfM to MVS) has inherent limitations in quality, efficiency, and adaptability to dynamic scenes, which iterative advances in NeRF and 3DGS have progressively addressed [7].
- A comprehensive comparison of methods highlights the field's evolution and its remaining challenges [7].

Group 3: NeRF and Its Innovations
- NeRF models scenes as continuous 5D functions rendered by differentiable volume integration; from 2020 to 2024 the family evolved rapidly to address data requirements, texture limitations, lighting sensitivity, and dynamic scenes (a minimal volume-rendering sketch follows this summary) [13][15].
- Numerous variants improve quality and efficiency, including Mip-NeRF, NeRF-W, and InstantNGP, each contributing faster rendering and lower memory usage [17][18].

Group 4: 3DGS and Its Advancements
- 3DGS represents scenes as collections of 3D Gaussians, allowing efficient rendering at high output quality. Recent methods focus on optimizing rendering quality and efficiency, achieving significant gains in memory usage and frame rate [22][26].
- Comparisons of 3DGS with other methods show its superiority in rendering speed and dynamic scene reconstruction [31].

Group 5: Future Trends and Conclusion
- The next five years are expected to bring hybrid representations, real-time processing on mobile devices, generative reconstruction techniques, and multi-modal fusion for robust reconstruction [33].
- The ultimate goal is real-time 3D reconstruction accessible to everyone, marking a shift toward ubiquitous computing [34].
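The "continuous 5D function" refers to the standard NeRF formulation: a field mapping 3D position plus 2D view direction to color and density, integrated along camera rays. The survey contains no code, but the discrete quadrature NeRF popularized is standard; here is a minimal NumPy sketch of that compositing step, with sample counts and values purely illustrative.

```python
import numpy as np

def volume_render(sigmas: np.ndarray, colors: np.ndarray,
                  deltas: np.ndarray) -> np.ndarray:
    """Standard NeRF quadrature along one ray.
    sigmas: (S,) densities, colors: (S, 3) RGB, deltas: (S,) segment lengths."""
    alphas = 1.0 - np.exp(-sigmas * deltas)            # per-sample opacity
    # Transmittance T_i: probability the ray reaches sample i unabsorbed.
    trans = np.cumprod(np.concatenate([[1.0], 1.0 - alphas]))[:-1]
    weights = trans * alphas
    return (weights[:, None] * colors).sum(axis=0)     # composited pixel color

# Toy ray with 64 random samples.
rng = np.random.default_rng(0)
S = 64
pixel = volume_render(rng.uniform(0, 2, S), rng.uniform(size=(S, 3)),
                      np.full(S, 0.05))
print(pixel)
```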
HKUST, Horizon Robotics & Zhejiang University Jointly Open-Source SAIL-Recon: Reconstruct a City in Three Minutes
自动驾驶之心· 2025-09-02 23:33
Core Insights
- The article discusses the SAIL-Recon framework, which integrates scene regression with localization to perform large-scale Structure from Motion (SfM) over thousands of images efficiently and accurately [7][10][34].

Group 1: Traditional SfM Limitations
- Traditional SfM algorithms rely on feature extraction, matching, triangulation, and bundle adjustment, and can fail in low-texture, blurry, or repetitive-texture scenes [5].
- Recent research has proposed end-to-end learnable SfM pipelines that regress scene structure and camera poses directly from images, but these are limited by GPU memory when handling large-scale scenes [5][10].

Group 2: SAIL-Recon Framework
- SAIL-Recon is a multi-task framework that unifies reconstruction and localization without scene-specific training, sampling a few anchor images from large image or video sequences to infer a neural scene representation [7][10].
- The framework achieves state-of-the-art (SOTA) performance across multiple benchmarks, surpassing both traditional and learning-based methods in accuracy and efficiency [10][34].

Group 3: Methodology
- SAIL-Recon selects a small number of anchor images, extracts a neural scene representation from them, and then jointly estimates scene coordinates and camera poses for all images [9][10].
- A transformer computes the scene representation and camera parameters, with a key-value cache keeping GPU memory usage in check (a toy sketch of the caching pattern follows this summary) [11][12].

Group 4: Experimental Results
- SAIL-Recon demonstrated superior performance in pose estimation and novel view synthesis, achieving the highest PSNR on the Tanks & Temples dataset and completing reconstructions significantly faster than traditional methods [26][32].
- The framework maintains good performance even when the number of anchor images is reduced from 10 to 2, indicating robustness across sampling strategies [32].

Group 5: Limitations and Future Work
- The framework's reliance on a fixed global coordinate system may affect certain sequences, suggesting a need for improved anchor-image selection strategies [36].
- Uniform sampling can overlook parts of a scene, pointing to coverage-aware sampling as a direction for future research [36].
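The digest mentions a transformer with a key-value cache but gives no implementation detail, so the following is only a schematic illustration of the amortization pattern it implies: encode the anchor images once, cache the resulting tokens, and let every subsequent frame attend to that cache instead of re-encoding the anchors. Every name, shape, and the random stand-in "encoder" here is hypothetical, not SAIL-Recon's code.

```python
import numpy as np

rng = np.random.default_rng(0)

def encode(images: np.ndarray) -> np.ndarray:
    """Stand-in for the transformer encoder: one token vector per image."""
    return rng.normal(size=(len(images), 128))

# Anchor pass, done once per scene: encode a handful of anchor frames and
# cache the resulting tokens (the "key-value cache" in the digest).
anchors = np.zeros((10, 3, 224, 224))          # placeholder anchor images
kv_cache = encode(anchors)                     # (10, 128), computed once

def localize(frame: np.ndarray, cache: np.ndarray) -> np.ndarray:
    """Stand-in query pass: softmax-attend to cached anchor tokens instead
    of re-encoding all anchors for every incoming frame."""
    q = rng.normal(size=(1, 128))              # placeholder frame query
    attn = np.exp(q @ cache.T)
    attn /= attn.sum()
    return attn @ cache                        # fused scene context for this frame

ctx = localize(np.zeros((3, 224, 224)), kv_cache)
print(ctx.shape)  # (1, 128)
```

The point of the pattern is that per-frame cost no longer grows with the full sequence length, only with the (small, fixed) anchor set, which is consistent with the memory savings the digest attributes to the cache.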
A Diverse, Large-Scale Dataset! SceneSplat++: The First Comprehensive 3DGS-Based Benchmark
自动驾驶之心· 2025-06-20 14:06
Core Insights
- The article introduces SceneSplat-Bench, a comprehensive benchmark for evaluating visual-language scene understanding methods built on 3D Gaussian Splatting (3DGS) [11][30].
- It presents SceneSplat-49K, a large-scale dataset containing approximately 49,000 raw scenes and 46,000 filtered 3DGS scenes, the most extensive open-source dataset for complex, high-quality scene-level 3DGS reconstruction [9][30].
- The evaluation indicates that generalizable methods consistently outperform per-scene optimization methods, establishing a new paradigm for scalable scene understanding through pre-trained models [30].

Evaluation Protocols
- The benchmark evaluates methods on two key metrics in 3D space: foreground mean Intersection over Union (f-mIoU) and foreground mean accuracy (f-mAcc), which address object-size imbalance and reduce viewpoint dependence compared with 2D evaluation (a sketch of a foreground-mIoU computation follows this summary) [22][30].
- The evaluation data includes ScanNet, ScanNet++, and Matterport3D for indoor scenes and HoliCity for outdoor scenes, stressing the methods across object scales and complex environments [22][30].

Dataset Contributions
- SceneSplat-49K is compiled from multiple sources, including SceneSplat-7K, DL3DV-10K, HoliCity, and Aria Synthetic Environments, ensuring a diverse range of indoor and outdoor environments [9][10].
- Dataset preparation consumed approximately 891 GPU-days plus extensive human effort, highlighting the resources invested in creating a high-quality dataset [7][9].

Methodological Insights
- The article categorizes methods into three types: per-scene optimization methods, per-scene optimization-free methods, and generalizable methods, with SceneSplat representing the last [23][30].
- Generalizable methods eliminate expensive single-scene computation at inference time, processing a 3D scene in a single forward pass [24][30].

Performance Results
- Results on SceneSplat-Bench demonstrate that SceneSplat excels in both performance and efficiency, often surpassing the pseudo-label methods used for its own pre-training [24][30].
- Performance varies significantly with dataset complexity, underscoring the value of challenging benchmarks for exposing the limitations of competing methods [28][30].
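The digest names f-mIoU but not its exact protocol; a plausible reading is ordinary per-class IoU averaged over foreground classes only, with the background label excluded. Here is a minimal NumPy sketch under that assumption (the background-label convention and toy data are mine, not the benchmark's).

```python
import numpy as np

def foreground_miou(pred: np.ndarray, gt: np.ndarray, background: int = 0) -> float:
    """Mean IoU over foreground classes only; background label is excluded.
    One plausible reading of f-mIoU, not the benchmark's verified protocol."""
    ious = []
    for c in np.unique(gt):
        if c == background:
            continue
        inter = np.sum((pred == c) & (gt == c))
        union = np.sum((pred == c) | (gt == c))
        if union > 0:
            ious.append(inter / union)
    return float(np.mean(ious)) if ious else 0.0

# Toy labels for 1,000 points with classes {0: background, 1, 2}.
rng = np.random.default_rng(0)
gt = rng.integers(0, 3, size=1000)
pred = np.where(rng.uniform(size=1000) < 0.9, gt, rng.integers(0, 3, size=1000))
print(f"f-mIoU: {foreground_miou(pred, gt):.3f}")
```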
No Expensive Equipment Required: Monocular Method Generates Ultra-Realistic 3D Avatars; New Tsinghua & IDEA Research Accepted to CVPR 2025
量子位· 2025-05-22 14:29
Core Viewpoint
- The article discusses HRAvatar, a method for creating high-quality, relightable 3D avatars from monocular video, addressing challenges in animation, real-time rendering, and visual realism [1][4][6].

Group 1: Methodology and Innovations
- HRAvatar uses learnable deformation bases and linear blend skinning to achieve flexible, precise geometric deformation [1][6].
- An end-to-end expression encoder improves the accuracy of expression-parameter extraction, reducing tracking error while preserving generalization [6][10].
- The method decomposes the avatar's appearance into material properties such as albedo, roughness, and Fresnel reflectance, employing a simplified BRDF model for shading (a generic shading sketch follows this summary) [6][16].

Group 2: Performance and Results
- HRAvatar leads across metrics, achieving a PSNR of 30.36, MAE of 0.845, SSIM of 0.9482, and LPIPS of 0.0569, outperforming existing methods [24][26].
- It renders in real time at approximately 155 FPS under driving and relighting conditions [25].
- Experimental results indicate that HRAvatar excels in detail richness and quality, particularly on LPIPS, suggesting enhanced avatar detail [24][34].

Group 3: Applications and Future Directions
- The reconstructed avatars can be animated and relit under new environment lighting, and support simple material editing [28].
- HRAvatar expands the application scenarios of monocular Gaussian avatar modeling, and its code has been open-sourced [35][36].
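The digest names albedo, roughness, and Fresnel reflectance as the decomposed materials but does not spell out HRAvatar's shading equations. As a generic illustration of how those three quantities combine in a simplified BRDF, here is a Lambert-plus-specular sketch using Schlick's Fresnel approximation; the specular lobe and all numeric values are stand-ins, not the paper's model.

```python
import numpy as np

def schlick_fresnel(f0: np.ndarray, cos_theta: float) -> np.ndarray:
    """Schlick's approximation of Fresnel reflectance."""
    return f0 + (1.0 - f0) * (1.0 - cos_theta) ** 5

def shade(albedo, roughness, f0, n, l, v):
    """Generic Lambert + Fresnel-weighted specular split; a stand-in for a
    simplified BRDF, not HRAvatar's published shading model."""
    n, l, v = (x / np.linalg.norm(x) for x in (n, l, v))
    h = (l + v) / np.linalg.norm(l + v)            # half vector
    ndotl = max(np.dot(n, l), 0.0)
    # Roughness controls the specular exponent: rougher => broader highlight.
    spec = schlick_fresnel(f0, max(np.dot(h, v), 0.0)) \
        * max(np.dot(n, h), 0.0) ** (2.0 / max(roughness**2, 1e-4))
    return (albedo / np.pi + spec) * ndotl

color = shade(
    albedo=np.array([0.8, 0.6, 0.5]), roughness=0.4,
    f0=np.array([0.04, 0.04, 0.04]),               # dielectric base reflectance
    n=np.array([0.0, 0.0, 1.0]), l=np.array([0.3, 0.3, 1.0]),
    v=np.array([0.0, 0.0, 1.0]),
)
print(color)
```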
ICML 2025 Spotlight | Examining Image Adversarial Perturbations via Fourier Decomposition; Code Now Open-Source
机器之心· 2025-05-18 04:25
Core Viewpoint
- The article discusses a novel approach to adversarial purification in computer vision that works in the frequency domain, separating adversarial perturbations from clean images while preserving semantic information [5][21].

Research Background
- Adversarial samples pose significant challenges to the safety and robustness of computer vision models, motivating purification techniques that recover the original clean image [5].
- Existing adversarial purification methods fall into training-based and diffusion-model-based approaches; the latter generalize better and require no extensive training data [5][6].

Motivation and Theoretical Analysis
- The key to successful purification is eliminating adversarial perturbations while retaining the semantic information of the original image [9].
- Current strategies that add noise to drown out adversarial perturbations tend to damage the original semantics excessively [9].
- Using Fourier decomposition, the study analyzes how adversarial perturbations are distributed, finding that they predominantly affect high-frequency components while low-frequency components remain comparatively robust [9][12].

Methodology
- A filter is constructed to retain the low-frequency amplitude-spectrum components, which are least affected by adversarial perturbations and can therefore stand in for those of the original clean image (a generic sketch of the low-frequency amplitude swap follows this summary) [14][15].
- Because the phase spectrum is perturbed across all frequency components, a projection method is used instead to maintain the integrity of the phase information [16][17].

Experimental Results
- The proposed method improves both standard and robust accuracy over state-of-the-art (SOTA) methods on datasets such as CIFAR10 and ImageNet [18][19].
- Visualizations show the purified images closely resembling the original clean images, confirming the approach's effectiveness [20].

Conclusion
- While significant progress has been made in preserving semantics and removing adversarial perturbations, more effective image decompositions and deeper theoretical explanations remain future research directions [21].
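The digest conveys the core move, keep the (robust) low-frequency amplitude of the input while letting a purification process supply everything else, but not the paper's exact filter or projection. The sketch below implements only that generic amplitude swap: the candidate would be a diffusion-denoised sample and the reference the input image, whose low-frequency amplitude is claimed to stay close to the clean image's. Mask shape, radius, and the stand-in images are all assumptions.

```python
import numpy as np

def swap_low_freq_amplitude(candidate: np.ndarray, reference: np.ndarray,
                            radius: int = 8) -> np.ndarray:
    """Keep the reference's low-frequency amplitude spectrum while retaining
    the candidate's phase and high-frequency amplitude. A generic sketch of
    the idea, not the paper's exact filter or phase projection."""
    Fc, Fr = np.fft.fft2(candidate), np.fft.fft2(reference)
    amp_c, phase_c = np.abs(Fc), np.angle(Fc)
    amp_r = np.abs(Fr)

    h, w = candidate.shape
    yy, xx = np.meshgrid(np.fft.fftfreq(h) * h, np.fft.fftfreq(w) * w,
                         indexing="ij")
    low = (yy**2 + xx**2) <= radius**2          # circular low-frequency mask

    amp = np.where(low, amp_r, amp_c)           # swap only the low band
    return np.real(np.fft.ifft2(amp * np.exp(1j * phase_c)))

rng = np.random.default_rng(0)
adv_input = rng.uniform(size=(64, 64))                        # stand-in input image
denoised = adv_input + rng.normal(scale=0.05, size=(64, 64))  # stand-in diffusion sample
purified = swap_low_freq_amplitude(denoised, adv_input)
print(np.abs(purified - adv_input).mean())
```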
CVPR 2025 Oral | DiffFNO: Fourier Neural Operators Empower Diffusion, Opening a New Chapter in Arbitrary-Scale Super-Resolution
机器之心· 2025-05-04 04:57
Core Viewpoint
- The article discusses DiffFNO, a novel method that augments diffusion models with neural operators to achieve high-quality, efficient super-resolution (SR) at any continuous scaling factor, addressing the limitations of traditional fixed-scale models [2][4].

Group 1: Methodology Overview
- DiffFNO consists of three main components: a Weighted Fourier Neural Operator (WFNO), a Gated Fusion Mechanism, and an Adaptive ODE Solver, which collectively improve the quality and efficiency of image reconstruction [2][5].
- The WFNO captures global information through frequency-domain convolution and amplifies high-frequency components using learnable frequency weights, yielding a PSNR improvement of approximately 0.3–0.5 dB on high-magnification tasks (a minimal weighted spectral-convolution sketch follows this summary) [10].
- The Gated Fusion Mechanism integrates a lightweight attention operator (AttnNO) to capture local spatial features, allowing a flexible combination of spectral and spatial information [12][13].

Group 2: Adaptive ODE Solver
- The Adaptive ODE Solver recasts the diffusion model's reverse process from a stochastic SDE into a deterministic ODE, cutting the number of denoising steps from over a thousand to about thirty and substantially accelerating inference [15].
- This maintains image quality while halving inference time from 266 ms to approximately 141 ms, and it performs even better at larger scaling factors [15].

Group 3: Experimental Validation
- DiffFNO outperforms various state-of-the-art (SOTA) methods by 2–4 dB in PSNR across multiple benchmark datasets, excelling particularly at high magnifications such as ×8 and ×12 [17][20].
- The method retains the complete Fourier spectrum, balancing overall image structure and local detail, and employs learnable frequency weights to dynamically adjust the influence of each frequency band [18].

Group 4: Conclusion
- DiffFNO offers a new way to reconcile the trade-off between high precision and low computational cost in super-resolution, making it suitable for fields requiring high image quality, such as medical imaging, exploration, and gaming [22].
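The WFNO itself is not spelled out in this digest. As a rough illustration of the two properties it attributes to the layer, frequency-domain convolution that keeps the complete spectrum, with a learnable weight per frequency, here is a minimal NumPy forward-pass sketch; the weight initialization and shapes are assumptions, and a trainable version would live in a deep-learning framework rather than NumPy.

```python
import numpy as np

def weighted_spectral_conv2d(x: np.ndarray, weights: np.ndarray) -> np.ndarray:
    """Scale every frequency of the full (non-truncated) spectrum by a
    learnable weight, then transform back. Echoes the digest's description
    of WFNO; not DiffFNO's actual layer."""
    Fx = np.fft.rfft2(x)                 # full half-spectrum of the input
    return np.fft.irfft2(Fx * weights, s=x.shape)

rng = np.random.default_rng(0)
x = rng.normal(size=(64, 64))            # stand-in feature map
# Stand-in learnable per-frequency weights; near 1 so the layer starts close
# to identity, with room to amplify high-frequency bands during training.
w = 1.0 + 0.1 * rng.normal(size=(64, 33))
y = weighted_spectral_conv2d(x, w)
print(y.shape)  # (64, 64)
```

Unlike the classic FNO layer, which truncates to a fixed number of low modes, this keeps all frequencies and lets the weights decide each band's influence, which matches the digest's point about preserving the complete Fourier spectrum.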