Computer Vision
Post-'90s "brothers" make RMB 250 million a year "selling algorithms", while payment collection periods stretch from 99 days to 379 days
凤凰网财经· 2026-01-23 11:52
Recently, Shandong Extreme Vision Technology Co., Ltd. (极视角, "Extreme Vision") completed its IPO filing and formally submitted a listing application to the Hong Kong Stock Exchange. Company records show that Extreme Vision was founded in June 2015 by three post-'90s Sun Yat-sen University alumni, Chen Zhenjie, Luo Yun, and Chen Shuo, and completed its joint-stock restructuring and name change in April 2023. The core team, led by current chairman and general manager Chen Zhenjie, draws on a mix of backgrounds in strategic planning at Tencent, consulting services, and technology R&D, and focuses on the enterprise-grade computer vision market. In terms of ownership, Chen Zhenjie, Luo Yun, and the employee shareholding platform Hengqin Jili (横琴极力) together form the single largest shareholder group, holding a combined 29.84% of voting rights under a concert-party agreement, with Chen Zhenjie holding 16.05% and Luo Yun 4.39%; Qualcomm China, a key institutional investor, holds 4.97%, and the company has no controlling shareholder as defined under the HKEX Listing Rules. On financial performance, Phoenix Finance's "IPO Observation Post" (凤凰网财经《IPO观察哨》) found that Extreme Vision's revenue has grown rapidly while profitability and cash flow remain under clear pressure. From 2022 to 2024, revenue came in at RMB 102 million, RMB 128 million, and RMB 257 million respectively, and rose to RMB 136 million in the first three quarters of 2025, up 71.6% year on year. Profitability, however, has been highly unstable: losses of RMB 60.722 million in 2022 and RMB 56.246 million in 2023, a brief return to profit of RMB 8.708 million in 2024, and a renewed loss of RMB 36.296 million in the first three quarters of 2025 ...
Appointment of Marec Gasiun as Executive Vice President of Sales & Marketing at Neonode
Prnewswire· 2026-01-02 14:28
Core Insights
- Neonode Inc. has appointed Marec Gasiun as Executive Vice President of Sales & Marketing, effective January 1, 2026, to enhance its commercial strategy and execution [1][5]

Group 1: Leadership Appointment
- Marec Gasiun brings extensive global commercial leadership experience from the automotive, technology, and telecommunications sectors, previously serving as Vice President of Business Development at SeeReal Technologies [2]
- Gasiun has also held significant roles, including Vice President of Global Technology Partnerships at Telia Company and Head of Business Development for Google's automotive software business [2][3]

Group 2: Strategic Importance
- The appointment is seen as pivotal for the company, especially as legacy touch technologies decline, positioning Neonode's MultiSensing computer vision and AI technology for significant advancements in 2026 [3]
- The recruitment aims to consolidate all commercial activities into a focused team, enhancing execution and leveraging Gasiun's experience to accelerate growth [5]

Group 3: Recent Achievements
- In December 2025, Neonode converted a commercial vehicle design win into a production license agreement, marking a transition from validation to real-world adoption of its MultiSensing driver monitoring technology [4]
- This milestone is expected to accelerate the company's automotive momentum through new design wins and strategic partnerships [4]
SIGGRAPH Asia 2025 | Recovering 200 FPS Detail from Ordinary 30 FPS Cameras: A 4D Reconstruction Solution Arrives
机器之心· 2025-12-14 04:53
Hardware innovation: asynchronous capture lets cameras shoot "out of phase"

The first author of this paper is Chen Yutian, a second-year PhD student at MMLab, The Chinese University of Hong Kong, working on 3D reconstruction and generation under the supervision of Prof. Tianfan Xue. Homepage: https://yutian10.github.io

When a martial-arts master's robe flares into a stunning 0.01-second arc mid-somersault in a period drama, when a VR player reaches out to grab an opponent's sword blade "frozen in mid-air", and when the crown-like splash of a single drop of milk in a viral TikTok clip needs to be replayed from every angle in 360°, one question arises: how can ordinary cameras "freeze" the fleeting high-speed world into a digital 4D space-time that can be repeatedly dissected, transmitted, and interacted with? This has become a hard problem in 3D vision.

Constrained by hardware cost and data-transfer bandwidth, however, the vast majority of today's 4D capture arrays top out at roughly 30 FPS, whereas conventional high-speed photography typically requires 120 FPS or more. Simply upgrading the camera hardware is not only expensive but also brings exponential growth in data throughput, making large-scale deployment impractical. Another line of attack is to "interpolate frames" at the reconstruction stage: recent dynamic-scene reconstruction methods such as 4D Gaussian Splatting can synthesize continuous frames from sparse temporal input for simple motion, effectively raising the frame rate, but for nonlinear, complex motion such as swinging cloth or high-speed rotation, the intermediate ...
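The section heading above points to staggered ("out of phase") triggering as the hardware-side idea. As a rough illustration of the scheduling arithmetic only, and not of the paper's actual capture system, the sketch below shows how N ordinary 30 FPS cameras fired with evenly spaced phase offsets jointly sample the scene at roughly N x 30 Hz; the camera count of 7 is an arbitrary choice for this example.

```python
# Minimal sketch: trigger schedule for N cameras shooting "out of phase".
# Hypothetical illustration of staggered (asynchronous) capture timing; the
# paper's actual hardware and synchronization scheme is not described here.

def staggered_schedule(num_cameras: int, fps: float, duration_s: float):
    """Return per-camera trigger timestamps with evenly spaced phase offsets."""
    period = 1.0 / fps                 # frame interval of a single camera
    offset = period / num_cameras      # phase shift between neighboring cameras
    schedule = {}
    for cam in range(num_cameras):
        t = cam * offset
        times = []
        while t < duration_s:
            times.append(round(t, 6))
            t += period
        schedule[cam] = times
    return schedule

if __name__ == "__main__":
    sched = staggered_schedule(num_cameras=7, fps=30.0, duration_s=0.1)
    merged = sorted(t for times in sched.values() for t in times)
    # 7 cameras at 30 FPS, staggered, sample the scene at roughly 210 Hz overall.
    print("effective sampling interval:", merged[1] - merged[0])
```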
NeurIPS Spotlight | GHAP: Turning 3DGS "Pruning" into "Reconstructing a Smaller Gaussian World"
机器之心· 2025-11-14 09:30
Core Viewpoint
- The article presents a novel approach to 3D Gaussian Splatting (3DGS) compression by framing it as Gaussian mixture model simplification, which effectively reduces redundancy while preserving geometric details [4][28] (a generic mixture-simplification sketch follows this summary)

Summary by Sections

Introduction
- 3DGS is a popular method for 3D scene modeling that uses numerous Gaussian spheres to create high-quality 3D representations. However, the redundancy in Gaussian spheres limits storage and rendering speed [4]

Methodology
- The proposed Gaussian-Herding-across-Pens (GHAP) method treats the entire 3DGS as a Gaussian mixture model, aiming to reconstruct a smaller mixture model globally. This approach maintains the geometric structure while reducing the number of Gaussian spheres [8][9]
- GHAP employs a two-stage process: first simplifying geometric information (position/covariance), followed by refining appearance features (opacity/color). This decoupling enhances stability [9][19]

Experimental Results
- The GHAP method was compared with various pruning-based and end-to-end compression methods. Results indicate that GHAP consistently outperforms other baseline methods while being close to the performance of full-sample end-to-end methods [20][24]
- At a 10% retention rate, GHAP maintains high visual fidelity across different models and scenes, demonstrating its effectiveness in preserving the original geometric structure [23][24]

Conclusion
- The GHAP method offers a new perspective on 3DGS compression, focusing on Gaussian mixture model simplification to retain geometric detail. It is designed to scale to large 3DGS scenes and is compatible with most existing 3DGS frameworks [28]
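To make the "reconstruct a smaller Gaussian world" framing concrete, the sketch below fits a much smaller Gaussian mixture to samples drawn from a larger one using scikit-learn. This is a generic mixture-simplification illustration only; it is not the GHAP algorithm, which operates directly on 3DGS parameters and refines appearance in a separate second stage.

```python
# Generic illustration: approximate a large Gaussian mixture with a smaller one.
# This is NOT the GHAP method; it only conveys the "fit a smaller mixture" idea.
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)

# Stand-in for a "large" mixture (e.g., many Gaussian primitives in a scene).
large_means = rng.uniform(-1, 1, size=(500, 3))
large_scales = rng.uniform(0.01, 0.05, size=(500, 1))

# Draw samples from the large mixture.
samples = np.concatenate(
    [mu + s * rng.standard_normal((20, 3)) for mu, s in zip(large_means, large_scales)]
)

# Fit a much smaller mixture (10% of the components) to those samples.
small_gmm = GaussianMixture(n_components=50, covariance_type="full", random_state=0)
small_gmm.fit(samples)

print("compressed to", small_gmm.n_components, "components from", len(large_means))
```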
Feed-Forward 3D Survey: How 3D Vision Gets There "in One Step"
机器之心· 2025-11-06 08:58
Core Insights
- The article discusses advancements in the field of 3D vision, particularly focusing on the transition from traditional methods to Feed-Forward 3D approaches, which enhance efficiency and generalization capabilities [2][4]

Summary by Sections

Overview of Feed-Forward 3D
- The article highlights the evolution of 3D reconstruction techniques, from Structure-from-Motion (SfM) to Neural Radiance Fields (NeRF) and 3D Gaussian Splatting (3DGS), emphasizing the shift towards Feed-Forward 3D methods that eliminate the need for per-scene optimization [2][6]

Key Technological Branches
- Five main architectural categories of Feed-Forward 3D methods are identified, each contributing significantly to the field's progress [6][7]
- Neural Radiance Fields (NeRF) introduced a differentiable framework for volume rendering but faced efficiency issues due to scene-specific optimization. The emergence of conditional NeRF has led to various branches focusing on direct prediction of radiance fields [7][9]
- PointMap Models, led by DUSt3R, predict pixel-aligned 3D point clouds directly within a Transformer framework, enhancing efficiency and memory capabilities [9][10]
- 3D Gaussian Splatting (3DGS) represents scenes as Gaussian point clouds, balancing rendering quality and speed. Recent advancements allow for direct output of Gaussian parameters [10][12]
- Mesh, Occupancy, and SDF Models integrate traditional geometric modeling with modern techniques, enabling high-precision surface modeling [14][19]

Applications and Benchmarking
- The paper summarizes the application of Feed-Forward models across various tasks, including camera pose estimation, point map estimation, and single-image view synthesis, providing a comprehensive benchmark of over 30 common 3D datasets [16][18][22]
- Evaluation metrics such as PSNR, SSIM, and Chamfer Distance are established to facilitate model comparison and performance assessment [18][23] (two of these metrics are sketched after this summary)

Future Challenges and Trends
- The article identifies four major open questions for future research, including the integration of Diffusion Transformers, scalable 4D memory mechanisms, and the construction of multimodal large-scale datasets [27][28]
- Challenges such as the predominance of RGB-only data, the need for improved reconstruction accuracy, and difficulties in free-viewpoint rendering are highlighted [29]
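For reference, the snippet below gives textbook-style NumPy implementations of two of the metrics named above, PSNR and a symmetric Chamfer Distance. These follow common definitions and may differ in normalization or units from any particular benchmark's protocol.

```python
# Textbook-style PSNR and symmetric Chamfer Distance in NumPy.
# Definitions follow common usage; individual benchmarks may normalize differently.
import numpy as np

def psnr(img_a: np.ndarray, img_b: np.ndarray, max_val: float = 1.0) -> float:
    """Peak signal-to-noise ratio between two images with values in [0, max_val]."""
    mse = np.mean((img_a.astype(np.float64) - img_b.astype(np.float64)) ** 2)
    if mse == 0:
        return float("inf")
    return 10.0 * np.log10(max_val ** 2 / mse)

def chamfer_distance(pts_a: np.ndarray, pts_b: np.ndarray) -> float:
    """Symmetric Chamfer distance (mean squared nearest-neighbor distance, both ways)."""
    # Pairwise squared distances; fine for small point sets, use a KD-tree for large ones.
    d2 = np.sum((pts_a[:, None, :] - pts_b[None, :, :]) ** 2, axis=-1)
    return float(d2.min(axis=1).mean() + d2.min(axis=0).mean())

if __name__ == "__main__":
    a = np.random.rand(16, 16, 3)
    print("PSNR:", psnr(a, np.clip(a + 0.01, 0, 1)))
    print("Chamfer:", chamfer_distance(np.random.rand(100, 3), np.random.rand(120, 3)))
```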
Feed-Forward 3D Survey: 3D Vision Enters the "One-Step" Era
自动驾驶之心· 2025-10-31 16:03
Core Insights
- The article discusses the evolution of 3D vision technologies, highlighting the transition from traditional methods like Structure-from-Motion (SfM) to advanced techniques such as Neural Radiance Fields (NeRF) and 3D Gaussian Splatting (3DGS), emphasizing the emergence of Feed-Forward 3D as a new paradigm in the AI-driven era [2][6]

Summary by Categories

1. Technological Evolution
- The article outlines the historical progression in 3D vision, noting that previous methods often required per-scene optimization, which was slow and lacked generalization capabilities [2][6]
- Feed-Forward 3D is introduced as a new paradigm that aims to overcome these limitations, enabling faster and more generalized 3D understanding [2]

2. Classification of Feed-Forward 3D Methods
- The article categorizes Feed-Forward 3D methods into five main architectures, each contributing to significant advancements in the field:
  1. **NeRF-based Models**: These models utilize a differentiable framework for volume rendering but face efficiency issues due to scene-specific optimization. Conditional NeRF approaches have emerged to allow direct prediction of radiance fields [8]
  2. **PointMap Models**: Led by DUSt3R, these models predict pixel-aligned 3D point clouds directly within a Transformer framework, eliminating the need for camera pose input [10] (a toy skeleton of the pointmap idea follows this summary)
  3. **3D Gaussian Splatting (3DGS)**: This innovative representation uses Gaussian point clouds to balance rendering quality and speed, with advancements allowing direct output of Gaussian parameters [11][13]
  4. **Mesh / Occupancy / SDF Models**: These methods combine traditional geometric modeling with modern techniques like Transformers and Diffusion models [14]
  5. **3D-Free Models**: These models learn mappings from multi-view inputs to new perspectives without relying on explicit 3D representations [15]

3. Applications and Tasks
- The article highlights diverse applications of Feed-Forward models, including:
  - Pose-Free Reconstruction & View Synthesis
  - Dynamic 4D Reconstruction & Video Diffusion
  - SLAM and visual localization
  - 3D-aware image and video generation
  - Digital human modeling
  - Robotic manipulation and world modeling [19]

4. Benchmarking and Evaluation Metrics
- The article mentions the inclusion of over 30 commonly used 3D datasets, covering various types of scenes and modalities, and summarizes standard evaluation metrics such as PSNR, SSIM, and Chamfer Distance for future model comparisons [20][21]

5. Future Challenges and Trends
- The article identifies four major open questions for future research, including the need for multi-modal data, improvements in reconstruction accuracy, challenges in free-viewpoint rendering, and the limitations of long-context reasoning in processing extensive frame sequences [25][26]
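As a toy illustration of the pointmap idea mentioned in item 2, the skeleton below regresses a pixel-aligned (H, W, 3) map of 3D coordinates from an image in a single forward pass. It is a hypothetical stand-in using a small convolutional network for brevity; DUSt3R's actual Transformer architecture, losses, and two-view formulation are not reproduced here.

```python
# Toy skeleton of a feed-forward "pointmap" predictor: an image goes in, a
# pixel-aligned (H, W) grid of 3D coordinates comes out in one forward pass.
# Illustrative stand-in only, not DUSt3R's actual network.
import torch
import torch.nn as nn

class TinyPointmapNet(nn.Module):
    def __init__(self, width: int = 64):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(3, width, 3, padding=1), nn.ReLU(),
            nn.Conv2d(width, width, 3, padding=1), nn.ReLU(),
        )
        # Head regresses a 3D point (x, y, z) for every pixel.
        self.head = nn.Conv2d(width, 3, 1)

    def forward(self, image: torch.Tensor) -> torch.Tensor:
        # image: (B, 3, H, W) -> pointmap: (B, 3, H, W)
        return self.head(self.encoder(image))

if __name__ == "__main__":
    net = TinyPointmapNet()
    pointmap = net(torch.rand(1, 3, 128, 160))
    print(pointmap.shape)  # torch.Size([1, 3, 128, 160]), pixel-aligned 3D points
```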
A Survey of 3D Reconstruction: The Evolution from Multi-View Geometry to NeRF and 3DGS
自动驾驶之心· 2025-09-22 23:34
Core Viewpoint
- 3D reconstruction is a critical intersection of computer vision and graphics, serving as the digital foundation for cutting-edge applications such as virtual reality, augmented reality, autonomous driving, and digital twins. Recent advancements in novel view synthesis technologies, represented by Neural Radiance Fields (NeRF) and 3D Gaussian Splatting (3DGS), have significantly improved reconstruction quality, speed, and dynamic adaptability [5][6]

Group 1: Introduction and Demand
- The resurgence of interest in 3D reconstruction is driven by new application demands across various fields, including city-scale digital twins requiring kilometer-level coverage and centimeter-level accuracy, autonomous driving simulations needing dynamic traffic flow and real-time semantics, and AR/VR social applications demanding over 90 FPS and photo-realistic quality [6]
- Traditional reconstruction pipelines are inadequate for these new requirements, prompting the integration of geometry, texture, and lighting through differentiable rendering techniques [6]

Group 2: Traditional Multi-View Geometry Reconstruction
- The traditional multi-view geometry pipeline (SfM to MVS) has inherent limitations in quality, efficiency, and adaptability to dynamic scenes, which have been addressed through iterative advancements in NeRF and 3DGS technologies [7]
- A comprehensive comparison of various methods highlights the evolution and future challenges in the field of 3D reconstruction [7]

Group 3: NeRF and Its Innovations
- NeRF models scenes as continuous 5D functions, enabling advanced rendering techniques that have evolved significantly from 2020 to 2024, addressing issues such as data requirements, texture limitations, lighting sensitivity, and dynamic scene handling [13][15] (a toy version of this 5D mapping is sketched after this summary)
- Various methods have been developed to enhance quality and efficiency, including Mip-NeRF, NeRF-W, and InstantNGP, each contributing to improved rendering speeds and reduced memory usage [17][18]

Group 4: 3DGS and Its Advancements
- 3DGS represents scenes as collections of 3D Gaussians, allowing for efficient rendering and high-quality output. Recent methods have focused on optimizing rendering quality and efficiency, achieving significant improvements in memory usage and frame rates [22][26]
- Comparisons of 3DGS with other methods show its superiority in rendering speed and dynamic scene reconstruction capabilities [31]

Group 5: Future Trends and Conclusion
- The next five years are expected to see advancements in hybrid representations, real-time processing on mobile devices, generative reconstruction techniques, and multi-modal fusion for robust reconstruction [33]
- The ultimate goal is to enable real-time 3D reconstruction accessible to everyone, marking a shift towards ubiquitous computing [34]
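The "5D function" view of NeRF mentioned in Group 3 can be made concrete with a toy MLP that maps a 3D position plus a 2D viewing direction to color and density. The layer sizes and activations below are illustrative choices, not those of any specific NeRF implementation, and the surrounding volume-rendering machinery is omitted.

```python
# Toy "5D function" in the NeRF sense: (x, y, z, theta, phi) -> (RGB, density).
# Illustrative sizes/activations only; real NeRFs add positional encoding,
# hierarchical sampling, and volume rendering on top of this mapping.
import torch
import torch.nn as nn

class TinyRadianceField(nn.Module):
    def __init__(self, hidden: int = 128):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(5, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
        )
        self.rgb_head = nn.Sequential(nn.Linear(hidden, 3), nn.Sigmoid())      # color in [0, 1]
        self.sigma_head = nn.Sequential(nn.Linear(hidden, 1), nn.Softplus())   # density >= 0

    def forward(self, xyz_dir: torch.Tensor):
        # xyz_dir: (N, 5) = position (3) + viewing direction (2 angles)
        feat = self.mlp(xyz_dir)
        return self.rgb_head(feat), self.sigma_head(feat)

if __name__ == "__main__":
    field = TinyRadianceField()
    rgb, sigma = field(torch.rand(1024, 5))
    print(rgb.shape, sigma.shape)  # (1024, 3) colors, (1024, 1) densities
```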
HKUST, Horizon Robotics, and Zhejiang University Jointly Open-Source SAIL-Recon: Reconstructing a City in Three Minutes
自动驾驶之心· 2025-09-02 23:33
Core Insights
- The article discusses the SAIL-Recon framework, which integrates scene regression with localization to achieve large-scale Structure from Motion (SfM) using thousands of images efficiently and accurately [7][10][34]

Group 1: Traditional SfM Limitations
- Traditional SfM algorithms rely on feature extraction, matching, triangulation, and bundle adjustment, which can fail in low-texture, blurry, or repetitive-texture scenes [5]
- Recent research has proposed an end-to-end learnable SfM pipeline that directly regresses scene structure and camera poses from images, but it is limited by GPU memory when handling large-scale scenes [5][10]

Group 2: SAIL-Recon Framework
- SAIL-Recon is a multi-task framework that unifies reconstruction and localization without the need for scene-specific training, sampling a few anchor images from large image or video sequences to infer neural scene representations [7][10] (a minimal anchor-sampling helper is sketched after this summary)
- The framework achieves state-of-the-art (SOTA) performance across multiple benchmarks, surpassing both traditional and learning-based methods in accuracy and efficiency [10][34]

Group 3: Methodology
- The SAIL-Recon process involves selecting a small number of anchor images to extract neural scene representations, which are then used to jointly estimate scene coordinates and camera poses for all images [9][10]
- The method employs a transformer to compute scene representations and camera parameters, optimizing GPU memory usage through a key-value cache [11][12]

Group 4: Experimental Results
- SAIL-Recon demonstrated superior performance in pose estimation and new view synthesis tasks, achieving the highest PSNR on the Tanks & Temples dataset and completing reconstructions significantly faster than traditional methods [26][32]
- The framework maintains good performance even when reducing the number of anchor images from 10 to 2, indicating robustness across sampling strategies [32]

Group 5: Limitations and Future Work
- The framework's reliance on a fixed global coordinate system may affect certain sequences, suggesting a need for improved anchor image selection strategies [36]
- Uniform sampling can overlook scene areas, indicating potential for research into coverage-aware sampling methods [36]
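The summary above describes SAIL-Recon as sampling a handful of anchor images from a long sequence and localizing the remaining frames against the scene representation built from them. The helper below sketches the simplest possible uniform anchor selection; the paper's actual strategy may differ, and, as the limitations section notes, uniform sampling can overlook parts of the scene.

```python
# Minimal sketch: pick K evenly spaced anchor frames from a long sequence,
# leaving the rest to be localized against the anchors' scene representation.
# Uniform spacing is only one possible strategy (coverage-aware sampling is an
# open direction mentioned in the article).
import numpy as np

def sample_anchor_indices(num_frames: int, num_anchors: int) -> np.ndarray:
    """Evenly spaced frame indices to use as anchors."""
    return np.linspace(0, num_frames - 1, num=num_anchors, dtype=int)

if __name__ == "__main__":
    anchors = sample_anchor_indices(num_frames=5000, num_anchors=10)
    query_frames = np.setdiff1d(np.arange(5000), anchors)
    print("anchors:", anchors.tolist())
    print("frames to localize:", len(query_frames))
```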
A Diverse, Large-Scale Dataset! SceneSplat++: The First Comprehensive 3DGS-Based Benchmark
自动驾驶之心· 2025-06-20 14:06
Core Insights
- The article introduces SceneSplat-Bench, a comprehensive benchmark for evaluating visual-language scene understanding methods based on 3D Gaussian Splatting (3DGS) [11][30]
- It presents SceneSplat-49K, a large-scale dataset containing approximately 49,000 raw scenes and 46,000 filtered 3DGS scenes, which is the most extensive open-source dataset for complex and high-quality scene-level 3DGS reconstruction [9][30]
- The evaluation indicates that generalizable methods consistently outperform per-scene optimization methods, establishing a new paradigm for scalable scene understanding through pre-trained models [30]

Evaluation Protocols
- The benchmark evaluates methods based on two key metrics in 3D space: foreground mean Intersection over Union (f-mIoU) and foreground mean accuracy (f-mAcc), addressing object size imbalance and reducing viewpoint dependency compared to 2D evaluations [22][30] (a rough sketch of a foreground-masked IoU appears after this summary)
- The evaluation dataset includes ScanNet, ScanNet++, and Matterport3D for indoor scenes, and HoliCity for outdoor scenes, emphasizing the methods' capabilities across various object scales and complex environments [22][30]

Dataset Contributions
- SceneSplat-49K is compiled from multiple sources, including SceneSplat-7K, DL3DV-10K, HoliCity, and Aria Synthetic Environments, ensuring a diverse range of indoor and outdoor environments [9][10]
- The dataset preparation involved approximately 891 GPU days and extensive human effort, highlighting the significant resources invested in creating a high-quality dataset [7][9]

Methodological Insights
- The article categorizes methods into three types: per-scene optimization methods, per-scene optimization-free methods, and generalizable methods, with SceneSplat representing the latter [23][30]
- Generalizable methods eliminate the need for extensive single-scene computations during inference, allowing for efficient processing of 3D scenes in a single forward pass [24][30]

Performance Results
- The results from SceneSplat-Bench demonstrate that SceneSplat excels in both performance and efficiency, often surpassing the pseudo-label methods used for its pre-training [24][30]
- The performance of various methods shows significant variation based on the dataset's complexity, indicating the importance of challenging benchmarks in revealing the limitations of competing methods [28][30]
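As a rough reference for the f-mIoU metric, the sketch below averages per-class IoU over non-background classes given per-point predictions and labels. The background convention (label 0) and masking rules are assumptions made for illustration; the benchmark's exact protocol may differ.

```python
# Sketch of a foreground mean IoU over per-point predictions.
# Class IDs, the "background" convention (label 0), and masking rules are
# assumptions for illustration; the benchmark's exact protocol may differ.
import numpy as np

def foreground_mean_iou(pred: np.ndarray, gt: np.ndarray, background_id: int = 0) -> float:
    """Mean IoU over all ground-truth classes except the background class."""
    ious = []
    for cls in np.unique(gt):
        if cls == background_id:
            continue
        inter = np.sum((pred == cls) & (gt == cls))
        union = np.sum((pred == cls) | (gt == cls))
        if union > 0:
            ious.append(inter / union)
    return float(np.mean(ious)) if ious else 0.0

if __name__ == "__main__":
    gt = np.random.randint(0, 5, size=100_000)        # 0 = background, 1..4 = foreground
    pred = gt.copy()
    pred[:10_000] = np.random.randint(0, 5, 10_000)   # corrupt 10% of predictions
    print("f-mIoU:", foreground_mean_iou(pred, gt))
```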
No Expensive Equipment Required: A Monocular Approach Generates Ultra-Realistic 3D Avatars; New Work from Tsinghua and IDEA Accepted to CVPR 2025
量子位· 2025-05-22 14:29
Core Viewpoint
- The article discusses the development of HRAvatar, a method for creating high-quality, relightable 3D avatars from monocular video, addressing challenges in animation, real-time rendering, and visual realism [1][4][6]

Group 1: Methodology and Innovations
- HRAvatar utilizes a learnable deformation basis and linear skinning techniques to achieve flexible and precise geometric transformations [1][6] (a textbook sketch of linear blend skinning follows this summary)
- An end-to-end expression encoder is introduced to enhance the accuracy of expression parameter extraction, reducing tracking errors and ensuring generalization [6][10]
- The method decomposes the avatar's appearance into material properties such as albedo, roughness, and Fresnel reflectance, employing a simplified BRDF model for shading [6][16]

Group 2: Performance and Results
- HRAvatar demonstrates superior performance across various metrics, achieving a PSNR of 30.36, MAE of 0.845, SSIM of 0.9482, and LPIPS of 0.0569, outperforming existing methods [24][26]
- The method achieves real-time rendering speeds of approximately 155 FPS under driving and relighting conditions [25]
- Experimental results indicate that HRAvatar excels in detail richness and quality, particularly in LPIPS scores, suggesting enhanced avatar detail [24][34]

Group 3: Applications and Future Directions
- The reconstructed avatars can be animated and relit in new environmental lighting conditions, allowing for simple material editing [28]
- The introduction of HRAvatar expands the application scenarios for monocular Gaussian virtual avatar modeling, with the code being open-sourced for public use [35][36]
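The "linear skinning" mentioned in Group 1 refers to the standard linear blend skinning (LBS) formula, v' = sum_j w_j (R_j v + t_j). The NumPy sketch below implements that textbook formula with random placeholder weights and transforms; it does not use HRAvatar's learned deformation basis or any quantities from the paper.

```python
# Standard linear blend skinning (LBS): each vertex is a weighted sum of the
# vertex transformed by every bone. Weights and transforms here are random
# placeholders, not HRAvatar's learned quantities.
import numpy as np

def linear_blend_skinning(vertices, bone_rotations, bone_translations, weights):
    """
    vertices:          (V, 3)
    bone_rotations:    (B, 3, 3)
    bone_translations: (B, 3)
    weights:           (V, B), rows sum to 1
    returns deformed vertices of shape (V, 3)
    """
    # Transform every vertex by every bone: (B, V, 3)
    per_bone = np.einsum("bij,vj->bvi", bone_rotations, vertices) + bone_translations[:, None, :]
    # Blend the per-bone results with the skinning weights: (V, 3)
    return np.einsum("vb,bvi->vi", weights, per_bone)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    V, B = 1000, 4
    verts = rng.standard_normal((V, 3))
    rots = np.stack([np.eye(3)] * B)              # identity rotations
    trans = rng.standard_normal((B, 3)) * 0.01    # small translations
    w = rng.random((V, B))
    w /= w.sum(axis=1, keepdims=True)             # normalize skinning weights
    print(linear_blend_skinning(verts, rots, trans, w).shape)  # (1000, 3)
```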