Computer Vision
Seeing Machines posts 117% rise in vehicle production as Guardian sales surge
Yahoo Finance· 2026-02-11 07:42
Driver-monitoring specialist records highest-ever quarterly volumes ahead of a European safety mandate. Seeing Machines Ltd (AIM:SEE, OTC:SEEMF, FRA:M2Z) reported nearly 580,000 vehicles produced with its driver-monitoring systems in the second quarter of its 2026 financial year, a 117% increase on the same period a year ago. The Australia-based computer vision company said 4.8 million vehicl ...
Post-90s "brothers" make 250 million RMB a year "selling algorithms" as receivables collection stretches from 99 to 379 days
凤凰网财经· 2026-01-23 11:52
Core Viewpoint
- Shandong Jishijiao Technology Co., Ltd. ("Jishijiao") has submitted its IPO application to the Hong Kong Stock Exchange, marking a significant step in its growth since its establishment in June 2015 [1][3]
Company Overview
- Jishijiao was founded by three alumni of Sun Yat-sen University and focuses on the enterprise-level computer vision sector, drawing on the founders' backgrounds in strategic planning, consulting, and technology development [1][3]
- The company created China's first "AI Vision Algorithm Mall," which has launched over 1,500 algorithms covering more than 100 industries and serves over 3,000 clients, including Tencent Cloud and Alibaba Cloud [3]
Market Position
- According to Frost & Sullivan, Jishijiao ranks eighth in China's emerging enterprise-level computer vision solutions market with a share of only 1.6%, far behind the market leader's 12.1% [3]
Financial Performance
- Jishijiao has completed 11 rounds of financing; its Series D round in November 2024 valued the company at 2.31 billion RMB, more than 243 times its angel-round valuation of 9.5 million RMB in 2015 [3][4]
- Revenue has grown rapidly: 102 million RMB in 2022, 128 million RMB in 2023, and a projected 257 million RMB in 2024, while the first three quarters of 2025 brought in 136 million RMB, up 71.6% year on year [5][6]
Profitability and Cash Flow
- Despite revenue growth, profitability and cash flow remain challenging: losses of 60.72 million RMB in 2022 and 56.25 million RMB in 2023, a brief profit of 8.71 million RMB in 2024, then a loss of 36.30 million RMB in the first three quarters of 2025 [5][6]
- Operating cash flow has been continuously negative, totaling over 190 million RMB from 2022 through the first three quarters of 2025, attributed to deteriorating accounts receivable turnover [6]
Research and Development
- Jishijiao invested over 100 million RMB in R&D from 2022 to 2024, with R&D expenses at 34.4% of revenue in the first three quarters of 2025; output remains modest, however, with only 30 patents and 117 software copyrights held by a 101-member R&D team [6][7]
Employee Retention
- Turnover has been high, at 63.04%, 42.86%, and 45.91% from 2022 to 2024, hurting continuity in technical development and project delivery [8][9]
Compliance Issues
- The company has faced compliance challenges, including insufficient social security contributions and unregistered lease agreements, which may pose risks during the IPO process [9]
Appointment of Marec Gasiun as Executive Vice President of Sales & Marketing at Neonode
Prnewswire· 2026-01-02 14:28
Core Insights
- Neonode Inc. has appointed Marec Gasiun as Executive Vice President of Sales & Marketing, effective January 1, 2026, to strengthen its commercial strategy and execution [1][5]
Group 1: Leadership Appointment
- Gasiun brings extensive global commercial leadership experience from the automotive, technology, and telecommunications sectors, previously serving as Vice President of Business Development at SeeReal Technologies [2]
- He has also held senior roles including Vice President of Global Technology Partnerships at Telia Company and Head of Business Development for Google's automotive software business [2][3]
Group 2: Strategic Importance
- The appointment is seen as pivotal as legacy touch technologies decline, positioning Neonode's MultiSensing computer vision and AI technology for significant advances in 2026 [3]
- The hire consolidates all commercial activities into one focused team, sharpening execution and leveraging Gasiun's experience to accelerate growth [5]
Group 3: Recent Achievements
- In December 2025, Neonode converted a commercial-vehicle design win into a production license agreement, marking the transition from validation to real-world adoption of its MultiSensing driver monitoring technology [4]
- This milestone is expected to accelerate the company's automotive momentum through new design wins and strategic partnerships [4]
SIGGRAPH Asia 2025 | Recovering 200 FPS detail from ordinary 30 FPS cameras: a 4D reconstruction solution arrives
机器之心· 2025-12-14 04:53
Core Viewpoint
- The article presents an advance in 4D reconstruction: a method that combines asynchronous capture with a video diffusion model to improve high-speed dynamic scene reconstruction on low-cost hardware [3][10]
Group 1: Hardware Innovation
- Asynchronous capture lets multiple cameras work in "relay" fashion, overcoming the speed limit of any single camera. By slightly delaying the activation of different cameras, the effective frame rate is multiplied, rising from 25 FPS to 100 FPS, or even 200 FPS, by organizing the cameras into staggered groups (a toy timing sketch follows this summary) [5][6][8]
Group 2: Software Innovation
- A video diffusion model addresses the "sparse view" problem introduced by asynchronous capture, which causes visual artifacts in the initial 4D reconstruction. The model is trained to repair these artifacts and enhance video quality using the spatio-temporal context of the input video [9][10][13]
Group 3: Overall Process
- The method integrates hardware capture with AI algorithms in an iterative optimization framework: initial reconstruction from asynchronous capture, generation of pseudo ground-truth videos, enhancement of those videos with the diffusion model, and optimization of the 4D Gaussian model against the enhanced output [14][15][17]
Group 4: Method Effectiveness
- The proposed method outperforms several state-of-the-art techniques on Peak Signal-to-Noise Ratio (PSNR), Structural Similarity Index (SSIM), and Learned Perceptual Image Patch Similarity (LPIPS) across two public datasets [19][21]
Group 5: Real-World Validation
- A multi-view capture system of 12 cameras running at 25 FPS validated the method in real-world scenarios, robustly reconstructing high-quality, temporally consistent 4D content even in complex asynchronous capture environments [22]
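To make the staggered "relay" capture idea concrete, here is a minimal sketch of the timing arithmetic. The helper name and parameters are illustrative, not from the paper; it only shows how phase-offsetting camera groups multiplies the effective frame rate.

```python
import numpy as np

def staggered_timestamps(n_groups: int, base_fps: float, n_frames: int) -> np.ndarray:
    """Trigger times (seconds) for n_groups camera groups that each run at
    base_fps but start with a phase offset of 1/(n_groups*base_fps), so the
    merged stream has an effective rate of n_groups * base_fps."""
    period = 1.0 / base_fps
    offsets = np.arange(n_groups) * period / n_groups      # per-group delay
    frames = np.arange(n_frames) * period                  # one group's schedule
    return np.sort((frames[None, :] + offsets[:, None]).ravel())

# Four groups of 25 FPS cameras -> 100 FPS effective; eight groups -> 200 FPS.
ts = staggered_timestamps(n_groups=4, base_fps=25.0, n_frames=5)
print(np.round(np.diff(ts), 4))  # uniform 0.01 s spacing, i.e. 100 FPS
```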
NeurIPS Spotlight | GHAP: turning 3DGS "pruning" into "reconstructing a smaller Gaussian world"
机器之心· 2025-11-14 09:30
Core Viewpoint
- The article presents a novel approach to 3D Gaussian Splatting (3DGS) compression that frames it as Gaussian mixture model simplification, effectively reducing redundancy while preserving geometric detail [4][28]
Summary by Sections
Introduction
- 3DGS is a popular 3D scene modeling method that uses large numbers of Gaussian spheres to build high-quality representations; this redundancy, however, limits storage and rendering speed [4]
Methodology
- The proposed Gaussian-Herding-across-Pens (GHAP) method treats the entire 3DGS as a Gaussian mixture model and globally reconstructs a smaller mixture, maintaining geometric structure while reducing the number of Gaussian spheres (the merging idea is sketched after this summary) [8][9]
- GHAP uses a two-stage process: geometric information (position/covariance) is simplified first, then appearance features (opacity/color) are refined; this decoupling improves stability [9][19]
Experimental Results
- Compared against various pruning-based and end-to-end compression methods, GHAP consistently outperforms the other baselines and comes close to full-sample end-to-end methods [20][24]
- At a 10% retention rate, GHAP maintains high visual fidelity across models and scenes, demonstrating its effectiveness at preserving the original geometric structure [23][24]
Conclusion
- GHAP offers a new perspective on 3DGS compression, focusing on Gaussian mixture simplification to retain geometric detail; it is designed to scale to large 3DGS scenes and is compatible with most existing 3DGS frameworks [28]
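GHAP's exact objective is not reproduced here, but the general idea of reconstructing a smaller Gaussian mixture from a larger one can be sketched with moment-matched merging: group components, then replace each group by one Gaussian that preserves the group's total weight, mean, and covariance. The grouping via k-means on component means and all function names are illustrative assumptions.

```python
import numpy as np
from scipy.cluster.vq import kmeans2

def merge_group(w, mu, cov):
    """Moment-matched merge of Gaussian components (w_i, mu_i, Sigma_i)."""
    W = w.sum()
    m = (w[:, None] * mu).sum(0) / W
    d = mu - m
    S = (w[:, None, None] * (cov + d[:, :, None] * d[:, None, :])).sum(0) / W
    return W, m, S

def simplify_mixture(w, mu, cov, k, seed=0):
    """Reduce an N-component mixture to <= k components by clustering the
    component means and moment-matching each cluster into one Gaussian."""
    _, labels = kmeans2(mu, k, minit="++", seed=seed)
    out = [merge_group(w[labels == j], mu[labels == j], cov[labels == j])
           for j in range(k) if np.any(labels == j)]
    ws, ms, Ss = map(np.array, zip(*out))
    return ws / ws.sum(), ms, Ss

# Toy example: shrink a 500-component 3D mixture to ~50 components.
rng = np.random.default_rng(0)
N = 500
w = rng.random(N); w /= w.sum()
mu = rng.normal(size=(N, 3))
cov = np.tile(0.01 * np.eye(3), (N, 1, 1))
ws, ms, Ss = simplify_mixture(w, mu, cov, k=50)
print(ws.shape, ms.shape, Ss.shape)  # (50,) (50, 3) (50, 3, 3)
```

Note that GHAP additionally decouples geometry from appearance and refines opacity/color afterward; the sketch covers only the geometric simplification step.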
A survey of Feed-Forward 3D: how 3D vision gets there "in one step"
机器之心· 2025-11-06 08:58
Core Insights
- The article surveys advances in 3D vision, focusing on the transition from traditional methods to Feed-Forward 3D approaches that improve efficiency and generalization [2][4]
Summary by Sections
Overview of Feed-Forward 3D
- The survey traces the evolution of 3D reconstruction from Structure-from-Motion (SfM) through Neural Radiance Fields (NeRF) and 3D Gaussian Splatting (3DGS), emphasizing the shift toward Feed-Forward 3D methods that eliminate per-scene optimization [2][6]
Key Technological Branches
- Five main architectural categories of Feed-Forward 3D methods are identified, each contributing significantly to the field's progress [6][7]
- NeRF introduced a differentiable framework for volume rendering but faced efficiency issues due to scene-specific optimization; conditional NeRF variants have since emerged to predict radiance fields directly [7][9]
- PointMap models, led by DUSt3R, predict pixel-aligned 3D point clouds directly within a Transformer framework, improving efficiency and memory behavior [9][10]
- 3DGS represents scenes as Gaussian point clouds, balancing rendering quality and speed; recent advances allow direct output of Gaussian parameters [10][12]
- Mesh, Occupancy, and SDF models integrate traditional geometric modeling with modern techniques, enabling high-precision surface modeling [14][19]
Applications and Benchmarking
- The paper summarizes Feed-Forward models across tasks including camera pose estimation, point map estimation, and single-image view synthesis, and compiles a benchmark of over 30 common 3D datasets [16][18][22]
- Evaluation metrics such as PSNR, SSIM, and Chamfer Distance are established to support model comparison and performance assessment (a minimal version of two of these appears after this summary) [18][23]
Future Challenges and Trends
- Four major open questions are identified for future research, including integrating Diffusion Transformers, scalable 4D memory mechanisms, and building multimodal large-scale datasets [27][28]
- Remaining challenges include the predominance of RGB-only data, the need for improved reconstruction accuracy, and difficulties in free-viewpoint rendering [29]
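The metrics named above are standard, not survey-specific. As a reference, here is a minimal numpy version of two of them, PSNR for rendered images and symmetric Chamfer Distance for point clouds; the helper names are illustrative, and SSIM is omitted because a faithful implementation needs windowed local statistics.

```python
import numpy as np

def psnr(pred: np.ndarray, gt: np.ndarray, max_val: float = 1.0) -> float:
    """Peak Signal-to-Noise Ratio in dB between two images in [0, max_val]."""
    mse = np.mean((pred.astype(np.float64) - gt.astype(np.float64)) ** 2)
    return float(10.0 * np.log10(max_val ** 2 / mse))

def chamfer_distance(a: np.ndarray, b: np.ndarray) -> float:
    """Symmetric Chamfer Distance between point sets a (N,3) and b (M,3):
    mean squared nearest-neighbor distance, accumulated in both directions."""
    d2 = np.sum((a[:, None, :] - b[None, :, :]) ** 2, axis=-1)  # (N, M)
    return float(d2.min(axis=1).mean() + d2.min(axis=0).mean())

rng = np.random.default_rng(0)
img_gt = rng.random((64, 64, 3))
img_pred = np.clip(img_gt + 0.01 * rng.normal(size=img_gt.shape), 0, 1)
print(f"PSNR: {psnr(img_pred, img_gt):.2f} dB")
print(f"Chamfer: {chamfer_distance(rng.random((256, 3)), rng.random((256, 3))):.4f}")
```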
A survey of Feed-Forward 3D: 3D vision enters the "one-step" era
自动驾驶之心· 2025-10-31 16:03
Core Insights
- The article discusses the evolution of 3D vision technologies, highlighting the transition from traditional methods like Structure-from-Motion (SfM) to advanced techniques such as Neural Radiance Fields (NeRF) and 3D Gaussian Splatting (3DGS), and frames Feed-Forward 3D as a new paradigm for the AI-driven era [2][6]
Summary by Categories
1. Technological Evolution
- Previous methods often required per-scene optimization, which was slow and generalized poorly [2][6]
- Feed-Forward 3D is introduced as a new paradigm that aims to overcome these limitations, enabling faster and more general 3D understanding [2]
2. Classification of Feed-Forward 3D Methods
- The article categorizes Feed-Forward 3D methods into five main architectures, each contributing to significant advances in the field (the point-map idea is made concrete in the sketch after this summary):
  1. **NeRF-based Models**: a differentiable volume-rendering framework, but inefficient due to scene-specific optimization; conditional NeRF approaches have emerged to predict radiance fields directly [8]
  2. **PointMap Models**: led by DUSt3R, these predict pixel-aligned 3D point clouds directly within a Transformer framework, eliminating the need for camera pose input [10]
  3. **3D Gaussian Splatting (3DGS)**: a Gaussian point cloud representation balancing rendering quality and speed, with advances allowing direct output of Gaussian parameters [11][13]
  4. **Mesh / Occupancy / SDF Models**: combine traditional geometric modeling with modern techniques like Transformers and Diffusion models [14]
  5. **3D-Free Models**: learn mappings from multi-view inputs to new perspectives without explicit 3D representations [15]
3. Applications and Tasks
- Applications include Pose-Free Reconstruction & View Synthesis, Dynamic 4D Reconstruction & Video Diffusion, SLAM and visual localization, 3D-aware image and video generation, digital human modeling, and robotic manipulation and world modeling [19]
4. Benchmarking and Evaluation Metrics
- Over 30 commonly used 3D datasets are included, covering various scene types and modalities, with standard evaluation metrics such as PSNR, SSIM, and Chamfer Distance summarized for future model comparisons [20][21]
5. Future Challenges and Trends
- Four major open questions are identified: the need for multi-modal data, improvements in reconstruction accuracy, challenges in free-viewpoint rendering, and the limits of long-context reasoning over long frame sequences [25][26]
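To clarify what a "pixel-aligned 3D point cloud" is: a point map assigns every pixel a 3D coordinate. The sketch below builds one classically, by back-projecting a depth map through known camera intrinsics; DUSt3R instead regresses such maps directly from image pairs without being given depth or intrinsics. The function name and numbers are illustrative.

```python
import numpy as np

def depth_to_pointmap(depth: np.ndarray, K: np.ndarray) -> np.ndarray:
    """Back-project a depth map (H, W) through intrinsics K (3, 3) into a
    pixel-aligned point map (H, W, 3) in the camera frame."""
    H, W = depth.shape
    u, v = np.meshgrid(np.arange(W), np.arange(H))    # pixel grid, shape (H, W)
    pix = np.stack([u, v, np.ones_like(u)], axis=-1)  # homogeneous pixel coords
    rays = pix @ np.linalg.inv(K).T                   # unprojected viewing rays
    return rays * depth[..., None]                    # scale rays by depth

K = np.array([[500.0, 0.0, 320.0],
              [0.0, 500.0, 240.0],
              [0.0, 0.0, 1.0]])
depth = np.full((480, 640), 2.0)       # toy scene: a flat wall 2 m away
pts = depth_to_pointmap(depth, K)
print(pts.shape, pts[240, 320])        # (480, 640, 3), ~[0, 0, 2] at the center
```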
A survey of 3D reconstruction: the evolution from multi-view geometry to NeRF and 3DGS
自动驾驶之心· 2025-09-22 23:34
Core Viewpoint
- 3D reconstruction sits at a critical intersection of computer vision and graphics, serving as the digital foundation for cutting-edge applications such as virtual reality, augmented reality, autonomous driving, and digital twins. Recent new-view-synthesis technologies, represented by Neural Radiance Fields (NeRF) and 3D Gaussian Splatting (3DGS), have significantly improved reconstruction quality, speed, and dynamic adaptability [5][6]
Group 1: Introduction and Demand
- Renewed interest in 3D reconstruction is driven by new application demands: city-scale digital twins require kilometer-level coverage with centimeter-level accuracy, autonomous-driving simulation needs dynamic traffic flow and real-time semantics, and AR/VR social applications demand over 90 FPS at photo-realistic quality [6]
- Traditional reconstruction pipelines are inadequate for these requirements, prompting the integration of geometry, texture, and lighting through differentiable rendering techniques [6]
Group 2: Traditional Multi-View Geometry Reconstruction
- The traditional multi-view geometry pipeline (SfM to MVS) has inherent limitations in quality, efficiency, and adaptability to dynamic scenes, which iterative advances in NeRF and 3DGS have addressed [7]
- A comprehensive comparison of methods highlights the field's evolution and its remaining challenges [7]
Group 3: NeRF and Its Innovations
- NeRF models scenes as continuous 5D functions (the underlying rendering equation is given below), enabling rendering techniques that evolved significantly from 2020 to 2024 and now address data requirements, texture limitations, lighting sensitivity, and dynamic scenes [13][15]
- Methods such as Mip-NeRF, NeRF-W, and InstantNGP enhance quality and efficiency, improving rendering speed and reducing memory usage [17][18]
Group 4: 3DGS and Its Advancements
- 3DGS represents scenes as collections of 3D Gaussians, allowing efficient rendering and high-quality output; recent methods further optimize rendering quality and efficiency, with significant gains in memory usage and frame rate [22][26]
- Comparisons with other methods show 3DGS's superiority in rendering speed and dynamic-scene reconstruction [31]
Group 5: Future Trends and Conclusion
- The next five years are expected to bring hybrid representations, real-time processing on mobile devices, generative reconstruction techniques, and multi-modal fusion for robust reconstruction [33]
- The ultimate goal is real-time 3D reconstruction accessible to everyone, marking a shift toward ubiquitous computing [34]
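For reference, the "continuous 5D function" view of NeRF has a compact standard form: a position and view direction map to color and density, and images are formed by discretized volume rendering. This is the widely published formulation, not something specific to this survey:

```latex
F_\Theta:\; (x, y, z, \theta, \phi) \;\mapsto\; (\mathbf{c}, \sigma), \qquad
\hat{C}(\mathbf{r}) = \sum_{i=1}^{N} T_i \bigl(1 - e^{-\sigma_i \delta_i}\bigr)\,\mathbf{c}_i, \qquad
T_i = \exp\!\Bigl(-\sum_{j=1}^{i-1} \sigma_j \delta_j\Bigr)
```

Here \delta_i is the spacing between adjacent samples along camera ray \mathbf{r} and T_i is the accumulated transmittance; 3DGS replaces the ray-marched samples with depth-sorted Gaussians alpha-blended in the same spirit.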
HKUST, Horizon Robotics, and Zhejiang University jointly open-source SAIL-Recon: reconstruct a city in three minutes
自动驾驶之心· 2025-09-02 23:33
Core Insights
- The article presents the SAIL-Recon framework, which integrates scene regression with localization to perform large-scale Structure from Motion (SfM) over thousands of images efficiently and accurately [7][10][34]
Group 1: Traditional SfM Limitations
- Traditional SfM algorithms rely on feature extraction, matching, triangulation, and bundle adjustment, and can fail in low-texture, blurry, or repetitive-texture scenes [5]
- Recent end-to-end learnable SfM pipelines regress scene structure and camera poses directly from images but are limited by GPU memory on large-scale scenes [5][10]
Group 2: SAIL-Recon Framework
- SAIL-Recon is a multi-task framework that unifies reconstruction and localization without scene-specific training, sampling a few anchor images from large image or video collections to infer a neural scene representation (a toy version of this sampling follows the summary) [7][10]
- The framework achieves state-of-the-art (SOTA) performance across multiple benchmarks, surpassing both traditional and learning-based methods in accuracy and efficiency [10][34]
Group 3: Methodology
- A small set of anchor images is selected to extract the neural scene representation, which is then used to jointly estimate scene coordinates and camera poses for all images [9][10]
- A transformer computes the scene representation and camera parameters, with a key-value cache keeping GPU memory usage bounded [11][12]
Group 4: Experimental Results
- SAIL-Recon leads in pose estimation and novel view synthesis, achieving the highest PSNR on the Tanks & Temples dataset and completing reconstructions significantly faster than traditional methods [26][32]
- Performance holds up even when the number of anchor images is reduced from 10 to 2, indicating robustness across sampling strategies [32]
Group 5: Limitations and Future Work
- Reliance on a fixed global coordinate system may affect certain sequences, suggesting a need for improved anchor-image selection strategies [36]
- Uniform sampling can overlook parts of a scene, pointing to coverage-aware sampling as a direction for research [36]
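As a toy illustration of the anchor-image idea only (SAIL-Recon's internals are not reproduced here, and the helper name is hypothetical), the sketch below does plain uniform anchor selection, the strategy whose coverage gaps the article itself flags as a limitation:

```python
import numpy as np

def uniform_anchor_indices(n_frames: int, n_anchors: int) -> np.ndarray:
    """Pick n_anchors frame indices spread evenly across a sequence; these
    anchors would seed the neural scene representation, and all remaining
    frames would then be localized against it."""
    return np.unique(np.linspace(0, n_frames - 1, n_anchors).round().astype(int))

anchors = uniform_anchor_indices(n_frames=5000, n_anchors=10)
rest = np.setdiff1d(np.arange(5000), anchors)  # frames localized vs. anchors
print(anchors)      # [   0  555 1111 1666 2222 2777 3333 3888 4444 4999]
print(rest.size)    # 4990
```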
A diverse, large-scale dataset! SceneSplat++: the first comprehensive 3DGS-based benchmark
自动驾驶之心· 2025-06-20 14:06
Core Insights
- The article introduces SceneSplat-Bench, a comprehensive benchmark for evaluating visual-language scene understanding methods based on 3D Gaussian Splatting (3DGS) [11][30]
- It presents SceneSplat-49K, a large-scale dataset of approximately 49,000 raw scenes and 46,000 filtered 3DGS scenes, the most extensive open-source dataset for complex, high-quality scene-level 3DGS reconstruction [9][30]
- The evaluation shows that generalizable methods consistently outperform per-scene optimization methods, establishing a new paradigm of scalable scene understanding through pre-trained models [30]
Evaluation Protocols
- The benchmark scores methods on two 3D-space metrics, foreground mean Intersection over Union (f-mIoU) and foreground mean accuracy (f-mAcc), which address object-size imbalance and reduce viewpoint dependency relative to 2D evaluation (a minimal f-mIoU sketch follows this summary) [22][30]
- The evaluation set spans ScanNet, ScanNet++, and Matterport3D for indoor scenes and HoliCity for outdoor scenes, stressing methods across object scales and complex environments [22][30]
Dataset Contributions
- SceneSplat-49K is compiled from multiple sources, including SceneSplat-7K, DL3DV-10K, HoliCity, and Aria Synthetic Environments, ensuring a diverse range of indoor and outdoor environments [9][10]
- Preparation took approximately 891 GPU-days plus extensive human effort, reflecting the significant resources invested in creating a high-quality dataset [7][9]
Methodological Insights
- Methods fall into three categories: per-scene optimization methods, per-scene optimization-free methods, and generalizable methods, with SceneSplat representing the last [23][30]
- Generalizable methods avoid heavy single-scene computation at inference, processing a 3D scene in a single forward pass [24][30]
Performance Results
- On SceneSplat-Bench, SceneSplat excels in both performance and efficiency, often surpassing the pseudo-label methods used for its own pre-training [24][30]
- Method performance varies significantly with dataset complexity, underscoring the value of challenging benchmarks in exposing the limits of competing methods [28][30]
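A minimal sketch of the foreground-mIoU idea: per-class IoU over 3D points, averaged across foreground classes, with a background/ignore label excluded. The ignore-label convention and class setup here are illustrative assumptions, not SceneSplat-Bench's exact protocol.

```python
import numpy as np

def foreground_miou(pred: np.ndarray, gt: np.ndarray, ignore_label: int = 0) -> float:
    """Mean IoU over foreground classes for per-point labels of shape (N,);
    points whose ground truth equals ignore_label are excluded entirely."""
    keep = gt != ignore_label
    pred, gt = pred[keep], gt[keep]
    ious = []
    for c in np.unique(gt):                      # each foreground class present
        inter = np.sum((pred == c) & (gt == c))
        union = np.sum((pred == c) | (gt == c))
        ious.append(inter / union)
    return float(np.mean(ious))

rng = np.random.default_rng(0)
gt = rng.integers(0, 5, size=10_000)             # 0 = background, 1-4 foreground
pred = np.where(rng.random(10_000) < 0.8, gt, rng.integers(0, 5, size=10_000))
print(f"f-mIoU: {foreground_miou(pred, gt):.3f}")
```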