NeRF
After months of grinding, the new end-to-end closed-loop simulation system is finally in use.
自动驾驶之心 (Heart of Autonomous Driving) · 2025-07-03 12:41
Core Viewpoint - The article discusses the development and implementation of the Street Gaussians algorithm for dynamic scene representation in autonomous driving, highlighting its efficiency in training and rendering compared to previous methods [2][3].
Group 1: Background and Challenges
- Previous methods faced challenges such as slow training and rendering speeds, as well as inaccuracies in vehicle pose tracking [3].
- Street Gaussians aims to generate realistic images for view synthesis in dynamic urban street scenes by modeling them as a combination of foreground moving vehicles and static backgrounds [3][4].
Group 2: Technical Implementation
- The background model is represented as a set of points in world coordinates, each assigned a 3D Gaussian to represent geometry and color, with parameters optimized to avoid invalid values [8].
- The object model for moving vehicles includes a set of optimizable tracking poses and point clouds, with similar Gaussian attributes to the background model but defined in local coordinates [11].
- A 4D spherical harmonic model is introduced to encode temporal information into the appearance of moving vehicles without high storage costs [12].
Group 3: Initialization and Data Handling
- Street Gaussians utilizes aggregated LiDAR point clouds for initialization, addressing the limitations of traditional SfM point clouds in urban environments [17].
- For objects with fewer than 2,000 LiDAR points, random sampling is employed to ensure sufficient data for model initialization [17].
Group 4: Course and Learning Opportunities
- The article promotes a specialized course on 3D Gaussian Splatting (3DGS), covering various subfields and practical applications in autonomous driving, aimed at enhancing understanding and implementation skills [26][35].
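The initialization rule in Group 3 and the local-coordinate object model in Group 2 can be illustrated with a minimal sketch. It is not the authors' implementation. The function names, the bounding-box half-extent input, and the number of random samples (`n_random`) are all assumptions made for illustration; the summary gives only the 2,000-point threshold and says that object points live in a local frame and are placed by tracked poses.

```python
import random


def init_object_points(lidar_points, half_extent, min_points=2000,
                       n_random=8000, seed=0):
    """Pick the points used to initialize one object's Gaussians (local frame).

    lidar_points: list of (x, y, z) tuples aggregated for this object.
    half_extent:  (hx, hy, hz) half-sizes of the object's 3D box (assumed input).
    min_points:   threshold from the summary (2,000 LiDAR points).
    n_random:     placeholder sample count; the summary does not state one.
    """
    if len(lidar_points) >= min_points:
        # Enough LiDAR coverage: use the aggregated points directly.
        return list(lidar_points)
    # Too few points: fall back to uniform random samples inside the box
    # (sampling inside the bounding box is an assumption of this sketch).
    rng = random.Random(seed)
    return [tuple(rng.uniform(-h, h) for h in half_extent)
            for _ in range(n_random)]


def object_to_world(points_local, rotation, translation):
    """Map object-local points to world coordinates with a tracked pose.

    rotation:    3x3 rotation matrix as nested lists.
    translation: 3-vector giving the object's position in the world frame.
    """
    return [
        tuple(sum(rotation[i][j] * p[j] for j in range(3)) + translation[i]
              for i in range(3))
        for p in points_local
    ]


# Usage: a sparsely observed vehicle falls back to random sampling.
sparse = [(0.1, 0.0, 0.2)] * 150
pts = init_object_points(sparse, half_extent=(2.3, 1.0, 0.8))

# A 90-degree yaw followed by a shift places a local point in the world frame.
yaw90 = [[0.0, -1.0, 0.0], [1.0, 0.0, 0.0], [0.0, 0.0, 1.0]]
world = object_to_world([(1.0, 0.0, 0.0)], yaw90, (10.0, 5.0, 0.0))
```

Because the points stay in the object's local frame, the pose can change from frame to frame (and be optimized) without touching the points themselves; only `object_to_world` is re-evaluated.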
More than 40,000 authors scramble to get in: CVPR 2025 officially reveals its three hottest topics. Did you bet on the right direction?
机器之心 (Synced) · 2025-05-28 03:02
Core Insights - The article discusses the latest trends in the field of computer vision, highlighting three major research directions that are gaining traction as of 2025 [3][4].
Group 1: Major Research Directions
- The three prominent areas identified are:
1. Multi-view and sensor 3D technology, which has evolved from 2D rendering to more complex 3D evaluations, significantly influenced by the introduction of NeRF in 2020 [5].
2. Image and video synthesis, which has become a focal point for presenting environmental information more accurately, reflecting advancements in the ability to analyze and generate multimedia content [6].
3. Multimodal learning, which integrates visual, linguistic, and reasoning capabilities, indicating a trend towards more interactive and comprehensive AI systems [7][8].
Group 2: Conference Insights
- The CVPR 2025 conference has seen a 13% increase in paper submissions, with a total of 13,008 submissions and an acceptance rate of 22.1%, indicating a highly competitive environment [3].
- The conference emphasizes the importance of diverse voices in the research community, ensuring that every paper, regardless of the author's affiliation, is given equal consideration [8].
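Professor Yu Jingyi: The potential of large models lies in spatial intelligence, but we are still far from a consensus on it | AI&Society 100 People, 100 Questions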
腾讯研究院 (Tencent Research Institute) · 2025-05-09 08:20
Core Viewpoint - The article discusses the transformative impact of generative AI on technology, business, and society, emphasizing the shift from an information society to an intelligent society, and the need to explore new opportunities and challenges brought by AI [1].
Group 1: Insights from Experts
- The article features insights from Yu Jingyi, a prominent professor in computer science, who highlights the current bottlenecks in large model technology and the potential of generative AI in spatial intelligence [5][6].
- Yu emphasizes that the understanding of spatial intelligence is evolving, moving from simple digital reconstructions to more complex intelligent interpretations of space, aided by advancements in generative AI [12][13].
Group 2: Technological Breakthroughs
- The development of generative AI technologies, such as DALL-E 3 and GPT-4o, showcases the potential for significant advancements in image and video generation, indicating that the capabilities of language models in visual generation are far from being fully realized [10][11].
- The introduction of the CAST project, which incorporates actor-network theory and physical rules, aims to enhance the understanding of spatial relationships among objects, marking a significant step in the evolution of spatial intelligence [16][18].
Group 3: Challenges and Opportunities
- A major challenge in the field is the lack of sufficient 3D scene data, particularly real-world data, which hampers the development of robust AI models for spatial understanding [18][19].
- The article discusses the potential of cross-modal methods to address data scarcity in 3D environments, leveraging advancements in text-to-image technologies to infer spatial relationships [19][20].
Group 4: Future Applications
- The short-term applications of spatial intelligence are expected to be in the fields of art creation, gaming, and film production, where generative AI can significantly enhance efficiency and creativity [42][43].
- In the medium to long term, spatial intelligence is anticipated to become a core component of embodied intelligence, potentially transforming industries such as smart devices and robotics [43][44].
Group 5: Ethical Considerations
- The rise of AI companionship raises ethical questions regarding emotional dependency and the implications of human-robot interactions, necessitating ongoing discussions about ethical frameworks in technology development [50][51].