Core Viewpoint
- Step1X-3D is a newly released, open-source 3D generation model with 4.8 billion parameters in total. It is designed to generate high-fidelity, controllable 3D content for applications including gaming, film, and industrial design [1][3].
Group 1: Data and Algorithm Optimization
- Step1X-3D starts from more than 5 million raw data samples, which were processed into a training library of 2 million high-quality, standardized samples, addressing the industry's problems of data scarcity and uneven quality [4].
- The model uses an enhanced mesh-to-SDF (signed distance function) conversion that raises the success rate of watertight geometry conversion by 20%, which improves generalization and the capture of fine detail [7].
Group 2: 3D Native Generation
- A two-stage architecture decouples geometry from texture, so generated models are structurally reliable and visually accurate and avoid geometric distortion [10].
- Geometry generation uses a hybrid VAE-DiT architecture that outputs watertight TSDF (truncated signed distance function) representations and captures rich geometric detail through techniques such as sharp edge sampling [15].
- Texture generation builds on the SD-XL model, producing vivid colors and realistic textures that stay consistent across multiple views and avoid common distortions and seams [16].
Group 3: Control and Usability
- Step1X-3D improves the controllability and usability of 3D content generation, letting users intuitively adjust attributes such as symmetry and surface detail [18][19].
- Because the architecture closely follows mainstream 2D generation models, established 2D control techniques can be carried over, making the creation process more precise [18].
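The TSDF representation named under Group 2 can be illustrated with a toy example. The sketch below is not Step1X-3D's code: it assumes an analytic unit sphere as a stand-in for a watertight mesh, and the function names and the `band` truncation width are hypothetical choices made only for illustration. It shows the basic idea of a truncated signed distance: negative inside the surface, positive outside, and clamped to a narrow band around the surface.

```python
import math

def sphere_sdf(point, center=(0.0, 0.0, 0.0), radius=1.0):
    """Signed distance to a sphere: negative inside, zero on the surface, positive outside."""
    return math.dist(point, center) - radius

def truncate(sdf_value, band=0.1):
    """Clamp a signed distance to [-band, band], giving a TSDF value."""
    return max(-band, min(band, sdf_value))

def tsdf_samples(points, band=0.1):
    """Evaluate the truncated signed distance at each query point."""
    return [truncate(sphere_sdf(p), band) for p in points]

# Query points: the center, just inside the surface, on the surface, far outside.
points = [(0.0, 0.0, 0.0), (0.95, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
values = tsdf_samples(points)
```

Points deep inside or far outside saturate at the band limits, so only samples near the surface carry distinct values. A closed (watertight) surface is what makes the inside/outside sign well defined, which is one reason the conversion success rate matters.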
Group 4: Performance Evaluation
- In quantitative and qualitative evaluations, Step1X-3D outperformed several mainstream models on key dimensions. In particular, it achieved the highest CLIP-Score among the compared models, indicating strong semantic consistency between the generated content and the input [23][25].
Group 5: Team and Vision
- The development teams, StepFun and LightIllusions, aim to advance AGI and focus on 3D AIGC and spatial intelligence technologies, with a commitment to strengthening 3D content production capabilities and commercializing 3D applications [27].
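The CLIP-Score comparison in Group 4 rests on embedding similarity. The summary does not describe the evaluation protocol, so the sketch below only assumes the common pattern: embed the input and several rendered views of the generated model, then average the cosine similarity. The embeddings here are hand-written toy vectors rather than real CLIP outputs, and both function names are hypothetical.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("embedding vectors must be non-zero")
    return dot / (norm_a * norm_b)

def mean_view_score(input_embedding, view_embeddings):
    """Average similarity between the input embedding and each rendered-view embedding."""
    scores = [cosine_similarity(input_embedding, v) for v in view_embeddings]
    return sum(scores) / len(scores)

# Toy stand-ins for embeddings: one view matches the input, one is orthogonal to it.
input_embedding = [1.0, 0.0]
view_embeddings = [[1.0, 0.0], [0.0, 1.0]]
score = mean_view_score(input_embedding, view_embeddings)
```

A higher average means the rendered views sit closer to the input in embedding space, which is the sense in which a high CLIP-Score suggests semantic consistency.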
StepFun × LightIllusions jointly build Step1X-3D, a powerful 3D generation engine, and open-source the full training pipeline code
机器之心 (Synced) · 2025-05-16 02:42