3D AIGC
NeurIPS 2025 Spotlight | PhysX-3D: A 3D Asset Generation Paradigm for the Real Physical World
机器之心· 2025-10-11 08:06
Core Insights
- The article presents PhysXNet, the first systematically annotated 3D dataset organized around physical properties, addressing the gap between virtual 3D assets and real-world physics [6][9][27]
- It introduces PhysXGen, a novel framework for generating 3D assets that incorporates physical attributes, enhancing the realism and applicability of 3D models across fields [9][18][27]

Dataset Overview
- PhysXNet includes over 26,000 annotated 3D objects with detailed physical properties, while the extended version, PhysXNet-XL, contains over 6 million programmatically generated 3D objects [9][10][16]
- The dataset covers five core dimensions: physical scale, materials, affordance, kinematic information, and textual descriptions, providing a comprehensive resource for 3D modeling (a schematic annotation record is sketched after this summary) [6][9][27]

Annotation Process
- A human-in-the-loop annotation framework was developed to collect and label physical information efficiently, ensuring high-quality data [11][13]
- The annotation process involves two main stages, initial data collection and determination of kinematic parameters, and uses models such as GPT-4o to improve accuracy [13][11]

Generation Methodology
- PhysXGen jointly models physical attributes with geometric structure and appearance, optimizing both objectives when generating realistic 3D assets [18][27]
- The framework demonstrates significant improvements in generating physical properties over existing methods, with relative gains across several dimensions [23][24]

Experimental Results
- The evaluation of PhysXGen shows notable advances in both geometric quality and physical-property accuracy, outperforming baseline methods on multiple metrics [20][21][23]
- The results indicate relative improvements of 24% in physical scale, 64% in materials, 28% in kinematic parameters, and 72% in affordance over prior approaches [23][24]

Conclusion
- The article emphasizes the importance of bridging the gap between 3D assets and real-world physics, highlighting the potential impact of PhysXNet and PhysXGen on fields such as embodied AI, robotics, and 3D vision [27]
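The five annotation dimensions above lend themselves to a simple per-object record. Below is a minimal, hypothetical sketch of what such a record might look like; the class and field names (PhysicalAnnotation, KinematicJoint, scale_m, and so on) are illustrative assumptions, not PhysXNet's actual schema.

```python
from dataclasses import dataclass, field

@dataclass
class KinematicJoint:
    """One movable part: joint type plus its motion range (illustrative fields)."""
    part_id: str
    joint_type: str          # e.g. "revolute" or "prismatic"
    axis: tuple              # joint axis in the object's local frame
    motion_range: tuple      # (min, max) in radians or meters

@dataclass
class PhysicalAnnotation:
    """A per-object record covering the five annotated dimensions."""
    object_id: str
    scale_m: tuple                       # absolute bounding-box size (x, y, z) in meters
    materials: dict                      # part_id -> material name, e.g. {"handle": "steel"}
    affordances: list                    # e.g. ["graspable", "openable"]
    kinematics: list = field(default_factory=list)  # list of KinematicJoint
    description: str = ""                # free-form textual description

# Example record (values are illustrative, not taken from the dataset)
drawer = PhysicalAnnotation(
    object_id="cabinet_0042",
    scale_m=(0.6, 0.5, 1.2),
    materials={"body": "wood", "handle": "steel"},
    affordances=["openable", "graspable"],
    kinematics=[KinematicJoint("drawer_top", "prismatic", (0, 1, 0), (0.0, 0.45))],
    description="A wooden cabinet with a sliding top drawer and a metal handle.",
)
```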
Packed with Substance! Tencent Hunyuan 3D Lead Guo Chunchao: The Real 3D AIGC Revolution Hasn't Started Yet!
AI科技大本营· 2025-05-16 01:33
Core Viewpoint
- The article emphasizes that the true revolution of 3D AIGC (AI-Generated Content) has yet to begin, despite significant advances in the technology [4][6]

Group 1: Current State of 3D AIGC
- 3D AIGC has made notable progress but remains at an early stage compared with the more mature text and image generation technologies [9][22]
- 3D generation is evolving rapidly, with the industry only beginning to explore its potential in 2024 [22][20]
- Existing technology can generate static 3D models but struggles to integrate into professional-grade CG pipelines [9][12]

Group 2: Challenges in 3D Generation
- Data scarcity and utilization efficiency are major challenges, as acquiring 3D data is far harder than acquiring images [9][32]
- Current 3D generation capabilities are limited, and the efficiency and quality of generated assets need improvement [12][43]
- The industry must overcome hurdles in integrating AI into existing workflows, particularly in automating processes such as retopology and UV mapping [24][30]

Group 3: Technological Evolution and Future Directions
- Technology is moving toward combining autoregressive models with diffusion models, which may improve controllability and memory in 3D generation [9][36]
- The goal is a comprehensive 3D world model that can understand and generate complex scenes, which requires advances in physical-consistency modeling and spatial-semantic coherence [19][40]
- By 2025, the aim is object-level generation approaching the quality of manual modeling, along with early forms of scene generation [20][19]

Group 4: Open Source and Community Engagement
- Open sourcing is seen as a critical catalyst for accelerating development and fostering a thriving ecosystem in the 3D AIGC space [9][28]
- Continuous model iteration and community feedback are essential for staying competitive in a rapidly evolving field [33][34]
- The company plans to release more models and datasets to lower industry barriers and promote widespread adoption [19][20]

Group 5: Impact on Professionals and Industry
- AI is positioned as a powerful productivity tool for 3D designers rather than a replacement, enabling faster realization of creative ideas [47][46]
- AI tools will likely turn 3D designers into hybrid professionals who pair creative skills with effective use of AI [47][46]
- AI may democratize 3D content creation, but professional expertise will remain valuable in high-stakes settings [26][47]
AI Generates Endless Minecraft Worlds, and Players Take Control with Keyboard and Mouse! A Chinese-Made Interactive World Model Arrives
量子位· 2025-05-13 03:01
Core Viewpoint
- The article discusses the launch of Matrix-Game, an interactive world model developed by Kunlun Wanwei that lets users create and explore highly realistic virtual environments with simple mouse and keyboard commands. The tool uses AI to generate content in real time, significantly lowering the barrier to entry while preserving creative freedom and physical realism.

Group 1: Matrix-Game Overview
- Matrix-Game enables users to interact with and create detailed virtual content that obeys real-world physics, with a low barrier to operation [10][41]
- The tool supports varied environments, including forests, beaches, deserts, glaciers, rivers, and plains, and allows basic and complex movements, perspective shifts, and actions such as jumping and attacking [5][6][10]
- The Matrix-Game-MC dataset is a large-scale dataset combining unlabelled Minecraft gameplay videos with controllable video data, letting the model learn complex environmental dynamics and interaction patterns [14][15]

Group 2: Technical Implementation
- The main model framework is based on diffusion models and comprises image-to-world modeling, autoregressive video generation, and controllable interaction design [18][20]
- Image-to-world modeling generates interactive video content from a single image, integrating user actions without relying on language prompts [21]
- Autoregressive video generation preserves temporal consistency by conditioning each video segment on previous frames, while controllable interaction design makes the model responsive to user inputs (a schematic rollout loop is sketched after this summary) [23][27]

Group 3: Evaluation and Performance
- The GameWorld Score evaluation system assesses interactive world generation models along four dimensions: visual quality, temporal quality, action controllability, and understanding of physical rules [29][30]
- Matrix-Game outperforms existing models such as Decart's Oasis and Microsoft's MineWorld in all evaluated dimensions, achieving a 96.3% user preference rate in blind tests [36][39]
- For specific actions such as movement and attack, Matrix-Game maintains over 90% accuracy, demonstrating precise fine-grained control [39]

Group 4: Industry Implications
- Matrix-Game has potential applications in rapidly building virtual game worlds, producing content for film and the metaverse, training embodied agents, and generating data [41][42]
- 3D AI-generated content (AIGC) is gaining traction, with major companies investing in the area, indicating a shift from 2D to 3D technologies [43][46]
- Advances in 3D AIGC and world modeling are expected to enable new interactive experiences, making them a focal point for future AI development [48][49]
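To make the autoregressive, action-conditioned loop concrete, here is a minimal sketch of the rollout pattern described above: each new clip is conditioned on recent frames plus the user's control input. The model interface (model.generate) and the stub generator are placeholder assumptions, not Matrix-Game's actual API.

```python
import torch

def rollout(model, first_frame: torch.Tensor, actions: list, clip_len: int = 16):
    """Autoregressive, action-conditioned rollout (schematic).

    model       -- placeholder for a diffusion video generator; its
                   `generate` signature is assumed, not Matrix-Game's API.
    first_frame -- (C, H, W) image that seeds the world ("image-to-world").
    actions     -- one action dict per clip, e.g. {"keys": ["w"], "mouse": (dx, dy)}.
    """
    context = first_frame.unsqueeze(0)          # running window of past frames
    video = [first_frame]
    for action in actions:
        # Condition each new clip on previous frames (temporal consistency)
        # and on the user's control input (action controllability).
        clip = model.generate(context=context, action=action, num_frames=clip_len)
        video.extend(clip.unbind(0))
        context = clip[-4:]                     # keep only recent frames as context
    return torch.stack(video)

class StubModel:
    """Stand-in generator: returns random frames so the loop runs end-to-end."""
    def generate(self, context, action, num_frames):
        c, h, w = context.shape[-3:]
        return torch.randn(num_frames, c, h, w)

video = rollout(StubModel(), torch.randn(3, 64, 64),
                actions=[{"keys": ["w"], "mouse": (0, 0)}] * 3)
print(video.shape)   # torch.Size([49, 3, 64, 64])
```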
Tencent Holdings (00700) Releases a Major Upgrade to Its Hunyuan 3D Generation Model, Sharply Improving Modeling Fidelity
智通财经网· 2025-04-23 06:27
Core Insights
- Tencent Holdings has officially released version 2.5 of its Hunyuan 3D generation model, significantly enhancing modeling precision and achieving ultra-high-definition geometric detail [1]
- The total parameter count of the Hunyuan 3D model has grown from 1 billion to 10 billion, with the effective polygon count rising more than tenfold [1]
- The new version supports 4K high-definition textures and fine-grained bump mapping, improving the visual quality of surface textures [1]

Group 1
- The effective geometric resolution has been raised to 1024, a step from standard definition to high definition [1]
- The Hunyuan 3D AI creation engine has been fully updated to the v2.5 model base, with the free generation quota doubled to 20 per day [1]
- The Hunyuan 3D generation API has been officially launched on Tencent Cloud and is available to enterprises and developers [1]

Group 2
- The new version is the first in the industry to generate PBR models from multi-view input, significantly improving the quality and realism of generated outputs [1]
- The skeletal skinning system has been optimized to support automatic bone binding and weight assignment for non-standard poses, greatly improving the efficiency of 3D animation generation [1]
- The 3D generation workflow has been further upgraded, offering professional pipeline templates for intelligently decimated models and multi-view 3D models (a generic decimation step is sketched after this summary) [1]

Group 3
- Tencent Hunyuan actively embraces the open-source ecosystem: foundational models 1.0 and 2.0, as well as the accelerated, multi-view, and lightweight models, are all open-sourced [2]
- Hunyuan 3D has accumulated more than 12,000 GitHub stars, continuously enriching the 3D AIGC community [2]
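As a rough illustration of what a decimation step in such a pipeline does (reducing a dense generated mesh to a production triangle budget), here is a minimal sketch using Open3D's quadric decimation as a generic stand-in; this is not Hunyuan 3D's actual implementation, and the file paths are illustrative.

```python
import open3d as o3d

def decimate(in_path: str, out_path: str, target_triangles: int = 20_000):
    """Reduce a high-poly generated mesh to a production triangle budget.

    Open3D's quadric decimation serves as a generic stand-in here; it is
    not Hunyuan 3D's decimation pipeline.
    """
    mesh = o3d.io.read_triangle_mesh(in_path)
    print(f"input: {len(mesh.triangles)} triangles")
    low = mesh.simplify_quadric_decimation(target_number_of_triangles=target_triangles)
    low.compute_vertex_normals()   # recompute shading normals after simplification
    o3d.io.write_triangle_mesh(out_path, low)
    print(f"output: {len(low.triangles)} triangles")

# Example call (paths are illustrative):
# decimate("generated_highpoly.obj", "game_ready.obj", target_triangles=20_000)
```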
CAD Engineering Files Straight from a Single Image! New CVPR 2025 Research Tackles the "Uneditable" Pain Point of AI-Generated 3D Models | From KOKONI 3D (魔芯科技), NTU, and Others
量子位· 2025-04-14 09:09
Core Viewpoint
- CADCrafter represents a significant advance in 3D modeling, generating editable CAD files directly from images and thereby shifting the traditional image-to-mesh paradigm to an image-to-CAD framework [1][16]

Group 1: Technology Overview
- CADCrafter differs from previous 3D generation methods by producing native CAD files from varied sources, including rendered parts and everyday objects [2]
- The system compiles CAD instructions into production-ready 3D files, such as the STP format [3]
- Users can edit objects through the CAD instructions themselves, giving greater design flexibility [4]

Group 2: Performance Improvements
- Experimental results indicate that CADCrafter delivers significant improvements in practicality and surface quality over existing 3D generation methods [5]
- The technology enables high-quality CAD model generation from a single image, overcoming challenges associated with traditional mesh models [21][26]

Group 3: Research and Development
- The research team comprises members from KOKONI 3D, Nanyang Technological University, A*STAR, Westlake University, the University of Texas at Austin, and Zhejiang University [7]
- The study has been accepted at CVPR 2025, underscoring its relevance to the field [9]

Group 4: Methodology
- CADCrafter employs a two-stage generation framework that combines a Variational Autoencoder (VAE) with a Diffusion Transformer to map images to CAD instructions [19][20]
- A distillation strategy transfers knowledge from a multi-view model to a single-view model, enabling high-quality CAD generation from a single image [21]
- An integrated code validity check ensures that generated CAD instructions are compilable, raising the success rate of 3D model generation (a schematic check-and-retry loop is sketched after this summary) [22][25]

Group 5: Practical Applications
- Engineers can quickly generate editable CAD models from photographs of existing parts, streamlining prototype design and part reconstruction [30]
- CADCrafter's ability to produce high-quality, editable CAD files from everyday objects demonstrates its potential for real-world use in manufacturing and maintenance [28][30]
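The validity-check mechanism can be pictured as a compile-and-retry loop: sample CAD instructions, attempt to build a solid, and resample on failure. The sketch below uses CadQuery as a stand-in CAD kernel and a placeholder sampler; CADCrafter's actual instruction format and compiler are assumptions here, not taken from the paper.

```python
import cadquery as cq

def is_compilable(cad_code: str) -> bool:
    """Try to execute a generated CAD script and build a solid.

    CadQuery stands in for the CAD kernel; the convention that the
    script defines a `result` Workplane is an illustrative assumption.
    """
    scope = {"cq": cq}
    try:
        exec(cad_code, scope)                      # script must define `result`
        return isinstance(scope.get("result"), cq.Workplane)
    except Exception:
        return False                               # any failure -> invalid sample

def generate_valid_cad(sample_fn, max_tries: int = 5) -> str:
    """Resample until the generated instructions compile (schematic retry loop).

    sample_fn is a placeholder for the diffusion model's sampler.
    """
    for _ in range(max_tries):
        code = sample_fn()
        if is_compilable(code):
            return code
    raise RuntimeError("no compilable CAD program within the retry budget")

# Illustrative "generated" script: a 10x10x5 plate with a centered through-hole.
demo = "result = cq.Workplane('XY').box(10, 10, 5).faces('>Z').workplane().hole(3)"
assert is_compilable(demo)
```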
Express | The World's First Multimodal Interactive 3D Large Model Is Here, Doing What Even GPT-4o Couldn't
Z Potentials· 2025-04-14 02:30
Core Viewpoint
- The launch of GPT-4o and its multimodal capabilities has drawn significant attention in the global AI community, particularly its ability to generate images from combined text, image, voice, and video training [1]

Group 1: GPT-4o and Neural4D 2o
- GPT-4o supports multiple modalities in a single model, improving image generation with better context understanding and feature retention [1]
- DreamTech's Neural4D 2o is billed as the world's first multimodal interactive 3D large model, supporting natural-language interaction and editing with text and image inputs [1]
- Neural4D 2o uses a multimodal transformer encoder with a 3D DiT decoder to achieve high precision in local editing, character identity retention, and style transfer (a schematic version of this architecture is sketched after this summary) [1]

Group 2: User Experience and Application
- In practice, Neural4D 2o shows significant improvements in stability, context consistency, and local editing, although server limitations mean users face wait times of 2 to 5 minutes [8]
- The technology lets users perform tasks previously reserved for professional 3D designers, pointing to a democratization of 3D design [8]

Group 3: Company Vision
- DreamTech aims to improve the experience of AIGC creators and consumers through innovative products and services, with a vision of seamless, real-time interactive 4D experiences powered by advanced AI [9]
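The encoder-decoder pattern named above (a multimodal transformer encoder conditioning a 3D DiT-style decoder) can be sketched schematically in PyTorch. All module sizes, names, and token shapes below are illustrative assumptions, not DreamTech's implementation.

```python
import torch
import torch.nn as nn

class MultimodalEncoder(nn.Module):
    """Fuse text and image tokens into one conditioning sequence (schematic)."""
    def __init__(self, dim: int = 512, layers: int = 4):
        super().__init__()
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=8, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=layers)

    def forward(self, text_tokens, image_tokens):
        # Concatenate both modalities so self-attention can mix them.
        return self.encoder(torch.cat([text_tokens, image_tokens], dim=1))

class DiT3DDecoder(nn.Module):
    """Denoise 3D latent tokens under cross-attention to the conditioning (schematic)."""
    def __init__(self, dim: int = 512, layers: int = 4):
        super().__init__()
        layer = nn.TransformerDecoderLayer(d_model=dim, nhead=8, batch_first=True)
        self.decoder = nn.TransformerDecoder(layer, num_layers=layers)

    def forward(self, noisy_latents, condition):
        return self.decoder(noisy_latents, condition)   # predicts denoised latents

# One denoising step on dummy tensors (shapes are illustrative).
enc, dec = MultimodalEncoder(), DiT3DDecoder()
cond = enc(torch.randn(1, 32, 512), torch.randn(1, 64, 512))
out = dec(torch.randn(1, 256, 512), cond)   # 256 latent tokens for the 3D shape
print(out.shape)                            # torch.Size([1, 256, 512])
```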