3D Generation
First evidence that RL can teach 3D models to reason, with a leap in generation quality under complex text descriptions
36Kr· 2026-02-27 02:33
Core Insights
- The research introduces the first systematic integration of reinforcement learning (RL) into autoregressive text-to-3D generation, addressing challenges that are unique to 3D compared with 2D [1][3][17]
- The study stresses that reward models must be designed specifically for 3D generation, and identifies the human preference score HPS v2.1 as the most effective single reward signal [6][12][17]
Group 1: Challenges in 3D Generation
- 3D objects have no "standard view", so geometric consistency, texture quality, and semantic alignment must be judged across multiple views, which makes evaluation difficult [5][6]
- Long-range dependencies in 3D generation make reward signals sparser, so it is harder for the model to detect errors during generation [5][6]
Group 2: Reward Model Design
- Across the reward combinations tested, HPS v2.1 was the strongest single reward, and adding semantic-alignment and aesthetic-quality rewards on top of HPS improved performance further [6][12]
- Surprisingly, a general-purpose multimodal model (Qwen2.5-VL) judged 3D consistency more robustly than specialized models, filling a gap in reward signals for 3D generation [6][12]
Group 3: Algorithm Selection and Training Paradigms
- Token-level optimization suits 3D generation better than sequence-level operations, which can hurt performance [7][12]
- In RL training for 3D generation, data diversity matters more than training duration: doubling the training data helps, while tripling the number of iterations can lead to overfitting [12][17]
Group 4: Evaluation Metrics
- Existing 3D generation benchmarks do not test a model's implicit reasoning under complex text descriptions, which motivated the new MME-3DR benchmark [10][17]
- MME-3DR contains 249 carefully selected complex 3D objects and evaluates multi-view geometric consistency, semantic detail alignment, and texture realism [10][17]
Group 5: Model Performance and Contributions
- The final model, AR3D-R1, outperforms existing state-of-the-art methods on both MME-3DR and Toys4K, with significant gains in reasoning ability [13][18]
- The research establishes a systematic framework for bringing RL into 3D generation, arguing that rewards, algorithms, and training paradigms must be tailored to 3D rather than carried over directly from 2D [17][18]
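The summary does not define "token-level" versus "sequence-level" optimization for this work. One common reading in GRPO-style policy-gradient objectives is about aggregation. Per-token loss terms are either averaged over every token in the batch, or first averaged within each sequence and then across sequences.

The sketch below shows only that aggregation difference, under that assumption. It is not AR3D-R1's training code. It omits importance ratios, clipping, and gradients, and all function names are hypothetical.

```python
from statistics import mean, pstdev


def group_advantages(rewards, eps=1e-6):
    """GRPO-style advantages: normalize rewards within one prompt's sample group."""
    mu, sigma = mean(rewards), pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


def sequence_level_loss(token_logps, advantages):
    """Average within each sequence first, so every sequence carries equal weight."""
    per_seq = [-adv * mean(logps) for logps, adv in zip(token_logps, advantages)]
    return mean(per_seq)


def token_level_loss(token_logps, advantages):
    """Average over all tokens, so longer sequences contribute more terms."""
    terms = [-adv * lp for logps, adv in zip(token_logps, advantages) for lp in logps]
    return mean(terms)


# Hypothetical example: two samples for one prompt, each scored by a single
# scalar reward (e.g. a preference score); the second sample is much shorter.
rewards = [0.8, 0.2]
adv = group_advantages(rewards)
logps = [[-1.0, -1.0, -1.0, -1.0], [-2.0]]
print(sequence_level_loss(logps, adv), token_level_loss(logps, adv))
```

In this toy case the two aggregations differ even in sign, which hints at why the choice can matter for long, variable-length token sequences.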
The "ImageNet" of 3D generation is here! Tencent Hunyuan open-sources HY3D-Bench
QbitAI· 2026-02-06 10:10
Core Insights
- The article covers advances in 3D generation, highlighting the Tencent Hunyuan team's release of the HY3D-Bench dataset, which targets key problems in the field: data quality, evaluation standards, and long-tail category coverage [3][4].
Dataset Composition
- HY3D-Bench contains 252,000 high-quality 3D assets, 240,000 component-level structured annotations, and 125,000 AIGC-synthesized samples, providing a standardized data foundation for 3D generation research [19][20].
- Early benchmark datasets such as ShapeNet suffered from imbalanced category coverage and limited data volume, which held back practical use of 3D generation [4].
- Large-scale datasets such as Objaverse improved the situation, but problems remain, especially in preprocessing raw 3D data, which demands substantial compute and expertise [4][6].
Data Processing Pipeline
- The Hunyuan team built an automated pipeline that filters raw 3D assets and converts them into high-quality, training-ready data packages, substantially lowering the technical barrier for researchers [6][8].
- The pipeline first filters assets by polygon count and UV-mapping quality, then post-processes them with steps such as watertight conversion and multi-view rendering [6][8].
Component-Level Data Processing
- Component processing decomposes static meshes into semantically consistent sets of components to support downstream component-aware generation tasks [8][10].
- It uses topological connectivity analysis to identify physically separate components within a 3D asset, making 3D generation more modular [8].
AIGC Synthesis
- To address the scarcity of long-tail data, the team built a three-step pipeline that synthesizes data for embodied-AI simulation [10][12].
- The steps are LLM-based text expansion into detailed product descriptions, image generation with text-to-image models, and 3D asset generation with the HY3D-3.0 model [12].
Experimental Results
- The lightweight model Hunyuan3D-2.1-Small, trained on the open-source dataset, beats traditional methods on both generation quality and inference speed, running five times faster while avoiding common failures such as the "Janus problem" [12][13].
- By split, the dataset provides 252,000 manually modeled assets, 240,000 component-level samples, and 125,000 synthetic samples [13][19].
Future Plans
- The team plans to broaden the diversity of 3D assets and improve multi-task adaptability, further exploring data-driven methods for 3D generation [20].
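The summary says only that the pipeline uses topological connectivity analysis to find physically separate components. A minimal sketch of that idea merges faces that share a vertex, using union-find.

It assumes a triangle mesh given as vertex-index tuples, with duplicated seam vertices already welded. The function name is hypothetical, and this is not Hunyuan's implementation.

```python
def connected_components(faces):
    """Group faces into components whose faces are linked through shared vertices."""
    parent = {}

    def find(v):
        parent.setdefault(v, v)
        while parent[v] != v:
            parent[v] = parent[parent[v]]  # path halving keeps trees shallow
            v = parent[v]
        return v

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[ra] = rb

    # Merge every vertex of a face into one set.
    for face in faces:
        for v in face[1:]:
            union(face[0], v)

    # Bucket face indices by the root of their first vertex.
    groups = {}
    for i, face in enumerate(faces):
        groups.setdefault(find(face[0]), []).append(i)
    return list(groups.values())


# Two faces share vertex 2; the third face is disconnected.
faces = [(0, 1, 2), (2, 3, 4), (5, 6, 7)]
print(connected_components(faces))  # prints [[0, 1], [2]]
```

Vertex sharing is the simplest notion of connectivity. A production pipeline may instead use edge adjacency or spatial proximity.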
A 3D Nano Banana is here! AI model editing becomes reality as 3D generation enters the editable era
QbitAI· 2026-01-27 03:53
Core Viewpoint
- The article presents 3D generation as a critical area of AI and highlights advances by the Chinese team Hyper3D, particularly its product Rodin Gen-2 Edit, which combines 3D generation with 3D editing [1][3][27].
Group 1: 3D Generation and Editing Technology
- Hyper3D has launched Rodin Gen-2 Edit, described as the first commercial product to combine "3D generation" and "3D editing" into one complete workflow, bringing 3D generation into the editable era [3][11].
- Users select a region of a model and type a text command to modify it, for example turning a robot's arms into cannons [4][5][20].
- The platform can import any existing model for editing, including models generated by third-party AI tools. This positions Hyper3D's editing as foundational infrastructure rather than a standalone feature [9][11].
Group 2: Technological Advancements and User Experience
- Hyper3D Rodin lets users modify, add, or remove model components through natural language without disturbing the overall structure, a major change in how 3D models are built [13][21].
- The shift from "generation" to "editing" fills a key gap in AI workflows, enabling iterative design instead of the trial-and-error random generation that was common before [14][19].
- These capabilities build on 3D ControlNet, which gives precise control over geometric structure during generation, and on BANG, which recursively disassembles complex models to allow localized editing [17][25].
Group 3: Market Position and Future Directions
- In 2025 the team completed two funding rounds from top-tier VCs and strategic industry investors, signaling strong investor confidence in its technology [27].
- Beyond single-object editing, the company aims to generate complete 3D scenes with objects, relationships, and physical constraints, laying groundwork for future "world models" and embodied-AI infrastructure [26].
- Rodin Gen-2 Edit marks a significant step toward making 3D generation practically usable rather than merely feasible, and offers a reference point for the industry [27].
The "SenseTime alumni" camp has produced a string of unicorns, but Yan Junjie cannot be replicated
36Kr· 2025-12-26 00:01
Core Viewpoint
- The article looks at AI companies founded by former SenseTime staff (the "SenseTime alumni"), stressing their rapid growth and unicorn potential, with a focus on MiniMax and Vivix AI [4][10].
Group 1: Company Performance
- MiniMax reported revenue of $53.44 million for the first nine months of 2025, already above its full-year 2024 revenue of $30.52 million [7][9].
- Its consumer products are close to breakeven, indicating strong commercial viability [7].
- Vivix AI, founded by a former SenseTime executive, reached a valuation of $1.32 billion within ten months of its founding [10].
Group 2: Market Position and Strategy
- SenseTime alumni have founded successful startups in every major AI sector [10][11].
- MiniMax is known for moving ahead of industry trends, for example launching AI applications and adopting MoE (mixture-of-experts) models early [20][21].
- Its diverse product portfolio has helped it stay resilient through market swings [21].
Group 3: Talent and Experience
- The alumni's success is attributed to the founders' technical expertise and hands-on experience, with many having strong backgrounds in AI technology and product development [12][18].
- This combination of technical skill and project experience makes these entrepreneurs attractive to investors [15][18].
- The ability to replicate successful strategies and learn from past experience is highlighted as a key growth factor [26].
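Comparing the two MiniMax revenue figures directly gives the ratio below. This is simple arithmetic on the reported numbers, not a figure stated in the article.

```latex
\frac{\$53.44\ \text{million (Jan--Sep 2025)}}{\$30.52\ \text{million (full-year 2024)}} \approx 1.751
```

Nine months of 2025 revenue was therefore roughly 75% higher than all of 2024.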
Gemini 3 + Nano Banana Pro + 3D generation + gesture control = ? Guizang shows you how to show off your sports achievements in style
Guizang's AI Toolbox· 2025-12-05 12:02
Core Viewpoint
- The article shows how to create personalized 3D models and posters for outdoor activities such as hiking, skiing, cycling, and camping. It uses Nano Banana Pro to showcase achievements while protecting privacy [4][6][8].
Group 1: Skiing
- The skiing poster depicts ski tracks on a snow-covered mountain and blends in user-uploaded photos of ski equipment for visual appeal [10][11].
- Strong reflections and a snowy forest backdrop create a dynamic, engaging scene [11][12].
- The final output includes a title, data taken from the uploaded images, and a short phrase about the skiing experience [13].
Group 2: Cycling
- The cycling poster centers on a 3D terrain model featuring a prominent local landmark, with a clearly drawn road tracing the cycling route [16][17].
- User-uploaded photos of bicycles are incorporated, with their colors and features reproduced accurately [16].
- A shallow depth of field and morning light enhance the overall look [17][18].
Group 3: Hiking
- The hiking poster highlights a local landmark with a winding trail and blends in user-uploaded photos of hiking gear to represent the hike [21][22].
- Mist and reflections on water give the scene a dreamlike quality [21].
- The final design includes a title, data from the uploaded images, and specific geographic coordinates [23].
Group 4: Camping
- The camping poster shows a local landscape centered on the campsite, using user-uploaded photos of tents and camping gear [25][26].
- The scene is set at night, with warm light glowing from the tent to create a cozy atmosphere [26][27].
- The final output includes a title, the elevation, temperature, and length of the camping trip, and a poetic line about the experience [28].
Group 5: 3D Model Creation
- The article explains how to turn images into 3D models with tools such as tripo3d.ai or hyper3d.ai, and stresses how simple this is [31][33].
- Users are told to download the generated models in GLB format for compatibility [33].
- Finally, the 3D model and its data are uploaded to a page for interactive display, including gesture control [36][38].
Group 6: Product Development
- Building a webpage to show the 3D models and data visualizations is straightforward with Gemini 3 Pro [40][41].
- The design aims for a clean, minimal look with interactive elements for engagement [41].
- The article encourages readers to share their experiences and creations with the outdoor community [42][43].
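The workflow hands a downloaded GLB file to a web page for display. GLB is the binary container format of glTF 2.0. It starts with a 12-byte little-endian header (the magic bytes `glTF`, a version number, and the total file length), followed by a JSON chunk.

The sketch below is a minimal pre-upload sanity check. It assumes the display page expects glTF 2.0 binary. The helper names are hypothetical, and neither tripo3d.ai nor hyper3d.ai documents this code.

```python
import json
import struct

GLB_MAGIC = 0x46546C67   # bytes b"glTF" read as a little-endian uint32
CHUNK_JSON = 0x4E4F534A  # bytes b"JSON" read as a little-endian uint32


def read_glb_json(data: bytes) -> dict:
    """Validate a GLB header and return the parsed JSON chunk."""
    if len(data) < 20:
        raise ValueError("too short to be a GLB file")
    magic, version, length = struct.unpack_from("<III", data, 0)
    if magic != GLB_MAGIC:
        raise ValueError("missing glTF magic")
    if version != 2:
        raise ValueError(f"unsupported GLB version {version}")
    if length != len(data):
        raise ValueError("header length does not match file size")
    chunk_len, chunk_type = struct.unpack_from("<II", data, 12)
    if chunk_type != CHUNK_JSON:
        raise ValueError("first chunk must be JSON")
    return json.loads(data[20:20 + chunk_len].decode("utf-8"))


def make_glb(doc: dict) -> bytes:
    """Build a minimal JSON-only GLB, useful for testing the reader."""
    body = json.dumps(doc).encode("utf-8")
    body += b" " * (-len(body) % 4)  # JSON chunk is space-padded to 4-byte alignment
    header = struct.pack("<III", GLB_MAGIC, 2, 12 + 8 + len(body))
    return header + struct.pack("<II", len(body), CHUNK_JSON) + body


glb = make_glb({"asset": {"version": "2.0"}})
print(read_glb_json(glb)["asset"]["version"])  # prints 2.0
```

A real model file would also carry a BIN chunk with geometry and textures. This check only confirms that the container is well formed before upload.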
From game factory to spatial-intelligence simulation: why Hunyuan 3D is Tencent AI's "flank breakout"
AI Frontline· 2025-11-27 04:02
Core Insights
- Tencent's Hunyuan 3D has stepped up its global push. It launched an international version of its creative engine, and its open-source models have passed 3 million downloads, a significant step in Tencent's AI strategy [2][3][21].
- Tencent's edge comes from combining massive 3D demand across many sectors, the mature multimodal capabilities of its Hunyuan models, and broad distribution through WeChat, QQ, and Tencent Cloud [3][4].
Group 1: Business and Technology Integration
- The traditional 3D industry suffers from high costs and long production cycles. Art often accounts for 50%-80% of total game development costs, and 3D asset creation is the most resource-intensive part [6][7].
- Hunyuan 3D addresses this along two technical lines: making 3D asset production more efficient and solving scene-level construction [8][9].
- In Tencent's internal game projects, Hunyuan 3D has cut the time needed to create 3D assets from days to hours [12][14].
Group 2: Market Applications and Expansion
- Beyond gaming, more than 150 companies in e-commerce, film, advertising, 3D printing, and other industries use its models to improve production efficiency [25][27].
- In consumer 3D printing, the technology lets users generate personalized models with little expertise, expanding the market [26].
- In advertising and content creation, Hunyuan 3D could change how brands engage consumers, moving from static displays to interactive experiences [27][29].
Group 3: Strategic Vision and Competitive Edge
- Tencent's AI strategy focuses on building ecosystem barriers rather than on sheer scale, treating quality, controllability, and cost-effectiveness as foundational capabilities [31][32].
- Its Hunyuan image model topped global rankings, indicating leadership in multimodal technology [31].
- Tencent's approach to 3D generation centers on understanding industry pain points and fostering an ecosystem that supports sustainable growth [39][40].
From image to simulation! This AI makes 3D assets "ready out of the box" for direct use in robot training
QbitAI· 2025-11-23 04:09
Core Insights
- The article introduces PhysX-Anything, the first framework to generate 3D assets with physical properties directly from a single image, aimed at embodied AI and robotics [5][27][28].
Group 1: Framework Overview
- PhysX-Anything generates high-quality, sim-ready 3D assets with explicit geometry, joint articulation, and physical parameters, addressing limitations of existing 3D generation methods [5][6].
- It works coarse-to-fine: across several dialogue rounds it produces first a global physical description and then detailed geometry, all from a single image [8][14].
Group 2: Technical Innovations
- A new 3D representation inspired by voxels achieves a 193x compression ratio while preserving geometric structure [9][27].
- A tree-structured, VLM-friendly format carries richer physical attributes and text descriptions, helping the VLM understand and reason about the object [12].
Group 3: Performance Evaluation
- PhysX-Anything outperforms existing methods such as URDFormer and PhysXGen on both geometric and physical-attribute metrics, and generalizes better [18][20].
- In human evaluations, its generated structures received the highest scores for both geometric and physical attributes [22].
Group 4: Practical Applications
- The generated sim-ready assets can be imported directly into simulators for a range of robot policy-learning tasks, demonstrating their practical value for embodied AI [25][26].
- The framework is expected to drive a paradigm shift from "visual modeling" to "physical modeling" in 3D vision and robotics research [28].
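The summary does not describe PhysX-Anything's output schema. The sketch below therefore uses a hypothetical part-and-joint dictionary as a stand-in for its tree-structured description, and converts it into a minimal URDF string.

URDF is chosen here only because the comparison baseline, URDFormer, targets it and many simulators load it. The mesh paths, effort and velocity limits, and diagonal inertia values are placeholder assumptions.

```python
import xml.etree.ElementTree as ET


def parts_to_urdf(robot_name, parts, joints):
    """Emit a minimal URDF string from a hypothetical part/joint description."""
    robot = ET.Element("robot", name=robot_name)
    for part in parts:
        link = ET.SubElement(robot, "link", name=part["name"])
        visual = ET.SubElement(link, "visual")
        geometry = ET.SubElement(visual, "geometry")
        ET.SubElement(geometry, "mesh", filename=part["mesh"])
        inertial = ET.SubElement(link, "inertial")
        ET.SubElement(inertial, "mass", value=str(part["mass"]))
        # Placeholder inertia; a real pipeline would derive it from geometry and mass.
        ET.SubElement(inertial, "inertia", ixx="1e-3", ixy="0", ixz="0",
                      iyy="1e-3", iyz="0", izz="1e-3")
    for j in joints:
        joint = ET.SubElement(robot, "joint", name=j["name"], type=j["type"])
        ET.SubElement(joint, "parent", link=j["parent"])
        ET.SubElement(joint, "child", link=j["child"])
        ET.SubElement(joint, "axis", xyz=" ".join(str(v) for v in j["axis"]))
        if j["type"] in ("revolute", "prismatic"):
            lower, upper = j["limits"]
            ET.SubElement(joint, "limit", lower=str(lower), upper=str(upper),
                          effort="10", velocity="1")  # placeholder effort/velocity
    return ET.tostring(robot, encoding="unicode")


# Hypothetical cabinet with one hinged door (limits in radians).
parts = [
    {"name": "body", "mesh": "body.obj", "mass": 5.0},
    {"name": "door", "mesh": "door.obj", "mass": 1.0},
]
joints = [
    {"name": "door_hinge", "type": "revolute", "parent": "body", "child": "door",
     "axis": (0, 0, 1), "limits": (0.0, 1.57)},
]
print(parts_to_urdf("cabinet", parts, joints))
```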
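A post-95s team building 3D foundation models lands a major partnership with a leading game company and is defining new rules for 3D generation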
Founder Park· 2025-11-18 11:06
Core Insights
- The article highlights Yingmou Technology's advances in 3D generation, particularly its Rodin model and the latest version, Rodin Gen-2, which substantially improves generation quality and controllability [2][6][9].
Group 1: Company Achievements
- Rodin was showcased at GDC, where it drew the attention of top game developers and led to 3D generation being used in mobile game production [2].
- The company recently closed a multi-million-dollar funding round led by BlueRun Ventures, with participation from ByteDance and Sequoia China, making it a leading startup in 3D foundation models [2].
- Its research paper "CLAY" was nominated for best paper at SIGGRAPH, a milestone for a young team that has focused on 3D research since its founding [2][3].
Group 2: Technological Innovations
- Rodin Gen-2 scales up to million-scale training data and billions of parameters. The result is a qualitative leap in generation quality, including smoother geometric surfaces and lower post-processing costs [6][9].
- The new "Bang to Parts" feature lets users break generated models into components, improving controllability and streamlining workflows across applications [9][12].
- Because the generated meshes are clean, they need far less repair in tools such as Blender and Unity, bringing the output closer to production-ready [8].
Group 3: Industry Trends
- Major companies are investing in 3D generation: Roblox open-sourced CUBE 3D and ByteDance released Seed3D 1.0 [6].
- Demand for fast, accurate 3D model generation is driving innovation, and Yingmou's technology generates models in under 10 seconds to serve varied industry needs [24].
- The team expects 3D generation to become a foundational technology for sectors including digital content creation, industrial design, and AR/VR interaction [29].
Smart Morning Briefing | ByteDance launches a 3D generation foundation model; US judges admit AI use led to errors in court rulings
Guan Cha Zhe Wang· 2025-10-24 02:00
Group 1
- ByteDance's Seed team launched Seed3D 1.0, a 3D generation model built on a Diffusion Transformer architecture that produces high-quality, simulation-grade 3D models from a single image [1]
- Kuaishou's StreamLake released a suite of AI coding products, including the development tool CodeFlicker and the in-house KAT-Coder models. KAT-Coder-Pro V1 reached a 73.4% solve rate on SWE-bench Verified, surpassing GPT-5 and Claude Sonnet 4 [2]
- Apple is reportedly considering acquiring Warner Bros. to expand its Apple TV streaming lineup, with Amazon, Paramount, and other major players also interested in bidding [3]
Group 2
- Two US federal judges acknowledged that court rulings contained errors because AI was used to draft them and the drafts skipped the usual review, and said they are tightening their review procedures [4]
- As chip supply problems worsened, semiconductor supplier Nexperia cut or suspended deliveries, alarming the German auto industry, and Volkswagen was forced to halt production at its Wolfsburg plant [5]
- Nexperia's largest packaging and testing facility is in Dongguan, China, and handles about 70% of its global packaging work, which makes it critical to the automotive supply chain [5]
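For scale, SWE-bench Verified is a 500-task, human-validated subset of SWE-bench. The reported solve rate therefore corresponds to roughly this many resolved tasks:

```latex
0.734 \times 500 = 367\ \text{tasks}
```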
10.23 Rhino Finance Evening Report: one-day sell-out equity funds return; JD subsidiary obtains a Hong Kong insurance brokerage license
Xi Niu Cai Jing· 2025-10-23 10:25
Group 1: Equity Fund Market
- Equity fund issuance has again seen funds sell out in a single day, with 16 equity funds doing so since September [1]
- The newly issued Huatai Bairui Yingtai Stable 3-Month Holding Mixed FOF raised more than 5 billion yuan in one day [1]
- The rise in active fund issuance points to a marked increase in investors' risk appetite [1]
Group 2: Banking and Wealth Management Products
- As of the end of Q3 2025, the bank wealth management market totaled 32.13 trillion yuan, up 9.42% year on year [1]
- There are 43,900 wealth management products outstanding, up 10.01% year on year [1]
- Products issued by wealth management companies account for 91.13% of the market [1]
Group 3: Corporate Developments
- JD's subsidiary Jingda HK Trading Co., Limited has obtained a Hong Kong insurance brokerage license, valid until October 2028 [1]
- ByteDance's Seed team launched the 3D generation model Seed3D 1.0, which creates high-quality 3D models from a single image [2]
- Nexperia (China) assured customers that all products made in China comply with local laws and regulations [2]
Group 4: Regulatory Actions
- The Beijing Securities Regulatory Bureau ordered Beijing Sunshine Tianhong Asset Management Co., Ltd. to take corrective measures for violating information-disclosure rules [3]
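The year-on-year growth rates imply the following prior-year bases. This is arithmetic on the reported figures, not numbers stated in the report.

```latex
\frac{32.13\ \text{trillion yuan}}{1 + 0.0942} \approx 29.36\ \text{trillion yuan},
\qquad
\frac{43{,}900\ \text{products}}{1 + 0.1001} \approx 39{,}900\ \text{products}
```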