3D Generation
First Confirmation That RL Can Teach 3D Models to Reason: Generation Quality Jumps Under Complex Text Descriptions
36Kr· 2026-02-27 02:33
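RL has already posted an impressive record in image generation, so what about 3D generation? As GRPO brings a qualitative leap to large models in math and code reasoning, a research team has been the first to give an answer: the first study to systematically introduce reinforcement learning into text-to-3D autoregressive generation has arrived, and it has been accepted to CVPR 2026. The study does more than port 2D experience; it targets the unique challenges of 3D generation with a complete, systematic exploration spanning reward design, algorithm selection, evaluation benchmarks, and training paradigms.
The core tension is that 3D objects have no "canonical viewpoint." Whether an image is right can be judged at a glance, but a 3D object must be evaluated from multiple viewpoints at once for geometric consistency, texture quality, and semantic alignment, and if any one of these dimensions is poorly designed, training collapses.
A deeper problem is that during autoregressive decoding, every token a 3D generation model emits carries an implicit commitment to the overall structure. This long-range dependency makes reward sparsity more pronounced in 3D than in 2D: the model can hardly sense midway where things went wrong.
The research team broke the problem into four dimensions for systematic study:
Reward model design: which kinds of reward signals are most effective for 3D generation?
RL algorithm choice: which GRPO variants suit the sequential nature of 3D?
Why is 3D so much harder than 2D? RL has worked time and again in text and image generation, but moving it directly to 3D does not work.
The most unexpected finding: for evaluating 3D consistency, a general-purpose large model (Qwen2.5-VL), compared with specialized ...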
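To make the GRPO and multi-view reward discussion above more concrete, here is a minimal sketch of the group-relative advantage step that GRPO-style methods commonly use: several outputs are sampled for one prompt, and each reward is normalized against its own group. The summary does not give the study's actual reward formula, so the per-view averaging in `aggregate_multiview_reward`, the function names, and the sample scores are all illustrative assumptions.

```python
from statistics import mean, pstdev


def aggregate_multiview_reward(view_scores):
    """Hypothetical aggregation: average the per-view scores of one 3D sample.

    The article only says a 3D object must be judged from multiple views;
    plain averaging is an assumption made for this sketch.
    """
    return mean(view_scores)


def group_relative_advantages(rewards, eps=1e-8):
    """Normalize each reward against the mean and std of its sampling group."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# Four sampled 3D outputs for one prompt, each scored from three viewpoints
# (made-up numbers, for illustration only).
group = [
    [0.9, 0.8, 0.7],
    [0.4, 0.5, 0.3],
    [0.6, 0.6, 0.6],
    [0.2, 0.1, 0.3],
]
rewards = [aggregate_multiview_reward(scores) for scores in group]
advantages = group_relative_advantages(rewards)

for reward, advantage in zip(rewards, advantages):
    print(f"reward={reward:.2f} advantage={advantage:+.2f}")
```

Samples that score above their group's mean receive a positive advantage and the rest a negative one, so no separate value network is needed to provide a baseline.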
The "ImageNet" of 3D Generation Is Here! Tencent Hunyuan Open-Sources HY3D-Bench
QbitAI· 2026-02-06 10:10
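Contributed by the Tencent Hunyuan team to QbitAI (WeChat official account: QbitAI).
3D generation has now reached a level of usability that impresses at first glance. But three pain points still trouble researchers in the field: uneven data quality, missing evaluation standards, and insufficient coverage of long-tail categories.
Through an automated data-cleaning pipeline, this work filtered and processed 252,000 high-quality 3D assets from large raw repositories such as Objaverse. It provides a "ready-to-use" dataset that includes watertight meshes and multi-view rendered images, along with 240,000 3D part-decomposition results, significantly lowering the barrier to training 3D generation models.
In addition, to compensate for the limited diversity of academic datasets, it introduces an AIGC-driven synthesis pipeline that uses LLMs to generate semantic descriptions and diffusion models to generate images, then converts them into high-fidelity 3D assets with the HY3D-3.0 engine. The result evenly covers 1,252 categories and balances the distribution gap between common and long-tail categories.
Experiments show that a lightweight model built on this benchmark (Hunyuan3D-2.1-Small) outperforms traditional methods in both generation quality and inference speed. The dataset provides a solid data foundation for downstream applications such as robot simulation and virtual reality.
Dataset composition: the availability of high-quality benchmark datasets has always been a core constraint on the development of 3D generation models. Early benchmark datasets such as ShapeNet laid the groundwork for 3D generation research, but they suffer from imbalanced category coverage, simple geometric structures, insufficient data volume ...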
The 3D Nano Banana Is Here! AI Model Editing Becomes Reality as 3D Generation Enters the Editable Era
QbitAI· 2026-01-27 03:53
Core Viewpoint - The article highlights the emergence of 3D generation technology as a critical area in AI, with significant advancements led by the Chinese team Hyper3D, particularly through their product Rodin Gen-2 Edit, which integrates 3D generation and editing capabilities [1][3][27].
Group 1: 3D Generation and Editing Technology
- Hyper3D has launched Rodin Gen-2 Edit, the first commercial product that combines "3D generation" and "3D editing" into a complete workflow, marking the entry of 3D generation into the editable era [3][11].
- The editing functionality allows users to select specific areas of a model and input text commands for modifications, such as changing a robot's arms to cannons, demonstrating a user-friendly approach to 3D model editing [4][5][20].
- The platform supports importing any existing models, including third-party AI-generated models, for editing, establishing Hyper3D's editing capabilities as a foundational infrastructure rather than a standalone feature [9][11].
Group 2: Technological Advancements and User Experience
- Hyper3D Rodin showcases cutting-edge technology, enabling users to modify, add, or remove model components through natural language without affecting the overall structure, thus revolutionizing 3D modeling [13][21].
- The transition from "generation" to "editing" fills a crucial gap in the AI workflow, allowing for iterative design processes rather than random generation, which has been common in the past [14][19].
- The platform's capabilities are enhanced by the introduction of 3D ControlNet, which allows precise control over geometric structures during the generation phase, and the BANG technology, which facilitates recursive disassembly of complex models for localized editing [17][25].
Group 3: Market Position and Future Directions
- Hyper3D's advancements have been recognized by the market, with the team completing two rounds of funding from top-tier VC and strategic industry players in 2025, indicating strong investor confidence in their technology [27].
- The company aims to extend beyond single-object editing, with future developments targeting the creation of complete 3D scenes that include objects, relationships, and physical constraints, laying the groundwork for future "world models" and embodied intelligence infrastructure [26].
- The launch of Rodin Gen-2 Edit represents a significant step in making 3D generation not just feasible but practically usable, providing a valuable reference point for the industry [27].
"SenseTime Alumni" Have Produced a Crop of Unicorns, but Yan Junjie Cannot Be Replicated
36Kr· 2025-12-26 00:01
Core Viewpoint - The article highlights the emergence of AI companies founded by SenseTime alumni (the "SenseTime system," 商汤系), emphasizing their rapid growth and potential to become unicorns, with a particular focus on MiniMax and Vivix AI as key players in the AI landscape [4][10].
Group 1: Company Performance
- MiniMax has achieved significant revenue growth, reporting revenue of $53.44 million for the first nine months of 2025, which already exceeds its total revenue of $30.52 million for 2024 [7][9].
- The company is nearing breakeven on its consumer products, indicating strong commercial viability [7].
- Vivix AI, founded by a former SenseTime executive, reached a valuation of $1.32 billion within just ten months of its establishment [10].
Group 2: Market Position and Strategy
- The "SenseTime system" has produced several successful AI startups, with each major AI sector featuring companies founded by former SenseTime employees [10][11].
- MiniMax is recognized for its forward-looking strategies, having launched innovative AI applications and models, such as its MoE model, ahead of industry trends [20][21].
- The company has a diverse product matrix, which has helped it remain resilient during market fluctuations [21].
Group 3: Talent and Experience
- The success of the "SenseTime system" is attributed to the technical expertise and practical experience of its founders, many of whom have a strong background in AI technology and product development [12][18].
- The article notes that the unique combination of technical skills and project experience among these entrepreneurs has made them attractive to investors [15][18].
- The ability to replicate successful strategies and learn from past experiences is emphasized as a key factor in the growth of these companies [26].
Gemini 3 + Nano Banana Pro + 3D Generation + Gesture Control = ? Master Zang Shows You How to Showcase Your Workout Results in Style
Guizang's AI Toolbox· 2025-12-05 12:02
Core Viewpoint - The article discusses the creation of personalized 3D models and posters for outdoor activities such as hiking, skiing, cycling, and camping, utilizing the Nano Banana Pro tool to showcase achievements while maintaining privacy [4][6][8].
Group 1: Skiing
- The skiing poster design involves creating a visual representation of ski tracks on a snow-covered mountain, integrating user-uploaded images of ski equipment to enhance the visual appeal [10][11].
- The atmosphere is emphasized with strong reflections and a snowy forest backdrop, creating a dynamic and engaging scene [11][12].
- The final output includes a title, data from uploaded images, and a short phrase related to the skiing experience [13].
Group 2: Cycling
- The cycling poster design focuses on a 3D terrain model featuring a prominent local landmark, with a clear road path illustrating the cycling route [16][17].
- User-uploaded images of bicycles are incorporated into the design, ensuring accurate representation of colors and features [16].
- The visual style includes a shallow depth of field and morning light effects, enhancing the overall aesthetic [17][18].
Group 3: Hiking
- The hiking poster design highlights a local landmark with a winding path, integrating user-uploaded images of hiking gear to symbolize the hiking experience [21][22].
- The atmosphere is crafted with a dreamlike quality, featuring elements like mist and reflections on water surfaces [21].
- The final design includes a title, data from uploaded images, and specific geographic coordinates [23].
Group 4: Camping
- The camping poster design showcases a local landscape with a focus on the camping setup, using user-uploaded images of tents and camping gear [25][26].
- The scene is set in a night mode with warm lighting effects emanating from the tent, creating a cozy atmosphere [26][27].
- The final output includes a title, data on elevation, temperature, and camping duration, along with a poetic phrase about the camping experience [28].
Group 5: 3D Model Creation
- The article explains the process of converting images into 3D models using tools like tripo3d.ai or hyper3d.ai, emphasizing the simplicity of the operation [31][33].
- Users are instructed to download the generated models in GLB format for compatibility [33].
- The final step involves uploading the 3D model and associated data to a platform for interactive display, including gesture control features [36][38].
Group 6: Product Development
- The article outlines the straightforward process of building a webpage to showcase 3D models and data visualizations, highlighting the ease of use of the Gemini 3 Pro tool [40][41].
- The design aims for a clean, minimalistic aesthetic while incorporating interactive elements for user engagement [41].
- The article encourages sharing experiences and creations within the outdoor community [42][43].
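Since Group 5 has users download the generated models in GLB format before uploading them for display, a quick sanity check on the downloaded file can catch a wrong export early. The sketch below relies on the GLB container layout from the glTF 2.0 specification: a 12-byte little-endian header holding the magic `glTF`, a version number, and the total file length. The function name `parse_glb_header` and the synthetic sample bytes are illustrative assumptions and are not part of the article's workflow.

```python
import struct

# ASCII "glTF" interpreted as a little-endian unsigned 32-bit integer.
GLB_MAGIC = 0x46546C67


def parse_glb_header(data: bytes):
    """Return (version, total_length) from the 12-byte GLB header.

    Raises ValueError if the data is too short or the magic does not match.
    """
    if len(data) < 12:
        raise ValueError("too short to be a GLB file")
    magic, version, length = struct.unpack_from("<III", data, 0)
    if magic != GLB_MAGIC:
        raise ValueError("missing glTF magic; not a GLB file")
    return version, length


# Synthetic header-only sample (no chunks), used here instead of a real download.
sample = struct.pack("<III", GLB_MAGIC, 2, 12)
version, length = parse_glb_header(sample)
print(f"GLB container version {version}, declared length {length} bytes")
```

For a real file, passing the first 12 bytes read in binary mode is enough, and comparing the declared length with the size on disk is a cheap way to spot truncated downloads.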
From Game Factory to Spatial Intelligence Simulation: Why Hunyuan 3D Is Tencent AI's "Flanking Breakout"
AI Frontline· 2025-11-27 04:02
Core Insights
- Tencent's "Hunyuan 3D" has accelerated its global outreach by launching an international version of its creative engine and achieving over 3 million downloads of its open-source model, marking a significant step in its AI strategy [2][3][21].
- Tencent's unique position as a technology company lies in its combination of massive 3D demand from various sectors, mature multi-modal capabilities of its Hunyuan model, and a comprehensive distribution network through WeChat, QQ, and Tencent Cloud [3][4].
Group 1: Business and Technology Integration
- The traditional 3D industry faces challenges of high costs and long production times, with art costs in game development often accounting for 50%-80% of total expenses, and 3D asset creation being the most resource-intensive [6][7].
- Hunyuan 3D aims to address these issues by enhancing the efficiency of 3D asset production and solving scene-level construction problems through two main technical lines [8][9].
- The integration of Hunyuan 3D into Tencent's internal game projects has shown promising results, significantly reducing the time required to create 3D assets from days to mere hours [12][14].
Group 2: Market Applications and Expansion
- Hunyuan 3D's applications extend beyond gaming, with over 150 companies across various industries, including e-commerce, film, advertising, and 3D printing, utilizing its models to enhance production efficiency [25][27].
- The technology has enabled a shift in consumer 3D printing, allowing users to generate personalized models with minimal expertise, thus expanding the market [26].
- In advertising and content creation, Hunyuan 3D is poised to transform how brands engage with consumers by moving from static displays to interactive experiences [27][29].
Group 3: Strategic Vision and Competitive Edge
- Tencent's AI strategy focuses on building ecological barriers rather than merely scaling operations, emphasizing quality, controllability, and cost-effectiveness as foundational capabilities [31][32].
- The company has achieved recognition for its Hunyuan image model, which topped global rankings, indicating its leadership in multi-modal technology [31].
- Tencent's approach to 3D generation is characterized by a commitment to understanding industry pain points and fostering an ecosystem that supports sustainable growth [39][40].
From Image to Simulation! This AI Makes 3D Assets Work "Out of the Box," Directly Empowering Robot Training
QbitAI· 2025-11-23 04:09
Core Insights - The article introduces PhysX-Anything, the first framework for generating 3D assets with physical properties directly from a single image, aimed at enhancing embodied AI and robotics applications [5][27][28].
Group 1: Framework Overview
- PhysX-Anything allows for the generation of high-quality, sim-ready 3D assets that include explicit geometric structures, joint movements, and physical parameters, addressing the limitations of existing 3D generation methods [5][6].
- The framework employs a "coarse-to-fine" generation approach, utilizing multiple dialogue rounds to create both global physical descriptions and detailed geometric information from a single image [8][14].
Group 2: Technical Innovations
- A novel 3D representation method, inspired by voxel representations, achieves a 193x compression ratio while retaining geometric structure [9][27].
- The framework utilizes a tree-structured, VLM-friendly format to enhance the richness of physical attributes and textual descriptions, facilitating better understanding and reasoning by the VLM [12].
Group 3: Performance Evaluation
- PhysX-Anything outperforms existing methods like URDFormer and PhysXGen in both geometric and physical attribute metrics, demonstrating superior generalization capabilities [18][20].
- Human evaluations indicate that the generated structures from PhysX-Anything received the highest scores for both geometric and physical attributes, confirming its effectiveness [22].
Group 4: Practical Applications
- The generated sim-ready 3D assets can be directly imported into simulators for various robotic strategy learning tasks, showcasing their practical utility in embodied intelligence applications [25][26].
- The framework is expected to drive a paradigm shift from "visual modeling" to "physical modeling" in 3D vision and robotics research [28].
A Post-95 Team Building 3D Large Models Lands a Major Partnership with a Top Game and Is Defining the New Rules of 3D Generation
Founder Park· 2025-11-18 11:06
Core Insights - The article highlights the significant advancements made by Yingmou Technology in the field of 3D generation, particularly through its model Rodin and the latest iteration, Rodin Gen-2, which has achieved substantial improvements in generation quality and controllability [2][6][9].
Group 1: Company Achievements
- Yingmou Technology's Rodin model was showcased at GDC, capturing the attention of top game developers and leading to the successful application of 3D generation technology in mobile gaming [2].
- The company recently completed a multi-million dollar funding round led by BlueRun Ventures, with participation from ByteDance and Sequoia China, positioning it as a leading startup in the 3D large model sector [2].
- The research paper "CLAY" was nominated for a best paper award at SIGGRAPH, marking a significant milestone for the young team, which has focused on 3D research since its inception [2][3].
Group 2: Technological Innovations
- Rodin Gen-2 has been scaled up to a dataset numbering in the millions and a model with billions of parameters, resulting in a qualitative leap in generation quality, including smoother geometric surfaces and reduced post-processing costs [6][9].
- The introduction of the "Bang to Parts" feature allows users to decompose generated models into smaller components, enhancing the controllability of 3D models and streamlining workflows in various applications [9][12].
- The model's ability to generate clean and clear 3D meshes reduces the need for extensive repairs in software like Blender and Unity, making it more production-ready [8].
Group 3: Industry Trends
- Major companies are increasingly investing in 3D generation technologies, with Roblox open-sourcing CUBE 3D and ByteDance releasing Seed3D 1.0, indicating a growing trend in the industry [6].
- The demand for rapid and accurate 3D model generation is driving innovations, with Yingmou's technology generating models in under 10 seconds, catering to diverse industry needs [24].
- The team believes that 3D generation will play a crucial role in future applications, serving as a foundational technology for various sectors, including digital content creation, industrial design, and AR/VR interactions [29].
Intelligent Morning Briefing | ByteDance Launches 3D Generative Large Model; US Judges Admit AI Use Led to Errors in Court Rulings
Guan Cha Zhe Wang· 2025-10-24 02:00
Group 1
- ByteDance's Seed team launched a 3D generative model called Seed3D 1.0, which uses a Diffusion Transformer architecture to generate high-quality, simulation-grade 3D models from a single image [1].
- Kuaishou's StreamLake officially released an AI coding product matrix, including the intelligent development tool CodeFlicker and the self-developed KAT-Coder large models, with KAT-Coder-Pro V1 achieving a 73.4% resolution rate on SWE-bench Verified, surpassing GPT-5 and Claude Sonnet 4 [2].
- Apple is reportedly considering acquiring Warner Bros to expand its Apple TV streaming lineup, with other major players like Amazon and Paramount also interested in bidding [3].
Group 2
- Two federal judges in the U.S. acknowledged that court rulings were flawed because AI was used in drafting them and the drafts did not undergo the usual review process, prompting the judges to improve their review methods [4].
- Due to worsening chip supply issues, semiconductor supplier Nexperia (安世半导体) has reduced or suspended deliveries, causing concern in the German automotive industry, with Volkswagen forced to halt production at its Wolfsburg plant [5].
- Nexperia's largest packaging and testing facility is located in Dongguan, China, and handles about 70% of its global packaging work, highlighting the critical role of this facility in the automotive supply chain [5].
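Oct. 23 Rhino Finance Evening Report: Equity Fund Issuance Sees "One-Day Sellout Funds" Again; JD Subsidiary Obtains Hong Kong Insurance Brokerage License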
Xi Niu Cai Jing· 2025-10-23 10:25
Group 1: Equity Fund Market
- The equity fund issuance market has seen a resurgence of "one-day sold-out" funds, with 16 equity funds selling out in a single day since September [1].
- The recently issued Huatai Bairui Yingtai Stable 3-Month Holding Mixed FOF fund raised over 5 billion yuan in a single day [1].
- The increase in active fund issuance indicates a notable rise in investor risk appetite [1].
Group 2: Banking and Financial Products
- As of the end of Q3 2025, the total scale of the banking wealth management market reached 32.13 trillion yuan, a year-on-year increase of 9.42% [1].
- The number of existing wealth management products in the market is 43,900, a year-on-year increase of 10.01% [1].
- Wealth management products from financial companies account for 91.13% of the total market [1].
Group 3: Corporate Developments
- JD's subsidiary Jingda HK Trading Co., Limited has obtained a Hong Kong insurance brokerage license, valid until October 2028 [1].
- ByteDance's Seed team launched a 3D generative model, Seed3D 1.0, which can create high-quality 3D models from single images [2].
- Nexperia China (安世半导体) has assured clients that all products produced in China comply with local laws and regulations [2].
Group 4: Regulatory Actions
- The Beijing Securities Regulatory Bureau has ordered Beijing Sunshine Tianhong Asset Management Co., Ltd. to take corrective measures for non-compliance with information disclosure regulations [3].
Group 5: Financing and Investments
- New Stone Technology has completed over $500 million in Pre-IPO financing, with Tencent and other notable investors participating [7].
- Xinhua Securities has received approval from the China Securities Regulatory Commission to issue up to 10 billion yuan in technology innovation corporate bonds [7].
Group 6: Project Contracts and Investments
- Jinggong Steel Structure signed a contract for a project in Saudi Arabia worth 650 million Saudi riyals (approximately 1.23 billion yuan) [8].
- Chuanfa Longmang plans to invest 366 million yuan in a 100,000 tons/year lithium dihydrogen phosphate project [9].
Group 7: Financial Performance
- High-speed Rail Electric reported a 54.32% year-on-year increase in net profit for the first three quarters of 2025 [10].
- Huaguang Bio achieved a 146.55% year-on-year increase in net profit for the same period [11].
- Northern Navigation turned a profit with a net profit of 125 million yuan, compared to a loss in the previous year [13].