Tencent Rebuilds the Art Pipeline with AI: Hunyuan 3D Studio Architecture Revealed
量子位· 2025-09-22 11:16
Core Viewpoint
- Tencent has developed a professional-grade AI workstation called Hunyuan 3D Studio, designed for 3D designers, game developers, and modelers. It streamlines the entire design process from concept to final game asset, cutting production time from days to minutes [3][4].

Group 1: Key Features of Hunyuan 3D Studio
- The platform comprises seven core technology modules that together form a seamless, automated asset-creation workflow [6].
- The workflow covers component splitting, controllable image generation, high-fidelity geometry generation, low-poly topology generation, semantic UV unwrapping, texture generation and editing, and rigging and animation [9][10].

Group 2: Component Splitting
- The component-splitting module uses connectivity analysis and semantic segmentation to automatically decompose complex models into logically and functionally independent components that can be edited and animated independently [9][10].
- A feature extractor and segmentation heads predict masks for component boundaries, yielding high segmentation accuracy [15][18].

Group 3: Controllable Image Generation
- The controllable image generation module lets users generate 3D design images in mainstream game art styles from an input image plus style instructions [33][34].
- A dataset of paired images provides a precise mapping between realistic inputs and stylized outputs, improving the model's consistency and quality [34][41].

Group 4: High-Fidelity Geometry Generation
- Geometry generation builds on the Hunyuan 3D framework, which uses a variational autoencoder to compress and reconstruct 3D geometry [43][45].
- A diffusion model then efficiently generates high-quality samples from a single input image, keeping the generated geometry closely aligned with the input prompt [47][50].

Group 5: Low-Poly Topology Generation
- The low-poly topology module produces clean, art-compliant topology from high-fidelity geometry, using an autoregressive model to predict vertices and faces directly from point clouds [55][56].
- A tokenization scheme models the mesh as a sequence, improving training and inference efficiency [59][60].

Group 6: Texture Generation and Editing
- The texture generation framework extends 2D diffusion models to multi-view texture generation, tackling cross-view consistency and the transition from RGB textures to physically based rendering (PBR) materials [76][78].
- A text-guided texture-editing model, built on high-quality PBR material datasets, enables robust texture synthesis and editing [81][84].

Group 7: Rigging and Animation Effects
- The rigging and animation module has a humanoid-character animation branch and a general-character animation branch, using a template-based approach for accurate bone generation and skinning [97][100].
- Parameterized control allows high-level artistic adjustments throughout the pipeline, with incremental updates instead of full recomputation [104][105].
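The connectivity analysis behind component splitting (Group 2) can be illustrated with a standard union-find over mesh faces: faces that share a vertex end up in the same component. This is a minimal sketch of the general technique, not Tencent's implementation; it omits the semantic-segmentation refinement the summary also mentions, and the mesh representation (vertex-index triples) is an assumption.

```python
def split_components(faces):
    """Group mesh faces into connected components.

    faces: list of vertex-index triples, e.g. [(0, 1, 2), (2, 1, 3)].
    Faces sharing at least one vertex land in the same component,
    mirroring the connectivity-analysis step of component splitting.
    """
    parent = {}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[ra] = rb

    # Union all vertices within each face, then group faces by the
    # root of their first vertex.
    for f in faces:
        for v in f:
            parent.setdefault(v, v)
        for v in f[1:]:
            union(f[0], v)

    components = {}
    for f in faces:
        components.setdefault(find(f[0]), []).append(f)
    return list(components.values())
```

For example, two triangles sharing an edge plus one isolated triangle split into two components; a real pipeline would then hand each component to the downstream editing and animation stages.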
A First-of-Its-Kind Dual-NPU Architecture Makes a Splash: MediaTek Dimensity 9500 Doubles Down on Proactive AI
量子位· 2025-09-22 11:16
克雷西, reporting from Shenzhen | 量子位 QbitAI

A question: in today's smartphones, if AI is to become a resident capability with its own initiative that proactively gets things done, rather than a feature module that must be passively invoked again and again, what kind of chip architecture can truly keep up with that change?

MediaTek's answer: pair sharper compute with friendlier energy efficiency in a super-performance plus super-efficiency dual-NPU architecture, keeping AI "Always on."

This is a shift in both technical form and usage: the goal is for AI to stop depending on passive wake-ups and instead stay always online as a system capability, responding at any moment and woven into every user action.

The trend is becoming a consensus. As large models move on-device, edge AI is used ever more frequently: predictive completion in the keyboard, composition suggestions while shooting photos, lock-screen summaries, image generation. AI is going from "invoked once" to "available at all times."

For that, the SoC must not only run fast but also let AI run long and run steadily, even completing real-time responses without the user noticing.

The Dimensity 9500 rebuilds the chip foundation around this goal: the first dual-NPU architecture, combined with compute-in-memory, hardware compression, and other key technologies, keeping it at the top of ETH Zurich's mobile SoC AI leaderboard with double the score of the previous generation.

The Dimensity 9500 is making phone AI faster, smarter, and more attuned to your rhythm of use. Writing copy, organizing ideas, ...
A Big Twist in GPT-5's Coding Evaluation: A Failing Grade on the Surface, but 63.1% of Tasks Went Unsubmitted; Counting Everything, Its Score Doubles Claude's
量子位· 2025-09-22 08:08
Core Insights
- The article discusses the performance of leading AI models on the new software engineering benchmark SWE-BENCH PRO, revealing that none of the top models achieved a solution rate above 25% [1][23].

Group 1: Benchmark Overview
- SWE-BENCH PRO presents far more challenging tasks than its predecessor, SWE-Bench-Verified, on which average accuracy had reached 70% [5][6].
- The new benchmark aims to eliminate data-contamination risk by ensuring that models have not encountered the test content during training [9][12].
- SWE-BENCH PRO draws on a diverse codebase of 1865 commercial applications, B2B services, and developer tools, structured into public, commercial, and reserved subsets [12][18].

Group 2: Model Performance
- The top performers on the public set were GPT-5 and Claude Opus 4.1, with solution rates of 23.3% and 22.7%, respectively [25][26].
- On the commercial set, even the best models scored below 20%, indicating limited ability to solve real-world business problems [27][28].
- Performance varied significantly across programming languages, with Go and Python generally faring better than JavaScript and TypeScript [30].

Group 3: Failure Analysis
- The primary failure modes were semantic-understanding issues, syntax errors, and incorrect answers, pointing to difficulties in both problem comprehension and algorithmic correctness [34].
- GPT-5 exhibited a high unanswered rate of 63.1%: it performs well on the tasks it does attempt but struggles with the more complex problems [32].
- The analysis suggests that programming-language difficulty, the nature of the codebase, and the type of model are the key factors influencing performance [28][29].
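The headline's 63.1% unanswered rate reframes GPT-5's 23.3% public-set score: if the overall figure counts unanswered tasks as failures (an assumption on my part, consistent with how such leaderboards are usually scored), the solve rate over attempted tasks alone is much higher. A small sketch of that arithmetic:

```python
def solve_rate_on_attempted(overall_rate, unanswered_rate):
    """Solve rate counting only tasks where a patch was actually submitted.

    overall_rate: fraction solved out of ALL tasks, with unanswered
    tasks counted as failures (assumed scoring convention).
    unanswered_rate: fraction of tasks with no submission at all.
    """
    attempted = 1.0 - unanswered_rate
    if attempted <= 0:
        raise ValueError("all tasks unanswered")
    return overall_rate / attempted

# Figures reported for GPT-5 on the public set: 23.3% solved overall,
# 63.1% of tasks unanswered.
rate = solve_rate_on_attempted(0.233, 0.631)
```

Under that assumption, GPT-5 solves roughly 63% of the tasks it actually submits answers for, which is the sense in which "counting everything" flips the ranking in the headline.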
Altman Teases a New ChatGPT Product: Even Pro Subscribers Will Pay Extra, with Compute Spending No Object This Time
量子位· 2025-09-22 05:54
闻乐, reporting from Aofei Temple | 量子位 QbitAI

Altman really is a pay-to-win player when it comes to compute.

OpenAI has already spent $16 billion (about 113.8 billion RMB) renting computing resources.

That works out to the equivalent of tens of millions going out the door on server rentals every single day.

And that is not even the most striking part. According to The Information, OpenAI plans to spend roughly an additional $100 billion over the next five years renting backup servers from cloud providers.

On top of the planned $350 billion compute outlay, all that money rents servers that are merely "backup"...

Still, the point of the move is to keep OpenAI from dropping the ball when AI compute demand surges.

Altman teases that the new products of the coming weeks are compute-intensive

OpenAI CFO Sarah Friar revealed at a recent Goldman Sachs conference that compute shortages have repeatedly forced the company to delay new features and new AI models, and even to deliberately slow down certain products.

Facing these compute headaches, OpenAI is paying dearly.

But this year's $16 billion is only the tip of the iceberg.

Over the longer term, OpenAI plans to pour $350 billion into server rentals between 2024 and 2030; in 2030 alone, server rental spending is projected to reach $100 billion.

Just days ago, OpenAI also signed a five-year deal with Oracle worth 3...
Musk's New Model Maxes Out Price-Performance: Gemini 2.5-Level Performance at One-Tenth the Price, with 2M Context
量子位· 2025-09-21 13:29
Core Viewpoint
- The article discusses the launch of Grok 4 Fast by Elon Musk's xAI, highlighting its aggressive pricing and its capabilities in multimodal reasoning and long-context handling [1][3].

Group 1: Product Features and Performance
- Grok 4 Fast sets a price-performance benchmark, delivering Gemini 2.5-level performance at a fraction of the price while offering a 2-million-token context window [1][3].
- It significantly reduces token costs, using on average 40% fewer tokens than Grok 4 while maintaining similar performance levels [11][12].
- In benchmark tests, Grok 4 Fast outperformed Grok 3 Mini and ranked 8th in text arena competitions, the best showing among similarly sized models [17][18].

Group 2: Competitive Advantage
- Independent assessments verify that Grok 4 Fast leads the industry in "price-to-intelligence" ratio [14].
- It scored 1163 points in the search arena, 17 points ahead of the second-place model [18].

Group 3: Technological Innovations
- The model employs end-to-end reinforcement learning to enhance its tool use, excelling at deciding when to invoke tools such as code execution or web browsing [20].
- Grok 4 Fast integrates advanced search capabilities, browsing the web seamlessly and enriching queries with real-time data [21][22].
- A unified architecture reduces end-to-end latency and token costs, making it suitable for real-time applications [25].

Group 4: Market Position and Future Developments
- Grok 4 Fast is now available to all users; in Auto mode, complex queries automatically utilize its capabilities [26].
- Two new models are set to launch, with specific pricing for input and output tokens detailed [27].
- The hiring of Dustin Tran from Google, a key figure in the development of the Gemini models, strengthens the team behind Grok 4 Fast [28][30].
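The reported 40% token reduction translates directly into per-task cost at any fixed per-token price. A toy calculation, with the task size and the unit price purely illustrative (the article gives only the 40% figure):

```python
def task_cost(price_per_mtok, tokens):
    """Cost of one task in dollars, given a price per million tokens."""
    return price_per_mtok * tokens / 1_000_000

# Illustrative only: assume a task that costs Grok 4 100k tokens.
# Grok 4 Fast is reported to use ~40% fewer tokens at similar quality.
grok4_tokens = 100_000
fast_tokens = int(grok4_tokens * (1 - 0.40))

# At an identical per-token price, the same task costs 60% as much;
# any additional per-token price cut would compound with this saving.
cost_ratio = task_cost(1.0, fast_tokens) / task_cost(1.0, grok4_tokens)
```

This is why token efficiency and per-token pricing multiply in the "price-to-intelligence" comparisons the article cites.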
OpenAI's Mysterious Heavy Hitter, Known as "Bob"
量子位· 2025-09-21 13:29
Core Viewpoint
- The article discusses the significance of a mysterious individual known as "Bob" at OpenAI, who is responsible for a crucial CUDA kernel essential for high-performance AI training and inference. Bob's unique skills make him a highly sought-after talent in the tech industry, particularly in Silicon Valley, where competition for such expertise is intense [1][2][6][14].

Group 1: Bob's Role and Skills
- Bob is recognized for his exceptional ability to write high-performance CUDA kernels, which are executed on tens of thousands of GPUs daily, potentially trillions of times [3][4].
- The reliance on Bob is so significant that former employees express admiration for his capabilities, with one noting that he can resolve in minutes issues that others struggle with for a week [7][8].
- Internally, OpenAI has a "Bob magic" emoji on Slack, symbolizing the reverence for his skills [9].

Group 2: Industry Implications
- The article hints at Meta's interest in Bob, with rumors suggesting that Mark Zuckerberg is eager to learn more about him, indicating the competitive landscape for top talent in AI [10][12].
- CUDA kernels are considered core assets at AI companies, making individuals like Bob highly valuable and closely guarded [14].
- Scott Gray, a senior technical member at OpenAI with an extensive background in GPU kernel optimization and significant contributions to machine learning research, is floated as a potential candidate for being "Bob" [15][17][22].

Group 3: Talent Competition in Silicon Valley
- The competition for AI talent in Silicon Valley is described as fierce, with companies vying for skilled individuals who can contribute to foundational technologies like CUDA kernels [26][28].
- OpenAI has already lost several key researchers to Meta, highlighting the ongoing talent war in the industry [29].
The Future of AI Podcasts Is Becoming Everyone's Audio Assistant; Factuality, Completeness, and a Human Feel All Matter | A Conversation with ListenHub
量子位· 2025-09-21 08:01
Core Insights
- The article discusses the emergence of AI podcast tools, particularly ListenHub, which aims to transform various content formats into audio podcasts, highlighting its potential as a personal audio assistant for users [3][6].
- It raises questions about the sustainability of AI podcasts as a new interactive medium and how products can differentiate themselves in a crowded market [5][6].

Group 1: Product Features and Differentiation
- ListenHub is positioned as an "AI mouthpiece for creators," focusing on transforming text and links into engaging audio content, with features like FlowSpeech for converting written language into natural speech [9][10][15].
- The product includes a three-layer agent system: one agent for information gathering, another for content organization, and a third for converting materials into spoken word, enhancing user experience [16][18].
- Unique selling points include the ability to edit content, customize voice tones, and support both single- and dual-host podcasts, setting it apart from competitors [32][39].

Group 2: User Engagement and Feedback
- The company emphasizes the importance of early user feedback, particularly from its first 100 paid users, to refine product features and ensure they meet real user needs [33][34].
- ListenHub's user base primarily consists of self-media practitioners who utilize the tool for content creation, indicating a strong market demand for efficient audio production tools [29][30].

Group 3: Market Positioning and Future Outlook
- ListenHub aims to become the go-to audio assistant for users, expanding its capabilities beyond podcasts to various audio content formats, such as audiobooks and educational materials [100][102].
- The company recognizes the challenge of competing with larger firms but believes that its specialized features and user-centric approach will create a high switching cost for users [80][81].

Group 4: Development Strategy and Product Launch
- The company adopted a strategy of launching a minimum viable product (MVP) to gather user insights and iterate on features based on real-world usage [33][36].
- ListenHub's initial focus was on core functionality, ensuring that the primary user experience was compelling before adding additional features [75][76].

Group 5: AI Integration and Future Trends
- The integration of AI in product development is highlighted as a key factor in enhancing efficiency and creativity within the team, with a focus on making every team member a product manager [49][50].
- The future of AI in content creation is seen as leaning toward agent-based systems, where users interact with AI to generate and refine content seamlessly [59][60].
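The three-layer agent system described for ListenHub (gathering, organizing, vocalizing) is essentially a staged hand-off pipeline. A minimal sketch under that reading, with every function name and stage implementation hypothetical, for illustration only:

```python
def run_podcast_pipeline(sources, gather, organize, vocalize):
    """Run the three agent layers in order and return the final script.

    gather:   layer 1, collects information from the input sources
    organize: layer 2, structures the gathered notes into an outline
    vocalize: layer 3, converts written language into a spoken-word script
    """
    research_notes = gather(sources)
    outline = organize(research_notes)
    return vocalize(outline)

# Toy stand-ins for the three agents, for demonstration only; a real
# system would back each stage with a model or retrieval component.
gather = lambda srcs: " ".join(srcs)
organize = lambda notes: f"Outline: {notes}"
vocalize = lambda outline: f"[spoken] {outline}"

script = run_podcast_pipeline(["link1", "link2"], gather, organize, vocalize)
```

The design point of such staged pipelines is that each layer can be swapped or tuned independently, which matches the article's emphasis on iterating individual features against user feedback.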
Jensen Huang Spends Another $900 Million on AI Infra, This Time Taking the CEO and Core Technology Outright
量子位· 2025-09-21 06:36
不圆, reporting from Aofei Temple | 量子位 QbitAI

Fresh from taking a stake in "old rival" Intel, Jensen Huang has splashed out another $900 million to take... the CEO and a technology license from an AI infra company.

The latest news: the core team and technology license of AI infra startup Enfabrica have been packaged up and carried off by NVIDIA.

That's right, another "acqui-hire" that buys not the company itself but hollows out what it is built on.

Founded in 2019, Enfabrica focuses on solving I/O, memory, and networking bottlenecks, and was valued at $600 million at the end of last year.

With this deal closed, it marks NVIDIA's fourth move on an AI startup this year.

Such a large outlay immediately set off heated discussion. Some netizens see NVIDIA playing the long game, seriously intent on keeping its dominant position in AI hardware.

Here are the details.

$900 million walks away with the technical core

The deal, worth about 6.4 billion RMB, is reported to have closed last week.

Enfabrica CEO Rochan Sankar has now joined NVIDIA, and the core team and the company's technology went along to the new employer.

Founded in 2019, Enfabrica is a Silicon Valley AI infrastructure startup focused on solving I/O, memory, and networking bottlenecks. Its technology aims to let massive GPU clusters run as a single computer: the company claims it can interconnect more than 100,000 GPUs and can take data-center GP...
HarmonyOS on the Offensive: The "Tiangong Plan" Adds a Billion-Yuan Boost to Build a New All-Scenario AI Ecosystem
量子位· 2025-09-21 06:36
Core Viewpoint
- Huawei's HarmonyOS 5 showcases significant advancements in AI capabilities, aiming to create a seamless, multi-device ecosystem that integrates AI into everyday tasks and interactions [1][3][6].

Group 1: AI Capabilities and Features
- HarmonyOS 5 introduces the AI assistant "Xiao Yi," which can handle various tasks such as travel planning and music playback across multiple devices [2][4].
- The system integrates AI natively, allowing for a unified experience across smartphones, tablets, PCs, and other devices, unlike existing fragmented systems [3][6].
- As of now, over 17 million HarmonyOS 5 devices have been deployed, with more than 30,000 applications and services available, indicating rapid ecosystem growth [5][6].

Group 2: Xiao Yi's Functionality
- Xiao Yi evolves from a simple voice assistant into a comprehensive AI agent capable of managing complex tasks and workflows, effectively acting as a project manager and personal assistant [10][15].
- The assistant can autonomously plan trips, manage schedules, and even organize events based on user preferences, streamlining previously cumbersome processes [11][14].
- Xiao Yi's emotional awareness allows it to respond to users' moods and provide contextually appropriate interactions, enhancing user experience [18][20].

Group 3: System Integration and Development
- The "Xiao Yi Brain" feature enables seamless task management across devices, ensuring that AI capabilities are embedded within the entire system rather than being limited to individual devices [22][26].
- Huawei has launched the "Tiangong Plan," committing 1 billion yuan to support AI ecosystem innovation, aiming to lower barriers for developers and enhance AI capabilities [27][28].
- The Harmony Intelligence platform offers various development modes and components, facilitating the creation of AI agents without starting from scratch [30].

Group 4: Future Directions and User Engagement
- Huawei's vision for AI includes transforming AI agents into decision-making partners, driving industry revolutions and enhancing human-computer interaction [32][34].
- The company emphasizes the importance of user feedback, having received over 10 million responses with a high rate of issue resolution, to continuously improve the system [38][39].
- The overarching goal is to redefine operating systems from merely supporting AI to being driven by AI, fostering a collaborative ecosystem for developers and users alike [40].
A World Model with No Training Required? Westlake University's WorldForge Opens a New Path to Spatial Intelligence, Letting AI Read the 3D World
量子位· 2025-09-21 06:36
Core Viewpoint
- The article discusses the advancements in AI-generated video content, highlighting the controllability challenges of video generation models and introducing WorldForge as a solution that brings precision to video creation without altering the model's weights [1][2].

Group 1: Challenges in Video Generation
- AI-generated videos have gained significant attention for their realistic visuals, but the lack of precise control over generated content remains a major limitation [1].
- Current models often require extensive retraining to improve controllability, which is costly in time and computational resources and can degrade the model's generalization ability [1].

Group 2: Introduction of WorldForge
- WorldForge takes an innovative approach, guiding existing video generation models during the inference phase to achieve precise control without modifying model weights [2][14].
- The framework consists of three collaborative modules designed to enhance the generation process [4].

Group 3: Key Modules of WorldForge
- Intra-step Recursive Refinement (IRR): sets boundaries for the model's "imagination" with a predict-correct micro-loop, applying a timely correction after each prediction to ensure adherence to a predefined trajectory [4][5].
- Flow-Gated Latent Fusion (FLF): separates appearance and motion features, injecting motion signals only into the relevant channels to control the perspective while maintaining the quality of the generated content [6][7].
- Dual-Path Self-Correcting Guidance (DSG): addresses imperfections in the injected guidance signals by running two parallel denoising paths, ensuring high-quality output while adhering to trajectory constraints [7].

Group 4: Applications of WorldForge
- WorldForge demonstrates remarkable capabilities, such as reconstructing 3D static scenes from a single image and generating 360° surround videos, indicating its potential for efficient world-model exploration [9][8].
- The system allows users to design new camera trajectories for existing videos, executing complex movements and intelligently filling in newly exposed areas, outperforming traditional models that require extensive training [11].
- WorldForge also supports video content editing, including subject replacement and object manipulation, enabling creative modifications [12].

Group 5: Future Implications
- WorldForge introduces a novel interaction and control approach in video generation, paving the way for controllable world models without increasing training costs or losing prior knowledge [14].
- Potential future advancements include more natural interactions through language or gestures, allowing models to better understand and execute creative visions [14].
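The IRR module's predict-correct micro-loop can be sketched schematically: after each denoising step, the state is blended back toward a reference derived from the target trajectory. This is only an illustration of the idea, with toy stand-ins for the sampler and the trajectory projection and a made-up blend weight; it is not the paper's actual algorithm.

```python
import numpy as np

def denoise_with_irr(x, steps, denoise_step, project_to_trajectory,
                     strength=0.5):
    """Schematic predict-correct loop in the spirit of IRR.

    denoise_step(x, t): one step of a pretrained sampler (the "predict").
    project_to_trajectory(x, t): a reference derived from the target
    camera trajectory (the "correct").
    strength: how hard each correction pulls toward the reference.
    All three are hypothetical stand-ins for model components.
    """
    for t in reversed(range(steps)):
        x = denoise_step(x, t)                      # predict
        ref = project_to_trajectory(x, t)           # trajectory reference
        x = (1.0 - strength) * x + strength * ref   # correct toward it
    return x

# Toy demo: the "sampler" halves the state, the trajectory reference is
# zero, so each predict-correct step shrinks the state by a factor 0.25.
result = denoise_with_irr(
    np.array([4.0]), steps=2,
    denoise_step=lambda x, t: 0.5 * x,
    project_to_trajectory=lambda x, t: np.zeros_like(x),
)
```

The key property the sketch captures is that the correction happens inside each sampling step, so the pretrained model's weights stay untouched while the output is steered toward the trajectory.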