量子位
An AI Highway for China: Huawei Offers an Open-Source, Open-Access Plan
量子位· 2025-09-23 11:01
Core Viewpoint
- Huawei is leading the development of an open and shared AI computing ecosystem through its innovative supernode architecture, which aims to create an "AI highway" that benefits various industries and players of all sizes [1][2][26].

Group 1: Supernode Technology
- Huawei unveiled the supernode architecture at the Huawei Connect conference, introducing a range of supernode products that cover all scenarios from data centers to workstations [3].
- The Atlas 950 SuperPoD is designed for large-scale AI computing tasks, featuring innovations in system-level design, including zero-cable interconnect and enhanced cooling reliability [4].
- Compared to NVIDIA's upcoming products, the Atlas 950 supernode leads by wide margins: 56.8x in scale, 6.7x in total computing power, 15x in memory capacity (1152 TB), and 62x in interconnect bandwidth (16.3 PB/s) [5].

Group 2: Open Source and Collaboration
- Huawei is fully opening its supernode technology to the industry, allowing for shared technological benefits and collaborative innovation [16].
- The company is also opening its hardware components, including NPU modules and AI cards, to facilitate incremental development by customers and partners [18].
- On the software side, Huawei is open-sourcing its operating system components, enabling users to integrate and maintain versions according to their needs [20].

Group 3: Industry Impact and Ecosystem
- The supernode technology is designed to serve various industries, including internet, finance, telecommunications, and manufacturing, enhancing computing efficiency and business capabilities [29].
- The UnifiedBus protocol enables high-bandwidth, low-latency interconnection among computing and storage units, addressing traditional cluster reliability issues [33].
- Huawei's approach fosters an open ecosystem in which different hardware manufacturers and software developers can collaborate, breaking down barriers in the AI computing landscape [42].

Group 4: Future Prospects
- The Atlas 950 SuperCluster is set to be 2.5 times larger and 1.3 times more powerful than the current largest global cluster, xAI's Colossus, positioning Huawei as a leader in computing power [48].
- By promoting an open and collaborative AI computing environment, Huawei aims to establish a sustainable and secure foundation for the AI industry in China, potentially leading to a new cycle of innovation [52][53].
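Working backward from the multiples quoted above, the implied figures for the NVIDIA comparison system can be recovered; the sketch below is illustrative arithmetic only, assuming the quoted multiples and absolute values are exact:

```python
# Implied figures for the NVIDIA comparison system, derived from the
# article's numbers (illustrative arithmetic; assumes the quoted
# multiples and absolute values are exact).
atlas_memory_tb = 1152        # 15x the comparison system
atlas_bandwidth_pb_s = 16.3   # 62x the comparison system
print(f"implied memory:    {atlas_memory_tb / 15:.1f} TB")               # 76.8 TB
print(f"implied bandwidth: {atlas_bandwidth_pb_s * 1000 / 62:.0f} TB/s") # 263 TB/s
```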
Qwen's Open-Source Banana Is Here! With Native ControlNet Support
量子位· 2025-09-23 08:13
Core Viewpoint
- Qwen has launched a new image editing model, Qwen-Image-Edit-2509, which enhances multi-image fusion capabilities and consistency in single images, providing various creative options for users.

Group 1: Image Editing Features
- The new model supports multi-image input, allowing combinations such as "person + person," "person + object," and "person + scene" [1][6][2].
- It can generate wedding photos by merging two images, offering both traditional and modern styles [7][12].
- The model excels at creating realistic scenes, adjusting characters' expressions and poses to fit the context [16][20].
- It allows for easy editing of personal photos, including changing poses and outfits, and can create various styles such as American elite fashion [25][27][29].
- The model can also restore old photos, including colorization and damage repair [36][40].
- Enhanced text-consistency features include editing font types, colors, and materials, as well as targeted text corrections [50][55].

Group 2: ControlNet and Keypoint Features
- The model integrates ControlNet, enabling users to modify character poses and outfits using keypoint images [4][20].
- It supports depth-map control to maintain consistency between objects and scenes [60].

Group 3: Qwen3-Omni Model
- Qwen has also released Qwen3-Omni, an end-to-end multimodal model capable of processing text, audio, images, and video [4][67].
- It achieved state-of-the-art performance on 36 audio and audiovisual benchmarks, surpassing several closed-source models [69].
- The model supports real-time translation and can summarize web content in various languages [71].
- It offers low-latency audio and video conversations, with response times of 211 ms and 507 ms respectively [72].
- The model can handle audio inputs of up to 30 minutes and allows for personalized system prompts [73][74].
DeepSeek V3.1 Gets Its "Final" Update! Is V4/R2 Next???
量子位· 2025-09-23 03:14
Core Viewpoint
- The latest update of DeepSeek, version V3.1-Terminus, addresses previous user-reported issues and enhances model performance while maintaining existing capabilities [2][3][7].

Group 1: Version Improvements
- The Terminus version resolves a notable bug where the model randomly outputted the character "极" [3][7].
- Improvements include enhanced language consistency (fewer mixed-language outputs and random characters) and optimized performance of the Code Agent and Search Agent [7][8].

Group 2: Performance Metrics
- The new model improves on the previous version across a range of benchmarks [9]:
  - MMLU-Pro: 85.0 (up from 84.8)
  - GPQA-Diamond: 80.7 (up from 80.1)
  - Humanity's Last Exam: 21.7 (up from 15.9)
  - BrowseComp: 38.5 (up from 30.0)
  - SimpleQA: 96.8 (up from 93.4)
  - SWE Verified: 68.4 (up from 66.0)
  - Terminal-bench: 36.7 (up from 31.3)

Group 3: User Reactions and Future Speculations
- Some users expressed concern about a drop in Codeforces performance, speculating that safety adjustments may have impacted the model's creativity [10].
- The naming of the Terminus version has led to speculation that the next version will be a complete overhaul (V4) [11][14].
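The per-benchmark gains can be tabulated directly from the quoted scores; a small sketch computing each delta (scores taken verbatim from the list above):

```python
# Deltas between V3.1-Terminus and V3.1, computed from the scores
# quoted above (each pair is new score, previous score).
scores = {
    "MMLU-Pro":             (85.0, 84.8),
    "GPQA-Diamond":         (80.7, 80.1),
    "Humanity's Last Exam": (21.7, 15.9),
    "BrowseComp":           (38.5, 30.0),
    "SimpleQA":             (96.8, 93.4),
    "SWE Verified":         (68.4, 66.0),
    "Terminal-bench":       (36.7, 31.3),
}
for name, (new, old) in scores.items():
    print(f"{name:22s} +{new - old:4.1f}")
```

The largest jumps land on the agentic benchmarks (BrowseComp, Humanity's Last Exam, Terminal-bench), consistent with the stated focus on Code Agent and Search Agent optimization.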
It's All a Scheme! Nvidia Puts $100 Billion into OpenAI, Altman Spends It on Cards, and Oracle Pockets the Margin
量子位· 2025-09-23 01:10
Core Viewpoint
- Nvidia plans to invest up to $100 billion in OpenAI to build at least 10 GW of AI data centers, all running on Nvidia's systems [1][11][30].

Group 1: Investment and Infrastructure
- The first $10 billion will be invested upon completion of the first 1 GW data center, expected in the second half of 2026 [3][13].
- OpenAI commits to deploying a total of 10 GW, equivalent to 4-5 million GPUs [2][11].
- The cost of building a 1 GW data center is estimated at roughly $50-60 billion [12].

Group 2: Strategic Relationships
- The partnership forms a triangle with Oracle: OpenAI spends $300 billion on Oracle's cloud services, and Oracle in turn purchases Nvidia GPUs [6][17].
- This cycle suggests that Nvidia's investment may eventually flow back to it through chip sales to Oracle [18][24].

Group 3: Market Positioning
- ChatGPT has reached 700 million weekly active users, necessitating substantial computational power for model iteration and operational services [22].
- Nvidia secures a core customer in OpenAI while expanding sales through Oracle's GPU procurement, solidifying its position in the AI supply chain [23].
- Oracle benefits from the $300 billion cloud order, boosting its stock price, and secures its computing capacity through Nvidia's chips [24].

Group 4: Future Outlook
- Nvidia and OpenAI frame the collaboration as a significant step toward the next leap in AI, driven by the 10 GW system [29].
- Nvidia is also investing in companies such as Intel and Nscale, indicating a broader strategy across the AI infrastructure space [30].
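The figures above imply some straightforward ratios; a back-of-the-envelope sketch (pure arithmetic on the article's numbers, not additional reporting):

```python
# Back-of-the-envelope arithmetic on the article's figures
# (the ranges come from the article; the ratios are derived here).
gpus_low, gpus_high = 4_000_000, 5_000_000    # GPUs for the full 10 GW
per_gw_low = gpus_low // 10                   # implied GPUs per 1 GW
per_gw_high = gpus_high // 10
build_low, build_high = 50e9 * 10, 60e9 * 10  # $ to build all 10 GW
nvidia_total = 100e9                          # Nvidia's planned investment
print(f"{per_gw_low:,}-{per_gw_high:,} GPUs per GW")          # 400,000-500,000
print(f"Nvidia covers {nvidia_total / build_high:.0%}-"
      f"{nvidia_total / build_low:.0%} of the build cost")    # 17%-20%
```

In other words, even the full $100 billion would fund only about a fifth of the implied 10 GW build-out, which is why the Oracle cloud contract carries so much of the weight.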
Baidu Open-Sources the Qianfan-VL Visual Understanding Models! Domain Enhancement at Every Size, Computed Entirely on In-House Chips
量子位· 2025-09-22 11:16
Core Viewpoint
- Baidu's Qianfan-VL series of visual understanding models has been officially launched and fully open-sourced, in three sizes (3B, 8B, and 70B) optimized for enterprise-level multimodal applications [1][34].

Model Performance and Features
- The Qianfan-VL models show clear core advantages in benchmark tests, with performance improving markedly as parameter count grows, a healthy scaling trend [2][4].
- Across benchmarks, the 70B model scored 98.76 on ScienceQA_TEST and 88.97 on POPE, underscoring its strength on specialized tasks [4][5].
- The models are designed to meet diverse application needs, providing reasoning capabilities along with enhanced OCR and document understanding [3][5].

Benchmark Testing Results
- The Qianfan-VL models (3B, 8B, 70B) excel at OCR and document understanding, with high scores on tests such as OCRBench (873 for 70B) and DocVQA_VAL (94.75 for 70B) [6][5].
- The models also perform strongly on reasoning tasks, with the 70B model scoring 78.6 on MathVista-mini and 50.29 on MathVision [8][7].

Technical Innovations
- Qianfan-VL employs an advanced multimodal architecture and a four-stage training strategy to strengthen domain-specific capabilities while preserving general performance [9][12].
- The models leverage Baidu's Kunlun P800 chip for efficient computation, supporting large-scale distributed computing across up to 5000 cards [12][1].

Application Scenarios
- Beyond OCR and document understanding, Qianfan-VL can be applied to chart analysis and video understanding, performing well across scenarios [33][34].
- Open-sourcing Qianfan-VL marks a significant step toward turning AI technology into real-world productivity [33].
Tencent Rebuilt the Art Pipeline with AI: Hunyuan 3D Studio's Architecture Revealed
量子位· 2025-09-22 11:16
Core Viewpoint
- Tencent has developed a professional-grade AI workstation, Hunyuan 3D Studio, designed for 3D designers, game developers, and modelers; it streamlines the entire design process from concept to final game assets, cutting production time from days to minutes [3][4].

Group 1: Key Features of Hunyuan 3D Studio
- The platform comprises seven core technology modules that together form a seamless, automated asset-creation workflow [6].
- The workflow spans component splitting, controllable image generation, high-fidelity geometry generation, low-poly topology generation, semantic UV unwrapping, texture generation and editing, and rigging and animation [9][10].

Group 2: Component Splitting
- The component-splitting module uses connectivity analysis and semantic segmentation to automatically decompose complex models into logically and functionally independent components that can be edited and animated independently [9][10].
- A feature extractor and segmentation heads predict masks for component boundaries, yielding highly accurate segmentation results [15][18].

Group 3: Controllable Image Generation
- The controllable image generation module lets users generate 3D design images in mainstream game art styles from input images and style instructions [33][34].
- The system is trained on a dataset of paired images to learn a precise mapping from realistic images to stylized outputs, improving the consistency and quality of generated designs [34][41].

Group 4: High-Fidelity Geometry Generation
- High-fidelity geometry generation builds on the Hunyuan 3D framework, which includes a variational autoencoder for compressing and reconstructing 3D geometry [43][45].
- A diffusion model then efficiently generates high-quality samples from single input images, keeping the generated geometry closely aligned with the input prompts [47][50].

Group 5: Low-Poly Topology Generation
- The low-poly topology module produces clean, art-compliant topology from high-fidelity geometry, using an autoregressive model to predict vertices and faces directly from point clouds [55][56].
- A tokenization scheme models the mesh as a sequence, improving training and inference efficiency [59][60].

Group 6: Texture Generation and Editing
- The texture generation framework extends 2D diffusion models to multi-view texture generation, addressing challenges such as cross-view consistency and the transition from RGB textures to physically based rendering (PBR) materials [76][78].
- A text-guided texture editing model, built on high-quality PBR material datasets, enables robust texture synthesis and editing [81][84].

Group 7: Rigging and Animation Effects
- The rigging and animation module includes a humanoid-character animation branch and a general-character animation branch, using a template-based approach to ensure accurate bone generation and skinning [97][100].
- The pipeline supports parameterized control, enabling high-level artistic adjustments throughout while allowing incremental updates without full recomputation [104][105].
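The connectivity-analysis step in component splitting can be illustrated with a minimal sketch: a generic union-find grouping of mesh faces that share vertices. This is not Hunyuan 3D Studio's actual implementation; the function name and data layout are assumptions for illustration.

```python
# Minimal sketch of mesh connectivity analysis (illustrative, not
# Hunyuan 3D Studio's code): faces sharing a vertex are merged into
# one component with union-find.

def connected_components(faces):
    """faces: list of (v0, v1, v2) vertex-index triples.
    Returns lists of face indices, one list per connected component."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[ra] = rb

    for v0, v1, v2 in faces:
        union(v0, v1)
        union(v1, v2)

    groups = {}
    for i, face in enumerate(faces):
        groups.setdefault(find(face[0]), []).append(i)
    return list(groups.values())

# Two triangles sharing vertex 2, plus one disconnected triangle.
faces = [(0, 1, 2), (2, 3, 4), (10, 11, 12)]
print(connected_components(faces))  # -> [[0, 1], [2]] (two components)
```

In the real pipeline this geometric grouping is only the first cut; the semantic-segmentation heads then refine boundaries between parts that are geometrically fused but logically distinct.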
An Industry-First Dual-NPU Architecture Makes a Splash! MediaTek's Dimensity 9500 Doubles Down on Proactive AI Experiences
量子位· 2025-09-22 11:16
Cressy, reporting from Shenzhen
QbitAI | WeChat official account QbitAI

A question: in today's smartphones, if AI is to become an always-resident capability with its own initiative that proactively gets things done, rather than a feature module that must be passively refreshed again and again, what kind of chip architecture can actually keep pace with that change?

MediaTek's answer: combine sharper compute with friendlier energy efficiency in a super-performance + super-efficiency dual-NPU architecture that keeps AI "Always on."

This is a shift from technical form to usage pattern: the goal is for AI to no longer depend on being passively woken up, but to stay online as a system capability, responding at any moment and woven into every user interaction.

A consensus is forming around this trend.

As large models move on-device, edge AI is invoked ever more often: from predictive completion in the keyboard to composition suggestions in the camera, from lock-screen summaries to image generation, AI is going from "called once" to "available at every moment."

For that, the SoC must not only run fast but also let AI run long and run steadily, even completing real-time responses without the user noticing at all.

The Dimensity 9500 rebuilds the chip foundation around this goal: it debuts a dual-NPU architecture and combines key technologies such as compute-in-memory and hardware compression, topping ETH Zurich's mobile SoC AI leaderboard once again with double the previous generation's benchmark score.

The Dimensity 9500 is making on-phone AI faster, smarter, and more attuned to your rhythm of use.

Writing copy, organizing ideas, ...
A Big Twist in GPT-5's Coding Evaluation! A Failing Grade on the Surface, but 63.1% of Tasks Went Unanswered; Counting Everything, Its Score Doubles Claude's
量子位· 2025-09-22 08:08
Core Insights
- The article examines how leading AI models fare on the new software engineering benchmark SWE-BENCH PRO, finding that none of the top models achieved a solve rate above 25% [1][23].

Group 1: Benchmark Overview
- SWE-BENCH PRO presents far more challenging tasks than its predecessor, SWE-Bench-Verified, on which average accuracy had reached 70% [5][6].
- The new benchmark aims to eliminate data-contamination risk by ensuring that models have not encountered the test content during training [9][12].
- SWE-BENCH PRO draws on a diverse codebase of 1865 commercial applications, B2B services, and developer tools, structured into public, commercial, and reserved subsets [12][18].

Group 2: Model Performance
- The top performers on the public set were GPT-5 and Claude Opus 4.1, with solve rates of 23.3% and 22.7%, respectively [25][26].
- On the commercial set, even the best models scored below 20%, indicating limited ability to solve real-world business problems [27][28].
- Performance varied significantly by programming language, with Go and Python generally outperforming JavaScript and TypeScript [30].

Group 3: Failure Analysis
- The primary failure modes were semantic-understanding issues, syntax errors, and incorrect answers, pointing to challenges in problem comprehension and algorithmic correctness [34].
- GPT-5 left 63.1% of tasks unanswered, suggesting that while it performs well on certain tasks, it struggles with the more complex ones [32].
- The analysis identifies programming-language difficulty, the nature of the codebase, and model type as key factors influencing performance [28][29].
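One way to reconcile the 23.3% overall solve rate with the 63.1% unanswered rate is to compute the solve rate over answered tasks only; the check below is an interpretation of the headline, assuming both figures cover the same task set:

```python
# An interpretation of the headline figures (not the article's own
# calculation): if 23.3% of ALL tasks were solved but 63.1% of tasks
# were never answered, the solve rate over answered tasks is:
solved_overall = 0.233   # GPT-5's public-set solve rate
unanswered = 0.631       # fraction of tasks left unanswered
print(f"{solved_overall / (1 - unanswered):.1%}")  # 63.1%
```

Read this way, GPT-5 solves roughly 63% of the tasks it actually attempts, which is how a sub-25% headline score can coexist with the title's claim of a much stronger effective result.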
Altman Teases New ChatGPT Products! Even Pro Members Will Pay Extra, with Compute Spending This Time Regardless of Cost
量子位· 2025-09-22 05:54
Wen Le, reporting from Aofei Temple
QbitAI | WeChat official account QbitAI

Altman really is a pay-to-win player when it comes to compute.

OpenAI has already spent $16 billion (about 113.8 billion yuan) renting computing resources.

That works out to tens of millions of dollars going toward server rentals every single day.

And that is not even the wildest part. According to The Information, OpenAI plans to spend roughly an additional $100 billion over the next five years renting backup servers from cloud providers.

Beyond the planned $350 billion compute outlay, all the servers that money rents are merely "backups"...

Still, the move is meant to keep OpenAI from dropping the ball when demand for AI compute explodes.

Altman teases that the new products coming in the next few weeks are compute-intensive

OpenAI CFO Sarah Friar revealed at a recent Goldman Sachs conference that, because of compute shortages, the company has repeatedly delayed launches of new features and new AI models, and has even had to deliberately slow down certain products.

Faced with these compute headaches, OpenAI is clearly sparing no expense.

But this year's $16 billion is only the tip of the iceberg.

Longer term, OpenAI plans to pour $350 billion into server rentals between 2024 and 2030; in 2030 alone, server rental spending is projected to reach as much as $100 billion.

Just a few days ago, OpenAI had also signed a five-year contract with Oracle worth 3 ...
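The "tens of millions a day" figure follows directly from the $16 billion annual rental spend; a back-of-the-envelope check (simple division, not a number reported by OpenAI):

```python
# Sanity check of the "tens of millions a day" claim: $16B in annual
# compute rentals, averaged over a year (simple division, not a
# figure reported by OpenAI).
annual_rental_usd = 16_000_000_000
per_day_musd = annual_rental_usd / 365 / 1e6
print(f"≈ ${per_day_musd:.1f}M per day")  # ≈ $43.8M
```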
Musk's New Model Maxes Out Price-Performance: Gemini 2.5-Level Results at a Tenth of the Price, with 2M Context Support
量子位· 2025-09-21 13:29
Core Viewpoint
- The article covers the launch of Grok 4 Fast by Elon Musk's xAI, highlighting its aggressive pricing and its capabilities in multimodal reasoning and long-context handling [1][3].

Group 1: Product Features and Performance
- Grok 4 Fast sets a price-performance benchmark, delivering Gemini 2.5-level performance at roughly a tenth of the price while offering a 2-million-token context window [1][3].
- It significantly reduces token costs, using on average 40% fewer tokens than Grok 4 while maintaining comparable performance [11][12].
- In benchmark tests, Grok 4 Fast outperformed Grok 3 Mini and ranked 8th in the text arena, the best showing among models of similar size [17][18].

Group 2: Competitive Advantage
- Grok 4 Fast leads the industry in "price-to-intelligence" ratio, as verified by independent assessments [14].
- It scored 1163 points in the search arena, 17 points ahead of the runner-up, showcasing its competitive edge [18].

Group 3: Technological Innovations
- The model employs end-to-end reinforcement learning to sharpen its tool use, excelling at deciding when to invoke tools such as code execution or web browsing [20].
- Grok 4 Fast integrates advanced search capabilities, allowing seamless web browsing and real-time data augmentation of queries [21][22].
- A unified architecture reduces end-to-end latency and token costs, making it suitable for real-time applications [25].

Group 4: Market Position and Future Developments
- Grok 4 Fast is now available to all users, with complex queries automatically routed to it in Auto mode [26].
- Two new models are set to launch, with specific per-token pricing for input and output detailed [27].
- The hiring of Dustin Tran from Google, a key figure behind the Gemini models, strengthens the team behind Grok 4 Fast [28][30].