DeepSeek's new model is live! It introduces the new DSA sparse attention, and takes another shot at CUDA
量子位· 2025-09-29 10:44
Core Insights
- DeepSeek has launched its latest model, DeepSeek-V3.2-Exp, which introduces a new attention mechanism called DeepSeek Sparse Attention (DSA) [1][6]
- The model aims to enhance long-text processing and inference efficiency without significantly affecting output quality [7]
- A significant price cut for the official API has been announced, starting at 50% off [3][17]

Model Updates
- DeepSeek-V3.2-Exp builds on the previous version, DeepSeek-V3.1-Terminus, which focused on stability, tool-invocation capability, language consistency, and error correction [9]
- In benchmark comparisons, DeepSeek-V3.2-Exp performs comparably to DeepSeek-V3.1-Terminus across various evaluation sets [10]
- The model shows reduced inference costs on 128K long contexts, particularly during the decoding phase [12]

Technical Innovations
- DSA introduces a fine-grained sparse attention mechanism, yielding significant gains in processing efficiency (a rough sketch of the general idea follows below) [6][7]
- DeepSeek has open-sourced the GPU operators in both TileLang and CUDA versions, facilitating research and development [13][15]
- The company recommends the TileLang version for debugging and rapid iteration during experimental research [16]

Community Engagement
- The announcement calls on the community to try the new model and take advantage of the promotional pricing [18]
- Links to the model on platforms such as HuggingFace and ModelScope are provided [19]
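The article does not spell out DSA's internal selection rule, so the sketch below shows only the generic idea behind fine-grained sparse attention: score all keys, then let each query attend to just its top-k keys. The function name, shapes, and top-k rule are illustrative assumptions, not DeepSeek's kernel.

```python
import torch
import torch.nn.functional as F

def topk_sparse_attention(q, k, v, keep=64):
    """Toy fine-grained sparse attention: each query attends only to its
    top-scoring keys instead of the full sequence. Illustrative only; a
    real kernel (like DSA's) would avoid materializing the full score
    matrix. This is not DeepSeek's published algorithm."""
    scores = (q @ k.t()) / k.shape[-1] ** 0.5    # (T, T) query-key scores
    keep = min(keep, k.shape[0])
    idx = scores.topk(keep, dim=-1).indices      # ids of the k best keys per query
    mask = torch.full_like(scores, float("-inf"))
    mask.scatter_(-1, idx, 0.0)                  # 0 where kept, -inf elsewhere
    return F.softmax(scores + mask, dim=-1) @ v  # softmax over the kept keys only

# Usage: 256 tokens, one 64-dim head, 64 keys kept per query.
q, k, v = (torch.randn(256, 64) for _ in range(3))
print(topk_sparse_attention(q, k, v).shape)      # torch.Size([256, 64])
```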
Billion-scale parameters, hundred-billion-scale performance: Shanghai AI Lab releases a new-generation document-parsing model whose accuracy on complex scenes rivals human experts
量子位· 2025-09-29 10:44
Submitted by the MinerU2.5 team
量子位 | 公众号 QbitAI

Large models keep getting larger, with parameter counts routinely in the hundreds of billions, yet delivering both high accuracy and high efficiency in real-world scenarios remains genuinely hard.

| Model Type | Models | Slides | Academic Papers | Book | Textbook | Exam Papers | Magazine | Newspaper | Notes | Financial Report |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| | Marker-1.8.2 [32] | 0.1796 | 0.0412 | 0.1010 | 0.2908 | 0.2958 | 0.1111 | 0.2717 | 0.4656 | 0.0341 |
| Pipeline | MinerU2-pipeline [46] | 0.4244 | 0.0230 | 0.2628 | 0.1224 | 0.0822 | 0.395 | 0.0736 | 0.2603 | ... |
A new method for feed-forward 3D Gaussian splatting: Zhejiang University team proposes "voxel alignment," fusing multi-view 2D information directly in 3D space
量子位· 2025-09-29 04:57
Core Viewpoint
- The article discusses the rapid industrialization of Feed-Forward 3D Gaussian Splatting (3DGS) and introduces VolSplat, which abandons the traditional pixel-aligned strategy in favor of a voxel-aligned framework, enhancing robustness, efficiency, and engineering feasibility in multi-view rendering [1][2].

Summary by Sections

Introduction to VolSplat
- VolSplat addresses the limitations of existing pixel-aligned methods, which struggle to precisely align 2D features in 3D space and are constrained by the pixel grid when allocating Gaussian density [2][6].

Performance Comparison
- Experimental results on public datasets such as RealEstate10K and ScanNet show that VolSplat outperforms various pixel-aligned baselines in visual quality and geometric consistency [4][5].

Core Concepts of VolSplat
- The core idea of VolSplat is to shift alignment from 2D to 3D, allowing better integration of multi-view information and overcoming challenges in multi-view consistency and Gaussian density allocation [6][9].

Methodology Breakdown
- The VolSplat pipeline consists of three modules:
  1. 2D feature extraction and depth estimation
  2. Lifting pixels to voxels and feature aggregation
  3. Sparse 3D refinement and Gaussian regression [9][11]

Step-by-Step Process
- **Step 1**: 2D features are extracted with a shared encoder, and depth maps are constructed to provide the geometric priors needed downstream [11].
- **Step 2**: Pixels are projected into 3D space using the predicted depths, producing a point cloud that is voxelized for feature aggregation, improving cross-view consistency (a sketch of this step follows below) [12][13].
- **Step 3**: A sparse 3D U-Net refines the voxel features, predicting a correction for each voxel and regressing Gaussian parameters for rendering [14].

Experimental Highlights
- VolSplat demonstrates strong zero-shot generalization across datasets, maintaining high performance even on unseen data, with a PSNR of 32.65 dB on the ACID dataset [15][17].

Practical Implications
- The advances in VolSplat yield fewer artifacts and better geometric fidelity, translating into improved user experiences in applications like virtual tours and indoor navigation [17][19].

Future Directions
- VolSplat opens new avenues for research in 3D reconstruction, robotics, autonomous driving, and AR/VR, providing a unified framework for integrating multimodal data [19][20].
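Step 2 is where the method departs most from pixel-aligned pipelines, and it is mechanical enough to sketch: unproject each pixel with its predicted depth, quantize to voxels, and scatter-mean the features. The intrinsics, voxel size, and mean aggregation below are illustrative assumptions, not VolSplat's released code.

```python
import torch

def lift_pixels_to_voxels(feat, depth, K, voxel_size=0.05):
    """Unproject per-pixel features into 3D using predicted depth, then
    average the features of all pixels landing in the same voxel.
    feat: (C, H, W) 2D features, depth: (H, W), K: (3, 3) camera intrinsics.
    Illustrative sketch only; not the VolSplat reference implementation."""
    C, H, W = feat.shape
    v, u = torch.meshgrid(torch.arange(H), torch.arange(W), indexing="ij")
    pix = torch.stack([u, v, torch.ones_like(u)], dim=-1).float()  # (H, W, 3)
    rays = pix.reshape(-1, 3) @ torch.linalg.inv(K).t()            # back-project
    pts = rays * depth.reshape(-1, 1)                              # camera-frame points
    vox = torch.floor(pts / voxel_size).long()                     # integer voxel coords
    # Deduplicate voxel coordinates, then scatter-mean the pixel features.
    uniq, inv = torch.unique(vox, dim=0, return_inverse=True)
    f = feat.reshape(C, -1).t()                                    # (H*W, C)
    sums = torch.zeros(len(uniq), C).index_add_(0, inv, f)
    counts = torch.zeros(len(uniq)).index_add_(0, inv, torch.ones(len(inv)))
    return uniq, sums / counts.unsqueeze(-1)                       # coords, mean features

# Usage: 64-dim features on a 48x64 image with random positive depths.
K = torch.tensor([[60., 0., 32.], [0., 60., 24.], [0., 0., 1.]])
coords, voxfeat = lift_pixels_to_voxels(torch.randn(64, 48, 64),
                                        torch.rand(48, 64) * 2 + 0.5, K)
print(coords.shape, voxfeat.shape)
```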
Latest results for Huawei's Pangu 718B model: second among open-source models
量子位· 2025-09-29 04:57
Core Viewpoint
- Huawei has emerged as a strong competitor in the AI model landscape, as highlighted by its performance in the latest SuperCLUE benchmark evaluation, which assesses models across multiple dimensions [1][2].

Group 1: Model Rankings and Performance
- The top three models in the SuperCLUE evaluation among open-source and domestic entries are:
  1. DeepSeek-V3.1-Terminus-Thinking
  2. openPangu-Ultra-MoE-718B
  3. Qwen3-235B-A22B-Thinking-2507 [5]
- Huawei's openPangu-Ultra-MoE-718B model, with 718 billion parameters, stands out for a training philosophy that emphasizes quality over sheer data volume [6][35].

Group 2: Data Quality and Training Strategy
- The openPangu team adheres to three core principles in post-training data construction: quality first, diversity coverage, and complexity adaptation [10][21].
- A comprehensive framework for data generation, scientific selection, and precise enhancement ensures high data quality, which is crucial for improving the model's reasoning in complex scenarios [13][35].

Group 3: Pre-training and Optimization Techniques
- Pre-training of openPangu-718B proceeds in three stages: General, Reasoning, and Annealing, each targeting a different aspect of knowledge and reasoning [15][35].
- The model employs a "Critique Internalization" mechanism to mitigate hallucinations, letting it evaluate its own reasoning process and improve output reliability [19][22].

Group 4: Tool Usage and Agent Capabilities
- To strengthen tool use, the team developed the ToolACE framework, which generates high-quality, complex multi-tool interaction data for training [23][26].
- Training includes a three-step post-training fine-tuning scheme that balances underfitting against overfitting [27][29].

Group 5: Technical Innovations and Industry Implications
- Systematic technical innovations across the training stages underpin openPangu-718B's performance, offering the industry an example of the value of meticulous technical refinement and deep insight into core challenges [35].
Flash Attention author's latest podcast: Nvidia's GPU dominance will end within three years
量子位· 2025-09-29 04:57
Group 1
- The core argument is that Nvidia's dominance in the GPU market will face increasing competition within the next 2-3 years as specialized chips for different workloads emerge, leading to a more diversified ecosystem [6][9][23]
- Tri Dao emphasizes that AI model architecture, particularly the Transformer, is stabilizing, but chip design and workload adaptation still face ongoing change [11][12][21]
- Future AI workloads will fall into three main types: traditional chatbots, ultra-low-latency scenarios, and large-scale batch processing, each requiring tailored optimization from hardware vendors [24][96]

Group 2
- The cost of inference has dropped roughly 100-fold since the launch of ChatGPT, driven by gains in model efficiency and inference-optimization techniques [73][75][90]
- Techniques such as model quantization and co-design of model architecture and hardware have contributed heavily to this cost reduction (a minimal quantization sketch follows below) [82][84][88]
- An estimated further 10-fold improvement in inference optimization remains, particularly through specialized hardware and model advances [90][93][95]

Group 3
- The AI hardware landscape is expected to diversify as companies like Cerebras, Groq, and SambaNova introduce solutions emphasizing low-latency inference and high throughput for various applications [23][24][96]
- The emergence of specialized AI inference providers will lead to different trade-offs, with some focusing on broad coverage and others on excellence in specific scenarios [96][97]
- Evolving AI workloads will keep driving demand for innovative solutions, particularly real-time video generation and agentic applications that require seamless integration with human tools [117][115][120]
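Of the cost levers named in Group 2, weight quantization is the most mechanical to demonstrate. Below is a minimal sketch of symmetric per-tensor int8 quantization, a textbook scheme rather than any specific vendor's recipe: weights are stored as int8 plus one float scale, cutting memory and bandwidth roughly 4x versus float32.

```python
import torch

def quantize_int8(w):
    """Symmetric per-tensor int8 quantization: one float scale maps the
    weight's max magnitude to 127. Textbook sketch, not any particular
    inference stack's production scheme."""
    scale = w.abs().max() / 127.0
    q = torch.clamp(torch.round(w / scale), -127, 127).to(torch.int8)
    return q, scale

def dequantize(q, scale):
    return q.float() * scale

w = torch.randn(4096, 4096)
q, s = quantize_int8(w)
err = (dequantize(q, s) - w).abs().mean()
print(f"mean abs rounding error: {err.item():.5f}")  # small vs. weight scale ~1.0
```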
8.9ms, a new inference-speed record! One yuan per million tokens: Inspur Information's AI servers accelerate the industrialization of agents
量子位· 2025-09-29 04:57
克雷西 henry, reporting from 凹非寺
量子位 | 公众号 QbitAI

Outputting one million tokens of inference now costs just one yuan.

At this year's Artificial Intelligence Computing Conference, Inspur Information (浪潮信息) released the hyper-scalable AI server 元脑HC1000, slashing the cost of AI inference. At the same time, Inspur unveiled another trump card, the 元脑SD200 super-node, which pushes DeepSeek-R1's token generation time down to the millisecond scale.

△ Liu Jun (刘军), Chief AI Strategy Officer of Inspur Information

As the AI race enters the stage of agent industrialization, capability, speed, and cost have become the three decisive factors. 元脑SD200 and 元脑HC1000 will provide the high-speed, low-cost compute infrastructure needed to deploy multi-agent collaboration and complex-task reasoning at scale.

DeepSeek-R1 inference enters the 10ms era

First, the 元脑SD200 super-node AI server. On speed in particular, 元脑SD200 is the first to bring end-to-end large-model inference latency under 10ms. In measured tests running DeepSeek-R1, 元脑SD200's TPOT (time per output token) was just 8.9ms, nearly twice as fast as the previous SOTA (15ms), and it drove DeepSeek-R1 671B inference to a superlinear scaling ratio of up to 16.3x. Within a single machine it can simultaneously run DeepSeek-R1, Kimi ...
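As a sanity check on the headline figure, TPOT converts directly into per-stream decode throughput; a back-of-the-envelope comparison using only the numbers quoted above:

```latex
% Per-stream decode rate implied by TPOT (time per output token):
\text{rate} = \frac{1}{\text{TPOT}}, \qquad
\frac{1}{8.9\,\text{ms}} \approx 112\ \text{tokens/s}
\quad\text{vs.}\quad
\frac{1}{15\,\text{ms}} \approx 67\ \text{tokens/s},
\qquad \frac{15}{8.9} \approx 1.7\times
```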
GPT-5 provides a key idea for quantum computing! A leading expert raves: it delivered the "decisive blow" in under half an hour
量子位· 2025-09-29 03:46
Core Viewpoint
- GPT-5 may be underestimated, particularly in assisting with hard quantum-computing problems, as demonstrated by the critical insights it contributed during a recent research collaboration [1][20][26].

Group 1: GPT-5's Role in Quantum Research
- Scott Aaronson, a prominent figure in quantum computing, said the insights GPT-5 provided were impressive enough to have come from a highly intelligent student [2][3].
- In a recent collaboration, GPT-5 contributed significantly to a paper titled "Limits to black-box amplification in QMA," which explores the limits of amplification techniques in quantum complexity classes [5][4].
- The research involved analyzing how the maximum eigenvalue of a Hermitian matrix changes with a parameter, a step GPT-5 helped expedite, leading to the breakthrough (see the identity below) [22][25].

Group 2: The Quantum Complexity Class QMA
- QMA (Quantum Merlin Arthur) is a complexity class describing a verification process in which a verifier (Arthur) checks the validity of a quantum state supplied by a prover (Merlin) [9][10].
- A long-standing question about QMA is whether its completeness can be pushed from 2/3 all the way to 1, i.e., whether a verifier can be made to accept a correct proof with certainty [10][12].
- Recent findings indicate that any QMA protocol can be amplified to achieve an exponentially small completeness error, showing room for significant advances in quantum computing [15][19].

Group 3: Industry Reactions and Developments
- The collaboration between researchers and GPT-5 has sparked discussion about the changing dynamics of research and AI's role in scientific discovery [27][28].
- There are concerns about OpenAI's recent model downgrades, which have drawn user dissatisfaction and calls for transparency in model routing [30][31].
- OpenAI has responded that the model switching is part of a "safety routing test" aimed at handling sensitive topics more rigorously [31].
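The summary does not reproduce the eigenvalue analysis itself, but the standard first-order fact such an analysis rests on, a Hellmann-Feynman-type identity, stated here under the simplifying assumption that the largest eigenvalue is simple, is:

```latex
% For a smooth Hermitian family H(t) with a simple largest eigenvalue
% \lambda_{\max}(t) and unit eigenvector v(t), i.e. H(t)\,v(t) = \lambda_{\max}(t)\,v(t):
\frac{d\lambda_{\max}}{dt} \;=\; v(t)^{\dagger}\,\frac{dH}{dt}\,v(t)
```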
Ten "Genius Youths" who have left Huawei
量子位· 2025-09-28 11:54
Core Viewpoint
- The article follows the participants of Huawei's "Genius Youth" program as they move from Huawei onto entrepreneurial and academic paths, particularly in AI, and surveys their contributions and achievements in the industry [1][2][82].

Group 1: Entrepreneurial Paths
- The program has produced notable entrepreneurs, with six of the ten profiled participants choosing to start their own companies [82].
- 彭志辉, a prominent figure, left Huawei to co-found 智元机器人, which has secured significant funding and contracts, indicating strong market potential [10][15].
- 季宇 founded 行云集成电路, focusing on AI chip development, and has launched a new product at a competitive price [34][36].
- 王乃行 established 博思芯宇, targeting AI-chip lifecycle management, and has likewise secured substantial funding [41][43].
- 丁文超 moved from Huawei into academia and then co-founded 它石智航, which focuses on embodied intelligence and has reached significant funding milestones [48][50].
- 黄青虬, known for his work on lidar algorithms, is also venturing into embodied-intelligence entrepreneurship [56].

Group 2: Academic Paths
- Four participants returned to academia, contributing to research and teaching in their fields [82].
- 周满 joined 华中科技大学, focusing on cybersecurity and wireless systems [62][63].
- 任宇翔 became an assistant professor at 南京大学, specializing in graph computing and AI models [70][72].
- 徐科 returned to 南京大学, where he works on data intelligence and visualization [75][76].
- 邵典 took a position at 西北工业大学, focusing on AI and computer vision [81].

Group 3: Background of the "Genius Youth" Program
- The program was initiated by 任正非 in 2019 to cultivate top talent in key technological fields [85][88].
- Participants were offered competitive salaries, with the highest tier reaching 2.01 million yuan, attracting elite graduates [88][90].
- The program has shaped its participants' careers, and many have gone on to make significant contributions to the tech industry [91][92].
Latest from a Transformer author's startup: a new open-source framework breaks through an evolutionary-computation bottleneck, with sample efficiency up tens of times
量子位· 2025-09-28 11:54
Core Insights
- The article covers the launch of ShinkaEvolve, an open-source framework from Sakana AI that sharply improves sample efficiency across computational tasks, achieving with only 150 samples results that previously required thousands of evaluations [1][3][22].

Group 1: Framework Overview
- ShinkaEvolve lets large language models (LLMs) optimize their own code while staying efficient, likened to fitting evolutionary computation with an "acceleration engine" [3][6].
- The framework matches Google's AlphaEvolve in performance while offering higher sample efficiency and open-source accessibility [6][22].

Group 2: Key Innovations
- Three architectural innovations drive its performance on tasks such as mathematical optimization, agent design, and competitive programming [5][11].
- First, a parent-sampling technique balances exploration and exploitation through a layered strategy and multi-method integration [11][13].
- Second, a novelty-based rejection-sampling method cuts wasted computation by filtering out low-novelty variants with a two-tier mechanism [14][16].
- Third, a multi-armed-bandit LLM-selection strategy based on the UCB1 algorithm dynamically schedules LLMs according to their performance in different task phases (a minimal UCB1 sketch follows below) [17][18].

Group 3: Performance Validation
- In mathematical optimization, ShinkaEvolve needed only 150 evaluations to optimize the placement of 26 circles within a unit square, versus the thousands required by AlphaEvolve [20][22].
- In agent design, experiments showed ShinkaEvolve outperforming baselines on mathematical-reasoning problems, reaching peak performance with just seven LLM queries [23][25].
- On competitive-programming benchmarks, it improved average scores by 2.3% across ten AtCoder problems without extensive code restructuring [28].
- The framework also excelled at evaluating load-balancing loss functions in mixture-of-experts models, showing higher accuracy and lower perplexity across multiple downstream tasks [30][32].
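The bandit-based scheduler is described only at the level of "UCB1," so here is a minimal sketch of plain UCB1 arm selection. The arm names and reward scale are illustrative assumptions; ShinkaEvolve's exact reward design is not reproduced here.

```python
import math

class UCB1:
    """Plain UCB1 bandit: pick the arm maximizing mean reward plus an
    exploration bonus that shrinks as an arm is played more often."""
    def __init__(self, arms):
        self.arms = arms
        self.counts = {a: 0 for a in arms}
        self.sums = {a: 0.0 for a in arms}

    def select(self):
        for a in self.arms:                # play every arm once first
            if self.counts[a] == 0:
                return a
        total = sum(self.counts.values())
        return max(self.arms, key=lambda a: self.sums[a] / self.counts[a]
                   + math.sqrt(2 * math.log(total) / self.counts[a]))

    def update(self, arm, reward):
        self.counts[arm] += 1
        self.sums[arm] += reward

# Usage: route mutation requests among hypothetical LLM backends, rewarding
# each by the fitness improvement it produced (reward in [0, 1]).
bandit = UCB1(["llm_a", "llm_b", "llm_c"])
for reward in [0.2, 0.9, 0.1, 0.8, 0.7]:
    arm = bandit.select()
    bandit.update(arm, reward)
print(bandit.counts)
```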
A big upgrade for robot perception! Lightweight injection of geometric priors lifts success rates by 31%
量子位· 2025-09-28 11:54
Submitted by the Evo-0 team
量子位 | 公众号 QbitAI

In robot learning, getting AI to truly "understand" the 3D world has long been a hard problem. VLA models are typically built on pretrained vision-language models (VLMs) trained only on 2D image-text data, so they lack the 3D spatial understanding that real-world manipulation demands. Existing enhancements based on explicit depth input do help, but they rely on extra sensors or depth-estimation networks, bringing deployment difficulties and noisy accuracy.

To address this, Shanghai Jiao Tong University and the University of Cambridge propose Evo-0, a lightweight method that strengthens the spatial understanding of vision-language-action (VLA) models by implicitly injecting 3D geometric priors, requiring no explicit depth input or additional sensors. The method uses the visual-geometry foundation model VGGT to extract 3D structural information from multi-view RGB images and fuses it into the original vision-language model, yielding a marked gain in spatial perception.

In RLBench simulation experiments, on 5 tasks requiring fine manipulation, Evo-0's average success rate beats the pi0 baseline by 15% and openvla-oft by 31%.

Evo-0: fusing 2D and 3D representations

Evo-0 uses VGGT as a spatial encoder, taking the t^{3D} tokens that VGGT extracts for 3D-structure tasks during its own training. These tokens carry geometric information such as depth context and cross-view spatial correspondences. The model introduces a cross- ...
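The text breaks off at the fusion module, but the described 2D-3D fusion, VLM tokens attending to geometry tokens, is a standard cross-attention pattern. A minimal sketch follows; the dimensions and single-block design are assumptions for illustration, not Evo-0's architecture.

```python
import torch
import torch.nn as nn

class GeometryFusionBlock(nn.Module):
    """Fuse geometry tokens (e.g., from a 3D encoder like VGGT) into VLM
    tokens via cross-attention with a residual connection. Dimensions and
    the single-block design are illustrative, not Evo-0's released code."""
    def __init__(self, dim=768, heads=8):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, vlm_tokens, geo_tokens):
        # Queries come from the VLM stream; keys/values from geometry tokens,
        # so language-vision tokens selectively pull in 3D structure.
        fused, _ = self.attn(self.norm(vlm_tokens), geo_tokens, geo_tokens)
        return vlm_tokens + fused  # residual keeps the original 2D semantics

# Usage: 196 VLM tokens attend over 64 geometry tokens (hypothetical sizes).
block = GeometryFusionBlock()
out = block(torch.randn(2, 196, 768), torch.randn(2, 64, 768))
print(out.shape)  # torch.Size([2, 196, 768])
```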