量子位
Unitree robots found vulnerable: robots can infect one another, and the company responds swiftly
量子位· 2025-09-30 04:36
Core Viewpoint
- The article reports a serious wireless security vulnerability affecting multiple Unitree robot models, which allows attackers to gain root access and potentially build a botnet of infected robots [1][4].

Vulnerability Details
- Multiple Unitree robot models carry a serious flaw in their BLE (Bluetooth Low Energy) Wi-Fi configuration interface that lets attackers gain full control of the device [2].
- Attackers can bypass authentication using a key hardcoded in the firmware, then execute commands with root privileges [10][11].
- The vulnerability is "wormable": once one robot is compromised, it can automatically infect other nearby Unitree devices [15][16].

Researcher Communication
- The researchers who discovered the vulnerability, Andreas Makris and Kevin Finisterre, had been in contact with Unitree multiple times since May 2023, but progress on a fix was minimal [20][21].
- The researchers publicly released an exploit toolchain called UniPwn on GitHub, showing that multiple security flaws remained in Unitree's firmware as of September 20, 2025 [22][23].

Company Response
- Responding to the growing concern, Unitree acknowledged the security issues and said it has formed a product security team to strengthen product security [6][25].
- The company claims to have completed most of the necessary fixes and will push updates to users soon [25].
- Unitree expressed gratitude for external oversight and hopes to work with others to improve security across the robotics industry [27][31].
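To illustrate why a key hardcoded in firmware defeats authentication entirely, here is a hypothetical sketch (invented protocol and key, not Unitree's actual scheme): every device derives its expected challenge response from the same shared secret, so an attacker who extracts the key from any one firmware image can authenticate to all devices.

```python
import hashlib

# Hypothetical firmware-style "authentication": every device ships with the
# same hardcoded secret, so dumping any one firmware image yields the key.
HARDCODED_KEY = b"demo-shared-key"  # invented value for illustration


def device_expected_token(challenge: bytes) -> bytes:
    # The device derives the expected response from the shared hardcoded key.
    return hashlib.sha256(HARDCODED_KEY + challenge).digest()


def attacker_response(extracted_key: bytes, challenge: bytes) -> bytes:
    # An attacker who extracted the key from firmware computes the same value.
    return hashlib.sha256(extracted_key + challenge).digest()


challenge = b"\x01\x02\x03\x04"
# The attacker's reply is indistinguishable from a legitimate client's:
assert attacker_response(HARDCODED_KEY, challenge) == device_expected_token(challenge)
print("authentication bypassed: key is shared across all devices")
```

This is also what makes the flaw wormable in principle: a compromised robot already holds everything needed to authenticate to its neighbors.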
Claude Sonnet 4.5 is out: still the strongest at coding, runs autonomously writing code for 30 hours straight
量子位· 2025-09-30 00:57
Core Insights
- The article covers the release of Claude Sonnet 4.5, which shows significant improvements over its predecessor, Claude Sonnet 4, across a range of performance metrics [2][8].

Performance Improvements
- Claude Sonnet 4.5 scored 82.0% on SWE-bench, up 1.8 percentage points from Sonnet 4 [2].
- On the OSWorld test it scored 60.2, nearly a 50% improvement over Sonnet 4 [7].
- The model can write code autonomously for up to 30 hours, producing over 11,000 lines of code, a large jump from the previous model's 7-hour limit [3][5].

Benchmark Comparisons
- Claude Sonnet 4.5 outperformed other models on several benchmarks, including:
  - Agentic coding: 77.2% [10]
  - Terminal-Bench: 50.0% [10]
  - High-school math (AIME 2025): 100% accuracy with Python tools and 87% without tools [9][10]
- In specialized fields such as finance, healthcare, and law, it achieved win rates above 60% against baseline models [11].

Safety and Alignment
- The model underwent safety training to reduce undesirable behaviors such as sycophancy and deception, with false positives cut sharply from 0.15% to 0.02% [12][13].
- Claude Sonnet 4.5 also made notable advances in defending against prompt injection attacks [12].

Pricing and Accessibility
- Pricing for Claude Sonnet 4.5 is unchanged from Sonnet 4: $3 per million input tokens and $15 per million output tokens [24].

New Features and SDK
- The Claude Agent SDK has been upgraded to support building general-purpose autonomous agents, extending its capabilities beyond coding tasks [27].
- A new feature, "Imagine with Claude," generates software in real time from user requirements, enabling functional prototypes without existing templates [32].
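The quoted token prices make per-request cost a simple calculation. A minimal sketch (the 20k-in / 5k-out request below is an invented example, not from the article):

```python
# Back-of-envelope cost at the quoted Claude Sonnet 4.5 prices:
# $3 per million input tokens, $15 per million output tokens.
INPUT_PRICE_PER_M = 3.00
OUTPUT_PRICE_PER_M = 15.00


def request_cost(input_tokens: int, output_tokens: int) -> float:
    """Dollar cost of one API call at the quoted per-million-token rates."""
    return (input_tokens / 1_000_000) * INPUT_PRICE_PER_M \
         + (output_tokens / 1_000_000) * OUTPUT_PRICE_PER_M


# e.g. a 20k-token prompt producing a 5k-token answer:
cost = request_cost(20_000, 5_000)
print(f"${cost:.3f}")  # $0.135
```

Output tokens dominate the bill for generation-heavy workloads, since they cost 5x the input rate.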
DeepSeek suddenly embraces a homegrown GPU language: TileLang takes on CUDA as a Triton alternative, with Huawei Ascend announcing Day-0 support
量子位· 2025-09-30 00:57
Core Viewpoint
- The article highlights the significance of TileLang, a domain-specific language for GPU kernel development, which DeepSeek adopted in its v3.2 update, showcasing its performance advantages over traditional implementations such as Flash Attention 2 [1][6][26].

Group 1: TileLang Overview
- TileLang is designed to simplify development of high-performance GPU/CPU kernels, positioning it against NVIDIA's CUDA; DeepSeek recommends it for experiments because of its debugging and rapid-iteration advantages [6][10].
- The language lets developers write efficient kernels in far fewer lines of code while matching the performance of existing implementations [5][8].
- TileLang's development is led by a team from Peking University, including key figures Wang Lei and Dong Yuqi [15][19].

Group 2: DeepSeek's Adoption of TileLang
- DeepSeek's choice of TileLang was first showcased at the Beijing Zhiyuan Conference in June, where its potential for faster operator implementation was discussed [10][11].
- The integration has been recognized by industry leaders, including Huawei, which announced support for the language [4][7].
- DeepSeek's v3.2 release demonstrates that TileLang can be used effectively for model training, validating its capabilities in real-world applications [26][34].

Group 3: Performance and Technical Aspects
- TileLang provides three programming interfaces for different levels of developer expertise, from beginners to performance-focused experts [20][21][23].
- Its architecture decouples the scheduling space from the data flow, enabling more efficient optimization by the compiler [19].
- DeepSeek's TileLang implementation has delivered significant performance gains, with claims of a 30% speedup over traditional methods [5][27].
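The core idea behind tile-level kernel languages is that the programmer expresses the computation as operations on whole tiles and leaves thread-level scheduling to the compiler. As a schematic illustration only (plain NumPy, not TileLang syntax), here is what a tiled matrix multiply looks like at that granularity:

```python
import numpy as np

# Schematic tile-level kernel (plain NumPy, NOT TileLang syntax): the kernel
# body works on whole tiles; a compiler like TileLang's decides how each tile
# maps onto GPU threads, shared memory, and pipelining.
def tiled_matmul(A: np.ndarray, B: np.ndarray, tile: int = 32) -> np.ndarray:
    M, K = A.shape
    K2, N = B.shape
    assert K == K2
    C = np.zeros((M, N), dtype=A.dtype)
    for i in range(0, M, tile):          # one "program instance" per C tile
        for j in range(0, N, tile):
            acc = np.zeros_like(C[i:i + tile, j:j + tile])
            for k in range(0, K, tile):  # accumulate partial products over K tiles
                acc += A[i:i + tile, k:k + tile] @ B[k:k + tile, j:j + tile]
            C[i:i + tile, j:j + tile] = acc
    return C


A = np.random.rand(64, 48).astype(np.float32)
B = np.random.rand(48, 80).astype(np.float32)
assert np.allclose(tiled_matmul(A, B), A @ B, atol=1e-3)
```

Keeping the kernel at tile granularity is what allows the "decoupled scheduling space" the article describes: the same tile program can be lowered to very different thread layouts.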
DeepSeek's new model goes live: new DSA sparse attention, and another shot fired at CUDA
量子位· 2025-09-29 10:44
Core Insights
- DeepSeek has launched its latest model, DeepSeek-V3.2-Exp, which introduces a new attention mechanism called DeepSeek Sparse Attention (DSA) [1][6].
- The model aims to improve long-text processing and inference efficiency without significantly affecting output quality [7].
- A significant price cut for the official API has been announced, starting at 50% off [3][17].

Model Updates
- DeepSeek-V3.2-Exp builds on the previous version, DeepSeek-V3.1-Terminus, which focused on stability, tool-invocation capability, language consistency, and error correction [9].
- In benchmark comparisons, DeepSeek-V3.2-Exp performs on par with DeepSeek-V3.1-Terminus across various evaluation sets [10].
- The model shows lower inference costs on 128K-long contexts, particularly during the decoding phase [12].

Technical Innovations
- DSA introduces a fine-grained sparse attention mechanism, yielding significant gains in processing efficiency [6][7].
- DeepSeek has open-sourced GPU operators in both TileLang and CUDA versions, facilitating research and development [13][15].
- The company recommends the TileLang version for debugging and rapid iteration during experimental research [16].

Community Engagement
- The announcement invites the community to try the new model and take advantage of the promotional pricing [18].
- Links to the model on platforms such as HuggingFace and ModelScope are provided [19].
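DSA's exact formulation is in DeepSeek's technical report; as a generic illustration of fine-grained sparse attention (not DeepSeek's actual operator), here is a top-k variant in which each query keeps only its k highest-scoring keys, so the softmax and value reduction touch k keys instead of the full context:

```python
import numpy as np

def topk_sparse_attention(Q, K, V, k=4):
    # Generic top-k sparse attention sketch (NOT DeepSeek's actual DSA):
    # each query masks out all but its k largest attention logits, so only
    # k keys contribute to the output instead of the whole sequence.
    scores = Q @ K.T / np.sqrt(Q.shape[-1])            # (n_q, n_k) logits
    kth = np.sort(scores, axis=-1)[:, -k][:, None]     # per-query k-th largest
    masked = np.where(scores >= kth, scores, -np.inf)  # drop all but top-k
    weights = np.exp(masked - masked.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)     # softmax over survivors
    return weights @ V


rng = np.random.default_rng(0)
Q = rng.normal(size=(8, 16))
K = rng.normal(size=(32, 16))
V = rng.normal(size=(32, 16))
out = topk_sparse_attention(Q, K, V, k=4)
# with k << sequence length, the value reduction is what saves compute at 128K
```

The efficiency win grows with context length: at 128K tokens, reducing each query's active key set to a small fraction is what cuts decoding cost.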
Billion-scale parameters, hundred-billion-scale performance: Shanghai AI Lab releases a new-generation document-parsing model with accuracy on complex scenarios rivaling human experts
量子位· 2025-09-29 10:44
Contributed by the MinerU2.5 team | QbitAI (WeChat official account QbitAI)

Models keep getting larger, with parameter counts routinely in the hundreds of billions, yet delivering both high accuracy and high efficiency in real-world scenarios remains difficult.

| Model Type | Models | Slides | Academic Papers | Book | Textbook | Exam Papers | Magazine | Newspaper | Notes | Financial Report |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Pipeline | Marker-1.8.2 [32] | 0.1796 | 0.0412 | 0.1010 | 0.2908 | 0.2958 | 0.1111 | 0.2717 | 0.4656 | 0.0341 |
| Pipeline | MinerU2-pipeline [46] | 0.4244 | 0.0230 | 0.2628 | 0.1224 | 0.0822 | 0.395 | 0.0736 | 0.2603 | ... |
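The per-document-type scores in the table read as normalized edit distances between parsed and reference output (lower is better); that interpretation is an assumption, since the excerpt does not name the metric. A minimal implementation of such a metric:

```python
# Normalized Levenshtein edit distance: edits needed to turn `pred` into
# `ref`, divided by the longer length, giving a score in [0, 1] where
# lower is better. (Whether the table uses exactly this metric is an
# assumption, not stated in the excerpt.)
def normalized_edit_distance(pred: str, ref: str) -> float:
    m, n = len(pred), len(ref)
    dp = list(range(n + 1))  # dp[j] = distance(pred[:i], ref[:j]), rolled row
    for i in range(1, m + 1):
        prev, dp[0] = dp[0], i
        for j in range(1, n + 1):
            cur = min(dp[j] + 1,                              # deletion
                      dp[j - 1] + 1,                          # insertion
                      prev + (pred[i - 1] != ref[j - 1]))     # substitution
            prev, dp[j] = dp[j], cur
    return dp[n] / max(m, n, 1)


print(normalized_edit_distance("Tabel 1", "Table 1"))  # 2 edits / 7 chars ≈ 0.286
```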
New feed-forward 3D Gaussian splatting method: Zhejiang University team proposes "voxel alignment" to fuse multi-view 2D information directly in 3D space
量子位· 2025-09-29 04:57
Core Viewpoint
- The article discusses the rapid industrialization of feed-forward 3D Gaussian Splatting (3DGS) and introduces VolSplat, which abandons the traditional pixel-aligned strategy in favor of a voxel-aligned framework, improving robustness, efficiency, and engineering feasibility in multi-view rendering [1][2].

Summary by Sections

Introduction to VolSplat
- VolSplat addresses the limitations of existing pixel-aligned methods, which struggle to align 2D features precisely in 3D space and are constrained by the pixel grid when allocating Gaussian density [2][6].

Performance Comparison
- Experiments on public datasets such as RealEstate10K and ScanNet show that VolSplat outperforms various pixel-aligned baselines in visual quality and geometric consistency [4][5].

Core Concepts of VolSplat
- The core idea is to shift alignment from 2D to 3D, allowing better integration of multi-view information and overcoming challenges in multi-view consistency and Gaussian density allocation [6][9].

Methodology Breakdown
- The VolSplat pipeline consists of three modules: (1) 2D feature extraction and depth estimation; (2) lifting pixels to voxels and feature aggregation; (3) sparse 3D refinement and Gaussian regression [9][11].

Step-by-Step Process
- Step 1: 2D features are extracted with a shared encoder, and depth maps are constructed to provide the geometric priors needed downstream [11].
- Step 2: Pixels are projected into 3D space using the predicted depths, producing a point cloud that is voxelized for feature aggregation, which enforces cross-view consistency [12][13].
- Step 3: A sparse 3D U-Net refines the voxel features, predicting corrections for each voxel and regressing the Gaussian parameters used for rendering [14].

Experimental Highlights
- VolSplat demonstrates strong zero-shot generalization across datasets, maintaining high performance even on unseen data, with a PSNR of 32.65 dB on the ACID dataset [15][17].

Practical Implications
- The advances in VolSplat yield fewer artifacts and better geometric fidelity, translating into improved user experiences in applications such as virtual tours and indoor navigation [17][19].

Future Directions
- VolSplat opens new avenues for research in 3D reconstruction, robotics, autonomous driving, and AR/VR, providing a unified framework for integrating multimodal data [19][20].
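The voxel-aligned fusion step can be sketched in a few lines (invented shapes and values, not the paper's code): back-projected points that land in the same voxel have their features averaged, which is how information from multiple views gets merged consistently.

```python
import numpy as np

# Schematic voxel-aligned feature fusion (NOT the paper's code): 3D points
# back-projected from all views are snapped to a voxel grid, and features of
# points sharing a voxel are averaged (scatter-mean), fusing multi-view info.
def voxelize_mean(points: np.ndarray, feats: np.ndarray, voxel_size: float):
    idx = np.floor(points / voxel_size).astype(np.int64)        # voxel coords
    uniq, inverse = np.unique(idx, axis=0, return_inverse=True)
    summed = np.zeros((len(uniq), feats.shape[1]))
    np.add.at(summed, inverse, feats)                           # scatter-add
    counts = np.bincount(inverse, minlength=len(uniq))[:, None]
    return uniq, summed / counts                                # per-voxel mean


# Two points from different views land in the same voxel; a third does not.
pts = np.array([[0.1, 0.1, 0.1], [0.2, 0.15, 0.05], [1.3, 0.1, 0.1]])
fts = np.array([[1.0, 0.0], [0.0, 1.0], [2.0, 2.0]])
voxels, vfeats = voxelize_mean(pts, fts, voxel_size=0.5)
print(len(voxels), vfeats[0])  # 2 voxels; shared voxel's feature is [0.5, 0.5]
```

In the actual method the averaged voxel features then feed the sparse 3D U-Net, which predicts per-voxel corrections and Gaussian parameters.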
Huawei Pangu 718B model's latest results: second among open-source models
量子位· 2025-09-29 04:57
Core Viewpoint
- Huawei has emerged as a strong competitor in the AI model landscape, as highlighted by its performance in the latest SuperCLUE benchmark evaluation, which assesses models across multiple dimensions [1][2].

Group 1: Model Rankings and Performance
- The top three models in the SuperCLUE evaluation among open-source and domestic entries are: (1) DeepSeek-V3.1-Terminus-Thinking; (2) openPangu-Ultra-MoE-718B; (3) Qwen3-235B-A22B-Thinking-2507 [5].
- Huawei's openPangu-Ultra-MoE-718B model, with 718 billion parameters, stands out for a training philosophy that emphasizes data quality over sheer volume [6][35].

Group 2: Data Quality and Training Strategy
- The openPangu team follows three core principles in constructing post-training data: quality first, diversity coverage, and complexity adaptation [10][21].
- A comprehensive framework for data generation, scientific selection, and precise augmentation ensures high data quality, which is crucial for improving the model's reasoning in complex scenarios [13][35].

Group 3: Pre-training and Optimization Techniques
- Pre-training of openPangu-718B is divided into three stages: General, Reasoning, and Annealing, each targeting a different aspect of knowledge and reasoning [15][35].
- The model employs a "critique internalization" mechanism to mitigate hallucinations, allowing it to evaluate its own reasoning process and improve output reliability [19][22].

Group 4: Tool Usage and Agent Capabilities
- To strengthen tool use, the team developed the ToolACE framework, which generates high-quality, complex multi-tool interaction data for training [23][26].
- Training also includes a three-step post-training fine-tuning scheme that balances underfitting against overfitting [27][29].

Group 5: Technical Innovations and Industry Implications
- Systematic technical innovations across the training stages underpin openPangu-718B's strong performance, offering the industry a useful example of the value of meticulous technical refinement and deep insight into core challenges [35].
Flash Attention author's latest podcast: Nvidia's GPU dominance will end within three years
量子位· 2025-09-29 04:57
Group 1
- The core argument is that Nvidia's dominance of the GPU market will face increasing competition within the next 2-3 years as specialized chips for different workloads emerge, leading to a more diversified ecosystem [6][9][23].
- Tri Dao emphasizes that AI model architecture, particularly the Transformer, is stabilizing, but chip design and workload adaptation still present ongoing changes and challenges [11][12][21].
- Future AI workloads will fall into three main types: traditional chatbots, ultra-low-latency scenarios, and large-scale batch processing, each requiring tailored optimizations from hardware vendors [24][96].

Group 2
- The cost of inference has fallen by roughly 100x since the launch of ChatGPT, driven by improvements in model efficiency and inference-optimization techniques [73][75][90].
- Techniques such as model quantization and co-design of model architecture and hardware have contributed significantly to this cost reduction [82][84][88].
- An estimated further 10x improvement in inference optimization remains possible, particularly through specialized hardware and model advances [90][93][95].

Group 3
- The AI hardware landscape is expected to diversify as companies such as Cerebras, Groq, and SambaNova introduce solutions emphasizing low-latency inference and high throughput for various applications [23][24][96].
- The emergence of specialized AI inference providers will produce different trade-offs, with some aiming for broad coverage and others for excellence in specific scenarios [96][97].
- Evolving AI workloads will keep driving demand for innovative solutions, particularly real-time video generation and agentic applications that must integrate seamlessly with human tools [115][117][120].
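One of the levers behind the ~100x cost drop is quantization: storing weights in 8 bits instead of 32 cuts memory traffic roughly 4x at the price of a small, bounded rounding error. A toy sketch of symmetric int8 quantization (generic technique, not any specific vendor's scheme):

```python
import numpy as np

# Toy symmetric int8 weight quantization: map floats to [-127, 127] with a
# single per-tensor scale. Memory shrinks 4x vs float32; the rounding error
# of each weight is bounded by half a quantization step (scale / 2).
def quantize_int8(w: np.ndarray):
    scale = np.abs(w).max() / 127.0
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale


def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    return q.astype(np.float32) * scale


rng = np.random.default_rng(0)
w = rng.normal(scale=0.02, size=4096).astype(np.float32)
q, s = quantize_int8(w)
print("bytes fp32:", w.nbytes, "bytes int8:", q.nbytes)  # 16384 vs 4096
print("error bounded by half a step:",
      float(np.abs(dequantize(q, s) - w).max()) <= s / 2 + 1e-8)
```

Production schemes add per-channel or per-block scales and activation quantization, but the memory-versus-precision trade is the same.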
8.9 ms, a new inference-speed record: one yuan per million tokens, as Inspur Information's AI servers accelerate the industrialization of AI agents
量子位· 2025-09-29 04:57
Core Viewpoint
- The article discusses Inspur Information's advances in AI computing infrastructure, specifically the Meta-Brain HC1000 and SD200 servers, which significantly cut AI inference costs and improve processing speed, addressing key obstacles to the commercialization of AI agents [2][43].

Group 1: Speed and Cost Reduction
- The Meta-Brain HC1000 server cuts the cost of generating one million tokens to just 1 yuan, with a 60% reduction in single-card cost and a 50% reduction in system cost [26][27].
- The Meta-Brain SD200 server achieves end-to-end inference latency under 10 milliseconds, with a per-token output time of only 8.9 milliseconds, nearly doubling the performance of previous state-of-the-art systems [10][12].
- Together, the two servers provide the high-speed, low-cost computational infrastructure needed for large-scale deployment of multi-agent collaboration and complex task inference [8][43].

Group 2: Technological Innovations
- The Meta-Brain SD200 employs an innovative multi-host 3D mesh architecture that pools GPU resources across hosts, significantly increasing memory capacity while reducing communication latency [19][21].
- Its communication protocol is simplified to three layers, allowing GPUs to access remote memory directly and pushing latency down to the nanosecond level [21][22].
- The HC1000 optimizes inference by decoupling the different computational stages, improving resource utilization and reducing power consumption [39][40].

Group 3: Market Implications
- Demand for tokens in AI applications is surging: token consumption for programming assistance has grown 50-fold over the past year, and a deployed agent now costs an average of $5,000 per month [30][31].
- As task complexity and frequency increase, token cost will become the bottleneck for large-scale deployment unless it is reduced significantly [34][35].
- Meeting the computational demands of the AI-agent era requires a shift from general-purpose computing architectures to specialized AI computing systems [46][50].
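Taking the headline figures at face value, the arithmetic is straightforward (the 5-million-tokens-per-day agent below is an invented example, not from the article):

```python
# Sanity-checking the headline numbers: 8.9 ms per output token, and
# 1 yuan per million generated tokens.
ms_per_token = 8.9
tokens_per_second = 1000 / ms_per_token
print(f"{tokens_per_second:.0f} tokens/s per stream")  # ~112 tokens/s

yuan_per_million = 1.0
# Hypothetical agent generating 5 million tokens a day, 30 days a month:
monthly_tokens = 5_000_000 * 30
monthly_cost = monthly_tokens / 1_000_000 * yuan_per_million
print(f"{monthly_cost:.0f} yuan/month")  # 150 yuan/month
```

At that price even a token-hungry agent stays cheap, which is the article's point about token cost ceasing to be the deployment bottleneck.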
GPT-5 provides a key idea for quantum computing; a leading researcher raves: it delivered the "decisive blow" in under half an hour
量子位· 2025-09-29 03:46
Core Viewpoint
- GPT-5's capabilities may be underestimated, particularly in assisting with hard quantum-computing problems, as demonstrated by the critical insight it contributed during a recent research collaboration [1][20][26].

Group 1: GPT-5's Role in Quantum Research
- Scott Aaronson, a prominent figure in quantum computing, said the insight GPT-5 provided was impressive enough to have come from a highly intelligent student [2][3].
- In a recent collaboration, GPT-5 contributed significantly to a paper titled "Limits to black-box amplification in QMA," which explores the limits of amplification techniques for quantum complexity classes [4][5].
- The research involved analyzing how the maximum eigenvalue of a Hermitian matrix changes with a parameter, a step GPT-5 helped expedite, leading to a breakthrough [22][25].

Group 2: The Quantum Complexity Class QMA
- QMA (Quantum Merlin Arthur) is a complexity class describing a verification process in which a verifier (Arthur) checks the validity of a quantum state supplied by a prover (Merlin) [9][10].
- A long-standing question about QMA is whether its completeness can be boosted from 2/3 to 1, i.e., whether a verifier can be made to accept a correct proof with certainty [10][12].
- Recent findings show that any QMA protocol can be amplified to achieve exponentially small completeness error, pointing to significant advances in quantum computing [15][19].

Group 3: Industry Reactions and Developments
- The collaboration between the researchers and GPT-5 has sparked discussion about the changing dynamics of research and AI's role in scientific discovery [27][28].
- There are also concerns about OpenAI's recent model downgrades, which have caused user dissatisfaction and calls for transparency about model usage [30][31].
- OpenAI responded that the model switching is part of a "safety routing test" intended to handle sensitive topics more rigorously [31].
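The generic mechanism behind amplification (repetition with a majority vote, not the paper's black-box construction) can be checked numerically: repeating a verifier with a 2/3-vs-1/3 gap t times and taking the majority drives the error down exponentially in t.

```python
from math import comb

# Standard amplification-by-repetition sketch (the textbook technique, not the
# paper's construction): run a verifier with success probability 2/3 on t
# independent proof copies and accept by majority vote; the probability that
# the majority is wrong decays exponentially in t (Chernoff bound).
def majority_error(p_correct: float, t: int) -> float:
    # Probability that at most half of t independent runs are correct (t odd).
    return sum(comb(t, k) * p_correct**k * (1 - p_correct)**(t - k)
               for k in range(0, t // 2 + 1))


for t in (5, 25, 101):
    print(t, f"{majority_error(2/3, t):.2e}")
# error drops from ~0.21 at t=5 toward zero exponentially as t grows
```

The open question the paper addresses is stronger than this: whether the *completeness* side can be driven not just exponentially small but, in a black-box way, all the way to zero.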