Moore Threads Is No Longer Hiding Its Ambition
量子位· 2025-12-21 14:13
Core Viewpoint
- The article highlights the significant advancements made by Moore Threads in the GPU sector, particularly the launch of the MUSA architecture and its associated products, which aim to strengthen the developer ecosystem and position domestic GPUs at a competitive level in the global market [1][4][19].

Group 1: MUSA Architecture and Innovations
- MUSA stands for Meta-computing Unified System Architecture, a comprehensive framework encompassing chip architecture, instruction sets, programming models, and software libraries [6][7].
- The latest GPU architecture, Huagang, delivers a 50% increase in density and a 10-fold improvement in efficiency, with three new chips targeting AI training, graphics rendering, and intelligent SoCs [8][10].
- The MUSA architecture has been developed iteratively over five years, culminating in the latest version, which optimizes low-precision computing for AI applications [11][13].

Group 2: New Product Launches
- Moore Threads introduced three new chips (Huashan, Lushan, and Yangtze), two hardware products (AIBOOK and AICube), and the KUAE 2.0 AI Foundry cluster [20][21].
- The Huashan chip targets AI training and high-performance computing, supporting the full precision range from FP4 to FP64 and significantly improving Transformer throughput [22][25][27].
- The Lushan chip focuses on graphics computing, achieving a 64-fold increase in AI performance and a 15-fold improvement in AAA game rendering performance [28][30][31].
- The Yangtze chip is designed for edge computing, providing 50 TOPS of heterogeneous AI computing power for a variety of applications [32][34].

Group 3: Software Ecosystem and Developer Engagement
- The MUSA software stack 5.0 was launched, offering a complete toolchain from compilers to AI frameworks, with plans to open-source key components to foster community engagement [15][16].
- Moore Threads aims to build a robust developer ecosystem through the Moore Academy, targeting a community of 1 million developers by 2025 [59][61].
- The company emphasizes that a comprehensive ecosystem integrating software, hardware, and developer trust is what creates a sustainable competitive advantage in the GPU market [56][58].
New Breakthrough in AI-Generated Operating Systems! Shanghai Jiao Tong University Proposes a New Paradigm for File System Development: Just Write the Specification
量子位· 2025-12-21 14:13
Core Insights
- The article discusses advancements in operating system (OS) development, particularly the introduction of a new framework called SysSpec, which aims to automate the generation and evolution of OS components [3][29].

Group 1: Operating System Challenges
- Operating systems are foundational to the digital world, managing hardware resources and providing a stable environment for applications [4].
- The continual evolution of hardware and applications forces constant OS updates, leading to high maintenance costs and low development efficiency [5][6].
- A large share of code contributions (82.4%) in the Linux Ext4 file system goes to bug fixes and maintenance, with only 5.1% devoted to new features [5].

Group 2: Generative Operating Systems
- The concept of a "generative operating system" is proposed, in which large models autonomously create OS components from user specifications [8].
- However, existing large models struggle with the complexity of OS development, often producing code that fails to run correctly [10][11].

Group 3: SysSpec Framework
- The SysSpec framework guides large models through OS design with precise specifications rather than vague natural language prompts [13][14].
- It uses formal methods to impose strict semantic constraints on programs, aiming to ensure that generated code is free of bugs [15][16].
- SysSpec defines three types of specifications (functional, modularity, and concurrency) that separate concerns and improve the generation process [18][19][20].

Group 4: Toolchain and Evolution
- The SysSpec toolchain consists of three components (SpecCompiler, SpecValidator, and SpecAssistant) that convert specifications into code and validate the generated output [23][24].
- A new method for system evolution, called DAG-Structured Spec Patch, lets developers modify specifications rather than code, streamlining the update process [25][26].

Group 5: SpecFS Implementation
- SpecFS, a file system built with the SysSpec framework, can automatically generate a C-based file system and evolve according to user-defined specifications [29].
- The generated SpecFS code, approximately 4,300 lines, ranks 42nd among the 82 file systems in the Linux 6.1.10 kernel [30].
- The framework has demonstrated significant efficiency gains, with development speed increasing 3-5x compared to manual coding [32].

Group 6: Future of OS Development
- The advances in SysSpec and SpecFS point to a future of OS development in which programmers focus on system design rather than low-level coding [33].
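To make the "write specifications, not code" idea concrete: a functional specification constrains what a correct implementation must do, without saying how. The article does not show SysSpec's actual specification language, so the sketch below is only an illustrative analogue in Python; all names (`FsState`, `create_spec`) are hypothetical, modeling a pre/postcondition spec for a file-creation operation, including a frame condition that nothing else may change.

```python
# Hypothetical illustration of a functional spec, NOT SysSpec's real notation.
from dataclasses import dataclass, field


@dataclass
class FsState:
    files: dict = field(default_factory=dict)  # filename -> contents


def create_spec(pre: FsState, post: FsState, name: str) -> bool:
    """Holds iff `post` is a valid result of creating empty file `name` in `pre`."""
    return (
        name not in pre.files                 # precondition: file must not exist yet
        and name in post.files                # postcondition: file now exists
        and post.files[name] == b""           # ... and starts out empty
        # frame condition: every other file is untouched
        and {k: v for k, v in post.files.items() if k != name} == pre.files
    )


pre = FsState(files={"a.txt": b"hi"})
post = FsState(files={"a.txt": b"hi", "b.txt": b""})
print(create_spec(pre, post, "b.txt"))  # True: this transition satisfies the spec
```

A generator (here, a large model) is free to produce any implementation, and a validator only needs to check transitions against such predicates, which is what makes machine-checked generation and evolution tractable.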
SGLang Natively Supports Ascend: Launch New Models with One Click, No Code Changes Needed
量子位· 2025-12-21 14:13
Core Insights
- The article discusses the growing focus on whether inference systems can handle real-world loads as agents accelerate on the application side [1][4].
- The SGLang AI financial meetup highlighted engineering challenges in inference systems, including high-concurrency requests, long context windows, multi-turn reasoning, memory management, and generation consistency in financial agent scenarios [4][9].

Group 1: Inference System Engineering Solutions
- The SGLang event, co-hosted with AtomGit, focused on large model inference architecture, agents, reinforcement learning, and their application in finance [7].
- Key participants included engineering teams working on inference systems, models, and computing power, underscoring that agents place higher demands than traditional LLM workloads on high concurrency, long context windows, multi-turn reasoning, and memory management [8].
- Specific deployment scenarios, such as financial agents, impose stricter requirements on low latency, response stability, consistency, and cost control [9].

Group 2: Technical Innovations and Implementations
- SGLang introduced the HiCache system to address KV cache redundancy and high memory demand in high-concurrency, long-context scenarios, significantly reducing memory usage while improving inference stability and throughput [11].
- For hybrid models such as Qwen3-Next and Kimi Linear, SGLang implemented a Mamba Radix Tree for unified prefix management and an Elastic Memory Pool for efficient inference and memory optimization under long contexts and high concurrency [13].
- The Mooncake system, built on Transfer Engine, sharply reduced weight loading and model startup times, completing weight update preparation in under 20 seconds and cutting cold starts from 85 seconds to 9 seconds [17].

Group 3: Collaboration with the Ascend Platform
- These inference-system capabilities are not tied to a single computing platform: HiCache, Mooncake, and GLM can run directly on the Ascend platform, signaling a shift in Ascend's role in the inference system ecosystem [24][25].
- SGLang's latest work on Ascend includes model adaptation, performance optimization, and modular acceleration, achieving a throughput of 15 TPS per card for DeepSeek V3.2 under specific conditions [29].
- System-level optimizations included load balancing, operator fusion to reduce memory access, and multi-stream parallel execution to improve resource utilization [30][31].

Group 4: Future Directions and Open Source Commitment
- Ascend's collaboration with SGLang aims to fully embrace open source and accelerate ecosystem development, having completed gray-release testing of DeepSeek V3.2 in real business scenarios [46].
- Future work will focus on systematic engineering investment in inference systems, raising throughput for high-concurrency, low-latency workloads, and aligning with open-source engines for model deployment and performance tuning [47].
- Integrating models, inference engines, and computing platforms into a stable collaborative framework shifts the question from whether a model can run to whether the system can run sustainably and at scale [47].
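The prefix-management idea behind radix-tree KV caching can be sketched in a few lines: requests that share a token prefix reuse the cached entries for that prefix, and only the unmatched suffix needs fresh computation. This is a minimal toy model of the concept, not SGLang's actual HiCache or Mamba Radix Tree implementation; the class names are invented for illustration.

```python
# Toy sketch of radix-tree prefix caching (NOT SGLang's real implementation).
class RadixNode:
    def __init__(self):
        self.children = {}   # token -> RadixNode
        self.cached = False  # True if KV entries for this prefix are resident


class PrefixCache:
    def __init__(self):
        self.root = RadixNode()

    def insert(self, tokens):
        """Record that KV cache now exists for every prefix of `tokens`."""
        node = self.root
        for t in tokens:
            node = node.children.setdefault(t, RadixNode())
            node.cached = True

    def match(self, tokens):
        """Return the length of the longest cached prefix of `tokens`."""
        node, n = self.root, 0
        for t in tokens:
            nxt = node.children.get(t)
            if nxt is None or not nxt.cached:
                break
            node, n = nxt, n + 1
        return n


cache = PrefixCache()
cache.insert([1, 2, 3, 4])          # first request: computed and cached
hit = cache.match([1, 2, 3, 9, 9])  # second request shares a 3-token prefix
print(hit)  # 3 -> only tokens after position 3 need recomputation
```

In multi-turn agent workloads the shared prefix (system prompt plus conversation history) dominates the sequence, which is why this kind of reuse cuts both memory pressure and prefill latency.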
Quantum Bit (量子位) Is Hiring Editors and Writers
量子位· 2025-12-21 14:13
Core Viewpoint
- The article highlights the ongoing AI boom and invites readers to join "Quantum Bit," which tracks AI advancements and has established itself as a leading content platform in the industry [1].

Group 1: Job Opportunities
- The company is hiring in three main directions: AI Industry, AI Finance, and AI Product, with positions open to both experienced professionals and fresh graduates [2][4].
- Roles span multiple levels, including editors, lead writers, and chief editors, with an emphasis on matching roles to individual capabilities [6].

Group 2: Job Responsibilities
- AI Industry: tracking innovations in infrastructure such as chips, AI infrastructure, and cloud computing, and producing accessible reports on technical conferences and papers [6][7].
- AI Finance: covering venture capital, financial reports, and capital movements within the AI industry, including interviews with investors and entrepreneurs [11].
- AI Product: monitoring AI applications and hardware developments, writing in-depth product evaluations, and engaging with product experts [11].

Group 3: Benefits and Work Environment
- Employees will engage with cutting-edge AI technologies, improve their work efficiency with new tools, and build personal influence in the AI field [6].
- The company offers competitive salaries and comprehensive benefits, including social insurance, meal allowances, and performance bonuses, along with a dynamic and open work culture [6].

Group 4: Company Growth and Reach
- By 2025, Quantum Bit aims to have over 2.4 million subscribers on WeChat and more than 7 million users across platforms, with daily reading volume exceeding 2 million [12].
- Third-party data platforms rank the company as the top new media outlet in the AI and frontier technology sectors [12].
LeCun's Pre-Departure Criticism Pulls No Punches
量子位· 2025-12-21 05:45
Core Viewpoint
- LeCun expresses skepticism that large language models (LLMs) can ever achieve artificial general intelligence (AGI), arguing that the path to superintelligence through LLMs is fundamentally flawed [2][78].

Group 1: Departure from Meta
- LeCun is leaving Meta after nearly 12 years, criticizing the company's increasingly closed approach to research and its focus on short-term projects [3][11][26].
- He plans to found a new company, Advanced Machine Intelligence (AMI), which will prioritize open research and focus on world models [10][19].

Group 2: World Models vs. LLMs
- LeCun holds that world models, which handle high-dimensional, continuous data, differ fundamentally from LLMs, which excel at discrete text data [28][29].
- He argues that text data alone will never bring AI to human-level intelligence, because real-world data is far more complex than text [31][32].

Group 3: Research Philosophy
- LeCun stresses the importance of open research and publication, stating that results that are not shared cannot be validated [15][17].
- He criticizes Meta's shift toward short-term projects, arguing that true breakthroughs require long-term, open-ended research [18][26].

Group 4: Future of AI
- LeCun envisions that world models and planning capabilities could drive major advances in AI, but that human-level intelligence will require substantial foundational work and theoretical innovation [84][85].
- He asserts that the hardest part of AI development is not matching human intelligence but first reaching the intelligence of a dog, which demands a deep understanding of foundational theory [88][89].

Group 5: Personal Mission
- At 65, LeCun remains committed to amplifying human intelligence, which he views as the scarcest resource and a key driver of societal progress [92][94].
- Reflecting on his career, he expresses a desire to keep contributing to the field and emphasizes open collaboration as essential to scientific advancement [103].
自变量's Wang Qian: Embodied Intelligence Is an Independent Foundation Model for the Physical World | MEET2026
量子位· 2025-12-21 05:45
Core Viewpoint
- The embodied intelligence model is positioned as an independent foundational model, parallel to language and multimodal models, designed specifically for the physical world [6][12][61].

Group 1: Differences Between Physical and Virtual Worlds
- The physical and virtual worlds differ fundamentally: the physical world is characterized by continuity, randomness, and processes involving force, contact, and timing [2][10].
- Existing models built on language and visual paradigms are structurally misaligned with the complexities of the physical world [3][21].

Group 2: Need for a Separate Foundational Model
- A separate foundational model is necessary because of the significant randomness of the physical world, which existing models struggle to represent accurately [10][17].
- The current reliance on multimodal models for embodied intelligence is seen as inadequate, requiring a complete rethinking of model architecture and training methods [9][21].

Group 3: Future of Multimodal Models
- Shifting perspectives on embodied intelligence will yield new insights into model architecture and data utilization [24][30].
- Learning in the physical world differs fundamentally from learning in the virtual world, so future multimodal models must adapt to these differences [25][28].

Group 4: Scaling Laws and Data Utilization
- Scaling laws remain central to developing large models, but in robotics, sourcing data is a major challenge [47][49].
- A phased approach to training and data collection is recommended, emphasizing the importance of real-world data for effective learning [52][53].

Group 5: Hardware and AI Integration
- A new learning paradigm calls for redesigning hardware for the physical world, with AI defining the hardware rather than the other way around [54][55].
- Embodied intelligence has the potential to drive exponential growth in resources and capabilities, drawing parallels to historical industrial advancements [60][61].
Why This Google Paper Is Called "Attention Is All You Need" V2
量子位· 2025-12-21 05:45
Core Insights
- The article discusses a Google research paper titled "Nested Learning: The Illusion of Deep Learning Architectures," which is being called "Attention Is All You Need" V2 for offering a new perspective on how AI learns [1][5].

Group 1: AI Limitations
- Current large language models (LLMs) suffer from what the paper terms "digital amnesia": they forget newly learned information shortly after it is taught [2][3].
- The industry has bet on making models deeper and larger, expecting scale to yield emergent memory capabilities, but this approach has significant limitations [3][4].

Group 2: Nested Learning Paradigm
- The paper introduces "nested learning," which posits that effective learning requires two orthogonal dimensions: depth (model layers and capacity) and frequency (the rhythm and speed at which internal components update) [9][10].
- It argues that mainstream optimizers, traditionally seen as mere training engines, actually function as associative memory systems that continuously record gradient changes [6].

Group 3: HOPE Architecture
- The proposed architecture, named HOPE, features a continuous memory system with multiple MLP modules arranged like a spectrum, each updating at a different frequency [14].
- This design mimics the human brain's memory processes, allowing new knowledge to be integrated without systemic collapse or forgetting [16][17].

Group 4: Future Implications
- The value of nested learning lies not in immediately replacing architectures like the Transformer but in providing a new design logic and framework for AI development [18].
- Research into memory and learning remains early-stage, suggesting future AI will need systems that learn and evolve rather than serve as static repositories of knowledge [18].
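The "frequency" dimension described above can be made concrete with a toy example: several memory modules see the same input stream but update at different rates, so fast modules track recent context while slow modules consolidate longer-term statistics. This is only a hedged sketch of the concept, not the paper's HOPE implementation; the class name and the exponential-moving-average "learning rule" are invented for illustration.

```python
# Toy illustration of multi-frequency updates (NOT the paper's HOPE code).
class MemoryModule:
    def __init__(self, period):
        self.period = period   # update once every `period` steps
        self.state = 0.0
        self.updates = 0

    def maybe_update(self, step, signal):
        if step % self.period == 0:
            # invented learning rule: exponential moving average of the signal
            self.state = 0.9 * self.state + 0.1 * signal
            self.updates += 1


modules = [MemoryModule(period=p) for p in (1, 4, 16)]  # a small "spectrum"
for step in range(16):
    for m in modules:
        m.maybe_update(step, float(step))

print([m.updates for m in modules])  # [16, 4, 1]: fast modules update most often
```

The spectrum of update rates is what lets new information land in fast components without overwriting the slow components, which is the mechanism the article credits for avoiding catastrophic forgetting.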
Making Large Models Stop Overthinking! Shanghai AI Lab's New Post-Training Paradigm Reshapes CoT for Fast, Accurate Reasoning
量子位· 2025-12-21 02:00
Core Viewpoint
- The article introduces a new post-training paradigm called RePro (Rectifying Process-level Reward), aimed at improving the reasoning efficiency of large language models (LLMs) by curbing "overthinking" during inference [2][30].

Group 1: RePro Overview
- RePro views reasoning as an optimization of the model's internal state, offering a fresh perspective on reshaping the Chain-of-Thought (CoT) of large models [3].
- Its core idea is to treat the model's reasoning trajectory as a path toward the optimal solution on a loss surface [3].

Group 2: Correction Mechanisms
- RePro incorporates a process-level reward mechanism directly into reinforcement learning with verifiable rewards (RLVR) pipelines such as PPO and GRPO [4].
- It defines a computable objective function J that quantifies the model's confidence in its current reasoning context, with higher values indicating greater confidence that the answer is correct [5][6].

Group 3: Reasoning Quality Assessment
- RePro introduces a dual scoring mechanism that evaluates reasoning quality from the growth and smoothness of the objective function J [10].
- The Magnitude Score measures the improvement in the objective function, while the Stability Score assesses whether the reasoning proceeds smoothly or is filled with hesitation [11][13].

Group 4: Integration into RL Training
- RePro uses an entropy-based filtering strategy to reduce computational cost: the reasoning chain is segmented into logical paragraphs and only the top-k segments are selected for reward calculation [18][20].
- The process-level reward is computed from the improvement in the process score and combined with final-answer correctness to form the advantage input for reinforcement learning [21][22].

Group 5: Experimental Results
- RePro has been tested across various tasks, showing stable accuracy improvements under different RL algorithms, including PPO and GRPO [23].
- The model generates significantly fewer tokens on average during reasoning, indicating a more efficient inference process [25][27].
- Backtracking behavior during reasoning is significantly reduced, reflecting improved logical flow in the model's thought process [28].
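The dual scoring idea can be illustrated with a toy calculation over a trace of J values. The article gives no formulas, so the definitions below are hypothetical stand-ins: the magnitude score as net growth of J over the trace, and the stability score as the fraction of steps on which J does not decrease (so oscillating, hesitant reasoning scores lower than smooth, monotone reasoning).

```python
# Hypothetical score definitions, NOT RePro's actual formulas.
def magnitude_score(J):
    """Net improvement in confidence from start to end of the trace."""
    return J[-1] - J[0]


def stability_score(J):
    """Smoothness proxy: fraction of steps that do not decrease J."""
    steps = [J[i + 1] - J[i] for i in range(len(J) - 1)]
    return sum(d >= 0 for d in steps) / len(steps)


smooth = [0.1, 0.3, 0.5, 0.7, 0.9]    # confident, monotone reasoning
hesitant = [0.1, 0.6, 0.2, 0.8, 0.3]  # oscillating, backtracking reasoning

print(round(magnitude_score(smooth), 2))  # 0.8
print(stability_score(smooth))            # 1.0
print(stability_score(hesitant))          # 0.5
```

Under any scoring of this flavor, a chain that wanders and backtracks is penalized even if it eventually reaches the right answer, which is how a process-level reward can push the model toward shorter, more direct reasoning.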
Cook Promotes a Fudan Alumnus to Lead Apple's Foundation Models! Raises Stem the Bleeding After Ruoming Pang's Departure, and Ex-Googlers Make Up Half the Team
量子位· 2025-12-21 02:00
Core Viewpoint
- Leadership of Apple's AI foundation model team transitioned swiftly and relatively quietly after Ruoming Pang departed for Meta, with Zhifeng Chen taking the reins [1][2].

Group 1: Leadership Transition
- Zhifeng Chen, who spent nearly 20 years at Google, now leads Apple's foundation model team, managing more than 20 direct reports [8][14].
- Chen's familiarity with Apple's model stack, having joined earlier this year, and his extensive Google experience, including contributions to TensorFlow and Gemini, make him a natural fit for the role [16][17].

Group 2: Team Dynamics and Challenges
- After Pang's departure, Apple launched a retention plan for key researchers, including salary increases, to stabilize the team [4].
- Even so, the foundation model team faces challenges: over half of Chen's direct reports come from Google, raising questions about team cohesion and internal identity [24][26].

Group 3: Industry Context and Competition
- While Meta, OpenAI, and Google pursue superintelligence, Apple remains product-oriented, emphasizing practical applications of AI in everyday tasks [35][36].
- This divergence may cause talent retention problems, as some researchers value groundbreaking exploration over product implementation [38][39].

Group 4: Organizational Changes
- In March, Apple restructured its AI reporting lines, removing the Siri team from the oversight of John Giannandrea, a significant figure in AI at Apple, signaling internal dissatisfaction with AI progress [43][44].
- Giannandrea's upcoming move to a consulting role, and the division of his responsibilities among other executives, suggest a shift back toward embedding AI within specific product teams rather than maintaining it as a standalone department [50][56].

Group 5: Competitive Threats
- OpenAI is reportedly recruiting from Apple's hardware and supply chain organizations, a sign that traditionally software-focused companies are encroaching on hardware [58][60].
- This trend poses a significant challenge for Apple, which has long relied on its control of hardware and design to maintain its competitive edge [61][62].
Tsinghua's Sun Maosong: For Industry, Big Tech Can Pursue Scaling; Everyone Else Should Focus on Vertical Applications | MEET2026
量子位· 2025-12-21 02:00
Core Insights
- The rapid development of AI and large models has created a competitive landscape in which companies, driven by fear of missing out (FOMO), feel compelled to invest heavily in scaling their models and capabilities [2][6][40].
- Capability emergence in large models is non-linear, bringing significant uncertainty but also the potential for breakthroughs that exceed expectations [3][15][19].
- The relationship between language, knowledge, and action remains a fundamental challenge for AI, with the goal of achieving a true integration of the three [15][37][38].

Group 1: Development of AI and Large Models
- The AI field has changed dramatically over the past eight years, entering the era of pre-trained models and large models around 2017 [10][11].
- Key milestones include the release of models such as GPT-3 and ChatGPT, which demonstrated remarkable capabilities across a range of tasks [16][24].
- Large models now perform dramatically better on complex tasks, surpassing benchmarks in text, code, and multimodal settings [20][25][26].

Group 2: Challenges and Risks
- The costs of scaling AI models keep rising, raising concerns about the sustainability of such investment [42][43].
- The pursuit of scaling risks diminishing returns, especially if performance begins to plateau [40][41].
- Uncertainty about the limits of scaling laws forces companies to balance the need to invest in AI against the risk of wasted resources [7][68].

Group 3: Strategic Recommendations
- Companies with substantial resources may continue to pursue large-scale development, while the majority should focus on niche applications to minimize risk and maximize potential [60][74].
- The strategy of "致广大而尽精微" (to strive for greatness while paying attention to details) is recommended, emphasizing the importance of vertical applications in AI [63][69].
- New AI algorithms may emerge from specific vertical applications, suggesting that focused, specialized work can also drive broader advances [71][74].