On-Policy Reinforcement Learning

Thinking Machines Lab, which has raised $2 billion, speaks publicly for the first time: cracking LLM randomness to achieve reproducible, "deterministic" inference
锦秋集· 2025-09-11 09:19
Core Insights
- The article discusses the fundamental issue of reproducibility in large language models (LLMs), attributing the uncertainty of inference results not to "concurrent computation plus floating-point error" by itself but to the lack of "batch invariance" in the core computational operators [1][7][11].

Group 1: Problem Identification
- Inference servers dynamically batch user requests, so a request's result depends on the size and composition of the batch it happens to land in, which introduces inherent uncertainty [1][29] (see the matmul sketch after this summary).
- The article challenges the common belief that floating-point non-associativity combined with concurrency is by itself the cause of the uncertainty, arguing that the real issue lies in how the kernels are implemented [20][21].

Group 2: Proposed Solutions
- The authors rewrite the key computational modules of the Transformer (RMSNorm, matrix multiplication, and the attention mechanism) so that they are batch-invariant, making each request's computation independent of batch size [2][34] (a reference RMSNorm sketch follows this summary).
- Experimental results show that with the new kernels, repeated requests yield identical results, whereas previously 1,000 identical requests produced 80 distinct outputs [2][75].

Group 3: Technical Implementation
- The article details the implementation of batch-invariant RMSNorm, matrix multiplication, and attention, emphasizing reduction strategies that do not depend on batch size [34][47][62].
- It highlights the challenges of maintaining batch invariance, particularly in attention, where the reduction order over the KV cache must stay consistent regardless of how many tokens are being processed [66][72].

Group 4: Performance Analysis
- The batch-invariant kernels show roughly a 20% performance loss compared to cuBLAS but remain efficient enough for LLM inference [59][78].
- Although the batch-invariant implementation has not been heavily optimized, it is viable for practical applications [78].

Group 5: Implications for Reinforcement Learning
- Deterministic inference enables true on-policy reinforcement learning (RL) by ensuring that training and inference produce consistent results [79][83].
- Achieving bitwise-identical results between the sampler and the trainer is crucial for effective RL training [80].
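The batch-dependence described in Group 1 can be illustrated with a few lines of PyTorch. This is a sketch rather than code from the article: it assumes a CUDA device, and whether the two results actually differ depends on which kernels the backend's heuristics select for each shape, which is precisely the mechanism at issue.

```python
import torch

torch.manual_seed(0)
A = torch.randn(2048, 2048, device="cuda", dtype=torch.bfloat16)
B = torch.randn(2048, 2048, device="cuda", dtype=torch.bfloat16)

row_alone    = torch.mm(A[:1], B)      # the row computed as a "batch" of 1
row_in_batch = torch.mm(A, B)[:1]      # the same row computed inside a full batch

# Typically nonzero: the kernel (and hence the floating-point reduction
# order) can change with the shape of the batch.
print((row_alone - row_in_batch).abs().max())
```

The same row of `A` is multiplied by the same `B`, yet the result can change bitwise once it is computed as part of a larger batch, because the reduction order inside the kernel changes with the shape.

The batch-invariant RMSNorm of Groups 2-3 can also be sketched in plain PyTorch. The authors' version is a custom kernel; this reference (function name and signature are hypothetical) only illustrates the property being targeted: each row's reduction runs over the hidden dimension alone, in an order fixed by the hidden size, so adding or removing other requests from the batch cannot change that row's result.

```python
import torch

def rmsnorm_reference(x: torch.Tensor, weight: torch.Tensor, eps: float = 1e-6) -> torch.Tensor:
    """Reference RMSNorm: x is [batch, hidden], weight is [hidden]."""
    x32 = x.float()
    # Per-row mean of squares: the reduction never crosses rows, so the
    # strategy cannot depend on how many rows happen to be in the batch.
    ms = x32.pow(2).mean(dim=-1, keepdim=True)
    return (x32 * torch.rsqrt(ms + eps)).to(x.dtype) * weight
```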
First public statement seven months after founding: the ten-billion-dollar unicorn's 10,000-word deep dive on conquering non-determinism in LLM inference
36Ke · 2025-09-11 08:11
Core Insights
- Thinking Machines Lab has broken its public silence by launching a research blog titled "Connectionism" to share advances from its AI research [1][3][4].
- The blog's first article tackles the challenge of achieving reproducible results in large language model (LLM) inference, highlighting the non-deterministic nature of LLM outputs [6][9][20].

Group 1: Product and Research Focus
- The "Connectionism" blog will evolve alongside the company's research, covering topics from numerical computation to prompt engineering [3].
- The name harks back to the connectionist school of early AI research on neural networks, the era of the historic "Connection Machine" [4].

Group 2: Non-Determinism in LLM Inference
- Achieving reproducibility in LLM inference is crucial yet surprisingly hard: identical inputs can yield different outputs even when sampling randomness is removed (e.g., greedy decoding at temperature 0) [9][11].
- The research first examines the widely held hypothesis that ties non-determinism to floating-point arithmetic combined with concurrent execution, since the order of floating-point operations can change results [14][20].

Group 3: Solutions for Reproducibility
- The study argues that non-determinism in LLM inference arises from variation in batch size rather than from races between concurrent atomic operations, and therefore calls for batch-invariant operations [20][21].
- Implementing batch-invariant versions of RMSNorm, matrix multiplication, and the attention mechanism is essential for achieving reproducibility [21][28][33] (see the attention sketch after this summary).

Group 4: Performance and Experimentation
- Initial experiments with the deterministic kernels showed a performance drop of about 20% compared to the standard kernels, which remains acceptable for practical use [29][43].
- Using the deterministic kernels produced identical outputs across repeated completions, in contrast to the variability observed in the non-deterministic setting [42].
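Attention is the trickiest of the three kernels because its reduction runs over the KV cache, whose length varies as requests of different lengths are batched together. The sketch below is a single-head, plain-PyTorch illustration of the fixed-split idea described above: the KV cache is reduced in fixed-size chunks visited front to back, so the accumulation structure depends only on the KV length, never on batch composition. The function name, shapes, and default split size are assumptions for illustration; the article's implementation is a custom kernel.

```python
import math
import torch

def decode_attention_fixed_split(q: torch.Tensor, k: torch.Tensor, v: torch.Tensor,
                                 split_size: int = 256) -> torch.Tensor:
    """One decode step of single-head attention: q is [d], k and v are [T, d].

    The KV reduction is carried out in fixed-size chunks using an online
    softmax, so the chunking depends only on the KV length T, not on how
    requests were batched. Illustrative sketch, not the article's kernel.
    """
    d = q.shape[-1]
    scale = 1.0 / math.sqrt(d)
    m = torch.tensor(float("-inf"))             # running max of the logits
    denom = torch.tensor(0.0)                   # running softmax denominator
    acc = torch.zeros(d, dtype=torch.float32)   # running weighted sum of V

    for start in range(0, k.shape[0], split_size):
        k_blk = k[start:start + split_size].float()   # [blk, d]
        v_blk = v[start:start + split_size].float()   # [blk, d]
        logits = (k_blk @ q.float()) * scale          # [blk]

        m_new = torch.maximum(m, logits.max())
        rescale = torch.exp(m - m_new)                # correct previous chunks
        p = torch.exp(logits - m_new)                 # unnormalized weights

        denom = denom * rescale + p.sum()
        acc = acc * rescale + p @ v_blk
        m = m_new

    return (acc / denom).to(q.dtype)


# Example usage on a 1,000-token KV cache (hypothetical shapes):
q = torch.randn(128)
k = torch.randn(1000, 128)
v = torch.randn(1000, 128)
out = decode_attention_fixed_split(q, k, v)           # [128]
```

The design point mirrors the one in the article: the chunking is defined by a fixed split size rather than by a target number of splits, so processing the same KV cache inside a larger or smaller batch cannot change how it is chunked or in what order it is accumulated.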