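First Public Statement Seven Months After Founding: A $10 Billion Unicorn's 10,000-Character Essay on Defeating Non-Determinism in LLM Inference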

Core Insights
- Thinking Machines Lab has launched its flagship product, "Connection Machine", and introduced a research blog, "Connectionism", to share advances in AI research [1][3][4]
- The blog's first article addresses the challenge of getting reproducible results from large language model (LLM) inference, highlighting that LLM outputs are non-deterministic [6][9][20]

Group 1: Product and Research Focus
- The "Connectionism" blog will evolve with the company's research, covering topics from numerical computation to prompt engineering [3]
- The name "Connection Machine" is a historical reference to early AI research focused on neural networks [4]

Group 2: Non-Determinism in LLM Inference
- Reproducibility in LLM inference is crucial yet hard to achieve, because identical inputs can yield different outputs due to sampling processes [9][11]
- The article examines a hypothesis that links non-determinism to floating-point arithmetic combined with concurrent execution: because floating-point addition is not associative, the order of operations can change the result [14][20]

Group 3: Solutions for Reproducibility
- The study argues that non-determinism in LLM inference arises from variations in batch size rather than from contention among atomic operations, which is why batch-invariant operations are needed [20][21]
- Reproducibility requires batch-invariant implementations of operations such as RMSNorm, matrix multiplication, and attention [21][28][33]

Group 4: Performance and Experimentation
- Initial experiments with a deterministic kernel showed a performance drop of about 20% compared to standard methods, a cost described as acceptable [29][43]
- With a deterministic kernel, multiple completions of the same prompt produced identical outputs, in contrast to the variability seen in non-deterministic settings [42]
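
The floating-point point under Group 2 and the batch-invariance idea under Group 3 can be illustrated with a small pure-Python sketch. It is only a toy under stated assumptions, not the kernels from the article: the function names (`chunked_sum`, `row_sums_batch_dependent`, `row_sums_batch_invariant`) and the "split the reduction when the batch is small" heuristic are hypothetical, chosen to mimic how a reduction strategy that depends on batch size changes the order of additions for one and the same row.

```python
# Floating-point addition is not associative: grouping changes the result.
left = (0.1 + 0.2) + 0.3
right = 0.1 + (0.2 + 0.3)
print(left == right)  # False


def chunked_sum(values, num_chunks):
    """Sum `values` by reducing contiguous chunks first, then adding the partial sums."""
    chunk_size = -(-len(values) // num_chunks)  # ceiling division
    partials = []
    for start in range(0, len(values), chunk_size):
        acc = 0.0
        for v in values[start:start + chunk_size]:
            acc += v
        partials.append(acc)
    total = 0.0
    for p in partials:
        total += p
    return total


def row_sums_batch_dependent(batch):
    """Hypothetical heuristic: split each row's reduction in two when the batch is small."""
    num_chunks = 2 if len(batch) < 4 else 1
    return [chunked_sum(row, num_chunks) for row in batch]


def row_sums_batch_invariant(batch):
    """Always use the same reduction order per row, whatever the batch size."""
    return [chunked_sum(row, 1) for row in batch]


# The same "request" (row) processed alone and alongside seven other rows.
row = [1.0, 1e16, -1e16, 1.0]
alone = [row]
crowded = [row] + [[float(i)] * 4 for i in range(7)]

# Batch-dependent strategy: the result for `row` changes with the batch size.
print(row_sums_batch_dependent(alone)[0], row_sums_batch_dependent(crowded)[0])

# Batch-invariant strategy: the result for `row` is identical in both batches.
print(row_sums_batch_invariant(alone)[0], row_sums_batch_invariant(crowded)[0])
```

The values in `row` are deliberately extreme so that the effect is visible in four additions; in real inference the differences are tiny, but they can still change which token is selected. Fixing the reduction order costs some flexibility in how work is split, which is consistent with the performance trade-off noted under Group 4.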