LLaDA2.1
Computer Industry Weekly: LLaDA2.1 Achieves a Technical Breakthrough, Gemini3.1Pro Sets a New Multimodal Standard
Huaxin Securities· 2026-02-26 00:50
Investment Rating
- The report maintains a "Buy" rating for the companies mentioned, including Weike Technology (301196.SZ), Nengke Technology (603859.SH), Hehe Information (688615.SH), and Maixinlin (688685.SH) [8][53].

Core Insights
- The report highlights significant advancements in AI technology, particularly the releases of LLaDA2.1 and Gemini3.1Pro, which set new standards in multimodal AI applications [1][35].
- LLaDA2.1 achieves a peak speed of 892 tokens/second in complex programming tests, balancing speed and quality through innovative technical breakthroughs [3][23].
- Gemini3.1Pro demonstrates exceptional reasoning capabilities, scoring 77.1% on the ARC-AGI-2 test and significantly outperforming its predecessor and competitors [35][36].

Summary by Sections

Computing Power Dynamics
- Rental prices for computing power remain stable, with specific configurations priced at 28.64 CNY/hour on Tencent Cloud and 31.58 CNY/hour on Alibaba Cloud [22].
- LLaDA2.1, released in February 2026, ships in 16-billion- and 100-billion-parameter versions, breaking the speed barrier for diffusion language models [23][26].
- The model introduces a novel error-correcting editing mechanism, enhancing generation efficiency without sacrificing quality [27][29].

AI Application Dynamics
- Gemini's weekly traffic increased by 4.31%, indicating growing user engagement [33].
- Gemini3.1Pro supports a context of up to 1 million tokens, scores highly on long-text processing, and significantly reduces hallucination rates compared with previous models [40][41].

AI Financing Trends
- WorldLabs completed a $1 billion funding round, with notable investors including AMD and NVIDIA, aimed at advancing spatial intelligence technologies [46][48].
- The funding will accelerate development of its flagship product, Marble, which generates high-fidelity 3D worlds for a range of applications [47][49].

Investment Recommendations
- The report suggests focusing on companies expanding their computing power capabilities, such as Maixinlin and Weike Technology, and on those excelling in AI applications, such as Hehe Information and Nengke Technology [52].
Computer Industry Weekly: LLaDA2.1 Achieves a Technical Breakthrough, Gemini3.1Pro Sets a New Multimodal Standard-20260225
Huaxin Securities· 2026-02-25 10:25
Investment Rating
- The report maintains a "Buy" rating for the companies mentioned, including Weike Technology (301196.SZ), Nengke Technology (603859.SH), Hehe Information (688615.SH), and Maixinlin (688685.SH) [8][53].

Core Insights
- The LLaDA2.1 model has achieved a technological breakthrough, shipping in 16-billion- and 100-billion-parameter versions and demonstrating a peak speed of 892 tokens/second in complex programming tests [3][23].
- Gemini3.1Pro, released by Google DeepMind, has set a new standard in multimodal AI, scoring 77.1% on the ARC-AGI-2 test, more than double the performance of its predecessor [3][35].
- WorldLabs has completed a new $1 billion funding round, with investments from major firms including AMD and NVIDIA, focused on spatial intelligence and large world models [4][46].

Summary by Sections

Computing Power Dynamics
- Rental prices for computing power remain stable, with specific configurations priced at 28.64 CNY/hour on Tencent Cloud and 31.58 CNY/hour on Alibaba Cloud [22].
- The LLaDA2.1 release marks a significant advance for diffusion language models, opening a new feasible path for large language model development [23][32].

AI Application Dynamics
- Gemini's weekly traffic increased by 4.31%, with notable gains in user engagement metrics [33][34].
- Gemini3.1Pro excels in reasoning and shows significant improvements in long-context processing, handling up to 1 million tokens [40][41].

AI Financing Trends
- WorldLabs' new funding will accelerate its research and development in spatial intelligence, with a focus on applications in robotics and AR/VR [46][48].

Investment Recommendations
- The report suggests focusing on companies expanding their computing power capabilities, such as Maixinlin (688685.SH) and Weike Technology (301196.SZ), and on AI-driven players such as Hehe Information (688615.SH) and Nengke Technology (603859.SH) [52].
Ant Group Open-Sources Ring-2.5-1T, a Trillion-Parameter Thinking Model, Breaking the Large-Model "Impossible Triangle"
Guan Cha Zhe Wang· 2026-02-14 10:25
Core Insights
- Ant Group has developed and open-sourced the world's first trillion-parameter thinking model, Ring-2.5-1T, which combines fast inference speed, deep reasoning capabilities, and excellent long-range task execution [1][9].
- The model scored 35 out of 42 in the IMO competition and 105 in the CMO, significantly exceeding the national training team's score line [1][7].

Model Architecture
- Ring-2.5-1T is based on the Ling 2.5 architecture, utilizing a hybrid linear attention mechanism that combines MLA (Multi-Head Latent Attention) and Lightning Linear Attention in a 1:7 ratio (see the illustrative sketch below) [2].
- The model's active parameter count increased from 51 billion to 63 billion, yet its inference efficiency improved due to the linear time complexity of the linear-attention layers [2].

Performance and Capabilities
- The model demonstrates significant advantages in long-sequence reasoning tasks compared to other models with similar parameter counts, particularly in throughput as sequence length increases [2].
- Ring-2.5-1T has been benchmarked against various models and achieved optimal performance on high-difficulty reasoning tasks and long-duration task-execution benchmarks [5].

Training Innovations
- The model incorporates a dense reward mechanism based on Reinforcement Learning with Verifiable Rewards (RLVR), enhancing its logical reasoning and proof techniques [4].
- It also employs large-scale, fully asynchronous Agentic RL training, improving its autonomous execution capabilities in complex tasks [4].

Ecosystem and Future Developments
- Ring-2.5-1T is compatible with major intelligent-agent frameworks and has been made available on platforms such as Hugging Face and ModelScope [7].
- Ant Group has also released other models, including LLaDA2.1 and Ming-flash-omni-2.0, covering capabilities such as non-autoregressive parallel decoding and multimodal representation [8].
- The company aims to provide reusable foundational solutions for developers, with plans to expand into video understanding, complex image editing, and real-time audio generation [8].
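The 1:7 interleaving of full-attention and linear-attention layers described above is easy to picture in code. Below is a minimal sketch of such a hybrid stack, assuming hypothetical MLABlock and LightningLinearBlock placeholders; the block internals, layer count, and module names are illustrative assumptions, not Ant Group's released implementation.

```python
# Illustrative 1:7 hybrid attention stack in the spirit of Ring-2.5-1T:
# one softmax-attention (MLA) layer for every seven linear-attention layers.
# MLABlock and LightningLinearBlock are hypothetical stand-ins.
import torch.nn as nn

class MLABlock(nn.Module):
    """Placeholder for Multi-Head Latent Attention (quadratic in sequence length)."""
    def forward(self, x):
        return x  # a real block would attend over the full sequence

class LightningLinearBlock(nn.Module):
    """Placeholder for Lightning Linear Attention (linear time complexity)."""
    def forward(self, x):
        return x  # a real block would use a linear/recurrent kernel

def build_hybrid_stack(num_layers: int, period: int = 8) -> nn.ModuleList:
    # With period=8, every 8th layer is MLA, giving a 1:7 MLA-to-linear mix.
    return nn.ModuleList(
        MLABlock() if i % period == period - 1 else LightningLinearBlock()
        for i in range(num_layers)
    )

stack = build_hybrid_stack(num_layers=32)
print(sum(isinstance(layer, MLABlock) for layer in stack), "MLA layers of", len(stack))  # 4 of 32
```

Keeping only one in eight layers quadratic preserves some global attention while letting most of the depth run in linear time, which is consistent with the reported throughput advantage at long sequence lengths.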
A New Speed for Trillion-Parameter Thinking Models! Ant Open-Sources Ring-2.5-1T: IMO Gold-Medal Level, Strong; Hybrid Linear Architecture, Fast!
QbitAI· 2026-02-14 01:15
Core Viewpoint
- Ant Group has launched the world's first open-source hybrid-linear-architecture trillion-parameter model, Ring-2.5-1T, which excels at mathematical and logical reasoning and long-range autonomous execution [2][3].

Group 1: Model Capabilities
- Ring-2.5-1T achieved a gold-medal-level score of 35 in the IMO and a score of 105 in the CMO, significantly surpassing the national training team standard [3].
- The model can independently handle complex tasks such as search and coding, demonstrating robust task-execution abilities [3][8].
- It breaks the industry assumption that deep reasoning requires sacrificing inference speed and memory: during long-sequence generation it achieves a 3x increase in throughput while cutting memory usage to below one tenth [5][7][16].

Group 2: Architectural Innovations
- The model employs a hybrid linear attention architecture, evolved from the Ring-flash-linear-2.0 technology, with a 1:7 design combining Multi-Head Latent Attention (MLA) and Lightning Linear Attention [9].
- Incremental training converted part of the original GQA layers to Lightning Linear Attention, preserving strong reasoning capabilities while achieving linear inference speed [12].
- The activation parameter count increased from 51 billion to 63 billion, yet inference efficiency improved significantly over Ling 2.0 [15].

Group 3: Training Mechanisms
- A dense reward mechanism was introduced to strengthen logical reasoning, focusing on the rigor of the reasoning process; it significantly reduced logical flaws and improved advanced proof techniques (a toy sketch of dense, verifiable rewards follows below) [18].
- The model underwent large-scale asynchronous Agentic Reinforcement Learning training, enhancing its autonomous execution of long-chain tasks [18].

Group 4: Practical Applications
- In practical tests, Ring-2.5-1T solved complex abstract-algebra proof problems, demonstrating high logical sensitivity and rigorous reasoning [20][24].
- The model also showcased its programming skills by writing a high-concurrency thread pool in Rust, correctly managing memory safety and concurrency [27].
- In an official demo, Ring-2.5-1T built a miniature operating system, further proving its system-level programming capabilities [31].

Group 5: Broader AI Developments
- Ant Group also released the diffusion language model LLaDA2.1 and the multimodal model Ming-flash-omni-2.0, which significantly raise inference speed and provide unique token-editing and reverse-reasoning capabilities [33][36].
- The goal is a reusable foundation for developers, making multimodal applications accessible without stitching together multiple models [39][40].
- The company aims to tackle video temporal understanding, intricate image editing, and real-time long-audio generation, signaling a commitment to advancing multimodal AI technology [41].
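To make the "dense reward" idea concrete: instead of a single sparse reward at the end of an episode, every verifiable intermediate step contributes credit. The toy sketch below illustrates that shape under stated assumptions; the step verifier (simple arithmetic checking) and the 0.5/0.5 weighting are invented for illustration, since the article does not disclose Ant Group's actual verifiers or reward shaping.

```python
# Toy sketch of a dense, verifiable reward in the spirit of RLVR:
# each checkable reasoning step earns partial credit, so long chains
# still produce an informative learning signal.
from typing import List

def verify_step(step: str) -> bool:
    """Toy verifier: accept lines of the form 'a + b = c' that are arithmetically true."""
    try:
        lhs, rhs = step.split("=")
        a, b = (int(t) for t in lhs.split("+"))
        return a + b == int(rhs)
    except ValueError:
        return False

def dense_reward(steps: List[str], final_correct: bool) -> float:
    # Blend per-step rigor with final-answer correctness (weights are arbitrary here).
    step_score = sum(verify_step(s) for s in steps) / max(len(steps), 1)
    return 0.5 * step_score + 0.5 * float(final_correct)

print(dense_reward(["1 + 2 = 3", "3 + 4 = 8"], final_correct=False))  # 0.25
```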
A Niche Architecture Wins Big: An Editing Mechanism Pushes a 100B Diffusion Model to 892 tokens/s
36Kr· 2026-02-11 05:21
Core Insights
- The article highlights the significant advances of Ant Group's LLaDA2.1 model, which achieved a peak speed of 892 tokens per second in complex programming tasks, far outpacing mainstream autoregressive models [1][18][20].

Group 1: Model Development and Features
- LLaDA2.1 represents a historic shift from research model to practical tool, with improved efficiency and usability [2][5].
- The model introduces a dual-mode design: users switch between Speedy Mode and Quality Mode with a single configuration, simplifying both user experience and model management (a configuration sketch follows below) [4][6].
- Speedy Mode produces a rapid initial draft, while Quality Mode prioritizes accuracy, catering to different user needs [6][21].

Group 2: Technical Innovations
- The model employs an Error-Correcting Editable (ECE) mechanism that self-corrects during generation, addressing the common inconsistency issues of earlier diffusion models [8][13].
- LLaDA2.1 successfully applies reinforcement learning (RL) to a 100B-parameter diffusion model, a feat previously considered impossible, enhancing its performance on alignment tasks [16][22].

Group 3: Performance Metrics
- In benchmark tests, LLaDA2.1 outperformed its predecessor LLaDA2.0 across various tasks, demonstrating superior speed and quality [22][23].
- In Speedy Mode the model peaked at 892 tokens per second on the HumanEval+ benchmark, while the Mini version exceeded 1,500 tokens per second on certain tasks [18][24].

Group 4: Industry Implications
- LLaDA2.1's advances challenge the dominance of autoregressive models, suggesting a potential shift in industry standards toward more efficient and versatile architectures [20][26].
- Open-sourcing LLaDA2.1 and its Mini version signals a strategic move to foster wider adoption and innovation within the AI community [24][27].
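The "single configuration" dual-mode switch can be pictured as one knob that changes decoding hyperparameters rather than the model itself. The sketch below is a hedged guess at what such a switch could look like; the field names and numeric values are assumptions for illustration, as the article only states that one configuration toggles Speedy and Quality modes on the same checkpoint.

```python
# Hypothetical dual-mode decoding configuration: same weights, different
# decoding aggressiveness. Values are illustrative, not LLaDA2.1's real settings.
from dataclasses import dataclass

@dataclass
class GenerationConfig:
    mode: str = "quality"          # "speedy" or "quality"
    denoise_steps: int = 64        # fewer steps -> faster, rougher drafts
    accept_threshold: float = 0.9  # confidence needed to commit a token per step

    @classmethod
    def from_mode(cls, mode: str) -> "GenerationConfig":
        if mode == "speedy":
            # Aggressive parallel unmasking: more tokens committed per pass.
            return cls(mode="speedy", denoise_steps=16, accept_threshold=0.6)
        return cls()

print(GenerationConfig.from_mode("speedy"))
print(GenerationConfig.from_mode("quality"))
```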
A Milestone Moment: A 100B Diffusion Language Model Hits 892 Tokens/s, Proving AI's Alternative Path Works
36Kr· 2026-02-11 04:31
Core Insights
- The release of LLaDA2.1 marks a significant transformation for diffusion language models (dLLM), previously considered a niche area. The new version includes LLaDA2.1-Mini (16 billion parameters) and LLaDA2.1-Flash (100 billion parameters) [1][3].
- LLaDA2.1 achieves a peak speed of 892 tokens per second, demonstrating a practical efficiency advantage, and its error-correcting mechanism breaks the "fast but inaccurate" paradigm [3][10].
- The model introduces a dual-mode system allowing users to switch between quality and speed, addressing the trade-off between these two aspects effectively [15][19].

Model Performance
- The 100-billion-parameter version reached a peak of 892 tokens per second, particularly notable given the complexity of the tasks it can handle, such as programming benchmarks [10][11].
- The architecture allows parallel generation and self-correction, enhancing usability compared to traditional autoregressive models, which lack this capability [13][14].
- In experimental evaluations, LLaDA2.1 outperformed its predecessor LLaDA2.0 in quality mode across various benchmarks, while also showing significant throughput improvements in speed mode [20][22].

Technical Innovations
- The Error-Correcting Editable (ECE) mechanism lets LLaDA2.1 draft answers quickly and then edit them, enabling a more flexible and accurate generation process (see the decoding-loop sketch below) [13][18].
- A reinforcement learning phase enhances the model's understanding of instructions and alignment with user intent, a first for diffusion models at this scale [16][17].
- The dual-mode design allows users to configure the model for either speed or quality, simplifying user experience and model management [15][19].

Industry Implications
- LLaDA2.1's advances suggest a potential shift in the AI model landscape, challenging the dominance of autoregressive architectures and opening new avenues for research and application in language modeling [26].
- The successful implementation of a 100-billion-parameter diffusion model indicates that the barriers to scaling such models may be diminishing, encouraging further investment and exploration in this area [11][26].
- The model's ability to handle complex tasks efficiently positions it as a competitive alternative in the AI landscape, potentially influencing future developments in language-processing technologies [10][26].
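The draft-then-edit behavior of the ECE mechanism can be sketched as a mask-predict loop: fill every position in parallel, then re-mask the least confident tokens and regenerate them, so early mistakes can be revised rather than frozen in place. The code below is a schematic under stated assumptions; `model` is a hypothetical stand-in returning (token, confidence) pairs, not LLaDA2.1's released decoder.

```python
# Schematic draft-then-edit loop in the spirit of Error-Correcting Editable
# (ECE) decoding: parallel fill, then re-mask and regenerate low-confidence spots.
import random
from typing import List, Tuple

MASK = "<mask>"

def model(tokens: List[str]) -> List[Tuple[str, float]]:
    # Hypothetical stand-in: fill masked slots and attach a confidence score.
    return [(t if t != MASK else "tok", random.random()) for t in tokens]

def ece_decode(length: int, rounds: int = 3, edit_frac: float = 0.25) -> List[str]:
    tokens = [MASK] * length
    for _ in range(rounds):
        preds = model(tokens)
        tokens = [tok for tok, _ in preds]
        # Edit phase: re-mask the lowest-confidence fraction for another pass.
        k = max(1, int(length * edit_frac))
        for i in sorted(range(length), key=lambda i: preds[i][1])[:k]:
            tokens[i] = MASK
    return [tok for tok, _ in model(tokens)]  # final pass fills remaining masks

print(ece_decode(length=8))
```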
A Milestone Moment! A 100B Diffusion Language Model Hits 892 Tokens/s, Proving AI's Alternative Path Works
Synced· 2026-02-11 01:59
Core Insights
- The article discusses the significant advances in diffusion language models (dLLM), highlighting the release of LLaDA2.1 as a transformative moment for this research area [2][4].
- LLaDA2.1's 100-billion-parameter version reaches a peak of 892 tokens per second (TPS), showcasing its efficiency and practical applicability [13][14].
- The model introduces a novel error-correcting editable mechanism allowing real-time corrections during text generation, which addresses a limitation of traditional autoregressive models [16][17].

Group 1: Model Features and Innovations
- LLaDA2.1 ships in two versions, LLaDA2.1-Mini (16B) and LLaDA2.1-Flash (100B), with the latter achieving remarkable performance metrics [2][4].
- A dual-mode system lets users switch between a speed-focused mode and a quality-focused mode, enhancing usability [20][26].
- Reinforcement learning in the training process helps LLaDA2.1 better understand instructions and align with user intent, improving its overall reliability [21][22].

Group 2: Performance Metrics and Comparisons
- In benchmark tests, LLaDA2.1 outperformed its predecessor LLaDA2.0 across various tasks, particularly in quality mode, where it exceeded previous performance scores [24][30].
- The speed advantage is most evident in coding tasks: the model peaked at 891.74 TPS on the HumanEval+ benchmark, significantly enhancing its practical value for programming (see the timing sketch below) [28][30].
- Comparative performance data indicate that LLaDA2.1 consistently surpasses other models in speed and efficiency across multiple benchmarks [25][27].

Group 3: Implications for the Industry
- The advances embodied by LLaDA2.1 suggest a potential shift in the AI language-model landscape, moving beyond the dominance of autoregressive models to explore the capabilities of diffusion models [33].
- The successful implementation of a scalable diffusion model at the 100-billion-parameter level marks a breakthrough over previous limitations of model size and performance [14][33].
- While autoregressive models have been the primary focus, LLaDA2.1 illustrates the viability of alternative approaches, potentially leading to a more diverse range of solutions in the AI language-model space [33].
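A tokens-per-second figure like the 891.74 TPS quoted above is ordinarily computed as generated tokens divided by wall-clock decode time. The sketch below shows that measurement shape; `generate` is a hypothetical stand-in for a model call, and the numbers it produces are not benchmark data.

```python
# Measuring decoding throughput: tokens emitted / wall-clock seconds.
import time

def generate(prompt: str) -> list:
    time.sleep(0.01)       # stand-in for actual decoding work
    return ["tok"] * 128   # pretend the model produced 128 tokens

start = time.perf_counter()
out = generate("write a fizzbuzz function")
elapsed = time.perf_counter() - start
print(f"{len(out) / elapsed:.1f} tokens/s on this toy stand-in")
```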
A Niche Architecture Wins Big! An Editing Mechanism Pushes a 100B Diffusion Model to 892 tokens/s!
QbitAI· 2026-02-11 01:55
Core Viewpoint
- The article discusses the emergence of Ant Group's LLaDA2.1 model, which reaches a remarkable 892 tokens per second in complex programming tasks, marking a significant advance over traditional autoregressive models [1][3][11].

Group 1: Model Performance and Features
- LLaDA2.1 operates at the 100-billion-parameter scale and has transitioned from research model to practical tool, demonstrating superior efficiency [3][4].
- A dual-mode decoding strategy lets users switch between Speedy Mode and Quality Mode with a single configuration, enhancing usability (see the pass-count sketch below for why parallel decoding pays off) [9][10].
- In Speedy Mode, LLaDA2.1 peaks at 892 tokens per second on the HumanEval+ benchmark; in Quality Mode it surpasses previous models across various reasoning tasks [11][31].

Group 2: Technical Innovations
- The Error-Correcting Editable (ECE) mechanism lets the model generate drafts quickly and then refine them, addressing the limitations of traditional diffusion models [16][21].
- LLaDA2.1 successfully applies reinforcement learning (RL) at the 100-billion scale, enhancing its performance in instruction-following tasks and demonstrating that diffusion models can achieve both speed and understanding [23][26].
- The EBPO algorithm enables efficient training and editing, marking a significant milestone in applying RL to diffusion models [25][28].

Group 3: Competitive Advantage
- Benchmark results show a significant advantage over mainstream autoregressive architectures: high speed without compromising quality [29][30].
- The model maintains quality even in Speedy Mode, demonstrating robustness and a genuine balance between speed and accuracy [32].
- A lighter 16-billion-parameter Mini version peaks above 1,500 tokens per second, indicating potential for more lightweight deployments [33].
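Why parallel unmasking outruns token-by-token decoding comes down to forward-pass counts: an autoregressive model needs one pass per token, while a diffusion-style decoder that commits k tokens per pass needs roughly length/k passes. The back-of-the-envelope sketch below illustrates that arithmetic; the pass counts are illustrative, not measured figures from the article.

```python
# Forward passes needed to emit `length` tokens when `k` tokens are
# committed per pass (k = 1 corresponds to autoregressive decoding).
def decode_passes(length: int, k: int = 1) -> int:
    return -(-length // k)  # ceiling division

for k in (1, 8, 32):
    print(f"{k:>2} tokens/pass -> {decode_passes(1024, k)} passes")
# 1 -> 1024, 8 -> 128, 32 -> 32
```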