Kimi Linear
AI Industry Tracking: Moonshot AI (月之暗面) Releases New Attention Architecture Kimi Linear; Continuing to Track Agent LLM Technology Iteration
Changjiang Securities· 2025-11-06 11:05
Investment Rating
- The report maintains a "Positive" investment rating for the industry [8].

Core Insights
- On October 31, Moonshot AI (月之暗面) launched Kimi Linear, a new hybrid linear attention architecture aimed at the computational-efficiency and performance bottlenecks that current LLMs face on long-sequence tasks. The core code has been open-sourced and validated [2][5].
- Kimi Delta Attention (KDA) strengthens expressive capability through a refined gating mechanism and a highly optimized chunked algorithm, potentially opening a new paradigm for cutting token-consumption costs [2][10].
- The report remains optimistic on the domestic AI industry chain, recommending "shovel" stocks and leading players with clear positioning advantages [2][10].

Summary by Sections

Event Description
- Kimi Linear targets the core bottlenecks of traditional Transformers in long-text processing and agent-based reasoning; its 3:1 hybrid layer structure reduces the KV cache by 75% and improves long-sequence decoding efficiency [10].

Performance Comparison
- Kimi Linear outperforms full attention on a range of metrics, achieving the highest accuracy across tasks as sequence length increases, and converges significantly faster than GDN [10].
- On long-context performance, Kimi Linear scores 54.5, surpassing MLA (52.2) and GDN-H (51.2), demonstrating robustness on long texts [10].

Efficiency Comparison
- Kimi Linear holds a dramatic advantage in decoding speed, requiring only 1.84 ms per token at 1M length, 6.3x faster than MLA [10].
- Kimi Linear's KV cache uses roughly 25% of the memory of a pure MLA model, pointing to lower inference costs and a better user experience (see the sizing sketch below) [10].

Future Outlook
- The report argues that KDA shows significant potential for linear attention across applications, particularly long-text reasoning and enterprise knowledge systems, with a focus on reducing inference cost and latency for large-scale deployment [10].
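To make the cache arithmetic concrete, here is a minimal back-of-envelope sketch in Python. The depth, head count, and head dimension are hypothetical placeholders, not Kimi Linear's published configuration; the point is only that if just one layer in four keeps a KV cache, the cache shrinks by exactly 75%.

```python
# Rough KV-cache sizing for a 3:1 hybrid linear/full attention stack.
# All sizes below are illustrative assumptions, not Kimi Linear's real config.

def kv_cache_bytes(layers_with_kv, seq_len, n_kv_heads, head_dim, dtype_bytes=2):
    """Bytes needed to cache keys and values for one sequence (factor 2 = K and V)."""
    return 2 * layers_with_kv * seq_len * n_kv_heads * head_dim * dtype_bytes

N_LAYERS = 48            # hypothetical depth
SEQ_LEN = 1_000_000      # the 1M-token context from the report
N_KV_HEADS, HEAD_DIM = 8, 128

full = kv_cache_bytes(N_LAYERS, SEQ_LEN, N_KV_HEADS, HEAD_DIM)
# In a 3:1 hybrid, only every fourth layer is full attention and keeps a KV
# cache; the linear (KDA) layers carry a fixed-size recurrent state instead.
hybrid = kv_cache_bytes(N_LAYERS // 4, SEQ_LEN, N_KV_HEADS, HEAD_DIM)

print(f"full attention: {full / 2**30:.1f} GiB")
print(f"3:1 hybrid:     {hybrid / 2**30:.1f} GiB ({1 - hybrid / full:.0%} smaller)")
```

The 75% figure falls straight out of the 3:1 ratio; the fixed-size recurrent state of the linear layers is negligible next to a cache that grows with sequence length.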
Kimi Linear First Author Yu Zhang: Some Thoughts on Model Training
自动驾驶之心· 2025-11-06 00:04
Author | yzhangcs @ Zhihu    Editor | 青稞AI    Original link: https://www.zhihu.com/question/1967345030881584585/answer/1967730385816385407

I've finally finished the Kimi Linear model card and the paper's arXiv upload and spent half a day decompressing. Here are some personal reflections, plus a few clarifications.

Paper: https://github.com/MoonshotAI/Kimi-Linear/blob/master/tech_report.pdf
Code: https://github.com/Moonshot ...

Model architecture. The overall architecture design is as shown in the figure, continuing the design approach of Moonlight; other answers already give plenty of good walkthroughs. The biggest difference this time is that we set the MoE sparsity much more aggressively, raising it from 8 to 32 (a small sketch of what that ratio means follows below). As for Kimi Linear's core design principles, the first is mainly adopting Linear Attenti ...
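For readers unfamiliar with the shorthand, "MoE sparsity" here is the ratio of total experts to experts activated per token. A minimal sketch follows, with hypothetical expert counts chosen only so the ratios come out to 8 and 32 (the post does not give the actual counts):

```python
import numpy as np

def moe_sparsity(total_experts: int, active_experts: int) -> float:
    """Sparsity = total experts / experts activated per token."""
    return total_experts / active_experts

print(moe_sparsity(64, 8))    # 8.0  -- the older, Moonlight-style setting
print(moe_sparsity(256, 8))   # 32.0 -- the more aggressive Kimi Linear setting

def topk_route(router_logits: np.ndarray, k: int) -> np.ndarray:
    """Pick the k experts with the highest router scores for one token."""
    return np.argsort(router_logits)[-k:]

rng = np.random.default_rng(0)
print(topk_route(rng.normal(size=256), k=8))  # only 8 of 256 experts fire
```

Raising sparsity grows total capacity while per-token activated compute stays flat, which is presumably the cost lever behind the change.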
Kimi Open-Sources a New Linear Attention Architecture; AI ETF (515070) Holding 三六零 Rises More Than 7% Intraday
Mei Ri Jing Ji Xin Wen· 2025-11-03 02:54
The three major A-share indexes opened lower and extended their losses, with the ChiNext index down as much as 1%. Among sectors, Hainan, gaming, concentrated solar power, and film & cinema led the gains, while precious metals, fentanyl, and battery stocks led the declines. As of 10:25, the AI ETF (515070) was down 1.53%; among its holdings, 三七互娱 hit its daily limit-up, 三六零 rose 7.1% intraday, 石头科技 fell 5.2%, 澜起科技 fell 4.98%, 恒玄科技 fell 3.77%, and 浪潮信息 fell 3.51%.

On the news front, Moonshot AI open-sourced Kimi Linear, a hybrid linear attention architecture that for the first time outperforms the Transformer's full attention mechanism (Full Attention) across short-context, long-context, and reinforcement-learning scaling scenarios. Its core innovation is Kimi Delta Attention, an optimized upgrade of Gated DeltaNet that introduces a more efficient gating mechanism to make better use of a finite-state RNN's memory (a simplified version of this recurrence is sketched below). Official figures show that at 1M tokens, Kimi Linear's KV cache footprint drops by 75% and decoding throughput rises by up to 6x; on TPOT (time per output token), Kimi Linear achieves a 6.3x speedup over traditional MLA. Analysts at 中信建投证券 note that the development of large AI models ...
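As a rough intuition for what a gating mechanism over finite-state RNN memory does, here is a simplified, sequential sketch of a channel-wise gated delta rule in the spirit of KDA. It is a reference recurrence only, not Moonshot's optimized kernel or exact formulation, and all dimensions are illustrative.

```python
import numpy as np

def gated_delta_step(S, q, k, v, alpha, beta):
    """One step of a channel-wise gated delta rule (simplified sketch).

    S     : (d_k, d_v) recurrent state -- the finite-size "memory"
    q, k  : (d_k,) query / key vectors
    v     : (d_v,) value vector
    alpha : (d_k,) per-channel decay in (0, 1) -- the fine-grained gate
    beta  : scalar write strength in (0, 1)
    """
    S = alpha[:, None] * S                 # channel-wise forgetting
    S = S - beta * np.outer(k, k @ S)      # delta rule: erase the old value at k
    S = S + beta * np.outer(k, v)          # ...then write the new association
    return S, S.T @ q                      # read out with the query

d_k, d_v = 8, 8
rng = np.random.default_rng(0)
S = np.zeros((d_k, d_v))
for _ in range(5):                         # process a toy sequence of 5 tokens
    q, k, v = (rng.normal(size=d) for d in (d_k, d_k, d_v))
    S, o = gated_delta_step(S, q, k, v, alpha=np.full(d_k, 0.9), beta=0.5)
print(o.shape)                             # (8,) -- constant-size state, no KV cache
```

The per-channel `alpha` is what finer-grained gating buys: each key dimension can forget at its own rate instead of the whole state decaying at one scalar rate, which is how a fixed-size state stays useful over very long sequences.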
Tencent Research Institute AI Digest 20251103
腾讯研究院· 2025-11-02 16:06
Group 1: AI Security Solutions
- OpenAI has launched Aardvark, a GPT-5-powered "white hat" agent that automatically identifies and fixes security vulnerabilities in codebases; it recognized 92% of known and artificially injected vulnerabilities [1]
- Aardvark's workflow spans threat modeling, commit scanning, sandbox validation, and Codex-based repair, using LLM reasoning to work the way a human security researcher would [1]
- Major tech companies including Google, Anthropic, and Microsoft released similar white-hat agents in October, responding to the growing number of vulnerabilities and increasingly sophisticated attack methods of the AI era [1]

Group 2: AI Programming Models
- Composer-1 and SWE-1.5, the newly released models behind the AI programming tools Cursor and Windsurf, are suspected of being based on Chinese models, with Cursor showing a tendency to respond in Chinese [2]
- Users found that Cursor's Composer-1 uses the same tokenizer as DeepSeek, while Windsurf's claim of a self-developed model was contradicted by its ties to the GLM models from Zhipu AI [2]
- Chinese open-source models dominate the performance rankings, filling the top 5 and even top 10, and their cost-effectiveness makes them a rational choice for startups [2]

Group 3: Attention Mechanisms in AI Models
- Linear attention mechanisms are making a comeback, with domestic models such as MiniMax-M1, Qwen3-Next, and DeepSeek V3.2 adopting linear or sub-quadratic attention variants [3]
- The new MiniMax M2 model has reverted to traditional attention, citing accuracy problems with linear attention on reasoning and multi-turn dialogue tasks [3]
- Kimi Linear proposes a hybrid strategy, interleaving three linear attention blocks with one full attention block to achieve a 75% reduction in KV cache and up to a 6x increase in decoding throughput (a layer-schedule sketch follows this digest) [3]

Group 4: Canva's AI Innovations
- Canva, valued at $42 billion, has introduced a self-trained foundation model that produces complete design files with editable layers, and has made the acquired Affinity tools permanently free [4]
- The core feature, Ask @Canva, is deeply integrated into the design interface, letting users modify elements in natural language, with the AI also suggesting design improvements [4]
- Canva's annual revenue is roughly $3 billion with over 240 million monthly active users; it is expected to go public in 2026, directly challenging Adobe's roughly 70% market share [4]

Group 5: Neuralink's Ambitions
- Elon Musk announced that the first Neuralink recipient, Noland Arbaugh, may be the first to receive an upgrade or dual chip implants, predicting that Neuralink users could eventually outperform others in gaming [5]
- Neuralink now has 12 users with more than 2,000 cumulative days of use and over 15,000 hours of total active time; results from the first three trial participants have been submitted to the New England Journal of Medicine [5]
- The company has begun a new "thought-to-text" clinical trial, aiming to implant 20,000 people annually by 2031, targeting annual revenue above $1 billion and applications for healthy individuals starting in 2030 [5]

Group 6: AI in Speech Therapy
- A Stanford University team tested 15 mainstream models on speech-disorder recognition; the best performer reached only 55% accuracy, below the FDA's clinical standard of 80-85% [6]
- The study revealed biases: the models performed better on male voices than female, on English speakers than speakers of other languages, and on older children than younger ones [6]
- Fine-tuning shows promise: accuracy improved by 10% after fine-tuning on a small dataset of children's speech, indicating the potential of multimodal language models in speech pathology [6]

Group 7: AI Workflow Transformation
- Brex, valued at $12.3 billion, is turning its internal AI platform into a product; it is built on Retool, reuses external AI capabilities, and is maintained by a 25-person systems engineering team [7]
- The COO is restructuring operations: L1 tasks go to AI, L2 roles shift from managing people to managing agents, and L3 responsibilities evolve from problem-solving to system design, with a predicted 5-10x gain in operational efficiency [7]
- Recruiting is shifting from specialists to generalists, with interviews probing AI usage habits, requiring AI case studies, and assessing AI application skills against real business challenges [7]

Group 8: OpenAI's Restructuring
- OpenAI has completed its restructuring; the nonprofit foundation holds shares valued at $130 billion, making it one of the largest charitable foundations in the world, with an initial $25 billion commitment to healthcare and AI safety [8]
- A new agreement stipulates that APIs for OpenAI's current and future AGI models will be deployed exclusively on Azure for seven years, with Microsoft holding roughly 32.5% of OpenAI, valued at about $135 billion [8]
- The two parties signed a $250 billion Azure pre-purchase contract; Microsoft's capital expenditure reached $34.9 billion last quarter, up 40% from the previous quarter, directed mainly at new data centers and AI chip procurement [8]

Group 9: Legal Issues Surrounding OpenAI
- Ilya Sutskever testified for nearly 10 hours in Elon Musk's lawsuit against OpenAI [9]
- Ilya submitted a 52-page memorandum detailing allegations against Altman, including accusations of deceiving the board, sowing discord, and creating chaos, as well as enabling the growth of Anthropic [9]
- After Altman's dismissal, the board seriously considered merging with Anthropic and appointing Dario Amodei as CEO, but the plan collapsed over operational obstacles and a revolt by 700 employees [10]
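As referenced in Group 3 above, here is a minimal sketch of the 3:1 interleaving of linear and full attention blocks. The depth is illustrative and the exact placement of the full attention layers is an assumption; the sketch only makes the ratio, and where the ~75% cache saving comes from, concrete.

```python
# Sketch of a 3:1 hybrid layer schedule: three linear attention (KDA-style)
# blocks for every full attention block. Depth and placement are illustrative.

def hybrid_schedule(n_layers: int, ratio: int = 3) -> list[str]:
    """Every (ratio + 1)-th layer is full attention; the rest are linear."""
    return ["full" if (i + 1) % (ratio + 1) == 0 else "linear"
            for i in range(n_layers)]

print(hybrid_schedule(8))
# ['linear', 'linear', 'linear', 'full', 'linear', 'linear', 'linear', 'full']
# Only the 'full' layers accumulate a KV cache that grows with sequence
# length, so at a 3:1 ratio the cache is ~25% of an all-full-attention stack.
```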
Just Now: Kimi Open-Sources a New Architecture, Betting on Linear Attention
机器之心· 2025-10-31 04:11
Core Insights
- The article discusses advances in attention mechanisms, focusing on the Kimi Linear architecture, which combines linear and full attention to improve efficiency and performance across a range of tasks [1][2][4].

Group 1: Kimi Linear Architecture
- Kimi Linear introduces Kimi Delta Attention (KDA), a new hybrid linear attention mechanism that optimizes memory usage in finite-state RNNs through a more efficient gating mechanism [4][10].
- The architecture interleaves KDA layers with periodic full attention layers at a 3:1 ratio, significantly reducing memory usage while matching or exceeding full-attention quality [10][32].
- Kimi Linear has 48 billion total parameters, with 3 billion activated, and handles context lengths of up to 1 million tokens [5][10].

Group 2: Performance and Efficiency
- Kimi Linear outperforms traditional full attention methods across tasks, especially long-context ones, cutting the need for large key-value caches by up to 75% [5][10].
- The model decodes up to six times faster than full multi-head attention models when processing long contexts [5][59].
- In comparative evaluations, Kimi Linear consistently beats baseline models such as MLA and GDN-H on general knowledge, reasoning, and Chinese-language tasks [44][49].

Group 3: Technical Innovations
- The KDA mechanism introduces fine-grained control over memory decay and position awareness, enhancing the model's expressiveness and efficiency [20][24].
- The architecture employs a chunk-wise recurrent, intra-chunk parallel strategy to maximize matrix-multiplication throughput and exploit Tensor Cores effectively (a simplified sketch follows this summary) [26][59].
- The NoPE (No Position Encoding) design enables efficient long-context training by delegating responsibility for position information to the KDA layers [34][39].

Group 4: Experimental Results
- Kimi Linear achieved the highest average scores on long-context benchmarks, demonstrating its effectiveness on extended sequences [52][53].
- In reinforcement-learning scenarios, Kimi Linear improved faster and further than MLA, particularly on mathematical reasoning tasks [56][57].
- The model's efficiency remains high, with negligible latency overhead versus GDN-H during prefill, while its speed advantage grows as sequence length increases [59][60].
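To illustrate the chunk-wise strategy mentioned in Group 3, here is a simplified chunked pass over an ungated causal linear attention layer in NumPy. It deliberately omits KDA's gating and specialized kernel; it shows only the split between a small recurrent state carried across chunks and dense, Tensor-Core-friendly matrix multiplications within each chunk. Shapes and chunk size are illustrative.

```python
import numpy as np

def chunked_linear_attention(Q, K, V, chunk=64):
    """Causal linear attention o_t = sum_{i<=t} (q_t . k_i) v_i, chunk by chunk.

    Across chunks, only a small (d_k, d_v) state is carried recurrently;
    within a chunk, everything is dense matmul that parallelizes well.
    """
    T, d_k = Q.shape
    d_v = V.shape[1]
    S = np.zeros((d_k, d_v))                 # sum of k_i v_i^T from past chunks
    out = np.empty((T, d_v))
    for s in range(0, T, chunk):
        q, k, v = Q[s:s+chunk], K[s:s+chunk], V[s:s+chunk]
        inter = q @ S                                 # contribution of past chunks
        causal = np.tril(np.ones((len(q), len(q))))   # causal mask inside the chunk
        intra = ((q @ k.T) * causal) @ v              # parallel intra-chunk part
        out[s:s+chunk] = inter + intra
        S += k.T @ v                                  # recurrent state update
    return out

rng = np.random.default_rng(0)
Q, K, V = (rng.normal(size=(256, 16)) for _ in range(3))
print(chunked_linear_attention(Q, K, V).shape)        # (256, 16)
```

The same inter-chunk/intra-chunk decomposition is what lets gated variants like KDA keep most of the work in large matrix multiplications rather than a token-by-token recurrence.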