Yang Zhilin responds: Kimi K2 was trained on H800s! But did it really cost "only $4.6 million"?
量子位 (QbitAI) · 2025-11-11 11:11
Core Insights
- The Kimi K2 Thinking model reportedly cost only $4.6 million to train, below the $5.6 million for DeepSeek V3, raising questions about the valuations of Silicon Valley's closed-source giants [13][14].
- Kimi K2 is driving a migration trend in Silicon Valley, offering superior performance at lower cost than existing models [5][6].
- Kimi K2 relies on innovative engineering, including the self-developed MuonClip optimizer, which keeps gradient training stable without human intervention [18].

Training Cost and Performance
- The claimed $4.6 million training cost is significantly lower than that of comparable models, prompting reflection across the industry [13][14].
- Investors and companies are migrating to Kimi K2 for its strong performance and cost-effectiveness, with reports of it being five times faster and 50% more accurate than closed-source models [8][6].

Technical Innovations
- Kimi K2 increases the number of experts in its MoE layer from 256 to 384 while reducing the parameters active during inference from roughly 37 billion to 32 billion [16].
- The model uses Quantization-Aware Training (QAT) to enable native INT4 inference, roughly doubling speed and halving resource consumption [21].

Community Engagement and Future Developments
- The team held a three-hour AMA with the developer community, discussing future architectures and a potential next-generation K3 model [22][24].
- Kimi K2's distinctive writing style results from a combination of pre-training and post-training, and the team is exploring longer context windows for future models [26][27].
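The summary says only that MuonClip keeps training stable without human intervention. Public descriptions of the technique center on a "qk-clip" step: whenever the maximum attention logit exceeds a threshold, the query and key projection weights are rescaled so the logits stay bounded. The sketch below illustrates that idea only; the function name, the threshold value, and the per-tensor (rather than per-head) scaling are all simplifying assumptions, not Kimi's actual implementation.

```python
import numpy as np

def qk_clip(W_q, W_k, X, tau=100.0):
    """Rescale query/key projection weights so the max attention logit stays <= tau.

    Illustrative only: real implementations clip per attention head during training.
    """
    d = W_q.shape[1]
    # attention logits for a batch of activations X
    logits = (X @ W_q) @ (X @ W_k).T / np.sqrt(d)
    s_max = np.abs(logits).max()
    if s_max > tau:
        # scaling both W_q and W_k by gamma scales every logit by gamma**2,
        # so gamma = sqrt(tau / s_max) brings the max logit down to exactly tau
        gamma = np.sqrt(tau / s_max)
        W_q, W_k = W_q * gamma, W_k * gamma
    return W_q, W_k
```

The appeal of acting on the weights rather than the logits is that the correction persists: once rescaled, the projections produce bounded logits on subsequent batches too, with no per-step human monitoring.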
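The summary names QAT and native INT4 inference but gives no recipe. The standard building block of QAT is "fake quantization": during training, weights are rounded to the low-precision grid and dequantized in the forward pass, so the network learns weights that survive INT4 rounding. A generic symmetric per-tensor version is sketched below; this is the textbook mechanism, not Kimi's disclosed implementation.

```python
import numpy as np

def fake_quant_int4(w, eps=1e-12):
    """Symmetric per-tensor fake quantization to the INT4 level range [-8, 7]."""
    scale = np.abs(w).max() / 7.0 + eps       # map the largest magnitude onto level 7
    q = np.clip(np.round(w / scale), -8, 7)   # integer levels actually stored at inference
    return q * scale                          # dequantized values used in the QAT forward pass
```

At inference time only the integer levels and the scale are kept, which is where the reported speed and memory gains of native INT4 come from.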
The K2 open-source large model: will it be Kimi's DeepSeek moment?
Hu Xiu · 2025-07-14 03:20
Core Insights
- The article discusses MoonShot's latest open-source model, K2, whose 1-trillion-parameter scale makes it the largest open-source model currently available [2].
- K2's performance across benchmarks positions it as a strong competitor to established models such as Claude 4 Opus and GPT-4.1, highlighting China's growing influence in the global AI landscape [2][4].
- Competition in the AI sector is intensifying, with Chinese companies such as MoonShot and MiniMax leading open-source innovation and challenging Western counterparts [4][6].

Company Developments
- K2 quickly gained popularity, becoming the top trending open-source model on HuggingFace shortly after release [4].
- The architecture uses fewer attention heads and more experts, improving efficiency on long contexts, a significant improvement over previous models [8][10].
- MoonShot's disclosed total funding of roughly $1.5 billion is far lower than that of its Western competitors, suggesting a more capital-efficient operation [6].

Market Impact
- K2's compatibility with OpenAI's and Anthropic's API formats positions it favorably in the AI application development market, potentially allowing it to capture a significant share [7].
- Competitive dynamics between MoonShot and DeepSeek have intensified, with both companies releasing multiple models aimed at various AI applications [5][12].
- The focus on multi-agent collaboration and the integration of various models into K2 may enhance its commercial viability and market appeal [12].
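Both articles note the same architectural lever: more total experts without more inference cost. That works because a top-k MoE router invokes only k experts per token, so compute tracks k rather than the total expert count; growing the pool from 256 to 384 experts adds capacity, not per-token FLOPs. The toy routing sketch below illustrates the mechanism; all names and shapes are hypothetical and unrelated to K2's actual code.

```python
import numpy as np

def moe_forward(x, expert_weights, router_W, k=2):
    """Route one token through the top-k of n experts.

    Only k expert matrices are ever multiplied, so per-token compute
    scales with k, not with len(expert_weights).
    """
    logits = x @ router_W                        # router scores, shape (n_experts,)
    topk = np.argsort(logits)[-k:]               # indices of the k highest-scoring experts
    gates = np.exp(logits[topk] - logits[topk].max())
    gates /= gates.sum()                         # softmax over the selected experts only
    return sum(g * (x @ expert_weights[i]) for g, i in zip(gates, topk))
```

Adding experts enlarges `router_W` by one column each and the expert list by one entry, but the inner loop still runs exactly k matrix multiplies per token.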