Open-Source Models
Yang Zhilin responds: Kimi K2 was trained on H800s! But as for the claim it "only cost $4.6 million"...
QbitAI (量子位) · 2025-11-11 11:11
Core Insights
- The Kimi K2 Thinking model reportedly cost only $4.6 million to train, less than the $5.6 million reported for DeepSeek V3, raising questions about the valuations of closed-source giants in Silicon Valley [13][14].
- Kimi K2 is driving a migration trend in Silicon Valley, offering stronger performance at lower cost than incumbent models [5][6].
- Kimi K2 uses novel engineering techniques, including the self-developed MuonClip optimizer, which keeps gradients stable during training without human intervention [18].

Training Cost and Performance
- The claimed $4.6 million training cost is significantly lower than that of other models, prompting reflection within the industry [13][14].
- Investors and companies are migrating to Kimi K2 for its strong performance and cost-effectiveness, with reports of it being five times faster and 50% more accurate than closed-source models [8][6].

Technical Innovations
- Kimi K2 increased the number of experts in its MoE layer from 256 to 384 while reducing the parameters active during inference from roughly 37 billion to 32 billion [16].
- The model uses Quantization-Aware Training (QAT) to achieve native INT4 inference, roughly doubling speed while cutting resource consumption [21].

Community Engagement and Future Developments
- The Kimi K2 team held a three-hour AMA with the developer community, discussing future architectures and a potential next-generation K3 model [22][24].
- The team said Kimi K2's distinctive writing style comes from a combination of pre-training and post-training, and that longer context windows are being explored for future models [26][27].
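The MoE change described above (more experts, fewer active parameters) can be illustrated with a minimal top-k routing sketch. The expert count of 384 comes from the article; the top-k value of 8 and the routing function are hypothetical illustrations, not Kimi's actual router configuration.

```python
import random

NUM_EXPERTS = 384   # K2 reportedly grew its MoE layer from 256 to 384 experts
TOP_K = 8           # hypothetical: number of experts activated per token

def route_token(token_scores: list[float], top_k: int = TOP_K) -> list[int]:
    """Pick the indices of the top-k highest-scoring experts for one token.

    In a real MoE layer the scores come from a learned gating network;
    here they are simply passed in.
    """
    ranked = sorted(range(len(token_scores)),
                    key=lambda i: token_scores[i], reverse=True)
    return sorted(ranked[:top_k])

random.seed(0)
scores = [random.random() for _ in range(NUM_EXPERTS)]
active = route_token(scores)
print(len(active))  # only TOP_K of the 384 experts run for this token
```

Because only the routed experts execute per token, total parameter count can grow (more experts) while the compute per token, governed by the active subset, shrinks.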
K2 Thinking makes a splash again; Yang Zhilin answered 21 questions in the early morning hours
36Kr · 2025-11-11 10:30
Core Insights
- The K2 Thinking model, developed by Kimi, has drawn significant attention since its release, showcasing advances in AI model architecture and performance [1][2][8]
- The model uses a sparse mixture-of-experts (MoE) architecture with 1 trillion parameters, making it one of the largest open-source models available [7][8]
- K2 Thinking has posted superior results on various benchmarks, outperforming competitors such as GPT-5 on specific tasks [8][9]

Group 1: Model Features and Performance
- K2 Thinking is designed to strengthen task execution, focusing on agentic abilities rather than just conversational skills [12][18]
- The model's training cost has been a topic of discussion; the co-founder clarified that the reported $4.6 million is not an official figure and is hard to quantify given the research and experimental work involved [18][24]
- K2 Thinking's output price of $2.5 per million tokens is one quarter of GPT-5's [8]

Group 2: Community Engagement and Feedback
- The Kimi team held an AMA on Reddit, fielding numerous questions and positive feedback on the model's capabilities and open-source approach [2][10]
- Developers asked for smaller versions of K2 Thinking deployable on PCs or in enterprise settings, indicating strong interest in practical applications [2][10]
- The community's enthusiasm reflects a broader trend in China's AI model landscape, with multiple companies releasing competitive models within a short window [9][18]

Group 3: Technical Innovations and Future Directions
- K2 Thinking incorporates techniques such as INT4 quantization and long reasoning chains, allowing it to carry out complex tasks spanning many tool calls [12][14][35]
- The Kimi team is exploring other modalities, such as visual understanding, though timelines for these may be extended [17]
- Future iterations, including K3, are expected to bring significant architectural changes and new features aimed at stronger model capabilities [40][43]
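The long reasoning chains with many tool calls described above follow a standard agent loop: think, pick a tool, observe the result, repeat until done. This toy sketch uses hypothetical tool names and a trivial one-step policy; it is not Kimi's implementation, only an illustration of the loop structure.

```python
from typing import Callable

# Hypothetical tool registry; real agent stacks expose search,
# browsing, code execution, and similar tools.
TOOLS: dict[str, Callable[[str], str]] = {
    "search": lambda q: f"results for {q!r}",
    "calculate": lambda expr: str(eval(expr, {"__builtins__": {}})),
}

def run_agent(task: str, max_steps: int = 300) -> str:
    """Toy agent loop: think, call a tool, observe, repeat until done.

    K2 Thinking is reported to sustain 200-300 such calls;
    max_steps caps the loop.
    """
    transcript = [f"task: {task}"]
    observation = ""
    for step in range(max_steps):
        # A real model would decide here which tool to call and with
        # what arguments; this toy policy calls `calculate` once, then stops.
        if step == 0:
            observation = TOOLS["calculate"](task)
            transcript.append(f"step {step}: calculate -> {observation}")
        else:
            transcript.append(f"final answer: {observation}")
            break
    return "\n".join(transcript)

print(run_agent("2 + 3 * 4"))
```

The key design point is that each observation is fed back into the model's context before the next decision, which is what lets such loops run for hundreds of steps without human intervention.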
A rarity: Moonshot AI's Yang Zhilin, Zhou Xinyu, and Wu Yuxin respond to everything, debunking the $4.6 million figure and poking fun at OpenAI
36Kr · 2025-11-11 04:25
Core Insights
- The core discussion revolves around the Kimi K2 Thinking model: its training costs, performance metrics, and the company's plans for model development and open-source strategy [1][3][13]

Group 1: Kimi K2 Thinking Model
- The training cost of Kimi K2 Thinking is rumored to be $4.6 million, but the CEO clarified that this figure is not official and that training costs are hard to quantify given significant research and experimental expenses [1]
- The current priority for Kimi K2 Thinking is absolute performance rather than token efficiency, with plans to improve token usage in future iterations [3][4]
- The model scores highly on benchmarks such as HLE, but there are concerns about the gap between benchmark performance and real-world applications [4]

Group 2: Open Source and Safety
- The company embraces open source, believing that open safety-alignment technology helps researchers preserve safety while fine-tuning models [2][8]
- The CEO emphasized establishing mechanisms to ensure that downstream work adheres to safety protocols [2]

Group 3: Future Developments
- The company is exploring a vision-language version of K2 and has plans for a K3 model, though no release date has been given [1][2]
- The team discussed expanding Kimi K2 Thinking's context window beyond the currently supported 256K tokens [11]

Group 4: Community Engagement
- The recent Reddit AMA highlighted global interest in the Kimi series, reflecting growing recognition of China's AI innovation capabilities [13]
- The company is actively responding to community feedback and questions, signaling a commitment to transparency and user engagement [13]
AI Industry Tracking: MiniMax-M2 released and tops the open-source model rankings; continued attention to progress in commercializing large models
Changjiang Securities · 2025-11-09 14:32
Investment Rating
- The report maintains a "Positive" investment rating for the software and services industry [8].

Core Insights
- On October 27, Xiyu Technology (MiniMax) officially open-sourced and launched MiniMax M2, a model with 230 billion total parameters designed for agent and code applications. The complete M2 weights are open-sourced under the MIT license and free to use globally for a limited time. The MiniMax Agent has also launched a domestic version and upgraded its overseas version [2][5].
- The launch of M2 opens new possibilities for open-source models in intelligent execution and enterprise applications, with the potential to accelerate commercialization of large models. The report emphasizes the models' cost-reduction effects and continues to favor the domestic AI industry chain, recommending picks-and-shovels stocks and major players with strong positioning advantages [2][10].

Summary by Sections

Event Description
- MiniMax M2 features an MoE architecture tailored for agent and code applications. Its complete weights are open-sourced and free to use globally for a limited time. The MiniMax Agent has launched a domestic version and upgraded its overseas version [5].

Event Commentary
- MiniMax M2 has performed strongly across benchmarks, including a SWE-bench Verified score of 69.4, placing it among the top models for real programming tasks. It scored 61 on the Artificial Analysis index, ranking fifth overall and first among open-source models, and led domestic models on tool use with 77.2 on τ²-Bench [10].
- The architecture targets executable agent tasks, ensuring every reasoning step has full context visibility. The interleaved thinking format lets the model plan and verify operations across multiple dialogue turns, which is crucial for agent reasoning [10].
- M2's pricing is competitive, with input around $0.30 per million tokens and output around $1.20 per million tokens, significantly below competitors; output speed is around 100 tokens per second and improving rapidly [10].
- Market response has been enthusiastic: M2 ranks first on the OpenRouter and Hugging Face trend charts and has surpassed 50 billion tokens of daily consumption, indicating strong market interest and commercial potential [10].
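The per-million-token prices quoted above translate into per-request costs as follows; the request sizes in the example are made up purely for illustration.

```python
def request_cost(input_tokens: int, output_tokens: int,
                 in_price_per_m: float, out_price_per_m: float) -> float:
    """Cost in USD for one request, given per-million-token prices."""
    return (input_tokens * in_price_per_m
            + output_tokens * out_price_per_m) / 1_000_000

# MiniMax M2 prices cited in the report ($0.30 in / $1.20 out per million
# tokens); the 50k-in / 5k-out request is a hypothetical agent workload.
m2_cost = request_cost(input_tokens=50_000, output_tokens=5_000,
                       in_price_per_m=0.30, out_price_per_m=1.20)
print(f"${m2_cost:.4f}")
```

Because agent workloads are input-heavy (long contexts, many tool observations), the low input price matters more than the output price for this class of usage.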
Interview with Gong Ke: The AI era places higher demands on people's scientific literacy and value judgment
Nan Fang Du Shi Bao (Southern Metropolis Daily) · 2025-11-09 04:42
Core Viewpoint
- The rapid proliferation of artificial intelligence (AI) applications necessitates higher levels of scientific literacy, questioning ability, and value judgment among individuals [1][4].

Group 1: AI Development and Trends
- AI agents have become a significant focus for technology companies, seen as a new entry point for future traffic and services [3].
- The concept of "intelligent agents" has gained popularity due to the accelerated iteration of large models and the emergence of various functional models, serving as an interface between humans and AI [3][4].
- Despite initial excitement around AI agents, many have faced criticism for being "unusable" and "unreliable," often only capable of performing standardized tasks in specific scenarios [3][4].

Group 2: Human-AI Interaction
- The effectiveness of AI tools depends on individuals' ability to communicate clearly and set boundaries for tasks and questions directed at AI [4][5].
- The ability to ask the right questions is emphasized as being more critical than solving problems in the era of large models, highlighting the importance of scientific and ethical literacy [5][6].

Group 3: Future Directions in AI
- The evolution of AI is expected to transition from single-modal to multi-modal capabilities, expanding from text to images, audio, video, and code [6].
- The rise of embodied intelligence, which involves interaction with physical entities, is identified as a key trend in AI development [6].
- Open-source models are anticipated to play a crucial role in the future of large model development, promoting faster iteration and greater transparency [6].
- The necessity for green transformation in AI is highlighted, focusing on the sustainable use of resources and the integration of renewable energy in AI applications [6][7].
Kimi K2 Thinking arrives by surprise, surpassing GPT-5 in agent and reasoning capabilities; netizens: the gap between open and closed source narrows again
36Kr · 2025-11-07 03:07
Core Insights
- Kimi K2 Thinking has been released and open-sourced, featuring a "model as agent" approach that allows 200-300 consecutive tool calls without human intervention [1][3]
- The model significantly narrows the gap between open-source and closed-source models, and became a hot topic upon launch [3][4]

Technical Details
- Kimi K2 Thinking has 1 trillion total parameters, of which 32 billion are activated per token, and uses INT4 precision instead of FP8 [5][26]
- It offers a 256K-token context window, enhancing its reasoning and agent capabilities [5][8]
- The model posts improved results across benchmarks, achieving a state-of-the-art (SOTA) score of 44.9% on Humanity's Last Exam (HLE) [9][10]

Performance Metrics
- Kimi K2 Thinking outperformed closed-source models such as GPT-5 and Claude Sonnet 4.5 on multiple benchmarks, including HLE and BrowseComp [10][18]
- On BrowseComp, where the average human scores 29.2%, Kimi K2 Thinking achieved 60.2%, showcasing advanced search and browsing capabilities [18][20]
- The model's agentic programming has also improved, reaching a SOTA score of 93% on the τ²-Bench Telecom benchmark [15]

Enhanced Capabilities
- The model shows stronger creative writing, producing clear and engaging narratives while maintaining stylistic coherence [25]
- In academic and research contexts, Kimi K2 Thinking shows significant gains in analytical depth and logical structure [25]
- Its responses to personal and emotional queries are more empathetic and nuanced, providing actionable insights [25]

Quantization and Performance
- Kimi K2 Thinking employs native INT4 quantization, improving compatibility with various hardware and roughly doubling inference speed [26][27]
- The design supports dynamic cycles of "thinking → searching → browsing → thinking → programming," enabling it to tackle complex, open-ended problems effectively [20]

Practical Applications
- The model has demonstrated its ability to solve complex problems, such as a doctoral-level math problem, through a series of reasoning steps and tool calls [13]
- In programming tasks, Kimi K2 Thinking engages quickly with coding challenges, showcasing its practical utility in software development [36]
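Native INT4 inference as described stores weights as 4-bit integers and dequantizes them on the fly. This per-tensor symmetric quantization sketch illustrates the core idea; real QAT simulates this rounding during training and typically quantizes per channel or per group, so this is a simplified assumption, not Kimi's actual scheme.

```python
def quantize_int4(weights: list[float]) -> tuple[list[int], float]:
    """Symmetric per-tensor quantization to the signed 4-bit range [-8, 7]."""
    scale = max(abs(w) for w in weights) / 7.0 or 1.0  # avoid zero scale
    q = [max(-8, min(7, round(w / scale))) for w in weights]
    return q, scale

def dequantize_int4(q: list[int], scale: float) -> list[float]:
    """Recover approximate float weights from 4-bit codes and a scale."""
    return [v * scale for v in q]

w = [0.12, -0.5, 0.33, 0.07, -0.21]
q, scale = quantize_int4(w)
w_hat = dequantize_int4(q, scale)
print(q)  # 4-bit integer codes, each in [-8, 7]
```

Each weight becomes a 4-bit code plus a shared scale, cutting memory roughly 4x versus FP16 and letting integer kernels do the arithmetic, which is where the reported ~2x inference speedup comes from.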
The cost-performance of Chinese AI has become a global killer advantage
Feng Huang Wang (Phoenix New Media) · 2025-11-05 00:32
Core Insights
- Airbnb CEO Brian Chesky publicly stated that the company relies heavily on Alibaba's Qwen model for its speed, efficiency, and cost-effectiveness, signaling a shift in preference toward Chinese AI models over established players like OpenAI [1][2]
- The adoption of Chinese AI models is gaining momentum globally, driven by their open-source nature and competitive pricing [3][4]

Group 1: Adoption of Chinese AI Models
- Notable Silicon Valley figures, such as Chamath Palihapitiya, have shifted core business workloads from American AI models to China's Kimi K2, citing superior performance and significantly lower costs [4]
- Research indicates that a substantial share of Silicon Valley AI startups now use Chinese open-source models, a stark contrast to three years ago when OpenAI dominated the market [5]
- Companies such as HSBC and Saudi Aramco are testing or deploying Chinese models like DeepSeek, part of a broader move by international firms toward these alternatives [5]

Group 2: Competitive Landscape and Challenges
- Major tech companies like Microsoft and Amazon face significant computing-power challenges, leading to large-scale layoffs as they balance costs against the need for robust AI capabilities [7][8]
- The high cost of advanced AI chips, a market dominated by Nvidia, deepens these companies' computing-power anxiety and shapes their operational strategies [8][9]

Group 3: Technological Innovations
- Chinese AI companies are not competing on price alone; they are also making notable technical advances, such as DeepSeek's new multi-modal model and Kimi's linear attention architecture [10]
- Nvidia's CEO highlighted the competitive nature of AI development, stressing the importance of an open ecosystem and of skilled professionals in the AI sector [10][11]

Group 4: Market Dynamics
- The rapid rise of Chinese AI models is reshaping the global AI landscape toward a more diverse competitive environment, challenging the previous dominance of a few major players [9][12]
- OpenAI is reportedly preparing for an IPO that could become one of the largest financing events in history, reflecting intense demand for computing resources in the AI sector [9]
Silicon Valley heavyweights lead the way in dropping OpenAI and "defecting" to Kimi K2, calling it "so cheap"; even the White House's first AI chief can't dissuade them
36Kr · 2025-11-04 10:50
Core Insights
- Silicon Valley is shifting from expensive closed-source models to cheaper open-source alternatives, driven by cost considerations and performance improvements [1][2][5]
- The Kimi K2 model, developed by a Chinese startup, has gained traction for its superior performance and lower costs compared with models from OpenAI and Anthropic [1][5]
- The emergence of open-source models like DeepSeek is putting pressure on the U.S. AI industry, as these models offer significant cost savings [3][8]

Cost Considerations
- Chamath Palihapitiya said the decision to switch to open-source models is primarily cost-driven, as existing systems like Anthropic's are too expensive [2][5]
- The DeepSeek 3.2 EXP model can cut API costs by up to 50%, charging $0.28 per million input tokens and $0.42 per million output tokens, versus around $3.15 for Anthropic's Claude [3][8]

Model Performance and Transition Challenges
- Transitioning to new models requires significant fine-tuning and engineering work, complicating the switch despite the lower costs of alternatives like DeepSeek [2][6]
- Kimi K2's adoption by major users points to a trend of prioritizing performance and cost efficiency in AI model selection [1][5]

Open-Source vs. Closed-Source Dynamics
- A divide is emerging in which high-performance closed-source models are predominantly American while high-performance open-source models are primarily Chinese [10][12]
- The U.S. faces challenges in the open-source model space, with heavy investment concentrated in closed-source models, while China leads in open-source development [8][10]

Security and Operational Concerns
- Concerns about the security of using Chinese models in the U.S. are addressed with assurances that running these models on local infrastructure mitigates the risk of data leakage [12][16]
- The competitive landscape is fostering a culture of scrutiny, with companies actively testing models for vulnerabilities and contributing to a responsible development environment [16]
Silicon Valley is learning Chinese tonight: Cursor reportedly a "wrapper" around a Chinese model, and top AI talent is overwhelmingly ethnic Chinese
36Kr · 2025-11-03 03:36
Core Insights
- The article highlights a significant shift in the AI landscape: the Chinese language and Chinese models are gaining prominence in Silicon Valley, in contrast with the traditionally English-dominated environment [1][11][57]
- Chinese talent is increasingly recognized as top-tier in AI, with many prominent figures at major companies like Meta and OpenAI being of Chinese descent [24][30][37]

Group 1: Chinese Influence in AI
- Recent AI conferences have seen a notable presence of Chinese professionals, indicating their growing influence in the field [3][11]
- Major AI companies, including Meta, employ substantial numbers of Chinese researchers, many in key positions [26][30][37]

Group 2: Adoption of Chinese Open-Source Models
- Companies are increasingly opting for Chinese open-source models due to their performance, cost-effectiveness, and large-scale capabilities [11][47][49]
- Chamath Palihapitiya's team has migrated workloads to Kimi K2, citing its superior performance and lower cost compared to OpenAI and Anthropic [11][13]

Group 3: Performance of Chinese Models
- Chinese open-source models rank highly on various AI capability indices, often outperforming their closed-source counterparts [15][21][57]
- Models like GLM-4.6 and Qwen have been recognized for exceptional performance in coding and AI applications [47][49]

Group 4: Challenges for Foreign Companies
- Companies like Cursor face challenges in developing their own models, leading them to rely on Chinese open-source models for training and performance enhancement [4][51]
- The rapid evolution of AI models means companies must adapt quickly to remain competitive, often turning to established Chinese models for efficiency [14][57]

Group 5: Broader Implications
- The shift toward Chinese models signals a potential redefinition of global AI infrastructure, with open-source models providing significant advantages in performance and cost [57]
- The article suggests this trend may lead to a more balanced representation of talent and technology in the AI sector, moving away from a solely Western-centric view [58][64]
Are the latest foreign "self-developed" large models all wrappers around Chinese ones?
36Kr · 2025-11-01 05:02
Core Insights
- The article discusses the emergence of Chinese open-source AI models as significant players in the global AI landscape, particularly in light of recent developments from American tech companies [4][21][26]

Group 1: New Developments in AI Models
- Cursor has released a major update introducing its own code model, Composer, which utilizes reinforcement learning and is capable of processing code efficiently [4][7]
- The Composer model reportedly generates code four times faster than similar models, indicating a significant advancement in performance [7]
- Speculation has arisen about the underlying technology of these models, with suggestions that they may be based on Chinese AI models, particularly the GLM series [9][11][16]

Group 2: Industry Reactions and Analysis
- Industry experts suggest that many new models, including Cursor's Composer, are fine-tuned versions of existing Chinese models rather than entirely new creations, highlighting the high costs of developing foundation models from scratch [17][18]
- The success of open-source models is emphasized, with Nvidia's CEO noting their role in accelerating AI applications and the need for developers to leverage these resources [21][23]
- The leading open-source models in the Hugging Face community predominantly originate from Chinese companies, showcasing their growing influence [23][26]

Group 3: Implications for Global AI Competition
- Advances in Chinese open-source models are reshaping the competitive landscape of AI, with leaders and followers in the technology race shifting positions [26]
- The article concludes that Chinese models are now capable enough to support the development of Western products, signaling a new era of multipolar competition in AI [20][26]