Yang Zhilin responds: Kimi K2 was trained on H800s! But as for "only costing $4.6 million"…
量子位· 2025-11-11 11:11
Core Insights
- The Kimi K2 Thinking model reportedly cost only $4.6 million to train, lower than the $5.6 million for DeepSeek V3, raising questions about the valuations of closed-source giants in Silicon Valley [13][14].
- The Kimi K2 model is driving a migration trend in Silicon Valley, as it offers superior performance at lower cost than existing models [5][6].
- Kimi K2 relies on novel engineering techniques, including the self-developed MuonClip optimizer, which keeps gradients stable during training without human intervention [18].

Training Cost and Performance
- The training cost of Kimi K2 is claimed to be $4.6 million, significantly lower than that of other models, prompting reflection within the industry [13][14].
- Investors and companies are migrating to Kimi K2 for its strong performance and cost-effectiveness, with reports of it being five times faster and 50% more accurate than closed-source models [8][6].

Technical Innovations
- Kimi K2 reworked its architecture by increasing the number of experts in the MoE layer from 256 to 384 while cutting the number of parameters activated during inference from roughly 37 billion to 32 billion [16].
- The model uses Quantization-Aware Training (QAT) to support native INT4 inference, roughly doubling speed while cutting resource consumption (a generic QAT sketch follows this summary) [21].

Community Engagement and Future Developments
- The team behind Kimi K2 held a three-hour AMA with the developer community, discussing future architectures and a potential next-generation K3 model [22][24].
- The team explained that Kimi K2's distinctive writing style comes from a combination of pre-training and post-training, and that longer context windows are being explored for future models [26][27].
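To make the QAT bullet concrete, below is a minimal, generic sketch of INT4 fake quantization, the operation a quantization-aware training loop typically applies to weights in its forward pass so the model learns to tolerate 4-bit inference. It is only an illustration under assumed choices (symmetric per-group quantization, group size 32, NumPy), not Kimi K2's actual training code.

```python
import numpy as np

def fake_quant_int4(w: np.ndarray, group_size: int = 32) -> np.ndarray:
    """Symmetric per-group INT4 fake quantization (quantize, then dequantize).

    In quantization-aware training, the forward pass sees weights rounded to
    the INT4 grid like this, so the model adapts to 4-bit inference.
    """
    orig_shape = w.shape
    w = w.reshape(-1, group_size)                        # one scale per group
    scale = np.abs(w).max(axis=1, keepdims=True) / 7.0   # INT4 range is [-8, 7]
    scale = np.where(scale == 0, 1.0, scale)             # avoid division by zero
    q = np.clip(np.round(w / scale), -8, 7)              # integer codes in [-8, 7]
    return (q * scale).reshape(orig_shape)               # dequantized weights

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    w = rng.normal(size=(4, 64)).astype(np.float32)
    w_q = fake_quant_int4(w)
    print("max abs quantization error:", np.abs(w - w_q).max())
```

In a real QAT setup the rounding would sit inside the training graph, typically with a straight-through estimator so gradients can flow through it.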
Kimi K2 Thinking arrives by surprise, with agent and reasoning abilities surpassing GPT-5; netizens: the open-source vs. closed-source gap narrows again
36Kr · 2025-11-07 03:07
Core Insights
- Kimi K2 Thinking has been released and open-sourced, built around a "model as agent" approach that sustains 200-300 consecutive tool calls without human intervention (a minimal tool-call loop sketch follows this summary) [1][3]
- The model significantly narrows the gap between open-source and closed-source models and became a hot topic on launch [3][4]

Technical Details
- Kimi K2 Thinking has 1 trillion total parameters, with 32 billion activated, and uses INT4 precision instead of FP8 [5][26]
- It offers a 256K-token context window, strengthening its reasoning and agent capabilities [5][8]
- It posts improved results across benchmarks, reaching a state-of-the-art (SOTA) score of 44.9% on Humanity's Last Exam (HLE) [9][10]

Performance Metrics
- Kimi K2 Thinking outperformed closed-source models such as GPT-5 and Claude Sonnet 4.5 on multiple benchmarks, including HLE and BrowseComp [10][18]
- On the BrowseComp benchmark, where the human baseline is 29.2%, Kimi K2 Thinking scored 60.2%, showcasing its advanced search and browsing capabilities [18][20]
- Its agent programming has also improved, reaching a SOTA score of 93% on the τ²-Bench Telecom benchmark [15]

Enhanced Capabilities
- The model shows stronger creative writing, producing clear and engaging narratives while maintaining stylistic coherence [25]
- In academic and research contexts, it shows marked improvements in analytical depth and logical structure [25]
- Its responses to personal and emotional queries are more empathetic and nuanced, offering actionable advice [25]

Quantization and Performance
- Kimi K2 Thinking uses native INT4 quantization, improving compatibility with a range of hardware and roughly doubling inference speed [26][27]
- Its design supports dynamic cycles of "thinking → searching → browsing → thinking → programming," letting it work through complex, open-ended problems [20]

Practical Applications
- The model has solved complex problems, such as a doctoral-level math problem, through chains of reasoning and tool calls [13]
- In programming tasks, Kimi K2 Thinking engages quickly with coding challenges, demonstrating practical utility in software development [36]
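The "model as agent" pattern described above, hundreds of interleaved thinking and tool-calling steps without human intervention, boils down to a loop like the toy sketch below. Everything here (the tool names, the stub policy, the step budget) is hypothetical and only illustrates the control flow; it is not Kimi K2's actual agent harness.

```python
from typing import Callable, Dict

# Hypothetical tool registry; a real agent would wire these to a search API,
# a browser, and a code interpreter.
TOOLS: Dict[str, Callable[[str], str]] = {
    "search": lambda q: f"search results for: {q}",
    "browse": lambda url: f"page contents of: {url}",
    "run_code": lambda src: f"stdout of: {src}",
}

def stub_model(transcript: list) -> dict:
    """Stand-in for the LLM call: a real agent asks the model which tool to
    invoke next (or whether to stop) given the transcript so far."""
    step = len(transcript)
    if step >= 3:
        return {"action": "finish", "answer": f"done after {step - 1} tool calls"}
    return {"action": "search", "input": f"query #{step}"}

def agent_loop(task: str, max_steps: int = 300) -> str:
    """Interleaved think -> act -> observe loop: the pattern behind
    'model as agent' systems that chain hundreds of tool calls."""
    transcript = [f"task: {task}"]
    for _ in range(max_steps):
        decision = stub_model(transcript)
        if decision["action"] == "finish":
            return decision["answer"]
        observation = TOOLS[decision["action"]](decision["input"])
        transcript.append(observation)
    return "step budget exhausted"

if __name__ == "__main__":
    print(agent_loop("summarize recent INT4 inference results"))
```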
A Silicon Valley heavyweight leads the way in dropping OpenAI and "defecting" to Kimi K2, exclaiming "it's just too cheap"; even the White House's first AI czar can't talk him out of it
36Kr · 2025-11-04 10:50
Core Insights
- Silicon Valley is shifting from expensive closed-source models to cheaper open-source alternatives, driven by cost considerations and performance improvements [1][2][5]
- The Kimi K2 model, developed by a Chinese startup, has gained traction thanks to superior performance and lower costs compared with models from OpenAI and Anthropic [1][5]
- The rise of open-source models such as DeepSeek is putting pressure on the U.S. AI industry, since these models offer significant cost savings [3][8]

Cost Considerations
- Chamath Palihapitiya said the decision to switch to open-source models is driven primarily by cost, since existing systems such as Anthropic's are too expensive [2][5]
- The DeepSeek 3.2 EXP model can cut API costs by up to 50%, charging $0.28 per million input tokens and $0.42 per million output tokens, versus roughly $3.15 per million tokens for Anthropic's Claude (a back-of-the-envelope comparison follows this summary) [3][8]

Model Performance and Transition Challenges
- Switching models requires substantial time for fine-tuning and engineering adjustments, complicating the move despite the lower costs of alternatives like DeepSeek [2][6]
- Major users have already adopted the Kimi K2 model, signaling a trend toward prioritizing performance and cost efficiency when selecting AI models [1][5]

Open-Source vs. Closed-Source Dynamics
- The discussion highlights a growing divide: high-performance closed-source models are predominantly American, while high-performance open-source models are primarily Chinese [10][12]
- The U.S. is struggling in the open-source model space, with investment concentrated in closed-source models, while China leads open-source development [8][10]

Security and Operational Concerns
- Concerns about using Chinese models in the U.S. are addressed with assurances that running them on local infrastructure mitigates the risk of data leakage [12][16]
- The competitive landscape fosters a culture of scrutiny, with companies actively probing models for vulnerabilities, which supports responsible development [16]
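The price gap quoted above is easy to sanity-check with simple arithmetic. The sketch below plugs the cited per-million-token prices into a made-up monthly workload; the token volumes, and the assumption that the ~$3.15 Claude figure applies to both input and output tokens, are illustrative assumptions rather than facts from the article.

```python
# Back-of-the-envelope cost comparison using the per-million-token prices cited
# in the article ($0.28 input / $0.42 output for DeepSeek 3.2 EXP, ~$3.15 for
# Claude). The monthly token volumes below are made-up illustrative numbers.

PRICES = {
    "deepseek-3.2-exp": {"input": 0.28, "output": 0.42},  # USD per 1M tokens
    "claude (cited)":   {"input": 3.15, "output": 3.15},  # assumes ~$3.15/M both ways
}

def monthly_cost(price: dict, input_tokens: float, output_tokens: float) -> float:
    """Cost in USD for a given monthly token volume (tokens, not millions)."""
    return (input_tokens * price["input"] + output_tokens * price["output"]) / 1e6

if __name__ == "__main__":
    # Assume a workload of 2B input tokens and 0.5B output tokens per month.
    in_tok, out_tok = 2_000_000_000, 500_000_000
    for name, price in PRICES.items():
        print(f"{name:>18}: ${monthly_cost(price, in_tok, out_tok):,.0f}/month")
```

On these assumed volumes the cited prices work out to roughly $770 versus $7,875 per month, about a 10x gap, consistent with the "10 to 35 times cheaper" range quoted elsewhere in this digest.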
A Silicon Valley heavyweight leads the way in dropping OpenAI and "defecting" to Kimi K2, exclaiming "it's just too cheap"; even the White House's first AI czar can't talk him out of it
36Kr · 2025-10-28 10:39
Core Insights
- Silicon Valley is shifting from expensive closed-source models to cheaper open-source alternatives, driven by cost considerations and performance improvements [1][2][14]
- The Kimi K2 model, developed by a Chinese startup, has gained traction thanks to superior performance and significantly lower costs compared with models from OpenAI and Anthropic [1][5][14]
- The introduction of the DeepSeek model, which cuts API costs by 50%, is pressuring the U.S. AI industry to adapt [3][8]

Cost Considerations
- Chamath Palihapitiya said the decision to switch to open-source models is driven primarily by cost, since existing systems such as Anthropic's are too expensive [2][5]
- DeepSeek charges $0.28 per million input tokens and $0.42 per million output tokens, while Anthropic's Claude costs roughly $3.15 for comparable usage, making DeepSeek 10 to 35 times cheaper [3][8]

Model Performance and Transition Challenges
- Moving to new models like DeepSeek requires significant time for adjustment and fine-tuning, complicating the switch despite the cost benefits [2][6]
- Companies face a dilemma: switch to cheaper models now, or wait for their existing models to catch up in performance [6][10]

Open-Source vs. Closed-Source Dynamics
- In the current landscape, high-performance closed-source models come predominantly from the U.S., while high-performance open-source models are emerging from China [10][12]
- The open-source movement is seen as a counterweight to the power of large tech companies, but the leading open-source models currently come from China [8][10]

Security and Ownership Concerns
- There are concerns about the ownership of Chinese models and potential security risks, but deploying them on U.S. infrastructure mitigates some of those risks [12][16]
- The competitive landscape encourages rigorous testing for vulnerabilities, which is seen as a positive for model safety [16][17]

Future Implications
- The ongoing shift toward open-source models may reshape the AI industry, particularly around cost and energy consumption [5][10]
- Companies are exploring ways to manage the rising energy costs of AI operations, pointing to the need for sustainable practices in the industry [11][12]
A Silicon Valley heavyweight leads the way in dropping OpenAI and "defecting" to Kimi K2! Exclaiming "it's just too cheap"; even the White House's first AI czar can't talk him out of it
AI前线· 2025-10-28 09:02
Core Insights
- The article describes a significant shift in Silicon Valley from expensive closed-source AI models to more affordable open-source alternatives, highlighting the Kimi K2 model developed by a Chinese startup [2][3]
- Chamath Palihapitiya, a prominent investor, emphasizes the cost advantage of Kimi K2 over models from OpenAI and Anthropic, which he describes as significantly more expensive [3][5]
- The conversation also covers the competitive AI landscape, where open-source models from China are putting pressure on the U.S. AI industry [5][10]

Cost Considerations
- Palihapitiya says the decision to switch to open-source models is driven primarily by cost, since the existing systems from Anthropic are too expensive [3][5]
- China's new DeepSeek 3.2 EXP model offers a substantial reduction in API costs, charging $0.28 per million input tokens and $0.42 per million output tokens, versus roughly $3.15 per million tokens for Anthropic's Claude [5][10]

Model Performance and Transition Challenges
- The Kimi K2 model has 1 trillion total parameters with 32 billion active, and has been integrated by a range of applications, indicating strong performance [2][5]
- Transitioning to new models like DeepSeek is complex and time-consuming, often taking weeks or months of fine-tuning and engineering adjustments [3][7]

Open-Source vs. Closed-Source Dynamics
- The article highlights a structural shift: open-source models from China are gaining traction while U.S. companies remain focused on closed-source models [10][12]
- There is growing concern that the U.S. is lagging in open-source AI, with heavy Chinese investment producing advances that challenge U.S. dominance [10][12]

Security and Ownership Issues
- Palihapitiya explains that Groq obtains the source code of models like Kimi K2, deploys them in the U.S., and ensures data does not flow back to China, addressing data-security concerns [15][18]
- The discussion raises the possibility of backdoors or vulnerabilities in Chinese models, but notes that their open-source nature allows community scrutiny [18][19]

Future Implications
- Ongoing competition between U.S. and Chinese AI models could reshape the industry, particularly around cost and energy consumption [6][12]
- The future of AI is expected to be decentralized, with many players in both the U.S. and China, making it essential to address national security concerns [19][20]
Academician Zhang Yaqin: five new AI trends; physical intelligence is evolving rapidly, and robots may outnumber humans by 2035
机器人圈· 2025-10-20 09:16
Core Insights
- The rapid development of the AI industry is accelerating iteration across many sectors, creating significant industrial opportunities [3]
- The AI industry is projected to be at least 100 times larger than the previous generation, indicating substantial growth potential [5]

Group 1: Trends in AI Development
- The first major trend is the transition from discriminative AI to generative AI, now evolving toward agent-based AI, with task lengths doubling and accuracy exceeding 50% over the past seven months [7]
- The second trend is a slowdown of the scaling law in the pre-training phase, with focus shifting to post-training stages such as reasoning and agent applications, while reasoning costs have fallen roughly tenfold [7]
- The third trend is the rapid advance of physical and biological intelligence, particularly in intelligent driving, with 10% of vehicles expected to have L4 capability by 2030 [7]

Group 2: AI Risks and Industry Structure
- The fourth trend: the emergence of agent-based AI has significantly increased AI risk, demanding greater attention from enterprises and governments worldwide [8]
- The fifth trend: a new industry structure combining foundational large models, vertical models, and edge models, with roughly 8-10 foundational large models worldwide expected by 2026, including 3-4 from China and 3-4 from the U.S. [8]
- The future is expected to favor open source, with a projected 4:1 ratio of open-source to closed-source models [8]
In front of the White House AI czar, a Silicon Valley billionaire investor "defects" to Chinese models
Huan Qiu Shi Bao· 2025-10-15 03:24
Core Insights
- Prominent investor Chamath Palihapitiya has shifted significant demand from Amazon's Bedrock to the Chinese model Kimi K2, citing its superior performance and lower cost compared with OpenAI and Anthropic [1][3]

Group 1: Market Dynamics
- The U.S. AI landscape is moving from a race for ever-larger parameter counts to a new phase dominated by cost-effectiveness, commercial efficiency, and ecosystem value [3]
- Chinese open-source models such as DeepSeek, Kimi, and Qwen are challenging the dominance of U.S. closed-source models [3][4]
- After Anthropic's API service policy changes restricted access from certain countries, developers have been actively seeking cost-effective alternatives [4]

Group 2: Technological Advancements
- Kimi K2 recently updated to version K2-0905, scoring over 94% on the Roo Code platform and becoming the first open-source model to surpass 90% [4]
- The 2025 State of AI Report indicates that China has moved from follower to competitor in AI, with significant advances in open-source AI and commercialization [5]
- DeepSeek has surpassed OpenAI's o1-preview on complex reasoning tasks and is successfully applying high-end technology in commercial scenarios [7]

Group 3: Competitive Landscape
- The report notes that China now holds two of the top three positions among significant language models, underscoring its progress in the AI sector [5][7]
- Competition is no longer only about larger models but also about cost efficiency and the speed of delivering stable services to users [7]
- The market increasingly favors lower-cost, faster solutions, signaling a shift in developer preferences, including in Silicon Valley [7]
Expert: robots may outnumber humans by 2035
Core Insights
- The rapid development of the AI industry is accelerating iteration across many sectors, creating significant industrial opportunities [1]

Group 1: Trends in AI Industry
- The first major trend is the transition from discriminative AI to generative AI, now evolving toward agent-based AI, with task lengths doubling and accuracy exceeding 50% over the past seven months [3]
- The second trend is a slowdown of the scaling law in the pre-training phase, with focus shifting to post-training stages such as inference and agent applications; inference costs have fallen roughly tenfold while the computational demands of agent workloads have grown roughly tenfold [3]
- The third trend is the rapid development of physical and biological intelligence, particularly in smart driving, with 10% of vehicles predicted to have Level 4 autonomous capability by 2030 [3]

Group 2: Future Projections and Risks
- The fourth trend points to a significant rise in AI risk: the emergence of agents increases risk at least twofold, demanding greater attention from enterprises and governments worldwide [4]
- The fifth trend describes a new industry landscape combining foundational large models, vertical models, and edge models, with roughly 8-10 foundational large models worldwide expected by 2026, including 3-4 from China and 3-4 from the U.S. [4]
- The future is expected to favor open source, with a projected 4:1 ratio of open-source to closed-source models [4]
Secretly providing model testing for OpenAI, OpenRouter has built a "gateway system" for LLMs
海外独角兽· 2025-09-23 07:52
Core Insights
- The article discusses how large-model companies in Silicon Valley are differentiating, highlighting OpenRouter as a key player in model routing whose token usage has grown dramatically [2][3][6]

Group 1: OpenRouter Overview
- OpenRouter was founded in early 2023, providing a single unified API key through which users can access a wide range of mainstream and open-source models (a minimal client sketch follows this summary) [6]
- The platform's token usage surged from 405 billion tokens at the beginning of the year to 4.9 trillion tokens by September, an increase of more than 12 times [2][6]
- OpenRouter addresses three major pain points of API calls: the lack of a unified marketplace and interface, API instability, and balancing cost against performance [7][9]

Group 2: Model Usage Insights
- OpenRouter's model-usage reports have sparked widespread discussion in the developer and investor communities and have become essential reading [3][10]
- The platform surfaces usage data across models, helping users gauge model popularity and performance [10]

Group 3: Founder Insights
- Alex Atallah, OpenRouter's founder, argues that the large-model market is not winner-take-all and that developers need to control how their requests are routed across models [3][18]
- Atallah draws parallels between OpenRouter and his previous venture, OpenSea, emphasizing the value of integrating fragmented resources into a single platform [19][20]

Group 4: OpenRouter Functionality
- OpenRouter functions as a model aggregator and marketplace, letting users manage over 470 models through a single interface [31]
- The platform uses intelligent load balancing to route requests to the most suitable providers, improving reliability and performance [37]
- OpenRouter aims to give developers a unified view of model access so they can pick the best models for their specific needs [34][35]

Group 5: Future Directions
- OpenRouter is exploring personalized models based on user prompts while keeping user data private unless users opt in to recording [52][55]
- The platform aims to become the best reasoning layer for agents, giving developers the tools to build intelligent agents without being locked into specific suppliers [58][60]
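For context on what a single unified API key means in practice, here is a minimal client sketch against OpenRouter's OpenAI-compatible endpoint. It assumes the openai Python SDK (v1+) and an OPENROUTER_API_KEY environment variable; the model slug is illustrative and should be checked against OpenRouter's current model list.

```python
# Minimal sketch of calling OpenRouter through its OpenAI-compatible endpoint.
import os
from openai import OpenAI

client = OpenAI(
    base_url="https://openrouter.ai/api/v1",   # OpenRouter's unified gateway
    api_key=os.environ["OPENROUTER_API_KEY"],
)

response = client.chat.completions.create(
    model="moonshotai/kimi-k2",                # illustrative slug; any hosted model works
    messages=[{"role": "user", "content": "Summarize INT4 inference in one line."}],
)
print(response.choices[0].message.content)
```

Switching providers is then a one-line change of the model slug, which is the core of the "unified marketplace" pitch; routing and failover across providers happen on OpenRouter's side.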
Zhu Xiaohu: moving out of China and pretending not to be a Chinese AI startup is pointless
Hu Xiu· 2025-09-20 14:15
Group 1
- The discussion highlights the impact of DeepSeek and Manus on the AI industry, emphasizing the importance of open-source models in China and their potential to rival closed-source models in the U.S. [3][4][5]
- The open-source trend is gaining momentum, with Chinese models already surpassing U.S. models in download counts on platforms like Hugging Face [4][5]
- The competitive landscape is shifting toward "China's open source vs. America's closed source," and building an open-source ecosystem benefits China's long-term AI development [6][7]

Group 2
- Manus is presented as a case study in go-to-market strategy, illustrating that while Chinese entrepreneurs have strong product capabilities, they often lack effective market-entry strategies [10][11]
- Speed is identified as the critical barrier for AI application companies, which must grow fast enough to outpace competitors [11][12]
- Token consumption is discussed as a key cost indicator; Chinese companies watch this metric closely because domestic users have a lower willingness to pay [12][13][14]

Group 3
- The AI coding sector is characterized as a game for large companies, with high token costs making it hard for startups to compete effectively [15][16]
- AI coding is seen as an unattractive area for startups because programmers show little customer loyalty and token costs are high [16][18]
- Investment in vertical applications is preferred over general-purpose agents, since the latter are likely to be built by the model makers themselves [20]

Group 4
- On robotics, the emphasis is on investing in practical, value-creating robots rather than aesthetically pleasing ones, citing successful projects such as a boat-cleaning robot [21][22]
- Combining functionality with sales capability in robotics applications is highlighted as the path to a favorable ROI [22][23]

Group 5
- AI hardware companies should focus on simplicity and mass production rather than complex features, since successful hardware must be deliverable at scale [28][29]
- The prospect of genuinely new hardware in the AI era is questioned, with major breakthroughs believed to be years away [30][31]

Group 6
- Globalization remains challenging for Chinese companies; succeeding in the U.S. market requires a deep understanding of local dynamics and compliance [36][37]
- A local sales team is essential for B2B applications in the U.S., where relationships are crucial to closing deals [38][39]

Group 7
- High valuations carry risks, limiting a company's flexibility and increasing pressure to perform [42][43]
- IPOs for Chinese companies may increasingly happen in Hong Kong rather than the U.S., as liquidity issues persist in the market [46][48]

Group 8
- Startups need to operate outside the shadow of large companies, with a call for rapid growth and innovation in the AI sector [49][53]
- AI startups can reach significant scale quickly, but the pace of change in AI may outstrip traditional exit strategies [52][53]