Who Is Footing the Bill for America?
Guan Cha Zhe Wang· 2025-11-18 01:04
Group 1: Core Insights
- The U.S. is leveraging its retirement funds to fill a projected $1.5 trillion financing gap in AI investments, as tech giants' cash flows can only cover half of the expected $3 trillion global data center capital expenditure by 2028 [1][3]
- The U.S. private capital market dominates AI investments, with $109.1 billion in private AI investment in 2024, nearly 12 times China's $9.3 billion [3][4]
- The U.S. government and major tech companies are also investing heavily in AI, with a combined planned investment of $36.4 billion in 2025, contributing to GDP growth [4][5]

Group 2: Investment Landscape
- Venture capital and private equity are significant sources of funding, with over 50% of global VC funds directed towards AI, and the U.S. accounting for more than 75% of this [3][4]
- The bond market is a primary financing tool for tech giants, with over $2 trillion in investment-grade corporate bonds issued in the first ten months of 2025, and insurance companies being key buyers [4][5]
- The U.S. has seen a surge in AI-related stock performance, contributing to 75% of the S&P 500's returns since the launch of ChatGPT in 2022 [5][6]

Group 3: Competitive Advantages
- The U.S. has a unique financial and innovation ecosystem that supports AI investment, including a robust VC network and top-tier universities [5][6]
- The U.S. controls 74% of the global high-end AI computing capacity, significantly outpacing China and the EU [11][12]
- Early investments in computing and software have positioned the U.S. as a leader in AI innovation, with a tenfold increase in annual investments from 1995 to 2021 [9][11]

Group 4: Challenges and Risks
- The rapid increase in AI investments has led to signs of a bubble, with a high dependency on optimistic investor expectations [6][7]
- Regulatory compliance costs are rising, with fragmented state-level AI regulations increasing operational costs for companies [7][8]
- The potential for a financial crisis exists if the AI investment bubble bursts, given the concentration of market value among a few tech giants [6][8]

Group 5: China's Position and Strategy
- China is significantly behind the U.S. in private AI investment, with only $39 billion, but is leveraging a state-led approach to build resilience in AI funding [13][14]
- China's strategy focuses on application-oriented AI, cost reduction through local chip production, and global outreach to developing countries [13][14]
- The competitive edge for China lies in its ability to innovate at lower costs, as demonstrated by companies like DeepSeek, which offers AI solutions at a fraction of the cost of U.S. counterparts [14]
Liang Wenfeng Represents DeepSeek, and He Represents Liang Wenfeng
Liang Zi Wei· 2025-11-15 02:08
Core Viewpoint
- The article discusses the emergence of the "Hangzhou Six Little Dragons" at the World Internet Conference in Wuzhen, highlighting the presence of key figures in AI and technology, particularly focusing on DeepSeek and its representative, Chen Deli, who expressed both optimism and concerns about the future impact of AI on society [1][3][41]

Group 1: DeepSeek and Its Representation
- DeepSeek's founder Liang Wenfeng did not attend the conference; instead, researcher Chen Deli represented the company, marking a significant public appearance for DeepSeek [3][6][41]
- Chen Deli, who joined DeepSeek in 2023, has been involved in critical research areas such as language models and alignment mechanisms, contributing to several important publications [18][22][20]
- Chen Deli's presence at the conference has made him the second public representative of DeepSeek after Liang Wenfeng, emphasizing his role as a spokesperson for the company's views on AI [41][42]

Group 2: AI Perspectives
- Chen Deli expressed a mixed outlook on AI, stating that while there is a "honeymoon period" between humans and AI over the next three to five years, there are significant long-term concerns about AI potentially replacing most jobs in society [8][9]
- He highlighted that the current AI revolution differs fundamentally from previous industrial revolutions, as AI is beginning to possess its own "intelligence," which could surpass human capabilities in certain areas [10][11]
- The potential for AI to disrupt existing social order and economic structures is a major concern, with Chen suggesting that technology companies may need to act as "guardians" to mitigate negative impacts [12][13]

Group 3: Value Alignment in AI
- During his presentation, Chen Deli introduced the concept of "value alignment decoupling," proposing that core values should be unified while allowing users to customize diverse values, ensuring safety and adaptability to societal diversity [25][24]
- This approach aims to address the rigidity of traditional large models, which often embed fixed values that do not reflect the complexity of human society [24][25]
- The idea of "harmony in diversity" encapsulates this new perspective on AI value alignment, suggesting a more flexible and user-centric approach to AI development [26][25]
Kimi's Yang Zhilin Says "Training Costs Are Hard to Quantify," Will Continue Open-Source Strategy
Di Yi Cai Jing· 2025-11-11 12:04
Core Viewpoint
- Kimi, an AI startup, is focusing on open-source model development, with the recent release of Kimi K2 Thinking, which has a reported training cost of $4.6 million, significantly lower than competitors like DeepSeek V3 and OpenAI's GPT-3 [3][4][6]

Summary by Sections

Model Development and Costs
- Kimi has invested heavily in open-source model research and updates over the past six months, releasing Kimi K2 Thinking on November 6, with a reported training cost of $4.6 million, lower than DeepSeek V3's $5.6 million and OpenAI GPT-3's billions [3][4]
- CEO Yang Zhilin clarified that the $4.6 million figure is not official; most expenses go to research and experimentation, making training costs difficult to quantify [4][6]

Model Performance and Challenges
- Users raised concerns about the reasoning length of Kimi K2 Thinking and discrepancies between leaderboard scores and actual performance. Yang stated that the model currently prioritizes absolute performance, with plans to improve token efficiency in the future [4][7]
- The gap between leaderboard performance and real-world experience is expected to diminish as the model's general capabilities improve [7]

Market Position and Strategy
- Chinese open-source models are increasingly being utilized in the international market, with five Chinese models appearing in the top twenty of the OpenRouter model usage rankings [7]
- Kimi currently can only be accessed via API due to interface issues with the OpenRouter platform [7]
- Kimi plans to maintain its open-source strategy, focusing on the application and optimization of Kimi K2 Thinking while balancing text and multimodal model development, avoiding direct competition with leading firms like OpenAI [6][8]
Yang Zhilin Responds: Kimi K2 Was Trained on H800s! But Did It Really Cost "Only $4.6 Million"?
Liang Zi Wei· 2025-11-11 11:11
Core Insights
- The Kimi K2 Thinking model reportedly cost only $4.6 million to train, lower than the $5.6 million for DeepSeek V3, raising questions about the valuations of closed-source giants in Silicon Valley [13][14]
- The Kimi K2 model is driving a migration trend in Silicon Valley, as it offers superior performance at a lower cost compared to existing models [5][6]
- Kimi K2 utilizes innovative engineering techniques, including the self-developed MuonClip optimizer, which allows for stable gradient training without human intervention [18]

Training Cost and Performance
- The training cost of Kimi K2 is claimed to be $4.6 million, significantly lower than other models, prompting reflection within the industry [13][14]
- Investors and companies are migrating to Kimi K2 due to its strong performance and cost-effectiveness, with reports of it being five times faster and 50% more accurate than closed-source models [8][6]

Technical Innovations
- Kimi K2 optimized its architecture by increasing the number of experts in the MoE layer from 256 to 384 while reducing the number of active parameters during inference from approximately 37 billion to 32 billion [16]
- The model employs Quantization-Aware Training (QAT) to achieve native INT4-precision inference, roughly doubling speed while cutting resource consumption [21]

Community Engagement and Future Developments
- The team behind Kimi K2 engaged with the developer community through a three-hour AMA session, discussing future architectures and the potential for a next-generation K3 model [22][24]
- The team revealed that the unique writing style of Kimi K2 results from a combination of pre-training and post-training processes, and it is exploring longer context windows for future models [26][27]
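The INT4 claim can be made concrete with a minimal sketch of symmetric 4-bit fake quantization, the core operation in QAT: during training, weights are rounded to a 16-level integer grid and immediately dequantized, so the model learns to tolerate the precision loss that the INT4 inference path will impose. This is an illustrative toy (the function name and per-row scaling scheme are assumptions, not Kimi's actual pipeline):

```python
def fake_quant_int4(row):
    """Symmetric INT4 fake quantization of one weight row:
    round to the 16-level grid [-8, 7], then dequantize."""
    qmax = 7
    scale = max(abs(x) for x in row) / qmax or 1.0   # avoid zero scale
    q = [min(max(round(x / scale), -8), qmax) for x in row]  # integer codes the hardware would store
    return [v * scale for v in q], scale             # dequantized values used in the forward pass

weights = [0.31, -1.20, 0.07, 0.88, -0.45, 1.19]
w_q, scale = fake_quant_int4(weights)
# Each dequantized value lands within half a quantization step of the original
```

In a full QAT setup the backward pass would typically treat the rounding as identity (a straight-through estimator), and at inference only the integer codes and per-row scales are kept.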
Kimi's Yang Zhilin Says "Training Costs Are Hard to Quantify," Will Continue Open-Source Strategy
Di Yi Cai Jing· 2025-11-11 10:45
Core Insights
- Kimi, an AI startup, has released its latest open-source model, Kimi K2 Thinking, with a reported training cost of $4.6 million, significantly lower than competitors like DeepSeek V3 at $5.6 million and OpenAI's GPT-3, which cost billions to train [2][3]
- The company emphasizes ongoing model updates and improvements, focusing on absolute performance while addressing user concerns about inference length and performance discrepancies [2][3]
- Kimi's models are gaining traction in the international market, with five Chinese open-source models listed among the top twenty on the OpenRouter platform [3][5]

Company Strategy
- Kimi plans to maintain its open-source strategy and prioritize the application and optimization of the Kimi K2 Thinking model, while also developing multimodal models [5]
- The company aims to differentiate itself from leading competitors like OpenAI through architectural innovation, open-source strategy, and cost control, avoiding direct competition in specific AI browser markets [5]

Technical Aspects
- Kimi uses H800 GPUs with InfiniBand interconnects for high-performance computing and AI training, despite having fewer and less powerful chips than its U.S. counterparts [3]
- Training spending for Kimi K2 Thinking goes primarily to research and experimentation, making precise cost quantification challenging [2]
Heard Everyone Is Going All In on Post-Training? Here Is the Best Guide
Ji Qi Zhi Xin· 2025-10-09 02:24
Core Insights
- The article emphasizes the shift in focus from pre-training to post-training in large language models (LLMs), highlighting the diminishing returns of scaling laws as model sizes reach hundreds of billions of parameters [2][3][11]

Group 1: Importance of Post-Training
- Post-training is recognized as a crucial phase for enhancing the reasoning capabilities of models like OpenAI's o series, DeepSeek R1, and Google Gemini, marking it as a necessary step towards advanced intelligence [3][11]
- The article introduces various innovative post-training methods such as Reinforcement Learning from Human Feedback (RLHF), Reinforcement Learning from AI Feedback (RLAIF), and Reinforcement Learning with Verifiable Rewards (RLVR) [2][3][12]

Group 2: Transition from Pre-Training to Post-Training
- The evolution from pre-training to instruction fine-tuning is discussed: foundational models are trained on large datasets to predict the next token, but often lack practical utility in real-world applications [7][8]
- Post-training aims to align model behavior with user expectations, focusing on quality over quantity in the datasets used, which are typically smaller but more refined than pre-training datasets [11][24]

Group 3: Supervised Fine-Tuning (SFT)
- Supervised Fine-Tuning (SFT) is described as a process that transforms a pre-trained model into one that can follow user instructions effectively, relying on high-quality instruction-answer pairs [21][24]
- The quality of the SFT dataset is critical, as even a small number of low-quality samples can negatively impact the model's performance [25][26]

Group 4: Reinforcement Learning Techniques
- Reinforcement Learning (RL) is highlighted as a complex yet effective method for model fine-tuning, with various reward mechanisms such as RLHF, RLAIF, and RLVR employed to enhance model performance [39][41]
- The article outlines the importance of reward models in RLHF, which are trained on human preference data to guide model outputs [44][46]

Group 5: Evaluation of Post-Training Models
- The evaluation of post-training models is multifaceted, requiring a combination of automated and human assessments to capture various quality aspects [57][58]
- Automated evaluations are cost-effective and quick, while human evaluations provide a more subjective quality measure, especially for nuanced tasks [59][60]
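The reward-model point can be illustrated with the standard pairwise (Bradley-Terry) objective commonly used to train RLHF reward models on human preference data: the loss is small when the model scores the human-preferred answer above the rejected one, and large otherwise. This is a generic sketch of that objective, not any specific model's training code:

```python
import math

def reward_pair_loss(r_chosen, r_rejected):
    """Pairwise reward-model loss: -log(sigmoid(r_chosen - r_rejected)).

    Minimizing this pushes the reward of the human-preferred answer
    above the reward of the rejected one."""
    margin = r_chosen - r_rejected
    return -math.log(1.0 / (1.0 + math.exp(-margin)))

# Correct ranking (preferred answer scored higher) yields a small loss;
# inverted ranking yields a large one.
loss_good = reward_pair_loss(2.0, 0.5)   # margin +1.5
loss_bad = reward_pair_loss(0.5, 2.0)    # margin -1.5
```

In practice the scalar rewards come from a language model with a value head, and batches of (chosen, rejected) completion pairs are scored together; the loss shape is the same.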
Is DeepSeek V3.2 Coming?
Guan Cha Zhe Wang· 2025-09-29 09:58
Core Insights
- The appearance of DeepSeek-V3.2 on the Hugging Face platform has sparked speculation among users [1]
- DeepSeek has a history of releasing new versions and updates around significant holidays [2]
- The most recent update prior to the speculation was DeepSeek-V3.1-Terminus, released on September 22, with an open-source announcement [3]

Version Release History
- DeepSeek V3 was released on December 27, 2024, just before New Year's [3]
- DeepSeek-R1-0528 was launched on May 28, 2025, as a special gift for the Dragon Boat Festival [3]
- The latest version, DeepSeek-V3.1-Terminus, was made available on September 22, 2025, along with an open-source model [3]

Current Status
- The Hugging Face interface related to DeepSeek is currently showing errors, and there has been no official response from DeepSeek regarding the situation [4]
Who Says the Scaling Law Has Run Its Course? New Research: Small Per-Step Gains Compound into Exponential Growth
36Ke· 2025-09-16 07:46
Core Insights
- The Scaling Law is being questioned due to perceived diminishing returns in model training, but recent research suggests that small improvements in accuracy can lead to exponential growth in task completion length, which may hold more economic value in real-world applications [1][2][4]

Group 1: Research Findings
- A recent paper from Cambridge University indicates that while there are diminishing returns in metrics like test loss, the real-world value of large language models (LLMs) often comes from their ability to complete longer tasks [2][4]
- The paper highlights that the long-term execution of tasks has been a significant weakness in deep learning, with LLMs struggling to perform complex, lengthy tasks despite improvements in reasoning capabilities [4][6]
- The authors propose that the failures in long tasks are primarily due to execution challenges rather than reasoning or planning limitations, emphasizing the need for more focus on execution capabilities in LLM research [6][20]

Group 2: Experimental Insights
- The study measures LLMs' long-horizon execution capabilities by isolating execution from planning and knowledge retrieval, revealing that larger models can significantly increase the number of successful execution rounds [6][23][25]
- The concept of self-conditioning is introduced, where the model's performance deteriorates as it builds on its previous errors, leading to a decline in accuracy over multiple rounds [8][26][30]
- The research shows that while increasing model size improves task execution, it does not alleviate the self-conditioning effect, which remains a challenge for LLMs in long-term tasks [27][30]

Group 3: Implications for Investment
- The findings suggest that the economic value of LLMs may not be accurately reflected in short-task benchmarks, as the ability to complete longer tasks is a more reliable indicator of their potential [18][20]
- The paper encourages further investment in scaling models, as the ability to perform longer tasks could justify continued financial commitment despite short-term performance metrics suggesting stagnation [10][18]
- The research calls for the design of new benchmarks that better assess the execution depth of models, highlighting a potential area for future investment and development in the AI sector [10][18]
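The compounding argument is simple arithmetic: if each step succeeds independently with probability s, an n-step task succeeds with probability s**n, so the longest task solvable at a fixed success rate grows like log(target)/log(s) and is very sensitive to small gains in s. A short sketch of this calculation (illustrative; not the paper's code):

```python
import math

def horizon(step_acc, target=0.5):
    """Longest task length completed with probability >= target,
    assuming independent per-step accuracy step_acc."""
    return math.floor(math.log(target) / math.log(step_acc))

h_99 = horizon(0.99)    # 50%-success horizon at 99% per-step accuracy
h_995 = horizon(0.995)  # same horizon at 99.5% per-step accuracy
```

Under these assumptions the horizon at 99% per-step accuracy is 68 steps, while at 99.5% it is 138: halving the per-step error rate roughly doubles the achievable task length, which is the "small gains compound exponentially" claim in miniature.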
Who Says the Scaling Law Has Run Its Course? New Research: Small Per-Step Gains Compound into Exponential Growth
Ji Qi Zhi Xin· 2025-09-16 04:01
Core Viewpoint
- The article discusses the ongoing debate regarding the diminishing returns of scaling models in AI, particularly in the context of large language models (LLMs). It presents a new perspective that, despite slower improvements in single-step accuracy, these incremental gains can lead to exponential growth in task completion length, which may hold greater economic value in real-world applications [1][3]

Group 1: Scaling Law and Economic Value
- The scaling law indicates that while there may be diminishing returns in metrics like test loss, the real-world value of LLMs often comes from their ability to complete longer tasks. Larger models can compound small improvements in single-step accuracy, resulting in exponential increases in task length [3][6]
- The paper titled "The Illusion of Diminishing Returns: Measuring Long Horizon Execution in LLMs" argues that the economic value of an AI agent is derived from the length of tasks it can complete, rather than short task benchmarks that may suggest stagnation in progress [5][19]

Group 2: Long-Horizon Execution Challenges
- Long-term task execution has historically been a significant weakness for deep learning models. The paper highlights that while LLMs have improved in complex reasoning tasks, they still struggle with executing longer tasks reliably [6][11]
- The authors propose that failures in long-term execution are often misattributed to reasoning or planning deficiencies, when in fact, execution remains a critical and under-researched challenge [7][22]

Group 3: Self-Conditioning Effect
- The study identifies a self-conditioning effect where the error rate in long tasks increases with each step, leading to a compounding effect of mistakes. This phenomenon contrasts with human performance, where practice typically leads to improvement [9][30]
- The authors found that larger models do not necessarily mitigate the self-conditioning effect, which can lead to a decline in performance over extended tasks [29][32]

Group 4: Impact of Thinking Models
- Recent thinking models have shown the ability to correct for self-conditioning limitations, allowing for significantly longer task execution in single rounds. For instance, the GPT-5 thinking version can execute over 1000 steps, far surpassing competitors [10][36]
- The research emphasizes the importance of reasoning before action, as models that utilize thinking chains can perform better in executing longer tasks compared to those that do not [36][37]

Group 5: Experimental Insights
- The experiments conducted reveal that increasing model size significantly enhances the number of rounds a model can successfully execute, demonstrating a clear scaling trend [27][28]
- The findings suggest that while larger models can improve task execution, they still face challenges due to self-conditioning, which remains a critical area for future research [29][37]
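The self-conditioning effect can be illustrated with a toy model (the accuracy penalty and its size are illustrative assumptions, not the paper's setup): suppose each past mistake permanently lowers the accuracy of every later step. Tracking the exact distribution of error counts shows the expected per-step accuracy drifting downward over a long rollout, the compounding the article describes:

```python
def accuracy_trajectory(n_steps, base_acc=0.99, penalty=0.02):
    """Expected per-step accuracy when each prior error lowers
    later step accuracy by `penalty` (toy self-conditioning model)."""
    dist = {0: 1.0}  # exact distribution over the number of errors so far
    traj = []
    for _ in range(n_steps):
        acc = {k: max(0.0, base_acc - penalty * k) for k in dist}
        traj.append(sum(p * acc[k] for k, p in dist.items()))
        nxt = {}
        for k, p in dist.items():
            nxt[k] = nxt.get(k, 0.0) + p * acc[k]                # step succeeds
            nxt[k + 1] = nxt.get(k + 1, 0.0) + p * (1 - acc[k])  # step fails
        dist = nxt
    return traj

traj = accuracy_trajectory(200)
# traj starts at base_acc and decays monotonically as error mass accumulates
```

Setting `penalty=0` recovers the memoryless case where accuracy stays flat at `base_acc`, which is why the paper treats self-conditioning, not raw step accuracy, as the distinctive failure mode.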