DeepSeek V3.1
Kimi's Yang Zhilin says "training costs are hard to quantify," will stick with the open-source strategy
Di Yi Cai Jing· 2025-11-11 12:04
Core Viewpoint
- Kimi, an AI startup, is focusing on open-source model development, with the recent release of Kimi K2 Thinking, which has a reported training cost of $4.6 million, significantly lower than competitors like DeepSeek V3 and OpenAI's GPT-3 [3][4][6]

Summary by Sections

Model Development and Costs
- Kimi has invested heavily in open-source model research and updates over the past six months, releasing Kimi K2 Thinking on November 6 with a reported training cost of $4.6 million, lower than DeepSeek V3's $5.6 million and OpenAI GPT-3's billions [3][4]
- CEO Yang Zhilin clarified that the $4.6 million figure is not official, as most expenses go to research and experimentation, making training costs difficult to quantify [4][6]

Model Performance and Challenges
- Users raised concerns about the reasoning length of Kimi K2 Thinking and discrepancies between leaderboard scores and actual performance; Yang stated that the model currently prioritizes absolute performance, with plans to improve token efficiency in the future [4][7]
- The gap between leaderboard performance and real-world experience is expected to narrow as the model's general capabilities improve [7]

Market Position and Strategy
- Chinese open-source models are increasingly used in the international market, with five Chinese models appearing in the top twenty of the OpenRouter model usage rankings [7]
- Kimi can currently only be accessed via API due to interface issues with the OpenRouter platform [7]
- Kimi plans to maintain its open-source strategy, focusing on the application and optimization of Kimi K2 Thinking while balancing text and multimodal model development and avoiding direct competition with leading firms like OpenAI [6][8]
Kimi's Yang Zhilin says "training costs are hard to quantify," will stick with the open-source strategy
Di Yi Cai Jing· 2025-11-11 10:35
Core Insights
- Kimi, an AI startup, has released its latest open-source model, Kimi K2 Thinking, with a reported training cost of $4.6 million, significantly lower than competitors like DeepSeek V3 at $5.6 million and OpenAI's GPT-3, which reportedly cost billions to train [1][2]
- The company emphasizes ongoing model updates and improvements, focusing on absolute performance while addressing user concerns about inference length and performance discrepancies [1]
- Kimi's strategy is to maintain an open-source approach and advance the Kimi K2 Thinking model while avoiding direct competition with major players like OpenAI through innovative architecture and cost control [2][4]

Model Performance and Market Position
- In the latest OpenRouter model usage rankings, five Chinese open-source models, including Kimi's, are among the top twenty, indicating a growing presence in the international market [2]
- Kimi's current model can only be accessed via API due to platform limitations; the team trains on H800 GPUs with InfiniBand interconnects, which offer less compute than high-end U.S. GPUs [2]
- The company plans to balance text model development with multimodal model advancements, aiming to establish a differentiated advantage in the AI landscape [4]
2026 Investment Summit Express: A New Paradigm for the AI Industry
HTSC· 2025-11-10 12:07
Securities Research Report │ Technology
2026 Investment Summit Express — A New Paradigm for the AI Industry
Huatai Research │ November 10, 2025 │ Mainland China │ Flash Comment

On November 5-6 we hosted our 2026 annual investment summit, where we saw: 1) in the Scaling Law 2.0 era, synthetic data is lifting the ceiling on training data, and the Mid Training paradigm is reshaping the path of model evolution; 2) commercialization at the AI application layer is entering the scale-up stage, with the fusion of Agent capabilities and closed transaction loops accelerating industry adoption.

Core Highlights
1. Models: compute expansion remains the core growth engine. We observe that, as the Scaling Law continues to evolve, gains in model capability are shifting from a linear relationship among the three factors of data, compute, and algorithms toward a non-linear growth phase centered on the efficiency of data generation and utilization. Synthetic data keeps expanding the pool of training resources, driving a marked rise in training tokens; at the same time, compute remains both the underlying constraint on and the growth engine for large-model evolution, with training compute for representative models growing 4-5x per year over 2010-2024 and as much as 9x per year for leading models (see the compounding note below). We expect compute expansion to continue setting the industry's core cadence, and the cost of a single full training run for a frontier model could reach the billion-dollar level by 2027.
2. Training: the Mid Training paradigm further expands the boundaries of training. Mid T ...
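As a rough, back-of-the-envelope check of what the quoted 4-5x annual growth compounds to over 2010-2024 (the report states the annual rate, not this cumulative figure), the arithmetic is:

```latex
% Illustrative compounding of 4-5x/year training-compute growth over the 14 years 2010-2024.
% These cumulative totals are not quoted in the report; only the annual rates are.
\[
4^{14} \approx 2.7\times 10^{8},
\qquad
5^{14} \approx 6.1\times 10^{9},
\]
\[
\text{i.e. representative models in 2024 would use roughly } 10^{8}\text{--}10^{10}
\text{ times the training compute of 2010 models.}
\]
```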
The Wolf of Wall Street Dances with AI
36Kr· 2025-10-28 08:05
Core Insights
- The article discusses an AI trading competition in the cryptocurrency market, highlighting the performance of various AI models and their strategies in a volatile environment [1][5][20].

Group 1: Competition Overview
- The AI trading competition, organized by Alpha Arena, runs from October 17 to November 3, featuring real-time trading of cryptocurrencies without human intervention [1][5].
- A benchmark participant employs a simple buy-and-hold strategy for Bitcoin (BTC) to compare the performance of AI models [2] (the comparison arithmetic is sketched below).
- The competition includes a betting aspect where spectators can wager on which AI will win, adding a layer of engagement [3].

Group 2: Participating AI Models
- Six leading AI models are involved: GPT-5, Gemini 2.5 Pro, Grok-4, Claude Sonnet 4.5, DeepSeek V3.1, and Qwen3 Max, each starting with $10,000 in real funds [5].
- All trades are executed on the Hyperliquid platform, ensuring transparency and security [5].

Group 3: Performance Analysis
- As of October 23, Chinese models Qwen3 Max and DeepSeek V3.1 lead the competition, achieving significant profits, while Western models like GPT-5 and Gemini 2.5 Pro face substantial losses [8][10].
- Qwen3 Max adopted an aggressive strategy, leveraging high positions during market surges, resulting in a 13%-47% increase in account value [10].
- DeepSeek V3.1 maintained a steady approach, achieving 8%-21% net gains by adhering to strict risk management and diversified trading [11][12].

Group 4: Challenges Faced by Western Models
- GPT-5 suffered from emotional trading and poor stop-loss management, leading to losses of 30%-40% within days, and up to 65%-75% by the end of the week [14].
- Gemini 2.5 Pro's overtrading and excessive leverage resulted in a loss exceeding 55% in the first week, highlighting the risks of high-frequency trading [14].

Group 5: Insights on Trading Strategies
- Grok-4 initially gained 35% but later returned to a net loss of approximately 15% due to failure to lock in profits [15].
- Claude Sonnet 4.5, while cautious and conservative, ended with a negative return of about 17%, demonstrating the trade-off between risk and reward [19].

Group 6: Broader Implications
- The competition serves as a deep experiment into the capabilities of AI in real market conditions, emphasizing that intelligence in trading is not solely about algorithmic prowess but also about adaptability in unpredictable environments [20].
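To make the buy-and-hold comparison above concrete, here is a minimal sketch of the arithmetic; the BTC prices and account values are illustrative placeholders, not figures from Alpha Arena.

```python
# Minimal sketch of comparing AI-trader account values against a BTC buy-and-hold
# benchmark, as described for Alpha Arena. All numbers below are illustrative
# placeholders, not actual competition data.

STARTING_CAPITAL = 10_000.0  # each participant starts with $10,000


def pct_return(final_value: float, initial_value: float = STARTING_CAPITAL) -> float:
    """Simple percentage return on the starting capital."""
    return (final_value - initial_value) / initial_value * 100


def buy_and_hold_value(btc_price_at_start: float, btc_price_now: float,
                       capital: float = STARTING_CAPITAL) -> float:
    """Value of the benchmark account that bought BTC at the start and held it."""
    btc_bought = capital / btc_price_at_start
    return btc_bought * btc_price_now


if __name__ == "__main__":
    # Hypothetical prices and account values, for illustration only.
    benchmark_value = buy_and_hold_value(btc_price_at_start=67_000, btc_price_now=64_000)
    model_values = {"model_a": 11_200.0, "model_b": 8_400.0}

    print(f"buy-and-hold benchmark: {pct_return(benchmark_value):+.2f}%")
    for name, value in model_values.items():
        print(f"{name}: {pct_return(value):+.2f}%")
```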
Right now, the AI best at making money is Qwen3: six global models battle it out, and the Top 2 come from China
36Kr· 2025-10-23 12:49
Core Insights
- Qwen3 Max has emerged as the leading model in the AI trading competition, surpassing DeepSeek and achieving significant profitability [1][32]
- The competition, Alpha Arena, showcases the capabilities of various AI models in real market conditions, emphasizing the financial market as a training ground for AI [30][32]

Performance Summary
- Qwen3 Max achieved a return of +44.38%, with an account value of $14,438 and total profit of $4,438 [11]
- DeepSeek V3.1 follows with a return of +20.92%, account value of $12,092, and total profit of $2,092 [11] (the return arithmetic is spelled out below)
- Other models, such as Claude 4.5 Sonnet, Grok 4, Gemini 2.5 Pro, and GPT-5, reported negative returns, with GPT-5 showing the largest loss at -71.48% [10][11]

Competition Dynamics
- The competition began on October 18 and has seen Qwen3 Max steadily improve its position, particularly after a significant drop in all models on October 22 [22][24]
- Qwen3 Max's strategy has been characterized as "quick and precise," allowing it to capitalize on market opportunities effectively [8][32]
- The competition has highlighted the contrasting performance of models, with Qwen3 Max and DeepSeek being the only two models consistently performing well [22][24]

Market Implications
- The success of Qwen3 Max indicates the growing competitiveness of Chinese AI models in the global market, particularly in high-risk financial environments [33]
- The Alpha Arena competition serves as a demonstration of how AI can adapt and thrive in real-world financial scenarios, reinforcing the notion that financial markets are ideal for AI training [30][32]
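Spelling out the return arithmetic behind the two leaders' figures above (the article reports the numbers but not the calculation; the starting capital is $10,000 per model):

```latex
% Return = (account value - starting capital) / starting capital, with $10,000 starting capital.
\[
\text{Qwen3 Max: } \frac{14{,}438 - 10{,}000}{10{,}000} = 44.38\%,
\qquad
\text{DeepSeek V3.1: } \frac{12{,}092 - 10{,}000}{10{,}000} = 20.92\%.
\]
```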
DeepSeek outperforms AI rivals in 'real money, real market' crypto showdown
Yahoo Finance· 2025-10-21 09:30
Core Insights
- A new cryptocurrency trading experiment called Alpha Arena has been launched, where leading AI models are evaluated for their investing abilities, with DeepSeek currently outperforming its competitors [1][2]
- The experiment involves six large language models (LLMs) investing in cryptocurrency perpetual contracts on the decentralized exchange Hyperliquid, each starting with US$10,000 [1][2]

Performance Summary
- As of Tuesday, DeepSeek's V3.1 has achieved a profit of 10.11%, while OpenAI's GPT-5 has recorded losses of 39.73%, making it the worst performer [2]
- Other participating models include Alibaba Cloud's Qwen 3 Max, Anthropic's Claude 4.5 Sonnet, Google DeepMind's Gemini 2.5 Pro, and xAI's Grok 4, with Grok also being a top performer [2][6]

Experiment Objectives and Methodology
- The primary goal of Alpha Arena is to create benchmarks that reflect real-world market dynamics, which are inherently unpredictable and adversarial [3]
- The models aim to maximize risk-adjusted returns, executing trades autonomously based on shared prompts and input data, with results tracked on a public leaderboard [4] (one common risk-adjusted metric is sketched below)

Market Engagement
- DeepSeek is currently leading in prediction markets, with a 41% likelihood of topping the benchmark, and betting volume has reached US$29,707 [7]
- The public can monitor trades through each model's Hyperliquid wallet address, and the reasoning behind trades is also displayed, showcasing the models' decision-making processes [4]
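The article says the models are instructed to maximize risk-adjusted returns but does not name the metric; the Sharpe ratio below is one common choice, shown purely as an illustration (the metric, the daily frequency, and the sample returns are assumptions, not Alpha Arena details).

```python
# One common risk-adjusted return metric: an annualized Sharpe ratio.
# Alpha Arena does not specify which metric it scores, so this is only an
# illustration of the general idea; the sample returns below are made up.
import math


def sharpe_ratio(period_returns, risk_free_rate_per_period=0.0, periods_per_year=365):
    """Mean excess return divided by return volatility, annualized."""
    excess = [r - risk_free_rate_per_period for r in period_returns]
    mean = sum(excess) / len(excess)
    variance = sum((r - mean) ** 2 for r in excess) / (len(excess) - 1)
    std = math.sqrt(variance)
    if std == 0:
        return float("inf") if mean > 0 else 0.0
    return (mean / std) * math.sqrt(periods_per_year)


if __name__ == "__main__":
    # Hypothetical daily returns for two strategies with the same total gain
    # but different volatility: the steadier one earns the higher Sharpe ratio.
    steady = [0.01, 0.012, 0.008, 0.011, 0.009]
    volatile = [0.06, -0.04, 0.05, -0.03, 0.01]
    print(f"steady:   {sharpe_ratio(steady):.2f}")
    print(f"volatile: {sharpe_ratio(volatile):.2f}")
```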
When it comes to making money, DeepSeek is indeed No. 1: six top global AIs battle it out with real funds, each starting with $10,000
36Kr· 2025-10-21 01:35
Core Insights
- The Alpha Arena experiment initiated by nof1.ai pits six leading AI models against each other in a real trading environment, with DeepSeek V3.1 currently leading in profitability [1][2][46].
- Each model started with an initial capital of $10,000 and received identical market data and trading instructions [4][46].

Performance Summary
- **Top Performers**:
  - DeepSeek V3.1 achieved a profit of $3,677, representing a return of +36.77% [6].
  - Grok 4 followed with a profit of $3,168, or +31.68% [6].
- **Underperformers**:
  - Gemini 2.5 Pro recorded the largest loss of $3,213, with a return of -32.13% [6].
  - GPT-5 also performed poorly, with a loss of $2,509, equating to -25.09% [6].

Trading Dynamics
- The trading strategies employed by the models varied, with DeepSeek and Grok showing similar trends, initially incurring losses before recovering and achieving significant gains [32][40].
- Gemini 2.5 Pro and GPT-5 exhibited contrasting behavior, initially gaining before experiencing substantial declines [36][37].

Market Environment
- The experiment highlights the rapid changes in financial markets, emphasizing the need for AI models to adapt quickly to market fluctuations [7][45].
- The Alpha Arena serves as a new benchmark for AI performance, moving beyond traditional static assessments to a dynamic trading environment [44][48].

Model Strategies
- Each model's strategy involved real-time decision-making based on market indicators and account status, with varying degrees of success (a rough harness sketch follows below) [9][48].
- The models' performance is assessed not only on profitability but also on their ability to navigate uncertainty and market volatility [48].
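Based only on the setup described above — identical market data and instructions to every model, decisions driven by market indicators and account status — a harness for this kind of experiment might look roughly like the sketch below. The data structures and the ask_model_for_decision call are hypothetical; nof1.ai has not published its actual implementation.

```python
# Rough sketch of an autonomous-trading harness of the kind Alpha Arena describes:
# each cycle, every model receives the same market snapshot plus its own account
# state and returns a trade decision. All names and fields here are hypothetical;
# the article does not publish the real prompt or schema.
from dataclasses import dataclass, field


@dataclass
class AccountState:
    cash: float = 10_000.0                          # every model starts with $10,000
    positions: dict = field(default_factory=dict)   # symbol -> signed position size


@dataclass
class TradeDecision:
    symbol: str
    side: str        # "buy", "sell", or "hold"
    size: float
    reasoning: str   # Alpha Arena displays each model's reasoning publicly


def ask_model_for_decision(model_name: str, market_snapshot: dict,
                           account: AccountState) -> TradeDecision:
    """Placeholder for the shared prompt + model call; returns a do-nothing decision."""
    return TradeDecision(symbol="BTC", side="hold", size=0.0,
                         reasoning=f"{model_name}: no signal in this sketch")


def run_cycle(models: list[str], market_snapshot: dict,
              accounts: dict[str, AccountState]) -> dict[str, TradeDecision]:
    """One evaluation cycle: same market data to every model, own account state each."""
    return {m: ask_model_for_decision(m, market_snapshot, accounts[m]) for m in models}


if __name__ == "__main__":
    models = ["DeepSeek V3.1", "Grok 4", "GPT-5"]
    accounts = {m: AccountState() for m in models}
    snapshot = {"BTC": {"price": 67_000.0, "funding_rate": 0.0001}}  # illustrative values
    for name, decision in run_cycle(models, snapshot, accounts).items():
        print(name, decision.side, decision.size, "-", decision.reasoning)
```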
HLE "Humanity's Last Exam" tops 60 points for the first time: Eigen-1, built on DeepSeek V3.1, leads Grok4 and GPT-5 by a clear margin
36Kr· 2025-09-28 12:05
Core Insights
- The Eigen-1 multi-agent system has achieved a historic breakthrough with Pass@1 accuracy of 48.3% and Pass@5 accuracy of 61.74% on the HLE Bio/Chem Gold test set, surpassing competitors like Google Gemini 2.5 Pro and OpenAI GPT-5 [1][6][27]
- The success is attributed to three innovative mechanisms: Monitor-based RAG, Hierarchical Solution Refinement (HSR), and Quality-Aware Iterative Reasoning (QAIR) [2][5][12]

Technical Innovations
- **Monitor-based RAG**: This mechanism eliminates the "tool tax" associated with traditional retrieval-augmented generation systems by continuously monitoring the reasoning flow and seamlessly integrating retrieved knowledge, resulting in a 53.5% reduction in token consumption and a 43.7% decrease in workflow iterations [8][10]
- **Hierarchical Solution Refinement (HSR)**: HSR introduces a hierarchical collaboration model that allows stronger solutions to absorb valuable insights from weaker ones, enhancing the overall quality of the output [12][15]
- **Quality-Aware Iterative Reasoning (QAIR)**: This mechanism adapts the depth of iteration to the quality of answers, ensuring efficient resource utilization by focusing further exploration on low-quality candidates [15][18]

Performance Metrics
- Eigen-1's results demonstrate its superiority across various benchmarks, achieving Pass@1 of 48.3% and Pass@5 of 61.74% on HLE Bio/Chem Gold, and significantly higher scores on SuperGPQA Hard and TRQA [17] (the standard pass@k estimator is shown below for reference)
- The model's accuracy improved from 25.3% to 48.3% through the integration of the various components, showcasing the effectiveness of the innovative mechanisms [20][21]

Insights on Error Patterns
- Analysis reveals that 92.78% of errors stem from issues in the reasoning process, indicating that the core challenge lies in integrating knowledge with reasoning rather than mere knowledge retrieval [18]

Implications for AI in Science
- The breakthrough signifies a new paradigm for AI-assisted scientific research, suggesting that AI can effectively understand and reason through complex human knowledge, thus accelerating the research process [27]
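For reference, the commonly used unbiased pass@k estimator (popularized by the HumanEval benchmark) is shown below; the article reports Pass@1 and Pass@5 but does not state which estimator Eigen-1 uses, so treat this only as the standard definition.

```python
# Unbiased pass@k estimator as popularized by the HumanEval benchmark:
# given n sampled attempts per problem with c of them correct,
# pass@k = 1 - C(n - c, k) / C(n, k).
# The Eigen-1 article reports Pass@1/Pass@5 but does not specify its estimator,
# so this is provided only as the standard reference formula.
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Probability that at least one of k samples (drawn from n attempts, c correct) passes."""
    if n - c < k:
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)


if __name__ == "__main__":
    # Example: 10 attempts per question, 4 correct on average.
    print(f"pass@1 = {pass_at_k(n=10, c=4, k=1):.3f}")   # 0.400
    print(f"pass@5 = {pass_at_k(n=10, c=4, k=5):.3f}")   # ~0.976
```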
HLE "Humanity's Last Exam" tops 60 points for the first time! Eigen-1, built on DeepSeek V3.1, leads Grok4 and GPT-5 by a clear margin
QbitAI· 2025-09-28 11:54
Core Insights
- The article highlights a significant breakthrough in AI capabilities with the Eigen-1 multi-agent system achieving a Pass@1 accuracy of 48.3% and Pass@5 accuracy of 61.74% on the HLE Bio/Chem Gold test set, surpassing major competitors like Google Gemini 2.5 Pro and OpenAI GPT-5 [1][5][39].

Technical Innovations
- The success of Eigen-1 is attributed to three innovative mechanisms: Monitor-based RAG, Hierarchical Solution Refinement (HSR), and Quality-Aware Iterative Reasoning (QAIR) [3][15][20].
- Monitor-based RAG reduces the "tool tax" associated with traditional retrieval-augmented generation systems, leading to a 53.5% reduction in token consumption and a 43.7% decrease in workflow iterations while maintaining higher accuracy [11][12][37].
- HSR introduces a hierarchical collaboration model that allows stronger solutions to absorb valuable insights from weaker ones, enhancing the overall problem-solving process [15][18].
- QAIR optimizes the iterative reasoning process by adjusting the depth of exploration based on the quality of answers, ensuring efficient resource utilization (a minimal sketch of such a loop follows below) [20][21].

Performance Metrics
- Eigen-1's performance metrics indicate a significant lead over competitors, with Pass@1 and Pass@5 scores of 48.3% and 61.74% respectively in HLE Bio/Chem Gold, and also strong performances in SuperGPQA Hard and TRQA tasks [27][22].
- The article provides a comparative table showcasing the performance of various models, highlighting Eigen-1's superior results [22].

Insights on Error Patterns
- Analysis reveals that 92.78% of errors stem from reasoning process issues, indicating that the core challenge lies in seamlessly integrating knowledge with reasoning rather than mere knowledge retrieval [24][25].
- The article notes that execution and understanding errors are relatively low, suggesting that models have matured in instruction comprehension [26].

Component Contribution Analysis
- The team conducted ablation studies to quantify the contributions of each component, demonstrating that the baseline system achieved only 25.3% accuracy without external knowledge, while the full system reached 48.3% accuracy with efficient token usage [29][31].

Implications for AI in Science
- The breakthrough signifies a new paradigm for AI-assisted scientific research, suggesting that AI can become a powerful ally for scientists in tackling complex problems [39][40].
- The research team plans to continue optimizing the architecture and exploring applications in other scientific fields, indicating a commitment to advancing AI capabilities in research workflows [42].
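Going only by the high-level description of QAIR above — spend extra iterations on low-quality candidate answers and stop early on good ones — a minimal sketch of such a quality-gated loop might look like this; the scoring and refinement functions are placeholders, not Eigen-1's implementation.

```python
# Minimal sketch of a quality-aware iterative refinement loop in the spirit of
# the QAIR mechanism described above: candidates scoring below a quality
# threshold get further refinement rounds, high-quality ones stop early.
# The scoring/refinement logic here is a placeholder, not Eigen-1's actual code.
from typing import Callable


def quality_aware_refine(candidates: list[str],
                         score: Callable[[str], float],
                         refine: Callable[[str], str],
                         quality_threshold: float = 0.8,
                         max_rounds: int = 3) -> list[str]:
    """Spend extra refinement rounds only on candidates judged low quality."""
    refined = []
    for answer in candidates:
        rounds = 0
        while score(answer) < quality_threshold and rounds < max_rounds:
            answer = refine(answer)   # e.g. another reasoning pass over the weak answer
            rounds += 1
        refined.append(answer)
    return refined


if __name__ == "__main__":
    # Toy stand-ins: "quality" is answer length, "refinement" appends detail.
    def toy_score(a: str) -> float:
        return min(len(a) / 40.0, 1.0)

    def toy_refine(a: str) -> str:
        return a + " [added detail]"

    print(quality_aware_refine(
        ["short answer", "a much longer, already detailed answer here"],
        toy_score, toy_refine))
```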
Top Ten Keywords of the AI Industry in 2025
机器人圈· 2025-09-26 09:29
Core Insights
- The 2025 Artificial Intelligence Industry Conference highlighted ten key trends in AI, emphasizing the convergence of technology, applications, and ecosystems, leading to a clearer vision of a smart-native world [1].

Group 1: Foundation Super Models
- In 2025, foundational models and reasoning models are advancing simultaneously, with a comprehensive capability increase of over 30% from late 2024 to August 2025 [3][4].
- Key features of leading large models include the integration of thinking and non-thinking modes, enhanced understanding and reasoning abilities, and built-in agent capabilities for real-world applications [4][6].
- The emergence of foundational super models simplifies user interaction, enhances workflow precision, and raises new data supply requirements [6].

Group 2: Autonomous Intelligent Agents
- Highly encapsulated intelligent agent products are unlocking the potential of large models, showing better performance in complex tasks compared to single models [9][10].
- Current intelligent agents still have significant room for improvement, particularly in long-duration task execution and interconnectivity [12].

Group 3: Embodied Intelligence
- Embodied intelligence is transitioning from laboratory settings to real-world applications, with models being deployed in practical scenarios [15][16].
- Challenges remain in data quality, model generalization, and software-hardware coordination for effective task execution [18].

Group 4: World Models
- World models are emerging as a core pathway to general artificial intelligence (AGI), focusing on capabilities like data generation, action interpretation, environment interaction, and scene reconstruction [21][22].
- The development of world models faces challenges such as unclear definitions, diverse technical routes, and limited application scope [22].

Group 5: AI Reshaping Software
- AI is transforming the software development lifecycle, with significant increases in token usage for programming tasks and the introduction of advanced AI tools [25][28].
- The role of software developers is evolving toward more complex responsibilities, leading to the emergence of "super individuals" [28].

Group 6: Open Intelligent Computing Ecosystem
- The intelligent computing landscape is shifting towards an open-source model, fostering collaboration and innovation across various sectors [30][32].
- The synergy between software and hardware is improving, with domestic hardware achieving performance parity with leading systems [30].

Group 7: High-Quality Industry Data Sets
- The focus of AI data set construction is shifting from general-purpose to high-quality industry-specific data sets, addressing critical quality issues [35][38].
- New data supply chains are needed to support advanced technologies like reinforcement learning and world models [38].

Group 8: Open Source as Standard
- Open-source initiatives are reshaping the AI landscape, with significant adoption of domestic open-source models and a growing number of active developers [40][42].
- The business model is evolving towards "open-source free + high-level service charges," promoting cloud services and chip demand [42].

Group 9: Mitigating Model Hallucinations
- The issue of hallucinations in large models is becoming a significant barrier to application, with ongoing research into mitigation strategies [44][46].
- Various approaches are being explored to enhance data quality, model training, and user-side testing to reduce hallucination rates [46].

Group 10: AI as an International Public Good
- Global AI development is uneven, necessitating international cooperation to promote equitable access to AI technologies [49][51].
- Strategies are being implemented to address challenges in cross-border compliance and data flow, aiming to make AI a truly shared international public good [51].