Do open-source models like DeepSeek "waste" more tokens?
Hu Xiu· 2025-10-10 00:09
Core Insights
- The article discusses the efficiency and cost implications of open-source models like DeepSeek-R1 compared to closed-source models, focusing on token consumption and its impact on overall reasoning costs [2][19]
Token Consumption and Efficiency
- A study by NousResearch found that open-source models, specifically DeepSeek-R1-0528, consume four times more tokens than the baseline for simple knowledge questions, indicating significant inefficiency in straightforward tasks [2]
- For more complex tasks such as math problems and logic puzzles, the token consumption of DeepSeek-R1-0528 drops to about twice the baseline, suggesting that the type of question posed significantly affects token usage [3][6]
AI Productivity Index
- An independent study by AI recruitment unicorn Mercor noted that models like Qwen-3-235B and DeepSeek-R1 produce longer outputs than other leading models, which can raise average performance at the cost of increased token consumption [5]
Economic Value of Tokens
- The economic value of tokens is determined by the model's ability to solve real-world problems and the monetary value of those problems, emphasizing the importance of creating economic value in practical scenarios [10]
- The unit cost of tokens is crucial for the economic viability of models, with companies like NVIDIA and OpenAI exploring custom AI chips to reduce inference costs [10]
Hardware and Software Optimization
- Microsoft's research highlighted that actual energy consumption during AI queries can be 8-20 times lower than estimated, due to hardware improvements and workload optimizations [11]
- Techniques such as KV cache management and intelligent routing to appropriate models are being explored to improve token generation efficiency and reduce consumption [11]
Token Economics in Different Regions
- Token economics diverge between China and the U.S.: Chinese open-source models focus on achieving high value with more tokens, while U.S. closed-source models aim to reduce token consumption and raise the value of each token [15][16]
Environmental Impact
- A study indicated that DeepSeek-R1 has the highest carbon emissions among leading models, attributed to its reliance on deep thinking and less efficient hardware configurations [18]
Overall Cost Advantage
- Despite the higher token consumption, open-source models like DeepSeek still hold an overall cost advantage, but this advantage shrinks at higher API pricing levels, especially for simple queries (see the cost sketch after this summary) [19]
Conclusion on AI Economics
- The pursuit of raw performance matters less than economic efficiency; the goal is to solve valuable problems with the fewest tokens possible [20]
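As a rough illustration of the cost dynamic described above, the Python sketch below compares per-query cost under placeholder prices and token counts. None of these figures come from the article or any published price list, and the function name is ours; it only shows how a lower per-token price can be offset by higher token consumption on simple queries.

```python
# Illustrative only: the prices and token counts below are placeholder
# assumptions, not published figures.

def cost_per_query(tokens_used: int, price_per_million_tokens: float) -> float:
    """Output-token cost of a single query, in dollars."""
    return tokens_used / 1_000_000 * price_per_million_tokens

# Hypothetical scenario mirroring the 4x token gap the NousResearch study
# reports for simple knowledge questions, paired with made-up prices.
open_weight = cost_per_query(tokens_used=4_000, price_per_million_tokens=2.0)
closed = cost_per_query(tokens_used=1_000, price_per_million_tokens=8.0)

print(f"open-weight model: ${open_weight:.4f} per query")  # $0.0080
print(f"closed model:      ${closed:.4f} per query")       # $0.0080
# At these placeholder numbers the per-query costs break even; a higher API
# price or a wider token gap flips the advantage, which is the "diminishing
# cost advantage" effect described above.
```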
Ant Group, OpenAI, and DeepSeek are racing flat out! Ling-1T, China's strongest trillion-parameter flagship model, goes open source
Tai Mei Ti APP· 2025-10-09 04:14
Core Insights
- Ant Group has released and open-sourced its trillion-parameter general language model, Ling-1T, marking a significant advancement in AI technology [2][3]
Model Performance
- Ling-1T is the flagship model of Ant Group's Ling 2.0 series, achieving state-of-the-art (SOTA) results on a range of complex reasoning benchmarks, including code generation and mathematical reasoning [3][10]
- On the AIME 25 competition math benchmark, Ling-1T reached 70.42% accuracy while consuming an average of just over 4,000 tokens per problem, edging out Google's Gemini-2.5-Pro, which scored 70.10% while consuming more than 5,000 tokens (a rough token-efficiency comparison follows this summary) [3][10]
Competitive Landscape
- Competition among AI models is intensifying, with major players such as OpenAI, Alibaba, and DeepSeek launching new models at around the same time [4][9]
- The AI industry is in an "arms race" for foundational models, as noted by industry leaders [5]
Investment Trends
- In 2023, global AI startups attracted a record $192.7 billion in venture capital, with the U.S. directing 62.7% of its venture funding into AI companies [15][16]
- OpenAI has become the most valuable startup in the world, with a valuation of $500 billion and projected annual revenue of $12 billion [16]
Technological Innovations
- Ling-1T uses mixed-precision (FP8) training, which significantly reduces memory usage and improves training efficiency by more than 15% [11][12]
- The model incorporates a novel policy optimization method (LingPO) for stable training and a new reward mechanism to improve its grasp of visual aesthetics [12][14]
Future Developments
- Ant Group is also training a deep-thinking model, Ring-1T, which is expected to be released soon [14]
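For a sense of what those two benchmark figures imply, here is a back-of-the-envelope comparison in Python. "Accuracy points per 1,000 tokens" is our own illustrative metric, not one used in the article, and the token counts are the article's approximate averages.

```python
# Back-of-the-envelope comparison of the AIME 25 figures quoted above.
# The token counts are the article's approximate averages (">4000" and ">5000").

benchmarks = {
    "Ling-1T":        {"accuracy": 70.42, "avg_tokens": 4000},
    "Gemini-2.5-Pro": {"accuracy": 70.10, "avg_tokens": 5000},
}

for name, result in benchmarks.items():
    points_per_1k = result["accuracy"] / (result["avg_tokens"] / 1000)
    print(f"{name}: {result['accuracy']:.2f}% at ~{result['avg_tokens']} tokens "
          f"-> {points_per_1k:.1f} points per 1k tokens")
# Ling-1T lands around 17.6 points per 1k tokens versus roughly 14.0 for
# Gemini-2.5-Pro, i.e. comparable accuracy at about 20% fewer tokens.
```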
信创 ETF (159537) rises nearly 6%; DeepSeek-V3.2-Exp released, with day-0 adaptation by domestic cloud vendors
Mei Ri Jing Ji Xin Wen· 2025-10-09 03:28
Group 1
- DeepSeek officially released the DeepSeek-V3.2-Exp model on September 29, an experimental version aimed at optimizing training and inference efficiency for long texts [1]
- The new model introduces DeepSeek Sparse Attention, a sparse attention mechanism, building on the previous V3.1-Terminus version [1]
- Development of the new model used TileLang, an open-source AI operator programming language developed by a team led by Associate Professor Yang Zhi of Peking University [1]
Group 2
- The 信创 ETF (159537) tracks the 国证信创指数 (CN5075), which selects listed companies in the semiconductor, software development, and computer equipment sectors from the Shanghai and Shenzhen markets [2]
- The index is designed to reflect the overall performance of the information technology innovation (信创) theme, with a strong weighting toward the semiconductor and software development industries [2]
- The index constituents have a large average market capitalization, reflecting a diversified development pattern within the 信创 industry [2]
DeepSeek and Domestic Chips: A "Two-Way Embrace"
Core Viewpoint
- The release of the DeepSeek-V3.2-Exp model by DeepSeek marks a significant advance for the domestic AI chip ecosystem, introducing a sparse attention mechanism that reduces computational resource consumption and improves inference efficiency (an illustrative sparse-attention sketch follows this summary) [1][7]
Group 1: Model Release and Features
- DeepSeek-V3.2-Exp incorporates DeepSeek Sparse Attention, and its launch was accompanied by API price cuts of 50% to 75% across DeepSeek's official app, web, and mini-program channels [1]
- The new model received immediate recognition and adaptation from several domestic chip manufacturers, including Cambricon, Huawei, and Haiguang, indicating a collaborative ecosystem [2][6]
Group 2: Industry Impact and Ecosystem Development
- The rapid adaptation of DeepSeek-V3.2-Exp by multiple companies suggests a growing consensus within the domestic AI industry about the model's significance, positioning DeepSeek as a benchmark for domestic open-source models [2][5]
- The domestic chip industry, which largely operates under a "fabless" model, is expected to progress quickly as it aligns with standards defined by DeepSeek, seen as a key player in shaping the industry's future [4][5]
Group 3: Comparison with Global Standards
- DeepSeek's swift ecosystem-building contrasts with NVIDIA's two-decade development of its CUDA platform, highlighting the rapid evolution of the domestic AI landscape [3][8]
- Collaboration among major internet companies such as Tencent and Alibaba in adapting to domestic chips further underscores the expanding synergy within the AI hardware and software ecosystem [8]
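The article does not describe how DeepSeek Sparse Attention works internally. As a purely illustrative stand-in, the sketch below implements generic top-k sparse attention in PyTorch to show why letting each query attend to only a subset of keys can cut long-sequence costs; the function name, the top_k value, and the tensor shapes are all assumptions, and this naive version still builds the full score matrix, something production kernels (e.g. ones written in TileLang or CUDA) avoid.

```python
# Generic top-k sparse attention sketch (NOT DeepSeek's actual DSA design;
# its internals are not described in the article). Each query keeps only its
# top_k highest-scoring keys. For clarity this naive version still materializes
# the full score matrix; real sparse-attention kernels avoid that, which is
# where the compute and memory savings on long sequences actually come from.
import torch
import torch.nn.functional as F

def topk_sparse_attention(q, k, v, top_k=64):
    """q, k, v: [batch, heads, seq_len, head_dim] tensors."""
    d = q.size(-1)
    scores = q @ k.transpose(-2, -1) / d ** 0.5        # [B, H, S, S]
    topk_vals, topk_idx = scores.topk(top_k, dim=-1)   # keep k best keys/query
    masked = torch.full_like(scores, float("-inf"))
    masked.scatter_(-1, topk_idx, topk_vals)           # -inf outside the top_k
    weights = F.softmax(masked, dim=-1)                # zero weight off-pattern
    return weights @ v

# Toy usage: 1024-token sequence, each query attends to 64 keys instead of 1024.
q = torch.randn(1, 2, 1024, 64)                        # [batch, heads, seq, dim]
out = topk_sparse_attention(q, q, q, top_k=64)
print(out.shape)                                       # torch.Size([1, 2, 1024, 64])
```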
The Beauty of Empowerment: The Business Ambition and Ecosystem Architecture Behind DeepSeek's Open-Source Push
Sou Hu Cai Jing· 2025-09-30 18:48
Core Insights
- DeepSeek leverages open-source technology to sharply lower entry barriers in the industrial AI quality-inspection market, allowing smaller companies to access advanced AI capabilities at a fraction of the cost [2][3]
- The company aims to build a robust ecosystem by attracting developers and partners with free technology, which will later be monetized through various services and collaborations [3][10]
Group 1: Business Model
- The first step in DeepSeek's monetization strategy is using free technology to attract partners and build an ecosystem, much like a shopping mall offering free rent to attract merchants [3][10]
- The second step focuses on customized enterprise-level services for larger companies, where high reliability and compliance requirements allow DeepSeek to charge for premium offerings [4][10]
- The third step involves collaborating with hardware and cloud service providers, letting DeepSeek earn revenue through partnerships without extensive sales efforts [5][6]
Group 2: Industry Impact
- DeepSeek's open-source approach is reshaping the competitive landscape of the AI industry, forcing established players to lower prices and adapt to a more open model [9][10]
- Collaboration with domestic chip manufacturers such as Huawei improves the performance of local AI chips, reducing reliance on foreign supply chains and increasing adoption of domestic solutions [8][10]
Group 3: Strategic Insights
- The free-technology strategy is designed to create a viral adoption effect, building a large user base that can later be monetized through high-end services and ecosystem partnerships [11][10]
- Building a strong ecosystem is considered more critical than the technology itself: a larger user base attracts more tools and resources, solidifying DeepSeek's market position [12][10]
- DeepSeek recognizes the risks of open-source technology and applies strict content-review mechanisms and compliance frameworks to mitigate them [13][10]
Zhitong Hong Kong Stocks Review | Non-ferrous Metals Charge Ahead Without Restraint; DeepSeek Catalyzes AI Once Again
Zhi Tong Cai Jing· 2025-09-30 12:16
Market Overview
- The Hang Seng Index rose 0.87% to close at 26,855 points, with a target of 27,000 points [1]
- The risk of a U.S. government shutdown remains unresolved, and the U.S. Department of Commerce announced that subsidiaries of companies on the "entity list" will automatically be added to a "blacklist" [1]
Defense Strategy
- The U.S. Defense Secretary convened military leaders to discuss a strategic shift from "deterring China" toward focusing on the Western Hemisphere and the U.S. mainland [2]
- The new strategy may involve reducing the military presence in Europe, indicating strategic contraction rather than direct military action [2]
Investment Trends
- The Abu Dhabi sovereign wealth fund, managing $330 billion in assets, signaled increased investment in Asia, particularly China, which is expected to attract around $6 billion of Middle Eastern capital in 2024 [2]
- The metals sector remains strong, with companies such as China Nonferrous Mining and Jiangxi Copper posting significant gains [3]
Technology Sector
- The release of DeepSeek's latest model, V3.2-Exp, significantly improved training efficiency while cutting API input/output prices by more than 50% [4]
- Major tech companies are adapting to domestic chips, and stocks such as Hua Hong Semiconductor and SenseTime have risen substantially [4]
Tesla Developments
- Tesla announced plans to expand humanoid robot production, targeting 1 million units annually by 2030 [5]
- Tesla's stock rose, and related companies such as Siasun Robot & Automation also gained [5]
Gold Mining Sector
- Zijin Mining's subsidiary, Zijin Gold International, is set to list on the Hong Kong Stock Exchange, a significant IPO for the gold mining sector [6]
- The company is projected to post compound annual growth of 21.4% in production and 61.9% in net profit from 2022 to 2024 [6]
New Listings Performance
- New listings such as Botai Carlink and Xipuni performed well, with significant price gains post-IPO [7]
Apple iPhone Demand
- Morgan Stanley reported strong demand for the iPhone 17 series, with delivery lead times remaining long compared to last year [8]
- The iPhone 17 series is expected to support Apple's Q4 results despite weaker demand for the Air model [8]
Bilibili's Financial Performance
- Bilibili reported Q2 revenue of 7.34 billion yuan, up 20% year on year, with a net profit of 220 million yuan [10]
- The platform's user base continues to grow, with 107 million DAU and 368 million MAU [11]
- Bilibili's gaming segment is performing well, with a 68% increase in mobile game revenue [11][12]
DeepSeek and Domestic Chips Begin a "Two-Way Embrace"
Core Insights
- DeepSeek has released the DeepSeek-V3.2-Exp model, introducing a sparse attention mechanism that significantly reduces computational resource consumption and improves inference efficiency [1]
- The new model has been accompanied by a 50% to 75% reduction in API service prices [1]
- The release prompted immediate recognition and adaptation from several domestic chip manufacturers, indicating growing synergy within the domestic AI hardware and software ecosystem [1][2]
Group 1: Model Release and Features
- The DeepSeek-V3.2-Exp model incorporates the DeepSeek Sparse Attention mechanism, optimizing training and inference efficiency for long texts [5]
- The model is compatible with CUDA and uses TileLang for rapid prototyping, with lower-level implementations targeted for higher efficiency [5][6]
- The release of V3.2-Exp marks a clear shift from the previous version, V3.1, whose "UE8M0 floating-point format" drew no proactive recognition from other companies [4][5]
Group 2: Industry Response and Ecosystem Development
- Within four minutes of the model's release, Cambricon announced its adaptation of DeepSeek-V3.2-Exp and open-sourced its large-model inference engine [2]
- Huawei and Haiguang quickly followed suit, demonstrating the rapid response of the domestic chip industry to the new model [2]
- The consensus within the domestic AI industry around the DeepSeek model has positioned the company to take the lead in defining standards for domestic chips [3][4]
Group 3: Competitive Landscape
- The rapid development of the domestic chip ecosystem is highlighted by the swift adaptation of major players such as Tencent and Alibaba, which are actively integrating domestic chips into their cloud computing services [6]
- Experts believe the emergence of DeepSeek has accelerated the pace of domestic chip development, with significant advances expected by 2025 [3]
DeepSeek-V3.2 Goes Live on the National Supercomputing Internet; Developers Can Download It for Free
Sou Hu Cai Jing· 2025-09-30 11:58
Core Insights
- DeepSeek has launched the experimental version DeepSeek-V3.2-Exp, which introduces the DeepSeek Sparse Attention mechanism to improve training and inference efficiency for long texts [1]
- The AI community now hosts more than 700 high-quality open-source models, offering developers services including API calls and distributed training [2]
Group 1
- DeepSeek-V3.2-Exp is available for free download in the National Supercomputing Internet AI community, allowing enterprises and developers to build applications quickly [1]
- The new model is a step toward a next-generation architecture, building on the previous version, V3.1-Terminus [1]
- DeepSeek Sparse Attention delivers significant improvements in long-text training and inference efficiency with minimal impact on model output [1]
Group 2
- The Supercomputing Internet AI community hosts a collection of more than 700 models, including various versions of the DeepSeek series [2]
- Developers can use the community for a range of services, including online inference dialogue and model fine-tuning [2]
- The community supports a comprehensive MaaS (Model as a Service) offering for developers (see the hedged API-call sketch after this summary) [2]
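The article does not document the community's API surface. Many MaaS platforms that host DeepSeek models expose an OpenAI-compatible chat completions endpoint, so the snippet below is a sketch under that assumption only; the base URL, API key, and model identifier are hypothetical placeholders, not values from the National Supercomputing Internet AI community.

```python
# Hypothetical sketch of calling a hosted DeepSeek model through an
# OpenAI-compatible MaaS endpoint. The base_url, api_key, and model name
# below are placeholders; consult the platform's own documentation.
from openai import OpenAI

client = OpenAI(
    base_url="https://example-maas-platform.cn/v1",  # placeholder endpoint
    api_key="YOUR_PLATFORM_API_KEY",                 # placeholder credential
)

response = client.chat.completions.create(
    model="deepseek-v3.2-exp",                       # placeholder model id
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "用一句话介绍稀疏注意力。"},
    ],
    max_tokens=256,
)
print(response.choices[0].message.content)
```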
DeepSeek Begins a "Two-Way Embrace" with Domestic Chips
Core Insights
- DeepSeek has released the DeepSeek-V3.2-Exp model, introducing a sparse attention mechanism that significantly reduces computational resource consumption and improves inference efficiency [1][6]
- The new model has been accompanied by a 50% to 75% reduction in API service prices [1]
- The release prompted immediate recognition and adaptation from several domestic chip manufacturers, including Huawei, Cambricon, and Haiguang, indicating growing synergy within the domestic AI hardware and software ecosystem [2][4]
Summary by Sections
Model Release and Features
- The DeepSeek-V3.2-Exp model incorporates the DeepSeek Sparse Attention mechanism, optimizing training and inference efficiency for long texts [6]
- The model is compatible with CUDA and uses TileLang, a language designed specifically for AI operator development, for rapid prototyping [6]
Industry Response
- Cambricon was the first to announce adaptation of the new model, followed by Huawei and Haiguang, showcasing a collaborative effort among domestic manufacturers [2]
- The rapid response from these companies indicates a consensus within the domestic AI industry on the significance of the DeepSeek model [6]
Ecosystem Development
- DeepSeek is emerging as a key player in building a new ecosystem for domestic AI, with its model becoming a benchmark for open-source models in China [2][4]
- Collaboration among major internet companies such as Tencent and Alibaba in adapting domestic chips is further accelerating the establishment of this ecosystem [7]
Historical Context
- The previous version, DeepSeek-V3.1, drew no proactive adaptation claims from other companies, highlighting the significant shift in industry dynamics with the latest release [5]
- Experts attribute the expected rapid development of domestic chips by 2025 to DeepSeek's emergence as a standard-setting entity [3]
PPIO Is the First to Launch DeepSeek-V3.2-Exp
Zheng Quan Ri Bao Wang· 2025-09-30 06:17
Group 1
- DeepSeek has launched a new experimental model version, DeepSeek-V3.2-Exp, which incorporates the "DeepSeek Sparse Attention" mechanism to improve training and inference efficiency in long-context scenarios [1]
- The new architecture has brought a sharp reduction in API pricing, with costs down 75%, making it more affordable for developers to use the DeepSeek API [1]
- The PPIO platform offers high-performance API services and hosts a variety of open-source models, ranking first in throughput tests for DeepSeek-R1-0528 according to the "2025 Large Model Service Performance Ranking" [2]
Group 2
- Through its engineering practice in 2024, PPIO achieved more than a tenfold reduction in large-model inference costs by dynamically balancing inference efficiency and resource usage [2]