Just in: DeepSeek's Liang Wenfeng named one of Nature's ten people of the year, hailed as a "technology disruptor"
36Kr · 2025-12-09 02:24
Core Insights
- Liang Wenfeng, founder of DeepSeek, has been recognized as one of the top ten scientific figures of 2025 by Nature, labeled a "technology disruptor" for his contributions to AI [1][24]
- DeepSeek's R1 model has demonstrated that the perceived gap in AI capabilities between the US and China may not be as significant as previously thought, challenging existing narratives in the AI landscape [5][7]

Company Overview
- DeepSeek, founded in 2023 by Liang Wenfeng in Hangzhou, has developed a powerful yet affordable AI model, R1, which excels at solving complex tasks by breaking them down into steps [5][13]
- R1 is the first model of its kind released with open weights, allowing researchers to download and adapt it for their own applications, which has significantly shaped the AI research community [7][8]
- DeepSeek's commitment to transparency is evident: R1 was the first mainstream LLM to undergo peer review, with the company publicly sharing the technical details of the model's construction and training [8]

Market Impact
- DeepSeek's success has inspired other companies in both China and the US to release their own open-source models, indicating a shift in the competitive landscape of AI development [7]
- Although R1's capabilities are comparable to leading US models, its training costs are significantly lower; some estimates put the training cost of models such as Meta's Llama 3 at more than ten times higher [9][15]

Leadership and Vision
- Liang Wenfeng's background as a former financial analyst who applied AI algorithms to the stock market has shaped his vision for DeepSeek, which is focused on achieving artificial general intelligence [17][20]
- The company prioritizes individual potential over experience in its hiring practices and maintains a flat organizational structure that empowers researchers to choose their own research directions [20]

Societal Integration
- DeepSeek's models are becoming integral to daily life in China, with local governments using them to power chatbots and assist citizens, reflecting a broader trend of AI integration into economic development [20]
- The company is seen as a symbol of China's transformation from follower to innovator in AI, with the upcoming R2 model expected to extend this narrative [21][23]
DeepSeek founder Liang Wenfeng named to Nature magazine's list of the most influential figures of 2025
Xinhua News Agency · 2025-12-09 00:32
Group 1
- The core focus of the article is the recognition of Chinese AI company DeepSeek's founder Liang Wenfeng and geoscientist Du Mengran in Nature magazine's annual "Nature 10" list, which highlights significant scientific figures of 2025 [1][2]
- Liang Wenfeng's company DeepSeek launched the powerful and cost-effective R1 model in January, which has been noted to challenge the perceived dominance of the US in the AI field [1]
- Du Mengran's groundbreaking exploration of the hadal zone, where she and her team discovered the deepest known animal ecosystem on Earth, is also highlighted [1]

Group 2
- The "Nature 10" list is compiled by the editors of Nature magazine and is not a ranking or award, but rather a recognition of significant scientific advances and the individuals behind them [2]
- The list aims to honor contributions to new fields, breakthroughs in medicine, commitment to scientific integrity, and the formulation of global policies that save lives [2]
- The inclusion of these individuals reflects collective efforts to understand and protect the natural world, a key reason for their recognition in this year's list [2]
The second DeepSeek shock: V3.2 rewrites the inference economics of China's cloud and chip ecosystems
2025-12-08 15:36
Summary of DeepSeek V3.2 Conference Call

Industry Overview
- The conference call discusses the **Chinese Internet industry**, focusing on the **AI market** and the impact of the **DeepSeek V3.2** release on the ecosystem [1][20].

Key Points and Arguments
1. **DeepSeek V3.2 Release**:
- The launch of DeepSeek V3.2 marks the beginning of the second wave of "DeepSeek impact" in the domestic AI market, providing near-state-of-the-art open-source inference capability at moderate domestic prices [1][20].
- Model API prices have been reduced by **30-70%**, and long-context inference may save **6-10x** the workload [1][3].
2. **Technical Enhancements**:
- DeepSeek V3.2 retains the mixture-of-experts (MoE) architecture of V3.1 but introduces the DeepSeek Sparse Attention (DSA) mechanism, which reduces long-context computational complexity while maintaining performance on public benchmarks [2][24].
- The model is designed for building "agents," integrating "thinking + tool invocation" in a single trajectory, trained on approximately **1,800 synthetic agent environments** and **85,000 complex instructions** [2][24].
3. **Economic Impact**:
- The DSA mechanism improves inference speed by **2-3x** and reduces GPU memory usage by **30-40%** when processing **128k tokens**, compared with V3.1 [3][24].
- Input/output pricing for V3.2 is set at **$0.28** and **$0.42** per million tokens respectively, significantly lower than previous models (a worked cost example follows this summary) [3][19].
4. **Beneficiaries in the AI Ecosystem**:
- Key beneficiaries include **cloud operators** (e.g., Alibaba Cloud, Tencent Cloud, Baidu AI Cloud) and **domestic chip manufacturers** (e.g., Cambricon, Hygon) [13][14].
- The release is expected to drive demand for domestic chips and AI servers while reducing execution risk for Chinese AI buyers [14][16].
5. **Competitive Positioning**:
- DeepSeek V3.2 is positioned as a price disruptor in the large language model API market, priced significantly below comparable models globally while maintaining intelligence levels comparable to **GPT-5** and others [26][27].
- Chinese models are noted for an attractive value proposition, offering higher intelligence scores at lower cost than their US counterparts [27][29].

Additional Important Content
- The report emphasizes the shift toward domestic hardware support, with V3.2 optimized for non-CUDA ecosystems, including Huawei's CANN stack and Ascend hardware [14][24].
- The model's capabilities are expected to improve the efficiency and economic viability of AI SaaS developers and vertical industry applications, such as coding and legal assistance [16][24].
- The analysis indicates a significant evolution from V3.1 to V3.2, with a **22% increase** in the Artificial Analysis intelligence index and more than a **50% reduction** in effective token pricing [17][19].

This summary encapsulates the critical insights from the conference call regarding the implications of DeepSeek V3.2 for the Chinese AI landscape and its competitive positioning in the global market.
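The per-request economics implied by those prices are easy to check. Below is a minimal sketch in Python; only the $0.28/$0.42 per-million-token prices come from the summary above, while the request sizes are hypothetical:

```python
# Back-of-the-envelope cost of one long-context request at the V3.2
# list prices quoted above. The token counts are illustrative.
INPUT_PRICE = 0.28 / 1_000_000   # USD per input token
OUTPUT_PRICE = 0.42 / 1_000_000  # USD per output token

def request_cost(input_tokens: int, output_tokens: int) -> float:
    """Return the USD cost of a single API request."""
    return input_tokens * INPUT_PRICE + output_tokens * OUTPUT_PRICE

# A full 128k-token context with a 2k-token answer:
print(f"${request_cost(128_000, 2_000):.4f}")  # -> $0.0367
```

At under four cents for a maxed-out 128k-token request, the pricing makes concrete why the call frames V3.2 as rewriting inference economics for long-context workloads.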
PriceSeek alert: aluminum ingot spot prices broadly lower
Sina Finance · 2025-12-08 12:25
SunSirs, December 8: Aluminum Corporation of China (Chalco) aluminum ingot (AL99.70) spot prices fell across all regions on December 8, 2025, as follows:

The East China market quote was 21,920 yuan/ton, South China 21,810 yuan/ton, Southwest 21,840 yuan/ton, and Central China 21,770 yuan/ton, down 170, 160, 160, and 170 yuan/ton respectively from the previous trading day.

PriceSeek commentary: Aluminum, bull/bear score: -1. The article shows Chalco's aluminum ingot (AL99.70) spot prices falling by 170, 160, 160, and 170 yuan/ton in the East China, South China, Southwest, and Central China markets respectively, a decline of roughly 0.7-0.8%, indicating ample supply or weak demand and a mildly bearish influence on spot prices.

[Commodity Formula Pricing Principles] The SunSirs benchmark price is a trade-guidance price generated from price big data and the SunSirs price model, also known as the SunSirs price. It can be used to determine the settlement price for the following two needs: 1. the settlement price on a specified date ... C: premium/discount, covering factors such as logistics costs, brand spreads, and regional spreads.
DeepSeek's dual-model release: one a "terse assistant," the other a "lopsided genius"
Science and Technology Daily · 2025-12-08 10:03
Core Insights
- DeepSeek has released two new models, DeepSeek-V3.2 and DeepSeek-V3.2-Speciale, which have drawn attention for their performance against leading models such as OpenAI's GPT-5 and Google's Gemini 3 Pro [1][2]

Model Features
- DeepSeek-V3.2 is designed as a high-efficiency assistant with strong reasoning and agent capabilities, aimed at automating complex tasks such as report generation and coding [2]
- DeepSeek-V3.2-Speciale focuses on solving high-difficulty mathematical problems and supporting academic research, pushing the limits of open-source model reasoning [2]

Technological Innovations
- The new models incorporate two significant breakthroughs: DeepSeek Sparse Attention (DSA) and thinking-with-tool-invocation technology [2]
- DSA enhances efficiency by having the model retrieve only the most relevant information, reducing resource consumption [2]
- Thinking with tool invocation enables multi-round reasoning and tool usage, allowing the model to think, execute, and iterate on tasks autonomously (a sketch of the loop follows this summary) [2]

Market Positioning
- The release of these models aims to narrow the performance gap between open-source and closed-source models, providing a competitive edge for open-source development [3][4]
- DeepSeek's focus on practicality and generalization is intended to put pressure on closed-source vendors, turning aspirations into competitive reality [4]

Community Engagement
- DeepSeek has updated its official web platform, app, and API to the new version, while the Speciale version is currently available only as a temporary API for community evaluation [4]
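To make the think-execute-iterate pattern concrete, here is a minimal agent-loop sketch in Python. It is illustrative only: the toy `call_model` and `run_tool` functions and the message format are hypothetical stand-ins, not DeepSeek's actual API.

```python
import json

def run_tool(name: str, args: dict) -> str:
    """Toy tool dispatcher; a real agent would route to search, code
    execution, and so on."""
    if name == "calculator":
        return str(eval(args["expression"]))  # toy only; never eval untrusted input
    return f"unknown tool: {name}"

def call_model(messages: list[dict]) -> dict:
    """Toy stand-in for an LLM API call. First round: request a tool.
    Second round: read the tool result and produce a final answer."""
    if not any(m["role"] == "tool" for m in messages):
        return {"tool_call": {"name": "calculator",
                              "args": {"expression": "77000 / 20000"}}}
    result = next(m["content"] for m in messages if m["role"] == "tool")
    return {"tool_call": None, "content": f"The ratio is {result}."}

def agent_loop(task: str, max_rounds: int = 8) -> str:
    """Interleave 'thinking' (model calls) with tool execution until the
    model returns a final answer -- the multi-round pattern described above."""
    messages = [{"role": "user", "content": task}]
    for _ in range(max_rounds):
        reply = call_model(messages)
        if reply.get("tool_call") is None:
            return reply["content"]  # final answer: loop ends
        call = reply["tool_call"]
        # Feed the tool result back so the next round can reason over it.
        messages.append({"role": "assistant", "content": json.dumps(call)})
        messages.append({"role": "tool",
                         "content": run_tool(call["name"], call["args"])})
    return "max rounds reached without a final answer"

print(agent_loop("How many times more tokens is 77,000 than 20,000?"))
# -> The ratio is 3.85.
```

The essential design point is that tool results are appended back into the conversation, so each new "thinking" round conditions on everything executed so far.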
Foreign media watch China's launch of the "world's first AI phone": will it be a second "DeepSeek moment"?
Global Times · 2025-12-07 22:51
The product has drawn an enthusiastic market response. The prototype reportedly sold out in China almost immediately after release. While the manufacturer has not disclosed total sales, its resale price has surged roughly 43% on the secondary market. US tech outlet Wccftech reported that the product recalls the sensation DeepSeek caused in early 2025, when the world was collectively stunned by a top-tier reasoning model delivered from China at extremely low compute cost; now a Chinese tech company has once again launched the world's first AI phone with genuine agent capabilities.

The Indian Express reported that no other phone in the world currently matches the Doubao phone's level of autonomy; while its path to commercialization remains to be seen, it clearly demonstrates how smartphones will change our lives in the future. The phone's debut also suggests that the first truly agentic phone may come not from Silicon Valley but from China's ecosystem fusing artificial intelligence and mobile technology.

Although the product is for now only a "technology preview" released by Doubao, embedding a large language model at the operating-system level has sparked fierce industry debate over data authorization, privacy, and system security. Xiang Ligang, chairman of the Zhongguancun Information Consumption Alliance, told the Global Times: "Deeply fusing a large model with the operating system is indeed highly controversial, and its commercial rollout faces heavy resistance. But to make AI agents more powerful, you must go deep into the phone's hardware and the underlying operating system; only then can AI's capabilities be fully unleashed." Xiang believes, "This ...
The gap between open-source and closed-source models is widening: the harsh truth revealed by DeepSeek's paper
36Kr · 2025-12-06 00:03
Core Insights
- DeepSeek's V3.2 technical report indicates that the performance gap between open-source and closed-source models is not narrowing but widening, based on extensive empirical data [1][2].

Performance Comparison
- In benchmark tests, DeepSeek V3.2 scored 85.0 on MMLU-Pro, while GPT-5 scored 87.5 and Gemini 3.0 Pro achieved 90.1. On GPQA Diamond, the scores were 82.4 for DeepSeek, 85.7 for GPT-5, and 91.9 for Gemini 3.0 Pro [2][3].
- The most significant gap appeared on the HLE test, where DeepSeek V3.2 scored 25.1 against GPT-5's 26.3 and Gemini 3.0 Pro's 37.7, a substantial performance disparity [3][4].

Structural Issues Identified
The report identifies three structural issues limiting the capabilities of open-source models on complex tasks:
1. **Architectural Limitations**: Open-source models rely on traditional vanilla attention mechanisms, which are inefficient over long sequences, hindering scalability and effective post-training [6].
2. **Resource Investment Gap**: The post-training budget for DeepSeek V3.2 exceeds 10% of its pre-training costs, while most open-source models allocate less than 1%, leading to significant performance differences [7].
3. **AI Agent Capability Lag**: Open-source models show weaker generalization and instruction-following in real-world applications, as evidenced by lower scores on key agent evaluation benchmarks [8].

DeepSeek's Strategic Innovations
DeepSeek has implemented fundamental technical innovations across three core dimensions (a sketch of the attention change follows this summary):
1. **Architectural Changes**: Introduction of the DSA (DeepSeek Sparse Attention) mechanism, which reduces computational complexity from O(L²) to O(L×k), significantly lowering inference costs while maintaining performance [10].
2. **Increased Resource Allocation**: DeepSeek made the unprecedented decision to allocate substantial resources to post-training, training expert models in six key areas on a total of 943.7 billion tokens [12].
3. **Enhanced Agent Capabilities**: Development of a systematic task-synthesis process, creating more than 1,800 diverse environments and 85,000 complex prompts, which has improved performance on agent-related tests [13].

Conclusion
- DeepSeek V3.2 demonstrates a viable path for open-source AI to compete with closed-source models through innovative architecture and strategic resource allocation, suggesting that technological innovation may be the key to survival in the competitive AI landscape [14].
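The O(L²) → O(L×k) claim can be made concrete in code. Below is a minimal top-k sparse-attention sketch in Python/NumPy. It illustrates the general technique only, not DeepSeek's actual DSA, which selects keys with a cheap learned indexer; this toy still forms the full score matrix just to pick them.

```python
import numpy as np

def topk_sparse_attention(Q, K, V, k):
    """Single-head attention where each query attends to only its k
    highest-scoring keys: O(L*k) attention work instead of O(L^2)
    when k << L. Illustrative only -- a real sparse kernel (DSA
    included) selects keys cheaply and never forms the full matrix."""
    L, d = Q.shape
    scores = Q @ K.T / np.sqrt(d)                        # (L, L) toy selector
    idx = np.argpartition(scores, -k, axis=1)[:, -k:]    # top-k keys per query
    out = np.zeros_like(V)
    for i in range(L):                                   # attend to k keys only
        s = scores[i, idx[i]]
        w = np.exp(s - s.max()); w /= w.sum()            # softmax over k keys
        out[i] = w @ V[idx[i]]
    return out

rng = np.random.default_rng(0)
L, d, k = 1024, 64, 32
Q, K, V = (rng.standard_normal((L, d)) for _ in range(3))
print(topk_sparse_attention(Q, K, V, k).shape)           # (1024, 64)
```

For each query, the attention arithmetic shrinks from L terms to a small fixed k (e.g., from 128,000 positions down to a few thousand selected tokens), which is the source of the inference speedups and memory savings cited in the conference-call summary above.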
DeepSeek-V3.2 devours tokens, and GRPO turns out to be the backstabber
36Kr · 2025-12-04 10:38
Core Insights
- The release of DeepSeek-V3.2 has generated significant industry attention, highlighting both its capabilities and the areas needing improvement, particularly token efficiency and output verbosity [1][2][5].

Token Efficiency
- DeepSeek-V3.2-Speciale exhibits poor token-consumption efficiency, requiring 77,000 tokens for complex tasks where Gemini needs 20,000, more than three times the token usage for outputs of similar quality [1][5].
- Users note that if Speciale's generation speed improved from roughly 30 tokens per second to around 100, overall usability and experience would improve significantly [5].

Output Quality
- The Speciale version has been criticized for lengthy, verbose outputs that often end in incorrect answers, a tendency attributed to inherent flaws in the GRPO algorithm [2][14].
- DeepSeek's technical report acknowledges the increased token consumption during inference, with the Speciale version consuming 86 million tokens in benchmark tests, up from 62 million for the previous version [7][14].

Algorithmic Issues
- The GRPO algorithm, long a standard in reinforcement learning, is identified as a source of bias toward longer, incorrect responses. Its length bias means shorter correct responses receive stronger per-token updates while longer incorrect responses face weaker penalties (see the sketch below) [18][21].
- While the difficulty bias has been optimized away in DeepSeek-V3.2, the length bias remains, potentially contributing to the excessive token consumption observed in the Speciale version [18][21].
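The length bias has a concrete source in how GRPO normalizes its loss. The sketch below computes group-normalized advantages and the resulting per-token update weight for a toy group of responses; the rewards and lengths are made up, and the formula follows the commonly published GRPO objective (advantage normalized within the sample group, each sequence's loss averaged over its own length), not DeepSeek's exact implementation.

```python
import numpy as np

# Toy group of 4 sampled responses to one prompt.
# Rewards: 1 = correct, 0 = wrong. Lengths (in tokens) are made up.
rewards = np.array([1.0, 1.0, 0.0, 0.0])
lengths = np.array([200, 800, 300, 3000])

# GRPO advantage: normalize rewards within the group.
adv = (rewards - rewards.mean()) / (rewards.std() + 1e-8)

# Vanilla GRPO averages each sequence's token losses over its OWN length,
# so every token of response i is pushed with weight adv_i / len_i.
per_token_update = adv / lengths

for r, n, a, u in zip(rewards, lengths, adv, per_token_update):
    print(f"reward={r:.0f} len={int(n):5d} adv={a:+.2f} "
          f"per-token update={u:+.6f}")

# Output:
# reward=1 len=  200 adv=+1.00 per-token update=+0.005000
# reward=1 len=  800 adv=+1.00 per-token update=+0.001250
# reward=0 len=  300 adv=-1.00 per-token update=-0.003333
# reward=0 len= 3000 adv=-1.00 per-token update=-0.000333
```

The short correct answer gets a 4x stronger per-token push than the long correct one, while the 3,000-token wrong answer is penalized ten times more weakly than the 300-token one; accumulated over training, this is exactly the pressure toward verbose wrong answers described above.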
Google sets off an "American DeepSeek shock"; an investor breaks down the outlook for the compute race | Wall Street Watch
Yicai · 2025-12-04 10:09
On fears over Google's progress in artificial intelligence (AI), Nvidia, the AI boom's top "shovel seller," has recently shed more than $100 billion in market value.

This week, a new Morgan Stanley report forecast that output of Google's in-house AI chip, the TPU (tensor processing unit), will reach roughly 5 million units in 2027 and 7 million in 2028, sharply up from prior forecasts of 3 million and 3.2 million; that could bring Google about $13 billion in incremental revenue and a $0.40 lift in earnings per share (EPS).

Earlier, Google released its latest large language model, Gemini 3, trained entirely on Google's TPUs rather than the Nvidia GPUs that OpenAI uses; TPUs hold an edge in training cost and efficiency. Capital markets' excitement is plain to see: shares of Google parent Alphabet broke through $320, up nearly 70% year to date, with market value approaching $4 trillion and the price-to-earnings (PE) ratio doubling from 14x earlier in the year to nearly 28x.

Investors are calling this an "American DeepSeek shock." How will it shape the AI investment landscape going forward? Yicai interviewed Zheng Fang, founder and chief investment officer of Keywise Capital Management. In his view, Google is the company closest to AGI (artificial general intelligence). On the hardware side, the TPU, as special-purpose compute (an ASIC), has advantages in specific inference scenarios but cannot displace the GPU's position in general-purpose computing. ...
A bug has been found in DeepSeek-V3.2: wild token consumption and possibly wrong answers; researchers say GRPO's old problem remains unsolved
36Kr · 2025-12-04 02:21
Core Insights
- DeepSeek-V3.2 has gained significant attention but still exhibits bugs, particularly in token efficiency, which has been a longstanding issue [1][4].

Group 1: Performance Issues
- The Speciale version of DeepSeek-V3.2 consumes a higher number of tokens for complex tasks, requiring 77,000 tokens compared to Gemini's 20,000 tokens for the same problem [4].
- The model has a "length bias," under which longer incorrect answers are penalized less, leading to the generation of verbose but incorrect responses [8][11].

Group 2: Algorithmic Biases
- The GRPO algorithm has two hidden biases: length bias and difficulty bias. The length bias results in longer incorrect answers being favored, while the difficulty bias causes the model to focus excessively on overly simple or overly difficult questions, neglecting those of medium difficulty which are crucial for skill improvement [8][9].
- Zichen Liu, the core author of the research, noted that while the new advantage-value calculation has corrected the difficulty bias, the length bias remains unaddressed [10][11].

Group 3: Token Efficiency and Cost
- DeepSeek's official report acknowledges that token efficiency is still a challenge for V3.2, as the new models must generate longer trajectories to match the output quality of Gemini-3.0-Pro [14].
- Despite the high token consumption, DeepSeek-V3.2 is priced at only 1/24th of GPT-5, keeping costs relatively acceptable, as the arithmetic below illustrates [14].
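That cost claim is easy to sanity-check with the numbers reported across these articles (77k vs. 20k tokens per task, and a roughly 24x price gap). Treating the price ratio as uniform per token, and using a GPT-5-priced model with Gemini-like terseness as the comparator, are my simplifying assumptions:

```python
# Effective cost per task in relative price units. Token counts (77k vs 20k)
# and the ~24x price ratio come from the articles above; applying the ratio
# uniformly per token is a simplifying assumption.
deepseek_price = 1.0                 # relative per-token price
closed_model_price = 24.0            # ~24x DeepSeek, per the article

deepseek_tokens = 77_000             # tokens V3.2-Speciale needs per task
terse_model_tokens = 20_000          # tokens a terser model needs per task

cost_deepseek = deepseek_tokens * deepseek_price        #  77,000 units
cost_closed = terse_model_tokens * closed_model_price   # 480,000 units
print(f"{cost_closed / cost_deepseek:.1f}x")            # -> 6.2x
```

Even with a 3.85x token overhead, the ~24x price gap leaves roughly a 6x per-task cost advantage under these assumptions, which is why the article judges the consumption relatively acceptable.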