R2 Hasn't Arrived Yet, but DeepSeek's Secret Weapon Has Already Been "Spoiled"
Hu Xiu· 2025-07-31 07:58
Core Insights
- ACL, the top conference in natural language processing, awarded a best paper to joint work by DeepSeek and Peking University titled "Native Sparse Attention: Hardware-Aligned and Natively Trainable Sparse Attention" [4][3]
- The paper marks a significant advance in large language model efficiency, achieving up to 11 times faster inference while maintaining model performance [5][34]

Group 1: Technology and Innovation
- The paper moves sparse attention from theoretical reasoning to a complete training process, which is crucial for the future of large models [5][26]
- The Native Sparse Attention (NSA) method mimics human reading strategies by compressing long texts, selecting relevant details, and maintaining a sliding window of recent context (see the sketch after this summary) [26][30]
- NSA is natively trainable, allowing the model to learn an efficient attention distribution from the pre-training phase onward [32][51]

Group 2: Performance Metrics
- In benchmark tests, a 27B model using NSA outperformed traditional full-attention models on 7 of 9 metrics, excelling particularly in reasoning tasks [35][37]
- NSA achieved 100% information-retrieval accuracy in long-text comprehension tasks, demonstrating its effectiveness on extensive inputs [38][40]
- Training speed improved significantly: forward computation accelerated by 9 times, backward propagation by 6 times, and inference speed by 11.6 times [44][45]

Group 3: Market Implications
- The advances in NSA position DeepSeek as a potential leader in the AI application ecosystem, promising faster, more efficient, and more cost-effective solutions for users [55][58]
- The ability to process long documents and datasets without manual segmentation could change how users interact with AI, enhancing productivity and accessibility [54][59]
- The competitive edge provided by NSA is expected to solidify DeepSeek's market position, transforming it from a price-driven player into a technology innovator [58][60]
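To make the three-branch mechanism described above concrete, here is a minimal NumPy sketch of the idea for a single query attending over a long history. The block size, top-k count, window length, mean-pooling compression, and equal gate weights are illustrative assumptions; the actual NSA design uses learned compression modules, learned gates, and hardware-aligned GPU kernels.

```python
# Minimal sketch of the three-branch idea behind Native Sparse Attention (NSA):
# compression, block selection, and a sliding window, combined for one query.
import numpy as np

def softmax(x):
    x = x - x.max()
    e = np.exp(x)
    return e / e.sum()

def attend(q, K, V):
    """Standard scaled dot-product attention for a single query vector."""
    scores = K @ q / np.sqrt(q.shape[0])
    return softmax(scores) @ V

def nsa_attention(q, K, V, block=32, top_k=4, window=64):
    n, d = K.shape
    n_blocks = n // block

    # 1) Compression branch: pool each block of keys/values into one coarse token
    #    (the paper uses a small learned module; mean pooling stands in for it here).
    K_cmp = K[: n_blocks * block].reshape(n_blocks, block, d).mean(axis=1)
    V_cmp = V[: n_blocks * block].reshape(n_blocks, block, d).mean(axis=1)
    out_cmp = attend(q, K_cmp, V_cmp)

    # 2) Selection branch: rank blocks by the query's affinity to the pooled keys,
    #    then attend over the full tokens of only the top-k blocks.
    block_scores = K_cmp @ q
    chosen = np.argsort(block_scores)[-top_k:]
    idx = np.concatenate([np.arange(b * block, (b + 1) * block) for b in chosen])
    out_sel = attend(q, K[idx], V[idx])

    # 3) Sliding-window branch: always attend over the most recent tokens.
    out_win = attend(q, K[-window:], V[-window:])

    # Combine the branches; NSA learns per-query gates, equal weights are assumed here.
    return (out_cmp + out_sel + out_win) / 3.0

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    d, n = 64, 4096                      # head dimension, context length
    K, V = rng.standard_normal((n, d)), rng.standard_normal((n, d))
    q = rng.standard_normal(d)
    print(nsa_attention(q, K, V).shape)  # -> (64,)
```

The speedups reported in the paper come from attending to far fewer tokens per query (a handful of blocks plus a short window instead of the full context), while the compressed branch preserves a coarse global view.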
Just Now: DeepSeek Liang Wenfeng's NSA Paper and Peking University's Yang Yaodong Team Win ACL 2025 Best Paper Awards
36Ke· 2025-07-31 03:40
Core Insights
- The ACL conference, a leading event in computational linguistics and natural language processing (NLP), is taking place in Vienna, Austria, from July 27 to August 1, 2025, marking its 63rd edition [1]
- This year's conference saw a record number of submissions, exceeding 8,000 papers compared with 4,407 last year, with acceptance rates of 20.3% for main conference papers and 16.7% for Findings [3]
- Over half of the first authors of submitted papers are from China (51.3%), a significant increase from 30.6% last year, while the second-largest group comes from the United States (14.0%) [3]

Awards and Recognitions
- A total of 4 best papers, 2 best social impact papers, 3 best resource papers, 3 best thematic papers, 26 outstanding papers, 2 best TACL papers, 1 best demo paper, and 47 SAC highlights were awarded this year [5]
- The best paper awards went to the DeepSeek-Peking University team and to teams from other notable institutions, including the CISPA Helmholtz Center for Information Security, TCS Research, Microsoft, Stanford University, and Cornell Tech [8]

Notable Papers
- "A Theory of Response Sampling in LLMs" explores the heuristic methods guiding sampling in large language models (LLMs) and highlights ethical concerns regarding decision-making biases [11]
- "Fairness through Difference Awareness" introduces a framework for measuring group discrimination in LLMs, emphasizing the importance of group difference awareness in various contexts [13]
- "Language Models Resist Alignment" reveals that large models possess an inherent elasticity mechanism that makes them resistant to alignment efforts, posing challenges for AI safety and alignment [16][17]
- "Native Sparse Attention" presents a new attention mechanism designed for efficient long-context modeling, demonstrating superior performance compared with existing sparse attention methods [24][28]

Awards for Specific Papers
- The best demo paper award went to "OLMoTrace," which can trace language model outputs back to trillions of training tokens, a significant advance in understanding model behavior [32]
- The best thematic paper award went to "MaCP: Minimal yet Mighty Adaptation via Hierarchical Cosine Projection," which proposes a new adaptive method for fine-tuning large models with minimal parameters [34]

Lifetime Achievement and Service Awards
- The ACL Lifetime Achievement Award was presented to Professor Kathy McKeown for her extensive contributions to NLP over 43 years [57][60]
- The Distinguished Service Award went to Professor Julia B. Hirschberg for her long-standing service to ACL and her contributions to NLP and speech processing [62]
Big Tech's AI Agents Await Their "DeepSeek Moment"
36Ke· 2025-07-30 23:56
Core Insights
- The AI industry remains dominated by major internet companies, with TikTok, Tencent, Alibaba, and Baidu leading the market and collectively holding a user base of over 4.6 billion [2][5][21]
- The AI application market is primarily driven by internet enterprises: 80% of the top 30 applications come from these companies, and the four major groups account for 66.7% of market share [2][4]
- The focus of major companies this year is on accelerating the deployment of B-end (enterprise-facing) AI agents in specific scenarios, emphasizing the need for both general capabilities and scenario-specific applications [5][21]

Company Strategies
- Tencent showcased a comprehensive strategy at WAIC, presenting over 10 AI agents across verticals including health management and marketing, indicating a broad approach to AI applications [6][21]
- Alibaba's cloud platform, with over 200,000 customers and 700,000 agent applications, has emerged as a leader in the practical deployment of AI agents, demonstrating significant market penetration [8][21]
- ByteDance has opted for an open-source approach with its Coze Studio and Coze Loop platforms, allowing developers to build and iterate on AI agents, which has drawn significant attention in the developer community [12][13]

Market Trends
- AI plugins are growing faster than native apps, as traditional apps increasingly integrate AI capabilities, indicating a shift in how AI is used across platforms [4][21]
- Competition among major internet companies for AI-agent commercialization is intensifying, with significant contracts awarded to various players [16][21]
- The emergence of AI agents as personal intelligent partners rather than mere tools signals a shift in market perception, with both B-end and C-end applications being explored [21]
DeepSeek Sprints Toward a Beijing Stock Exchange Listing, Planning Five Years of Strategic Investment in Computing Power Leasing to Build an AI Infrastructure Ecosystem
Sou Hu Cai Jing· 2025-07-30 07:50
Group 1
- DeepSeek, an AI unicorn, is set to initiate its IPO process on the Beijing Stock Exchange in November 2025, with computing power leasing as its core strategy for the next five years [1]
- The company plans to invest 3 billion yuan in building a self-controlled high-performance computing (HPC) center and to collaborate with domestic chip manufacturers on customized AI computing power solutions [1]
- DeepSeek has established strategic partnerships with domestic chip companies such as Huawei Ascend and Cambricon, aiming to support high-compute scenarios such as large-model training and autonomous-driving simulation [3]

Group 2
- Industry experts indicate that DeepSeek's listing would accelerate the localization of AI computing infrastructure in China, and the company is expected to capture over 35% of the domestic market share within the next 3-5 years [3]
Is DeepSeek Really Finished?
36Ke· 2025-07-30 03:32
Core Viewpoint
- The decline in DeepSeek's user data has cast a shadow over the prospects of domestic AI, but there is no need for excessive pessimism about its future [1][2]

Group 1: DeepSeek's Performance
- DeepSeek's average monthly downloads dropped from 81.13 million in Q1 to 22.59 million, a decrease of 72.2% [1]
- DeepSeek's usage rate fell from a high of 7.5% at the beginning of the year to 3% [1]
- The decline is attributed to the delayed release of the updated R2 version and to DeepSeek's high hallucination rate, which has deterred many users [1]

Group 2: Broader Industry Context
- Despite DeepSeek's challenges, other domestic internet giants and unicorns are actively investing in AI research and development, with models such as Qwen, Wenxin, Quark, and Kimi maintaining strong global rankings [2]
- China's advantages in the AI race include a large-scale market and diverse application scenarios, providing ample user-behavior data and market demand [3]

Group 3: Industry Challenges and Future Directions
- The decline in DeepSeek's traffic raises industry-wide questions about how to maintain technological leadership and achieve sustainable development through viable business models [3]
- The future of the AI industry will depend on building an open, collaborative, and sustainable ecosystem rather than merely competing on model parameters [3][4]
- Policymakers should allow multiple technological routes to develop in parallel and recognize the value of real-world data generated across sectors [4]

Group 4: Importance of Innovation and Value Creation
- The key to success lies in transforming technology into scenario value, commercial value, and social value, which may signal the beginning of a "second growth curve" for China's AI [5]
DeepSeek's Traffic Has Plunged: Is It Fading? Are Its Hallucinations Too Severe, or Is It Quietly Making a Fortune?
36Ke· 2025-07-28 23:45
Core Insights
- DeepSeek, once hailed as a "national-level" product, has seen a significant decline in monthly downloads, dropping from 81.13 million in Q1 to 22.59 million, a decrease of 72.2% [1]
- Users are increasingly frustrated with DeepSeek's tendency to generate "hallucinated" content, prompting discussions on social media about how to eliminate the "AI flavor" from its outputs [1][2]
- This "AI flavor" is characterized by overly mechanical and formulaic responses, which users have begun to recognize and criticize [15]

User Experiences
- Users have reported instances where DeepSeek provided nonsensical or fabricated advice, such as suggesting irrelevant actions for personal issues or generating non-existent references [2][8][9]
- The model's responses often include fabricated data and sources, leading to a lack of trust in its outputs [9][12]

Underlying Issues
- The decline in DeepSeek's performance is attributed to its reliance on rigid logical structures and formulaic language, which detracts from the quality of its responses [16]
- The model's training data is heavily skewed toward English, with less than 5% of its corpus being high-quality Chinese content, limiting its ability to generate diverse and nuanced outputs [22]
- Content moderation and the expansion of sensitive-word lists have further constrained the model's ability to produce creative and varied language [22]

Recommendations for Improvement
- Users are encouraged to develop skills for critically assessing AI-generated content, including cross-referencing data and testing the model's logic [23]
- The industry should emphasize human oversight and treat AI as a tool for enhancing human creativity rather than as a replacement for it [24][25]
Enflame Technology Releases DeepSeek Integrated Machine
news flash· 2025-07-28 03:21
Core Viewpoint
- The company showcased its latest product, the DeepSeek integrated machine, at the WAIC conference, highlighting its low entry barrier and high efficiency, which attracted significant attention from attendees [1]

Company Summary
- The DeepSeek integrated machine aims to lower the application threshold of artificial intelligence technology while enhancing research and development efficiency for enterprises [1]

Industry Summary
- The introduction of the machine is expected to bring new solutions to the industry, potentially transforming how companies adopt and implement AI technologies [1]
Has DeepSeek Fallen Off Its Pedestal in Less Than Half a Year? A Look at the Multiple Truths Behind DeepSeek's Plunge
Sou Hu Cai Jing· 2025-07-26 01:37
Core Insights
- DeepSeek has experienced a dramatic decline in monthly downloads, dropping from over 80 million to around 20 million, a decrease of 72.2% [1][6]
- The decline in downloads does not necessarily equate to a loss of user base, as many users now access DeepSeek's models through third-party platforms rather than the official app [2][6]
- DeepSeek has shifted its market strategy from broad consumer outreach to enterprise services and specialized sectors, reducing its consumer-market visibility but improving its reputation in professional circles [3][6]

Group 1: Download Trends and User Behavior
- The sharp drop in official app downloads suggests a potential user exodus, but many users are calling DeepSeek's API through integrated third-party applications (see the sketch after this summary) [2][6]
- This indicates that DeepSeek's presence may not have diminished but rather transformed into a more subtle integration within various productivity tools [2][6]

Group 2: Strategic Adjustments
- DeepSeek has strategically reduced its promotional budget for consumer markets, concentrating instead on B2B segments such as enterprise services and educational institutions [3][6]
- This shift reflects a clearer understanding of its business model and a focus on high-value applications such as legal document automation and financial modeling [3][6]

Group 3: User Experience and Market Expectations
- Some users have reported that their experience with DeepSeek has not met their high expectations, leading to negative feedback and a decline in new user downloads [5][6]
- The competitive landscape, with rapid advances from rivals such as GPT-4 and Claude, has put pressure on DeepSeek to improve its model performance and ecosystem [4][5]

Group 4: Future Outlook
- DeepSeek is at a critical juncture, transitioning from a "viral" product to a "value-driven" platform, where the focus will shift to product quality and service delivery [5][6]
- The company must stabilize its core technology and improve user experience to regain market favor and achieve commercial success [6][8]
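For context on how such third-party integrations typically work, below is a minimal sketch of calling DeepSeek's hosted model through its OpenAI-compatible API, which is how many productivity tools embed the model without routing users through the official app. The base URL and model name follow DeepSeek's published API conventions at the time of writing; the environment variable name and the prompt are illustrative assumptions.

```python
# Minimal sketch: a third-party tool calling DeepSeek via the OpenAI-compatible API.
import os
from openai import OpenAI

client = OpenAI(
    api_key=os.environ["DEEPSEEK_API_KEY"],  # assumed to hold a DeepSeek API key
    base_url="https://api.deepseek.com",     # DeepSeek's OpenAI-compatible endpoint
)

response = client.chat.completions.create(
    model="deepseek-chat",  # general chat model; the reasoning model is "deepseek-reasoner"
    messages=[
        {"role": "system", "content": "You are a contract-review assistant."},
        {"role": "user", "content": "Summarize the termination clauses in this agreement."},
    ],
)
print(response.choices[0].message.content)
```

Usage like this shows up in app-store figures only for the embedding application, not for DeepSeek itself, which is why download counts alone can understate actual model usage.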
Hang Seng Indexes Company: Hang Seng Tech Index Up Over 53% in the Past Year, with DeepSeek Models a Key Catalyst
Zhi Tong Cai Jing Wang· 2025-07-25 03:46
Group 1: Market Performance
- The Hang Seng Tech Index rose over 53% in the past year, outperforming the broader Hang Seng Composite Index by 12 percentage points [1]
- As of July 18, 2025, the Tech Index had risen 24% year-to-date, continuing its upward trend after a nearly 19% increase in 2024 [1]
- The Tech Index's one-year annualized volatility is approximately 41%, compared with 28% for the Hang Seng Composite Index (a sketch of how annualized volatility is computed follows this summary) [1]

Group 2: Exchange-Traded Products (ETPs)
- Total assets under management (AUM) for ETPs tracking the Tech Index reached $26.3 billion as of June 30, 2025, up nearly 35% from the end of 2024 [2]
- Cumulative AUM growth for ETPs tracking the Tech Index from 2021 to 2025 is 362% [2]
- There are currently 29 ETPs based on the Tech Index listed across 13 exchanges in the US, Europe, and Asia [2]

Group 3: Futures Market
- The average daily trading volume (ADV) of Tech Index futures reached approximately 172,000 contracts in the first half of 2025, a 44% increase over 2024 [3]
- Since the launch of Tech Index futures in November 2020, ADV has increased 97-fold, lifting their share of Hong Kong's overall futures market to 22% [3]
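As a brief illustration of the volatility figures cited above, annualized volatility is conventionally computed as the standard deviation of daily log returns scaled by the square root of roughly 252 trading days. The synthetic price series below is an assumption for illustration only, not Hang Seng data.

```python
# Sketch: annualizing the volatility of a daily price series.
import numpy as np

rng = np.random.default_rng(7)
prices = 100 * np.exp(np.cumsum(rng.normal(0, 0.026, 252)))  # synthetic daily closes

daily_returns = np.diff(np.log(prices))                       # daily log returns
annualized_vol = daily_returns.std(ddof=1) * np.sqrt(252)     # scale by trading days
print(f"annualized volatility: {annualized_vol:.1%}")         # roughly 40% for this draw
```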
DeepSeek's Average Monthly Downloads Plunge 72.2%! Zhou Hongyi: Liang Wenfeng Disdains Making an App and Has Open-Sourced All the Technology for Free [With Large Model Industry Market Analysis]
Qian Zhan Wang· 2025-07-25 01:34
Core Insights
- DeepSeek's average monthly downloads dropped sharply from 81.13 million in Q1 2025 to 22.59 million in Q2 2025, a decline of 72.2% [2]
- The decline is attributed to user diversion to other applications that have integrated DeepSeek's open-source model, with 59.2% of departing users switching to the Baidu App and 38.6% to the Doubao App [2]
- Major companies such as Alibaba, ByteDance, and Baidu have launched cheaper competing APIs, further squeezing DeepSeek's market space [2]

Company Overview
- DeepSeek, developed by DeepSeek (Hangzhou) Technology Co., is an open-source AI product known for its low cost and high performance, with a reported training cost of only $6 million using 2048 NVIDIA H800 GPUs (a rough cost breakdown is sketched below) [3]
- Despite the drop in downloads, DeepSeek's open-source strategy has contributed significantly to the industry's development [3]

Industry Context
- AI model costs in China are significantly lower than those of international giants, with DeepSeek-R1's inference cost roughly one-thirtieth of OpenAI's operating cost [5]
- As of April 2024, approximately 305 large models had been launched in China, 254 of them with over 1 billion parameters [4]

Competitive Landscape
- Baidu's Wenxin 4.5 and X1 models have been released, with the former outperforming GPT-4.5 in several tests and offering an API call price only 1% of GPT-4.5's [5]
- The competitive landscape also includes models such as Alibaba's Tongyi Qianwen, ByteDance's Doubao model, and others, each with distinct features and pricing strategies [6]

Technological Impact
- AI technologies represented by DeepSeek are becoming core drivers of industry innovation, enhancing data integration, multi-modal analysis, and complex scenario simulation [7]
- The lightweight designs, performance improvements, and rapid cost reductions of large models are accelerating their development and application in new industrialization [9]
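As a rough plausibility check on the cited ~$6 million training cost, the sketch below multiplies the GPU count by an assumed training duration and an assumed per-GPU-hour rental rate. Only the 2048-GPU figure and the headline cost come from the article; the duration and rate are assumptions chosen for illustration.

```python
# Back-of-envelope training-cost arithmetic consistent with the ~$6M / 2048 H800 claim.
num_gpus = 2048
training_days = 57        # assumed duration (not stated in the article)
rate_per_gpu_hour = 2.0   # assumed H800 rental rate in USD (not stated in the article)

gpu_hours = num_gpus * training_days * 24
cost = gpu_hours * rate_per_gpu_hour
print(f"{gpu_hours:,} GPU-hours -> ${cost / 1e6:.1f}M")  # 2,801,664 GPU-hours -> $5.6M
```

Under these assumptions the total lands just under $6 million, which is the sense in which the headline figure is usually interpreted: rented-compute cost of the final training run, excluding research, staff, and infrastructure.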