DeepSeek LLM
Liang Wenfeng Represents DeepSeek; He Represents Liang Wenfeng
量子位· 2025-11-15 02:08
Core Viewpoint
- The article discusses the emergence of the "Hangzhou Six Little Dragons" at the World Internet Conference in Wuzhen, highlighting the presence of key figures in AI and technology, and focusing in particular on DeepSeek and its representative, Chen Deli, who expressed both optimism and concern about AI's future impact on society [1][3][41]

Group 1: DeepSeek and Its Representation
- DeepSeek's founder Liang Wenfeng did not attend the conference; instead, researcher Chen Deli represented the company, marking a significant public appearance for DeepSeek [3][6][41]
- Chen Deli, who joined DeepSeek in 2023, has worked on critical research areas such as language models and alignment mechanisms, contributing to several important publications [18][22][20]
- His presence at the conference makes him the second public representative of DeepSeek after Liang Wenfeng, underscoring his role as a spokesperson for the company's views on AI [41][42]

Group 2: AI Perspectives
- Chen Deli expressed a mixed outlook on AI: while humans and AI are in a "honeymoon period" over the next three to five years, there are significant long-term concerns about AI replacing most jobs in society [8][9]
- He highlighted that the current AI revolution differs fundamentally from previous industrial revolutions, as AI is beginning to possess its own "intelligence," which could surpass human capabilities in certain areas [10][11]
- AI's potential to disrupt the existing social order and economic structure is a major concern, and Chen suggested that technology companies may need to act as "guardians" to mitigate negative impacts [12][13]

Group 3: Value Alignment in AI
- During his presentation, Chen Deli introduced the concept of "value alignment decoupling": core values should be unified, while users are allowed to customize diverse values, ensuring safety and adaptability to societal diversity [25][24]
- This approach aims to address the rigidity of traditional large models, which often embed fixed values that do not reflect the complexity of human society [24][25]
- The idea of "harmony in diversity" encapsulates this new perspective on AI value alignment, suggesting a more flexible and user-centric approach to AI development [26][25]
DeepSeek's Open-Source Strategy Leads the Wave of AI Democratization
Wind万得· 2025-03-02 22:40
Core Insights
- The article discusses rapid advances in AI, focusing on DeepSeek's open-source strategy and the release of OpenAI's GPT-4.5 model, and highlighting the competitive landscape in the AI large-model sector [1][9]

Group 1: DeepSeek's Open-Source Strategy
- DeepSeek, established in 2023, has released several products, including DeepSeek R1, which offers performance comparable to leading closed-source models at a significantly lower training cost of approximately $5.576 million [2][5]
- DeepSeek's open-source initiative, including the release of code libraries such as FlashMLA and DeepEP, aims to lower the development barrier for AI models and enhance computational efficiency [5][6]
- DeepSeek R1's performance drove growth of 100 million users within just seven days of launch, making it the fastest-growing AI application globally [7]

Group 2: Global AI Large-Model Progress
- The AI large-model sector is experiencing significant growth, with DeepSeek's low-cost models challenging existing players such as Kimi, whose active users grew only 28% against DeepSeek's 750% [7]
- OpenAI's GPT-4.5, released on February 28, 2025, is billed as its largest and most knowledgeable chat model to date, but its high cost structure raises questions about performance relative to price [9][10]
- The competitive landscape is shifting, with DeepSeek's open-source approach prompting other companies, including OpenAI, to consider similar strategies to maintain market relevance [13]

Group 3: AI Large-Model Investment Dynamics
- The emergence of low-cost, high-performance models like DeepSeek's is reshaping investment dynamics, allowing smaller firms to enter the market and focus on innovation rather than heavy capital investment [14][15]
- Investment focus is shifting from infrastructure to application scenarios, with significant funding opportunities in vertical applications such as finance and healthcare [15]
- Recent funding events in the AI large-model sector indicate growing interest, with several companies securing substantial investments, reflecting the market's evolving landscape [16][17]
DeepSeek Background Overview and a Preliminary Exploration of Application Scenarios in Finance
China Post Securities· 2025-02-26 11:07
Quantitative Models and Construction Methods

Model Name: DeepSeek-R1
- **Model Construction Idea**: The DeepSeek-R1 model leverages a mixture-of-experts (MoE) architecture and dynamic routing technology to reduce inference costs while maintaining high performance [16]
- **Model Construction Process**:
  - **Mixture of Experts (MoE)**: Integrates multiple "expert" models to enhance overall model performance; a gating network determines which expert(s) should handle specific inputs (a minimal sketch appears after this summary) [27]
  - **Group Relative Policy Optimization (GRPO)**: Eliminates the need for a separate critic model in reinforcement learning, reducing training costs by using group scores to estimate the baseline (sketched below) [31]
  - **Self-Evolution Process**: The model improves its reasoning capabilities through reinforcement learning, exhibiting complex behaviors such as reflection and exploration of alternative approaches [39][41]
  - **Cold Start**: Introduces high-quality long chain-of-thought (CoT) data to stabilize the model during the initial training phase [42]
- **Model Evaluation**: The model demonstrates significant cost efficiency and high performance, making it a groundbreaking development in AI applications [16][43]

Model Name: DeepSeek-V2
- **Model Construction Idea**: DeepSeek-V2 is a powerful MoE language model designed with innovative architectures such as Multi-head Latent Attention (MLA) [23]
- **Model Construction Process**:
  - **Multi-head Latent Attention (MLA)**: Improves on traditional multi-head attention (MHA) by reducing the KV cache, enhancing inference efficiency (sketched below) [25]
  - **Mixture of Experts (MoE)**: As in DeepSeek-R1, a gating network activates specific experts based on the input, optimizing resource usage and performance [27]
- **Model Evaluation**: The model shows advantages in performance, training cost, and inference efficiency, making it a strong, economical, and efficient language model [23][27]

Model Name: DeepSeek-V3
- **Model Construction Idea**: The DeepSeek-V3 model aims to enhance open-source model performance and push toward artificial general intelligence [33]
- **Model Construction Process**:
  - **Multi-Token Prediction (MTP)**: Enhances model performance by predicting multiple future tokens at each position, increasing training-signal density (sketched below) [34]
  - **FP8 Mixed-Precision Training**: Improves computational efficiency and reduces memory usage while maintaining model accuracy by using lower-precision data types (sketched below) [36]
- **Model Evaluation**: The model effectively balances computational efficiency and performance, making it suitable for large-scale model training [33][36]

Model Backtesting Results
- **DeepSeek-R1**: Demonstrates significant cost efficiency, achieving performance comparable to OpenAI o1 with much lower training costs [43]
- **DeepSeek-V2**: Shows superior performance and efficiency in training and inference compared with traditional models [23][27]
- **DeepSeek-V3**: Achieves high computational efficiency while maintaining model accuracy, making it effective for large-scale training [33][36]

Quantitative Factors and Construction Methods

Factor Name: Scaling Laws
- **Factor Construction Idea**: Describes the predictable relationship between model performance and the scale of model parameters, training data, and computational resources (a worked functional form appears below) [21]
- **Factor Construction Process**:
  - **Scaling Laws**: As model parameters, training data, and computational resources increase, model performance improves in a predictable manner [21]
  - **Data Quality**: High-quality data shifts the optimal allocation strategy toward model expansion [22]
- **Factor Evaluation**: Provides a strong guideline for resource planning and model-performance optimization [21][22]

Factor Backtesting Results
- **Scaling Laws**: Demonstrates a predictable improvement in model performance with increased resources, validating the factor's effectiveness in guiding model development [21][22]
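The gating-network routing described for the MoE layers can be made concrete with a small sketch. This is a generic top-k MoE layer in PyTorch, not DeepSeek's production architecture; the class name `TopKMoE` and all dimensions are illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TopKMoE(nn.Module):
    """Minimal mixture-of-experts layer: a gating network scores all experts
    per token, and each token is routed to only its top-k experts."""

    def __init__(self, d_model: int, n_experts: int = 8, k: int = 2):
        super().__init__()
        self.k = k
        self.gate = nn.Linear(d_model, n_experts)  # the gating network
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, 4 * d_model), nn.GELU(),
                          nn.Linear(4 * d_model, d_model))
            for _ in range(n_experts)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:  # x: (tokens, d_model)
        scores = self.gate(x)                            # (tokens, n_experts)
        topk_val, topk_idx = scores.topk(self.k, dim=-1)
        weights = F.softmax(topk_val, dim=-1)            # renormalize over chosen experts
        out = torch.zeros_like(x)
        for slot in range(self.k):
            w = weights[:, slot:slot + 1]                # (tokens, 1)
            idx = topk_idx[:, slot]                      # (tokens,)
            for e, expert in enumerate(self.experts):
                mask = idx == e
                if mask.any():                           # only selected experts run
                    out[mask] += w[mask] * expert(x[mask])
        return out

moe = TopKMoE(d_model=64)
y = moe(torch.randn(10, 64))  # 10 tokens, each routed through 2 of 8 experts
```

Because each token activates only k of the n experts, total parameters can grow with n while per-token compute stays roughly constant, which is the cost advantage the report attributes to the MoE design.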
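The "group scores as baseline" idea behind GRPO is simple enough to show directly. A minimal sketch, assuming scalar rewards per sampled response; `grpo_advantages` is a hypothetical helper name, and the full GRPO objective (clipped policy ratios, KL penalty) is omitted.

```python
import torch

def grpo_advantages(group_rewards: torch.Tensor) -> torch.Tensor:
    """Group Relative Policy Optimization baseline: for each prompt, a group
    of responses is sampled and scored, and each response's advantage is its
    reward standardized against the group's own mean and std. No separate
    learned critic/value model is needed to estimate the baseline.

    group_rewards: (n_prompts, group_size) scalar rewards per sampled response.
    """
    mean = group_rewards.mean(dim=-1, keepdim=True)
    std = group_rewards.std(dim=-1, keepdim=True)
    return (group_rewards - mean) / (std + 1e-8)

rewards = torch.tensor([[0.0, 1.0, 1.0, 0.0],    # prompt 1: 4 sampled answers
                        [0.2, 0.9, 0.4, 0.5]])   # prompt 2
adv = grpo_advantages(rewards)  # positive => better than the group average
```

Dropping the critic is where the training-cost saving comes from: the baseline is recomputed from each sampled group instead of being predicted by a second model of similar size to the policy.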
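One way to picture MLA's KV-cache reduction: cache a low-dimensional latent per token and re-expand it into keys and values at attention time. This is a conceptual sketch with made-up dimensions, not the actual MLA module, which also handles rotary embeddings and per-head details this omits.

```python
import torch
import torch.nn as nn

class LatentKVCache(nn.Module):
    """Sketch of the idea behind Multi-head Latent Attention (MLA): rather
    than caching full per-head keys/values, cache one small shared latent
    vector per token and re-expand it to K/V on the fly, shrinking the
    KV cache that dominates long-context inference memory."""

    def __init__(self, d_model: int = 512, d_latent: int = 64,
                 n_heads: int = 8, d_head: int = 64):
        super().__init__()
        self.down = nn.Linear(d_model, d_latent)           # compress token -> latent
        self.up_k = nn.Linear(d_latent, n_heads * d_head)  # latent -> keys
        self.up_v = nn.Linear(d_latent, n_heads * d_head)  # latent -> values

    def forward(self, x: torch.Tensor):
        latent = self.down(x)   # (seq, d_latent): this is all that gets cached
        k = self.up_k(latent)   # re-expanded per attention call
        v = self.up_v(latent)
        return latent, k, v

m = LatentKVCache()
latent, k, v = m(torch.randn(16, 512))
print(latent.shape, k.shape)  # cache 64 floats/token instead of 2 * 8 * 64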
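A simplified illustration of the multi-token training signal: extra output heads predict tokens several steps ahead, and their losses are averaged. DeepSeek-V3's actual MTP uses sequential transformer modules rather than the independent linear heads assumed here; the sketch only shows how the loss gets denser.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def multi_token_prediction_loss(hidden, targets, heads, depth=2):
    """Sketch of Multi-Token Prediction (MTP): at each position, head d
    predicts the token d steps ahead (t+1, t+2, ...), so every position
    contributes `depth` training signals instead of one.

    hidden:  (seq, d_model) final hidden states
    targets: (seq,) token ids
    heads:   list of `depth` nn.Linear(d_model, vocab) output heads
    """
    loss = 0.0
    for d in range(1, depth + 1):            # head d predicts token t+d
        logits = heads[d - 1](hidden[:-d])   # positions that have a t+d target
        loss = loss + F.cross_entropy(logits, targets[d:])
    return loss / depth

d_model, vocab, seq = 32, 100, 16
heads = [nn.Linear(d_model, vocab) for _ in range(2)]
loss = multi_token_prediction_loss(torch.randn(seq, d_model),
                                   torch.randint(0, vocab, (seq,)), heads)
```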
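To see the trade-off FP8 makes, a round-trip quantization sketch helps, assuming PyTorch 2.1 or later, which exposes `torch.float8_e4m3fn`. Real FP8 mixed-precision training keeps master weights and sensitive operations in higher precision and applies fine-grained scaling; this only shows storage cost versus rounding error.

```python
import torch

def fp8_quant_dequant(t: torch.Tensor) -> torch.Tensor:
    """Illustrative only: round-trip a tensor through FP8 (e4m3) to show the
    precision/memory trade-off behind FP8 mixed-precision training. A
    per-tensor scale maps values into the representable range before the
    1-byte cast, then undoes it after."""
    scale = t.abs().max().clamp(min=1e-12) / 448.0  # 448 = max normal of e4m3
    q = (t / scale).to(torch.float8_e4m3fn)         # 1 byte/element storage
    return q.to(torch.float32) * scale

x = torch.randn(4, 4)
x8 = fp8_quant_dequant(x)
print((x - x8).abs().max())  # small rounding error, 4x less storage than fp32
```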
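The "predictable relationship" in the scaling-laws factor can be written down explicitly. A Chinchilla-style functional form is sketched below; the constants are illustrative placeholders, not fitted values from this or any DeepSeek report.

```python
def scaling_loss(N: float, D: float,
                 E: float = 1.7, A: float = 400.0, B: float = 410.0,
                 alpha: float = 0.34, beta: float = 0.28) -> float:
    """Chinchilla-style scaling law L(N, D) = E + A/N^alpha + B/D^beta:
    loss falls predictably as parameter count N and training tokens D grow,
    which is what makes the relationship usable for resource planning.
    All constants here are illustrative placeholders."""
    return E + A / N**alpha + B / D**beta

# Doubling either axis yields a predictable loss drop:
for N, D in [(1e9, 2e10), (2e9, 2e10), (2e9, 4e10)]:
    print(f"N={N:.0e}, D={D:.0e} -> L={scaling_loss(N, D):.3f}")
```

Under a fixed compute budget, such a fitted surface tells you whether the next unit of compute is better spent on a bigger model or more data; the report's "data quality" point says higher-quality data tilts that optimum toward model expansion.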
Quick Look: This Is the Company Behind DeepSeek
梧桐树下V· 2025-01-29 03:16
Source: 企查查 (Qichacha) company page

| Field | Value |
| --- | --- |
| Company name | 杭州深度求索人工智能基础技术研究有限公司 (status: active / 存续) |
| Unified social credit code | 91330105MACPN4X08Y |
| Profile | DeepSeek, founded in 2023, is an artificial general intelligence mo... |
| Legal representative | 裴湉 |
| Registered capital | RMB 10 million |
| Date of establishment | 2023-07-17 |
| Industry (per Qichacha) | Information system integration services |
| Scale / headcount | Micro (XS); 4 employees (2023) |
| Phone | 0571-85377238 |
| Address | 浙江省杭州市拱墅区环城北路169号汇金国际大厦西1幢1201室 |
| Major shareholder | 宁波程...企业管理咨询合伙企业 (a Ningbo enterprise-management consulting partnership), 99.00% |
| Other shareholder | 梁文锋, 1.00% |
| Investments / affiliates | 2 invested companies; 15 affiliated companies |
| Executive director | 裴湉 (3 affiliated companies) |
| Supervisor | 王南军 (2 affiliated companies) |

By 梧桐晓驴. With DeepSeek blowing up, the author got curious and looked up the company that develops and operates DeepSeek. Qichacha shows: 杭州深度求索人工智能基础技术研究有限公司, English name Hangz ...