DeepSeek LLM

DeepSeek's Open-Source Strategy Leads the Wave of AI Democratization
Wind万得· 2025-03-02 22:40
Core Insights
- The article discusses rapid advances in AI, focusing on DeepSeek's open-source strategy and the release of OpenAI's GPT-4.5, and highlights the competitive landscape of the AI large-model sector [1][9]

Group 1: DeepSeek's Open-Source Strategy
- DeepSeek, established in 2023, has released several products, including DeepSeek R1, which delivers performance comparable to leading closed-source models at a significantly lower training cost of approximately $5.576 million [2][5]
- DeepSeek's open-source initiative, including the release of code libraries such as FlashMLA and DeepEP, aims to lower the development barrier for AI models and improve computational efficiency [5][6]
- On the strength of R1's performance, DeepSeek reached 100 million users within just seven days of launch, making it the fastest-growing AI application globally [7]

Group 2: Global AI Large-Model Progress
- The AI large-model sector is growing rapidly, with DeepSeek's low-cost models challenging incumbents such as Kimi, whose active users grew only 28% against DeepSeek's 750% [7]
- OpenAI's GPT-4.5, released on February 28, 2025, is touted as its largest and most knowledgeable chat model to date, though its high cost structure raises questions about performance relative to price [9][10]
- The competitive landscape is shifting: DeepSeek's open-source approach is prompting other companies, including OpenAI, to consider similar strategies to stay relevant [13]

Group 3: AI Large-Model Investment Dynamics
- The emergence of low-cost, high-performance models like DeepSeek's is reshaping investment dynamics, allowing smaller firms to enter the market and focus on innovation rather than heavy capital outlays [14][15]
- Investment focus is shifting from infrastructure to application scenarios, with significant funding opportunities in vertical applications such as finance and healthcare [15]
- Recent funding events in the AI large-model sector point to growing interest, with several companies securing substantial investments, reflecting the market's evolving landscape [16][17]
DeepSeek Background Overview and a Preliminary Look at Application Scenarios in Finance
China Post Securities· 2025-02-26 11:07
Quantitative Models and Construction Methods

Model Name: DeepSeek-R1
- **Model Construction Idea**: The DeepSeek-R1 model leverages a mixture-of-experts (MoE) architecture and dynamic routing to reduce inference costs while maintaining high performance[16]
- **Model Construction Process**:
  - **Mixture of Experts (MoE)**: Integrates multiple "expert" sub-networks to enhance overall model capacity; a gating network determines which expert(s) handle each input (a minimal sketch follows this section)[27]
  - **Group Relative Policy Optimization (GRPO)**: Eliminates the separate critic model in reinforcement learning, reducing training cost by using group scores to estimate the baseline (sketched after this section)[31]
  - **Self-Evolution Process**: The model improves its reasoning capabilities through reinforcement learning, exhibiting emergent behaviors such as reflection and exploration of alternative solution paths[39][41]
  - **Cold Start**: Introduces high-quality long chain-of-thought (CoT) data to stabilize the model during the initial training phase[42]
- **Model Evaluation**: The model demonstrates significant cost efficiency alongside high performance, making it a groundbreaking development in AI applications[16][43]

Model Name: DeepSeek-V2
- **Model Construction Idea**: DeepSeek-V2 is a powerful MoE language model built on innovative architectures such as Multi-head Latent Attention (MLA)[23]
- **Model Construction Process**:
  - **Multi-head Latent Attention (MLA)**: Improves on traditional multi-head attention (MHA) by shrinking the KV cache, enhancing inference efficiency (sketched after this section)[25]
  - **Mixture of Experts (MoE)**: As in DeepSeek-R1, a gating network activates specific experts per input, optimizing resource usage and performance[27]
- **Model Evaluation**: The model shows advantages in performance, training cost, and inference efficiency, making it a strong, economical, and efficient language model[23][27]

Model Name: DeepSeek-V3
- **Model Construction Idea**: DeepSeek-V3 aims to raise open-source model performance and advance toward artificial general intelligence[33]
- **Model Construction Process**:
  - **Multi-Token Prediction (MTP)**: Enhances model performance by predicting multiple future tokens at each position, increasing training-signal density (sketched after this section)[34]
  - **FP8 Mixed Precision Training**: Uses lower-precision data types to improve computational efficiency and reduce memory usage while maintaining model accuracy (sketched after this section)[36]
- **Model Evaluation**: The model effectively balances computational efficiency and performance, making it well suited to large-scale model training[33][36]

Model Backtesting Results
- **DeepSeek-R1**: Demonstrates significant cost efficiency, achieving performance comparable to OpenAI's o1 at much lower training cost[43]
- **DeepSeek-V2**: Shows superior performance and efficiency in training and inference compared to traditional models[23][27]
- **DeepSeek-V3**: Achieves high computational efficiency while maintaining model accuracy, making it effective for large-scale training[33][36]
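To make the routing idea concrete, here is a minimal sketch of a top-k gated mixture-of-experts layer in PyTorch. The layer sizes, expert count, and top-k value are illustrative assumptions, not DeepSeek's actual configuration (which adds shared experts, load balancing, and other refinements).

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TopKMoE(nn.Module):
    def __init__(self, d_model: int, n_experts: int, k: int = 2):
        super().__init__()
        self.k = k
        # Each "expert" is a small feed-forward network.
        self.experts = nn.ModuleList(
            nn.Sequential(
                nn.Linear(d_model, 4 * d_model),
                nn.GELU(),
                nn.Linear(4 * d_model, d_model),
            )
            for _ in range(n_experts)
        )
        # The gating network scores every expert for each token.
        self.gate = nn.Linear(d_model, n_experts)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        scores = self.gate(x)                       # (batch, seq, n_experts)
        topk_scores, topk_idx = scores.topk(self.k, dim=-1)
        weights = F.softmax(topk_scores, dim=-1)    # renormalize over chosen experts
        out = torch.zeros_like(x)
        # Only the k selected experts run for each token (sparse activation).
        for slot in range(self.k):
            for e, expert in enumerate(self.experts):
                mask = topk_idx[..., slot] == e     # tokens routed to expert e
                if mask.any():
                    out[mask] += weights[..., slot][mask].unsqueeze(-1) * expert(x[mask])
        return out

moe = TopKMoE(d_model=64, n_experts=8, k=2)
y = moe(torch.randn(2, 16, 64))                     # same shape in, same shape out
```

The key property is that compute per token scales with k, not with the total number of experts, which is how MoE models grow parameter count without a proportional inference cost.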
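The group-relative baseline at the heart of GRPO fits in a few lines. The sketch below, assuming scalar rewards for a group of sampled answers per prompt, shows only the advantage estimation; the full GRPO objective also applies a clipped policy ratio and a KL penalty, omitted here.

```python
import torch

def grpo_advantages(group_rewards: torch.Tensor, eps: float = 1e-8) -> torch.Tensor:
    # group_rewards: (num_prompts, group_size) scalar rewards for the
    # sampled answers. Each answer is scored against the mean/std of
    # its own group, so no learned value function (critic) is needed.
    mean = group_rewards.mean(dim=-1, keepdim=True)
    std = group_rewards.std(dim=-1, keepdim=True)
    return (group_rewards - mean) / (std + eps)

rewards = torch.tensor([[0.0, 1.0, 1.0, 0.5]])  # one prompt, 4 sampled answers
print(grpo_advantages(rewards))                  # above-average answers get positive advantage
```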
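The KV-cache saving behind MLA can be illustrated with a low-rank bottleneck: only a small latent per token is cached, and keys and values are reconstructed from it on demand. This sketch omits the multi-head structure and the decoupled rotary position embedding of the real MLA design; the memory saving comes from d_latent being much smaller than d_model.

```python
import torch
import torch.nn as nn

class LatentKV(nn.Module):
    """Low-rank KV bottleneck: cache only the per-token latent and
    reconstruct keys/values from it, instead of caching full K and V."""
    def __init__(self, d_model: int, d_latent: int):
        super().__init__()
        self.down = nn.Linear(d_model, d_latent)  # compress: token -> latent
        self.up_k = nn.Linear(d_latent, d_model)  # expand: latent -> key
        self.up_v = nn.Linear(d_latent, d_model)  # expand: latent -> value

    def forward(self, x: torch.Tensor):
        latent = self.down(x)   # (batch, seq, d_latent): the only tensor cached
        return self.up_k(latent), self.up_v(latent)

kv = LatentKV(d_model=1024, d_latent=128)
k, v = kv(torch.randn(1, 8, 1024))  # cache holds 128 floats/token, not 2 * 1024
```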
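Multi-token prediction densifies the training signal by supervising several future positions at once. The sketch below uses independent linear heads over a shared hidden state, which is a simplification; DeepSeek-V3's actual MTP modules are sequential transformer blocks.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiTokenHeads(nn.Module):
    def __init__(self, d_model: int, vocab_size: int, horizon: int = 2):
        super().__init__()
        # Head k predicts the token at position t + 1 + k.
        self.heads = nn.ModuleList(
            nn.Linear(d_model, vocab_size) for _ in range(horizon)
        )

    def forward(self, hidden: torch.Tensor):
        # hidden: (batch, seq, d_model) from the backbone transformer.
        return [head(hidden) for head in self.heads]

def mtp_loss(logits_per_head, tokens):
    """Cross-entropy summed over horizons; head k is supervised with
    targets shifted k + 1 positions ahead."""
    loss = 0.0
    for k, logits in enumerate(logits_per_head):
        shift = k + 1
        pred = logits[:, :-shift].reshape(-1, logits.size(-1))
        tgt = tokens[:, shift:].reshape(-1)
        loss = loss + F.cross_entropy(pred, tgt)
    return loss

heads = MultiTokenHeads(d_model=64, vocab_size=100, horizon=2)
hidden = torch.randn(2, 10, 64)
tokens = torch.randint(0, 100, (2, 10))
loss = mtp_loss(heads(hidden), tokens)
```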
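FP8 mixed precision trades numeric range for memory and throughput. The sketch below shows only per-tensor quantization to the E4M3 format (available as torch.float8_e4m3fn in PyTorch >= 2.1); DeepSeek-V3's actual recipe uses finer-grained (tile/block-wise) scaling and hardware FP8 matmul kernels, which this sketch does not reproduce.

```python
import torch  # requires PyTorch >= 2.1 for the float8 dtypes

def to_fp8_e4m3(x: torch.Tensor):
    # Per-tensor scaling into the E4M3 range; 448 is the largest
    # finite value representable in torch.float8_e4m3fn.
    amax = x.abs().max().clamp(min=1e-12)
    scale = 448.0 / amax
    x_fp8 = (x * scale).to(torch.float8_e4m3fn)
    return x_fp8, scale

x = torch.randn(4, 4)
x_fp8, scale = to_fp8_e4m3(x)
x_back = x_fp8.to(torch.float32) / scale  # dequantized approximation of x
print((x - x_back).abs().max())           # small quantization error
```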
Quantitative Factors and Construction Methods

Factor Name: Scaling Laws
- **Factor Construction Idea**: Describes the predictable relationship between model performance and the scale of model parameters, training data, and computational resources[21]
- **Factor Construction Process**:
  - **Scaling Laws**: As model parameters, training data, and computational resources increase, model performance improves in a predictable manner (a common parametric form is given after this section)[21]
  - **Data Quality**: High-quality data shifts the optimal allocation strategy toward model expansion[22]
- **Factor Evaluation**: Provides a strong guideline for resource planning and model performance optimization[21][22]

Factor Backtesting Results
- **Scaling Laws**: Demonstrates a predictable improvement in model performance with increased resources, validating the factor's effectiveness in guiding model development[21][22]
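For reference, one widely used parametric form of a scaling law (the Chinchilla-style loss decomposition of Hoffmann et al., 2022, not necessarily the exact form used in the report) writes the loss as a function of parameter count N and training tokens D:

```latex
L(N, D) = E + \frac{A}{N^{\alpha}} + \frac{B}{D^{\beta}}
```

Here E is the irreducible loss and A, B, \alpha, \beta are constants fitted to training runs; minimizing L under a fixed compute budget (roughly C \approx 6ND) yields the compute-optimal split between model size and data. One reading of the data-quality bullet above is that better data changes the fitted constants, shifting that optimal split toward larger models.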
Quick Look: This Is the Company Behind DeepSeek
梧桐树下V· 2025-01-29 03:16
Source: 企查查 (Qichacha) company profile

| Field | Value |
| --- | --- |
| Company name | 杭州深度求索人工智能基础技术研究有限公司 (status: 存续, active) |
| Unified social credit code | 91330105MACPN4X08Y |
| Legal representative | 裴湉 |
| Registered capital | RMB 10 million (1000万元) |
| Date of establishment | 2023-07-17 |
| Industry (Qichacha) | Information systems integration services (信息系统集成服务) |
| Size | Micro (XS), 4 employees (2023) |
| Telephone | 0571-85377238 |
| Address | 浙江省杭州市拱墅区环城北路169号汇金国际大厦西1幢1201室 |
| Shareholders | 宁波程恩企业管理咨询合伙企业 (major shareholder, 99.00%); 梁文锋 (1.00%) |
| Investments / affiliations | 2 invested companies; 15 affiliated companies |
| Key personnel | 裴湉 (executive director and general manager; 3 affiliated companies), 王南军 (supervisor; 2 affiliated companies) |

By 梧桐晓驴. With DeepSeek going viral, the author got curious and looked up the company that develops and operates DeepSeek. Qichacha shows: 杭州深度求索人工智能基础技术研究有限公司, English name Hangz ...