"DeepSeek-V3 was built on our architecture": the CEO of "Europe's OpenAI" gets roasted for an outrageous remark
36Kr· 2026-01-26 07:44
Core Viewpoint
- The discussion centers on the competitive landscape in AI, particularly the contrasting approaches of Mistral and DeepSeek to sparse mixture-of-experts (MoE) models, with Mistral's CEO acknowledging China's strong position in AI and the significance of open-source models [1][4].

Group 1: Company Perspectives
- Mistral's CEO, Arthur Mensch, claims that open-source models are a strategy for progress rather than competition, highlighting the company's early release of open-source models [1].
- Mensch asserts that the recently released DeepSeek-V3 was built on the architecture Mistral proposed, framing AI development as collaborative yet competitive [1][4].
- There is skepticism among the audience regarding Mistral's claims, with some suggesting that Mistral's own recent models may have borrowed heavily from DeepSeek's architecture [4][13].

Group 2: Technical Comparisons
- Both DeepSeek's models and Mistral's Mixtral rely on sparse MoE systems, aiming to reduce computational cost while enhancing model capability, but they differ fundamentally in approach [9].
- Mixtral emphasizes engineering: a robust base model combined with mature MoE techniques. DeepSeek focuses on algorithmic innovation to address shortcomings of traditional MoE systems [9][12].
- DeepSeek introduces fine-grained expert segmentation, allowing more flexible combinations of experts, in contrast with Mixtral's flat distribution of knowledge across a few large experts [11][12] (a minimal routing sketch in code follows this summary).

Group 3: Community Reactions
- The community has reacted critically to Mistral's statements, with some users expressing disbelief and pointing out the similarities between Mistral's and DeepSeek's architectures [2][17].
- There is a sentiment that Mistral, once a pioneer in open-source AI, has lost its innovative edge, while DeepSeek has gained more influence over sparse MoE and MLA technologies [14][17].
- The race for foundational models is expected to continue, with DeepSeek reportedly targeting significant releases in the near future [19].
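To make the Group 2 contrast concrete, here is a minimal, hypothetical sketch in plain PyTorch (not Mistral's or DeepSeek's actual code; the SparseMoE class, layer sizes, and expert counts are illustrative assumptions). The same top-k router/dispatch logic instantiated with a few large experts approximates Mixtral's coarse layout, while many small experts with a larger top-k approximates DeepSeek's fine-grained segmentation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SparseMoE(nn.Module):
    """Token-level top-k routing over a set of feed-forward experts."""
    def __init__(self, d_model, d_ff, n_experts, top_k):
        super().__init__()
        self.top_k = top_k
        self.router = nn.Linear(d_model, n_experts, bias=False)
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model))
            for _ in range(n_experts)
        )

    def forward(self, x):                        # x: (tokens, d_model)
        scores = self.router(x)                  # (tokens, n_experts)
        weights, idx = scores.topk(self.top_k, dim=-1)
        weights = F.softmax(weights, dim=-1)     # normalize over the chosen experts
        out = torch.zeros_like(x)
        for slot in range(self.top_k):           # send each token to its k selected experts
            for e, expert in enumerate(self.experts):
                mask = idx[:, slot] == e
                if mask.any():
                    out[mask] += weights[mask, slot].unsqueeze(-1) * expert(x[mask])
        return out

d_model = 512
# Coarse layout (Mixtral-like): a few large experts, top-2 routing.
coarse = SparseMoE(d_model, d_ff=2048, n_experts=8, top_k=2)
# Fine-grained layout (DeepSeek-like): many small experts, larger top-k,
# keeping the active feed-forward width roughly the same per token.
fine = SparseMoE(d_model, d_ff=512, n_experts=32, top_k=8)

x = torch.randn(16, d_model)
print(coarse(x).shape, fine(x).shape)            # both: torch.Size([16, 512])
```

Under these illustrative numbers both layers activate the same feed-forward width per token, but the fine-grained setup can select among C(32, 8) expert combinations versus C(8, 2) for the coarse one, which is the routing flexibility the summary attributes to DeepSeek's segmentation.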
"DeepSeek-V3 was built on our architecture": the CEO of "Europe's OpenAI" gets roasted for an outrageous remark
量子位· 2026-01-26 04:45
By 鱼羊, reporting from 凹非寺 | 量子位 (公众号 QbitAI)

"DeepSeek-V3 was built on the architecture proposed by Mistral."

The moment the CEO of "Europe's OpenAI" said this, the internet erupted.

Netizens' reactions be like:

And those were the mild ones; there were blunter jabs: what nonsense is Mistral talking about...

For those who haven't caught up on the drama yet, let's go through it from the beginning.

In a recent interview, when asked how he views the strong momentum of China's open-source AI, Mistral co-founder and CEO Arthur Mensch responded:

China is very strong in AI. We were among the first companies to release open-source models, and they found that this is a good strategy. Open source is not really about competition; everyone keeps improving on top of each other's work. For example, in early 2024 we released the first sparse mixture-of-experts (MoE) model, and DeepSeek-V3 and its later versions were built on that foundation. They use the same architecture, and we made public everything needed to rebuild it.

Arthur Mensch was confident, but after hearing this, netizens said: hold on, something is off here.

Leaving aside the fact that the DeepSeek MoE paper and the Mixtral paper Arthur Mensch mentioned were published only 3 days apart:

△ Image source: @Sebastian Raschka

Digging into the details, ...
Llama paper authors are "fleeing": only 3 of the 14-person team remain, and French unicorn Mistral is the biggest winner
36Kr· 2025-05-27 08:57
Core Insights
- Mistral, an AI startup based in Paris, is attracting talent from Meta, particularly from the team behind the Llama model, indicating a shift in the competitive landscape of AI development [1][4][14].
- The exodus of researchers from Meta's AI team, particularly those involved in Llama, reflects growing discontent with Meta's strategic direction and a desire for more innovative opportunities [3][9][12].
- Mistral has quickly established itself as a competitor to Meta, leveraging the expertise of former Meta employees to develop models that meet market demand for deployable AI solutions [14][19].

Talent Migration
- The departure of Llama team members began in early 2023 and has continued into 2025, with key figures like Guillaume Lample and Timothée Lacroix founding Mistral AI [6][8].
- Many of the departing researchers had significant tenure at Meta, averaging over five years, suggesting a deeper ideological shift rather than routine job changes [9].

Meta's Strategic Challenges
- Meta's initial success with Llama has not translated into sustained innovation, as feedback on subsequent models like Llama 3 and Llama 4 has grown increasingly critical [11][12].
- A leadership change within Meta's AI research division, notably the departure of Joelle Pineau, has shifted the focus from open research to application and efficiency, causing further discontent among researchers [13].

Mistral's Growth and Challenges
- Mistral raised over $100 million in seed funding shortly after its founding and has rapidly developed multiple AI models targeting various applications [17].
- Despite its $6 billion valuation, Mistral faces challenges in monetization and global expansion, with revenue still in the tens of millions and a primary focus on the European market [19][20].
喝点VC | a16z's internal review of DeepSeek: reasoning-model innovation and the new AI model landscape under a 20x compute challenge
Z Potentials· 2025-03-23 05:10
Core Insights
- The article discusses the emergence and significance of DeepSeek, a new high-performance reasoning model from China, highlighting its open-source nature and the implications for the AI landscape [3][4][12].

Group 1: DeepSeek Overview
- DeepSeek has gained attention for its performance on AI model rankings, raising both interest and concern [3].
- The open-source release of its weights and technical details provides valuable insight into reasoning models and their future development [4][12].

Group 2: Training Process
- Training DeepSeek involves three main steps: pre-training on vast datasets, supervised fine-tuning (SFT) on human-written examples, and reinforcement learning from human feedback (RLHF) [6][9][10] (a minimal sketch of the three stages follows this summary).
- The training process is designed to enhance the model's ability to give accurate, contextually relevant answers, moving beyond simple question answering to more complex reasoning [11][12].

Group 3: Innovations and Techniques
- DeepSeek R1 represents the culmination of several innovations, including self-learning capabilities and multi-stage training processes that improve reasoning ability [11][13][14].
- The model employs a mixture-of-experts (MoE) architecture, which allows for efficient training and strong performance on reasoning tasks [15][30].

Group 4: Performance and Cost
- Training DeepSeek V3 cost approximately $5.5 million, and the transition to R1 was less expensive because it focused on reasoning and smaller-scale SFT [27][29].
- The performance of reasoning models has improved significantly, with DeepSeek R1 demonstrating capabilities comparable to leading models in the industry [31][35].

Group 5: Future Implications
- The rise of reasoning models like DeepSeek signals a shift in the AI landscape, necessitating increased computational resources for inference and testing [31][34].
- The open-source nature of these models fosters innovation and collaboration within the AI community, potentially accelerating advances in the field [36][39].
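As a rough illustration of the three-stage recipe summarized in Group 2 (pre-training, SFT, then a reward-driven RL stage such as RLHF), here is a hypothetical, self-contained PyTorch sketch using a toy language model and random stand-in data; the TinyLM class, tensor shapes, and reward rule are assumptions made for illustration, not DeepSeek's actual setup.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyLM(nn.Module):
    """A toy autoregressive LM, used only to show where each training stage plugs in."""
    def __init__(self, vocab=1000, d_model=64):
        super().__init__()
        self.embed = nn.Embedding(vocab, d_model)
        self.rnn = nn.GRU(d_model, d_model, batch_first=True)
        self.head = nn.Linear(d_model, vocab)

    def forward(self, tokens):                    # tokens: (batch, seq)
        h, _ = self.rnn(self.embed(tokens))
        return self.head(h)                       # (batch, seq, vocab) logits

def next_token_loss(model, tokens):
    """Standard next-token prediction: predict token t+1 from tokens up to t."""
    logits = model(tokens[:, :-1])
    return F.cross_entropy(logits.reshape(-1, logits.size(-1)),
                           tokens[:, 1:].reshape(-1))

model = TinyLM()
opt = torch.optim.AdamW(model.parameters(), lr=1e-3)

# Stage 1: pre-training on a vast unlabeled corpus (random ids stand in for real text).
pretrain_batch = torch.randint(0, 1000, (32, 128))
opt.zero_grad(); next_token_loss(model, pretrain_batch).backward(); opt.step()

# Stage 2: supervised fine-tuning (SFT) -- same loss, but computed on curated,
# human-written (prompt, answer) demonstrations instead of raw web text.
sft_batch = torch.randint(0, 1000, (8, 64))
opt.zero_grad(); next_token_loss(model, sft_batch).backward(); opt.step()

# Stage 3: a bare-bones policy-gradient step. In RLHF the reward comes from a learned
# human-preference model; in reasoning-focused RL it can come from a rule-based checker
# (e.g. whether a math answer verifies). An arbitrary rule stands in for both here.
prompt = torch.randint(0, 1000, (1, 16))
next_logits = model(prompt)[:, -1]                # distribution over the next token
dist = torch.distributions.Categorical(logits=next_logits)
action = dist.sample()                            # sampled continuation (one token here)
reward = 1.0 if action.item() % 2 == 0 else -1.0  # toy stand-in for a real reward signal
loss = -(dist.log_prob(action) * reward).sum()    # REINFORCE: raise log-prob of rewarded samples
opt.zero_grad(); loss.backward(); opt.step()
```

Real pipelines add considerably more machinery (reward models, KL penalties toward a reference policy, and the multi-stage SFT/RL loops the article describes for R1), but the skeleton of the data flow across the three stages is the same.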