MiniMax Takes the Fight to DeepSeek
Jing Ji Guan Cha Wang· 2025-06-18 11:32
Core Viewpoint - MiniMax has launched its self-developed MiniMax M1 model, which competes directly with DeepSeek R1 and Google's Gemini 2.5 Pro on key technical specifications, architecture design, context-processing capability, and training cost [1][2].

Group 1: Model Specifications
- MiniMax M1 supports a context length of 1 million tokens, 8 times DeepSeek R1's 128,000 tokens and only slightly behind Google's Gemini 2.5 Pro [1].
- MiniMax M1 has 456 billion total parameters, with 45.9 billion activated per token; DeepSeek R1 has 671 billion total parameters but activates only 37 billion per token [1] (a quick arithmetic check of these ratios follows this summary).

Group 2: Cost Efficiency
- When generating 100,000 tokens, MiniMax M1 consumes only 25% of the floating-point operations of DeepSeek R1, and it requires less than half the compute for inference tasks of 64,000 tokens [2].
- Training MiniMax M1 cost only $535,000, far below initial expectations and much less than the $5-6 million GPU cost of training DeepSeek R1 [2].

Group 3: Pricing Strategy
- MiniMax M1's API is priced in tiers by input or output token count; the first tier charges 0.8 yuan per million input tokens and 8 yuan per million output tokens, below DeepSeek R1's pricing [3].
- The first two tiers of MiniMax M1 are priced below DeepSeek R1, and the third tier for long text currently has no DeepSeek counterpart [3].

Group 4: Technology Innovations
- MiniMax M1's capabilities rest on two core technologies: the linear attention mechanism (Lightning Attention) and the reinforcement learning algorithm CISPO, which improve training efficiency and stability [2].
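To make the headline numbers easier to verify, here is a quick back-of-the-envelope script using only the figures quoted in this entry; the FLOPs and cost figures are the article's claims restated as constants, not values derived here.

```python
# Back-of-the-envelope check of the ratios quoted above.

M1_CONTEXT, R1_CONTEXT = 1_000_000, 128_000   # tokens
M1_TOTAL, M1_ACTIVE = 456e9, 45.9e9           # parameters
R1_TOTAL, R1_ACTIVE = 671e9, 37e9             # parameters

print(f"Context ratio (M1/R1): {M1_CONTEXT / R1_CONTEXT:.1f}x")              # ~7.8x, the article's "8 times"
print(f"M1 activates {M1_ACTIVE / M1_TOTAL:.1%} of its parameters per token")  # ~10.1%
print(f"R1 activates {R1_ACTIVE / R1_TOTAL:.1%} of its parameters per token")  # ~5.5%

# Stated in the article, not derived: at 100k generated tokens M1 uses ~25%
# of R1's floating-point operations, and M1's training cost was ~$535,000.
```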
STAR Market and ChiNext Open the IPO Path for Unprofitable Companies; a Package of New Policies Precisely Serves the "DeepSeek Moment"
21st Century Business Herald, intern Zhang Changrong, reporter Cui Wenjing, reporting from Beijing. "We will further deepen capital market reform and opening-up across the board, and push the integrated development of technological innovation and industrial innovation to a new level," China Securities Regulatory Commission Chairman Wu Qing said at the 2025 Lujiazui Forum on June 18. At the forum, Wu Qing announced a series of major measures, including restarting listings by unprofitable companies under the STAR Market's fifth set of standards, six STAR Market reform measures, the formal launch of the third set of standards on the ChiNext board, and coordinated advancement of comprehensive investment-and-financing reform and investor protection.

Wu Qing noted that a new round of technological revolution and industrial transformation is accelerating. In China, technological innovation is moving from isolated breakthroughs toward system-level integration, and many fields are seeing exhilarating "DeepSeek moments."

"Technological innovation, industrial innovation, and capital market development complement and reinforce one another," Wu Qing said. On one hand, the capital market has a unique incentive-compatible mechanism of shared risk and shared reward. On the other, by pricing key factors and assets, the capital market can stimulate entrepreneurship and the creativity of talent, better serving the upgrading of traditional industries, the growth of emerging industries, and the cultivation of future industries. And as the capital market effectively serves technological innovation and industrial upgrading, it in turn improves its own structure, efficiency, and investment value.

A "1+6" package of policies to further deepen STAR Market reform will be introduced. The continued advance of technological innovation places higher demands on building a financial services system to match. Wu Qing ...
MiniMax's New Model Benchmarks Against DeepSeek; Doubao Launches AI Podcasts; US Senate Passes Stablecoin Bill
Guan Cha Zhe Wang· 2025-06-18 00:49
Group 1: MiniMax and AI Developments
- MiniMax announced the release of its first open-source reasoning model, MiniMax-M1, which competes with leading models like DeepSeek-R1 and Qwen3 [1]
- Training of MiniMax-M1 was completed in just three weeks on 512 H800 GPUs, at a compute rental cost of approximately $534,700, an order of magnitude lower than initial expectations [1]

Group 2: AI Features and Upgrades
- Doubao launched an AI podcast feature in its desktop version, letting users generate dialogue-style podcasts from uploaded PDFs or web links [3][4]
- Baidu introduced the industry's first interactive livestream room hosted by dual digital humans, aimed at improving marketing conversion and user experience [4]

Group 3: Corporate Changes and AI Strategy
- Apple's senior vice president of AI and machine-learning strategy, John Giannandrea, is reportedly being sidelined due to slow progress on AI projects and friction with other executives [5]
- Amazon CEO Andy Jassy said the company expects its workforce to shrink in the coming years as AI tools and agents become more prevalent [6]

Group 4: Financial Moves and Investments
- SoftBank raised approximately $4.8 billion by selling T-Mobile shares to fund its ambitious AI plans, including a potential investment of up to $30 billion in OpenAI [6]
- Shanghai Zhaoxin Integrated Circuit Co., Ltd. received approval for its IPO on the Sci-Tech Innovation Board, aiming to raise 4.169 billion yuan for various processor projects [8]

Group 5: Regulatory Developments
- The U.S. Senate passed the GENIUS Act, establishing a regulatory framework for stablecoins and marking a significant step in cryptocurrency legislation [7]
- JD.com plans to apply for stablecoin licenses in major currency jurisdictions, aiming to cut cross-border payment costs by 90% and improve efficiency [7]

Group 6: Market Trends and Innovations
- Miniso's founder outlined a four-step methodology for IP operation, emphasizing the importance of a closed-loop system for successful product distribution [9]
- China's new-generation crewed spacecraft "Dream Boat" (Mengzhou) successfully completed a zero-altitude escape flight test, a significant milestone in the country's lunar exploration program [9]
DeepSeek R1-0528 Ties for First Place with Claude Opus 4 in the WebDev Arena
news flash· 2025-06-17 23:00
Core Insights - The latest LMArena ranking highlights DeepSeek R1-0528 as a top performer, sharing first place with Google Gemini 2.5 0605 and Claude Opus 4 [1]

Group 1
- DeepSeek R1-0528 excels in overall performance, ranking first alongside Google Gemini 2.5 0605 and Claude Opus 4 [1]
- Across specific categories, DeepSeek ranks 6th in comprehensive text capabilities, 2nd in programming, 4th on high-difficulty prompts, and 5th in mathematics [1]
- The model, released under the MIT license, is noted as the strongest open-source model currently available [1]
Kimi's New Model That Beats DeepSeek Accused of Being a "Shell" of Qwen? Here's What Actually Happened
Hu Xiu· 2025-06-17 12:15
Core Viewpoint - The release of the open-source model Kimi-Dev-72B by Moonshot AI has set a new record on software engineering benchmarks, scoring 60.4% on SWE-bench Verified and surpassing several competitors, including DeepSeek [1][3].

Model Development
- Kimi-Dev-72B is based on the Qwen/Qwen2.5-72B model, meaning it is not a from-scratch model but a fine-tuned version trained on a large dataset of GitHub issues and PR submissions [2][3].
- Kimi-Dev's innovation lies in its training methodology, which uses large-scale reinforcement learning to autonomously fix real code-repository issues inside a Docker environment [3].

Licensing and Compliance
- Kimi-Dev-72B is released under the MIT license, but it must still comply with the original licensing restrictions of Qwen-2.5-72B, which is governed by the Qwen LICENSE AGREEMENT [4][5].
- The controversy centers on whether Moonshot AI obtained special permission to use Qwen-2.5-72B, as the agreement requires a commercial license once monthly active users exceed 100 million [6][7].

Community Response
- The Qwen team initially clarified that it had not granted permission to use Qwen-2.5-72B, but later described the issue as a "legacy problem" tied to its evolving licensing strategy [8][10].
- With the upcoming Qwen3 series, the Qwen team has moved to a more open licensing model, adopting the Apache 2.0 license for all models in order to foster a more open and active AI ecosystem [12][13].

Industry Implications
- The case illustrates the AI industry's shift toward open-source collaboration, moving from restrictive licensing to more open models that encourage developer engagement and innovation [16][18].
- The rising trend of "second innovation" built on strong foundation models highlights the importance of differentiation in creating value within the open-source ecosystem [16].
MiniMax Releases M1, an Open-Source Hybrid-Architecture Reasoning Model That Needs Only About 30% of DeepSeek R1's Compute
news flash· 2025-06-17 08:32
Core Insights
- MiniMax, an AI unicorn based in Shanghai, has officially launched the open-source reasoning model MiniMax-M1 ("M1") [1]
- M1 is claimed to be the world's first open-weight, large-scale hybrid-attention reasoning model [1]
- The model combines a Mixture-of-Experts (MoE) architecture with Lightning Attention, achieving significant breakthroughs in performance and inference efficiency [1] (a toy sketch of the linear-attention idea follows this entry)
- Test data indicate that the M1 series surpasses most closed-source models in long-context understanding and code-generation productivity scenarios, trailing only slightly behind the top closed-source systems [1]
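Lightning Attention, credited above for M1's efficiency, is a linear-attention variant. As a rough illustration of why linear attention makes million-token contexts tractable (this shows the generic linear-attention trick only, not MiniMax's actual kernel), compare the two orderings of the attention matrix products:

```python
import numpy as np

def softmax_attention(Q, K, V):
    # Standard attention materializes an (n x n) score matrix:
    # time and memory grow quadratically with sequence length n.
    scores = Q @ K.T / np.sqrt(Q.shape[-1])
    w = np.exp(scores - scores.max(axis=-1, keepdims=True))
    return (w / w.sum(axis=-1, keepdims=True)) @ V

def linear_attention(Q, K, V, phi=lambda x: np.maximum(x, 0.0) + 1e-6):
    # Linear attention maps Q and K through a positive feature map phi and
    # reassociates the products: phi(Q) @ (phi(K).T @ V) never forms the
    # (n x n) matrix, so cost grows linearly with n.
    Qp, Kp = phi(Q), phi(K)
    kv = Kp.T @ V                             # (d x d) summary, independent of n
    z = Qp @ Kp.sum(axis=0, keepdims=True).T  # per-query normalizer, (n x 1)
    return (Qp @ kv) / z

n, d = 1024, 64
Q, K, V = np.random.default_rng(0).normal(size=(3, n, d))
print(softmax_attention(Q, K, V).shape, linear_attention(Q, K, V).shape)  # (1024, 64) twice
```

The "hybrid" in the entry's description refers to mixing layers of this kind with standard softmax attention layers, recovering accuracy that purely linear attention tends to give up.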
MiniMax Open-Sources Its First Reasoning Model: 456B Parameters, Performance Surpassing DeepSeek-R1, Technical Report Released
36Ke· 2025-06-17 08:15
Core Insights - MiniMax has launched the world's first open-source large-scale hybrid-architecture reasoning model, MiniMax-M1, kicking off a five-day run of daily releases [2]

Model Specifications
- The M1 model has 456 billion parameters, activating 45.9 billion per token; it supports 1-million-token context input, 8 times that of DeepSeek-R1, and the industry's longest reasoning output at 80,000 tokens [4]
- Two versions of MiniMax-M1 were trained, with thinking budgets of 40k and 80k tokens [4]

Training and Cost
- Training used 512 H800 GPUs for three weeks at a cost of approximately $537,400 (around 3.859 million RMB), an order of magnitude below initial cost expectations [7]
- The M1 model is available for unlimited free use in the MiniMax app and on the web [7]

API Pricing Structure
- The M1 API is priced in tiers by input length (a small cost helper follows this summary):
  - 0-32k input: 0.8 RMB/million input tokens, 8 RMB/million output tokens
  - 32k-128k input: 1.2 RMB/million input tokens, 16 RMB/million output tokens
  - 128k-1M input: 2.4 RMB/million input tokens, 24 RMB/million output tokens [7][11]
- Compared with DeepSeek-R1, M1's first-tier input price is 80% and its output price 50% of DeepSeek-R1's, while its second-tier input price is 1.2 times DeepSeek-R1's [9]

Performance Evaluation
- MiniMax-M1 outperforms models such as DeepSeek-R1 and Qwen3-235B on complex software engineering, tool use, and long-context tasks [13][14]
- In the MRCR test, M1 scores slightly below Gemini 2.5 Pro but above the other models [13]
- On the SWE-bench Verified test set, M1-40k and M1-80k score slightly below DeepSeek-R1-0528 but above other open-source models [14]

Technical Innovations
- M1 pairs a Mixture-of-Experts (MoE) architecture with a lightning attention mechanism, scaling efficiently to long inputs and complex tasks [16]
- The model was trained with large-scale reinforcement learning (RL), using a new CISPO algorithm that enhances performance by optimizing importance-sampling weights [16][17]

Future Directions
- MiniMax emphasizes the need for "Language-Rich Mediator" agents to handle complex scenarios requiring dynamic resource allocation and multi-round reasoning [19]
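The tier is selected by input length but fixes both the input and output rate, which is easy to misread; the small helper below makes the rule explicit, with rates copied from the list above (an illustration of the published schedule, not MiniMax's billing code):

```python
# Tier boundaries are chosen by input length (tokens); each tier fixes both
# the input and the output price, in RMB per million tokens (per the article).
TIERS = [
    (32_000,    0.8,  8.0),   # 0-32k input
    (128_000,   1.2, 16.0),   # 32k-128k input
    (1_000_000, 2.4, 24.0),   # 128k-1M input
]

def m1_api_cost(input_tokens: int, output_tokens: int) -> float:
    """Estimated M1 API cost in RMB under the published tiered schedule."""
    for limit, in_rate, out_rate in TIERS:
        if input_tokens <= limit:
            return (input_tokens * in_rate + output_tokens * out_rate) / 1_000_000
    raise ValueError("input exceeds the 1M-token context limit")

# Example: a 100k-token input with a 20k-token answer falls in the middle tier.
print(f"{m1_api_cost(100_000, 20_000):.2f} RMB")  # 0.44 RMB
```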
MiniMax Releases a Reasoning Model to Rival DeepSeek, at a Compute Cost of Only About $530,000
Di Yi Cai Jing· 2025-06-17 07:26
Core Insights
- MiniMax, one of the "Six Little Dragons," has kicked off a series of major announcements, starting with the release of its first open-source reasoning model, MiniMax-M1 [1]
- MiniMax-M1 posts competitive benchmark results, comparable to leading overseas models such as DeepSeek-R1 and Qwen3 [3]
- Training was completed in just three weeks on 512 H800 GPUs, at a total compute cost of only $534,700, an order of magnitude lower than initially expected [3][8]

Performance Metrics
- MiniMax-M1's context window is 1 million tokens, eight times that of DeepSeek R1 and on par with Google's Gemini 2.5 Pro, giving it superior performance on long-context understanding tasks [5]
- In the TAU-bench evaluation, MiniMax-M1 outperformed DeepSeek-R1-0528 and Google's Gemini 2.5 Pro, ranking just below OpenAI o3 and Claude 4 Opus globally [7]
- The model's coding ability significantly surpasses most open-source models, with only a slight gap behind the latest DeepSeek R1 [7]

Innovations and Cost Efficiency
- MiniMax-M1 uses a hybrid architecture built on a lightning attention mechanism, improving efficiency on long-text input and deep reasoning tasks [7]
- The CISPO reinforcement learning algorithm converges faster than ByteDance's recently proposed DAPO algorithm, contributing to the low training cost (a simplified sketch of the CISPO idea follows this summary) [8]
- MiniMax prices its API in tiers by input length, from 0.8 to 2.4 RMB per million input tokens and 8 to 24 RMB per million output tokens, competitive against DeepSeek [8]

Competitive Landscape
- At the same time, competitor Moonshot AI released its coding model Kimi-Dev-72B, which reportedly reached the highest level among open-source models on SWE-bench, surpassing the new DeepSeek-R1 [8]
- Kimi-Dev-72B has, however, faced scrutiny over potential overfitting, as it generated less code than required on certain tasks, raising questions about the reliability of its results [9]
- Competition among the "Six Little Dragons" is heating up again, with MiniMax expected to release further updates in the coming days that could reshape the multimodal AI landscape [9]
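CISPO is described here and in the 36Ke entry only at the level of "optimizing importance-sampling weights." MiniMax's exact objective is given in its technical report; the sketch below is an assumed, simplified rendering of that idea, in which the importance-sampling ratio is clipped and detached so every token keeps a bounded gradient, in contrast to PPO-style clipping that zeroes gradients for out-of-range tokens. Function names and clip defaults are illustrative, not the published algorithm.

```python
import torch

def cispo_style_loss(logp_new, logp_old, advantages, clip_low=0.8, clip_high=1.2):
    """Simplified CISPO-style objective (assumed form; see the note above).

    PPO clips the policy update itself, silencing gradients for tokens whose
    ratio leaves the trust region; here the importance-sampling weight is
    clipped and detached instead, so every token still contributes a bounded
    gradient through logp_new.
    """
    ratio = torch.exp(logp_new - logp_old)                  # importance-sampling weight
    clipped = torch.clamp(ratio, clip_low, clip_high).detach()
    return -(clipped * advantages * logp_new).mean()

# Toy token-level example.
torch.manual_seed(0)
logp_new = torch.randn(8, requires_grad=True)
logp_old = logp_new.detach() + 0.1 * torch.randn(8)
loss = cispo_style_loss(logp_new, logp_old, torch.randn(8))
loss.backward()                                             # all 8 tokens receive gradient
print(round(loss.item(), 4), logp_new.grad.shape)
```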
Yicai Editorial: Has the "DeepSeek Moment" for China's Innovative Drugs Arrived?
Di Yi Cai Jing· 2025-06-16 13:32
Core Insights
- The rise of China's innovative pharmaceutical industry shows that true innovation comes from companies adapting to market demand, producing unexpected successes [1][4]
- By 2025, China's innovative drugs are expected to enter a period of significant development, with multiple strategic collaborations and transactions announced recently [1][2]

Industry Performance
- As of June 13, the CSI Innovative Drug Industry Index in A-shares had risen 17.34% year-to-date, while the Hang Seng Biotechnology Index had surged 61.96% [1]
- In Q1 of this year, mergers and acquisitions involving Chinese pharmaceutical companies totaled $36.9 billion, more than half of the global total of $67.5 billion [2]

Factors Driving Growth
- Open access to overseas markets lets Chinese biopharmaceutical companies participate effectively in the international innovative-drug ecosystem, benefiting from better asset valuation and a robust intellectual-property protection system [3]
- A favorable domestic environment for innovation, including relaxed clinical-trial policies and an ample talent pool, has reduced overall costs for Chinese innovative-drug companies [3]
- The formation of a specialized, collaborative innovation ecosystem in China's biopharmaceutical sector has sharpened focus and significantly lowered innovation costs [3]

Challenges and Future Outlook
- Despite recent successes, the domestic innovative-drug industry still faces pricing pressure, insufficient capital-market support, and intense competition [4]
- To stay internationally competitive, the industry must strengthen domestic market capabilities while ensuring that innovative assets circulate effectively between domestic and international markets [4]
- Shanghai's recent initiatives to strengthen intellectual-property protection and foster collaboration in the biopharmaceutical sector offer a model for reducing innovation costs and improving the conversion of medical outcomes [5]

Sustainability of Innovation
- For China to sustain a "DeepSeek moment" in the innovative-drug sector, it is crucial to create and maintain an environment that fosters innovation without excessive intervention [6]
Morgan Stanley: DeepSeek R2: A New-Generation Heavyweight Model for AI Reasoning?
Morgan Stanley· 2025-06-16 03:16
Investment Rating - The report takes a cautious view of the Asia-Pacific technology sector, focusing on developments around DeepSeek's R2 model [7].

Core Insights
- DeepSeek's R2 model is anticipated to redefine AI development, pricing, and reliance on domestic AI chip supply chains in China, serving as a potential catalyst for accelerating AI application deployment [1][2]
- R2 is expected to deliver significant advances in multilingual reasoning and code generation, offering a hybrid model with lower power consumption and a smaller parameter scale, while being more cost-effective than its predecessor R1 [2][9]
- The model's efficiency is projected to lower computational requirements, facilitating AI commercialization and expanding total demand, potentially disrupting the AI market [2][10]

Summary by Sections

R2 Model Overview
- R2 represents the second major iteration of DeepSeek's reasoning model, promising improvements in multilingual reasoning and code generation with a focus on efficiency and cost reduction [2][9]
- The model is designed to be multimodal, featuring enhanced visual capabilities and significantly lower operating costs than R1 [2][13]

Supply Chain Developments
- R2 is supported by a robust ecosystem of Chinese companies and reportedly leverages Huawei's Ascend 910B chip cluster for training, signaling a shift toward a localized supply chain [3][17]
- DeepSeek aims to reduce dependence on external chip manufacturers, in contrast to its earlier reliance on NVIDIA GPUs for training R1 [17][20]

Market Impact
- The report suggests DeepSeek's advances will benefit local GPU, GDDR, and Chinese HBM sectors, a positive sign for these industries amid a broader AI market recovery [20][22]
- The performance of DeepSeek's models, particularly as computational demand during inference rises, is expected to drive further innovation and resource allocation across the AI ecosystem [20][23]

Competitive Landscape
- DeepSeek's approach emphasizes software-driven resource optimization over hardware dependency, which could yield significant cost reductions and efficient training of large models [23][24]
- The report highlights competitive pressure on NVIDIA from Huawei's Ascend chips, which are designed to match NVIDIA's performance while being domestically produced [17][20]