Volcano Engine President Tan Dai: Doubao Model 1.6's overall cost is only one-third that of DeepSeek R1
Xi Niu Cai Jing· 2025-06-19 08:17
Core Insights
- ByteDance's Volcano Engine showcased its strength in AI technology and its commitment to AI cloud services at the Force Original Power Conference [2]
- The company announced several product updates, including the Doubao Model 1.6 and the Seedance 1.0 pro video generation model, along with upgrades to its AI cloud-native services [2]

Product Updates
- The Doubao 1.6 model supports multimodal understanding and graphical-interface operation, enabling it to handle real-world tasks such as hotel bookings and receipt organization [2]
- The Seedance 1.0 pro model takes text and image input to generate high-quality 1080P videos with seamless multi-angle transitions and stable motion [2]
- Doubao 1.6 introduces a pricing model tiered by "input length" interval, at a cost only one-third that of the earlier Doubao 1.5 deep-thinking model or DeepSeek R1 [2]

Market Positioning
- Volcano Engine does not use "loss-leading" strategies to gain market share, maintaining a focus on preserving gross margins [3]
- The company emphasizes that while low initial pricing can attract users, the decisive factors for attracting users now lie in model performance, stability, and throughput [3]
Finance empowering sci-tech innovation, fostering more "DeepSeek moments"
On the morning of June 18, CSRC Chairman Wu Qing announced at the 2025 Lujiazui Forum that a "1+6" package of measures to deepen reform of the STAR Market would be rolled out, to serve more precisely high-quality technology companies with major technical breakthroughs, broad commercial prospects, and sustained, heavy R&D investment.

Venture capital has played an important role in the development of China's technology industries over the past several decades. But during 2023-2024, as the Federal Reserve raised interest rates and the risk of a global recession grew, inflows of dollar-denominated venture capital into China declined. At the same time, the pace of domestic IPOs tightened for a period, some companies listing in the U.S. were priced too low, and small- and mid-cap stocks in the Hong Kong market lacked liquidity, which to some extent hindered venture capital firms' exits via IPO and capital recovery, and domestic venture investment activity decreased.

Yet at China's current stage of economic development, frontier technologies and emerging industries, represented by artificial intelligence, biotechnology, commercial aerospace, and the low-altitude economy, are flourishing, and technological breakthroughs are converting into market applications at an accelerating pace, which places new and higher demands on building a matching financial services system. Therefore, only by promptly reinstating the fifth set of listing standards and expanding their scope of application, and by actively attracting all kinds of social capital, including social security funds, insurance funds, and industrial capital, into private equity investment, can the integration of technological and industrial innovation be better promoted and a virtuous cycle of technology, capital, and industry be achieved.

Many of China's startups grew into giants during the internet revolution; for example, half of the ten largest private listed Chinese companies recently ranked by Goldman Sachs are internet companies ...
Using DeepSeek to build a multi-asset allocation framework: Wang Kai, "the chief strategist best at using AI for research," teaches you "new tricks"
Hua Er Jie Jian Wen· 2025-06-18 12:42
Core Insights
- The emergence of DeepSeek in 2025 is revolutionizing the financial industry by enhancing market prediction models with its dynamic self-correction capabilities and advanced data-mining abilities [1][10]
- Traditional market prediction models often suffer from fixed weight configurations, leading to distorted judgments, which DeepSeek aims to address [1][10]

Group 1: Impact on the Financial Industry
- DeepSeek's dynamic self-correction ability optimizes weights based on historical data and current conditions, improving prediction accuracy [1]
- The model's data-mining capabilities allow discovery of more relevant data, breaking linear thinking and avoiding "black box" issues [1]
- DeepSeek enhances overall strategy intelligence through its powerful reasoning and complex decision-making capabilities [1]

Group 2: Educational Initiatives
- Guosen Securities reported a 0.27% increase in annualized returns and a 1.08-fold increase in Sharpe ratio after integrating DeepSeek into its simulated trading [3]
- A masterclass titled "DeepSeek Restructures the Strategy Investment Paradigm" has been launched to teach users how to apply DeepSeek to investment [3][7]
- The course, led by Wang Kai, covers topics including asset allocation optimization, risk-parity strategies, and interpreting policy semantics [3][11]

Group 3: Course Content and Structure
- The masterclass is divided into eleven parts, focusing on practical techniques for asset allocation and investment strategies using DeepSeek [3][11]
- Key topics include applying AI to multi-asset frameworks, recreating classic investment portfolios, and understanding market timing and sector rotation [11][12]
- The course aims to provide insight into the behavioral logic behind key financial-institution statements and their implications for investment strategies [11][12]
MiniMax takes the fight to DeepSeek
Jing Ji Guan Cha Wang· 2025-06-18 11:32
Core Viewpoint
- MiniMax has launched its self-developed MiniMax M1 model, which competes directly with DeepSeek R1 and Google's Gemini 2.5 Pro on key technical specifications, architecture design, context-processing capability, and training cost [1][2]

Group 1: Model Specifications
- MiniMax M1 supports a context length of 1 million tokens, roughly 8 times DeepSeek R1's 128,000 tokens and only slightly behind Google's Gemini 2.5 Pro [1]
- MiniMax M1 has 456 billion total parameters, with 45.9 billion activated per token, while DeepSeek R1 has 671 billion total parameters but activates only 37 billion per token [1]

Group 2: Cost Efficiency
- MiniMax M1 consumes only 25% of the floating-point operations of DeepSeek R1 when generating 100,000 tokens, and requires less than half the compute for inference tasks of 64,000 tokens [2]
- Training MiniMax M1 cost only $535,000, significantly below initial expectations and far less than the $5-6 million GPU cost of training DeepSeek R1 [2]

Group 3: Pricing Strategy
- MiniMax M1 uses tiered API pricing based on the number of input or output tokens; the first tier charges 0.8 yuan per million input tokens and 8 yuan per million output tokens, which is lower than DeepSeek R1's pricing [3]
- The first two tiers of MiniMax M1 are priced below DeepSeek R1, while the third tier, for long text, covers a range DeepSeek does not currently serve [3]

Group 4: Technology Innovations
- MiniMax M1's capabilities rest on two core technologies: the linear attention mechanism (Lightning Attention) and the reinforcement-learning algorithm CISPO, which improves training efficiency and stability [2]
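As a quick check on the figures quoted above, the activated-parameter fractions and the context-length ratio can be worked out directly (illustrative arithmetic only, using the numbers reported in the article):

```python
# Figures quoted in the article (parameters in billions, context in tokens).
m1_total, m1_active = 456, 45.9       # MiniMax M1: total vs activated per token
r1_total, r1_active = 671, 37.0       # DeepSeek R1: total vs activated per token
m1_context, r1_context = 1_000_000, 128_000

# Fraction of parameters each MoE design activates per token.
print(f"M1 activates {m1_active / m1_total:.1%} of its parameters per token")
print(f"R1 activates {r1_active / r1_total:.1%} of its parameters per token")

# The "8 times" context claim: 1,000,000 / 128,000 = 7.8125, i.e. roughly 8x.
print(f"context ratio: {m1_context / r1_context:.1f}x")
```

So despite having fewer total parameters, M1 activates a larger share of them per token (about 10.1% versus about 5.5% for R1), which is consistent with the article's framing of the two MoE designs.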
STAR Market and ChiNext clear the IPO path for unprofitable companies: a package of new policies precisely serves the "DeepSeek moment"
21st Century Business Herald, intern Zhang Changrong, reporter Cui Wenjing, Beijing. "We will further comprehensively deepen the reform and opening of the capital markets and push the integrated development of technological and industrial innovation to a new level," CSRC Chairman Wu Qing said at the 2025 Lujiazui Forum held on June 18.

At the forum, Wu Qing announced a series of major measures, including reinstating the fifth set of STAR Market listing standards for unprofitable companies along with six STAR Market reform measures, formally activating the third set of standards on the ChiNext board, and coordinating comprehensive investment-and-financing reform with investor protection.

Wu Qing noted that a new round of technological revolution and industrial transformation is accelerating. In China, technological innovation is moving rapidly from isolated breakthroughs to systematic integration, and many fields are seeing exciting "DeepSeek moments."

"Technological innovation, industrial innovation, and capital market development reinforce and accomplish one another," Wu Qing said. On the one hand, capital markets have a distinctive incentive-compatible mechanism of shared risk and shared reward. On the other, by pricing key factors and assets, capital markets can stimulate entrepreneurship and the creative energy of talent, better serving the upgrading of traditional industries, the growth of emerging industries, and the cultivation of future industries. And in effectively serving technological innovation and industrial upgrading, capital markets in turn improve their own structure, efficiency, and investment value.

A "1+6" package of measures to further deepen STAR Market reform will be introduced

Continued advances in technological innovation place higher demands on building a matching financial services system. Wu Qing ...
MiniMax's new model benchmarks against DeepSeek; Doubao launches AI podcasts; U.S. Senate passes stablecoin bill
Guan Cha Zhe Wang· 2025-06-18 00:49
Group 1: MiniMax and AI Developments
- MiniMax announced the release of its first open-source inference model, MiniMax-M1, which competes with leading models such as DeepSeek-R1 and Qwen3 [1]
- Training of MiniMax-M1 was completed in just three weeks on 512 H800 GPUs, with a compute rental cost of approximately $534,700, an order of magnitude lower than initial expectations [1]

Group 2: AI Features and Upgrades
- Doubao launched an AI podcast feature in its desktop version, letting users generate dialogue podcasts from uploaded PDFs or web links [3][4]
- Baidu introduced the industry's first dual-digital-human interactive livestream room, improving marketing conversion and user experience [4]

Group 3: Corporate Changes and AI Strategy
- Apple's senior vice president of AI and machine-learning strategy, John Giannandrea, is reportedly being sidelined due to slow progress on AI projects and misalignment with other executives [5]
- Amazon CEO Andy Jassy indicated that the company expects its workforce to shrink in the coming years as AI tools and smart agents become more prevalent [6]

Group 4: Financial Moves and Investments
- SoftBank raised approximately $4.8 billion by selling T-Mobile shares, which will fund its ambitious AI plans, including a potential investment of up to $30 billion in OpenAI [6]
- Shanghai Zhaoxin Integrated Circuit Co., Ltd. received approval for its IPO on the Sci-Tech Innovation Board, aiming to raise 4.169 billion yuan for various processor projects [8]

Group 5: Regulatory Developments
- The U.S. Senate passed the "Genius Act," establishing a regulatory framework for stablecoins, a significant step in cryptocurrency legislation [7]
- JD.com aims to apply for stablecoin licenses in major currency jurisdictions to cut cross-border payment costs by 90% and improve efficiency [7]

Group 6: Market Trends and Innovations
- Miniso's founder discussed a four-step methodology for IP operation, emphasizing the importance of a closed-loop system for successful product distribution [9]
- China's new-generation crewed spacecraft "Dream Boat" successfully completed a zero-altitude escape flight test, a significant milestone in the country's lunar exploration program [9]
DeepSeek R1-0528 ties for first place with Claude Opus 4 in the WebDev Arena
news flash· 2025-06-17 23:00
Core Insights
- The latest LMArena ranking highlights DeepSeek R1-0528 as a top performer, sharing first place with Google Gemini 2.5 0605 and Claude Opus 4 [1]

Group 1
- DeepSeek R1-0528 excels in overall performance, ranking first alongside Google Gemini 2.5 0605 and Claude Opus 4 [1]
- In individual categories, DeepSeek ranks 6th in comprehensive text capabilities, 2nd in programming, 4th in high-difficulty prompts, and 5th in mathematics [1]
- The model is noted as the strongest open-source model currently available, released under the MIT license [1]
Kimi's new model that beats DeepSeek accused of being a "shell" of Qwen? Here's what happened
Hu Xiu· 2025-06-17 12:15
Core Viewpoint
- The release of the open-source model Kimi-Dev-72B by Moonshot AI has set a new record on software-engineering task benchmarks, achieving 60.4% on SWE-bench Verified and surpassing several competitors, including DeepSeek [1][3]

Model Development
- Kimi-Dev-72B is based on the Qwen/Qwen2.5-72B model, meaning it is not an entirely original model but a fine-tuned version trained on a large dataset of GitHub issues and PR submissions [2][3]
- The innovation in Kimi-Dev lies in its training methodology, which uses large-scale reinforcement learning to autonomously fix real code-repository issues inside a Docker environment [3]

Licensing and Compliance
- Kimi-Dev-72B is released under the MIT license, but it must comply with the original licensing restrictions of Qwen2.5-72B, which is governed by the Qwen LICENSE AGREEMENT [4][5]
- The licensing controversy stems from questions about whether Moonshot AI obtained special permission to use Qwen2.5-72B, as the agreement requires a commercial license once monthly active users exceed 100 million [6][7]

Community Response
- The Qwen team clarified that it had not granted permission for the use of Qwen2.5-72B, but later described the issue as a "legacy problem" of its evolving licensing strategy [8][10]
- With the upcoming Qwen3 series, the Qwen team has moved to a more open licensing model, adopting the Apache 2.0 license for all models to foster a more open and active AI ecosystem [12][13]

Industry Implications
- The case illustrates a shift in the AI industry toward open-source collaboration, moving from restrictive licensing to more open models that encourage developer engagement and innovation [16][18]
- The rising trend of "secondary innovation" built on strong foundation models highlights the importance of differentiation for value creation in the open-source ecosystem [16]
MiniMax releases open-source hybrid-architecture reasoning model M1, which needs only about 30% of DeepSeek R1's compute
news flash· 2025-06-17 08:32
Core Insights
- MiniMax, an AI unicorn based in Shanghai, has officially launched the open-source inference model MiniMax-M1 ("M1") [1]
- M1 is claimed to be the world's first open-weight large-scale hybrid-attention inference model [1]
- The model combines a Mixture-of-Experts (MoE) architecture with Lightning Attention, achieving significant breakthroughs in performance and inference efficiency [1]
- Test data indicate that the M1 series surpasses most closed-source models in long-context understanding and code-generation productivity scenarios, trailing only slightly behind the top closed-source systems [1]
MiniMax open-sources its first reasoning model: 456B parameters, performance surpassing DeepSeek-R1, technical report published
36Kr· 2025-06-17 08:15
Core Insights
- MiniMax has launched the world's first open-source large-scale hybrid-architecture inference model, MiniMax-M1, with a five-day rolling update plan [2]

Model Specifications
- The M1 model has 456 billion parameters, activating 45.9 billion per token; it supports 1-million-token context input and the industry's longest inference output of 80,000 tokens, 8 times that of DeepSeek-R1 [4]
- Two versions of MiniMax-M1 were trained, with thinking budgets of 40k and 80k [4]

Training and Cost
- Training used 512 H800 GPUs over three weeks, costing approximately $537,400 (around 3.859 million RMB), an order of magnitude lower than initial cost expectations [7]
- The M1 model is available for unlimited free use on the MiniMax app and web [7]

API Pricing Structure
- API pricing for M1 is tiered by input length:
  - 0-32k input: 0.8 RMB per million input tokens, 8 RMB per million output tokens
  - 32k-128k input: 1.2 RMB per million input tokens, 16 RMB per million output tokens
  - 128k-1M input: 2.4 RMB per million input tokens, 24 RMB per million output tokens [7][11]
- Compared with DeepSeek-R1, M1's first-tier input price is 80% and its output price 50% of DeepSeek-R1's, while the second-tier input price is 1.2 times higher [9]

Performance Evaluation
- MiniMax-M1 outperforms models such as DeepSeek-R1 and Qwen3-235B on complex software engineering, tool use, and long-context tasks [13][14]
- In the MRCR test, M1 performs slightly below Gemini 2.5 Pro but better than the other models [13]
- On the SWE-bench Verified test set, M1-40k and M1-80k perform slightly worse than DeepSeek-R1-0528 but better than other open-source models [14]

Technical Innovations
- M1 employs a mixture-of-experts (MoE) architecture and a lightning attention mechanism, allowing efficient scaling to long inputs and complex tasks [16]
- The model is trained with large-scale reinforcement learning (RL), using a new CISPO algorithm that enhances performance by optimizing importance-sampling weights [16][17]

Future Directions
- MiniMax emphasizes the need for "Language-Rich Mediator" agents to handle complex scenarios requiring dynamic resource allocation and multi-round reasoning [19]
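The tiered schedule reported for M1's API can be sketched as a small cost calculator. This is a hedged illustration using only the prices quoted in the article (tier selected by input length, prices in RMB per million tokens); the function name is hypothetical:

```python
# Tiered API pricing for MiniMax M1 as quoted in the article:
# (input-length ceiling, RMB per 1M input tokens, RMB per 1M output tokens)
TIERS = [
    (32_000,    0.8,  8.0),   # 0-32k input
    (128_000,   1.2, 16.0),   # 32k-128k input
    (1_000_000, 2.4, 24.0),   # 128k-1M input
]

def m1_api_cost(input_tokens: int, output_tokens: int) -> float:
    """Return the cost in RMB of one request under the quoted schedule."""
    for ceiling, in_price, out_price in TIERS:
        if input_tokens <= ceiling:
            return (input_tokens * in_price + output_tokens * out_price) / 1_000_000
    raise ValueError("input exceeds the 1M-token context limit")

# Example: a 100k-token input with a 10k-token output falls in the 32k-128k tier:
# 100k * 1.2/1M + 10k * 16/1M = 0.12 + 0.16 = 0.28 RMB
print(m1_api_cost(100_000, 10_000))
```

Note the schedule prices the whole request at the rate of the tier its input length lands in, rather than splitting tokens across tiers, which matches how the article describes the intervals.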