DeepSeek

Liang Wenfeng and Yang Zhilin "Collide" Again
Huxiu APP · 2025-05-04 08:29
Core Viewpoint - The article surveys competition in large-model development in China, focusing on the progress and challenges of DeepSeek and Kimi and on the market pressure from larger tech firms such as Alibaba and Tencent [2][4][12].
Group 1: Model Developments
- DeepSeek launched its new model, DeepSeek-Prover-V2, at 671 billion parameters, far larger than the previous version's 7 billion, with improved efficiency and accuracy on mathematical tasks [2][9].
- The Kimi team at Moonshot AI also released a model for formal theorem proving, in smaller 1.5-billion and 7-billion-parameter sizes, which achieved an 80.7% pass rate on the miniF2F benchmark [2][3].
- DeepSeek's model lines have evolved in step with one another, with the Prover series updated from March 2024 through the latest Prover-V2 in April 2025 [8][9].
Group 2: Competitive Landscape
- DeepSeek faces growing competition from Alibaba's new Qwen3, billed as a hybrid reasoning model with superior performance despite having only one-third the parameters of DeepSeek's R1 model [14][15].
- Kimi grew rapidly, reaching 20 million monthly active users within a year, but is now challenged by Tencent's Yuanbao, which has overtaken Kimi in user numbers through aggressive marketing [12][13].
- The article argues that the Chinese market needs multiple leading models, and that competition and innovation should be encouraged rather than attention being fixed on a single dominant player [14][15].
Group 3: Future Directions
- DeepSeek's founder has named three paths toward AGI: mathematics and code, multimodal learning, and natural language processing, viewing mathematics as a verifiable system for high-level intelligence [7].
- The upcoming R2 model is expected to enhance reinforcement learning capabilities, while the V4 model may need a longer development cycle because of significant changes in pre-training methods [10][11].
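To make "formal theorem proving" concrete: in a proof assistant such as Lean 4, a statement counts as proved only when the proof checker accepts it, which is what lets a pass rate on a benchmark like miniF2F be verified mechanically. The toy statement below is an illustrative assumption; it is not a miniF2F problem and not output from either model.

```lean
-- Toy example (hypothetical, not from miniF2F):
-- addition on natural numbers is commutative.
theorem toy_add_comm (a b : Nat) : a + b = b + a :=
  Nat.add_comm a b
```

A prover model's job is to generate the part after `:=`; the checker then either accepts or rejects it, with no partial credit.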
How Does DeepSeek's Open-Source File System Improve Large-Model Efficiency?
Jiqizhixin (Synced) · 2025-05-04 04:57
Core Viewpoint - DeepSeek has open-sourced 3FS, a high-performance distributed file system built for AI training and inference workloads, which significantly improves data access efficiency for large models [3][4].
Group 1: Overview of 3FS
- 3FS (Fire-Flyer File System) is designed to use modern SSDs and RDMA networks to accelerate data access on the DeepSeek platform [7].
- The system reaches an aggregate read throughput of 6.6 TiB/s across a 180-node cluster, speeding up data preprocessing, dataset loading, checkpoint saving/loading, embedding vector search, and KVCache lookup for large models [3].
Group 2: Distributed File System Functionality
- A distributed file system presents itself to applications as if it were a local file system, so operations work transparently across multiple machines [9][10].
- Its advantages include handling massive data (up to PB level), throughput beyond what a single machine can deliver, fault tolerance, and redundancy [11].
- Typical uses include parallel processing frameworks, machine learning training pipelines, large internal code/data repositories, and industry-specific applications [12].
Group 3: Components of 3FS
- 3FS consists of four main node types:
  - **Meta**: Manages metadata such as file locations and attributes [19].
  - **Mgmtd**: Controls cluster configuration and node discovery [19].
  - **Storage**: Manages actual file data on physical disks [30].
  - **Client**: Communicates with other nodes to perform file operations [19].
Group 4: CRAQ Protocol
- CRAQ (Chain Replication with Apportioned Queries) is the protocol 3FS uses to ensure strong consistency and fault tolerance [36].
- Write operations are processed sequentially along a chain of nodes, with each entry marked as "dirty" until it is committed and marked as "clean" [38][41].
- CRAQ performance varies by workload; write throughput and latency are limited by the slowest node in the chain [47].
Group 5: Comparison with Other Systems
- 3FS shares common components with other distributed file systems but differs in its implementation and performance characteristics [54].
- The system's performance is still under evaluation, with limited benchmarking available for comparison with single-node systems and other distributed file systems [55].
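The dirty/clean write path described for CRAQ can be sketched in a few lines of Python. This is a simplified in-memory model under stated assumptions, not the 3FS implementation: the `Node` and `Chain` classes, the `start_write`/`finish_write` split (used only to simulate a write that is still in flight), and the rule that a node holding a dirty entry defers to the tail's committed value are all illustrative.

```python
class Node:
    """One replica in the chain (hypothetical model, not 3FS code)."""

    def __init__(self, name):
        self.name = name
        self.clean = {}  # key -> (version, value): latest committed entry
        self.dirty = {}  # key -> {version: value}: written, not yet committed

    def stage(self, key, version, value):
        # Record an uncommitted ("dirty") version of the key.
        self.dirty.setdefault(key, {})[version] = value

    def commit(self, key, version):
        # Promote a dirty version to the committed ("clean") entry.
        value = self.dirty[key].pop(version)
        self.clean[key] = (version, value)

    def has_dirty(self, key):
        return bool(self.dirty.get(key))


class Chain:
    def __init__(self, names):
        self.nodes = [Node(n) for n in names]  # head first, tail last
        self.next_version = 1

    def start_write(self, key, value, reach):
        # Propagate a write from the head to the first `reach` nodes
        # (reach >= 1), leaving it dirty there.
        version = self.next_version
        self.next_version += 1
        for node in self.nodes[:reach]:
            node.stage(key, version, value)
        return version

    def finish_write(self, key, version):
        # Finish propagating to the tail, then commit from tail back to head.
        value = self.nodes[0].dirty[key][version]
        for node in self.nodes:
            node.stage(key, version, value)  # idempotent for earlier nodes
        for node in reversed(self.nodes):
            node.commit(key, version)

    def read(self, index, key):
        # Any node may serve a read; a node with a dirty entry defers
        # to the tail, which only ever exposes committed data.
        node = self.nodes[index]
        source = self.nodes[-1] if node.has_dirty(key) else node
        return source.clean[key][1]


def demo():
    chain = Chain(["head", "middle", "tail"])
    v1 = chain.start_write("k", "old", reach=3)
    chain.finish_write("k", v1)
    v2 = chain.start_write("k", "new", reach=2)  # in flight, tail not reached
    during = [chain.read(i, "k") for i in range(3)]
    chain.finish_write("k", v2)
    after = [chain.read(i, "k") for i in range(3)]
    return during, after
```

In this model, every read issued while the second write is in flight still returns the previously committed value, and every read after the commit returns the new one, which is the strong-consistency behavior the summary attributes to CRAQ.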
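Good Jobs, Like Good Men, Aren't on the Open Market
36Kr · 2025-05-03 10:25
The following article is from Zhichang Bonus (职场Bonus), by Wen Hua.
Zhichang Bonus: Watch the workplace for dividends and take the anxiety out of your career. 36Kr's coverage of workplace trends, seeking future career opportunities the way an investor would.
Every era has its own dividends.
By | Wen Hua  Source | Zhichang Bonus (ID: ZhiChangHonhLi)  Cover image | still from "Tokyo Girl" (东京女子图鉴)
Mi Lan has been looking for a job for more than half a year. Two years ago she left a major e-commerce company to become a Xiaohongshu blogger, and diligently built up 30,000 followers.
Life as a content creator looks glamorous on the surface, but behind it lies irrepressible anxiety: inscrutable platform rules, traffic that rises and falls, and monetization counted in the hundreds.
She decided to go back to a regular job; after all, the income is guaranteed "come drought or flood," and her daily routine could be pulled back "on track."
She sent out a hundred-odd résumés on job platforms, only to find that her once-coveted e-commerce operations background now draws little interest. The replies she did get were either for outsourced e-commerce positions or asked whether she wanted to become an insurance broker. The openings she could apply to were the same few again and again, posted from before the New Year until after it.
"Maybe good jobs, like good men, just aren't on the open market."
Mi Lan had to swallow her pride and turn to former colleagues at her old company for referrals. But after the warm responses, things always fizzled out.
"Maybe the gap was too long," or maybe it is because she is "almost 30," Mi Lan guesses. "Companies will surely choose someone younger, who doesn't come with marriage-and-childbirth costs."
Fortunately her content-creation work is still there as a fallback; the job search can go slowly, and she has grown zen about it.
Wen ...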
Weekly Top Picks: Nonfarm Payrolls Unexpectedly Strong, No Consensus in US-Japan Tariff Talks
Jin Shi Shu Ju· 2025-05-02 13:25
Market Overview
- The US dollar index is expected to record a second consecutive week of gains, benefiting from eased concerns over the global trade war and recovering above the 100 mark for the first time since April 16 [1].
- Spot gold has recorded a second consecutive week of declines, trading at $3,344 per ounce, due to reduced safe-haven demand and profit-taking ahead of the Labor Day holiday [1].
Currency Performance
- Non-USD currencies such as the euro and the Australian dollar have gained against the US dollar for the fourth consecutive month due to the dollar's decline [3].
Oil Market
- International oil prices have dropped significantly, with Brent crude down approximately 18% for April, as the US-led trade war weighs on economic growth and energy demand [6].
- Saudi Arabia has expressed reluctance to cut supply further to support prices, leading to a sharp decline in oil prices, although a subsequent threat from Trump to sanction buyers of Iranian oil caused a rebound [6].
Stock Market
- The S&P 500 index has achieved its best eight-day performance in over three years, driven by strong earnings from tech companies such as Microsoft and Meta, which eased fears over tariff impacts [10].
- Overall, the Dow Jones Industrial Average fell by 3.17% in April, its third consecutive monthly decline, while the Nasdaq rose by 0.85% [10].
Investment Bank Insights
- Deutsche Bank noted that despite the market recovery, US assets still face resistance from foreign buyers [13].
- Morgan Stanley highlighted uncertainty over tariff policy and the independence of the Federal Reserve, which may reduce foreign investment in the US [13].
- Barclays recommended that investors re-establish long positions in five-year US Treasuries [13].
Economic Data
- The US economy showed signs of fatigue, with consumer spending growth at a two-year low and a surprising contraction in GDP for Q1 2025 [16][17].
- Non-farm payroll data for April showed an increase of 177,000 jobs, exceeding expectations, while the unemployment rate remained at 4.2% [17][18].
Trade Developments
- Trump signed an executive order exempting imported cars and parts from steel and aluminum tariffs, aiming to relieve pressure on the US auto industry [19].
- Ongoing trade negotiations with Japan have yet to reach consensus, with Japan opposing US proposals on tariffs [19][20].
Ukraine and Mineral Agreement
- The US and Ukraine have signed a mineral agreement to establish a reconstruction investment fund, emphasizing joint energy development without addressing Ukraine's debt issues [21][22].
Oil Sanctions on Iran
- The US has intensified sanctions on Iranian oil, warning countries and individuals to cease purchases or face secondary sanctions [23].
Saudi Oil Supply Strategy
- Saudi Arabia has indicated a shift in strategy: it is no longer willing to cut oil supply to support prices and may increase production to gain market share [24].
Corporate Developments
- Elon Musk is gradually stepping back from his role in the White House, while Tesla's board remains confident in his leadership despite stock price declines [25].
- The Bank of Japan maintained its interest rate but lowered GDP growth forecasts due to global trade uncertainties [26][27].
Gold Demand
- The World Gold Council reported that global gold demand in Q1 2025 reached its highest level since 2016, driven by significant inflows into gold ETFs [28].
Is It Time to Short Nvidia?
Meigu Yanjiushe (美股研究社) · 2025-05-02 10:26
Core Viewpoint - The market's reaction to DeepSeek's rise does not justify indiscriminate selling of Nvidia stock; the situation is not as dire as perceived [1].
Group 1: Market Perception and Competition
- Before DeepSeek released its R1 model, it was widely believed that China lagged far behind the US in AI; Eric Schmidt put the US lead at 2-3 years, citing chip bans and investment disparities [2].
- DeepSeek's earlier models failed to gain traction, but R1 demonstrated that advanced models can be developed on older GPUs, which could increase GPU demand as AI adoption widens [3].
- Nvidia's sales breakdown shows that only 47% of its revenue comes from the US, which underlines the importance of other regions such as Singapore, a billing hub rather than a primary shipping destination [6][7].
Group 2: Risks and Developments
- The ban on Nvidia's H20 and A100 chips for China poses a risk; DeepSeek reportedly owns around 10,000 A100 chips, acquired through significant investment from the High-Flyer Quant Fund [9].
- China is investing heavily in developing its own chips to reduce reliance on Nvidia; if successful, this could affect about 20% of Nvidia's sales [10].
- DeepSeek is reportedly using Huawei's Ascend 910B chips for its upcoming R2 model, which could disrupt Nvidia's market position if confirmed [12][15].
Group 3: Future Implications
- If DeepSeek announces the use of Huawei chips for R2, Nvidia's stock price could drop significantly, as it did after the R1 release [16].
- The potential for a decline in Nvidia's stock is high, given current market dynamics and the possibility of DeepSeek shifting to local chip suppliers [17].
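Shocking AI Scandal: Is Meta's Score-Gaming Confirmed? Top Leaderboard's Inner Workings Exposed, Denounced by Stanford and MIT
Yuan Daxia (猿大侠) · 2025-05-02 04:23
Reposted from: Xinzhiyuan (新智元). Editor: Editorial Dept., ZJH
[Introduction] LMArena has just been plunged into a major controversy: researchers from Stanford, MIT, Ai2 and other institutions have jointly published a paper charging that the leaderboard has been used by Meta and other companies to manipulate rankings behind the scenes. Karpathy has also weighed in to back the criticism. LMArena promptly responded that the paper contains multiple errors and that the accusations are untrue.
More and more people are finding that the large-model leaderboard LMArena may have been gamed by the big companies.
Recently, researchers from Cohere, Princeton, Stanford, Waterloo, MIT, Ai2 and other institutions jointly released a new paper that lays out detailed evidence and accuses AI companies of gaming LMArena to inflate their scores, climbing over their competitors in the process.
Paper: https://arxiv.org/abs/2504.20879
Meanwhile, Andrej Karpathy, a prominent AI figure and founding member of OpenAI, also weighed in and shared a personal experience.
A while ago, a Gemini model ranked first on LMArena, far ahead of second place.
But after Karpathy switched to it, he felt it was worse than the model he had been using before.
Conversely, at around the same time, his personal experience was that Claude 3.5 was the best, yet it ranked very low on LMArena.
| Rank* (UB) | Model | Arena Sco ...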
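Hong Kong Stock Movers | AI Concept Stocks Active; China's Large-Model Sector Sees a Flurry of Recent Moves; Institutions Expect a Golden Development Period for the Industry
Zhitong Finance (智通财经网) · 2025-05-02 03:08
Zhitong Finance APP has learned that AI concept stocks were broadly active in early trading. As of press time, Kingsoft (03888) was up 4.92% at HK$40.5; Xiaomi Group-W (01810) was up 4.3% at HK$52.1; Alibaba-W (09988) was up 3.23% at HK$121.3; GDS Holdings-SW (09698) was up 3.16% at HK$26.1; Weimob (02013) was up 2.38% at HK$1.72; and Tencent Holdings (00700) was up 2.01% at HK$486.8. On the news front, China's large-model sector has seen a flurry of recent activity. On April 30, DeepSeek released a new model, DeepSeek-Prover-V2-671B, on the AI open-source community Hugging Face. The model uses the DeepSeek-V3 architecture, has 671 billion parameters, uses a MoE design, and has 61 Transformer layers and a 7168-dimensional hidden layer. Earlier, Xiaomi's large-model team announced through the "Xiaomi MiMo" official account the launch of Xiaomi MiMo, an open-source large model focused on reasoning. On April 29, Alibaba open-sourced Qwen3, the new generation of its Tongyi Qianwen models. StepFun, Baidu, Kling and others have also released new models. A recent Ping An Securities research report noted that China's large-model industry currently has good momentum. Domestic large models, represented by the DeepSeek series, have seen their performance ...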
BMW China Announces DeepSeek Integration: Has BMW Compromised?
36Kr · 2025-05-02 02:21
Core Viewpoint - BMW China is embracing local AI technology by integrating DeepSeek, marking a significant step in its digital transformation strategy and enhancing its AI capabilities in the Chinese market [1][3][6].
Group 1: BMW's AI Integration
- BMW has announced the integration of DeepSeek into its operations, which will enhance the BMW Intelligent Personal Assistant and improve human-machine interaction in new models starting from Q3 2025 [1][2].
- The collaboration with DeepSeek follows BMW's earlier partnership with Alibaba to develop AI language models, showcasing BMW's commitment to local AI ecosystem development [1][3].
Group 2: Strategic Importance of Local AI
- The move signifies BMW's recognition of the importance of local AI technologies and its willingness to adapt to the rapidly evolving Chinese automotive market [3][4].
- BMW's previous initiatives, such as the launch of a 360-degree AI strategy and the development of intelligent systems like "Car Expert" and "Travel Companion," reflect its ongoing efforts to enhance its smart vehicle offerings [3][4].
Group 3: Challenges and Opportunities
- Despite its historical strengths in manufacturing and brand image, BMW faces challenges in keeping pace with the increasing demand for smart and connected vehicles [4][5].
- The partnership with DeepSeek is seen as a strategic decision to accelerate BMW's digital transformation and leverage the advanced technologies and innovative models of Chinese tech companies [4][6].
Internet Giants Rush to Open-Source New Models Before May Day: With Differing Strategies, Who Will Stay at the Table?
Nan Fang Du Shi Bao· 2025-05-01 14:12
Core Insights - Major domestic AI model companies are rapidly open-sourcing their models ahead of the May Day holiday, with Alibaba releasing Qwen3, Xiaomi launching Xiaomi MiMo, and DeepSeek introducing DeepSeek-Prover-V2 [1][2][5].
Alibaba
- Alibaba's Qwen3 features two MoE models with 30B and 235B parameters, and six dense models ranging from 0.6B to 32B, achieving state-of-the-art performance in its category [2].
- Qwen3 is the first "hybrid reasoning model" in China, integrating fast and deep thinking capabilities and significantly reducing computational power consumption [5].
- Alibaba has consistently open-sourced various models this year, including the 14B video generation model and the 7B multimodal model, aiming to leverage open-source models for AI applications while monetizing its cloud services [6].
Xiaomi
- Xiaomi's MiMo model, with only 7B parameters, outperformed OpenAI's closed-source model o1-mini in public benchmarks for mathematical reasoning and coding competitions [6].
- This marks Xiaomi's first foray into open-sourcing its models, developed by its newly established Core team [6].
DeepSeek
- DeepSeek has released two versions of DeepSeek-Prover-V2, focusing on mathematical theorem proving and achieving significant performance improvements in benchmark tests [8].
- The new models support extensive context inputs and are based on previous versions, showcasing a commitment to enhancing reasoning capabilities [8].
Industry Trends
- The open-sourcing of models by these companies is seen as a strategic move to enhance competitiveness against closed-source models from companies like OpenAI and Anthropic, which still hold a slight performance edge [9][10].
- Industry experts predict a consolidation in the AI model sector, with DeepSeek, Alibaba, and ByteDance emerging as the leading players in China, while the U.S. market remains competitive with companies like xAI and OpenAI [10][11].
- The open-source models are expected to democratize AI technology, making it more accessible and promoting innovation across various industries [9][10].
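Top AI Leaderboard's Inner Workings Exposed: Is Meta's Score-Gaming Confirmed?
Huxiu APP · 2025-05-01 13:51
This article is from the WeChat official account Xinzhiyuan (新智元); author: Xinzhiyuan; editor: ZJH; original title: "Shocking AI Scandal: Is Meta's Score-Gaming Confirmed? Top Leaderboard's Inner Workings Exposed, Denounced by Stanford and MIT"; header image: AI-generated.
More and more people are finding that the large-model leaderboard LMArena may have been gamed by the big companies.
Recently, researchers from Cohere, Princeton, Stanford, Waterloo, MIT, Ai2 and other institutions jointly released a new paper that lays out detailed evidence and accuses AI companies of gaming LMArena to inflate their scores, climbing over their competitors in the process.
Paper: https://arxiv.org/abs/2504.20879
Meanwhile, Andrej Karpathy, a prominent AI figure and founding member of OpenAI, also weighed in and shared a personal experience.
A while ago, a Gemini model ranked first on LMArena, far ahead of second place.
But after Karpathy switched to it, he felt it was worse than the model he had been using before.
Conversely, at around the same time, his personal experience was that Claude 3.5 was the best, yet it ranked very low on LMArena.
| Rank* (UB) | Model | Arena | 95% CI | Votes | Organization | License |
| -- ...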