DeepSeek V3 Model
Sources: DeepSeek to Release Its Latest Flagship AI Model in February
Xin Lang Cai Jing· 2026-01-09 13:33
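According to two people with direct knowledge of the matter, DeepSeek (深度求索) is expected to launch its next-generation flagship AI model in the coming weeks, with strong code generation as its headline capability.
The two sources said the new model, codenamed V4, is an iteration of the V3 model that DeepSeek released in December 2024. Preliminary tests run by DeepSeek employees on the company's internal benchmarks show that it outperforms existing mainstream models such as Anthropic's Claude and OpenAI's GPT series at code generation.
The sources said DeepSeek plans to launch V4 in mid-February, around the Lunar New Year, although the exact timing could still change. DeepSeek has not yet responded to a request for comment.
The two sources said V4 achieves a technical breakthrough in handling and parsing very long coding prompts, a significant advantage for engineers working on complex software projects. In addition, the model's ability to understand data patterns has been improved across the entire training process, with no performance degradation.
Training an AI model requires repeated learning over massive datasets, but the accuracy of pattern recognition often degrades over multiple training rounds. Developers with large AI chip clusters can usually address this by adding more training rounds.
One of the sources said users may find that V ...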
Sources: DeepSeek to Release Its Latest Flagship AI Model in February
Xin Lang Cai Jing· 2026-01-09 13:23
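According to two people with direct knowledge of the matter, DeepSeek (深度求索) is expected to launch its next-generation flagship AI model in the coming weeks, with strong code generation as its headline capability.
The two sources said the new model, codenamed V4, is an iteration of the V3 model that DeepSeek released in December 2024. Preliminary tests run by DeepSeek employees on the company's internal benchmarks show that it outperforms existing mainstream models such as Anthropic's Claude and OpenAI's GPT series at code generation.
The sources said DeepSeek plans to launch V4 in mid-February, around the Lunar New Year, although the exact timing could still change.
The launch of V3 helped DeepSeek make its name in the global AI field, while the release of R1 shook Silicon Valley and Wall Street and propelled DeepSeek onto the world stage. R1 is an open-source "reasoning" model designed to "think" deeply about a user's query before answering, in order to solve complex problems. The model drew wide attention because DeepSeek's training cost was relatively low compared with leading US-developed models, yet its performance was impressive.
In the domestic market, DeepSeek also launched a chatbot that combines the capabilities of R1 and V3, which quickly became popular.
According to two people with direct knowledge of the matter, DeepSeek ...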
Free or Paid? The Internet's Money-Making Playbook and the Fundamental Divide Between Business Models
Sou Hu Cai Jing· 2025-12-07 21:28
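From Adobe to Salesforce, a culture of paying for software has built up over decades, and Netflix and Spotify users will even pay of their own accord to remove ads, just for a clean, hassle-free experience.
Users think differently, so companies naturally make money in very different ways. What China's internet companies do best is "free up front to pull in users, monetize in the back end to fill the hole."
Chinese companies are "masters of cost control" in this respect: off-peak scheduling eases server load, cache optimization cuts bandwidth costs, and DeepSeek's V3 model cost only $5.57 million to develop, less than one-tenth of comparable Western models.
The underlying logic of the divide
Put simply, the idea is to pull users in first, squeeze out paid rivals, and monetize later through enterprise APIs and premium features. In the early days of the internet this was unquestionably the approach best attuned to the local market. Western companies instead follow a "tiered pricing, selling a good experience" path.
ChatGPT not only charges individual users a subscription fee but also earns most of its money from business-side API calls and custom solutions. Behind the high prices are real costs: server clusters for global service, salaries for top-tier teams, and astronomical electricity bills all have to be covered by paying customers.
The upside of this model is that charging itself filters for high-quality users. Combined with mature credit card payments and strong awareness of intellectual property, this path has proven very stable.
And that is not all: policy is also fanning the flames. In 2024 OpenAI blocked its API service in China, and domestic models immediately used the "filing system" to grab market ...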
University Lecture | Weikezhi (未可知) x Luiss University: Dr. Du Yu's Cross-Cultural Communication Class "AI and the Future of Narrative"
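Recently, Dr. Du Yu, president of the Weikezhi (未可知) AI Research Institute, was invited to the cross-cultural communication class at Luiss Guido Carli University (Luiss ROMA) in Italy, where he gave a talk titled "AI and the Future of Narrative: Navigating a New Era of Media, Journalism, and Strategic Communication."
Luiss ROMA is a well-known Italian comprehensive university, recognized for its research in the social sciences, business, and communication. Its cross-cultural communication course focuses on media change and innovation in communication from a global perspective, and the invitation to Dr. Du was intended to build a bridge for dialogue between Chinese and Western AI communication practice for faculty and students.
A veteran expert in artificial intelligence and a drafter of China's compliance guide for generative AI data applications, Dr. Du organized his talk around three core topics: AI development in practice in China, its transformative effect on business communication, and innovative applications in the media industry. The session combined a global perspective with local insight.
AI development in China: from technology wave to ecosystem restructuring, with DeepSeek as the key breakthrough
Dr. Du opened by systematically tracing the development of China's AI industry.
He noted that China's AI industry has gone through two boom cycles, the "four little dragons of computer vision" and the "six little tigers of large language models," and that the rise of large language models has directly driven the continued expansion of China's AI market, which now accounts for 20% of the global market.
Guided by the "AI+" national strategy, the internet, telecommunications, ...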
Understanding DeepSeek and OpenAI in One Article: Why Do Entrepreneurs Need Cognitive Innovation?
Hun Dun Xue Yuan· 2025-06-10 11:07
Core Viewpoint
- The article emphasizes the transformative impact of AI technology on business innovation and the necessity for companies to adapt their strategies to remain competitive in the evolving landscape of AI [1][2].
Group 1: OpenAI's Emergence
- OpenAI was founded in 2015 by Elon Musk and Sam Altman with the mission to counteract the monopolistic power of major tech companies in AI, aiming for an open and safe AI for all [9][10][12].
- The introduction of the Transformer architecture by Google in 2017 revolutionized language processing, enabling models to understand context better and significantly improving training speed [13][15].
- OpenAI's belief in the Scaling Law led to unprecedented investments in AI, resulting in the development of groundbreaking language models that exhibit emergent capabilities [17][19].
Group 2: ChatGPT and Human-Machine Interaction
- The launch of ChatGPT marked a significant shift in human-machine interaction, allowing users to communicate in natural language rather than through complex commands, thus lowering the barrier to AI usage [22][24].
- ChatGPT's success not only established a user base for future AI applications but also reshaped perceptions of human-AI collaboration, showcasing vast potential for future developments [25].
Group 3: DeepSeek's Strategic Approach
- DeepSeek adopted a "Limited Scaling Law" strategy, focusing on maximizing efficiency and performance with limited resources, contrasting with the resource-heavy approaches of larger AI firms [32][34].
- The company achieved high performance at low costs through innovative model architecture and training methods, emphasizing quality data selection and algorithm efficiency [36][38].
- DeepSeek's R1 model, released in January 2025, demonstrated advanced reasoning capabilities without human feedback, marking a significant advancement in AI technology [45][48].
Group 4: Organizational Innovation in AI
- DeepSeek's organizational model promotes an AI Lab paradigm that fosters emergent innovation, allowing for open collaboration and resource sharing among researchers [54][56].
- The dynamic team structure and self-organizing management style encourage creativity and rapid iteration, essential for success in the unpredictable field of AI [58][62].
- The company's approach challenges traditional hierarchical models, advocating for a culture that empowers individuals to explore and innovate freely [64][70].
Group 5: Breaking the "Thought Stamp"
- DeepSeek's achievements highlight a shift in mindset among Chinese entrepreneurs, demonstrating that original foundational research in AI is possible within China [75][78].
- The article calls for a departure from the belief that Chinese companies should only focus on application and commercialization, urging a commitment to long-term foundational research and innovation [80][82].
Xiaohongshu Open-Sources a 142-Billion-Parameter Large Model, With Some Performance on Par With Alibaba's Qwen3
Tai Mei Ti A P P· 2025-06-10 01:07
Core Insights
- Xiaohongshu has recently open-sourced its first self-developed large model, dots.llm1, through platforms like GitHub and Hugging Face [2][9].
- The model has been trained using 11.2 trillion high-quality tokens, significantly outperforming the open-source TxT360 data [5].
- Xiaohongshu's valuation has surged from $20 billion to $26 billion as of March 2023, surpassing the market values of companies like Bilibili and Zhihu [9].
Model Performance
- dots.llm1 is a mixture-of-experts (MoE) model with 142 billion parameters, activating only 14 billion during inference to reduce costs while maintaining performance [3][5].
- In various benchmarks, dots.llm1 shows competitive performance against Alibaba's Qwen models, particularly excelling in Chinese language tasks [7][8].
- The model achieved a score of 92.6 on CLUEWSC and 92.2 on C-Eval, indicating industry-leading performance in Chinese semantic understanding [7].
Training Efficiency
- The hi lab team has implemented advanced training techniques, achieving a 14% improvement in forward computation and a 6.68% improvement in backward computation compared to NVIDIA's Transformer Engine [5].
- Future plans include integrating more efficient architectural designs and exploring sparse MoE layers to enhance computational efficiency [10].
Strategic Direction
- Xiaohongshu is shifting focus from being merely a content community and live e-commerce platform to actively developing AI technologies, particularly large language models [9][10].
- The company aims to deepen its understanding of optimal training data and explore methods to achieve human-like learning efficiency [11].
DeepSeek Core Executive Leaves to Start a Company, Targeting the Agent Track | Exclusive
Hu Xiu· 2025-06-09 08:24
Core Insights
- A core executive from DeepSeek has left the company to start a new venture focused on the Agent sector, with plans to launch a product by Christmas 2025 [1].
- The executive, previously serving as the CTO, left during a peak period for DeepSeek, raising questions about the timing of the departure [1][2].
- The AI industry is witnessing a trend of high-level talent leaving established companies to pursue entrepreneurial opportunities, often leveraging their previous experience and reputation to secure funding [2][3].
Company Developments
- DeepSeek has recently released and open-sourced its V3 model and R1 reasoning model, marking a significant period of activity for the company [1].
- There is ongoing speculation regarding DeepSeek's potential financing or IPO plans, especially following its recruitment for several finance positions [4].
- Despite the recruitment of a CFO, insiders suggest that this is not related to immediate financing or IPO plans, indicating a cautious approach from DeepSeek's leadership [4].
Industry Trends
- The rapid pace of technological iteration in the AI sector creates numerous opportunities for startups, particularly for those with experienced talent from leading companies [3].
- The scarcity of AI talent with core technical expertise makes these individuals highly competitive in the entrepreneurial landscape [3].
- The trend of executives leaving large firms to innovate in more flexible environments is becoming a common occurrence in the AI industry [3].
DeepSeek Strikes Again! Upgraded R1 Delivers Major Performance Gains. Are Its US Rivals Panicking?
Jin Shi Shu Ju· 2025-05-30 03:52
Core Insights
- DeepSeek's R1 model has undergone a minor version upgrade, enhancing semantic understanding, complex logical reasoning, and long text processing stability [1].
- The upgraded model shows significant improvements in understanding capabilities and programming skills, capable of generating over 1000 lines of error-free code [1].
- The R1 model's cost-effectiveness is highlighted, being priced at 1/11 of Claude-3.7-Sonnet and 1/277 of GPT-4.5, while being open-source for commercial use [1].
Group 1
- The R1 model has gained global attention since its January release, outperforming Western competitors and causing a drop in tech stocks [2].
- Following the release of the V3 model, interest in DeepSeek has shifted towards the anticipated R2 model, which is expected to utilize a mixture of experts model with 1.2 trillion parameters [2].
- The latest version R1-0528 has sparked renewed media interest, showcasing competitive performance against OpenAI's models in code generation [2].
Group 2
- DeepSeek's low-cost, high-performance R1 model has positively influenced the Chinese tech stock market and reflects optimistic market expectations regarding China's AI capabilities [2].
- The upgrade has also shown improvements in reducing hallucinations, indicating that DeepSeek is not only catching up but competing with top models [1].
Breakfast Briefing | May 16, 2025
news flash· 2025-05-15 23:16
Group 1
- Federal Reserve Chairman Powell indicated a reassessment of key components of the 2020 monetary policy framework, suggesting that long-term interest rates may rise and "supply shocks" could become the new normal [1].
- The U.S. April PPI increased by 2.4% year-on-year, which was below expectations, while the month-on-month change was -0.5%, marking the largest decline in five years [1].
- U.S. April retail sales rose by 0.1% month-on-month, slightly exceeding expectations, but signs of weak consumer spending are emerging [1].
Group 2
- Trump signed a $200 billion commercial agreement with the UAE to collaborate on building a 5GW data center in the UAE [1].
- Qatar's sovereign wealth fund plans to invest $500 billion in the U.S. over the next decade as part of a "gift package" from Trump [1].
- Iran expressed willingness to reach an agreement with the U.S., with a senior advisor stating that Iran would commit to never developing nuclear weapons in exchange for the lifting of U.S. sanctions [1].
Group 3
- Hamas officials stated that they would hand over control of the Gaza Strip if a permanent ceasefire is achieved [1].
- Alibaba's Q4 revenue grew by 7% year-on-year, which was below expectations, while Alibaba Cloud accelerated growth at 18%, and AI revenue has seen triple-digit growth for seven consecutive quarters [1].
- Meta announced a delay in the release of its flagship AI model Behemoth, resulting in a more than 3% drop in its stock price [1].
Group 4
- CoreWeave received a 7% stake from Nvidia and is set to provide $4 billion in cloud computing capacity to OpenAI [1].
- Berkshire Hathaway significantly reduced its bank stock holdings in Q1, completely exiting its position in Citigroup, while maintaining its stake in Apple and doubling its holdings in a beer manufacturer, with some positions remaining confidential [1].
- Walmart's Q1 sales increased by 2.5%, slightly below expectations, with the CFO warning that tariff price increases may begin this month [1].
Former Google CEO Says the China-US Gap Has Closed
Sou Hu Cai Jing· 2025-05-09 06:41
Core Insights
- The article highlights a significant shift in the perception of China's technological capabilities, with former Google CEO Eric Schmidt acknowledging that China has transitioned from a "follower" to a peer running alongside and even a "leader" in advanced technologies like AI [1][3].
Group 1: Technological Advancements
- China has made notable breakthroughs in various sectors, including AI models, electric vehicles, and humanoid robots, despite U.S. sanctions on chip exports and technology [3][4].
- The DeepSeek V3 model has shown global leadership in non-reasoning benchmarks, and companies like Xiaomi have successfully mass-produced electric vehicles, indicating a robust technological ecosystem [3][4].
Group 2: Resilience and Innovation
- U.S. sanctions have inadvertently accelerated China's self-research, industry iteration, and talent development, leading to a more resilient and pragmatic technological ecosystem [3][6].
- China's ability to rapidly commercialize and scale technologies at lower costs is a key advantage, allowing for swift adoption of innovations across various sectors [4][6].
Group 3: Global Leadership Dynamics
- Schmidt warns that the U.S. must abandon its complacent belief in its natural technological superiority, as historical shifts in technological leadership have altered global power dynamics [6][9].
- China aims to capture 45% of the global manufacturing market by 2030, supported by a complete industrial chain, a dense talent pool, and a large domestic market [6][9].
Group 4: Perception Shift
- The West is transitioning from viewing itself as a technological leader to recognizing a crisis of innovation, as China's manufacturing is now seen as resilient and efficient rather than merely a cheap alternative [7][9].
- Schmidt's acknowledgment that "the U.S. must learn from China" signifies a recognition of China's technological achievements and the need for the U.S. to adapt [9].