Tencent Research Institute AI Digest 20251013
Tencent Research Institute · 2025-10-12 20:56
https://mp.weixin.qq.com/s/Re5rAiw_ULG54p_dwvVDqA

I. Terence Tao tests GPT-5 Pro: strong at the small and macro scales, shaky in the middle
1. Terence Tao used ChatGPT-5 Pro on an open math problem and found the AI performs well at the small scale (concrete computation and derivation) and the macro scale (grasping the overall problem structure);
2. At the middle scale (strategy selection, judging direction) the AI was of limited help and at times a distraction, agreeing too readily with the user's line of thought and failing to flag a key faulty assumption;
3. The AI successfully derived tools such as Minkowski's first integral formula, but showed clear limitations in analyzing complex non-circular geometries; the problem remains open.

II. Silicon Valley billionaire Chamath ditches US AI and leads a "defection" to Chinese models
1. Top Silicon Valley investor Chamath Palihapitiya publicly stated that his firm has shifted a large share of its workloads to China's Kimi K2 model, citing performance that is strong enough at a far lower price than OpenAI and Anthropic;
2. Key platforms in the US developer ecosystem, including Vercel, Cursor, and Perplexity, have integrated Kimi K2; developers "voting" with their code has become proof in the market;
3. The State of AI Report 2025 has also, for the first time, upgraded Chinese AI from "catching up" to ...
TIME announces the 2025 Best Inventions: zero picks from OpenAI, Chinese products dominate the list
36Kr · 2025-10-10 11:51
Today, TIME released its "2025 Best Inventions" list, naming 300 of the year's most noteworthy innovations.

Alongside DeepSeek R1, the Chinese AI model that has swept the globe, the list includes robots that can wash dishes and do housework for us, earbuds that translate in real time, and small desktop AI supercomputers that move the data center onto a desk…

Spanning nearly 40 categories, these inventions read together like a trailer for future life, hinting at how our daily routines will be reshaped: AI is no longer a tool for the few, but is starting to appear everywhere; robots are no longer confined to the lab, but are showing up in real kitchens; and in medicine, agriculture, entertainment, and beyond, seemingly small innovations are quietly changing our everyday lives.

Without further ado, let's look at which inventions captured the world's attention in 2025.

Besides AI, more AI: any talk of innovation today comes with the keyword AI. Phones have evolved into AI phones, and the apps on them have uniformly added AI features. Everyday study, work, and life now involve AI-powered healthcare, AI smart cities, and AI assistants of every stripe.

And when it comes to a product that truly brought AI into ordinary households, within everyone's reach, DeepSeek R1 is unavoidable. DeepSeek R1: a low-cost large language model. DeepSeek R1 arrived at the start of this year, ...
WeChat begins internal testing of a "batch message recall" feature
36Kr · 2025-10-10 07:56
Media reports say WeChat has begun internal testing of a "batch message recall" feature; I checked the versions on several of my phones, and none has been included in the gray-scale rollout yet.

Batch recall is an upgrade of the existing recall feature: it lets you recall, in one tap, all messages sent within a certain window. In terms of operation, a new option is added to the existing recall pop-up menu; tapping it recalls all messages sent within the past two minutes at once.

For example, some people habitually split one sentence into several messages: a complete sentence such as "Let's go swimming this afternoon" gets sent as "This afternoon" and "let's go swimming." I can see this feature helping in that kind of scenario: to recall the complete thought you would otherwise have to perform the recall twice in a row, and if you happen to hit the two-minute cutoff, part of it can no longer be recalled. One-tap recall of everything just sent clears all the fragments at once, which is a real gain in convenience.

It is especially valuable in discussions where one side rattles off a long string of messages, then comes to their senses, realizes it came out wrong, and wants to recall and start over. The same goes for messages that pair text with an image: recalling the text and the image used to take two separate steps, and now it is one tap.

For most everyday scenarios, though, single-message recall is already enough. The "batch message recall" feature WeChat is testing is a routine extension of an existing feature: better to have, but no real loss without it. Even deleting ...
Heard everyone is going all-in on post-training? The best guide is here
机器之心 · 2025-10-09 02:24
Core Insights - The article emphasizes the shift in focus from pre-training to post-training in large language models (LLMs), highlighting the diminishing returns of scaling laws as model sizes reach hundreds of billions of parameters [2][3][11]. Group 1: Importance of Post-Training - Post-training is recognized as a crucial phase for enhancing the reasoning capabilities of models like OpenAI's series, DeepSeek R1, and Google Gemini, marking it as a necessary step towards advanced intelligence [3][11]. - The article introduces various innovative post-training methods such as Reinforcement Learning from Human Feedback (RLHF), Reinforcement Learning from AI Feedback (RLAIF), and Reinforcement Learning with Verifiable Rewards (RLVR) [2][3][12]. Group 2: Transition from Pre-Training to Post-Training - The evolution from pre-training to instruction fine-tuning is discussed, where foundational models are trained on large datasets to predict the next token, but often lack practical utility in real-world applications [7][8]. - Post-training aims to align model behavior with user expectations, focusing on quality over quantity in the datasets used, which are typically smaller but more refined compared to pre-training datasets [11][24]. Group 3: Supervised Fine-Tuning (SFT) - Supervised Fine-Tuning (SFT) is described as a process that transforms a pre-trained model into one that can follow user instructions effectively, relying on high-quality instruction-answer pairs [21][24]. - The quality of the SFT dataset is critical, as even a small number of low-quality samples can negatively impact the model's performance [25][26]. Group 4: Reinforcement Learning Techniques - Reinforcement Learning (RL) is highlighted as a complex yet effective method for model fine-tuning, with various reward mechanisms such as RLHF, RLAIF, and RLVR being employed to enhance model performance [39][41]. 
- The article outlines the importance of reward models in RLHF, which are trained using human preference data to guide model outputs [44][46]. Group 5: Evaluation of Post-Training Models - The evaluation of post-training models is multifaceted, requiring a combination of automated and human assessments to capture various quality aspects [57][58]. - Automated evaluations are cost-effective and quick, while human evaluations provide a more subjective quality measure, especially for nuanced tasks [59][60].
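The SFT recipe described above, fine-tuning on high-quality instruction-answer pairs, typically optimizes the model only on the answer tokens while conditioning on (not predicting) the prompt. A minimal pure-Python sketch of that masked loss; the function name, the toy log-probabilities, and the 0/1 mask convention are assumptions for illustration, not code from the article:

```python
import math

def sft_loss(token_logprobs, loss_mask):
    """Average negative log-likelihood over answer tokens only.

    token_logprobs: per-token log-probabilities the model assigns to the
    reference sequence (prompt followed by answer).
    loss_mask: 1 for answer tokens, 0 for prompt tokens -- prompt tokens
    are excluded from the loss, so the model learns to produce the answer
    rather than to reproduce the instruction.
    """
    masked_nll = [(-lp) * m for lp, m in zip(token_logprobs, loss_mask)]
    n_answer = sum(loss_mask)
    return sum(masked_nll) / max(n_answer, 1)

# Toy example: 3 prompt tokens (masked out), 2 answer tokens.
logprobs = [math.log(0.1), math.log(0.2), math.log(0.3),  # prompt
            math.log(0.5), math.log(0.8)]                 # answer
mask = [0, 0, 0, 1, 1]
loss = sft_loss(logprobs, mask)
# Only the last two tokens contribute: the mean of -ln(0.5) and -ln(0.8).
```

In real training the log-probabilities come from the model's forward pass and the loss is backpropagated; the masking logic, however, is exactly this simple.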
Musk retweets a new benchmark from ByteDance Seed & Columbia Business School: when large models do finance, even a stock-price lookup can go wrong
Sohu Finance · 2025-09-21 02:34
Core Insights - The article discusses the launch of FinSearchComp, an open-source financial search and reasoning benchmark developed by ByteDance's Seed team in collaboration with Columbia Business School, aimed at evaluating AI's performance in financial analysis tasks [1][3][5] Evaluation Results - The best-performing model, Grok 4 (web), achieved an accuracy of 68.9% on the global dataset, which is still 6.1 percentage points behind human experts. In the Greater China dataset, Doubao (web) led with an accuracy of 53.3%, falling short by over 34 percentage points compared to human experts' 88.3% [1][11] Task Design - FinSearchComp includes three progressively challenging task categories that reflect the complexity of financial analysts' daily work: 1. Time-sensitive data fetching, focusing on real-time data like stock prices [7] 2. Simple historical lookup, requiring fixed-point fact retrieval [7] 3. Complex historical investigation, demanding multi-period aggregation and analysis [7] Data Reliability - The benchmark's quality is supported by ByteDance's Xpert platform, which provides expert knowledge and high-quality AI training data. The project involved 70 financial experts, ensuring data reliability through cross-validation from official sources and professional financial databases [9][10] Importance of Search Capability - The evaluation highlighted the critical role of search capabilities, with models equipped with web search functionality showing significant performance improvements across tasks. Models without search capabilities scored zero on time-sensitive tasks, emphasizing the necessity of real-time data access for accurate financial analysis [12][11] Industry Implications - The findings suggest that while AI can assist in financial data retrieval, it still has considerable room for improvement. 
The article advocates for the establishment of a comprehensive evaluation system for financial AI, akin to a "driving license" for AI products, to ensure reliability before they can fully replace human analysts [13].
DeepSeek R1 paper makes the cover of Nature; top OpenAI talent departs; longtime rivals Nvidia and Intel bury the hatchet | Hundun AI Weekly Focus
混沌学园 · 2025-09-19 11:58
Core Insights - Nvidia and Intel have formed a strategic partnership, with Nvidia investing $5 billion to acquire a 5% stake in Intel, leading to a 22% surge in Intel's stock price. The collaboration focuses on developing customized products for data centers and PCs, marking a significant shift in their competitive relationship [3][6][21]. Group 1: Technological Breakthroughs - DeepSeek's R1 model has been published in Nature, showcasing a training cost of approximately $294,000 and emphasizing the use of reinforcement learning instead of extensive manual data, setting a new standard for scientific transparency in AI [4][5]. - World Labs, founded by Fei-Fei Li, has released the Marble model, capable of generating persistent 3D worlds from a single image or text prompt, offering significant advancements in environmental modeling for various industries [9][10]. - Math Inc.'s AI agent, Gauss, has achieved a formal proof of a complex mathematical theorem in just three weeks, a task that took a renowned mathematician 18 months, indicating a major leap in AI's capabilities in research [15][17]. Group 2: Industry Trends - The AI talent gap in China has exceeded 5 million, with a supply-demand ratio of 1:10, prompting a shift towards vertical AI competitions as a means to identify and cultivate practical talent [7][24]. - OpenAI's report reveals that ChatGPT has over 700 million weekly active users, primarily using the platform for practical guidance, information search, and writing, indicating its growing global reach and application [12][13][22]. - The 2025 Smart Expo highlighted five key sectors: intelligent robotics, low-altitude economy, smart homes, intelligent driving, and digital cities, showcasing the competitive landscape among major tech companies [14][16]. 
Group 3: Product Developments - The launch of the "ZhiYue Agent All-in-One Machine" aims to address information management challenges for CEOs, providing a localized solution that integrates with existing enterprise systems [10][23]. - Meituan has introduced its first life service agent, "Xiao Mei," which simplifies user interactions across various services, indicating a shift towards more personalized AI applications in local services [17][18]. - ByteDance has released Seedream 4.0, a powerful image model integrated into its AI creative agent, revolutionizing e-commerce marketing content production [19][20]. Group 4: Strategic Movements - The departure of prominent AI researcher Yao Shunyu from OpenAI signifies a shift in focus from enhancing model capabilities to creating valuable real-world applications, suggesting new opportunities for startups [19][22]. - The collaboration between Nvidia and Intel is seen as a restructuring of the core value network in the chip industry, indicating a move towards ecosystem collaboration rather than zero-sum competition [21][24].
DeepSeek paper makes the cover of Nature; R1 becomes the first large model to pass rigorous academic peer review
Sina Finance · 2025-09-18 02:23
Core Insights - DeepSeek's R1 model has been recognized as the first major language model to be peer-reviewed and published in the prestigious journal Nature, marking a significant milestone in AI research [1][2] - The R1 model achieved over 10.9 million downloads on Hugging Face, making it the most popular open-source inference model globally [2] - DeepSeek's innovative approach utilizes pure reinforcement learning to enhance reasoning capabilities, diverging from traditional human-imitation methods [2][3] Company Developments - DeepSeek's R1 model was developed with a training cost of only $294,000, significantly lower than the costs associated with training AI models by OpenAI and Google, which can reach millions [2] - The company released an upgraded version, DeepSeek-V3.1, which features a mixed reasoning architecture, improved thinking efficiency, and enhanced agent capabilities [3] - DeepSeek was founded in 2023 in Hangzhou, backed by the quantitative fund High-Flyer (幻方), with a team composed of experts from top universities and international institutions [3] Industry Context - The publication of DeepSeek's research is seen as a critical step in addressing the rampant speculation and unverified claims within the AI industry, emphasizing the importance of independent peer review [3] - The recognition of DeepSeek's work by Nature highlights China's advancements in foundational research in large models, contributing to the global AI landscape [2]
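The "pure reinforcement learning" recipe attributed to R1 rewards the model only on automatically checkable outcomes rather than on imitation of human demonstrations. A hedged sketch of such a verifiable-reward function; the `Answer:` convention, the function names, and the normalization rule are illustrative assumptions, not DeepSeek's actual implementation:

```python
def normalize(s):
    """Canonicalize an answer string for comparison."""
    return s.strip().lower()

def extract_final_answer(text):
    """Pull the model's final answer, assuming (for illustration)
    that completions end with a line of the form "Answer: <value>"."""
    marker = "Answer:"
    idx = text.rfind(marker)
    return normalize(text[idx + len(marker):]) if idx != -1 else ""

def verifiable_reward(completion, reference_answer):
    """Outcome-only reward in the spirit of RL with verifiable rewards:
    1.0 if the final answer matches the reference, else 0.0.
    No human preference label is involved -- the check is programmatic,
    which is what lets training scale without extensive manual data."""
    return 1.0 if extract_final_answer(completion) == normalize(reference_answer) else 0.0

r = verifiable_reward("Let x = 2, so x + 2 = 4. Answer: 4", "4")
```

A policy-gradient method then reinforces whole reasoning traces that end in a rewarded answer, so chains of thought that reach correct conclusions become more likely.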
Large models finally meet truly hard problems: 500 questions tested, o3 Pro passes only 15%
机器之心 · 2025-09-14 03:07
Core Insights - The article discusses the development of a new benchmark called UQ (Unsolved Questions) to evaluate the capabilities of large language models, focusing on unsolved problems that reflect real-world challenges [2][3][5] - UQ consists of 500 challenging questions sourced from the Stack Exchange community, designed to assess reasoning, factual accuracy, and browsing capabilities of models [3][8] - The study highlights the limitations of existing benchmarks, which often prioritize difficulty over real-world applicability, and proposes a continuous evaluation method through community validation [1][5] Group 1 - UQ is a test set of 500 unsolved questions covering various topics, including computer science, mathematics, and history, aimed at evaluating model performance in a realistic context [3][8] - The selection process for UQ involved multiple filtering stages, reducing an initial pool of approximately 3 million questions to 500 through rule-based, model-based, and manual reviews [10][11] - The best-performing model in the UQ validation only succeeded in answering 15% of the questions, indicating the high difficulty level of the benchmark [5][7] Group 2 - The UQ validation process employs a composite verification strategy that leverages the strengths of different models to assess candidate answers without requiring standard answers [14][26] - The study found that using a composite validator significantly reduces self-bias and over-optimism in model evaluations, which is a common issue when models assess their own performance [24][25][26] - Results showed that a stronger answer generation model does not necessarily correlate with better answer validation performance, highlighting the complexity of model capabilities [27][28]
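The composite verification strategy described above, where multiple models jointly judge a candidate answer so that no single model (least of all the answer's own generator) is the sole judge, can be sketched as a simple quorum over independent validators. The callables, the quorum rule, and the toy checks below are assumptions for illustration, not the UQ team's pipeline:

```python
def composite_validate(candidate, validators, threshold=1.0):
    """Accept a candidate answer only if at least `threshold` fraction of
    independent validators approve it (1.0 = unanimous).

    Requiring agreement across distinct judges reduces the self-bias and
    over-optimism seen when a model evaluates its own output.
    """
    votes = [bool(v(candidate)) for v in validators]
    return sum(votes) / len(votes) >= threshold

# Toy validators standing in for different judge models.
length_check = lambda ans: len(ans.split()) >= 3        # non-trivial answer
reasoning_check = lambda ans: "because" in ans          # gives a justification

ok = composite_validate("x converges because the series is geometric",
                        [length_check, reasoning_check])
```

Relaxing `threshold` below 1.0 trades precision for recall, which matters here because UQ has no gold answers to fall back on.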
21.4 billion! This post-90s AI prodigy is explosive
混沌学园 · 2025-09-13 11:57
Core Viewpoint - The article discusses the rise and challenges faced by Yang Zhilin, the founder of Moonshot AI, highlighting his journey from a top student to a prominent figure in the AI industry, and the competitive landscape shaped by DeepSeek's emergence. Group 1: Company Overview - Moonshot AI, founded by Yang Zhilin, focuses on developing advanced AI models, particularly the Kimi assistant, which supports long text inputs and has gained significant attention in the AI community [39][40]. - The company achieved a valuation of $3.3 billion by 2024, driven by its innovative AI solutions and substantial user engagement [42]. Group 2: Industry Context - The AI landscape in China has become increasingly competitive, with the emergence of DeepSeek disrupting the market and challenging existing players like Moonshot AI [45][56]. - DeepSeek's rapid success demonstrated the importance of cost efficiency and open-source strategies in gaining market share, contrasting with Moonshot AI's initial focus on advertising and user acquisition [57][58]. Group 3: Financial Performance - Moonshot AI's Kimi assistant saw a significant increase in monthly active users, rising from 4 million to 12.82 million within six months due to aggressive advertising spending [53]. - Despite the initial growth, the company faced challenges in maintaining its market position as competition intensified, leading to a decline in Kimi's market share [46][52]. Group 4: Technological Advancements - The release of Kimi K2 marked a significant technological advancement, being the first model with over a trillion parameters, which revitalized interest in Moonshot AI [63]. - Kimi K2's performance in evaluations positioned it among the top AI models globally, surpassing competitors and regaining attention in the tech community [64]. 
Group 5: Leadership and Vision - Yang Zhilin's leadership style emphasizes a blend of technical expertise and creative vision, drawing inspiration from his background in music and the arts [70][84]. - The company's culture reflects a commitment to innovation and a desire to push the boundaries of AI technology, aligning with Yang's long-term vision of transforming the industry [86].
Why doesn't GPT-5 "talk nonsense" anymore? OpenAI's new paper explains it thoroughly
Tencent Research Institute · 2025-09-12 08:58
Core Viewpoint - The article discusses the advancements and challenges of OpenAI's GPT-5, particularly focusing on the significant reduction in hallucination rates compared to previous models, while also highlighting the underlying mechanisms and implications of these changes [5][6][25]. Group 1: Hallucination Rates and Mechanisms - GPT-5 has a hallucination rate that is approximately 45% lower than GPT-4 and about 80% lower than OpenAI's earlier models [6]. - The reduction in hallucination rates is attributed to enhanced reinforcement learning techniques that allow models to refine their reasoning processes and recognize their errors [8][9]. - The paper published by OpenAI indicates that hallucinations are an inevitable byproduct of the statistical learning nature of language models, making it more challenging to generate reliable information than to assess its reliability [12][16]. Group 2: Theoretical Framework - OpenAI introduces a theoretical "Is-It-Valid" (IIV) judgment mechanism that determines the validity of generated sentences based on their internal probabilities [13]. - The model's tendency to generate plausible-sounding but incorrect information is exacerbated by data sparsity, complexity, and noise in training data [14][16]. - The mathematical conclusion presented in the paper suggests that the error rate of generative models is at least double that of the IIV judgment errors, indicating a compounding effect of judgment mistakes on hallucinations [15][16]. Group 3: Post-Training Challenges - Post-training processes have not effectively mitigated hallucinations, as current evaluation metrics tend to reward models for providing confident but potentially incorrect answers [18][24]. - The article critiques the binary scoring systems used in mainstream AI evaluations, which penalize uncertainty and discourage models from expressing "I don't know" [21][24]. 
- The reinforcement learning processes that utilize binary reward paths may inadvertently promote overconfidence in models, leading to increased hallucination rates [27][29]. Group 4: Future Directions and Solutions - The article suggests that introducing a penalty-based scoring mechanism during post-training could help models better calibrate their confidence levels and reduce hallucinations [33]. - A shift from a score-optimization focus to a truth-oriented approach is proposed as a potential solution to the hallucination problem [34].
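The penalty-based scoring mechanism proposed above can be made concrete as an expected-value calculation. Under a rubric that awards +1 for a correct answer, deducts t/(1-t) for a wrong one, and gives 0 for abstaining, answering has positive expectation exactly when the model's confidence exceeds t, so a calibrated model is pushed to say "I don't know" below that threshold instead of guessing. The t/(1-t) form follows common summaries of OpenAI's proposal; treat the exact rubric as an assumption:

```python
def expected_score(p_correct, t):
    """Expected score for answering under a penalty-based rubric:
    +1 if correct (probability p_correct), -t/(1-t) if wrong, 0 if abstaining.
    Solving p - (1-p) * t/(1-t) > 0 gives p > t: answering only pays off
    when confidence exceeds the threshold t."""
    penalty = t / (1.0 - t)
    return p_correct * 1.0 - (1.0 - p_correct) * penalty

def should_answer(p_correct, t):
    """Answer only when the expected score beats abstaining (score 0)."""
    return expected_score(p_correct, t) > 0.0

# With t = 0.75, a 60%-confident guess has negative expectation and the
# model should abstain -- whereas binary 0/1 grading always rewards guessing.
```

Contrast this with the binary reward paths criticized in the article: with no penalty (t = 0), `should_answer` is true for any nonzero confidence, which is precisely the incentive toward overconfident hallucination.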