Is the GRPO Used by DeepSeek Really That Special? A Long-Form Analysis of Four Standout Papers
机器之心· 2025-05-24 03:13
Core Insights
- The article discusses recent advancements in reasoning models, focusing on GRPO and its improved variants, and highlights the rapid evolution of reinforcement learning for reasoning [1][2][3].

Group 1: Key Papers and Models
- Kimi k1.5 is a newly released reasoning model that employs reinforcement learning techniques, emphasizing long-context extension and improved policy optimization [10][17].
- Open-Reasoner-Zero is the first complete open reproduction of reinforcement learning training on a base model, showing significant results [34][36].
- DAPO explores improvements to GRPO to better suit reasoning training, presenting a large-scale open-source LLM reinforcement learning system [48][54].

Group 2: GRPO and Its Characteristics
- GRPO is closely related to PPO (Proximal Policy Optimization) and shares similarities with RLOO (REINFORCE Leave-One-Out); notably, many leading research works do not use GRPO at all [11][12][9].
- The core takeaway is that current RL algorithms are highly similar in implementation: GRPO is popular but not fundamentally revolutionary [15][6].
- GRPO includes targeted modifications for reasoning training rather than traditional RLHF scenarios, centered on generating multiple answers per reasoning prompt [13][12].

Group 3: Training Techniques and Strategies
- Kimi k1.5's training involves supervised fine-tuning (SFT) and emphasizes behavior patterns such as planning, evaluation, reflection, and exploration [23][24].
- Its curriculum strategy starts with simpler tasks and gradually increases difficulty, akin to human learning [27][28].
- The paper stresses the importance of data distribution and prompt quality for effective reinforcement learning [22][41].
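The curriculum-style sequencing attributed to Kimi k1.5 above (easier tasks first, gradually harder) can be sketched in a few lines; the difficulty scores and problem names below are illustrative assumptions, not values from the paper:

```python
def curriculum_order(problems):
    """Curriculum-style ordering as described for Kimi k1.5's training:
    present easier problems first, harder ones later. The per-problem
    'difficulty' score is an assumed stand-in for the paper's metric."""
    return sorted(problems, key=lambda p: p["difficulty"])

# Illustrative batch of reasoning problems with assumed difficulty scores.
batch = [
    {"id": "geometry-hard", "difficulty": 0.9},
    {"id": "arithmetic-easy", "difficulty": 0.1},
    {"id": "algebra-medium", "difficulty": 0.5},
]
ordered = [p["id"] for p in curriculum_order(batch)]
# ordered == ["arithmetic-easy", "algebra-medium", "geometry-hard"]
```

In practice the schedule would be rebuilt as the model improves, so "difficulty" is relative to the current policy rather than a fixed label.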
Group 4: DAPO Improvements
- DAPO introduces two distinct clipping hyperparameters (a lower and a higher clip range) to improve the model's learning dynamics and efficiency [54][60].
- It employs dynamic sampling, removing samples with flat rewards from the batch to improve learning speed [63].
- It proposes a token-level loss rather than a per-response loss to better manage learning dynamics and avoid issues with long responses [64][66].

Group 5: Dr. GRPO Modifications
- Dr. GRPO modifies GRPO to improve learning dynamics, achieving stronger performance with shorter generated lengths [76][79].
- The modifications include normalizing advantages across all tokens in a response, which helps manage the learning signal effectively [80][81].
- The paper highlights the role of high-quality data engineering in absorbing the effects of these changes, emphasizing a balanced distribution of problem difficulty [82][89].
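The algorithmic tweaks summarized above differ mainly in how group advantages are normalized and how the PPO-style ratio is clipped. A minimal sketch, assuming scalar per-response rewards; the `dr_grpo` flag and the epsilon values are illustrative simplifications, not the papers' exact formulations or settings:

```python
import statistics

def group_advantages(rewards, dr_grpo=False):
    """Group-relative advantages as in GRPO: each response's reward is
    normalized against the other responses sampled for the same prompt.
    Per the Dr. GRPO description, dr_grpo=True drops the std division."""
    mean = statistics.mean(rewards)
    if dr_grpo:
        return [r - mean for r in rewards]
    std = statistics.pstdev(rewards) or 1.0  # guard against flat rewards
    return [(r - mean) / std for r in rewards]

def dapo_clip(ratio, advantage, eps_low=0.2, eps_high=0.28):
    """DAPO-style 'clip-higher' PPO objective for a single token: separate
    lower/upper clip ranges instead of PPO's single epsilon. The epsilon
    values here are illustrative defaults, not confirmed paper settings."""
    clipped = max(min(ratio, 1 + eps_high), 1 - eps_low)
    return min(ratio * advantage, clipped * advantage)
```

DAPO's dynamic sampling corresponds to discarding any group where `group_advantages` would be all zeros (flat rewards), since such groups contribute no learning signal.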
Google Won't Disrupt Itself, and AI Search Engines Have Already Gone Cold?
创业邦· 2025-05-24 03:10
Core Viewpoint
- Google is transitioning to AI-driven search modes to address the competitive threat posed by AI chatbots, which have reduced its search market share from over 90% to an estimated 65%-70% [7][9][31].

Group 1: Google and AI Search Transition
- Google announced the launch of its AI Mode, powered by Gemini, which allows natural language interaction and structured answers, moving away from traditional keyword-based searches [4][7].
- In 2024, Google's search business is projected to generate $175 billion, over half of its total revenue, highlighting the financial stakes of this transition [7].
- The urgency for Google stems from AI chatbots capturing user traffic, prompting a strategic shift in its search approach [7][9].

Group 2: Market Dynamics and Competitor Analysis
- The AI search engine Perplexity grew its user traffic from 45 million to 129 million, a 186% increase, but faced significant financial challenges, including a net loss of $68 million in 2024 [9][12].
- Overall funding for AI search products has decreased: only 10 products raised a total of $893 million from August 2024 to April 2025, versus 15 products raising $1.28 billion in the preceding period [15][16].
- The competitive landscape is shifting, with established players like Google and Perplexity facing pressure from new entrants and the need for differentiation in a crowded market [31][32].

Group 3: Emerging Trends in AI Search
- The trend is moving toward smaller, more specialized AI search engines serving specific industries or use cases, rather than attempting to replicate a general search engine like Google [17][31].
- New AI search products focus on niche areas such as health, law, and video content, which may provide a competitive edge against generalist platforms [34][51].
- The integration of reasoning models into AI search products is expected to enhance user experience and reduce inaccuracies, a significant improvement over earlier models that struggled with "hallucination" [26][30].

Group 4: Financial and Operational Challenges
- The financial viability of AI search startups is under scrutiny, as many cannot convert user engagement into sustainable revenue, leading to a cautious investment environment [31][53].
- Google is exploring monetization strategies for its AI search, but the new AI formats may reduce click-through rates for traditional search ads [53].
Google Won't Disrupt Itself, and AI Search Engines Have Already Gone Cold?
Hu Xiu· 2025-05-23 03:23
Group 1
- Google announced the launch of an advanced AI search mode driven by Gemini at the Google I/O developer conference, moving from a "keyword + link list" approach to "natural language interaction + structured answers" [1].
- In 2024, Google's search business contributed $175 billion, over half of its total revenue, so the transition to AI search may affect this revenue stream [2].
- Bernstein research suggests Google's search market share may have dropped from over 90% to 65%-70% due to the rise of AI chatbots, prompting Google to act [3].

Group 2
- Google's entry into AI search is seen as a response to the threat posed by chatbots consuming its traffic, indicating a challenging environment for new AI search players [4].
- Perplexity's user traffic increased from 45 million to 129 million over the past year (186% growth), but its actual revenue was only $34 million due to frequent discounts, leading to a net loss of $68 million in 2024 [9].
- The funding landscape for AI search products has changed significantly: only 10 products raised a total of $893 million from August 2024 to April 2025, versus 15 products raising $1.28 billion in the preceding period [12][14].

Group 3
- The overall trend in AI search engines is shifting toward smaller, more specialized products, moving away from the idea of building a new Google Search [17].
- Major players like Microsoft, OpenAI, and Google have integrated AI search functionality into their existing platforms, making it difficult for standalone AI search products to compete [18][26].
- Reasoning models have improved the search user experience, but many AI search products have not differentiated themselves sufficiently, leading to declining user engagement [26][30].

Group 4
- New AI search products focus on niche markets such as health, legal, and video search to carve out a unique space in the competitive landscape [50].
- Companies like Consensus and Twelve Labs are building specialized search engines for specific user needs, such as medical research and video content [32][43].
- Commercial viability remains a significant challenge: Google is exploring ways to monetize its AI search mode while facing potential declines in click-through rates for traditional ads [51].
Claude 4 Is Here! A New Benchmark for AI Coding, 7 Hours of Continuous Coding, Hybrid Models, and Major Breakthroughs in Context Capability
Founder Park· 2025-05-23 01:42
Reposted from 「新智元」.

At this morning's Anthropic developer conference, Claude 4 made its debut. CEO Dario Amodei took the stage personally with Claude Opus 4 and Claude Sonnet 4, once again raising the bar for coding, advanced reasoning, and AI agents.

Claude Opus 4 is billed as the world's top coding model, excelling at complex, long-running tasks and delivering outstanding performance in AI-agent workflows. Claude Sonnet 4 is a major upgrade over Sonnet 3.7, with stronger coding and reasoning and more precise instruction following.

At the same time, Anthropic shipped the series of products it had been accumulating, all at once:
- Two modes for the hybrid Claude Opus 4 and Sonnet 4 models: near-instant responses, and extended thinking for deeper reasoning.
- Extended thinking with tool use (beta): both models can use tools (such as web search) during extended thinking, letting Claude alternate flexibly between reasoning and tool use to improve response quality.
- New model capabilities: both models can use tools in parallel, follow instructions more precisely, and (when developers grant access to local files) show markedly improved memory, extracting and saving key information to maintain continuity and accumulate tacit knowledge over time. ...
The World's Strongest Coding Model, Claude 4, Launches: 7 Hours of Autonomous Coding, a Single Instruction Completes a Task Within 30 Seconds, Smooth and Bug-Free
AI前线· 2025-05-22 19:57
Core Insights
- Anthropic has officially launched the Claude 4 series, comprising Claude Opus 4 and Claude Sonnet 4, setting new standards for coding, advanced reasoning, and AI agents [1][3].

Model Performance
- Claude Opus 4 is described as Anthropic's most powerful AI model, capable of running tasks autonomously for several hours and outperforming competitors such as Google's Gemini 2.5 Pro and OpenAI's models on coding tasks [6][8].
- In benchmark tests, Claude Opus 4 achieved 72.5% on SWE-bench and 43.2% on Terminal-bench, leading the field in coding [10][11].
- Claude Sonnet 4, a more cost-effective model, offers strong coding and reasoning capabilities, achieving 72.7% on SWE-bench while reducing the likelihood of shortcuts by 65% compared to its predecessor [13][14].

Memory and Tool Usage
- Claude Opus 4 significantly enhances memory capabilities, creating and maintaining "memory files" for long-term tasks to improve coherence and execution [11][20].
- Both models can use tools during reasoning, improving their ability to follow instructions accurately and build implicit knowledge over time [19][20].

API and Integration
- The new models are available on the Anthropic API, Amazon Bedrock, and Google Cloud's Vertex AI, with pricing consistent with previous models [15].
- Anthropic also released Claude Code, a command-line tool that integrates with GitHub Actions and development environments such as VS Code, enabling seamless pair programming [17].

Market Context
- The AI industry is shifting toward reasoning models, whose share of all AI interactions grew from 2% to 10% within four months [31][35].
- Competition is intensifying, with major players like OpenAI and Google also releasing advanced models, each with distinct strengths [36].
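The summary notes the models are available via the Anthropic API. A sketch of how a request payload might be assembled, built as a plain dict so no network call is needed; the model identifier and the extended-thinking field are assumptions for illustration, not confirmed values from the article:

```python
def build_claude_request(prompt: str, extended_thinking: bool = False) -> dict:
    """Assemble a hypothetical Messages-API-style request payload.
    "claude-opus-4" and the "thinking" budget are assumed identifiers
    for illustration; check the provider's docs for the real values."""
    payload = {
        "model": "claude-opus-4",  # assumed model ID
        "max_tokens": 1024,
        "messages": [{"role": "user", "content": prompt}],
    }
    if extended_thinking:
        # Extended-thinking mode: reserve a token budget for reasoning
        # before the final answer is produced.
        payload["thinking"] = {"type": "enabled", "budget_tokens": 2048}
    return payload
```

Separating the near-instant and extended-thinking modes at the payload level mirrors the two response modes described above.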
A Conversation That Digs into the Technology Behind the Wenxin Large Models
量子位· 2025-05-22 12:34
Core Viewpoint
- The article discusses advancements in large models, focusing on Baidu's Wenxin models, which achieved high ratings in recent evaluations, indicating strong reasoning and multimodal capabilities [1][2].

Group 1: Model Performance and Evaluation
- The China Academy of Information and Communications Technology (CAICT) recently evaluated large-model reasoning capabilities; Wenxin X1 Turbo achieved the highest rating of "4+" across 24 assessment categories [1].
- Wenxin X1 Turbo scored 5 points on 16 items, 4 points on 7 items, and 3 points on 1 item, making it the only large model in China to pass this evaluation [1].

Group 2: Technological Innovations
- Wenxin models emphasize two key areas, multimodal integration and deep reasoning, introducing technologies such as multimodal mixed training and self-feedback enhancement [6][11].
- The multimodal mixed-training approach unifies the text, image, and video modalities, improving training efficiency by nearly 2x and multimodal understanding by over 30% [8].
- The self-feedback enhancement framework lets the model improve itself, easing data-production challenges and significantly reducing hallucinations [13].

Group 3: Application Scenarios
- In practical applications, Wenxin X1 Turbo demonstrates its capabilities in solving physics problems and generating code; AI-generated code now accounts for over 40% of new code added daily [42][44].
- The technology supports over 100,000 digital human anchors, achieving a 31% conversion rate in livestreams and reducing broadcast costs by 80% [48].

Group 4: Market Potential and Future Directions
- The global online education market is projected to reach 899.16 billion yuan by 2029, with large models playing a crucial role in this growth [49].
- The digital human market is expected to reach 48.06 billion yuan this year, nearly quadruple its 2022 size, indicating significant opportunities for large-model applications [49].

Group 5: Long-term Strategy and Vision
- Baidu's approach to large models emphasizes continuous technological exploration and deepening, focusing on long-term value rather than short-term trends [57][58].
- The company maintains a dynamic perspective on the rapid evolution of the technology, aiming to prepare for future industry transformations [58].
Jinqiu Capital's Zang Tianyu: AI Venture Investment Trends for 2025
锦秋集· 2025-05-14 10:02
Core Insights
- The article discusses investment trends in the AI sector, highlighting a shift from foundational models to the application layer as the core focus for investment opportunities [1][7][11].

Group 1: Domestic AI Investment Trends
- Jinqiu Capital's investment portfolio serves as a small sample window onto domestic AI investment trends [2].
- Approximately 60% of its projects are concentrated in the application layer, driven by improved model intelligence and significantly reduced invocation costs [6][7].
- The investment focus has shifted from foundational models, particularly large language models (LLMs), to application-oriented projects as foundational-model capabilities mature [6][7].

Group 2: Key Investment Areas
- The application layer is the primary focus, with nearly 40% of investments in agent AI, 20% in creative tools, and another 20% in content and emotional consumption [8].
- Bottom-layer computing power and physical AI are also critical areas, with investments aimed at enhancing model training and inference capabilities [9][10].
- Middle-layer/toolchain investments are limited, focusing on large-model security and reinforcement-learning infrastructure [10].

Group 3: Trends in AI Intelligence and Cost
- Continuously improving AI intelligence and the falling cost of acquiring that intelligence are the two core trends driving investment decisions [12][13].
- The industry has shifted focus from pre-training scaling laws to optimizing the post-training phase, leading to the emergence of "test-time scaling" [14][15].
- The "agent AI" era is characterized by developing various agents to address practical operational problems [15].

Group 4: Cost Reduction in AI
- Token costs have fallen sharply, with prices dropping as low as 0.8 RMB per million tokens, making applications economically viable [19][20].
- Reasoning models remain costly because of their higher token consumption, necessitating further innovation to reduce inference costs [21][22].
- Innovations in underlying computing architectures, such as processing-in-memory and optical computing, are expected to drive long-term cost reductions [23][24].

Group 5: Opportunities in the Application Layer
- The combination of improved intelligence and reduced costs has triggered a surge of entrepreneurial activity in the application layer [26].
- The AI era introduces new variables, including richer information and service offerings, and more precise recommendations evolving into proactive services [29][30].
- The marginal cost of content creation and service execution has dropped significantly, enabling scalable, distributable service models [31][33].

Group 6: Future of Physical AI
- The potential to achieve general-purpose robots in the physical AI domain is highlighted as a key area for future development [37].
- Data remains the core challenge for general-purpose robots, requiring collaborative optimization of hardware and software [40].
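The 0.8 RMB-per-million-tokens price cited in Group 4 makes serving costs easy to estimate. A back-of-the-envelope sketch; the traffic figures in the example are illustrative assumptions, not data from the article:

```python
def monthly_token_cost(requests_per_day: int,
                       tokens_per_request: int,
                       price_rmb_per_million: float = 0.8,
                       days: int = 30) -> float:
    """Estimate monthly serving cost at a flat per-token price.
    The 0.8 RMB / million tokens default comes from the article; the
    traffic figures used below are illustrative assumptions."""
    total_tokens = requests_per_day * tokens_per_request * days
    return total_tokens / 1_000_000 * price_rmb_per_million

# Example: 10,000 requests/day at 2,000 tokens each over 30 days is
# 600 million tokens, i.e. 480 RMB per month at 0.8 RMB/M tokens.
cost = monthly_token_cost(10_000, 2_000)
```

The same function also illustrates why reasoning models stay expensive: multiplying `tokens_per_request` by the longer chains of thought they emit scales the bill linearly.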
Reasoning Models Will Hit a Wall Within a Year, with No More Orders-of-Magnitude Performance Scaling | New Research from the FrontierMath Team
量子位· 2025-05-13 07:11
衡宇, reporting from 凹非寺. 量子位 | WeChat official account QbitAI

Within a year, reasoning training for large models may hit a wall.

That conclusion comes from Epoch AI, a nonprofit focused on AI research and benchmarking, and the organization behind the widely discussed FrontierMath benchmark (which evaluates AI models' mathematical reasoning).

It comes with a companion finding: if reasoning models keep growing at "10x every 3-5 months," the compute required for reasoning training may quickly converge with the overall training-compute frontier, much as DeepSeek-R1 did relative to OpenAI's o1-preview.

Seeing this result, some onlookers grew anxious: if scaling further on top of o3 is so difficult, why not explore modular architectures or task-specific specialized models? "Efficiency" matters more than "research surplus"!

Reasoning training still has room to scale. OpenAI's o1 was the pioneering reasoning model, and OpenAI says training o3 required 10x the compute of o1, with nearly all of the increase spent in the training phase. OpenAI has not disclosed specifics for o1 or o3, but the reasoning-training compute of other reasoning models, such as DeepSeek-R1, Microsoft's Phi-4-reasoning, and NVIDIA's Llama-Nemotron, can be used to extrapolate. ...
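Epoch AI's "10x every 3-5 months" growth rate can be made concrete with a quick projection; the 4-months-per-10x figure below is an assumed midpoint of the quoted range, and the one-year horizon is illustrative:

```python
def projected_compute_multiple(months: float, months_per_10x: float = 4.0) -> float:
    """How much reasoning-training compute grows over a horizon if it
    keeps multiplying 10x every `months_per_10x` months (the article's
    '10x every 3-5 months'; 4 months is an assumed midpoint)."""
    return 10 ** (months / months_per_10x)

# At this pace, one year of growth is 10**(12/4) = 1000x, which is why
# Epoch AI expects the trend to run into the overall compute frontier soon.
growth_one_year = projected_compute_multiple(12)
```

Exponentials at this rate outpace any plausible hardware buildout, so the wall is a budget ceiling rather than an algorithmic limit.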
StepFun's Jiang Daxin: Multimodality Has Yet to Have Its GPT-4 Moment
Hu Xiu· 2025-05-08 11:50
Core Viewpoint
- The multimodal model industry has not yet reached its "GPT-4 moment"; the lack of an integrated understanding-generation architecture is a significant development bottleneck [1][3].

Company Overview
- The company, founded by CEO Jiang Daxin in 2023, focuses on multimodal models and has restructured internally, merging previously separate groups into a "generation-understanding" team [1][2].
- It currently employs over 400 people, 80% in technical roles, and fosters a collaborative, open work environment [2].

Technological Insights
- An integrated understanding-generation architecture is deemed crucial for the evolution of multimodal models, allowing pre-training on vast amounts of image and video data [1][3].
- The company stresses that multimodal capabilities are essential for achieving Artificial General Intelligence (AGI), asserting that any shortcoming in this area could delay progress [12][31].

Market Position and Competition
- The company has completed a Series B funding round of several hundred million dollars and is one of the few among the "AI six tigers" that has not abandoned pre-training [3][36].
- The competitive landscape is intense, with major players like OpenAI, Google, and Meta releasing numerous new models, underscoring the urgency of innovation [3][4].

Future Directions
- The company plans to enhance its models by integrating reasoning capabilities and long-chain thinking, which are essential for solving complex problems [13][18].
- Future development will focus on achieving a scalable understanding-generation architecture in the visual domain, currently a significant challenge [26][28].

Application Strategy
- The company pursues a dual strategy of "super models plus super applications," aiming to leverage multimodal capabilities and reasoning skills in its applications [31][32].
- Intelligent terminal agents are seen as a key growth area, with the potential to improve user experience and task completion through better contextual understanding [32][34].
An Early Look at Sebastian Raschka's New Book "Reasoning From Scratch": Demystifying the Foundations of Reasoning Models
机器之心· 2025-05-02 04:39
Core Viewpoint
- The article discusses advancements in the reasoning capabilities of large language models (LLMs) and introduces Sebastian Raschka's book "Reasoning From Scratch," which offers practical insights into building reasoning models from the ground up [2][5][59].

Group 1: Definition and Importance of Reasoning in LLMs
- Reasoning, in the context of LLMs, refers to a model's ability to generate intermediate steps before arriving at a final answer, often described as chain-of-thought (CoT) reasoning [8][10].
- The distinction between reasoning and pattern matching is crucial, as traditional LLMs rely primarily on statistical correlations rather than logical reasoning [23][25].
- Understanding reasoning methods is essential for enabling LLMs to tackle complex tasks, such as logical puzzles or multi-step arithmetic problems [5][39].

Group 2: Training Process of LLMs
- The typical LLM training process consists of two main phases: pre-training and fine-tuning [16][19].
- During pre-training, LLMs are trained on vast amounts of unlabelled text (up to several terabytes) to learn language patterns, which can cost millions of dollars and take months [17][21].
- Fine-tuning comprises supervised fine-tuning (SFT) and preference fine-tuning to improve the model's responses to user queries [20][21].

Group 3: Pattern Matching vs. Logical Reasoning
- LLMs learn to predict the next token from statistical patterns in the training data, which lets them generate coherent text but does not confer true understanding [23][24].
- Logical reasoning, by contrast, requires deriving conclusions step by step, identifying contradictions and causal relationships [25][26].
- The article highlights that most LLMs do not actively identify contradictions; they rely on patterns learned from training data [30][34].

Group 4: Enhancing Reasoning Capabilities
- LLM reasoning gained widespread attention with the release of OpenAI's o1 model, which emphasizes a more human-like thought process [41][43].
- LLM reasoning can be enhanced through inference-time compute scaling, reinforcement learning, and knowledge distillation [44][46][48].
- Some of these methods, notably inference-time compute scaling, improve reasoning without retraining the underlying model weights [46][48].

Group 5: Importance of Building Reasoning Models from Scratch
- Building reasoning models from scratch yields valuable insight into the capabilities, limitations, and computational trade-offs of LLMs [50][57].
- The shift toward reasoning models reflects a broader trend in the AI industry, emphasizing the need for models that can handle complex tasks effectively [52][55].
- Understanding the underlying mechanisms of LLMs and reasoning models is crucial for optimizing their performance across applications [57].
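Inference-time compute scaling, mentioned in Group 4, is often implemented as self-consistency: sample several chain-of-thought completions for the same question and majority-vote on the extracted final answers. A minimal sketch, with a stubbed sampler standing in for an LLM call (the stub's answers are fabricated for illustration):

```python
from collections import Counter
from typing import Callable, List

def self_consistency(sample_answer: Callable[[], str], n_samples: int = 8) -> str:
    """Inference-time compute scaling via self-consistency: draw several
    independent completions and return the most common final answer.
    `sample_answer` stands in for an LLM call that returns only the
    extracted final answer (an assumption of this sketch)."""
    answers: List[str] = [sample_answer() for _ in range(n_samples)]
    return Counter(answers).most_common(1)[0][0]

# A stubbed "model" that answers correctly 6 times out of 8.
stub = iter(["42", "42", "41", "42", "42", "43", "42", "42"])
best = self_consistency(lambda: next(stub), n_samples=8)
# best == "42"
```

The design point is that no weights change: accuracy is bought with extra samples at inference time, which is exactly the trade-off the book's taxonomy contrasts with reinforcement learning and distillation.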