Large Language Models (LLMs)

Redis creator confirms: human programmers still outperform LLMs! Netizens quip: that's because you haven't seen a mediocre coder get crushed by AI
程序员的那些事· 2025-05-30 07:10
Core Viewpoint
- The article emphasizes that human programmers still possess capabilities superior to large language models (LLMs), despite the usefulness of AI tools in assisting with programming tasks [3][10]

Group 1: Human vs. AI Capabilities
- The article recounts how a complex bug in Redis was addressed, highlighting the limitations of LLMs in generating innovative solutions compared to human creativity [5][10]
- While LLMs can assist in problem-solving, they often lack the ability to think outside conventional frameworks, which remains a significant advantage of human programmers [10]

Group 2: Practical Applications of LLMs
- The author shares experiences of using LLMs for code review and idea validation, indicating that these tools can enhance productivity but cannot fully replace the nuanced understanding required in software engineering [3][10]
- LLMs can serve as a sounding board for ideas, providing feedback that helps refine thought processes [13]

Group 3: Software Engineering Complexity
- Software engineering encompasses much more than coding, including understanding client needs and requirements, which LLMs are currently ill-equipped to handle [14]
- Software engineering has a social dimension in which human interaction and comprehension of client demands play a crucial role [14]
"Scientific Intelligence White Paper 2025" released: China leads in AI application-oriented innovation
Di Yi Cai Jing· 2025-05-26 13:27
Core Insights
- By 2024, China's AI-related paper citations are expected to account for 40.2% of the global total, rapidly closing on the United States at 42.9% [1][8]
- The report, "Scientific Intelligence White Paper 2025", analyzes the integration of AI and scientific research across seven major research fields, covering 28 directions and nearly 90 key issues [1]
- The report highlights the mutual promotion and deep integration of AI innovation and scientific research, termed "AI for Science" [1]

Research Trends
- The number of global AI journal papers has surged nearly threefold over the past decade, from 308,900 to 954,500, with an average annual growth rate of 14% [7]
- The share of core AI fields, such as algorithms and machine learning, has decreased from 44% to 38%, while the share of scientific intelligence has increased by 6 percentage points, with its annual growth rate rising from 10% before 2020 to 19% after [7]
- China's AI publication volume increased from 60,100 in 2015 to 300,400 in 2024, representing 29% of the global total [7][8]

Citation Impact
- AI-related paper citations in the U.S. reached 302,200 in 2020, while China's citations rose from 10,300 in 2015 to 144,800 in 2020, surpassing the EU for the first time in 2021 [8]
- By 2024, China is projected to account for 41.6% of global AI citations in patents, policy documents, and clinical trials, a significant lead [8]

Country-Specific Trends
- China leads in the intersection of AI with earth and environmental sciences, and has led in AI with mathematics, material sciences, and the humanities since 2019 [9]
- The U.S. and EU maintain advantages in AI and life sciences, with China ranking third in this area [9]
- India shows significant progress across all fields, currently ranking third in earth and environmental sciences, engineering, and the humanities [9]
Google DeepMind: LLMs can be willful too, knowing the optimal path yet running into the wall anyway
机器之心· 2025-05-05 03:40
Core Insights
- The article investigates common failure modes of large language models (LLMs) in decision-making scenarios, focusing on greediness, frequency bias, and the knowing-doing gap [2][15]
- It proposes a reinforcement learning fine-tuning method (RLFT) that enhances the decision-making capabilities of LLMs by addressing these shortcomings [2][8]

Group 1: Failure Modes
- LLMs exhibit suboptimal exploration and a knowing-doing gap that prevents knowledge from being translated into effective action [2][15]
- The three identified failure modes are:
  1. Greediness: LLMs overly favor actions that have previously shown the best performance [15]
  2. Frequency bias: LLMs tend to repeat high-frequency actions regardless of their reward differences [5][18]
  3. Knowing-doing gap: LLMs understand task requirements but fail to execute optimal actions due to a preference for greedy choices [7][20]

Group 2: Model Performance
- Small-scale LLMs (2B) are strongly affected by frequency bias, leading to a lack of exploration, with up to 55% of actions remaining unexplored [4][18]
- Large-scale LLMs (27B) show reduced frequency bias but still behave greedily, limiting their overall performance [6][18]
- Average action coverage for the largest models was only 45%, a substantial gap from the optimal strategy [17]

Group 3: Reinforcement Learning Fine-Tuning
- RLFT adjusts the reasoning process of LLMs based on rewards obtained from environmental interactions, promoting the selection of actions that yield higher rewards [8][22]
- Results indicate that RLFT significantly reduces regret across various environments, improving LLM performance over random baselines [22]
- RLFT mitigates greediness by encouraging exploration, enhancing decision-making capabilities [22]
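The greediness and action-coverage findings above are easiest to see in a multi-armed bandit, the standard testbed for this kind of exploration study. The following minimal sketch (an illustration of the failure mode, not the paper's actual setup; the arm counts, reward model, and seed are all assumptions) shows how a purely greedy policy locks onto one arm and leaves the rest unexplored, while even a modest exploration rate drives action coverage up:

```python
import random

def run_bandit(num_arms=10, steps=100, epsilon=0.0, seed=0):
    """Simulate an epsilon-greedy agent on a Bernoulli bandit.

    epsilon=0.0 reproduces the 'greediness' failure mode described in
    the article: the agent exploits its current best estimate at every
    step and never tries the other arms.
    """
    rng = random.Random(seed)
    true_means = [rng.random() for _ in range(num_arms)]
    counts = [0] * num_arms
    values = [0.0] * num_arms  # running mean reward per arm

    for _ in range(steps):
        if rng.random() < epsilon:
            arm = rng.randrange(num_arms)  # explore a random arm
        else:
            # exploit: pick the arm with the highest estimated value
            arm = max(range(num_arms), key=lambda a: values[a])
        reward = 1.0 if rng.random() < true_means[arm] else 0.0
        counts[arm] += 1
        values[arm] += (reward - values[arm]) / counts[arm]

    # fraction of arms tried at least once ('action coverage')
    return sum(c > 0 for c in counts) / num_arms

greedy_cov = run_bandit(epsilon=0.0)   # greedy: explores a single arm
explore_cov = run_bandit(epsilon=0.3)  # exploration lifts coverage
```

Because ties in the value estimates resolve to the first arm and running-mean rewards never go negative, the fully greedy agent here pulls only one arm for all 100 steps, mirroring the low coverage the article reports for greedy LLM policies.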
Reward-driven and self-organizing: the new ReSo framework reshapes intelligent collaboration in complex reasoning tasks
机器之心· 2025-04-27 10:40
This work was jointly completed by Shanghai AI Laboratory, the University of Sydney, and the University of Oxford. The first authors are Zhou Heng, an intern at Shanghai AI Lab, and independent researcher Geng Hejia. The corresponding authors are Bai Lei, a young scientist at Shanghai AI Laboratory, and Yin Zhenfei, a visiting scholar at Oxford and PhD student at the University of Sydney; other team members include AI Lab intern Xue Xiangyuan.

The ReSo framework (Reward-driven & Self-organizing) offers a new approach to multi-agent systems (MAS) for complex reasoning tasks: a complex task is first decomposed into a task graph, and the best agent is then matched to each subtask. By combining task-graph generation with a reward-driven, two-stage agent-selection process, the method not only improves the efficiency of multi-agent collaboration but also opens a new path for strengthening multi-agent reasoning.

Research Background: Bottlenecks and Breakthroughs in LLM Reasoning

In recent years, increasing inference-time compute (Inference Time Scaling) has been widely regarded as an important route to improving the reasoning ability of large language models (LLMs). On one hand, introducing reinforcement learning and reward models in the post-training stage can optimize a single model's reasoning path, making it generate intermediate steps before answering and exhibit stronger logical-chain construction; on the other hand, some research attempts to build multi- ...
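ReSo's two ingredients, task-graph decomposition and reward-driven agent matching, can be caricatured in a few lines. This is a hedged sketch under stated assumptions: the example graph, the per-agent reward estimates, and the greedy per-subtask selection are all illustrative stand-ins, not the framework's actual algorithm.

```python
from collections import deque

def topo_order(task_graph):
    """Topologically sort a task DAG given as {task: [dependencies]}."""
    indeg = {t: len(deps) for t, deps in task_graph.items()}
    dependents = {t: [] for t in task_graph}
    for t, deps in task_graph.items():
        for d in deps:
            dependents[d].append(t)
    ready = deque(t for t, d in indeg.items() if d == 0)
    order = []
    while ready:
        t = ready.popleft()
        order.append(t)
        for nxt in dependents[t]:
            indeg[nxt] -= 1
            if indeg[nxt] == 0:
                ready.append(nxt)
    return order

def assign_agents(task_graph, reward_estimates):
    """For each subtask, in dependency order, pick the agent with the
    highest estimated reward (a stand-in for reward-driven selection)."""
    return {
        task: max(reward_estimates[task], key=reward_estimates[task].get)
        for task in topo_order(task_graph)
    }

# Hypothetical 3-node task graph and reward estimates for two agents.
graph = {"decompose": [], "solve": ["decompose"], "verify": ["solve"]}
rewards = {
    "decompose": {"planner": 0.9, "coder": 0.4},
    "solve": {"planner": 0.3, "coder": 0.8},
    "verify": {"planner": 0.6, "coder": 0.7},
}
plan = assign_agents(graph, rewards)
```

In the full framework the reward estimates would themselves be learned from interaction, which is what makes the selection stage reward-driven rather than fixed.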
CAS-led ten-thousand-word survey systematically reviews multimodal LLM alignment algorithms
量子位· 2025-03-23 11:12
Contributed by CASIA et al. | QbitAI

A ten-thousand-word article offering a comprehensive, systematic review of alignment algorithms for multimodal LLMs!

It covers the application scenarios addressed by existing alignment algorithms, the core factors in constructing alignment datasets, the benchmarks used to evaluate alignment algorithms, and potential future directions for alignment research.

Large language models (LLMs) can complete a wide range of tasks from simple prompts, without task-specific training. However, these models mainly process text and are limited in handling multimodal data.

Since the world is inherently multimodal, spanning visual, auditory, and textual data, researchers have begun building multimodal large language models (MLLMs) on top of LLMs to handle more complex data forms.

Existing MLLMs, however, still face a series of challenges, particularly in truthfulness, safety, reasoning ability, and alignment with human preferences; these issues remain insufficiently addressed. Alignment algorithms targeting them have therefore emerged as an effective remedy.

The main contribution of this work is a comprehensive, systematic review of alignment algorithms for multimodal large language models (MLLMs). Specifically, it examines four key questions. Application scenarios of existing alignment algorithms: by categorizing current algorithms, the article clearly shows their applicability across different domains and provides researchers with a unified notation system that helps ...