Large Language Model (LLM)
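One Move to Ease LLMs' Lopsided Skills: Adjust the Training-Set Composition, and Here Is the "Secret Recipe" | SJTU, Shanghai AI Lab, and Others
QbitAI (量子位) · 2025-06-10 07:35
Contributed by the IDEAL team. QbitAI | WeChat official account QbitAI. Adjusting the composition of the SFT training set is enough to substantially ease an LLM's lopsided skills. Llama 3.1-8B, which was not originally good at coding, shows a clear improvement in code ability. A joint team from Shanghai Jiao Tong University and Shanghai AI Lab proposes a new method, IDEAL, that markedly improves an LLM's overall performance across multiple domains. The study also reports several important findings, for example: In detail: Some LLM abilities even regress after SFT. Large language models (LLMs), with their strong comprehension and logical reasoning, have shown striking capability in many fields. Beyond growth in parameter count, high-quality data is widely regarded as the most critical factor in improving LLM performance. When applying supervised fine-tuning (SFT), researchers found that LLMs often become "lopsided" in multi-task settings: some abilities stand out while others fail to improve or even regress. This imbalance leaves the model with uneven ability across domains, which in turn hurts the user experience. Researchers at Shanghai Jiao Tong University and Shanghai AI Lab quickly turned their attention to the SFT training set: can adjusting its composition ease the lopsidedness? Intuitively, simply doubling the training data for the LLM's weak subjects should change the final result. However, because the training data of different domains are coupled, the researchers use modeling to quantify, for each domain's data, its effect on the final result ...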
Claude 4 Core Team Members: Agent RL, the New RLVR Paradigm, and the Inference Compute Bottleneck
Overseas Unicorn (海外独角兽) · 2025-05-28 12:14
Core Insights - Anthropic has released Claude 4, a cutting-edge coding model and the strongest agentic model capable of continuous programming for 7 hours [3] - The development of reinforcement learning (RL) is expected to significantly enhance model training by 2025, allowing models to achieve expert-level performance with appropriate feedback mechanisms [7][9] - The paradigm of Reinforcement Learning with Verifiable Rewards (RLVR) has been validated in programming and mathematics, where clear feedback signals are readily available [3][7] Group 1: Computer Use Challenges - By the end of this year, agents capable of replacing junior programmers are anticipated to emerge, with significant advancements expected in computer use [7][9] - The complexity of tasks and the duration of tasks are two dimensions for measuring model capability, with long-duration tasks still needing validation [9][11] - The unique challenge of computer use lies in its difficulty to embed into feedback loops compared to coding and mathematics, but with sufficient resources, it can be overcome [11][12] Group 2: Agent RL - Agents currently handle tasks for a few minutes but struggle with longer, more complex tasks due to insufficient context or the need for exploration [17] - The next phase of model development may eliminate the need for human-in-the-loop, allowing models to operate more autonomously [18] - Providing agents with clear feedback loops is crucial for their performance, as demonstrated by the progress made in RL from Verifiable Rewards [20][21] Group 3: Reward and Self-Awareness - The pursuit of rewards significantly influences a model's personality and goals, potentially leading to self-awareness [30][31] - Experiments show that models can internalize behaviors based on the rewards they receive, affecting their actions and responses [31][32] - The challenge lies in defining appropriate long-term goals for models, as misalignment can lead to unintended behaviors [33] Group 4: Inference Computing Bottleneck - A significant shortage of inference computing power is anticipated by 2028, with current global capacity at approximately 10 million H100 equivalent devices [4][39] - The growth rate of AI computing power is around 2.5 times annually, but a bottleneck is expected due to wafer production limits [39][40] - Current resources can still significantly enhance model capabilities, particularly in RL, indicating a promising future for computational investments [40] Group 5: LLM vs. AlphaZero - Large Language Models (LLMs) are seen as more aligned with the path to Artificial General Intelligence (AGI) compared to AlphaZero, which lacks real-world feedback signals [6][44] - The evolution of models from GPT-2 to GPT-4 demonstrates improved generalization capabilities, suggesting that further computational investments in RL will yield similar advancements [44][47]
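The Group 4 figures (roughly 10 million H100-equivalent devices today, growing about 2.5 times per year) can be compounded forward as a rough illustration. This is only a sketch: the base year of 2025 is an assumption taken from the article date, the function name `project_capacity` is hypothetical, and the projection ignores the wafer production limit the summary says will cap growth.

```python
def project_capacity(base_units: float, annual_growth: float, years: int) -> list[float]:
    """Return capacity for year 0 through `years` under constant compound growth."""
    return [base_units * annual_growth ** n for n in range(years + 1)]


# Figures quoted in the summary above.
BASE_H100_EQUIV = 10_000_000  # approximate current global capacity
ANNUAL_GROWTH = 2.5           # approximate yearly growth multiple

# Assumption: "current" means 2025, the year the article was published.
BASE_YEAR = 2025
TARGET_YEAR = 2028

projection = project_capacity(BASE_H100_EQUIV, ANNUAL_GROWTH, TARGET_YEAR - BASE_YEAR)

for offset, units in enumerate(projection):
    print(f"{BASE_YEAR + offset}: {units / 1e6:.2f} million H100 equivalents")
```

Under these assumptions, unconstrained growth would reach about 156 million H100 equivalents by 2028; the anticipated shortage concerns how far actual supply falls short once wafer output becomes the limit.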
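Why Do AI Agents Need Their Own Browser?
Overseas Unicorn (海外独角兽) · 2025-04-08 11:05
Compiled by: Xeriano. Edited by: Cage. The users of browsers are gradually shifting from humans to AI agents, so the underlying infrastructure through which agents interact with the internet is becoming increasingly important. Traditional browsers cannot meet AI agents' needs for automated scraping, interaction, and real-time data processing. Paul Klein, the founder of Browserbase, recognized as early as the end of 2023 that AI agents urgently need an entirely new vehicle for interaction: a cloud browser "built for AI". Such a browser must solve the performance and deployment problems of existing tools, and, more fundamentally, it must use LLMs and VLMs to give the browser the ability to understand and adapt to changes in web pages, so that AI agents can interact with it in something closer to natural language and complete tasks reliably. Browserbase is a headless browser service provider founded a little over a year ago that offers scalable, highly available browser services to AI agent companies as a cloud service. Recently, Browserbase also launched StageHand, a framework that uses LLMs to let developers interact with web pages in natural language, further extending its influence in the headless browser field. This article is compiled from the founder's early memo and explains in detail ...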
My Top Artificial Intelligence (AI) Stocks to Buy Right Now
The Motley Fool· 2025-03-31 07:51
Importantly, Alphabet isn't running and hiding from generative AI. Instead, the company is embracing it. Chatbot Arena ranks Google Gemini version 2.5 Pro as the No. 1 overall large language model (LLM) as well as the best at math, instruction following, creative writing, handling longer queries, and more. Gemini is already incorporated into Google Search through AI Overviews, which is driving higher search usage and user satisfaction. Thanks in part to Gemini, Google Cloud is the fastest-growing cloud serv ...
Has AMD's "Nvidia Moment" Finally Arrived?
The Motley Fool· 2025-03-18 10:05
Core Insights - AMD is gaining traction in the GPU market, particularly in the data center segment, indicating a potential shift in competitive dynamics against Nvidia [5][9][12] - The rise of large language models (LLMs) has significantly increased the demand for GPUs, which are essential for processing large volumes of data [2][3] - Nvidia currently holds a dominant position in the GPU market with approximately 90% market share, benefiting from first-mover advantages and high pricing power [4][6] AMD's Market Position - AMD has recently secured contracts with major tech companies like Microsoft, Meta, and Oracle, showcasing its ability to penetrate the market [9][12] - The introduction of AMD's MI300X accelerators positions the company as a cost-competitive alternative to Nvidia, appealing to companies looking to optimize AI infrastructure costs [8][9] - Despite a 47% decline in share price over the past year, AMD's valuation is considered attractive, trading at a forward P/E multiple of 22, the lowest in over a year [11] Future Growth Potential - AMD's early successes in acquiring significant clients suggest a promising trajectory for sustained growth in the GPU sector [10][12] - The company does not need to surpass Nvidia to be viewed as a viable investment; maintaining a competitive growth rate could attract growth investors [12][13] - There is optimism that AMD could experience a growth trajectory similar to Nvidia, particularly as the AI boom continues to evolve [14]
TrendForce: NVIDIA Has Become the Dominant Force in IC Design
Semiconductor News (半导体芯闻) · 2025-03-17 10:42
Core Insights - The article highlights the significant growth in the semiconductor industry driven by the AI boom, with the top ten IC design companies projected to generate a combined revenue of approximately $249.8 billion in 2024, marking a 49% year-over-year increase [1][5]. Group 1: Market Overview - The AI trend is leading to a monopolistic situation in the semiconductor IC industry, as high-end chips require substantial capital and advanced technology, creating high entry barriers for new players [2]. - NVIDIA is expected to dominate the market with a projected revenue of $124.4 billion in 2024, reflecting a staggering 125% growth, capturing 50% of the top ten companies' revenue [5]. Group 2: Key Players and Performance - Broadcom is anticipated to achieve a semiconductor revenue of $30.6 billion in 2024, an 8% increase, with over 30% of its semiconductor solutions coming from AI chips [2]. - AMD's revenue is projected to reach $25.8 billion in 2024, a 14% increase, driven by significant growth in its server CPU business, which is expected to grow by 94% [3]. - Qualcomm's revenue is expected to be $34.9 billion in 2024, a 13% increase, as it focuses on AI PC and edge computing devices [3]. - MediaTek is projected to generate $16.5 billion in revenue in 2024, a 19% increase, with expectations of a 65% penetration rate in the 5G smartphone market by 2025 [3]. Group 3: Rankings and Revenue Changes - Realtek is expected to achieve a revenue of approximately $3.5 billion in 2024, a 16% increase, with growth driven by PC and automotive-related shipments [4]. - Will Semiconductor's revenue is projected to reach $3.0 billion in 2024, a 21% increase, benefiting from the rising demand for high-end CIS in Android smartphones and electric vehicle applications [4]. - MPS is anticipated to generate $2.2 billion in revenue in 2024, a 21% increase, due to its PMIC products entering the AI server supply chain [4].
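Take a Look! This Is the Company Behind DeepSeek
Wutong Shuxia V (梧桐树下V) · 2025-01-29 03:16
| Field | Value |
| --- | --- |
| Source | 企查查 (Qichacha) company profile |
| Company | 杭州深度求索人工智能基础技术研究有限公司 |
| Status | Active (存续) |
| Unified social credit code | 91330105MACPN4X08Y |
| Profile | DeepSeek was founded in 2023 and is a general-purpose artificial intelligence model... |
| Legal representative | 製作 |
| Registered capital | RMB 10 million |
| Date of establishment | 2023-07-17 |
| Industry (Qichacha) | Information system integration services |
| Size | Micro (XS) |
| Employees (2023) | 4 |
| Phone | 0571-85377238 |
| Address | Room 1201, West Building 1, Huijin International Building, 169 Huancheng North Road, Gongshu District, Hangzhou, Zhejiang |
| Shareholders | 宁波程图个业管理 ... 咨询合伙 ... (major shareholder, 99.00% stake, 2 invested companies); 梁文章 (1.00% stake, 15 related companies) |
| Key personnel | 裴活 (executive director and ..., 3 related companies); 王南军 (supervisor, 2 related companies) |

By Wutong Xiaolü (梧桐晓驴). With DeepSeek going viral, Xiaolü got curious and looked up the company that develops and operates DeepSeek. Qichacha (企查查) shows: 杭州深度求索人工智能基础技术研究有限公司, English name Hangz ...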