Large language model

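One Trick to Ease LLMs' Lopsided Skills: Adjust the Training-Set Composition, the "Secret Recipe" Is Here | SJTU, Shanghai AI Lab, et al.
量子位 (QbitAI)· 2025-06-10 07:35
Contributed by the IDEAL team. 量子位 | official account QbitAI. Substantially easing an LLM's lopsided skills takes only an adjustment to the composition of the SFT training set. Llama 3.1-8B, originally weak at coding, shows a clear improvement in code ability. A joint team from Shanghai Jiao Tong University and Shanghai AI Lab proposes a new method, IDEAL, that can significantly improve an LLM's overall performance across multiple domains. The study also reports several important findings. In detail: The IDEAL method. Problem modeling: first, high-quality training datasets are prepared for the different domains, together with corresponding validation sets. Training the model θ on the training set yields the optimal parameters θ on the training set. The paper aims to minimize the loss on the validation set. To make the training set easy to adjust, the paper introduces a corresponding variable β and models this optimization problem explicitly. Some LLM capabilities even degrade after SFT. Large language models (LLMs), with their strong comprehension and logical reasoning, have shown striking capability in many fields. Beyond growth in parameter count, high-quality data is widely regarded as the most critical factor in improving LLM performance. When models undergo supervised fine-tuning (SFT), researchers have found that LLMs in multi-task settings often develop lopsided skills: some capabilities stand out while others fail to improve or even degrade. This imbalance leaves the model with uneven ability across domains, which in turn affects user ...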
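One Trick to Ease LLMs' Lopsided Skills: Adjust the Training-Set Composition, the "Secret Recipe" Is Here | SJTU, Shanghai AI Lab, et al.
量子位 (QbitAI)· 2025-06-10 07:35
Contributed by the IDEAL team. 量子位 | official account QbitAI. Substantially easing an LLM's lopsided skills takes only an adjustment to the composition of the SFT training set. Llama 3.1-8B, originally weak at coding, shows a clear improvement in code ability. A joint team from Shanghai Jiao Tong University and Shanghai AI Lab proposes a new method, IDEAL, that can significantly improve an LLM's overall performance across multiple domains. The study also reports several important findings. In detail: Some LLM capabilities even degrade after SFT. Large language models (LLMs), with their strong comprehension and logical reasoning, have shown striking capability in many fields. Beyond growth in parameter count, high-quality data is widely regarded as the most critical factor in improving LLM performance. When models undergo supervised fine-tuning (SFT), researchers have found that LLMs in multi-task settings often develop lopsided skills: some capabilities stand out while others fail to improve or even degrade. This imbalance leaves the model with uneven ability across domains, which in turn affects the user experience. Researchers at Shanghai Jiao Tong University and Shanghai AI Lab quickly turned their attention to the SFT training set: can adjusting its composition ease the imbalance? Intuitively, simply doubling the training data for the LLM's weak subjects should change the final result. However, because the training data are coupled with one another, the researchers build a model to quantify, with respect to the final result, each domain's data ...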
JFrog (FROG) 2025 Conference Transcript
2025-06-05 18:00
JFrog (FROG) 2025 Conference June 05, 2025 01:00 PM ET Speaker0 Everybody, my name is Koji Ikeda. I am one of the software analysts here at Bank of America. Welcome to day three of our technology conference. I am absolutely thrilled to have JFrog doing a fireside chat with us. We have the CFO, Ed Grabscheid, and we also have IR, Jeff Schreiner here. So thank you so much for being here. We appreciate it. I always ask the obligatory introductory comments or question of, you know, what is JFrog? What do you guys ...
Is AI Duolingo's Biggest Risk or Biggest Catalyst?
The Motley Fool· 2025-06-02 09:02
Gamified mobile learning company Duolingo (DUOL 1.35%) has been a market-crushing investment. The stock closed its first day of trading in 2021 at $139 per share. As of this writing, just four years later, it's trading at over $520 per share, up 276% and trading near an all-time high. For perspective, the S&P 500 is up just 34% during this time. Duolingo is outrunning the market for at least two simple reasons. First, it's growing like a weed. Second, it's actually quite a profitable business. The end result ...
Concord Healthcare Announces Official Release of the Proton Therapy Large Model
Prnewswire· 2025-05-29 20:30
BEIJING, May 29, 2025 /PRNewswire/ -- Concord Healthcare Group Co., Ltd. ("Concord Healthcare"), a subsidiary of Concord Medical Services Holdings Limited (the "Company") (NYSE: CCM), which subsidiary is listed on the Main Board of The Stock Exchange of Hong Kong Limited (the "HKSE") under the stock code 2453.HK, announced that Concord Healthcare had made important progress in precise tumor diagnosis and treatment technology. Concord Healthcare's self-developed large language model ("LLM") in the vertical f ...
Cerence(CRNC) - 2025 FY - Earnings Call Transcript
2025-05-29 15:50
Cerence (CRNC) FY 2025 Conference May 29, 2025 10:50 AM ET Speaker0 Morning, everybody. We're continuing on with our next session. Very pleased to have Cerence with us at the conference for a fireside chat. Lots of very interesting topics to talk about in automotive, AI. From the company, we're very pleased to have Tony Rodriguez, executive vice president and CFO of the company. We'll keep it to a fireside chat session. If you do have any questions in the room, feel free to raise your hand. We'll get ...
JFrog(FROG) - 2025 FY - Earnings Call Transcript
2025-05-28 16:25
JFrog (FROG) FY 2025 Conference May 28, 2025 11:25 AM ET Speaker0 Okay, thank you. Everybody, I'm Andrew Sherman, software analyst at TD Cowen. Pleased to have Jeff Schreiner, Head of IR at JFrog. Thank you for being here. Speaker1 Thank you for having us. Speaker0 And a quick mention for Xdl II. We appreciate your vote if you think we've earned it. Thank you very much. So it's been a busy past year for JFrog. Let's just recap the quarter, which was a good one, and some new interesting stuff going on. So th ...
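Claude 4 Core Team Members: Agent RL, the New RLVR Paradigm, and the Inference Compute Bottleneck
海外独角兽 (Overseas Unicorn)· 2025-05-28 12:14
Compiled by: haozhen. Edited by: Siqi. Original compilation by 海外独角兽; please credit the source when reposting. Anthropic released Claude 4 last Friday. It is currently the most advanced coding model and the strongest agentic model, able to program continuously for 7 hours. This article is a compilation of the latest interview with two core Anthropic researchers, Sholto Douglas and Trenton Bricken; Sholto focuses on RL scaling, while Trenton works on mechanistic interpretability: • The biggest change in model training in 2025 is that RL finally works: given a suitable feedback mechanism, a model can reach the performance and reliability of an expert human; • By the end of this year there will be agents that can replace junior programmers, and by this time next year software-engineering agents will be creating value in real tasks; • The paradigm of reinforcement learning with verifiable rewards (RLVR) has been proven in programming and mathematics, because clear signals of this kind are easy to obtain in those fields; • The key to the development of model self-awareness lies in reward, because a model pursues reward in some way, and that pursuit profoundly shapes the model's "personality" and character, ultimately bringing about self-awareness; • Having an AI win a Nobel Prize is easier than having it win a Pulitzer Prize for fiction, because getting a model to have, like ...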
Building Scalable Foundations for Large Language Models
DDN· 2025-05-27 22:00
[Music] Hello everyone wherever you are in the world today. Welcome to this DDN technical webinar. I'm Joel Kaufman, senior technical product specialist for DDN. And today I'm talking with Kevin Cochrane, chief marketing officer of Vultr. And as the topic says on screen, we are going to be discussing how to build scalable foundations for large language models and frankly for most any type of AI. Kevin, welcome to the webinar. Great. So great to be here Joel. Looking forward to today's webinar and discussio ...
OSS to Attend NVIDIA GTC Paris 2025
Globenewswire· 2025-05-27 13:00
Core Insights
- One Stop Systems, Inc. (OSS) is participating in the NVIDIA GTC Paris Conference on June 11-12, 2025, showcasing its rugged, enterprise-class compute solutions for AI and machine learning applications [1][2]
- OSS emphasizes its long-term partnership with NVIDIA and the importance of the GTC Paris event for demonstrating its specialized AI computing solutions [2]
- The conference will feature discussions on various AI applications, including generative AI and robotics, attracting developers and business leaders [3]
Company Overview
- OSS is a leader in AI-enabled solutions designed for edge environments, providing enterprise-class compute and storage products that perform in challenging conditions [4]
- The company's product offerings include ruggedized servers, compute accelerators, and storage solutions, utilized across industries such as autonomous trucking, defense, and aerospace [5][6]
- OSS addresses the entire AI workflow, from data acquisition to deep learning and inference, positioning itself in the rapidly growing edge computing market [6]
Event Details
- The NVIDIA GTC Paris Conference is organized in partnership with VivaTech 2025, focusing on real-world applications of AI and accelerated computing [3]
- OSS will have a presence at Booth E07, with representatives from its European subsidiary, Bressner, also exhibiting at the conference [2]