A 7B model rivals GPT-4o in "EQ": Tencent cracks the open-domain RL problem, with scores jumping 5x
量子位· 2025-07-18 06:16
Core Insights
- The article discusses the challenges and solutions in optimizing large models for emotional intelligence in multi-turn dialogues using Reinforcement Learning (RL) [2][4][5]
- The proposed RLVER framework integrates a user simulator that acts as both the interaction environment and the reward source, addressing the three main challenges of RL in this context [2][5][11]

Group 1: Challenges in RL for Emotional Intelligence
- The three main challenges identified are:
  1. Environmental challenge: creating a realistic and diverse interaction environment for the model [2][4]
  2. Reward challenge: converting subjective user satisfaction into stable, long-term rewards [2][11]
  3. Training challenge: achieving stable and efficient multi-turn online RL training on large language models (LLMs) [2][4]

Group 2: RLVER Framework
- The RLVER framework utilizes a user simulator that embodies diverse user profiles and interaction scenarios, providing a rich and dynamic learning environment [7][8]
- This simulator updates its emotional state based on the model's responses, providing personalized feedback that enhances the model's learning [9][10]

Group 3: Performance Outcomes
- The Qwen2.5-7B model, trained using RLVER, achieved a score of 79.2 on the Sentient-Benchmark, a significant increase from 13.3, positioning it alongside top commercial models like GPT-4o and Gemini 2.5 Pro [16][17]
- The model maintained its general capabilities in areas like mathematics and coding, avoiding "catastrophic forgetting" [17]

Group 4: Insights from Training
- The introduction of explicit "think-then-say" prompts improved the model's ability to understand and respond empathetically, leading to two distinct paths toward empathy: "thinking models" and "reactive models" [20][21]
- The choice of optimization algorithm (PPO vs. GRPO) revealed that focusing on specific dimensions of emotional intelligence can yield better overall performance [23][27]

Group 5: User Simulator Insights
- The RLVER team created two types of user simulators, with findings indicating that a more forgiving environment (the Vanilla simulator) is more beneficial for early-stage model growth than a more challenging one [29][30]
- Models with explicit thinking structures demonstrated greater robustness in challenging environments, suggesting that reasoning capabilities can mitigate training instability [33]
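The simulator-as-environment-and-reward idea can be sketched in a few lines. This is a deliberately toy illustration, not the paper's implementation: the class names, the scalar "empathy score," and the linear emotion update are all invented for demonstration. The point is only the structure RLVER describes — a simulated user with a hidden emotional state that the policy's replies move, whose final state serves as a verifiable episode reward.

```python
class UserSimulator:
    """Toy stand-in for an RLVER-style user simulator (all names and
    dynamics here are illustrative): it tracks a hidden emotional state
    that the policy's replies shift, and its final emotion serves as
    the verifiable scalar reward for the episode."""

    def __init__(self, sensitivity):
        self.sensitivity = sensitivity  # per-profile reaction strength
        self.emotion = 0.0              # hidden emotional state

    def react(self, empathy_score):
        # Emotion rises when the reply is more empathetic than neutral (0.5).
        self.emotion += self.sensitivity * (empathy_score - 0.5)
        return self.emotion


def rollout(policy, sensitivity, turns=5):
    """Run a multi-turn dialogue; the simulator's final emotional state
    is the reward an RL trainer (e.g. PPO) would then optimize."""
    sim = UserSimulator(sensitivity)
    reward = 0.0
    for _ in range(turns):
        reward = sim.react(policy())
    return reward


# A more empathetic policy earns a higher episode reward.
empathetic = lambda: 0.9
dismissive = lambda: 0.1
print(rollout(empathetic, 1.0) > rollout(dismissive, 1.0))  # True
```

Because the reward comes from the simulator's state rather than a one-shot judge, it naturally reflects the whole multi-turn trajectory, which is the property the framework relies on.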
Big Data ETF (159739) up over 1%: H20 chip sales to China resume, a tailwind for large-model training
Xin Lang Cai Jing· 2025-07-16 02:31
The Big Data ETF closely tracks the CSI Cloud Computing and Big Data Theme Index (930851), which selects 50 listed companies whose businesses involve providing cloud computing services, big data services, or the related hardware, to reflect the overall performance of listed companies in the cloud computing and big data theme.

As of June 30, 2025, the index's top ten weighted stocks were iFLYTEK (002230), Zhongji Innolight (300308), Eoptolink (300502), Sugon (603019), Kingsoft Office (688111), Inspur Information (000977), Hundsun Technologies (600570), Unisplendour (000938), Hoperun Software (300339), and Runze Technology (300442), together accounting for 51.84% of the index.

Big Data ETF (159739); off-exchange feeder funds: Class A 021090, Class C 021091, Class I 022882.

As of 10:08 on July 16, 2025, the CSI Cloud Computing and Big Data Theme Index (930851) was up a strong 1.68%, with constituents Eoptolink (300502) up 12.90%, Intellifusion (688343) up 5.35%, and Servyou (603171) up 4.34%, while China Greatwall (000066), Zhongji Innolight (300308), and other stocks followed. The Big Data ET ...
Wherever you work, shed your "weakness aura"
洞见· 2025-07-15 10:15
Author: MK. Source: 每晚一卷书 (ID: JYXZ89896)

Pull your energy back inward, and turn plans into action.

University College London once ran a study that tracked 17,000 babies born in 1970 over several decades. In the process, the researchers found that people who failed in their careers tended to share certain traits: low energy, an aversion to learning, fear of responsibility, psychological fragility, and so on. These people all gave off a dispiriting "weakness aura" that, without any outside interference, would sink them into the mire on its own.

The writer Shuimuran has likewise pointed out that in the workplace, nothing is more damaging than a weakness aura. Break the following bad habits, and you will thrive wherever you work.

01 Lazy in action: never working when slacking off is an option.

A "Slacking-at-Work Handbook" once circulated online, teaching people how to coast through eight hours a day: zoning out in meetings, disguising idle chat as work, sneaking in games on company time, and so on. Many people treat this playbook as "workplace gospel," not realizing they have fallen into the most dangerous trap of all. In the end they gain nothing at their boss's expense and simply squander their own lives.

When Yang Tianzhen first entered ...
More effective than Adam: grounded in the spectral-invariance principle, POET makes LLM training both stable and fast
机器之心· 2025-07-15 00:59
Core Viewpoint
- The article discusses a novel training paradigm for large language models (LLMs) called POET (Reparameterized Training via Orthogonal Equivalence Transformation), which aims to enhance training efficiency and stability based on first principles [2][3]

Group 1: POET Methodology
- POET structurally reparameterizes each neuron by incorporating two learnable orthogonal matrices and a fixed random weight matrix, maintaining the singular-value distribution of the weights during training [3][11]
- The method combines singular-value invariance with minimal hyperspherical energy, providing a new paradigm that offers both physical interpretability and generalization capability for large-model training [3][11]
- POET's training process is designed to stabilize optimization and significantly improve model generalization [3][11]

Group 2: Advantages of POET
- POET preserves the spectral properties of the weight matrix throughout training, keeping the singular values consistent with those of the randomly initialized matrix [17]
- The method allows for efficient parameter control and avoids the excessively large singular values that can arise in standard LLM training [17]
- Two new initialization strategies, normalized Gaussian initialization and uniform spectrum initialization, are proposed to ensure bounded singular values in the generated weight matrices [17]

Group 3: Training Dynamics and Performance
- Experimental results demonstrate POET's superior performance in training large language models, including improvements in perplexity and training efficiency over traditional optimizers such as AdamW [20][24]
- POET's training process divides into three phases: conical-shell searching, stable learning on the conical shell, and final adjusting, reflecting the evolution of the orthogonal matrices during training [40][41]
- A fully stochastic sampling approach allows POET to reduce memory costs significantly compared to traditional methods, enhancing scalability [26][27]
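The spectral invariant at the heart of POET can be verified numerically. The sketch below is illustrative only: in the actual method the orthogonal factors R and P are learnable and updated during training, whereas here they are just random orthogonal matrices. The invariant itself, however, is exact linear algebra: for any orthogonal R and P, the product R @ W0 @ P has the same singular values as the fixed random matrix W0, so the spectrum cannot drift.

```python
import numpy as np

# Sketch of POET's spectral invariant: multiplying a fixed random weight
# W0 by orthogonal matrices on both sides leaves its singular values
# unchanged, which is why the reparameterized weights never develop
# excessively large singular values during training.
rng = np.random.default_rng(0)
W0 = rng.normal(size=(6, 4))                  # fixed random weight matrix
R, _ = np.linalg.qr(rng.normal(size=(6, 6)))  # orthogonal left factor
P, _ = np.linalg.qr(rng.normal(size=(4, 4)))  # orthogonal right factor

W = R @ W0 @ P  # the reparameterized weight the network actually uses

sv_fixed = np.linalg.svd(W0, compute_uv=False)
sv_reparam = np.linalg.svd(W, compute_uv=False)
print(np.allclose(sv_fixed, sv_reparam))  # True: the spectra coincide
```

Training then amounts to moving R and P along the orthogonal manifold, which reshapes the singular vectors while the singular values stay pinned to their initialization.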
Wahaha's Zong Fuli sued; plaintiffs claim to be her half-siblings | 首席资讯日报
首席商业评论· 2025-07-14 04:10
Group 1
- The core viewpoint of the article emphasizes the ongoing positive trend in the A-share market, with a focus on mid-year performance reports and the theme of "anti-involution" [2][3]
- China Shenhua reported coal sales of 204.9 million tons in the first half of the year, a year-on-year decrease of 10.9% [8]
- The railway sector completed fixed-asset investment of 355.9 billion yuan in the first half of the year, up 5.5% year-on-year [9]

Group 2
- The article discusses the ongoing family trust dispute involving Wahaha chairwoman Zong Fuli, who is being sued by her half-siblings for rights to trust funds valued at 700 million USD each [5][6][7]
- The white-feather meat duck industry is undergoing a significant capacity reduction, with approximately 9 million breeding ducks culled and an expectation that 30% of breeding-duck enterprises may exit the market [11]
- Perplexity's CEO indicated plans to use the Kimi K2 model for further training, highlighting advances in AI capabilities [12]
X @Yuyue
Yuyue· 2025-07-13 09:13
In my view, the difference between smart and not-so-smart AI models often comes down to differences in their datasets. I once compared the usability of answers from Tencent Yuanbao and DeepSeek on local-life questions: although Yuanbao's core is still DeepSeek, its answers were much "smarter" than vanilla DeepSeek's and could be acted on directly. The underlying reason is that Yuanbao can draw on the huge, not-fully-open corpus of WeChat official accounts, which contains a wealth of first-hand experience and opinions shared by creators. One can imagine that if Xiaohongshu built an AI, it might beat Yuanbao on life-experience questions. This illustrates the importance of high-quality data: AI can certainly help people find which restaurants are good and how to contact them, but only humans can create the restaurants in the first place; creativity is still beyond AI. The Tiger Research report from the past few days raises exactly this crisis in the data domain: with AI-generated content flooding in, high-quality data resources may be drying up, a major challenge for data-driven AI models. Worse still, much user-created content is used for AI training without permission, and the original authors rarely receive credit or compensation. Many people are saying @campnetworkxyz will launch its token soon, and I've seen plenty of activity in the Camp ecosystem these past couple of days; it feels like a new version of $IP ...
A new paradigm for VLA inference! Consistency model CEED-VLA delivers a 4x speedup!
机器之心· 2025-07-13 04:58
The first author of this paper is Song Wenxuan, a first-year PhD student in robotics at HKUST (Guangzhou), whose main research direction is VLA models. Co-first author Chen Jiayi is a research assistant at HKUST (Guangzhou), and the project leader is Ding Pengxiang, a PhD student jointly trained by Zhejiang University and Westlake University; they are also the team behind the embodied-intelligence open-source projects OpenHelix and LLaVA-VLA. The corresponding author is Prof. Li Haoang of HKUST (Guangzhou), a CVPR 2025 Best Paper Candidate.

To address this problem, some studies have proposed replacing conventional autoregressive decoding with Jacobi decoding to improve inference efficiency. However, because Jacobi decoding typically requires many iterations, its speedup is limited in practice.

We therefore propose a consistency distillation training strategy that enables the model to predict multiple correct action tokens in each iteration, accelerating decoding. We also design a mixed-label supervision mechanism to mitigate the error accumulation that can arise during distillation.

Although these methods deliver acceptable speedups, we further observe that Jacobi decoding still contains several inefficient iteration steps, which remain the key bottleneck limiting overall efficiency. ...
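The parallel-refinement idea behind Jacobi decoding can be sketched with a toy model. Everything below is illustrative: the deterministic `next_token` function stands in for a real model's greedy argmax (a real VLA head predicts action tokens), and no consistency distillation is shown. The sketch demonstrates the core property the article relies on: all new positions are refined simultaneously from the current guess, and at a fixed point the result provably equals greedy autoregressive decoding, but convergence can take many iterations, which is the bottleneck consistency distillation attacks.

```python
def next_token(prefix):
    # Toy deterministic stand-in for a model's greedy next-token choice.
    return (sum(prefix) + 1) % 7

def autoregressive_decode(prompt, n_new):
    """Standard decoding: one sequential model call per new token."""
    seq = list(prompt)
    for _ in range(n_new):
        seq.append(next_token(seq))
    return seq[len(prompt):]

def jacobi_decode(prompt, n_new, max_iters=64):
    """Jacobi decoding: start from an arbitrary guess for ALL new
    positions and refine them in parallel until a fixed point."""
    seq = list(prompt) + [0] * n_new  # arbitrary initial guesses
    for _ in range(max_iters):
        updated = seq[:len(prompt)] + [
            next_token(seq[:i]) for i in range(len(prompt), len(seq))
        ]
        if updated == seq:  # converged: every position is self-consistent
            break
        seq = updated
    return seq[len(prompt):]

print(jacobi_decode([1, 2], 5) == autoregressive_decode([1, 2], 5))  # True
```

In the worst case position i only becomes correct once all earlier positions are, so plain Jacobi decoding can need as many iterations as tokens; training the model to fix several tokens per iteration is what recovers a real speedup.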
Still relying on "moving slowly" to stay healthy after 40?
Hu Xiu· 2025-07-13 04:08
In a Live Well Be Well podcast episode seven months ago, exercise physiologist Dr. Stacy Sims made a surprising claim: Zone 2 training, long regarded as the golden zone for "optimizing mitochondria, boosting metabolism, burning fat, and slowing aging," may not be suitable as a primary training mode for most women.

She pointed out that women differ significantly from men in hormones, metabolism, and muscle and bone adaptation mechanisms, while the traditional Zone 2 training model was largely designed around male physiology.

Zone 2: the golden zone once hailed as the "metabolic holy grail"

Popularized by Dr. Huberman and Peter Attia, Zone 2 training is simply "low-intensity, long-duration" aerobic exercise such as jogging, brisk walking, cycling, or rowing. Heart rate in this zone is typically kept at 60% to 70% of maximum heart rate, and it is widely used to improve mitochondrial function, enhance fat-oxidation capacity, and build endurance while supporting fat loss.

Know Your Heart Rate Zones: 220 − your age = max heart rate (MHR)

| Heart Rate Zone | % of MHR | Intensity Level |
| --- | --- | --- |
| Zone 1 | ...
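The zone arithmetic above is simple enough to compute directly. The sketch below uses the same 220-minus-age estimate cited in the article's table; note this is a rough population-level formula, not an individualized measurement.

```python
def heart_rate_zone2(age):
    """Zone 2 heart-rate bounds (bpm) using the common 220-minus-age
    MHR estimate from the article's table. A rough population formula,
    not a substitute for individual testing."""
    mhr = 220 - age                               # estimated max heart rate
    return round(0.60 * mhr), round(0.70 * mhr)   # Zone 2 is 60-70% of MHR

# e.g. for a 40-year-old: MHR ≈ 180, so Zone 2 is roughly 108-126 bpm
print(heart_rate_zone2(40))
```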
X @外汇交易员
外汇交易员· 2025-07-11 02:07
Intel CEO Lip-Bu Tan said in an internal talk that Intel has fallen out of the global semiconductor industry's top ten. Customers regard Intel's market value as a failing grade, and the company lags far behind NVIDIA in developing AI training technology. Tan likened Intel's turnaround to a marathon, saying the layoffs that began this week are meant to bring the company closer to competitors such as NVIDIA, Broadcom, and AMD. https://t.co/ZbXV2P3O2F ...
Broadcom management meeting: AI inference demand is surging, even exceeding current capacity, and is not yet reflected in expectations
Hua Er Jie Jian Wen· 2025-07-10 08:46
Core Insights
- Broadcom's management has indicated a significant and unexpected increase in demand for AI inference, which currently exceeds existing production capacity and suggests potential upward revisions to future profitability [1][2][3]
- Non-AI business segments are also showing signs of recovery, particularly through VMware's growth, contributing to a multi-faceted growth strategy for the company [1][4]

AI Inference Demand
- Broadcom's custom AI XPU chip business remains strong, with a clear growth trajectory. Over the past year AI demand centered on training workloads, but a notable surge in inference demand has appeared in the last two months as clients seek to monetize their AI investments [2][3]
- Current inference demand is not included in Broadcom's 2027 market-size forecast, which estimates $60-90 billion for its three existing AI clients, indicating a potential upside opportunity [3]

Technological Advancements
- Broadcom is collaborating closely with four potential AI XPU clients to build 1-million-XPU AI cluster infrastructures, and plans to complete the first generation of AI XPU products for two major clients this year [3]
- The company is leading the industry transition to next-generation 2nm 3.5D-packaging AI XPU architecture, with the 2nm 3.5D AI XPU tape-out planned for completion this year [3]

Non-AI Business Recovery
- After several quarters of cyclical pressure in non-AI semiconductor businesses, Broadcom is seeing a gradual "U"-shaped recovery, reflected in current bookings and orders. This recovery may drive positive EPS revisions next year [4]
- VMware is leveraging its cloud infrastructure (VCF) platform to provide comprehensive solutions for large enterprise clients, with revenue expected to grow to approximately $20 billion annually by 2026/2027 [4]

Profitability and Financial Metrics
- Despite potential gross-margin pressure from high demand for custom AI XPUs, Broadcom anticipates continued expansion of operating margins through operating leverage. AI revenue is expected to grow 60% year-over-year in fiscal 2026, while operating expenses are not expected to rise at the same rate [5]
- Key financial estimates include projected revenues of $51.574 billion for FY24, $63.447 billion for FY25, and $76.362 billion for FY26, with adjusted EPS expected to grow from $4.86 in FY24 to $8.38 in FY26 [6]

Market Outlook
- JPMorgan maintains an "overweight" rating on Broadcom with a target price of $325, representing a 16.9% upside from the current stock price. Broadcom's stock has risen nearly 20% year-to-date [7]