Generalization Ability

Breaking the Long-Video Understanding Bottleneck: HoPE Hybrid Position Embedding Improves VLM Length Generalization
机器之心· 2025-06-29 04:23
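A research team from CMU and Xiaohongshu has studied this problem in depth. They are the first to propose a theoretical framework for evaluating multimodal RoPE extension strategies, and they show that one reason existing multimodal RoPE variants generalize poorly is that retaining all of RoPE's frequencies harms long-context semantic modeling. Building on this analysis, their Hybrid of Position Embedding (HoPE) substantially improves the length generalization of VLMs and achieves the best performance on tasks such as long-video understanding and retrieval. Li Haoran is a graduate student in the Machine Learning Department at CMU whose research covers long-context modeling, alignment, and retrieval-augmented generation for foundation models. Today's vision language models (VLMs) perform very well on multimodal tasks such as visual question answering and image captioning, yet they still struggle with long-context tasks such as long-video understanding and retrieval. Although Rotary Position Embedding (RoPE) is widely used to improve the length generalization of large language models, how to extend RoPE effectively to the multimodal setting remains an open problem. Specifically, a common extension is to use different frequencies in RoPE to encode the different positional components (x, y, t). However, because each dimension in RoPE carries ...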
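For readers unfamiliar with the mechanism the summary refers to, here is a minimal sketch of standard one-dimensional RoPE, not of HoPE itself, whose details the truncated summary does not give. The function names, the base of 10000, and the toy vectors are illustrative assumptions. The sketch shows that each pair of dimensions rotates at its own frequency and that the resulting attention score depends only on the relative offset between positions.

```python
import math

def rope_frequencies(dim, base=10000.0):
    """One rotation frequency per pair of dimensions (assumed base: 10000)."""
    return [base ** (-2.0 * i / dim) for i in range(dim // 2)]

def apply_rope(vec, pos, base=10000.0):
    """Rotate each consecutive pair (vec[2i], vec[2i+1]) by the angle pos * theta_i."""
    out = []
    for i, theta in enumerate(rope_frequencies(len(vec), base)):
        x, y = vec[2 * i], vec[2 * i + 1]
        c, s = math.cos(pos * theta), math.sin(pos * theta)
        out += [x * c - y * s, x * s + y * c]
    return out

def dot(a, b):
    """Plain dot product, standing in for an attention score."""
    return sum(x * y for x, y in zip(a, b))

# Hypothetical query/key vectors: the score is the same for any two
# positions that are the same distance apart.
q = [1.0, 2.0, 3.0, 4.0]
k = [0.5, -1.0, 2.0, 0.25]
print(dot(apply_rope(q, 5), apply_rope(k, 3)))
print(dot(apply_rope(q, 12), apply_rope(k, 10)))
```

Low-index pairs rotate quickly and high-index pairs slowly, which is why the choice of which frequencies encode x, y, and t matters for multimodal extensions.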
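Qwen & Tsinghua Team Upends Conventional Wisdom: Reinforcement Learning for Large Models Using Only the 20% Key Tokens Beats Training on All Tokens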
量子位· 2025-06-05 10:28
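By Mengchen, from Aofeisi | QbitAI (WeChat official account). The hottest recent paper on arXiv is the latest result from the Qwen & Tsinghua LeapLab team: when reinforcement learning is used to train the reasoning ability of large models, just the 20% of tokens with high entropy can carry the entire training effect, and can even outperform training on all tokens. Using this finding, the team set a new SOTA on Qwen3-32B: 63.5 on AIME'24 and 56.7 on AIME'25, the highest scores for a model under 600B parameters trained directly from the base model. When the maximum response length was extended from 20k to 29k, the AIME'24 score rose further to 68.1. The classic 80/20 rule (the Pareto principle) states that roughly 80% of results are driven by 20% of key factors, but that the remaining 80% still cannot be lightly discarded. In reinforcement learning for large models, however, the 80% of low-entropy tokens can not only be discarded but may even be harmful, which is why the paper is titled "Beyond the 80/20 Rule". The team also examines the main effects of RL on LLMs from the perspective of token entropy, and further discusses the differences between RL and SFT, what is distinctive about RL for LLMs, and the advantages of clip-higher over an entropy bonus. Decoding the entropy distribution of Chain-of-Thought. To understand this work, start with an interesting observation: the team ...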
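The selection step described above can be illustrated with a minimal sketch. It assumes per-position next-token probability distributions are available and keeps the 20% of positions with the highest Shannon entropy; the function names, the tie-breaking rule, and the toy entropy values are hypothetical, and the paper's actual training code is not shown in the summary.

```python
import math

def token_entropy(probs):
    """Shannon entropy (in nats) of one next-token distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)

def high_entropy_mask(entropies, keep_fraction=0.2):
    """Return a 0/1 mask that keeps the highest-entropy positions.

    Assumption: ties are broken by position, and at least one token is kept.
    """
    n = len(entropies)
    if n == 0:
        return []
    k = max(1, math.ceil(n * keep_fraction))
    top = sorted(range(n), key=lambda i: (-entropies[i], i))[:k]
    keep = set(top)
    return [1 if i in keep else 0 for i in range(n)]

# Toy per-token entropies for a 10-token response (made-up numbers).
entropies = [0.1, 2.0, 0.05, 0.3, 1.5, 0.2, 0.0, 0.4, 0.1, 0.25]
mask = high_entropy_mask(entropies)
print(mask)  # only masked-in positions would contribute to the policy loss
```

In a training loop, such a mask would multiply the per-token loss so that only the selected positions produce gradients.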
Robot "Dutiful Sons" to Solve the Elderly-Care Dilemma: The Technical Path Is Clear, Non-Humanoid Forms Go First
Zhong Guo Jing Ying Bao· 2025-05-29 12:07
Core Viewpoint
- The article discusses the potential of humanoid robots in addressing the growing elderly care needs in the context of an aging population, highlighting advancements in technology and the evolving landscape of the robotics industry [1][3][20].

Industry Overview
- The aging population in China is rapidly increasing, with projections indicating that by the end of 2024, there will be 310 million people aged 60 and above, accounting for 22% of the total population [3][20].
- The concept of "elderly care robots" encompasses various forms of robots, including exoskeletons and humanoid robots, with a particular focus on humanoid robots in popular perception [4][21].

Technological Advancements
- Recent breakthroughs in robotics include improvements in bionic joints, motion control algorithms, and cognitive decision-making frameworks, which are essential for the development of humanoid robots [1][6].
- The introduction of international standards for elderly care robots aims to guide the design, manufacturing, testing, and certification processes, promoting healthy industry development [7][9].

Market Dynamics
- The market for humanoid robots is expected to grow significantly, with estimates suggesting that by 2035, the global market could reach $38 billion, and in China, the market could expand to 500 billion yuan [20][24].
- The current pricing of humanoid robots ranges from approximately 99,000 yuan to 199,000 yuan, with expectations that prices will decrease as technology matures [14][17].

Future Outlook
- Experts predict that humanoid robots capable of providing companionship and care for the elderly may enter households within the next three to ten years, although some believe it could take longer [18][21].
- The industry is witnessing a shift towards consumer markets, with companies exploring opportunities in home care and rehabilitation, indicating a potential for growth in the elderly care robotics sector [22][23].
Institute of Software Proposes a Mini-Batch Data Sampling Strategy
Jing Ji Guan Cha Wang· 2025-05-27 07:50
Core Insights
- A research team from the Institute of Software, Chinese Academy of Sciences, proposed a small-batch data sampling strategy to eliminate the interference of unobservable variable semantics on representation learning, enhancing the out-of-distribution generalization ability of self-supervised learning models [1][2]

Group 1: Research Findings
- The out-of-distribution generalization ability refers to the model's performance on test data that differs from the training data distribution, which is crucial for maintaining effectiveness in unseen data scenarios [1]
- The study identified that self-supervised learning models are affected by unobservable variable semantics during training, which weakens their out-of-distribution generalization ability [1]

Group 2: Methodology
- The proposed strategy utilizes causal effect estimation techniques to eliminate the confounding effects of unobservable variable semantics [1]
- By learning a latent variable model, the strategy estimates the posterior probability distribution of unobservable semantic variables given "anchor" samples, termed as balance scores [1]
- Samples with similar or close balance scores are grouped into the same small-batch dataset, ensuring that unobservable semantic variables are conditionally independent of the "anchor" samples within each batch [1]

Group 3: Experimental Results
- Extensive experiments on benchmark datasets showed that the sampling strategy improved the performance of mainstream self-supervised learning methods by at least 2% across various evaluation tasks [2]
- In classification tasks on ImageNet100 and ImageNet, both Top-1 and Top-5 accuracy surpassed the state-of-the-art self-supervised methods [2]
- In semi-supervised classification tasks, Top-1 and Top-5 accuracy increased by over 3% and 2%, respectively [2]
- The strategy also provided stable gains in average precision for object detection and instance segmentation transfer learning tasks [2]
- Performance improvements exceeded 5% for few-shot transfer learning tasks on datasets like Omniglot, miniImageNet, and CIFARFS [2]
- The research findings were accepted by the top-tier academic conference in artificial intelligence, International Conference on Machine Learning (ICML-25) [2]
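The grouping step in the methodology can be illustrated with a minimal sketch. It makes two loud assumptions that are not in the summary: each sample's balance score is reduced to a single number (the summary describes it as a posterior distribution), and "similar or close" scores are obtained simply by sorting and chunking. The function name and the toy scores are hypothetical, and this is not the authors' implementation.

```python
def batches_by_balance_score(scores, batch_size):
    """Group sample indices so that each mini-batch holds neighbouring scores.

    Assumption: `scores[i]` is a scalar stand-in for sample i's balance score.
    """
    # Sort indices by score (ties broken by index), then cut into chunks.
    order = sorted(range(len(scores)), key=lambda i: (scores[i], i))
    return [order[s:s + batch_size] for s in range(0, len(order), batch_size)]

# Toy scores for six samples (made-up numbers).
scores = [0.9, 0.1, 0.5, 0.12, 0.88, 0.52]
for batch in batches_by_balance_score(scores, batch_size=2):
    print(batch, [scores[i] for i in batch])
```

With distribution-valued scores, the sort key would be replaced by a clustering or nearest-neighbour step under some distance between distributions.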
Medical Imaging Large Models Still Have "Three Hurdles" to Clear
36Kr· 2025-05-18 23:14
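Among the many application scenarios, pathology images are extremely diverse, so pathology large models are regarded as the "crown jewel" of medical models. To tackle the accuracy and efficiency problems of pathological diagnosis, 透彻未来 developed what it describes as the world's first clinical-grade pathology large-model product, 透彻洞察. Built at a scale of hundreds of millions of parameters and trained on massive high-precision pathology data, it gives pathologists accurate, robust, comprehensive, and fast support for clinical pathological diagnosis. Since the start of 2025, DeepSeek has accelerated the deep integration of algorithm R&D with clinical scenarios through its open ecosystem. Medical large models have abandoned the "technology first" mindset and are gradually entering a pragmatic phase. As one of the fields where AI is applied most deeply, medical imaging is developing even faster in the large-model era. How can the generalization ability of AI models be strengthened? How can large-model hallucination be addressed? What are the difficulties of integrating multimodal data in large models, and what are the solutions? 动脉网 spoke with Zheng Chao, CTO of Shukun Technology (数坤科技), and Wang Shuhao, co-founder and CTO of 透彻未来, two experts with many years in medical AI, for the industry's reference. The main points of this article are as follows: 01 Large models are now embedded in the physician's entire workflow. Medical imaging AI models showed broad application prospects even before parameter counts reached today's scale, and they are now in routine use throughout the radiologist's workflow. Following dedicated diagnostic-assistance models, the "数坤坤多模态医疗健康大模型" (a multimodal medical and health large model), which Shukun Technology released in April, moved AI from an assistive tool to the core driver of the diagnosis-and-treatment ecosystem. 数 ...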
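Generalization Soars 47%! The First Reward Paradigm for Intent Detection, a New Approach to Intent Recognition in the Era of Exploding AI Tools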
机器之心· 2025-05-16 04:39
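With the rapid development of large language models (LLMs) and the explosive growth of integrable tools, AI assistants offer more and more everyday conveniences: not only the assistant capabilities of traditional task-oriented dialogue, such as booking flights and checking the weather, but also countless AI capabilities such as AI image generation, solving math problems, and game walkthroughs. Accurately understanding the user's intent (Intent Detection) and routing it to the downstream toolchain is the first step in delivering these functions, so its importance is self-evident. However, the rapid iteration and diversification of tools, and the increasingly complex relationships among them, pose new challenges for intent recognition: models generally suffer performance degradation on new intents. It is therefore crucial to train intent recognition models with better generalization and robustness on open-source lightweight LLMs, so that they understand intents in unseen scenarios more accurately. Recently, a research team from Tencent PCG's social business line addressed this problem with reinforcement learning (RL) training. Using the Group Relative Policy Optimization (GRPO) algorithm combined with a Reward-based Curriculum Sampling (RCS) strategy, they applied it to intent recognition, significantly improving generalization to unknown intents, tackling the intent generalization problem caused by the tool explosion, and advancing large models in intent ...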
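The core idea behind GRPO, as it is commonly described, is to score each sampled response relative to the other responses drawn for the same prompt. The sketch below computes such group-relative advantages by standardizing rewards within a group. The function name, the use of the population standard deviation, the `eps` constant, and the toy 0/1 intent-match rewards are assumptions for illustration; the team's actual reward design and RCS schedule are not described in the summary.

```python
import statistics

def group_relative_advantages(rewards, eps=1e-8):
    """Standardize rewards within one group of responses to the same prompt.

    Assumption: population standard deviation; `eps` avoids division by zero
    when every response in the group receives the same reward.
    """
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]

# Toy group: four sampled intent predictions for one query, reward 1.0 if the
# predicted intent matches the label and 0.0 otherwise (hypothetical reward).
print(group_relative_advantages([1.0, 0.0, 0.0, 1.0]))
print(group_relative_advantages([1.0, 1.0, 1.0]))  # identical rewards carry no signal
```

Because identical rewards within a group yield zero advantage, prompts the model already answers correctly every time contribute little, which is one motivation for sampling harder examples as training progresses.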