Large Language Model Context Learning
Tencent's Yao Shunyu Team Publishes Its First Signed Paper, Bringing Model "Context Learning" Closer to Reality
Yang Zi Wan Bao Wang · 2026-02-03 15:09
Core Insights
- The article discusses the challenges current language models face in learning from context, noting that even the strongest models struggle with this capability [1][2][3]

Group 1: Research Findings
- Tencent's research team, in collaboration with Fudan University, emphasizes that enabling large models to learn from context is harder than previously thought [2][3]
- The team developed CL-bench, a benchmark designed to evaluate whether language models can learn new knowledge from context and apply it correctly; it consists of 500 complex contexts, 1,899 tasks, and 31,607 validation standards [3]
- The top ten language models achieved an average task resolution rate of only 17.2% on CL-bench, indicating significant shortcomings in their ability to utilize context [3]

Group 2: Future Implications
- The research suggests that enhancing models' context-learning capabilities could shift humans' role from primary data providers to context providers, changing the competitive landscape in AI [3][4]
- The team also notes that memory management may become a core theme in the development of large models by 2026, potentially leading to autonomous learning capabilities [4]
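The reported composition (500 contexts, 1,899 tasks, 31,607 validation standards) implies a nested structure in which each context carries several tasks and each task is graded against many checks. A minimal sketch of the arithmetic this implies; the dictionary keys below are illustrative, not names from the CL-bench release:

```python
# Reported CL-bench composition (key names are hypothetical).
benchmark = {
    "contexts": 500,
    "tasks": 1899,
    "validation_standards": 31607,
}

# Average density of the nesting: tasks per context, checks per task.
tasks_per_context = benchmark["tasks"] / benchmark["contexts"]
checks_per_task = benchmark["validation_standards"] / benchmark["tasks"]

print(f"{tasks_per_context:.1f} tasks per context")  # → 3.8
print(f"{checks_per_task:.1f} checks per task")      # → 16.6
```

The density suggests why the benchmark is demanding: each context must support roughly four distinct tasks, each graded against more than a dozen validation standards.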
Just Released: Tencent Yao Shunyu Team's First Result Reveals the Real Bottleneck of Large Models
36Kr · 2026-02-03 14:26
Core Insights
- Tencent's Hunyuan team has launched a new benchmark called CL-bench, aimed at evaluating the ability of large language models to learn new knowledge from context and apply it correctly [1][7][30]

Group 1: Benchmark Overview
- CL-bench includes 500 complex contexts, 1,899 tasks, and 31,607 validation standards, focusing on the requirement that models learn new knowledge from context that is not present in their pre-training data [9][28]
- The benchmark aims to bridge the gap between models' static memory and humans' dynamic learning capabilities, emphasizing the need for models to adapt to real-world tasks [5][7]

Group 2: Model Performance
- The average success rate of models on CL-bench is only 17.2%, with the best-performing model, GPT-5.1 (High), achieving a success rate of 23.7% [15][16]
- The evaluation revealed that many models fail to utilize context effectively, with significant percentages of tasks failing because context was ignored or misused [17][18]

Group 3: Key Findings
- Ignoring or misusing context is identified as a primary cause of model failures, indicating that models often rely on static knowledge rather than adapting to new information [17]
- Performing inductive reasoning from experimental data proves more challenging for models than applying deductive reasoning based on provided rules [20]
- The complexity of context, rather than just its length, significantly affects task difficulty, highlighting the need for models to improve their context-learning capabilities [25][30]

Group 4: Future Directions
- The Hunyuan team plans to focus on enhancing models' context-learning abilities and ensuring that knowledge learned from context is retained, which may shift humans' role from data providers to context providers [30]
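The scoring described above (tasks checked against many validation standards, with low resolution rates when models ignore context) can be sketched as a small evaluation loop. This is a toy illustration under stated assumptions, not the CL-bench harness: the `Task` class, the all-checks-must-pass rule, and the toy "models" are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Task:
    """One CL-bench-style task (hypothetical schema): a context carrying
    new knowledge, a prompt, and validation checks the answer must pass."""
    context: str
    prompt: str
    checks: list = field(default_factory=list)  # callables: answer -> bool


def resolution_rate(tasks, model):
    """Fraction of tasks whose answer satisfies ALL validation checks.
    `model` is any callable (context, prompt) -> answer string."""
    if not tasks:
        return 0.0
    solved = 0
    for task in tasks:
        answer = model(task.context, task.prompt)
        if task.checks and all(check(answer) for check in task.checks):
            solved += 1
    return solved / len(tasks)


# Toy usage: knowledge exists only in the context, never in "pre-training",
# so a model that ignores the context cannot solve the task.
tasks = [
    Task(
        context="New rule: the made-up word 'blorp' denotes the number 4.",
        prompt="What is blorp + 1?",
        checks=[lambda a: "5" in a],
    ),
]

ignores_context = lambda ctx, q: "unknown"        # relies on static knowledge
uses_context = lambda ctx, q: "blorp + 1 = 5"     # applies the new rule

print(resolution_rate(tasks, ignores_context))  # → 0.0
print(resolution_rate(tasks, uses_context))     # → 1.0
```

The all-checks-must-pass rule makes partial credit impossible, which is one plausible reason a benchmark built this way would yield low headline numbers such as the 17.2% average reported here.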