Core Insights
- The article discusses the launch of CL-bench, a benchmark designed to evaluate the ability of large models to learn from context, led by Yao Shunyu, Tencent's Chief AI Scientist [1][2][4]
- The research emphasizes that the focus should shift from merely increasing model size to ensuring models can effectively learn and apply knowledge in real-world tasks [5][10]
- Current leading models, including GPT-5.1, show disappointing performance, with a task-solving rate of only 23.7%, indicating a significant gap in their contextual learning capabilities [7][29]

Summary by Sections

Context Learning Importance
- The research highlights that while advanced models excel in standardized tests, they struggle in real-world applications where contextual learning is crucial [9][10]
- Human learning relies on real-time context rather than static knowledge, which current models fail to replicate [11][14]

CL-bench Design and Objectives
- CL-bench consists of 500 complex contexts, 1899 tasks, and 31607 validation criteria, designed to require models to learn new knowledge from context [15][19]
- The benchmark aims to assess models' abilities to apply knowledge from unfamiliar domains, rule systems, and procedural tasks [18][22]

Model Performance Evaluation
- Ten leading models were evaluated on CL-bench, with an average task-solving rate of only 17.2%, underscoring their inability to learn from complex contexts [28][29]
- The best-performing model, GPT-5.1, achieved a maximum of 23.7%, revealing a widespread issue across models in contextual learning [30]

Error Analysis
- The analysis identified that ignoring or misusing context is a primary reason for model failures, with many errors stemming from the models' reliance on pre-trained static knowledge [31][32]
- Models performed poorly in tasks requiring inductive reasoning from experimental data, often achieving less than 10% success [32]

Future Directions
- The research team aims to advance contextual learning in AI, moving beyond merely providing context to ensuring models can genuinely learn from it [36][40]
- The collaboration between Tencent and Fudan University reflects a commitment to enhancing AI's practical applications in real-world scenarios [39]
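
To make the reported numbers concrete, here is a minimal Python sketch of how a task-solving rate could be computed from per-criterion results. It rests on an assumption that the summary does not state: a task counts as solved only when every one of its validation criteria passes. The `TaskResult` record, the function names, and the toy data are hypothetical and are not part of CL-bench itself. Only the three counts (500, 1899, 31607) come from the summary.

```python
from dataclasses import dataclass

# Counts reported in the summary.
NUM_CONTEXTS = 500
NUM_TASKS = 1899
NUM_CRITERIA = 31607


@dataclass
class TaskResult:
    """Hypothetical record: one task and the pass/fail outcome of each criterion."""
    task_id: str
    criteria_passed: list[bool]


def is_solved(result: TaskResult) -> bool:
    # ASSUMPTION (not stated in the summary): a task is solved only if it has
    # at least one criterion and every criterion passes.
    return len(result.criteria_passed) > 0 and all(result.criteria_passed)


def solving_rate(results: list[TaskResult]) -> float:
    # Fraction of tasks that count as solved; 0.0 for an empty result list.
    if not results:
        return 0.0
    return sum(is_solved(r) for r in results) / len(results)


# Averages implied by the reported counts.
tasks_per_context = NUM_TASKS / NUM_CONTEXTS   # about 3.8 tasks per context
criteria_per_task = NUM_CRITERIA / NUM_TASKS   # about 16.6 criteria per task

# Toy data, invented purely for illustration.
toy = [
    TaskResult("t1", [True, True, True]),
    TaskResult("t2", [True, False, True]),  # one failed criterion, so unsolved
    TaskResult("t3", [False]),
    TaskResult("t4", [True, True]),
]
toy_rate = solving_rate(toy)  # 2 of 4 toy tasks solved

print(f"{tasks_per_context:.2f} tasks/context, {criteria_per_task:.2f} criteria/task")
print(f"toy solving rate: {toy_rate:.1%}")
```

Under this all-criteria-must-pass assumption, a single missed criterion among roughly 17 per task is enough to mark the task unsolved, which is one way a strict rubric can produce low solving rates even when many individual criteria are satisfied.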
Yao Shunyu's First Paper at Tencent: Pointing the Second Half of AI Toward "Context Learning"