Sequoia China Officially Open-Sources the Evaluation Sets of Its AI Benchmark xbench
news flash·2025-06-18 00:37

Core Viewpoint
- Sequoia China has officially open-sourced its AI benchmarking tool xbench, introducing two evaluation sets: xbench-ScienceQA and xbench-DeepSearch [1]

Group 1
- The evaluation sets will be dynamically updated based on the development of large models and AI agents [1]
- A "black-box and white-box" mechanism will be employed to ensure that xbench serves a broader range of developers while minimizing overfitting issues commonly associated with static evaluation sets [1]
- The goal is to ensure the long-term effectiveness of xbench in the evolving AI landscape [1]