How Capable Is AI at Fact-Checking and Ethical Judgment? The Beijing News Launches Its Third Large Model Evaluation
Bei Ke Cai Jing · 2025-06-23 10:42
Core Insights
- The report highlights the advancements in AI large models, particularly in enhancing media capabilities such as text generation, fact-checking, and information retrieval [1][2]
- The third phase of the evaluation report will be released in July during the Beike Finance Summit, focusing on the effectiveness of large models in media work [2]

Group 1: Evaluation Findings
- The previous evaluation in January 2025 ranked large models' information gathering, translation, and long-text summarization capabilities as the top three, while fact-checking and ethical judgment ranked lowest [1]
- Compared to the first evaluation, the information gathering capability improved from third to first place, and long-text summarization capability rose from last to third place, indicating significant progress in these areas [1]

Group 2: Industry Developments
- The emergence of DeepSeek has popularized deep thinking capabilities in large models, leading to the introduction of such features in most mainstream large model products [2]
- The exponential growth of AI-generated content has led to issues with "hallucinated" content, which has affected the accuracy of results generated by large models during online searches [2]