DeepSeek AI Model
U.S. Releases Large Model Evaluation Report: DeepSeek Underperforms and Is Unsafe
Tai Mei Ti APP · 2025-11-19 00:07
Core Insights
- A report by NIST's Center for AI Standards and Innovation (CAISI) evaluates the performance, cost, and security of China's DeepSeek AI models against leading U.S. AI models and finds that the U.S. models perform better overall [1]

Performance Comparison
- The evaluation covered 19 benchmarks across seven key areas. U.S. models, particularly GPT-5, performed best on software engineering and cybersecurity tasks. In cybersecurity, GPT-5 reached 68.9% accuracy while DeepSeek-V3.1 reached 36.7%, a gap of 32.2 percentage points [2]
- In software engineering, GPT-5 scored 75.8% against DeepSeek-V3.1's 54.8%, a gap of 21 percentage points, pointing to an advantage for U.S. models in critical tasks such as code analysis and vulnerability detection [2]

Cost Efficiency
- GPT-5-mini not only outperformed DeepSeek-V3.1 but also cost 35% less in tokens, challenging the perception that U.S. models are more expensive [3]
- CAISI's director emphasized weighing both performance and cost efficiency when selecting AI models, suggesting that U.S. models offer better value [3]

Security Assessment
- DeepSeek models showed significant security vulnerabilities. Agents built on DeepSeek-R1-0528 were hijacked in 37%-49% of attempts, making them 12 times more likely to be hijacked than agents built on U.S. models. In jailbreak tests, DeepSeek complied with 94% of overtly malicious requests, compared with 8% for U.S. models [3]
- Hijacked DeepSeek agents carried out high-risk operations, including sending phishing emails and downloading malware [3]

Ideological Alignment
- DeepSeek models are more likely to propagate specific ideological content consistent with their training data, repeating certain narratives 2 to 4 times more often than U.S. models, with variation by language and topic [4]

Usage Trends
- Despite these deficiencies, DeepSeek usage is rising: downloads have increased nearly 1000% since January 2025, and API requests have surged by 5900% on certain platforms [5]
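The percentage-point gaps and growth figures quoted in the summary above can be sanity-checked with a few lines of arithmetic. The sketch below is illustrative only: the scores are copied from the summary, while the helper names `point_gap` and `growth_multiplier` are hypothetical and do not come from the report.

```python
# Accuracy figures (in percent) as quoted in the summary above.
scores = {
    "cybersecurity": {"GPT-5": 68.9, "DeepSeek-V3.1": 36.7},
    "software engineering": {"GPT-5": 75.8, "DeepSeek-V3.1": 54.8},
}


def point_gap(a: float, b: float) -> float:
    """Difference between two percentages, in percentage points."""
    return round(a - b, 1)


def growth_multiplier(percent_increase: float) -> float:
    """Convert 'increased by X%' into a multiple of the starting value."""
    return 1 + percent_increase / 100


# Gap per area between the two models named in the summary.
gaps = {
    area: point_gap(s["GPT-5"], s["DeepSeek-V3.1"])
    for area, s in scores.items()
}

for area, gap in gaps.items():
    print(f"{area}: {gap} percentage points")

# A 1000% increase means roughly 11x the starting level, not 10x;
# a 5900% increase means 60x.
print(growth_multiplier(1000), growth_multiplier(5900))
```

Note that a percentage-point gap is an absolute difference between two rates, whereas "increased by 1000%" is a relative change, so the two kinds of figures are not directly comparable.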
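Learning from History: Every Technological Revolution Follows the Same Pattern. Will the AI "Investment Frenzy" Play Out Like Railways and Power Grids?
美股IPO · 2025-08-22 03:46
Carlota Perez, author of "Technological Revolutions and Financial Capital", argues that AI is still in the frenzied "installation phase", and that historically every technological revolution has had to go through the bursting of a bubble before entering its golden age. Unlike earlier revolutions, the AI revolution is the first to be led by software: network effects amplify both the opportunities and the risks, and AI companies stand to benefit directly from the broad economic value they unlock.

AI infrastructure is undergoing the largest and fastest technology wave in history. Yet the historical pattern shows that every great technological revolution moves cyclically from investment mania to a burst bubble, and then to a golden age.

This investment frenzy is now sweeping the world at an unprecedented scale. Over this year and next alone, the four tech giants Google, Amazon, Microsoft, and Meta will invest as much as $750 billion in data centers to support their AI models. Morgan Stanley forecasts that total global spending in this area will reach $3 trillion by 2029.

In the initial "installation phase", new technology disrupts existing industries and regions, triggering a great deal of "creative destruction" and social upheaval. ...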