Real Money for Fake Models? 187 Papers Burned by "Wrapper APIs" as Accuracy Plummets

Core Viewpoint
- A recent paper from CISPA reports that third-party APIs may be substituting cheaper models for the advanced models they advertise, leading to significant performance discrepancies in AI applications [1][10].

Group 1: Shadow API Market
- The high costs and access barriers of cutting-edge models such as GPT-5 and Gemini 2.5 have given rise to a large third-party service market known as "Shadow APIs" [8].
- These Shadow APIs claim to provide unrestricted access to official models but often serve inferior models instead, undermining the integrity of scientific research [10][11].

Group 2: Performance Discrepancies
- A systematic audit of 17 Shadow API services found that 62% of the academic papers relying on them were accepted at top conferences such as ACL, CVPR, and ICLR [14].
- In high-stakes areas such as medical diagnostics (MedQA), accuracy fell from 83.82% on the official endpoint to an average of 36.95% on Shadow APIs, a gap of roughly 47 percentage points [19].
- Similarly, in legal assessments (LegalBench), Shadow APIs underperformed official endpoints by 40.10% to 42.73% [20].

Group 3: Economic Deception Mechanisms
- The paper identifies three common economic deception mechanisms used by Shadow API providers: Information Premium, Discount-Substitution, and Resale Markup [31].
- Users pay official rates (e.g., $14.84 per 1,000 requests) but receive service worth only $5.70 to $7.77, leaving providers a substantial profit margin [31][33].

Group 4: Implications for Research Integrity
- Relying on these Shadow APIs for serious research could severely undermine the credibility of AI research, since flawed outputs from substituted models may propagate through subsequent studies [35].
- A conservative estimate suggests that correcting the data contamination in just the 187 known papers could cost between $115,000 and $140,000 in compute and labor [35].
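The MedQA and pricing figures quoted in Groups 2 and 3 can be sanity-checked with a few lines of arithmetic. The sketch below uses only the numbers reported in this summary. The variable names are illustrative, and the "overpayment" and "markup" quantities are derived here for illustration, not taken from the paper.

```python
# Figures as quoted in this summary; variable names are hypothetical.
official_medqa = 83.82  # official endpoint accuracy on MedQA, in percent
shadow_medqa = 36.95    # average Shadow API accuracy on MedQA, in percent

# Absolute gap in percentage points, and the same gap relative to the official score.
gap_points = official_medqa - shadow_medqa
relative_drop = gap_points / official_medqa * 100

paid = 14.84                        # USD charged per 1,000 requests
value_low, value_high = 5.70, 7.77  # USD value actually delivered per 1,000 requests

# Derived quantities (an assumption of this sketch, not figures from the paper).
overpay_low = paid - value_high   # smallest overpayment in USD
overpay_high = paid - value_low   # largest overpayment in USD
markup_low = paid / value_high    # smallest price-to-value ratio
markup_high = paid / value_low    # largest price-to-value ratio

print(f"MedQA gap: {gap_points:.2f} points ({relative_drop:.1f}% relative drop)")
print(f"Overpayment per 1,000 requests: ${overpay_low:.2f} to ${overpay_high:.2f}")
print(f"Price-to-value ratio: {markup_low:.2f}x to {markup_high:.2f}x")
```

The distinction matters when reading the headline number: the MedQA gap is about 47 percentage points in absolute terms, which is a relative drop of well over half of the official accuracy.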
Group 5: Recommendations
- The authors recommend avoiding unverified Shadow APIs in serious research workflows and, where their use is unavoidable, implementing strict validation protocols [36].
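The summary does not spell out what a validation protocol looks like. One simple possibility is to run the same small labeled probe set against a trusted official endpoint and against the candidate endpoint, then flag the candidate if its accuracy trails by more than a tolerance. The sketch below is an assumption for illustration, not the authors' protocol: `accuracy`, `looks_substituted`, the `max_gap` threshold, and the stub endpoints are all hypothetical, and real endpoints would be wrapped as functions that take a prompt and return the model's answer.

```python
from typing import Callable, Sequence, Tuple

Probe = Tuple[str, str]  # (prompt, expected answer)


def accuracy(ask: Callable[[str], str], probes: Sequence[Probe]) -> float:
    """Fraction of probes answered correctly, ignoring case and outer whitespace."""
    if not probes:
        raise ValueError("need at least one probe")
    hits = sum(
        ask(prompt).strip().lower() == expected.strip().lower()
        for prompt, expected in probes
    )
    return hits / len(probes)


def looks_substituted(
    ask_official: Callable[[str], str],
    ask_candidate: Callable[[str], str],
    probes: Sequence[Probe],
    max_gap: float = 0.05,  # hypothetical tolerance; tune for the benchmark's noise
) -> bool:
    """True if the candidate trails the official endpoint by more than max_gap."""
    gap = accuracy(ask_official, probes) - accuracy(ask_candidate, probes)
    return gap > max_gap


# Toy stand-ins for real endpoints, used only to exercise the check.
PROBES = [
    ("2+2?", "4"),
    ("Capital of France?", "Paris"),
    ("3*3?", "9"),
    ("Opposite of hot?", "cold"),
]
ANSWERS = dict(PROBES)


def fake_official(prompt: str) -> str:
    return ANSWERS[prompt]


def fake_shadow(prompt: str) -> str:
    # Answers only one probe correctly, mimicking a weaker substituted model.
    return ANSWERS[prompt] if prompt == "2+2?" else "unknown"


print(looks_substituted(fake_official, fake_shadow, PROBES))
```

Exact-match scoring on a handful of probes is deliberately crude. A real check would need a larger probe set, repeated runs to account for sampling noise, and a scoring rule suited to the benchmark.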