Workflow
模型复用与抄袭界限
icon
Search documents
盘古大模型与通义千问,谁抄袭了谁?
Core Viewpoint - The controversy surrounding Huawei's Pangu 3.5 and Alibaba's Tongyi Qianwen 1.5-7B models centers on the high correlation score of 0.927 derived from the "LLM-Fingerprint" technology, suggesting potential similarities or derivation between the two models [1][14][16]. Group 1: Technical Analysis - The "LLM-Fingerprint" technology analyzes model responses to specific trigger words, generating a unique identity for each large model [12][11]. - A report indicated that the correlation score of 0.927 between Huawei's Pangu 3.5 and Alibaba's Tongyi Qianwen 1.5-7B is significantly higher than the scores between other mainstream models, which are generally below 0.1 [14][15]. - Huawei's defense against the allegations was deemed unscientific by external observers, as they pointed out that high correlation could also be found among different versions of the Tongyi Qianwen models [19][20]. Group 2: Open Source Culture and Ethics - The debate highlights the tension between "reuse" and "plagiarism" within the AI open-source ecosystem, raising questions about the ethical implications of model development [22][21]. - The high costs associated with developing large models, estimated at $12 million for effective training, make it common practice to build upon existing open-source models [25][26]. - The distinction between "reuse" and "plagiarism" remains ambiguous, particularly regarding model parameters and adherence to open-source licenses [28][29]. Group 3: Competitive Landscape - The incident reflects the intense competition between Huawei and Alibaba in the Chinese AI market, with Alibaba currently serving 90,000 enterprises through its Tongyi series models [37][42]. - Huawei's Pangu model is crucial for its strategy to establish a comprehensive AI ecosystem, while Alibaba has leveraged its cloud infrastructure and open-source ecosystem to gain a competitive edge [32][36]. - The silence from Alibaba's Tongyi Qianwen team amid the controversy suggests a strategic decision to avoid escalating the situation into a public dispute [40][47]. Group 4: Industry Implications - The controversy serves as a "stress test" for the current AI open-source ecosystem, exposing its vulnerabilities and the lag in governance [52]. - The industry is urged to establish clearer rules regarding model citation and derivation standards, akin to plagiarism detection systems in academia [53]. - There is a call for greater transparency in model development processes, including the promotion of "Model Cards" and data transparency [54].