The Large-Model "Shell" Controversy: Where Is the Line Between In-House Development and Borrowing?
Sou Hu Cai Jing· 2025-07-17 01:39
Core Viewpoint - The debate over "original research" versus "shell models" in the AI field has intensified, with particular attention on the similarities between Huawei's Pangu model and Alibaba Cloud's Qwen model [1][2]

Group 1: Development and Trends in AI Models
- The rise of large models can be traced back to the Transformer architecture released by Google Brain in 2017. Three main types dominate the field: Decoder-only (like GPT), Encoder-Decoder (like T5), and Encoder-only (like BERT) [2]
- The launch of ChatGPT in November 2022, based on GPT-3.5, attracted millions of users, brought large language models (LLMs) into public awareness, and prompted many companies to enter the market [2]
- The open-source era that began in 2023 has led more teams to train models with open-source frameworks, facilitating technological exchange and iteration [1][4]

Group 2: Shell Model Controversies
- Early shell models often amounted to simple API wrapping with no secondary development. Regulatory scrutiny has since increased, leading to penalties for such practices [3]
- Despite regulatory action, shell models continue to emerge, and some models have been criticized for "GPT-like" responses that raise questions about their originality [3][4]
- "Data distillation," in which a strong "teacher model" generates high-quality data for training a "student model," has gained attention, especially after ByteDance was reported to have used OpenAI's API for data generation [4]

Group 3: Open Source and Compliance Issues
- The open-source movement has led to debate about whether secondary development on open-source model architectures counts as shelling, with differing opinions on compliance and ethical boundaries [4][8]
- A notable incident involved the Yi-34B model, which sparked discussion about compliance with the LLaMA open-source license and highlighted how hard it is to separate shell models from original research [5][7]
- Lower development barriers in the open-source era have produced both genuine advances and shell behavior, prompting ongoing discussion of the moral and legal implications of such practices [8][9]

Group 4: Industry Perspectives
- Major companies may lack foundational training logic and experience in model development, leading them to leverage open-source technologies to advance more quickly [9]
- The AI industry recognizes that using open-source technology is acceptable, but that it is crucial to document it clearly and to avoid presenting such work as original research [9]
A History of Large-Model "Shelling"
Hu Xiu· 2025-07-14 09:26
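The controversy over whether Huawei's Pangu large model is a shell of Alibaba Cloud's Qwen large model has once again put the debate over "original research" versus "shell" models on the table.

Looking back three years, when ChatGPT had just opened the age of large-model exploration, shelling was still at the stage of small workshops knocking off ChatGPT. Call the ChatGPT API, wrap the interface in a "Chinese UI," and you could sell memberships by call count in WeChat groups. That year, shelling became many people's first ticket to an AI fortune.

Meanwhile, among the companies that began developing their own large models, quite a few also leaned on ChatGPT. Although these companies had their own model architectures, in the fine-tuning stage they relied to varying degrees on data generated by dialogue models such as ChatGPT or GPT-4. This synthetic corpus both ensured data diversity and consisted of high-quality data that had already been aligned by OpenAI. Leaning on ChatGPT was effectively an open secret in the industry.

Starting in 2023, the large-model race entered the open-source era, and training models on open-source frameworks became the choice of many startup teams. More and more teams published their research, driving technical exchange and iteration, and also making shell-based development more common. Controversial shelling incidents increased along with it: suspected cases repeatedly hit the trending lists and were then explained or clarified by the parties involved.

China's large-model industry has kept moving forward through alternating rounds of "shelling" and "being shelled."

I. The Year GPT Took Off: Knockoffs ...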