Data Distillation

The "Shell Model" Controversy in Large Models: Where Is the Line Between Original Research and Leveraging Others' Work?
Sou Hu Cai Jing · 2025-07-17 01:39
Core Viewpoint
- The debate over "original research" versus "shell models" in the AI field has intensified, focusing in particular on the similarities between Huawei's Pangu model and Alibaba Cloud's Qwen model [1][2]

Group 1: Development and Trends in AI Models
- The rise of large models traces back to the Transformer architecture released by Google Brain in 2017, with three main types dominating the field: Decoder-only (like GPT), Encoder-Decoder (like T5), and Encoder-only (like BERT) [2]
- The launch of ChatGPT in November 2022, built on GPT-3.5, attracted millions of users, marking the entry of large language models (LLMs) into public awareness and prompting many companies to enter the market [2]
- The open-source era that began in 2023 has led more teams to train models on open-source frameworks, facilitating technological exchange and iteration [1][4]

Group 2: Shell Model Controversies
- Early shell-model behavior often amounted to simple API wrapping without any secondary development; regulatory scrutiny has since increased, leading to penalties for such practices [3]
- Despite regulatory action, shell models continue to emerge, with some criticized for "GPT-like" responses that raise questions about their originality [3][4]
- The concept of "data distillation," in which a strong "teacher model" generates high-quality data for training a "student model" (illustrated in the sketch after this summary), has gained attention, especially after ByteDance was reported to have used OpenAI's API for data generation [4]

Group 3: Open Source and Compliance Issues
- The open-source movement has sparked debate over whether secondary development on open-source model architectures constitutes shell modeling, with differing views on compliance and ethical boundaries [4][8]
- A notable incident involved the Yi-34B model, which prompted discussion about compliance with the LLaMA open-source license and highlighted how hard it is to draw the line between shell models and original research [5][7]
- Lower development barriers in the open-source era have produced both genuine advances and opportunistic shell behavior, prompting ongoing debate over the moral and legal implications of such practices [8][9]

Group 4: Industry Perspectives
- Some major companies lack foundational training experience and model-development know-how, leading them to leverage open-source technology for faster progress [9]
- The AI industry broadly accepts the use of open-source technology, but considers it essential to document such use clearly and not misrepresent it as original research [9]
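To make the teacher-student "data distillation" workflow concrete, here is a minimal Python sketch. It assumes the distiller can query a strong teacher model through some callable; the names (`build_distillation_set`, `call_teacher`, `save_jsonl`) are hypothetical placeholders for illustration, not any vendor's API or the method described in the article.

```python
# Minimal sketch of data distillation in the teacher-student sense:
# a strong teacher answers prompts, and the (prompt, answer) pairs become
# supervised fine-tuning data for a smaller student model.
import json
from typing import Callable, Dict, List


def build_distillation_set(prompts: List[str],
                           call_teacher: Callable[[str], str]) -> List[Dict[str, str]]:
    """Query the teacher once per prompt and collect instruction/response pairs."""
    dataset = []
    for prompt in prompts:
        answer = call_teacher(prompt)  # the teacher model produces the "gold" response
        dataset.append({"instruction": prompt, "response": answer})
    return dataset


def save_jsonl(records: List[Dict[str, str]], path: str) -> None:
    """Write one JSON object per line, a common format for fine-tuning data."""
    with open(path, "w", encoding="utf-8") as f:
        for rec in records:
            f.write(json.dumps(rec, ensure_ascii=False) + "\n")


if __name__ == "__main__":
    # Stand-in teacher: in practice this would wrap an API call to a large model.
    fake_teacher = lambda p: f"[teacher answer to: {p}]"
    data = build_distillation_set(["Explain the Transformer architecture."], fake_teacher)
    save_jsonl(data, "distilled_sft.jsonl")
    # A student model would then be fine-tuned on distilled_sft.jsonl.
```

In practice the teacher callable would wrap an API call, and the resulting JSONL file would feed a standard supervised fine-tuning pipeline for the student; whether and when that crosses a provider's terms of use is exactly the compliance question raised above.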
Interview with 27-Year-Old Doctoral Advisor Zhang Linfeng: A Perfect Score at CVPR for Model Compression Was a Bit Unexpected; Shanghai Jiao Tong University Has Many Young Faculty Like Me
量子位 · 2025-05-27 01:07
Core Viewpoint
- Zhang Linfeng, a young professor at Shanghai Jiao Tong University, has made significant contributions to model compression, particularly through innovative data distillation methods that improve model efficiency and reduce training costs [2][4][27]

Group 1: Model Compression Techniques
- Zhang Linfeng's team developed a new data distillation method that received a perfect score at CVPR 2025; it runs on a six-year-old 2080Ti GPU with only 1/300 of the memory required by previous state-of-the-art methods while being 20 times faster [2][4]
- The team introduced a novel distribution-difference metric (NCFD) that recasts data distillation as a min-max optimization problem, significantly improving the quality of synthetic data and scaling across benchmark datasets [6][7]
- Their approach focuses on using data efficiently to cut the training costs of large AI models, aiming for a ratio greater than 1 between the training cost saved and the cost of data selection [9][10]

Group 2: Token Reduction Strategies
- The team has explored token-level feature caching, achieving up to 9x acceleration in diffusion language models with minimal performance loss, and extending the idea to multimodal models where up to 90% of tokens can be dropped without sacrificing accuracy (a generic sketch of the caching idea follows this summary) [11][12]
- The Toca method adaptively selects which tokens to cache, tailoring computation to the task; in image editing, for example, only the edited regions need to be recomputed [16][20]
- The latest TaylorSeer model predicts the next step's features instead of reusing previous ones, achieving close to 5x acceleration across a range of models, including video generation tasks [18][20][24]

Group 3: Future Directions and Industry Impact
- The overarching goal of Zhang Linfeng's research is to lower the deployment cost of large models so they can be used in real-world scenarios, particularly video generation, where the aim is real-time generation speed [27][25]
- The evolution of model compression is a response to ever-larger AI models, shifting from traditional methods toward data-centric approaches that minimize knowledge loss during compression [38][44]
- The research results have been open-sourced and are gradually being integrated into various models, signaling meaningful industry impact and potential for wide adoption [23][26]
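The token-caching idea in Group 2 can be illustrated with a short, generic sketch: across consecutive generation steps, most tokens reuse cached features and only the tokens whose inputs drifted the most are recomputed. This is a minimal illustration assuming a per-token feed-forward block; it is not the team's Toca or TaylorSeer implementation, and the class and parameter names are invented for the example.

```python
# Generic token-level feature caching across iterative (e.g., denoising) steps.
import torch
import torch.nn as nn


class CachedTokenMLP(nn.Module):
    """Per-token feed-forward block with naive feature caching between steps."""

    def __init__(self, dim: int, recompute_ratio: float = 0.1):
        super().__init__()
        self.mlp = nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim))
        self.recompute_ratio = recompute_ratio  # fraction of tokens refreshed each step
        self.prev_in = None                     # inputs seen at the previous step
        self.cached_out = None                  # outputs produced at the previous step

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, tokens, dim); the MLP acts on each token independently,
        # so recomputing only a subset of tokens genuinely saves compute.
        if self.cached_out is None or self.prev_in.shape != x.shape:
            out = self.mlp(x)                                    # first step: compute every token
        else:
            change = (x - self.prev_in).norm(dim=-1)             # per-token input drift
            k = max(1, int(self.recompute_ratio * x.shape[1]))
            idx = change.topk(k, dim=1).indices                  # tokens worth refreshing
            gather_idx = idx.unsqueeze(-1).expand(-1, -1, x.shape[-1])
            selected = torch.gather(x, 1, gather_idx)            # (batch, k, dim)
            out = self.cached_out.clone()
            out.scatter_(1, gather_idx, self.mlp(selected))      # refresh only the selected tokens
        self.prev_in, self.cached_out = x.detach(), out.detach()
        return out


if __name__ == "__main__":
    block = CachedTokenMLP(dim=64, recompute_ratio=0.1)
    for _ in range(3):                      # simulate a few generation steps
        feats = block(torch.randn(2, 256, 64))
    print(feats.shape)                      # torch.Size([2, 256, 64])
```

As described in the summary, TaylorSeer takes the idea one step further: instead of reusing cached features verbatim, it extrapolates the next step's features from the cached history, so the caching skeleton stays the same while the reuse step is replaced by a prediction.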
Fake
猫笔刀 · 2025-01-29 14:18
Korea also has plenty of mic-off, lip-synced performances, but those are usually girl-group or boy-group song-and-dance numbers: the members go through intense choreography and formation changes, and at those moments a member with shaky vocals would sound like a squealing pig, so they truly cannot sing live.

But Korea also has an unwritten rule: pure vocalists (no choreography, just standing and singing) almost always perform live, because singing is the entire act; if even the vocals are fake, there is no point in standing there at all.

It is common knowledge that the Spring Festival Gala's song segments are heavy on "pre-cooked" (pre-recorded) numbers. For a few years the Gala did try hard to push live singing, and accidents kept happening; for example, Faye Wong and Eason Chan's 2012 duet "Because of Love" was sung live, and you can search for it, the result was hard to watch. Faye Wong's voice has not been in good shape for a long time, and every high note is a nail-biter, so when last night's song was suddenly rock solid, it was of course pre-recorded.

Last night I said a certain Fei's number was pre-recorded, and some commenters, probably fans, gave me an earful. I was not talking nonsense: telling which performances are pre-recorded and which are sung on the spot is not hard for me.

I have had a hobby for more than twenty years, watching K-pop music-show stages. Over the years I must have watched several thousand of them, and after hearing that many you can tell which performances are mic-off, which are half-live, and which are fully live.

The characterization this time has its bright spots; to my own surprise I ended up liking Shen Gongbao. With just a few simple strokes, a supporting character who usually just tags along in the Fengshen stories suddenly became round and full. The ending of the film plants his storyline, so it will surely be conti...