Data Distillation

The "Shell Model" Controversy in Large Models: Where Is the Line Between Original Research and Leveraging Others' Work?
Sou Hu Cai Jing · 2025-07-17 01:39
Core Viewpoint
- The debate over "original research" versus "shell models" in the AI field has intensified, focusing in particular on the similarities between Huawei's Pangu model and Alibaba Cloud's Qwen model [1][2]

Group 1: Development and Trends in AI Models
- The rise of large models traces back to the Transformer architecture released by Google Brain in 2017, with three main types dominating the field: Decoder-only (like GPT), Encoder-Decoder (like T5), and Encoder-only (like BERT) [2]
- The launch of ChatGPT in November 2022, built on GPT-3.5, attracted millions of users, marking the entry of large language models (LLMs) into public awareness and prompting many companies to enter the market [2]
- The open-source era that began in 2023 has led more teams to train models on open-source frameworks, facilitating technological exchange and iteration [1][4]

Group 2: Shell Model Controversies
- Early shell-model behavior often amounted to simple API wrapping without any secondary development; regulatory scrutiny has since increased, leading to penalties for such practices [3]
- Despite regulatory action, shell models continue to emerge, with some criticized for "GPT-like" responses that raise questions about their originality [3][4]
- The concept of "data distillation," in which a strong "teacher model" generates high-quality data for training a "student model" (illustrated in the sketch after this summary), has gained attention, especially after ByteDance was reported to have used OpenAI's API for data generation [4]

Group 3: Open Source and Compliance Issues
- The open-source movement has sparked debate over whether secondary development on open-source model architectures constitutes shell modeling, with differing views on compliance and ethical boundaries [4][8]
- A notable incident involved the Yi-34B model, which prompted discussion about compliance with the LLaMA open-source license and highlighted how hard it is to draw the line between shell models and original research [5][7]
- Lower development barriers in the open-source era have produced both genuine advances and opportunistic shell behavior, prompting ongoing debate over the moral and legal implications of such practices [8][9]

Group 4: Industry Perspectives
- Some major companies lack foundational training experience and model-development know-how, leading them to leverage open-source technology for faster progress [9]
- The AI industry broadly accepts the use of open-source technology, but considers it essential to document such use clearly and not misrepresent it as original research [9]
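To make the teacher-student "data distillation" workflow concrete, here is a minimal Python sketch. It assumes the distiller can query a strong teacher model through some callable; the names (`build_distillation_set`, `call_teacher`, `save_jsonl`) are hypothetical placeholders for illustration, not any vendor's API or the method described in the article.

```python
# Minimal sketch of data distillation in the teacher-student sense:
# a strong teacher answers prompts, and the (prompt, answer) pairs become
# supervised fine-tuning data for a smaller student model.
import json
from typing import Callable, Dict, List


def build_distillation_set(prompts: List[str],
                           call_teacher: Callable[[str], str]) -> List[Dict[str, str]]:
    """Query the teacher once per prompt and collect instruction/response pairs."""
    dataset = []
    for prompt in prompts:
        answer = call_teacher(prompt)  # the teacher model produces the "gold" response
        dataset.append({"instruction": prompt, "response": answer})
    return dataset


def save_jsonl(records: List[Dict[str, str]], path: str) -> None:
    """Write one JSON object per line, a common format for fine-tuning data."""
    with open(path, "w", encoding="utf-8") as f:
        for rec in records:
            f.write(json.dumps(rec, ensure_ascii=False) + "\n")


if __name__ == "__main__":
    # Stand-in teacher: in practice this would wrap an API call to a large model.
    fake_teacher = lambda p: f"[teacher answer to: {p}]"
    data = build_distillation_set(["Explain the Transformer architecture."], fake_teacher)
    save_jsonl(data, "distilled_sft.jsonl")
    # A student model would then be fine-tuned on distilled_sft.jsonl.
```

In practice the teacher callable would wrap an API call, and the resulting JSONL file would feed a standard supervised fine-tuning pipeline for the student; whether and when that crosses a provider's terms of use is exactly the compliance question raised above.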
Interview with 27-Year-Old Doctoral Advisor Zhang Linfeng: A Perfect Score at CVPR for Model Compression Was a Bit Unexpected; Shanghai Jiao Tong University Has Many Young Faculty Like Me
量子位 · 2025-05-27 01:07
Core Viewpoint
- Zhang Linfeng, a young professor at Shanghai Jiao Tong University, has made significant contributions to model compression, particularly through innovative data distillation methods that improve model efficiency and reduce training costs [2][4][27]

Group 1: Model Compression Techniques
- Zhang Linfeng's team developed a new data distillation method that received a perfect score at CVPR 2025; it runs on a six-year-old 2080Ti GPU with only 1/300 of the memory required by previous state-of-the-art methods while being 20 times faster [2][4]
- The team introduced a novel distribution-difference metric (NCFD) that recasts data distillation as a min-max optimization problem, significantly improving the quality of synthetic data and scaling across benchmark datasets [6][7]
- Their approach focuses on using data efficiently to cut the training costs of large AI models, aiming for a ratio greater than 1 between the training cost saved and the cost of data selection [9][10]

Group 2: Token Reduction Strategies
- The team has explored token-level feature caching, achieving up to 9x acceleration in diffusion language models with minimal performance loss, and extending the idea to multimodal models where up to 90% of tokens can be dropped without sacrificing accuracy (a generic sketch of the caching idea follows this summary) [11][12]
- The Toca method adaptively selects which tokens to cache, tailoring computation to the task; in image editing, for example, only the edited regions need to be recomputed [16][20]
- The latest TaylorSeer model predicts the next step's features instead of reusing previous ones, achieving close to 5x acceleration across a range of models, including video generation tasks [18][20][24]

Group 3: Future Directions and Industry Impact
- The overarching goal of Zhang Linfeng's research is to lower the deployment cost of large models so they can be used in real-world scenarios, particularly video generation, where the aim is real-time generation speed [27][25]
- The evolution of model compression is a response to ever-larger AI models, shifting from traditional methods toward data-centric approaches that minimize knowledge loss during compression [38][44]
- The research results have been open-sourced and are gradually being integrated into various models, signaling meaningful industry impact and potential for wide adoption [23][26]
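The token-caching idea in Group 2 can be illustrated with a short, generic sketch: across consecutive generation steps, most tokens reuse cached features and only the tokens whose inputs drifted the most are recomputed. This is a minimal illustration assuming a per-token feed-forward block; it is not the team's Toca or TaylorSeer implementation, and the class and parameter names are invented for the example.

```python
# Generic token-level feature caching across iterative (e.g., denoising) steps.
import torch
import torch.nn as nn


class CachedTokenMLP(nn.Module):
    """Per-token feed-forward block with naive feature caching between steps."""

    def __init__(self, dim: int, recompute_ratio: float = 0.1):
        super().__init__()
        self.mlp = nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim))
        self.recompute_ratio = recompute_ratio  # fraction of tokens refreshed each step
        self.prev_in = None                     # inputs seen at the previous step
        self.cached_out = None                  # outputs produced at the previous step

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, tokens, dim); the MLP acts on each token independently,
        # so recomputing only a subset of tokens genuinely saves compute.
        if self.cached_out is None or self.prev_in.shape != x.shape:
            out = self.mlp(x)                                    # first step: compute every token
        else:
            change = (x - self.prev_in).norm(dim=-1)             # per-token input drift
            k = max(1, int(self.recompute_ratio * x.shape[1]))
            idx = change.topk(k, dim=1).indices                  # tokens worth refreshing
            gather_idx = idx.unsqueeze(-1).expand(-1, -1, x.shape[-1])
            selected = torch.gather(x, 1, gather_idx)            # (batch, k, dim)
            out = self.cached_out.clone()
            out.scatter_(1, gather_idx, self.mlp(selected))      # refresh only the selected tokens
        self.prev_in, self.cached_out = x.detach(), out.detach()
        return out


if __name__ == "__main__":
    block = CachedTokenMLP(dim=64, recompute_ratio=0.1)
    for _ in range(3):                      # simulate a few generation steps
        feats = block(torch.randn(2, 256, 64))
    print(feats.shape)                      # torch.Size([2, 256, 64])
```

As described in the summary, TaylorSeer takes the idea one step further: instead of reusing cached features verbatim, it extrapolates the next step's features from the cached history, so the caching skeleton stays the same while the reuse step is replaced by a prediction.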
Fake
猫笔刀 · 2025-01-29 14:18
Korea also has plenty of mic-off, lip-synced performances, but those are usually girl-group or boy-group song-and-dance numbers: the members go through intense choreography and formation changes, and at those moments a member with shaky vocals would sound like a squealing pig, so they truly cannot sing live.

But Korea also has an unwritten rule: pure vocalists (no choreography, just standing and singing) almost always perform live, because singing is the entire act; if even the vocals are fake, there is no point in standing there at all.

It is common knowledge that the Spring Festival Gala's song segments are heavy on "pre-cooked" (pre-recorded) numbers. For a few years the Gala did try hard to push live singing, and accidents kept happening; for example, Faye Wong and Eason Chan's 2012 duet "Because of Love" was sung live, and you can search for it, the result was hard to watch. Faye Wong's voice has not been in good shape for a long time, and every high note is a nail-biter, so when last night's song was suddenly rock solid, it was of course pre-recorded.

Last night I said a certain Fei's number was pre-recorded, and some commenters, probably fans, gave me an earful. I was not talking nonsense: telling which performances are pre-recorded and which are sung on the spot is not hard for me.

I have had a hobby for more than twenty years, watching K-pop music-show stages. Over the years I must have watched several thousand of them, and after hearing that many you can tell which performances are mic-off, which are half-live, and which are fully live.

The characterization this time has its bright spots; to my own surprise I ended up liking Shen Gongbao. With just a few simple strokes, a supporting character who usually just tags along in the Fengshen stories suddenly became round and full. The ending of the film plants his storyline, so it will surely be conti...