Model Fusion
Former Alibaba and ByteDance large-model lead Yang Hongxia starts up: large-model pretraining is not a compute race reserved for a few top players | 36Kr Exclusive
36氪· 2025-10-30 13:37
Core Viewpoint
- The article discusses the emergence of a new AI paradigm led by Yang Hongxia, who aims to decentralize model training, contrasting with the centralized approaches of major companies like Alibaba and ByteDance [4][12][27].

Group 1: Yang Hongxia's Background and Vision
- Yang Hongxia has over seven years of experience in large model research at Alibaba and ByteDance, where she contributed to the development of significant models like M6 and Tongyi Qianwen [5][6].
- After leaving ByteDance in July 2024, she founded InfiX.ai, focusing on model-related technologies and aiming to challenge existing centralized models [7][10].
- Yang's vision includes creating a decentralized model training framework that allows small and medium enterprises, research institutions, and individuals to participate in model training [13][16].

Group 2: Technical Innovations and Frameworks
- InfiX.ai has recently open-sourced the world's first FP8 training framework, which enhances training speed and reduces memory consumption compared to the commonly used FP16/BF16 [17][18].
- The company has developed a model fusion technology that allows different domain-specific models to be combined, avoiding resource wastage from redundant training [20][21].
- The InfiMed framework enables the training of small-scale models with strong reasoning capabilities across various medical tasks, particularly in cancer detection [22][26].

Group 3: Market Position and Future Outlook
- Yang believes that the future of AI will involve a collaborative approach where every company and institution can have its own expert model, leading to a globalized foundational model for various fields [30][31].
- The article highlights the growing acceptance of decentralized model training in the U.S., with significant funding being raised for companies pursuing this approach [28][29].
- InfiX.ai's focus on challenging fields like healthcare, particularly cancer, is seen as a strategic move to demonstrate the model's capabilities and differentiate it from competitors [72][73].
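To make the FP8 point above concrete: FP8 training frameworks generally keep tensors inside FP8's narrow dynamic range by applying a per-tensor scale before casting, so that each tensor's largest magnitude lands near the format's maximum. Below is a minimal NumPy sketch of per-tensor scaled quantization to the E4M3 format (4 exponent bits, 3 mantissa bits, maximum magnitude 448); it is a generic illustration of the idea, not InfiX.ai's framework, and the mantissa-rounding shortcut ignores subnormals.

```python
import numpy as np

E4M3_MAX = 448.0      # largest finite magnitude representable in FP8 E4M3
MANTISSA_BITS = 3     # explicit mantissa bits in E4M3

def quantize_fp8_e4m3(x: np.ndarray, scale: float) -> np.ndarray:
    """Simulate FP8 E4M3 quantization of a scaled tensor by rounding the
    mantissa to 3 bits, then map back to the original value range."""
    xs = np.clip(x * scale, -E4M3_MAX, E4M3_MAX)
    mant, exp = np.frexp(xs)                    # xs = mant * 2**exp, 0.5 <= |mant| < 1
    step = 2.0 ** -(MANTISSA_BITS + 1)          # mantissa grid in the frexp representation
    mant_q = np.round(mant / step) * step
    return np.ldexp(mant_q, exp) / scale        # dequantize back (sketch only)

# Per-tensor scaling: map the tensor's absolute maximum onto the E4M3 range.
w = (np.random.randn(1024, 1024) * 0.02).astype(np.float32)
scale = E4M3_MAX / np.abs(w).max()
w_fp8 = quantize_fp8_e4m3(w, scale)
print("mean abs quantization error:", np.abs(w - w_fp8).mean())
```

Because each value occupies one byte instead of two, weight and activation storage roughly halves relative to FP16/BF16, which is the usual source of the speed and memory savings cited for FP8 training in general.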
Former Alibaba and ByteDance large-model lead Yang Hongxia starts up: large-model pretraining is not a compute race reserved for a few top players | 智能涌现 Exclusive
Sou Hu Cai Jing· 2025-10-30 08:35
Core Insights
- Yang Hongxia, a key figure in large model research from Alibaba and ByteDance, has launched a new AI company, InfiX.ai, focusing on decentralized model training and innovation in the AI space [1][15][36]
- InfiX.ai aims to democratize access to large model training, allowing small and medium enterprises, research institutions, and individuals to participate in the process [4][16][19]

Company Overview
- InfiX.ai was founded by Yang Hongxia after her departure from ByteDance, with a focus on model-related technologies [1][15]
- The company has quickly assembled a team of 40 people in Hong Kong, leveraging the region's strong talent pool and funding opportunities [3][15]

Technological Innovations
- InfiX.ai is developing a decentralized approach to large model training, contrasting with the centralized models dominated by major institutions [4][16]
- The company has released the world's first FP8 training framework, which enhances training speed and reduces memory consumption compared to the commonly used FP16/BF16 [7][10]
- InfiX.ai's model fusion technology allows for the integration of different domain-specific models, reducing resource waste and enhancing knowledge sharing [10][16]

Market Positioning
- The company is targeting challenging fields, particularly in healthcare, with a focus on cancer detection, to demonstrate the capabilities of its models [15][41]
- InfiX.ai's approach is gaining traction, with increasing interest from investors and a shift in perception towards decentralized model training in the industry [15][36]

Future Vision
- Yang Hongxia envisions a future where every organization has its own expert model, facilitated by model fusion across different domains and geographical boundaries [16][19]
- The company aims to make model training accessible and affordable, fostering a collaborative environment for AI development [16][19]
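The model fusion idea mentioned above can be illustrated in its simplest form as weight-space merging of checkpoints that share an architecture. The sketch below linearly interpolates two fine-tuned experts derived from the same base model; it is a minimal illustration of checkpoint merging in general, not InfiX.ai's actual fusion method, and the file names are placeholders.

```python
import torch

def merge_state_dicts(sd_a: dict, sd_b: dict, alpha: float = 0.5) -> dict:
    """Linearly interpolate two checkpoints with identical architecture:
    merged = alpha * A + (1 - alpha) * B, applied tensor by tensor."""
    merged = {}
    for name, tensor_a in sd_a.items():
        tensor_b = sd_b[name]
        assert tensor_a.shape == tensor_b.shape, f"shape mismatch at {name}"
        merged[name] = alpha * tensor_a + (1.0 - alpha) * tensor_b
    return merged

# Hypothetical usage: two domain experts fine-tuned from the same base model.
# expert_med = torch.load("expert_medical.pt")   # placeholder checkpoint paths
# expert_fin = torch.load("expert_finance.pt")
# fused = merge_state_dicts(expert_med, expert_fin, alpha=0.5)
# model.load_state_dict(fused)
```

Plain interpolation like this only works well when the two experts start from the same base model; merging independently trained networks typically requires extra machinery (for example, aligning permutation symmetries first).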
Parameter Space Symmetry: A Unified Geometric Framework for Deep Learning Theory
机器之心· 2025-10-29 09:25
Core Insights
- The article discusses the evolution of deep learning models from millions to billions of parameters, highlighting the lack of systematic understanding of their effectiveness [2]
- A key focus is on the concept of parameter space symmetry, which refers to the existence of multiple parameter configurations that yield the same model function, complicating optimization and generalization analysis [4][6]

Group 1: Parameter Space Symmetry
- Parameter space symmetry allows different parameter combinations to produce identical outputs, exemplified by the interchange of neurons in hidden layers [4][6]
- This symmetry is mathematically defined as transformations that keep the loss function invariant, forming a group that defines equivalent orbits in parameter space [6]

Group 2: Types of Symmetry
- In addition to discrete symmetries, most neural network architectures exhibit continuous symmetries, such as scaling and linear transformations, which maintain function invariance [8]
- Complex architectures like Transformers combine various symmetries from their components, including multi-head attention mechanisms [8]

Group 3: Impact on Loss Landscape
- Symmetry creates a complex yet structured optimization space, where continuous symmetries can stretch isolated minima into flat manifolds, affecting the interpretation of generalization metrics [10]
- Observed phenomena like "mode connectivity," where independently trained models can connect through low-loss paths, are partially attributed to continuous symmetries [10]

Group 4: Optimization Methods
- The presence of symmetry leads to the phenomenon of "equal loss, different gradients," suggesting new algorithmic possibilities for optimization methods that seek better gradient points within equivalent orbits [15][19]
- Some optimization strategies leverage symmetry as a degree of freedom, while others aim to reduce it as redundancy, indicating its importance in algorithm design [19]

Group 5: Learning Dynamics
- Continuous symmetries correspond to conserved quantities, which remain constant during training, revealing insights into the stability of the training process and the implicit bias of optimization [21][23]
- The structure of parameter space symmetry influences the statistical distribution of learning trajectories and outcomes [23]

Group 6: Connections Across Spaces
- Parameter space symmetry is interconnected with data space and internal representation space, where model parameters often reflect the symmetry present in the data distribution [27][28]
- Emerging directions like Weight Space Learning utilize symmetry as a new data structure, facilitating the analysis and generation of model properties [28][29]

Group 7: Future Directions
- The widespread existence of parameter space symmetry offers a new mathematical language for deep learning, linking complex behaviors of models with established tools from group theory and geometry [30]
- This perspective is influencing various practical fields, from optimization acceleration to model fusion and new model design, transforming theoretical concepts into actionable algorithmic principles [30]
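The permutation and scaling symmetries described above are easy to verify directly. The following NumPy sketch builds a two-layer ReLU MLP, permutes its hidden units (together with the matching columns of the output weights), and then rescales one hidden unit while dividing its outgoing weights by the same factor; both transformed parameter sets produce exactly the same output, illustrating "different parameters, identical function."

```python
import numpy as np

rng = np.random.default_rng(0)

def mlp(x, W1, b1, W2):
    """Two-layer MLP: f(x) = W2 @ relu(W1 @ x + b1)."""
    return W2 @ np.maximum(W1 @ x + b1, 0.0)

d_in, d_hidden, d_out = 4, 8, 3
W1 = rng.normal(size=(d_hidden, d_in))
b1 = rng.normal(size=d_hidden)
W2 = rng.normal(size=(d_out, d_hidden))
x = rng.normal(size=d_in)
out_orig = mlp(x, W1, b1, W2)

# Discrete (permutation) symmetry: permute hidden units in W1 and b1,
# and the corresponding columns of W2.
perm = rng.permutation(d_hidden)
print(np.allclose(out_orig, mlp(x, W1[perm], b1[perm], W2[:, perm])))  # True

# Continuous (scaling) symmetry of ReLU: scale one hidden unit's incoming
# weights and bias by c > 0, divide its outgoing weights by c.
c = 3.7
W1_s, b1_s, W2_s = W1.copy(), b1.copy(), W2.copy()
W1_s[0] *= c; b1_s[0] *= c; W2_s[:, 0] /= c
print(np.allclose(out_orig, mlp(x, W1_s, b1_s, W2_s)))  # True
```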
Tencent Research Institute AI Digest 20250827
腾讯研究院· 2025-08-26 16:01
Group 1: Generative AI Developments
- Nvidia has launched the Jet-Nemotron small model series, which features significant performance improvements over mainstream open-source models, achieving a 53.6x increase in inference throughput on H100 GPUs [1]
- The MiniCPM-V 4.5 model from Mianbi has demonstrated superior performance in video understanding, outperforming a 72B-parameter model with only 8B parameters [2]
- Microsoft's VibeVoice-1.5B audio model can synthesize 90 minutes of realistic speech and achieves a compression efficiency 80 times better than mainstream models [3]

Group 2: Innovative Model Fusion Techniques
- Sakana AI introduced the M2N2 model fusion method, inspired by natural evolution, which enhances model integration through competition and attraction mechanisms [4]

Group 3: AI Search and Revenue Sharing
- Perplexity has established a $42.5 million fund to share revenue generated from AI searches with publishers, offering 80% of subscription revenue from Comet Plus to participating publishers [7]

Group 4: Legal and Market Dynamics
- Elon Musk's X company has filed a lawsuit against Apple and OpenAI, claiming they maintain a monopoly that hinders competition from innovators like X and xAI [8]

Group 5: Robotics and AI Integration
- Nvidia's Jetson Thor chip, designed for robotics, boasts 7.5 times the AI computing power of its predecessor, supporting real-time generative AI model operations [9]

Group 6: AI in Education
- OpenAI's education head noted that 70% of employers prefer hiring candidates skilled in AI over those with extensive experience but lacking AI knowledge [10]

Group 7: Government Initiatives
- The Chinese government has released an opinion document aiming for deep integration of AI across six key sectors by 2027, emphasizing the need for foundational support in various areas [12]
ICML 2025 | CoTo: Helping LoRA training "hit its stride", proficient in both model fusion and pruning
机器之心· 2025-07-26 12:17
Core Viewpoint
- The article introduces CoTo, a progressive training strategy designed to enhance the robustness and effectiveness of Low-Rank Adaptation (LoRA) models, addressing issues such as training instability and performance drop after pruning [1][4][23].

Summary by Sections

Conventional LoRA Training Issues
- LoRA faces challenges including "lazy training," where optimization gets stuck near suboptimal solutions, limiting generalization [7]
- There is a hierarchical imbalance in training, with gradient updates concentrated on top layers, leading to undertraining of lower layers [7]
- These issues complicate downstream operations like model fusion and pruning, often resulting in unsatisfactory outcomes [7]

CoTo Strategy
- CoTo employs a simple yet effective progressive activation strategy, initially deactivating a portion of LoRA adapters to encourage uniform gradient flow across all layers [5][8]
- The activation probability of adapters is gradually increased during training, returning to standard fine-tuning mode in later stages [8]

Experimental Results
- CoTo significantly improves the fusion and pruning capabilities of LoRA models, enhancing single-task generalization performance and training efficiency [12][23]
- In linear interpolation tasks, CoTo models maintain smooth performance transitions, unlike standard LoRA, which experiences sharp declines [13]
- CoTo outperforms standard LoRA in both structured and unstructured pruning scenarios, demonstrating enhanced fault tolerance [17]

Performance and Efficiency Improvements
- CoTo consistently boosts performance across various benchmarks, including visual and language tasks, and achieves over 24% training acceleration when applied to HiRA [23][24]

Ablation Studies
- Rigorous ablation studies validate the design choices of CoTo and provide insights into effective regularization of LoRA [21]

Conclusion
- CoTo effectively resolves hierarchical imbalance and lazy optimization issues in LoRA training, enhancing model robustness and simplifying downstream operations like fusion and pruning [23]
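Based on the description above, CoTo's progressive activation can be sketched as stochastic gating of the per-layer LoRA adapters with a keep-probability that ramps up to 1 over training, at which point the procedure reduces to standard fine-tuning. The code below is a schematic reconstruction under that reading, not the official CoTo implementation; the starting probability, the linear ramp, and names such as `adapter.gate` are illustrative assumptions.

```python
import torch

def coto_keep_prob(step: int, total_steps: int, p_start: float = 0.25) -> float:
    """Ramp the probability of keeping each LoRA adapter active from p_start
    to 1.0; in the late phase all adapters are active (standard fine-tuning)."""
    frac = min(step / max(total_steps, 1), 1.0)
    return p_start + (1.0 - p_start) * frac

def sample_adapter_mask(num_adapters: int, keep_prob: float) -> torch.Tensor:
    """Draw an independent on/off gate for every LoRA adapter (one per layer)."""
    return (torch.rand(num_adapters) < keep_prob).float()

# Schematic training loop: gate each adapter's low-rank update by its mask.
# for step in range(total_steps):
#     p = coto_keep_prob(step, total_steps)
#     mask = sample_adapter_mask(num_layers, p)
#     for i, adapter in enumerate(lora_adapters):   # hypothetical adapter objects
#         adapter.gate = mask[i]                    # output = W x + gate * B A x
#     loss = model(batch).loss
#     loss.backward()
#     optimizer.step()
```

Dropping random subsets of adapters early on forces gradients to reach lower layers rather than concentrating on the top, which is the mechanism the summary credits for better fusion and pruning behavior.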
No need to wait for R2! A third party adds deep thinking to the new DeepSeek V3, which reasons for 101 seconds to crack the "7-meter sugarcane through a 2-meter door" puzzle
量子位· 2025-04-28 06:36
Mengchen, reporting from Aofeisi
QbitAI | WeChat official account QbitAI

Is DeepSeek about to release R2?? The rumors keep multiplying, and it is hard to tell truth from fiction.

1.2 trillion parameters, 5.2PB of training data, efficient use of Huawei chips... one can only say that if even half of it is true, it would be impressive. The founder of HuggingFace, for his part, recommends "responding to change by staying put": turn on update notifications for the official verified account and you will be notified the moment anything drops.

Setting aside whether the leaked figures are accurate, there seems to be one point of consensus: if R2 really exists, its base model will be the new DeepSeek V3-0324.

Part of the reason many people believe R2 will arrive at the end of April is that R1 followed V3 by roughly one month.

Now, without waiting for DeepSeek itself, the open-source community has started adding deep thinking to V3-0324 on its own.

The new model, DeepSeek-R1T-Chimera, matches the original R1 in capability but runs faster, emitting 40% fewer output tokens, and its weights are open under the MIT license. In effect it offers close to R1's capability at close to V3-0324's speed, combining the strengths of both.

Notably, this was achieved not through fine-tuning or distillation, but by fusing the two models, DeepSeek V3-0324 and R1.

The R1+V3 fused model

The new R1T-Chimera model is not an official DeepSeek release; rather, it comes from...
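The excerpt only states that the two checkpoints were fused rather than fine-tuned or distilled. One generic way to assemble such a chimera from two same-architecture parents is to take each parameter tensor wholesale from one parent or the other according to a rule on its name, instead of interpolating every tensor as in simple weight averaging. The sketch below shows that pattern; it is a hypothetical illustration, not the actual R1T-Chimera recipe, and the module-selection predicate is made up.

```python
import torch

def chimera_merge(sd_v3: dict, sd_r1: dict, take_from_r1) -> dict:
    """Build a merged checkpoint by taking each tensor wholesale from one of
    two same-architecture parents, chosen by a predicate on the tensor name."""
    return {name: (sd_r1[name] if take_from_r1(name) else sd_v3[name]).clone()
            for name in sd_v3}

# Hypothetical choice of rule: take some sub-modules from R1, the rest from V3.
# merged = chimera_merge(v3_state_dict, r1_state_dict,
#                        take_from_r1=lambda n: "mlp.experts" in n)
```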