Large Model Pre-training
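Yang Hongxia, former large-model lead at Alibaba and ByteDance, starts a company: large model pre-training is not a compute race among a few top players | 36氪 Exclusive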
36氪· 2025-10-30 13:37
Core Viewpoint - The article discusses the emergence of a new AI paradigm led by Yang Hongxia, who aims to decentralize model training, contrasting with the centralized approaches of major companies like Alibaba and ByteDance [4][12][27].
Group 1: Yang Hongxia's Background and Vision
- Yang Hongxia has over seven years of experience in large model research at Alibaba and ByteDance, where she contributed to the development of significant models like M6 and Tongyi Qianwen [5][6].
- After leaving ByteDance in July 2024, she founded InfiX.ai, focusing on model-related technologies and aiming to challenge existing centralized models [7][10].
- Yang's vision includes creating a decentralized model training framework that allows small and medium enterprises, research institutions, and individuals to participate in model training [13][16].
Group 2: Technical Innovations and Frameworks
- InfiX.ai has recently open-sourced the world's first FP8 training framework, which enhances training speed and reduces memory consumption compared to the commonly used FP16/BF16 [17][18].
- The company has developed a model fusion technology that allows different domain-specific models to be combined, avoiding resource wastage from redundant training [20][21].
- The InfiMed framework enables the training of small-scale models with strong reasoning capabilities across various medical tasks, particularly in cancer detection [22][26].
Group 3: Market Position and Future Outlook
- Yang believes that the future of AI will involve a collaborative approach where every company and institution can have its own expert model, leading to a globalized foundational model for various fields [30][31].
- The article highlights the growing acceptance of decentralized model training in the U.S., with significant funding being raised for companies pursuing this approach [28][29].
- InfiX.ai's focus on challenging fields like healthcare, particularly cancer, is seen as a strategic move to demonstrate the model's capabilities and differentiate it from competitors [72][73].
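A rough illustration of the FP8 memory point in Group 2: an FP8 value occupies 1 byte, while an FP16 or BF16 value occupies 2 bytes, so one stored copy of the same tensors takes half the space. The sketch below only does that arithmetic. The parameter count is a hypothetical figure chosen for illustration, the helper name is made up here, and nothing in it reflects how InfiX.ai's framework works. Real training footprints also include optimizer state, activations, and any higher-precision master weights, which this ignores.

```python
# Bytes needed to store a single value in each numeric format.
BYTES_PER_VALUE = {"fp32": 4, "bf16": 2, "fp16": 2, "fp8": 1}


def tensor_storage_gib(num_values: int, dtype: str) -> float:
    """Raw storage, in GiB, for one copy of num_values values in dtype.

    Hypothetical helper for illustration only; it counts bytes and nothing else.
    """
    return num_values * BYTES_PER_VALUE[dtype] / 2**30


if __name__ == "__main__":
    params = 7_000_000_000  # assumed model size, not taken from the article
    for dtype in ("bf16", "fp8"):
        print(f"{dtype}: {tensor_storage_gib(params, dtype):.1f} GiB")
```

The halving applies only to tensors that are actually held in FP8, so the end-to-end saving of a real framework depends on which tensors it keeps in higher precision.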
New work from ByteDance Seed: how model merging changes the large model pre-training paradigm
机器之心· 2025-06-06 09:12