Large Model Pre-training
MIT Genius PhD Poached by Former OpenAI CTO Right After Graduation, Annual Salary May Start at 3 Million
36Kr · 2026-01-09 08:12
Core Insights
- Guangxuan Xiao, a PhD graduate from MIT, has officially joined Thinking Machines to focus on pre-training large models [1][6][10]
- His academic background includes dual degrees from Tsinghua University in Computer Science and Finance, along with numerous awards and research experience [6][8][10]

Group 1: Academic and Professional Background
- Guangxuan Xiao graduated from Tsinghua University with dual degrees in Computer Science and Finance, receiving multiple prestigious awards during his studies [6][8]
- He completed his PhD at MIT under the supervision of Professor Song Han, focusing on efficient algorithms and systems for large language models [10][18]
- Xiao interned at major tech companies, including Meta and NVIDIA, where he contributed to research on efficient attention mechanisms and large language model optimization [10][12][18]

Group 2: Research Contributions
- Xiao's doctoral thesis addresses significant challenges in large language models, proposing solutions to problems such as memory overflow and slow inference [18][19]
- His research introduced SmoothQuant, which achieves lossless quantization of billion-parameter models without retraining (see the sketch after this summary); his work also enables constant-memory streaming inference over long sequences [19][20]
- The thesis also includes approaches such as DuoAttention and XAttention, which improve performance while reducing memory usage [19][20]

Group 3: Company Insights
- Thinking Machines offers competitive salaries, with average base salaries reaching $500,000, significantly higher than those at established companies like OpenAI and Anthropic [21][25]
- The company is positioned to attract top talent in the AI field, reflecting the ongoing talent war in Silicon Valley [21][28]
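For readers unfamiliar with the quantization work mentioned above, the sketch below illustrates SmoothQuant's published core idea: migrating quantization difficulty from activation outliers to the weights via a per-channel scale before INT8 quantization. The shapes, the alpha=0.5 setting, and the toy per-tensor INT8 quantizer are illustrative assumptions, not the paper's reference implementation.

```python
# A minimal sketch of SmoothQuant's core idea: migrate quantization
# difficulty from activations to weights with per-channel scales.
# Toy INT8 quantizer and shapes are illustrative assumptions only.
import numpy as np

def smooth_scales(act_absmax, weight_absmax, alpha=0.5):
    """Per-input-channel scale s_j = max|X_j|^a / max|W_j|^(1-a)."""
    return act_absmax**alpha / weight_absmax**(1 - alpha)

def quantize_int8(t):
    """Symmetric per-tensor INT8 quantization (toy version)."""
    scale = np.abs(t).max() / 127.0
    return np.round(t / scale).astype(np.int8), scale

rng = np.random.default_rng(0)
X = rng.normal(size=(4, 8))      # activations, (tokens, channels)
X[:, 3] *= 50.0                  # simulate an outlier channel
W = rng.normal(size=(8, 16))     # weights, (in_channels, out_channels)

s = smooth_scales(np.abs(X).max(axis=0), np.abs(W).max(axis=1))
X_s, W_s = X / s, W * s[:, None]  # (X/s) @ (s*W) == X @ W exactly

Xq, sx = quantize_int8(X_s)
Wq, sw = quantize_int8(W_s)
Y_approx = (Xq.astype(np.int32) @ Wq.astype(np.int32)) * sx * sw
print(np.abs(Y_approx - X @ W).max())  # small error despite the outlier
```

Dividing the outlier channel out of the activations (and folding it into the weights) is what keeps the per-tensor INT8 step from being dominated by a single channel's range.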
Former Alibaba and ByteDance Large-Model Lead Yang Hongxia Starts a Company: Large-Model Pre-training Is Not a Compute Race Among a Few Top Players | 36Kr Exclusive
36Kr · 2025-10-30 13:37
Core Viewpoint
- The article discusses the emergence of a new AI paradigm led by Yang Hongxia, who aims to decentralize model training, in contrast with the centralized approaches of major companies like Alibaba and ByteDance [4][12][27]

Group 1: Yang Hongxia's Background and Vision
- Yang Hongxia has over seven years of experience in large-model research at Alibaba and ByteDance, where she contributed to the development of significant models such as M6 and Tongyi Qianwen [5][6]
- After leaving ByteDance in July 2024, she founded InfiX.ai, focusing on model-related technologies and aiming to challenge existing centralized models [7][10]
- Her vision includes a decentralized model-training framework that allows small and medium enterprises, research institutions, and individuals to participate in model training [13][16]

Group 2: Technical Innovations and Frameworks
- InfiX.ai has recently open-sourced the world's first FP8 training framework, which improves training speed and reduces memory consumption compared with the commonly used FP16/BF16 (see the FP8 sketch after this summary) [17][18]
- The company has developed a model-fusion technology that combines different domain-specific models, avoiding the resource waste of redundant training [20][21]
- The InfiMed framework enables the training of small-scale models with strong reasoning capabilities across various medical tasks, particularly cancer detection [22][26]

Group 3: Market Position and Future Outlook
- Yang believes the future of AI will be collaborative, with every company and institution able to own its own expert model, leading to globalized foundational models across fields [30][31]
- The article highlights growing acceptance of decentralized model training in the U.S., with significant funding raised by companies pursuing this approach [28][29]
- InfiX.ai's focus on challenging fields such as healthcare, particularly cancer, is a strategic move to demonstrate the model's capabilities and differentiate it from competitors [72][73]
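The basic ingredient behind any FP8 training framework is casting tensors to an 8-bit floating-point format (such as E4M3) with dynamic scaling, halving memory per element relative to FP16/BF16. The sketch below is a generic illustration of that round trip, assuming PyTorch >= 2.1's torch.float8_e4m3fn dtype; InfiX.ai's actual framework is not described in this summary, so the scaling recipe here is an assumption.

```python
# A minimal sketch of FP8 (E4M3) round-trip casting with per-tensor
# scaling. Generic illustration only, assuming PyTorch >= 2.1's
# torch.float8_e4m3fn; this is NOT InfiX.ai's (non-public) recipe.
import torch

E4M3_MAX = 448.0  # largest finite value representable in float8_e4m3fn

def to_fp8(x: torch.Tensor):
    """Scale a tensor into the E4M3 dynamic range, then cast to FP8."""
    scale = E4M3_MAX / x.abs().max().clamp(min=1e-12)
    return (x * scale).to(torch.float8_e4m3fn), scale

def from_fp8(x_fp8: torch.Tensor, scale: torch.Tensor):
    """Dequantize back to FP32 for higher-precision accumulation."""
    return x_fp8.to(torch.float32) / scale

x = torch.randn(1024) * 3.0
x_fp8, s = to_fp8(x)
x_back = from_fp8(x_fp8, s)
print((x - x_back).abs().max())  # small rounding error at half the
                                 # per-element memory of BF16
```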
New Work from ByteDance Seed: How Model Merging Could Change the Large-Model Pre-training Paradigm
机器之心 · 2025-06-06 09:12
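Both InfiX.ai's fusion technology above and this ByteDance Seed headline concern parameter-space model merging. As a generic illustration of the idea only, not either team's method, the sketch below takes a weighted average of matching weight tensors across same-architecture checkpoints, the simplest "model soup"-style merge; the checkpoint names and weights are hypothetical.

```python
# A generic sketch of parameter-space model merging: weighted averaging
# of matching tensors across same-architecture checkpoints ("model
# soup" style). Illustration only, not ByteDance Seed's or InfiX.ai's
# method; checkpoint names and weights below are hypothetical.
import torch

def merge_state_dicts(state_dicts, weights):
    """Return a new state dict whose tensors are weighted averages."""
    assert len(state_dicts) == len(weights)
    merged = {}
    for key in state_dicts[0]:
        merged[key] = sum(
            w * sd[key].float() for w, sd in zip(weights, state_dicts)
        )
    return merged

# Hypothetical usage: fuse two domain experts of the same architecture.
# med = torch.load("medical_expert.pt")
# fin = torch.load("finance_expert.pt")
# model.load_state_dict(merge_state_dicts([med, fin], [0.6, 0.4]))
```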