No Longer Dependent on the US: Singapore's National AI Programme Gets a "Heart Transplant" with Alibaba's Qwen
Guan Cha Zhe Wang· 2025-11-25 10:49
On November 24, Alibaba Cloud and AI Singapore (AISG), the country's national artificial intelligence programme, jointly announced that Singapore's latest national-level large language model, Sea-Lion v4, will no longer follow its earlier US technology track and will instead be built entirely on Alibaba's open-source Qwen3-32B model.

This is the latest win for Chinese open-source models in the global market, following Silicon Valley heavyweight Chamath Palihapitiya announcing that he had replaced OpenAI with Kimi as his productivity tool, US coding platforms such as Vercel and Windsurf integrating Zhipu's models, and Airbnb's CEO saying that Alibaba's Qwen works better than US models. The endorsement from Singapore's national AI programme also signals that, on the "sovereign AI" and "multilingual adaptation" tracks, Chinese open-source large models are now capable of substituting for, and even surpassing, the Silicon Valley giants.

In December 2023, Singapore launched a S$70 million (US$52 million) initiative to build research and engineering capabilities for multimodal large language models (LLMs), including the development of Sea-Lion (Southeast Asian Languages in One Network).

Yet this market of 600 million people, with a digital economy heading toward the trillion-dollar mark, has long been a blind spot for Western AI.

That blind spot shows up first in the extreme scarcity of data. Before Sea-Lion, in widely used mainstream models such as Meta's Llama 2, Southeast Asian language content ...
"The Training Cost Was Only This Much? US Peers Plunge into Self-Doubt"
Guan Cha Zhe Wang· 2025-09-19 11:28
Core Insights
- DeepSeek has achieved a significant breakthrough in AI model training costs, with the DeepSeek-R1 model's training cost reported at only $294,000, substantially lower than the costs disclosed by American competitors [1][2][4]
- The model was trained on 512 NVIDIA H800 chips and has been recognized as the first mainstream large language model to undergo peer review, marking a notable advancement in the field [2][4]
- The cost efficiency of DeepSeek's model challenges the notion that only countries with the most advanced chips can dominate the AI race, as highlighted by various media outlets [1][2][6]

Cost and Performance
- The training cost of DeepSeek-R1 is significantly lower than that of OpenAI's models, which have been reported to exceed $100 million [2][4]
- DeepSeek's approach emphasizes the use of open-source data and efficient training methods, allowing for high performance at a fraction of the cost of traditional models [5][6]

Industry Impact
- The success of DeepSeek-R1 is seen as a potential game-changer in the AI landscape, suggesting that AI competition is shifting from resource quantity to resource efficiency [6][7]
- The model's development has sparked discussions regarding China's position in the global AI sector, particularly in light of US export restrictions on advanced chips [1][4]

Technical Details
- The latest research paper provides more detailed insights into the training process and acknowledges the use of A100 chips in earlier stages, although the final model was trained exclusively on H800 chips [4][5]
- DeepSeek has defended its use of "distillation" techniques, which are common in the industry, to enhance model performance while reducing costs [5][6]