Open-Source Large Models
Huawei's Large Models Join the Open-Source Wave
Hua Er Jie Jian Wen· 2025-06-30 10:16
Core Insights
- Huawei has officially announced the open-sourcing of its Pangu models, including a 7 billion parameter dense model and a 72 billion parameter mixture of experts (MoE) model, marking its first foray into open-source AI models [3][4][6]
- This move aligns with Huawei's Ascend ecosystem strategy, aimed at promoting AI technology research and innovation and accelerating the application and value creation of AI across industries [3][7]
- The open-sourced models are designed for broad applicability; the dense model is optimized for deployment on Ascend NPUs and outperforms similarly sized models on complex reasoning benchmarks [3][4]

Model Specifications
- The 7 billion parameter dense model features a dual-system framework that switches between "fast thinking" and "slow thinking" modes based on task complexity, making it suitable for applications such as intelligent customer service and knowledge bases [3][4]
- The 72 billion parameter MoE model introduces a grouping mechanism during expert selection, ensuring a balanced computational load across devices and thus improving training efficiency and inference performance on complex tasks (a routing sketch follows this summary) [4]

Industry Context
- The trend of open-sourcing large models has gained momentum, with companies like OpenAI and Baidu also shifting toward open-source strategies to leverage global developer support for faster model development [5][6]
- The emergence of DeepSeek has significantly impacted the AI industry, showcasing the value of open-source models and prompting closed-source advocates to reconsider their strategies [5][6]

Strategic Implications
- Huawei's decision to open-source its Pangu models is seen as a response to the broader industry trend, positioning the company strategically in the global AI competition [6][10]
- The initiative is expected to attract developers to build industry applications on the Pangu models, forming a closed-loop "model - application - hardware" ecosystem around the Ascend platform [8][9]

Technological Advancements
- Huawei has also launched a new generation of Ascend AI cloud services based on CloudMatrix 384 super nodes, significantly improving inference throughput and efficiency for large model applications [8]
- The super node architecture supports parallel inference across multiple experts, improving resource allocation and effective utilization [8]
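The grouped expert selection noted above can be made concrete with a small sketch. This is an illustrative reconstruction under stated assumptions, not Huawei's MoGE code: the function names, shapes, and group count are all invented here. What it demonstrates is the balancing property: selecting a fixed top-k within each expert group guarantees that every group, and hence every device hosting one, receives the same number of activations per token.

```python
# Illustrative sketch of grouped expert routing (MoGE-style balancing).
# All names and shapes are assumptions, not Huawei's implementation.
import numpy as np

def grouped_topk_routing(router_logits, n_groups, k_per_group):
    """router_logits: (n_tokens, n_experts). Returns a 0/1 dispatch mask
    in which each token activates exactly k_per_group experts per group."""
    n_tokens, n_experts = router_logits.shape
    group_size = n_experts // n_groups
    mask = np.zeros_like(router_logits)
    for g in range(n_groups):
        lo = g * group_size
        # top-k experts *within* this group, independently per token
        top = np.argsort(router_logits[:, lo:lo + group_size], axis=1)[:, -k_per_group:]
        mask[np.arange(n_tokens)[:, None], lo + top] = 1.0
    return mask

logits = np.random.randn(8, 16)                      # 8 tokens, 16 experts
mask = grouped_topk_routing(logits, n_groups=4, k_per_group=2)
print(mask.sum(axis=1))  # every token activates 4 * 2 = 8 experts
print(mask.sum(axis=0))  # activations are spread across all 4 groups
```

With one expert group pinned per device, no device can be oversubscribed by the router, which is the load-balancing property the summary describes.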
Starting from ERNIE's Open-Sourcing: On the New Ecosystem for Large Model Development
AI科技大本营· 2025-06-30 09:52
Core Viewpoint
- Baidu has officially announced the open-source release of the ERNIE 4.5 series models, a significant step in the development of domestic large models that strengthens its position in the AI ecosystem [1]

Group 1: Model Details
- The ERNIE 4.5 series includes MoE models with 47 billion and 3 billion activated parameters, as well as a dense model with 0.3 billion parameters, with pre-training weights and inference code fully open-sourced [1]
- The multi-modal heterogeneous model structure proposed by the ERNIE team allows cross-modal parameter sharing, enhancing multi-modal understanding while maintaining dedicated parameter spaces for individual modalities (a toy sketch follows this summary) [1]

Group 2: Industry Impact
- Baidu's open-source initiative positions it as a key player in the global AI development community, aiming to make the "Wenxin" (ERNIE) models a representative of domestic large models that developers can effectively use [1]
- The release is seen as a response to the evolving AI landscape, in which companies are exploring ways to move AI from the laboratory into practical, everyday applications [5]

Group 3: Expert Insights
- A panel discussion featuring industry experts will examine the implications of Baidu's open-source strategy, the future of large models, and the competitive landscape of AI technology [2][3][4]
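The "cross-modal parameter sharing plus dedicated parameter spaces" idea can be illustrated with a toy layer. This is a hypothetical sketch, not the ERNIE 4.5 architecture: the class name, dimensions, and additive combination are assumptions chosen only to show the two parameter pools side by side.

```python
# Toy sketch of a heterogeneous layer: one weight matrix shared across
# modalities plus one dedicated matrix per modality. Illustrative only.
import numpy as np

class HeterogeneousLayer:
    def __init__(self, dim, modalities=("text", "vision")):
        self.shared = np.random.randn(dim, dim) * 0.02       # cross-modal sharing
        self.dedicated = {m: np.random.randn(dim, dim) * 0.02
                          for m in modalities}                # per-modality space

    def forward(self, x, modality):
        # shared path carries knowledge common to all modalities;
        # dedicated path preserves capacity specific to this one
        return x @ self.shared + x @ self.dedicated[modality]

layer = HeterogeneousLayer(dim=64)
text_tokens = np.random.randn(10, 64)
print(layer.forward(text_tokens, "text").shape)   # (10, 64)
```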
Huawei's First Open-Source Large Models Arrive! Pangu Pro MoE with 72 Billion Parameters, Trained on 4,000 Ascend Chips
Hua Er Jie Jian Wen· 2025-06-30 07:27
Core Insights
- Huawei has announced the open-sourcing of its Pangu models, including the 7 billion parameter dense model and the 72 billion parameter mixture of experts (MoE) model, marking a significant step in the domestic large model open-source competition [1][3][20]

Model Performance
- The Pangu Pro MoE model achieves a single-card inference throughput of 1148 tokens/s on the Ascend 800I A2, which can be further raised to 1528 tokens/s using speculative acceleration technology (a minimal sketch follows this summary), outperforming similarly sized dense models [3][11]
- The Pangu Pro MoE model is built on the MoGE architecture, with 72 billion total parameters and 16 billion active parameters, optimized specifically for Ascend hardware [4][11]

Training and Evaluation
- Huawei used 4,000 Ascend NPUs to pre-train on a high-quality corpus of 13 trillion tokens, divided into general, reasoning, and annealing phases to progressively strengthen model capabilities [11]
- The Pangu Pro MoE model has demonstrated strong results across benchmarks, including a score of 91.2 on DROP, closely matching the best current models [12][14]

Competitive Landscape
- The open-sourcing of the Pangu models coincides with a wave of domestic AI model releases, with leading companies such as MiniMax and Alibaba also upgrading their open-source models, driving large model prices down by 60%-80% [3][20]
- The Pangu Pro MoE model ranks fifth on the SuperCLUE Chinese large model benchmark, surpassing several existing models and indicating a competitive market position [17][18]

Technological Integration
- Huawei's ecosystem, integrating chips (Ascend NPU), frameworks (MindSpore), and models (Pangu), represents a significant technological achievement, providing a viable high-performance alternative to Nvidia's dominance in the industry [20]
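The "speculative acceleration" cited in the throughput figures is, generically, speculative decoding: a cheap draft model proposes several tokens and the large model verifies them in one pass, so accepted tokens never wait on the large model between steps. The greedy sketch below is a minimal illustration of that idea with toy stand-in models; it is not Huawei's implementation, and every name in it is invented.

```python
# Minimal greedy speculative-decoding sketch with toy stand-in "models".

def draft_next(seq):    # small, fast model: a cheap deterministic guess
    return (seq[-1] * 31 + 7) % 100

def target_next(seq):   # large model: agrees with the draft most of the time
    return (seq[-1] * 31 + 7) % 100 if seq[-1] % 3 else (seq[-1] + 1) % 100

def speculative_step(seq, k=4):
    """Draft k tokens cheaply, then verify them against the target model."""
    s, draft = list(seq), []
    for _ in range(k):                  # 1) cheap autoregressive drafting
        t = draft_next(s)
        draft.append(t)
        s.append(t)
    s, accepted = list(seq), []
    for t in draft:                     # 2) accept the longest matching prefix
        if target_next(s) == t:         #    (one batched pass in practice)
            accepted.append(t)
            s.append(t)
        else:
            break
    if len(accepted) < k:               # 3) on mismatch, take the target's token
        accepted.append(target_next(s))
    return seq + accepted

seq = [42]
for _ in range(5):
    seq = speculative_step(seq)
print(seq)   # grows by up to k tokens per verification round
```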
Just In: Huawei Releases!
中国基金报· 2025-06-30 04:05
[Introduction] Huawei open-sources the Pangu models for the first time, including a 7 billion and a 72 billion parameter model

China Fund News reporter Zhang Yanbei

On June 30, Huawei announced the open-sourcing of the Pangu 7 billion parameter dense model, the Pangu Pro MoE 72 billion parameter mixture-of-experts model, and Ascend-based model inference technology.

Huawei stated that this move is another key step in implementing its Ascend ecosystem strategy, promoting research and innovation in large model technology and accelerating the application and value creation of artificial intelligence across all industries.

According to Huawei's official website, this is the first time Huawei has open-sourced the core capabilities of the Pangu models. The release covers: the Pangu Pro MoE 72B model weights and basic inference code, now live on the open-source platform; Ascend-based inference code for ultra-large-scale MoE models, now live on the open-source platform; and the Pangu 7B model weights and inference code, which will go live on the open-source platform soon.

Huawei said: "We sincerely invite global developers, enterprise partners, and researchers to download and use them, give feedback, and improve them together." (Source: open-source developer platform GitGo)

Pangu is a family of ultra-large-scale pre-trained AI models launched by Huawei, spanning natural language processing, computer vision, scientific computing, and other fields. Its name evokes "splitting heaven from earth," symbolizing Huawei's breakthrough exploration in fundamental AI research and industry applications. Since its release, the Pangu family has been deployed in multiple industries, including ...
Why Is Huawei Open-Sourcing the Pangu Models?
Tai Mei Ti APP· 2025-06-30 03:23
This is also the first time Huawei has announced the open-sourcing of the Pangu models. The essence of open-sourcing a large model is trading openness for an ecosystem, and using that ecosystem to nurture the technology.

For Huawei, this is not a full open-sourcing but a selection of two relatively heavily used models. The 7 billion parameter dense model offers a moderate parameter count, balanced performance, and a low deployment threshold, fitting scenarios such as intelligent customer service and knowledge bases; the Pangu Pro MoE 72 billion parameter mixture-of-experts model, with its sparse activation, dynamic routing, and multi-expert collaboration, is better suited to relatively complex tasks.

Further open-sourcing by Huawei cannot be ruled out. Typically, Huawei would first ensure technical stability and other optimizations, use these two models to gauge developer and market response, keep improving availability and ease of use, and then open-source further. Open-sourcing is only the first step; sustaining an open-source ecosystem over time matters far more than the act of open-sourcing itself.

Worth noting is that Huawei has also open-sourced its Ascend-based model inference technology this time. The hard part of domestic AI is chips, and harder still is the ecosystem: for developers to make better use of Pangu and other domestic models, the underlying AI infrastructure must be better adapted, and that is the significance of Huawei open-sourcing its Ascend-based inference technology.

News on June 30: Huawei officially announced the open-sourcing of the 7 billion parameter Pangu dense model, the Pangu Pro MoE 72 billion parameter mixture-of-experts model, and Ascend-based model inference technology.

Huawei officially stated that this move is another key step in implementing its Ascend ecosystem strategy, promoting research and ...
Baidu Officially Open-Sources the ERNIE 4.5 Series Models
第一财经· 2025-06-30 03:12
On June 30, Baidu officially open-sourced the ERNIE 4.5 series: 10 models in total, covering mixture-of-experts (MoE) models with 47B and 3B activated parameters as well as a 0.3B parameter dense model, with pre-training weights and inference code fully open-sourced. The ERNIE 4.5 open-source series can now be downloaded and deployed from platforms such as the PaddlePaddle AI Studio (飞桨星河) community and HuggingFace, and the open-source model API services are also available on Baidu AI Cloud's Qianfan large model platform. ...
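For readers who want to try the open weights, a minimal download-and-generate sketch with the Hugging Face transformers library might look like the following. The repo id is a placeholder assumption: check Baidu's organization page on HuggingFace for the actual model names and hardware requirements before running.

```python
# Hedged sketch: loading an open ERNIE 4.5 checkpoint with transformers.
# The repo id below is a placeholder, not a verified model name.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "baidu/ERNIE-4.5-0.3B"   # hypothetical repo id
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(model_id, trust_remote_code=True)

inputs = tokenizer("The significance of open-sourcing large models is", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```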
Tencent Makes a Big Move!
中国基金报· 2025-06-27 15:00
Core Viewpoint
- Tencent's Hunyuan-A13B model is the first open-source MoE model at the 13B active parameter level, offering significant performance improvements and cost advantages for developers in the AI industry [4][6]

Group 1: Model Features and Performance
- Hunyuan-A13B has 80 billion total parameters, with 13 billion active parameters, outperforming other leading open-source models in inference speed and cost-effectiveness [4][5]
- The model supports flexible thinking modes, allowing either quick, efficient outputs or deeper, more comprehensive reasoning [5]
- It is friendly to individual developers, requiring only a single mid-range GPU for deployment, and integrates with mainstream open-source inference frameworks (see the serving sketch after this summary) [5][10]

Group 2: Industry Trends and Open Source Movement
- The open-source trend in AI is accelerating, with major tech companies like OpenAI, Google, and Alibaba releasing over 10 open-source models since March 2023 [8][9]
- The performance of open-source models continues to improve, with platforms like Hugging Face frequently updating their model rankings [8]
- Companies are increasingly adopting open-source AI technologies, with over 50% of enterprises reportedly using these solutions for data, models, and tools [9][10]

Group 3: Future Developments
- Tencent plans to release more models of varying sizes and features, contributing to the growth of the open-source ecosystem [6][10]
- Future releases will include mixed reasoning models from 0.5B to 32B parameters, as well as multi-modal foundation models for images, videos, and 3D [10]
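As one concrete instance of the "mainstream open-source inference frameworks" the summary mentions, a minimal vLLM serving sketch could look like the following. The model id is a placeholder assumption; confirm Tencent's published repo name, and note that larger checkpoints need correspondingly larger GPUs.

```python
# Hedged sketch: offline inference with vLLM. The model id is a
# placeholder assumption, not a verified repo name.
from vllm import LLM, SamplingParams

llm = LLM(model="tencent/Hunyuan-A13B-Instruct")   # hypothetical repo id
params = SamplingParams(temperature=0.7, max_tokens=128)
outputs = llm.generate(["Explain mixture-of-experts in one paragraph."], params)
print(outputs[0].outputs[0].text)
```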
After DeepSeek and Fei-Fei Li, Is Nvidia Also Taking a Liking to Alibaba's Qwen?
Xin Lang Ke Ji· 2025-05-13 07:01
Who in the global open-source large model ecosystem is the most captivating? Alibaba, hands down.

Just last week, following DeepSeek and "godmother of AI" Fei-Fei Li, Nvidia also set its sights on Alibaba. Besides rushing to announce integration and adaptation on the very day the latest "hybrid reasoning model" Qwen3 was open-sourced, on May 9 Nvidia also open-sourced a brand-new family of code reasoning models, Open Code Reasoning (hereafter OCR), in three sizes: 7B, 14B, and 32B, all built on Tongyi Qianwen (Qwen) base models.

On the LiveCodeBench evaluation, Nvidia's OCR-Qwen-32B-Instruct model, which outperformed OpenAI's o3-mini and o1 models, was fine-tuned from Qwen2.5-32B.

With Tongyi Qianwen already iterated to version 3.0 and its performance reaching new highs, Nvidia managed to build a world-class model on the previous generation of Qwen, which raises the question: how much hidden potential does Qwen still hold for others to unlock?

After DeepSeek and Fei-Fei Li, Nvidia also takes a liking to Tongyi Qianwen

The code and datasets of Nvidia's open-sourced OCR series have been publicly shared on Hugging Face, the world's largest AI open-source community, for developers to browse and learn from free of charge.

Among them, Nvidia's OCR-Qwen-32B-Instruct on LiveCodeBench ...
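Since the article notes that the OCR models and datasets are shared on Hugging Face, a minimal sketch of trying one for code reasoning might look like this. The repo id is a guess derived from the article's naming, not a verified model name; verify it on Hugging Face first.

```python
# Hedged sketch: code-reasoning generation via a transformers pipeline.
# The model id is inferred from the article's naming and may not exist
# under exactly this name -- check Hugging Face before running.
from transformers import pipeline

pipe = pipeline("text-generation", model="nvidia/OCR-Qwen-7B-Instruct")  # hypothetical
prompt = "Write a Python function that checks whether a string is a palindrome."
print(pipe(prompt, max_new_tokens=256)[0]["generated_text"])
```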
Interview with Tsinghua's Sun Maosong: China's "Strong Voice" Drives Large Model Open-Sourcing as the Global Large Model Culture Shifts
Huan Qiu Wang Zi Xun· 2025-04-30 08:51
China News Service, Beijing, April 30 (reporter Xia Bin) — Sun Maosong, executive vice dean of the Institute for Artificial Intelligence at Tsinghua University and foreign member of the Academia Europaea, told China News Service in a recent exclusive interview in Beijing that the open-source wave Chinese tech companies have set off in the large model field has sent a "strong voice" from China to the world; as the technology gains international recognition, it has quietly shifted the global large model culture.

Source: China News Service

According to the latest news, in the early hours of April 29, the new-generation Tongyi Qianwen model Qwen3 was announced as open source, spanning eight Qwen3 models of different sizes. Alibaba's Tongyi has reportedly open-sourced more than 200 models, with over 300 million downloads worldwide and more than 100,000 derivative models, surpassing America's Llama to become the world's top open-source model family.

Chinese open-source models represented by DeepSeek and Qwen have fully open-sourced advanced models' parameter weights, inference logic, and tool chains, opening a new phase for the commercial adoption of artificial intelligence.

"Although DeepSeek is, on the whole, a 'from 1 to 2' innovation, it has gone furthest among open-source large models in reinforcement learning from AI feedback, turning human feedback into AI feedback," Sun Maosong said of DeepSeek.

Sun particularly stressed the value of small models. From an application standpoint, small models cut costs and broaden adoption; from a research standpoint, they help universities and research institutes cope with the challenges posed by resource constraints. Both make them strongly necessary.

In his view, the larger large models become ...
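Sun's remark about "turning human feedback into AI feedback" describes reinforcement learning from AI feedback (RLAIF), in which a judge model rather than a human annotator ranks candidate answers. The sketch below is a toy illustration of that labeling step only: the scoring heuristic stands in for what would in practice be a strong LLM judge, and nothing here reflects DeepSeek's actual pipeline.

```python
# Toy sketch of AI-feedback preference labeling (RLAIF). The judge here
# is a stand-in heuristic; a real system would query a strong LLM.

def judge_score(prompt: str, answer: str) -> float:
    # illustrative proxy for answer quality: lexical variety, with a
    # soft penalty for answers far from a target length
    return len(set(answer.split())) / (1 + abs(len(answer) - 120) / 120)

def ai_preference(prompt: str, answer_a: str, answer_b: str) -> str:
    """Returns the preferred answer -- the training signal used in place
    of a human label when building the reward model or policy."""
    a, b = judge_score(prompt, answer_a), judge_score(prompt, answer_b)
    return answer_a if a >= b else answer_b

print(ai_preference(
    "What is a mixture-of-experts model?",
    "A mixture-of-experts model routes each token to a few specialized expert networks.",
    "MoE is a thing.",
))
```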