Open-Source Models

Qwen3 Makes a Late-Night Splash: Alibaba Releases Eight Large Models at Once, Outperforming DeepSeek R1 and Taking the Open-Source Crown
36Kr· 2025-04-29 09:53
Core Insights
- The release of Qwen3 marks a significant advance in open-source AI models: eight hybrid reasoning models that rival proprietary models from OpenAI and Google and surpass the open-source DeepSeek R1 [4][24].
- Qwen3-235B-A22B is the flagship model, with 235 billion parameters. It performs strongly across benchmarks, particularly in software engineering and mathematics [2][4].
- The Qwen3 series introduces a dual reasoning mode that lets the model switch between deep reasoning for complex problems and quick responses for simpler queries [8][21].

Model Performance
- Qwen3-235B-A22B scored 95.6 on ArenaHard, ahead of OpenAI's o1 (92.1) and DeepSeek's R1 (93.2) [3].
- Qwen3-30B-A3B, with 30 billion parameters, scored 91.0 on ArenaHard, indicating that smaller models can still be competitive [6][20].
- The models were trained on approximately 36 trillion tokens, nearly double the data used for Qwen2.5, which improves their capabilities across domains [17][18].

Model Architecture and Features
- Qwen3 uses a mixture-of-experts (MoE) architecture that activates only about 10% of its parameters during inference, significantly reducing compute cost while maintaining high performance [20][24].
- The series includes six dense models ranging from 0.6 billion to 32 billion parameters, covering different user needs and compute budgets [5][6].
- The models support 119 languages and dialects, broadening their applicability worldwide [12][25].

User Experience and Accessibility
- Qwen3 is open-sourced under the Apache 2.0 license, making it accessible to developers and researchers [7][24].
- Users can switch between reasoning modes with a dedicated button on the Qwen Chat website or through commands in local deployments [10][14].
- Users have praised the model's quick response times and deep reasoning, with notable comparisons to other models such as Llama [25][28].

Future Developments
- The Qwen team plans to focus on training models capable of long-horizon reasoning and of executing real-world tasks [32].
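The mixture-of-experts point in the summary above (only about 10% of parameters active at inference) can be illustrated with a minimal sketch of top-k expert routing, the general mechanism behind MoE layers. The expert count, the value of k, the gate scores, and the toy experts below are illustrative assumptions, not Qwen3's actual configuration or code.

```python
import math

def top_k_route(gate_logits, k):
    """Return (expert_index, weight) pairs for the k highest-scoring experts."""
    # Select the k experts with the largest gate logits.
    chosen = sorted(range(len(gate_logits)),
                    key=lambda i: gate_logits[i], reverse=True)[:k]
    # Softmax over the selected logits only, so the weights sum to 1.
    peak = max(gate_logits[i] for i in chosen)
    exps = [math.exp(gate_logits[i] - peak) for i in chosen]
    total = sum(exps)
    return [(i, e / total) for i, e in zip(chosen, exps)]

def moe_forward(x, experts, gate_logits, k):
    """Weighted sum of the outputs of the k selected experts; the rest never run."""
    return sum(w * experts[i](x) for i, w in top_k_route(gate_logits, k))

# Hypothetical toy layer: 8 experts, each just scales its input.
experts = [lambda x, s=s: s * x for s in range(1, 9)]
# Hypothetical gate scores for one token.
gate_logits = [0.1, 2.0, -1.0, 0.3, 1.5, 0.0, -0.5, 1.0]

routes = top_k_route(gate_logits, k=2)
active_fraction = len(routes) / len(experts)  # share of experts that run
output = moe_forward(3.0, experts, gate_logits, k=2)
```

In this toy layer only 2 of 8 experts run per token, so compute scales with the active experts rather than with the total parameter count, which is the cost argument the summary makes.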
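[Ascend Full Lineup Supports Qwen3] April 29: According to Huawei Computing's official WeChat account, Qwen3 was released and open-sourced on April 29, 2025. Ascend MindSpeed and MindIE have supported the Qwen model series in step with each release. With Qwen3 now open-sourced, the series works out of the box in MindSpeed and MindIE, giving Qwen3 day-0 support.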
news flash· 2025-04-29 06:27
Core Insights
- Huawei's Ascend series fully supports Qwen3, which was released and open-sourced on April 29, 2025 [1].
- Ascend MindSpeed and MindIE have supported the Qwen series models consistently, so Qwen3 was compatible immediately on release [1].
Tongyi App Fully Launches Qwen3
news flash· 2025-04-29 03:13
Core Insights
- Alibaba's new-generation open-source model Qwen3 is now available in the Tongyi App and on the Tongyi website [1].

Company Developments
- The Tongyi App and the Tongyi website (tongyi.com) have fully launched Qwen3, which the article describes as the world's strongest open-source model [1].
- On both platforms, users can access the dedicated agent "Qwen Large Model" and try its capabilities [1].
Alibaba Tops the Global Open-Source Model Rankings!
Securities Times· 2025-04-29 02:41
Core Insights
- Alibaba has released the highly anticipated Qwen3 model, which outperformed top global models in several benchmark tests and established itself as a leading open-source model [1][2][3].

Model Performance
- Qwen3 scored 81.5 on AIME25, a new open-source record, and scored over 70 on LiveCodeBench, surpassing Grok3 [1][2].
- On Arena Hard, Qwen3 scored 95.6, outperforming OpenAI-o1 and DeepSeek-R1 [1][2].

Model Architecture
- Qwen3 uses a mixture-of-experts (MoE) architecture with 235 billion total parameters, of which only 22 billion are activated. The article reports significantly improved reasoning, instruction following, tool use, and multilingual ability [2][3].

Key Features
- The model integrates "fast thinking" and "slow thinking," switching between simple and complex tasks to make better use of compute [3][4].
- Qwen3 comes in eight sizes: two MoE models (30B and 235B) and six dense models (0.6B to 32B), covering different applications and balancing performance against cost [3][4].

Cost Efficiency
- Deployment costs are significantly lower than competitors': the flagship model requires only three H20 units (approximately 360,000 yuan), which is 25%-35% of the cost of similar models [5][6].

Open Source and Accessibility
- Qwen3 is open-sourced under the Apache 2.0 license and supports 119 languages and dialects, making it accessible to developers and researchers worldwide [6][7].
- The model is available on the ModelScope community, Hugging Face, and GitHub, and individual users can try it through the Tongyi app [6][7].

Industry Impact
- The release of Qwen3 is expected to advance research and development on large foundation models and to sharpen the AI industry's focus on intelligent applications [6][7].
- Alibaba has established itself as a leader in the open-source AI ecosystem, with over 200 models released and more than 300 million downloads globally, surpassing Meta's Llama [7].
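The parameter and cost figures in the summary above can be checked with a few lines of arithmetic. This is a sketch using only the numbers quoted in the summary. The implied cost range for "similar models" is derived here from the 25%-35% claim and is not a figure the article states.

```python
# Figures quoted in the summary above.
total_params_b = 235        # total parameters, in billions
active_params_b = 22        # parameters activated at inference, in billions
deploy_cost_yuan = 360_000  # approximate cost of the H20 deployment

# Share of parameters that are active: roughly the "about 10%" cited for MoE.
active_share = active_params_b / total_params_b

# If 360,000 yuan is 25%-35% of a comparable model's deployment cost,
# the implied comparable cost is the quoted cost divided by each share.
implied_peer_cost_high = deploy_cost_yuan / 0.25
implied_peer_cost_low = deploy_cost_yuan / 0.35

print(f"active share: {active_share:.1%}")
print(f"implied peer cost: {implied_peer_cost_low:,.0f} to {implied_peer_cost_high:,.0f} yuan")
```

Under these assumptions the active share is a little under 10%, and the comparable deployments would cost on the order of 1.0 to 1.4 million yuan.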
The First IPO Among the "AI Six Little Dragons" Is Coming! Zhipu's Breakout and Race Under State-Capital Backing
Sohu Finance· 2025-04-15 05:54
Core Viewpoint
- Beijing Zhipu Huazhang Technology Co., Ltd. ("Zhipu") has officially begun the A-share IPO counseling process, making it the first of China's "AI Six Dragons" to move toward a listing [1][3].

Group 1: IPO Counseling and Timeline
- Zhipu signed a counseling agreement with China International Capital Corporation on March 31, 2025, and aims to complete the counseling plan by October 2025 [2][10].
- The counseling filing is a significant step for Zhipu, which has changed its company name and is preparing for its public offering [3][4].

Group 2: Technological Advancements
- Zhipu was founded in June 2019. Its core team comes from Tsinghua University's Knowledge Engineering Laboratory, and founder Tang Jie is a recognized authority in AI [4].
- Zhipu has developed several significant models, including the GLM architecture and GLM-130B, which broke new ground for open-source large models in China [4][6].

Group 3: Capital Injection and Financial Backing
- Zhipu has completed 15 rounds of financing since its founding, raising over 16 billion RMB from investors including Sequoia, Hillhouse, Tencent, and Alibaba [6][8].
- Its valuation has risen to over 30 billion RMB following recent investments, indicating strong market confidence [8].

Group 4: Market Challenges and Commercialization
- Despite these advances, Zhipu has yet to find its best path to commercialization. It faces high operating costs and depends on a concentrated customer base [12].
- Revenue depends heavily on government and state-owned enterprises, which account for over 60% of orders [12].

Group 5: Future Outlook
- The IPO is seen as a critical juncture for Zhipu as it moves from reliance on capital infusions to self-sustaining growth [12].
- The company must address its current losses and rework its profit model to enter the public market successfully [12].
Express | After Raising $40 Billion, OpenAI Announces Plan to Return to Open Models, with a Reasoning Model Coming Soon
Z Potentials· 2025-04-01 03:49
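On Monday, March 31, OpenAI announced that in the coming months it will release its first open model with reasoning capabilities since GPT-2. OpenAI also announced that it has closed one of the largest private funding rounds in history, raising $40 billion at a $300 billion valuation. About $18 billion of the funds will go to OpenAI's Stargate infrastructure project, which aims to build a network of AI data centers in the United States.

On X on Monday afternoon, Altman expanded on OpenAI's open-model plans, saying the upcoming open model will have "reasoning" capabilities similar to OpenAI's o3-mini. OpenAI says it plans to release its first "open" language model since GPT-2 in "the coming months."

OpenAI plans to host developer events to gather feedback and, later, to demo prototypes of the model. The first developer event will be held in San Francisco within a few weeks, followed by sessions in Europe and the Asia-Pacific region.

In a recent Reddit Q&A, OpenAI CEO Altman said he believes OpenAI has room to adjust its direction on open-sourcing its technology. "I personally think we need to figure out a different open source strategy," Altman said. "Not everyone at OpenAI shares this view, and it's also ...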
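The "DeepSeek of 3D" Kicks Off an Open-Source Month: Two Foundation Models Take SOTA Right Away! It's VAST Again
QbitAI· 2025-03-28 10:01
By Hengyu and Yuyang, from Aofeisi. QbitAI | WeChat official account QbitAI

The "DeepSeek of 3D generation" has reached a new high! Homegrown, easy to use, high-performing, and open source: the new models set a new SOTA on debut and joined the open-source lineup right away.

[Images: a generated 3D model rotated clockwise, the same model with a texture ("skin") applied, and a second example.]

The upgrade is visible to the naked eye: the results are far more detailed. All of the above come from VAST, a star 3D large-model startup, whose two newly released foundation models, TripoSG and TripoSF, are the team's latest research results. The team open-sourced TripoSR in March of last year, and it became a global hit among open-source 3D generation foundation models.

TripoSG was open-sourced at release and immediately set a new SOTA among open-source 3D generation models, so developers get the benefit of the advance right away.

TripoSF is currently in the first stage of its open-source release and has already proven itself: it beats all existing open-source and closed-source methods to take a new SOTA.

Impressive, no? (tongue in cheek)

But the foundation models are only the first half of VAST's recent technical showcase. QbitAI has learned that VAST will keep open-sourcing for a full month, with a new open-source project announced every week. TripoSG and TripoSF are the projects for the second week of the open-source month.

Across the open-source month, in addition to the first wave of end-to-end generation of 3D from a single image ...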
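Is Wall Street "Coordinating a Bearish Call"? Barclays: Existing AI Compute Appears Sufficient to Meet Demand
Hard AI· 2025-03-27 02:52
Barclays notes that in 2025 the AI industry has enough compute to support 1.5 billion to 22 billion AI agents. The industry needs to move from "meaningless benchmarks" to deploying practical agent products. Low inference cost is the key to profitability, and open-source models will lower costs. Although compute looks ample, there is still a shortfall in dedicated compute for efficient, low-cost agent products.

Hard AI | By Bao Yilong | Edited by Hard AI

Following TD Cowen, Barclays also appears to be turning bearish on AI compute.

On March 26, Barclays published new research saying that global AI compute in 2025 can support 1.5 billion to 22 billion AI agents, enough to meet the needs of more than 100 million white-collar workers in the US and EU and more than 1 billion enterprise software licenses. On the same day, TD Cowen analysts said the computing clusters that power AI are in oversupply.

Barclays' view that existing AI compute is already enough to support large-scale deployment of AI agents rests on three main points:
- Industry inference capacity base: about 15.7 million AI accelerators (GPUs, TPUs, ASICs, and so on) will be online globally in 2025. Of these, 40% (about 6.3 million) will be used for inference, and roughly half of that inference compute (3.1 million) will be dedicated to agent/chatbot services.
- Capacity to support many users: depending on the compute requirements of different models, existing compute can support 1.5 billion to 22 billion AI agents, which is enough to meet the US and EU ...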
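Barclays' capacity estimate quoted above is a chain of simple multiplications, sketched below using only the figures in the excerpt. The agents-per-accelerator ratios at the end are derived here for illustration and are not numbers the report is quoted as stating.

```python
# Figures quoted in the excerpt above.
accelerators_online = 15_700_000   # AI accelerators online globally in 2025
inference_share_pct = 40           # percent of accelerators used for inference
agents_low = 1_500_000_000         # low estimate of supportable AI agents
agents_high = 22_000_000_000       # high estimate of supportable AI agents

# 40% of accelerators go to inference (the excerpt rounds this to 6.3 million).
inference_accelerators = accelerators_online * inference_share_pct // 100

# About half of inference capacity serves agents/chatbots (rounded to 3.1 million).
agent_accelerators = inference_accelerators // 2

# Derived, not quoted: implied number of agents served per accelerator.
agents_per_accelerator_low = agents_low / agent_accelerators
agents_per_accelerator_high = agents_high / agent_accelerators

print(inference_accelerators, agent_accelerators)
print(round(agents_per_accelerator_low), round(agents_per_accelerator_high))
```

The wide gap between the two implied ratios reflects the excerpt's point that the supportable agent count depends on the compute requirements of the model being served.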
Z Potentials | Shen Zhenyu: How a Designer-Toy Company Built the World's No. 1 AIGC Model Platform
Z Potentials· 2025-03-26 03:49
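Editor's Note

In this issue we invited Shen Zhenyu as our interview guest. A product person who was recruited directly by Zhang Yiming and lived through ByteDance's rise, he has now completed two entrepreneurial transformations, from Tuchong to Qiandao.

In this in-depth conversation, Shen shares his distinctive views on the future of AI: "every company will eventually become an AI company," and "the AI revolution cannot be led by only a few people." He firmly believes open-source models will dominate the future and that "technical secrets are flowing faster and faster." This is also the strategic reasoning behind his decision to build an AI model platform in parallel once Qiandao had achieved its initial success.

Tensor.Art already serves more than 100,000 model trainers and hosts more than 500,000 models. How does it stand out amid fierce global competition? Shen's answer is to build a double moat, "model scale and creator scale," while holding to the business philosophy that "only low prices bring greater scale." The "begin with the end in mind" thinking he learned at ByteDance lets him "cut through short-term noise and see the things that are bound to happen," and it guides every decision he makes in the AI era.

In Shen's view, "AI technology will certainly become as basic and widespread as water and electricity," while "the capability of a single large model is actually quite limited," so we need large numbers of fine-tuned models to solve problems in niche scenarios. As he puts it: "AI will change everything in the next ten years," and ...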
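DeepSeek Ships an Update!
Securities Times· 2025-03-25 04:28
On the evening of March 24, DeepSeek released the latest update to its V3 model, V3-0324.

While keeping the original technical framework, V3-0324 is optimized for performance, user experience, and practicality. The new version retains the core architecture of the V3 series, with a total size of 685 billion parameters, a small increase over the previous version's 671 billion. The latest model is now available through the official website, the app, the mini program, and other entry points, and the open-source version has been published on open-source sites.

Overall, the new version is a minor iterative upgrade. Its main features include:

First, on model performance: although DeepSeek has not published benchmark results for the new version, user testing indicates that it performs better at generating complex code, solving math problems, and front-end design tasks. The improvement in front-end coding is the change users notice most. One overseas AI blogger said DeepSeek can finally rival Anthropic's Claude 3.5/3.7 Sonnet in coding, and some professional users concluded after trying it that V3-0324's improvement is roughly comparable to the step from Sonnet 3.5 to Sonnet 3.6.

Editor: Ye Shuyun | Proofreader: Liu Xingying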