Open-Source Models
Everything about the model made public, outperforming DeepSeek-R1: NVIDIA open-sources the Llama-Nemotron family
机器之心· 2025-05-06 08:04
Core Viewpoint
- The rapid development of large models has made reasoning ability a key indicator of model intelligence, with inference efficiency becoming a critical limiting factor for model deployment and performance [2][3]

Group 1: Model Overview
- NVIDIA has launched the Llama-Nemotron series, an open family of large models designed for efficient reasoning, featuring excellent inference capabilities and an enterprise-friendly open license [3][5]
- The series includes three model sizes: Nano (8B), Super (49B), and Ultra (253B), along with an independent variant, UltraLong (8B), that supports long context [4][5]
- The models are the first open-source models to support dynamic inference switching, allowing users to toggle between standard chat mode and reasoning mode, enhancing interaction flexibility [6]

Group 2: Model Training and Optimization
- The Llama-Nemotron models use a multi-stage post-training process, employing supervised fine-tuning and reinforcement learning, to enhance performance on reasoning and non-reasoning tasks [9]
- The Puzzle framework is used for efficient inference optimization, transforming large language models into hardware-efficient variants while maintaining performance [12][15]
- LN-Super and LN-Ultra achieve significant throughput improvements, with LN-Super showing a 5x increase in inference throughput compared to Llama 3.3-70B-Instruct [19]

Group 3: Performance Metrics
- LN-Ultra demonstrates superior performance on key benchmarks, scoring 88.1 on MMLU and 80.4 on MATH500, surpassing its predecessors [24][25]
- The models are designed to meet specific deployment constraints, such as supporting up to 3 million cached tokens in FP8 precision for LN-Ultra [21]

Group 4: Reinforcement Learning and Instruction Following
- The models incorporate a "detailed thinking on/off" instruction mechanism to control reasoning depth and response style, improving user interaction [27]
- LN-Ultra's performance is further enhanced through large-scale reinforcement learning, allowing it to exceed the capabilities of its teacher model [31][39]
- The training process for LN-Ultra involved approximately 140,000 H100 GPU hours, focused on optimizing reasoning capabilities and instruction-following abilities [32][41]
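The "detailed thinking on/off" mechanism above is a system-prompt toggle. As a minimal sketch, assuming the common OpenAI-style chat-message format (the payload shape and function name here are illustrative, not part of the article), a client might switch modes like this:

```python
# Minimal sketch of the "detailed thinking on/off" toggle described above.
# The system-prompt phrasing follows the article; the chat-payload shape is
# the widely used messages format, assumed here for illustration only.

def build_request(user_prompt: str, reasoning: bool) -> list:
    """Build a chat payload whose system prompt selects the reasoning mode."""
    mode = "on" if reasoning else "off"
    return [
        {"role": "system", "content": f"detailed thinking {mode}"},
        {"role": "user", "content": user_prompt},
    ]

# Reasoning mode: ask the model to work through the problem step by step.
deep = build_request("Prove that the sum of two odd numbers is even.", reasoning=True)
# Standard chat mode: same interface, direct answer requested.
fast = build_request("What is 2 + 2?", reasoning=False)

print(deep[0]["content"])  # detailed thinking on
print(fast[0]["content"])  # detailed thinking off
```

The point of putting the switch in the system prompt is that the same weights serve both modes; no separate model deployment is needed.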
Internet giants rushed to open-source new models before May Day; with differing strategies, who will stay at the table?
Nan Fang Du Shi Bao· 2025-05-01 14:12
Core Insights
- Major domestic AI model companies rapidly open-sourced their models ahead of the May Day holiday, with Alibaba releasing Qwen3, Xiaomi launching Xiaomi MiMo, and DeepSeek introducing DeepSeek-Prover-V2 [1][2][5]

Alibaba
- Alibaba's Qwen3 features two MoE models with 30B and 235B parameters, and six dense models ranging from 0.6B to 32B, achieving state-of-the-art performance in its category [2]
- Qwen3 is the first "hybrid reasoning model" in China, integrating fast and deep thinking capabilities and significantly reducing computational power consumption [5]
- Alibaba has consistently open-sourced models this year, including a 14B video generation model and a 7B multimodal model, aiming to leverage open-source models for AI applications while monetizing its cloud services [6]

Xiaomi
- Xiaomi's MiMo model, with only 7B parameters, outperformed OpenAI's closed-source o1-mini in public benchmarks for mathematical reasoning and coding competitions [6]
- This marks Xiaomi's first foray into open-sourcing its models, developed by its newly established Core team [6]

DeepSeek
- DeepSeek has released two versions of DeepSeek-Prover-V2, focused on mathematical theorem proving and achieving significant performance improvements on benchmark tests [8]
- The new models support extensive context inputs and build on previous versions, showing a commitment to enhancing reasoning capabilities [8]

Industry Trends
- The open-sourcing of models by these companies is seen as a strategic move to enhance competitiveness against closed-source models from companies like OpenAI and Anthropic, which still hold a slight performance edge [9][10]
- Industry experts predict consolidation in the AI model sector, with DeepSeek, Alibaba, and ByteDance emerging as the leading players in China, while the U.S. market remains competitive with companies like xAI and OpenAI [10][11]
- Open-source models are expected to democratize AI technology, making it more accessible and promoting innovation across industries [9][10]
A look at the current state of data-center investment
傅里叶的猫· 2025-04-30 12:37
Recently we have spent a great deal of effort on H200/B200 data-center servers; suffice it to say the pitfalls are many and the tricks run deep, but good things take time, and our recent gains have made the work feel worthwhile. In this article we take a brief look at the current state of data-center investment, drawing on TD Cowen reports, The Information/BBG coverage, and interviews with several industry experts, to see how the foreign hyperscalers view IDC. We will follow up with a dedicated piece on the domestic IDC investment landscape.

Microsoft's data-center investment slowdown

As many have seen in the news, Microsoft is experiencing a significant slowdown or adjustment in data-center investment demand. Since last year it has walked away from over 1GW of data-center deals and terminated some land contracts. It has slowed its international expansion and paused or postponed multiple domestic and overseas projects, including in the U.S. (Atlanta, Wisconsin Phase 2, San Antonio, Kansas City, Cedar Rapids) as well as in Europe, India, the UK, and Australia, involving a reduction of nearly 1.98GW in planned leasing demand (originally scheduled over 4 years, roughly 500MW per year).

The adjustment has several causes:
1. Resource absorption: digesting the large volume of capacity leased in 2024 to avoid overbuilding.
2. Construction complexity: hyperscale data-center design and construction is inherently complex, causing objective delays.
3. OpenAI's strategic shift: OpenAI no longer relies solely on Microsoft, turning to third parties such as Oracle and CoreWeave and aggressively pursuing self-built capacity, leading Microsoft to ...
Zuckerberg's latest interview: AI will trigger a massive revolution in knowledge work and programming
Sou Hu Cai Jing· 2025-04-30 10:02
Core Insights
- Meta CEO Mark Zuckerberg discussed the competitive landscape of AI development, comparing the Llama 4 model with DeepSeek and asserting that Llama 4 offers higher efficiency and broader functionality despite DeepSeek's advances in specific areas [1][36]
- Meta AI has reached nearly 1 billion monthly users, indicating significant growth and the importance of personalized AI interactions [2][21]
- The company is developing coding agents expected to automate much of the coding process within the next 12 to 18 months, which Zuckerberg expects to increase rather than decrease demand for human jobs [1][16]

Model Development
- The Llama 4 series includes models like Scout and Maverick, designed for efficiency and low latency and supporting multimodal capabilities [4][41]
- The upcoming Behemoth model will exceed 2 trillion parameters, representing a significant leap in model size and capability [4]
- Meta is committed to open-sourcing its models after internal use, allowing others to benefit from its developments [4][41]

Competitive Landscape
- Zuckerberg believes open-source models are likely to surpass closed-source models in popularity, reflecting a trend toward more accessible AI technologies [5][36]
- The company acknowledges DeepSeek's impressive infrastructure and text-processing capabilities but emphasizes that Llama 4's multimodal abilities give it a competitive edge [35][36]
- The Llama licensing model is designed to facilitate collaboration with large companies while ensuring Meta retains some control over its intellectual property [37][39]

User Interaction and Experience
- Meta is exploring how AI can enhance user interactions, particularly through natural dialogue and personalized experiences [14][28]
- Integrating AI into existing applications like WhatsApp is crucial for user engagement, especially in markets outside the U.S. [21]
- The company is focused on creating AI that can assist users in complex social interactions, enhancing the overall user experience [27][28]

Future Directions
- Zuckerberg envisions a future where AI integrates seamlessly into daily life, potentially through devices like smart glasses that enable constant interaction with AI [14][31]
- AI development will focus not only on productivity but also on entertainment and social engagement, reflecting the technology's diverse applications [25][26]
- The company is aware of the challenge of keeping AI interactions healthy and beneficial for users, emphasizing the importance of understanding user behavior [26][27]
Qwen3 released: Founder Park interviews Zuoyou (左右), senior algorithm engineer at Xinyan Group, on the ecosystem value of open-source models
Zhong Guo Chan Ye Jing Ji Xin Xi Wang· 2025-04-30 09:07
Core Insights
- Alibaba's new Qwen3 model is emerging as a significant player in the Chinese open-source AI ecosystem, replacing earlier mainstays like Llama and Mistral [1]
- The interview with industry representatives highlights the importance of model selection and fine-tuning, and the challenges faced in the AI landscape [1][3]

Model Selection and Deployment
- The majority of applications (over 90%) require fine-tuned models, primarily deployed locally for online use [3]
- Qwen models are preferred for their mature ecosystem, technical capabilities, and better alignment with specific business needs, particularly in emotional and psychological applications [4][5]

Challenges in Model Utilization
- In embodied intelligence, the challenges include high inference costs and ecosystem compatibility, especially when deploying locally for privacy reasons [6]
- For online services, the main challenges are model capability and inference costs, particularly during peak usage [7]

Model Capability and Business Needs
- Current models do not fully meet the nuanced requirements of emotional and psychological applications, necessitating post-training to enhance targeted capabilities while minimizing damage to other skills [8]
- The expectation is for open-source models to catch up with top closed-source models, with a focus on transparency and sharing of technical details [9][10]

Differentiation Among Open-Source Models
- DeepSeek is seen as more aggressive and innovative, while Qwen and Llama focus on community engagement and broader applicability [11][12]

Product and AI Integration
- A significant oversight in AI development is the mismatch between models and product needs; AI should enhance backend processing rather than serve merely as a front-end interface [13][14]
- Successful products should be built on genuine user needs, ensuring high user retention and avoiding superficial demand fulfillment [14]

Global Impact of Open-Source Models
- The rise of Chinese open-source models like Qwen and DeepSeek is accelerating a global technological transformation, fostering a collaborative and innovative ecosystem [15]
Qwen3 drops a late-night bombshell: Alibaba releases eight large models at once, outperforming DeepSeek R1 and claiming the open-source crown
36Ke· 2025-04-29 09:53
Core Insights
- The release of Qwen3 marks a significant advance in open-source AI models, featuring eight hybrid reasoning models that rival proprietary models from OpenAI and Google and surpass the open-source DeepSeek R1 [4][24]
- Qwen3-235B-A22B is the flagship model with 235 billion parameters, demonstrating superior performance across benchmarks, particularly in software engineering and mathematics [2][4]
- The Qwen3 series introduces a dual reasoning mode, allowing the model to switch between deep reasoning for complex problems and quick responses for simpler queries [8][21]

Model Performance
- Qwen3-235B-A22B achieved a score of 95.6 on the ArenaHard test, outperforming OpenAI's o1 (92.1) and DeepSeek's R1 (93.2) [3]
- Qwen3-30B-A3B, with 30 billion parameters, also shows strong performance, scoring 91.0 on ArenaHard, indicating that smaller models can still achieve competitive results [6][20]
- The models were trained on approximately 36 trillion tokens, nearly double the data used for the previous Qwen2.5 model, enhancing their capabilities across domains [17][18]

Model Architecture and Features
- Qwen3 employs a mixture-of-experts (MoE) architecture, activating only about 10% of its parameters during inference, which significantly reduces computational costs while maintaining high performance [20][24]
- The series includes six dense models ranging from 0.6 billion to 32 billion parameters, catering to different user needs and computational resources [5][6]
- The models support 119 languages and dialects, broadening their applicability in global contexts [12][25]

User Experience and Accessibility
- Qwen3 is open-sourced under the Apache 2.0 license, making it accessible to developers and researchers [7][24]
- Users can switch between reasoning modes via a dedicated button on the Qwen Chat website or through commands in local deployments [10][14]
- The model has received positive user feedback for its quick response times and deep reasoning capabilities, with notable comparisons to other models like Llama [25][28]

Future Developments
- The Qwen team plans to focus on training models capable of long-term reasoning and executing real-world tasks, indicating a commitment to advancing AI capabilities [32]
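The per-turn mode switch in local deployments works by appending a soft-switch tag to the user's message. As a minimal sketch, assuming the "/think" and "/no_think" tags from Qwen3's public documentation (the helper function itself is illustrative):

```python
# Sketch of Qwen3's per-turn reasoning toggle for local deployments.
# The "/think" / "/no_think" soft-switch tags follow Qwen3's published
# usage notes; the helper below is an illustrative wrapper, not an API.

def tag_turn(prompt: str, think: bool) -> str:
    """Append the soft-switch tag selecting deep-reasoning or quick-reply mode."""
    return f"{prompt} /think" if think else f"{prompt} /no_think"

# Complex query: request the deep-reasoning path.
hard = tag_turn("Integrate x * exp(x) from 0 to 1.", think=True)
# Simple query: skip the thinking phase for a faster reply.
easy = tag_turn("What is the capital of France?", think=False)

print(hard.endswith("/think"))     # True
print(easy.endswith("/no_think"))  # True
```

Because the switch rides along with each user turn, a multi-turn conversation can mix deep and quick responses without reloading or reconfiguring the model.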
[Full Ascend series supports Qwen3] April 29 — According to the Huawei Computing official WeChat account, Qwen3 was released and open-sourced on April 29, 2025. Ascend's MindSpeed and MindIE had been providing synchronized support for the Qwen series all along; when the Qwen3 series was released and open-sourced, it worked out of the box in MindSpeed and MindIE, achieving 0-day adaptation for Qwen3.
news flash· 2025-04-29 06:27
Core Insights
- Huawei's Ascend series fully supports the Qwen3 model, which was released and open-sourced on April 29, 2025 [1]
- Ascend's MindSpeed and MindIE have consistently supported the Qwen series, ensuring immediate compatibility with Qwen3 upon its release [1]
Tongyi App fully launches Qwen3
news flash· 2025-04-29 03:13
Core Insights
- The article highlights the launch of Alibaba's new-generation open-source model Qwen3, available on the Tongyi App and website, enhancing the user experience with advanced AI capabilities [1]

Company Developments
- The Tongyi App and Tongyi website (tongyi.com) have fully launched the Qwen3 model, which is described as the world's strongest open-source model [1]
- On both platforms, users can access the dedicated intelligent agent "Qwen Large Model" and experience its top-tier intelligent capabilities [1]
Alibaba tops the global open-source model rankings!
Zheng Quan Shi Bao· 2025-04-29 02:41
Core Insights
- Alibaba has released the highly anticipated Qwen3 model, which has outperformed top global models on various benchmark tests, establishing itself as a leading open-source model [1][2][3]

Model Performance
- Qwen3 scored 81.5 on the AIME25 assessment, setting a new open-source record, and scored over 70 on the LiveCodeBench test, surpassing Grok3 [1][2]
- In the ArenaHard evaluation, Qwen3 scored 95.6, outperforming OpenAI-o1 and DeepSeek-R1 [1][2]

Model Architecture
- Qwen3 uses a mixture-of-experts architecture with 235 billion total parameters, activating only 22 billion, significantly enhancing its reasoning, instruction-following, tool-use, and multilingual abilities [2][3]

Key Features
- The model integrates "fast thinking" and "slow thinking," allowing seamless transitions between simple and complex tasks and optimizing computational efficiency [3][4]
- Qwen3 offers eight model sizes, including two mixture-of-experts models (30B and 235B) and six dense models (ranging from 0.6B to 32B), catering to various applications and balancing performance with cost [3][4]

Cost Efficiency
- Deployment costs for Qwen3 are significantly lower than competitors', with the flagship model requiring only three H20 units (approximately 360,000 yuan) for deployment, which is 25%-35% of the cost of similar models [5][6]

Open Source and Accessibility
- Qwen3 is open-sourced under the Apache 2.0 license and supports over 119 languages, making it accessible to developers and researchers worldwide [6][7]
- The model is available on platforms such as the ModelScope community, Hugging Face, and GitHub, and individual users can try it through the Tongyi app [6][7]

Industry Impact
- The release of Qwen3 is expected to significantly advance research and development on large foundation models, sharpening the AI industry's focus on intelligent applications [6][7]
- Alibaba has established itself as a leader in the open-source AI ecosystem, with over 200 models released and more than 300 million downloads globally, surpassing Meta's Llama [7]
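The cost-efficiency claims above follow directly from the activation figures: of 235 billion total parameters, only 22 billion fire per token, so per-token compute scales with the active fraction rather than the full model size. A quick back-of-the-envelope check:

```python
# Back-of-the-envelope arithmetic behind the MoE efficiency claim above:
# per-token compute in an MoE model is roughly proportional to the number
# of parameters activated per token, not the total parameter count.

TOTAL_PARAMS_B = 235   # Qwen3 flagship total parameters, in billions
ACTIVE_PARAMS_B = 22   # parameters activated per token, in billions

active_fraction = ACTIVE_PARAMS_B / TOTAL_PARAMS_B
print(f"{active_fraction:.1%}")  # 9.4%
```

This is why a 235B-parameter model can be served on a handful of GPUs: each token only pays for roughly a tenth of the network.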
AI cash burn accelerates while open-source models are hard to monetize: Meta seeks funding from Amazon and Microsoft
Hua Er Jie Jian Wen· 2025-04-18 13:48
Core Insights
- Meta is seeking external funding to support the development of its flagship language model, Llama, due to increasing financial pressure [1][2]
- The company has proposed various collaboration options to potential investors, including allowing them to participate in future development decisions for Llama [1]
- Meta's primary challenge lies in Llama's open-source nature, which complicates its commercialization efforts [2]

Group 1: Funding and Partnerships
- Meta has approached several tech companies, including Microsoft and Amazon, for financial support to share Llama's training costs [1]
- The initiative, referred to as the "Llama Alliance," has not seen significant market enthusiasm since its inception [1]
- Discussions have also included companies like Databricks, IBM, Oracle, and a representative of a Middle Eastern investor [1]

Group 2: Commercialization Challenges
- Meta is working on an internal project called "Llama X" aimed at developing APIs for enterprise applications [2]
- Llama's open-source nature allows free access to anyone, making it difficult for Meta to monetize the model effectively [2]
- Companies approached by Meta are cautious about investing in a model that will ultimately be available for free [2]

Group 3: Financial Outlook
- Meta plans to spend $60 billion to $65 billion on capital expenditures this year, a 60% increase over 2024, primarily on AI data centers [3]
- This expenditure represents about one-third of Meta's expected revenue for the year [3]
- Despite holding $49 billion in cash and generating $91 billion in cash flow last year, Meta may struggle to balance AI investment with shareholder expectations for buybacks and dividends [3]