Open-source Models
Jensen Huang on US-China AI competition: China's DeepSeek and Qwen are the best among open-source models
news flash· 2025-05-30 11:47
On May 29, NVIDIA CEO Jensen Huang said on the company's earnings call that DeepSeek and Qwen (Alibaba's Tongyi Qianwen), both from China, are among the best open-source AI models. Since being released for free, they have drawn enormous attention in the US, Europe, and elsewhere. Ultimately, the platform that wins AI developers will win AI; export restrictions should strengthen US platforms rather than push half of the world's AI talent toward competitors. (全天候科技) ...
US court halts most of Trump's import tariffs; Tesla shareholders get their wish: Musk leaves DOGE | Moves at the ten-billion-dollar companies
晚点LatePost· 2025-05-30 11:08
Meanwhile, Musk announced that deliveries of the autonomous-driving Model Y will begin in June. Musk posted yesterday that Tesla has spent the past few days testing autonomous Model Y vehicles on public streets in Austin, Texas, "without a single incident." He said the program will start a month ahead of the original plan, with the first autonomous factory-to-customer delivery expected in June. Autonomous delivery means Tesla uses Full Self-Driving (FSD) technology to have vehicles complete the trip from factory to customer on their own, an important step toward deploying its autonomous-driving technology at scale.

A US court has halted most of Trump's import tariffs. In the US, Congress legislates the boundaries of presidential power, and the courts can rule on whether a president has abused it. Trump bypassed Congress to impose a 10% "baseline tariff" and higher "reciprocal tariffs" by invoking the International Emergency Economic Powers Act, enacted back in 1977, a law mainly concerned with trade embargoes and economic sanctions; no president before Trump had used it to change tariffs. Now a three-judge panel has ruled that Trump exceeded his authority and ordered the executive branch to withdraw the relevant tariffs within 10 days. Tariffs imposed under other statutes, such as the auto tariffs, are unaffected. In the ruling, the judges held that under any line of analysis, "any interpretation of the International Emergency Economic Powers Act that grants the president unlimited tariff power is unconstitutional." Legal experts say the ruling also means the US government will need to refund tariffs already collected. The Trump administration, for its part, ...
1.2 billion model downloads, yet the core team is nearly falling apart: uneven compute allocation and profits crushing innovation?
猿大侠· 2025-05-30 03:59
Core Viewpoint
- Meta is restructuring its AI team to enhance product development speed and flexibility, dividing it into two main teams: AI Products and AGI Foundations [2][3]

Group 1: Organizational Changes
- The AI Products team will focus on consumer-facing applications like Facebook, Instagram, and WhatsApp, as well as a new independent AI application [2]
- The AGI Foundations department will work on broader technologies, including improvements to the Llama model [3]
- The restructuring aims to grant teams more autonomy while minimizing inter-team dependencies [3]

Group 2: Competitive Landscape
- Meta is striving to keep pace with competitors like OpenAI and Google, launching initiatives like "Llama for Startups" to encourage early-stage companies to utilize its generative AI products [3]
- Despite initial success, Meta's reputation in the open-source AI field has declined, with significant talent loss from its foundational AI research team, FAIR [4][7]

Group 3: Talent and Leadership Issues
- A significant number of key researchers from the Llama project have left Meta, raising concerns about the company's ability to retain top AI talent [7][23]
- The departure of Joelle Pineau, a long-time leader at FAIR, has highlighted internal issues regarding performance and leadership [8][13]

Group 4: Financial Commitment and Future Plans
- Meta plans to invest approximately $65 billion in AI projects by 2025, with the aim of enhancing its AI capabilities [22]
- The company is expanding its data center capacity, including a new 2GW facility, to support its AI initiatives [22]
Meta CEO x Microsoft CEO dialogue, decoded: why has the "distillation factory" become the source of open source's appeal?
机器之心· 2025-05-23 15:30
Group 1
- The core discussion at LlamaCon 2025 focused on the transformative impact of AI on the boundaries between documents, applications, and websites, as articulated by Satya Nadella [5][6]
- Nadella emphasized that modern AI acts as a "universal converter," understanding user intent and enabling a shift from "tool-oriented computing" to "intent-oriented computing," enhancing user experience [6][7]
- Nadella identified the current AI wave as a significant technological platform shift, necessitating a complete overhaul of the technology stack to optimize for AI workloads [7]

Group 2
- Nadella noted that approximately 20% to 30% of Microsoft's internal code is now generated by AI, indicating a broad application of AI in software development beyond mere code completion [7][8]
- Zuckerberg projected that by 2026, half of Meta's development work will be completed by AI, showcasing the growing reliance on AI in the tech industry [8]
- The dialogue also highlighted the strategic value of both open-source and closed-source models, with Nadella advocating for a flexible approach that supports both [9][10]

Group 3
- The concept of "distillation factories" was introduced as a key area for future development in the AI ecosystem, with both CEOs agreeing on the importance of infrastructure and toolchains for model distillation [10][11]
- Nadella pointed out the trend towards multi-model applications and the necessity of standardized protocols for seamless collaboration among various AI models [10]
- Zuckerberg acknowledged Microsoft's unique advantages in supporting multi-model collaboration infrastructure, reinforcing the significance of the "distillation factory" concept [10]
The open-source models I'm using now as an AI product manager
36Kr· 2025-05-14 08:34
Core Insights
- The article discusses the importance of AI model selection for product managers, emphasizing the need for private deployment to ensure data security and customization [1][2]
- It highlights the varying hardware requirements for different AI models, with specific mention of DeepSeek needing up to 700GB of GPU memory [1]
- The article also addresses the regulatory challenges in deploying AI models in China, necessitating the use of domestic models [2]

Model Selection and Rankings
- A recommendation is made to refer to the LLM rankings for selecting appropriate models based on specific needs [3]
- The article provides a link to Hugging Face for downloading open-source models, indicating a resource for model acquisition (see the sketch after this list) [5]

Model Performance and Usage
- The article lists various AI models suitable for different applications, including DeepSeek and Alibaba's Qwen3.0, noting their capabilities and hardware requirements [10][11]
- It mentions that DeepSeek V3 is optimized for faster output, while R1 is better for deep reasoning tasks [11]
- Other domestic models are also discussed, with a focus on their applicability in specific industries like healthcare and finance [12]

Open-source Models for Different Platforms
- The article outlines several open-source models suitable for mobile deployment, such as Microsoft's BitNet b1.58, which is designed for low-resource environments [13]
- It also mentions international models like Llama 4, which supports multi-modal data integration [14]

Model Mechanisms and Integration
- Different models are categorized based on their functionalities, such as text generation, image generation, and speech generation, highlighting the need for multiple models to work together in complex applications [20]
- The article emphasizes the increasing complexity and learning curve for AI product managers in understanding and integrating these models [20]
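Since the article points readers to Hugging Face for obtaining open-source weights and stresses private deployment, here is a minimal sketch of that workflow. The model ID, local path, and prompt are illustrative placeholders rather than recommendations from the article, and `device_map="auto"` assumes the `accelerate` package is installed.

```python
# Minimal sketch: pull an open-source checkpoint from Hugging Face and run it
# locally, so inference never leaves the private network.
from huggingface_hub import snapshot_download
from transformers import AutoModelForCausalLM, AutoTokenizer

# Download the full model snapshot to a local directory (one-time step).
local_dir = snapshot_download(
    repo_id="Qwen/Qwen2.5-7B-Instruct",          # placeholder model choice
    local_dir="./models/qwen2.5-7b-instruct",
)

# Load from the local copy only; no network access is needed from here on.
tokenizer = AutoTokenizer.from_pretrained(local_dir)
model = AutoModelForCausalLM.from_pretrained(
    local_dir,
    device_map="auto",    # spread layers across available GPUs
    torch_dtype="auto",   # use the checkpoint's native precision
)

prompt = "Summarize the key constraints for deploying LLMs on-premises."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=200)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

Once the snapshot is on disk, the same directory can be served behind an internal API (for example with vLLM), so user data stays inside the deployment boundary.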
Meta and Microsoft chiefs in a summit dialogue: how will large models change the world?
36Kr· 2025-05-07 02:32
Core Insights
- The competition in large models is intensifying, with significant developments from major tech companies like Alibaba and Meta [1]
- Meta's Llama 4 series and the launch of the Meta AI App are pivotal in the ongoing AI landscape [1][4]
- The dialogue between Mark Zuckerberg and Satya Nadella highlights the transformative potential of AI in application development and productivity [3][4]

Group 1: AI Development and Impact
- Nadella emphasizes that we are entering a phase of "deep applications," where AI will significantly enhance productivity across various sectors [8][29]
- By 2026, it is projected that half of application development tasks will be completed by AI, indicating a major shift in the engineering landscape [4][21]
- The integration of AI into workflows is expected to accelerate productivity, with examples from Microsoft's GitHub Copilot showcasing its evolving capabilities [15][16]

Group 2: Open Source and Interoperability
- Nadella discusses the importance of interoperability between open-source and closed-source models, suggesting that both are necessary for meeting customer demands [11][12]
- The open-source ecosystem is seen as crucial for enabling developers to create proprietary models while benefiting from community-driven advancements [11][12]
- The ability to distill large models into smaller, more efficient versions is highlighted as a key advantage of open-source models [32][34]

Group 3: Future of AI and Infrastructure
- The concept of a "distillation factory" is introduced, where large models can be transformed into smaller, more accessible versions for broader use (a minimal sketch of the distillation step follows this list) [32][35]
- Nadella points out that the infrastructure for AI must evolve to support the growing demand for diverse model applications, including smaller models suitable for personal devices [36][37]
- The collaboration between companies like Meta and Microsoft is expected to drive innovation in AI tools and infrastructure, enhancing the overall developer experience [12][36]
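The "distillation factory" idea is discussed at a high level in the conversation; the sketch below illustrates the mechanical core that such a factory would automate: a small student model trained to match the output distribution of a large teacher. The temperature, blend weight `alpha`, and function name are my own illustrative choices, not details from the dialogue.

```python
# Minimal sketch of classic logit distillation for language models.
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels,
                      temperature=2.0, alpha=0.5):
    """Blend a soft KL term (match the teacher) with the usual hard-label loss."""
    # Soften both distributions with the temperature before comparing them.
    soft_teacher = F.softmax(teacher_logits / temperature, dim=-1)
    soft_student = F.log_softmax(student_logits / temperature, dim=-1)
    kl = F.kl_div(soft_student, soft_teacher, reduction="batchmean")
    kl = kl * (temperature ** 2)  # standard rescaling for the temperature

    # Ordinary cross-entropy against the ground-truth next tokens.
    ce = F.cross_entropy(student_logits.view(-1, student_logits.size(-1)),
                         labels.view(-1))
    return alpha * kl + (1 - alpha) * ce
```

The connection to open source is direct: computing `teacher_logits` requires running the large model yourself, which open weights permit and closed APIs generally do not.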
Everything about the model made public, and better than DeepSeek-R1: NVIDIA open-sources the Llama-Nemotron family
机器之心· 2025-05-06 08:04
Core Viewpoint
- The rapid development of large models has made reasoning ability a key indicator of model intelligence, with inference efficiency becoming a critical limiting factor for model deployment and performance [2][3].

Group 1: Model Overview
- NVIDIA has launched the Llama-Nemotron series, an open family of large models designed for efficient reasoning, featuring excellent inference capabilities and an enterprise-friendly open license [3][5].
- The series includes three model sizes: Nano (8B), Super (49B), and Ultra (253B), along with an independent variant UltraLong (8B) that supports long context [4][5].
- The models are the first open-source models to support dynamic inference switching, allowing users to toggle between standard chat mode and reasoning mode, enhancing interaction flexibility [6].

Group 2: Model Training and Optimization
- The Llama-Nemotron models utilize a multi-stage post-training process to enhance performance on reasoning and non-reasoning tasks, employing supervised fine-tuning and reinforcement learning techniques [9].
- The Puzzle framework is used for efficient reasoning optimization, transforming large language models into hardware-efficient variants while maintaining performance [12][15].
- LN-Super and LN-Ultra models achieve significant throughput improvements, with LN-Super showing a 5x increase in inference throughput compared to Llama 3.3-70B-Instruct [19].

Group 3: Performance Metrics
- LN-Ultra demonstrates superior performance in key benchmarks, achieving scores such as 88.1 in MMLU and 80.4 in MATH500, surpassing its predecessors [25][24].
- The models are designed to meet specific deployment constraints, such as supporting up to 3 million cached tokens in FP8 precision for LN-Ultra [21].

Group 4: Reinforcement Learning and Instruction Following
- The models incorporate a "detailed thinking on/off" instruction mechanism to enhance flexibility in reasoning depth and response style, improving user interaction (a usage sketch follows this list) [27].
- LN-Ultra's performance is further enhanced through large-scale reinforcement learning, allowing it to exceed the capabilities of its teacher model [31][39].
- The training process for LN-Ultra involved approximately 140,000 H100 GPU hours, focusing on optimizing reasoning capabilities and instruction-following abilities [32][41].
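Because the series exposes its dynamic reasoning switch through the "detailed thinking on/off" system instruction described above, the sketch below shows how a client might drive that toggle against an OpenAI-compatible serving endpoint. The endpoint URL, model identifier, and sampling settings are assumptions for illustration; the model card is the authoritative reference for exact usage.

```python
# Minimal sketch: toggling Llama-Nemotron's reasoning mode via the system prompt.
from openai import OpenAI

# Assumed local OpenAI-compatible server (e.g., vLLM); not from the article.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

def ask(question: str, reasoning: bool) -> str:
    # The reasoning depth is controlled purely by the system instruction.
    system = "detailed thinking on" if reasoning else "detailed thinking off"
    resp = client.chat.completions.create(
        model="nvidia/Llama-3_1-Nemotron-Ultra-253B-v1",  # assumed model ID
        messages=[
            {"role": "system", "content": system},
            {"role": "user", "content": question},
        ],
        temperature=0.6 if reasoning else 0.0,  # illustrative settings
    )
    return resp.choices[0].message.content

print(ask("How many primes are there below 100?", reasoning=True))
print(ask("What is the capital of France?", reasoning=False))
```

Because the toggle lives entirely in the system message, the same pattern should carry over to any OpenAI-compatible serving stack.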
Internet giants rushed to open-source new models before May Day; with differing strategies, who will stay at the table?
Nan Fang Du Shi Bao· 2025-05-01 14:12
Core Insights
- Major domestic AI model companies are rapidly open-sourcing their models ahead of the May Day holiday, with Alibaba releasing Qwen3, Xiaomi launching Xiaomi MiMo, and DeepSeek introducing DeepSeek-Prover-V2 [1][2][5]

Alibaba
- Alibaba's Qwen3 features two MoE models with 30B and 235B parameters, and six dense models ranging from 0.6B to 32B, achieving state-of-the-art performance in its category [2]
- Qwen3 is the first "hybrid reasoning model" in China, integrating fast and deep thinking capabilities, significantly reducing computational power consumption (see the sketch after this list) [5]
- Alibaba has consistently open-sourced various models this year, including the 14B video generation model and the 7B multimodal model, aiming to leverage open-source models for AI applications while monetizing its cloud services [6]

Xiaomi
- Xiaomi's MiMo model, with only 7B parameters, outperformed OpenAI's closed-source model o1-mini in public benchmarks for mathematical reasoning and coding competitions [6]
- This marks Xiaomi's first foray into open-sourcing its models, developed by its newly established Core team [6]

DeepSeek
- DeepSeek has released two versions of DeepSeek-Prover-V2, focusing on mathematical theorem proving and achieving significant performance improvements in benchmark tests [8]
- The new models support extensive context inputs and are based on previous versions, showcasing a commitment to enhancing reasoning capabilities [8]

Industry Trends
- The open-sourcing of models by these companies is seen as a strategic move to enhance competitiveness against closed-source models from companies like OpenAI and Anthropic, which still hold a slight performance edge [9][10]
- Industry experts predict a consolidation in the AI model sector, with DeepSeek, Alibaba, and ByteDance emerging as the leading players in China, while the U.S. market remains competitive with companies like xAI and OpenAI [10][11]
- The open-source models are expected to democratize AI technology, making it more accessible and promoting innovation across various industries [9][10]
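Qwen3's "hybrid reasoning" merges fast replies and deep thinking in a single checkpoint; the sketch below shows how that switch is typically exercised through the chat template's `enable_thinking` flag, following Qwen's published release usage. The checkpoint choice and generation settings are illustrative, and the flag name should be verified against the model card for the transformers version in use.

```python
# Minimal sketch: switching Qwen3 between deep-thinking and fast-reply modes.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Qwen/Qwen3-8B"  # one of the open-sourced dense sizes (illustrative)
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto",
                                             torch_dtype="auto")

def generate(prompt: str, thinking: bool) -> str:
    # The chat template renders differently depending on enable_thinking.
    text = tokenizer.apply_chat_template(
        [{"role": "user", "content": prompt}],
        tokenize=False,
        add_generation_prompt=True,
        enable_thinking=thinking,   # True = deep reasoning, False = fast reply
    )
    inputs = tokenizer(text, return_tensors="pt").to(model.device)
    out = model.generate(**inputs, max_new_tokens=512)
    # Strip the prompt tokens and return only the newly generated text.
    return tokenizer.decode(out[0][inputs.input_ids.shape[1]:],
                            skip_special_tokens=True)

print(generate("Prove that the sum of two even numbers is even.", thinking=True))
print(generate("Say hello in French.", thinking=False))
```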
A look at the current state of data center investment
傅里叶的猫· 2025-04-30 12:37
We have recently put a lot of effort into H200/B200 data center servers. Suffice it to say there are plenty of pitfalls and deep tricks of the trade, but good things take time, and the recent payoff has made us feel the work was worth it. In this article we take a brief look at the current state of data center investment, drawing on TD Cowen reports, articles from The Information and Bloomberg, and interviews with several industry experts, to see how the big overseas players view IDC; we will later devote a separate piece to the state of domestic IDC investment.

Microsoft's data center investment is slowing. As many have seen in the news, Microsoft is undergoing a marked slowdown or adjustment in its data center investment demand. Since last year it has walked away from more than 1GW of data center deals and terminated some land contracts. It has slowed its international expansion and paused or postponed multiple projects at home and abroad, including in the US (Atlanta, Wisconsin phase 2, San Antonio, Kansas City, Cedar Rapids) as well as Europe, India, the UK, and Australia, reducing planned leasing demand by nearly 1.98GW (originally to be completed over 4 years, about 500MW per year).

The adjustment has several causes:
1. Digesting capacity: absorbing the large volume of capacity already leased in 2024 to avoid overbuilding.
2. Construction complexity: hyperscale data centers are inherently complex to design and build, causing unavoidable delays.
3. OpenAI's strategic shift: OpenAI no longer relies entirely on Microsoft, turning to third parties such as Oracle and CoreWeave and pushing hard on self-built capacity, which has led Microsoft to ...
Zuckerberg's latest interview: AI will spark a massive revolution in knowledge work and programming
Sou Hu Cai Jing· 2025-04-30 10:02
Core Insights
- Meta's CEO Mark Zuckerberg discussed the competitive landscape of AI development, particularly comparing the Llama 4 model with DeepSeek, asserting that Llama 4 offers higher efficiency and broader functionality despite DeepSeek's advancements in specific areas [1][36].
- Meta AI has reached nearly 1 billion monthly users, indicating significant growth and the importance of personalized AI interactions [2][21].
- The company is focusing on developing coding agents that will automate much of the coding process within the next 12 to 18 months, which is expected to increase the demand for human jobs rather than decrease it [1][16].

Model Development
- The Llama 4 series includes models like Scout and Maverick, which are designed for efficiency and low latency, supporting multi-modal capabilities [4][41].
- The upcoming Behemoth model will exceed 2 trillion parameters, representing a significant leap in model size and capability [4].
- Meta is committed to open-sourcing its models after internal use, allowing others to benefit from their developments [4][41].

Competitive Landscape
- Zuckerberg believes that open-source models are likely to surpass closed-source models in popularity, reflecting a trend towards more accessible AI technologies [5][36].
- The company acknowledges the impressive infrastructure and text processing capabilities of DeepSeek but emphasizes that Llama 4's multi-modal abilities give it a competitive edge [35][36].
- The licensing model for Llama is designed to facilitate collaboration with large companies while ensuring that Meta retains some control over its intellectual property [37][39].

User Interaction and Experience
- Meta is exploring how AI can enhance user interactions, particularly through natural dialogue and personalized experiences [14][28].
- The integration of AI into existing applications like WhatsApp is crucial for user engagement, especially in markets outside the U.S. [21].
- The company is focused on creating AI that can assist users in complex social interactions, enhancing the overall user experience [27][28].

Future Directions
- Zuckerberg envisions a future where AI seamlessly integrates into daily life, potentially through devices like smart glasses that facilitate constant interaction with AI [14][31].
- The development of AI will not only focus on productivity but also on entertainment and social engagement, reflecting the diverse applications of AI technology [25][26].
- The company is aware of the challenges in ensuring that AI interactions remain healthy and beneficial for users, emphasizing the importance of understanding user behavior [26][27].