Chip upstarts pivot en masse
半导体芯闻· 2025-05-12 10:08
Core Viewpoint
- The AI chip market is shifting focus from training to inference, as companies find it increasingly difficult to compete in the training space dominated by Nvidia and others [1][20]

Group 1: Market Dynamics
- Nvidia continues to lead the training chip market, while companies like Graphcore, Intel Gaudi, and SambaNova are pivoting towards the more accessible inference market [1][20]
- The training market requires significant capital and resources, making it challenging for new entrants to survive [1][20]
- The shift towards inference is seen as a strategic move to find more scalable and practical applications in AI [1][20]

Group 2: Graphcore's Transition
- Graphcore, once a strong competitor to Nvidia, is now focusing on inference as a means of survival after facing challenges in the training market [6][4]
- The company has optimized its Poplar SDK for efficient inference tasks and is targeting sectors like finance and healthcare [6][4]
- Graphcore's previous partnerships, such as with Microsoft, have ended, prompting a need to adapt to the changing market landscape [6][5]

Group 3: Intel Gaudi's Strategy
- Intel's Gaudi series, initially aimed at training, is now being integrated into a new AI acceleration product line that emphasizes both training and inference [10][11]
- Gaudi 3 is marketed for its cost-effectiveness and performance in inference tasks, particularly for large language models [10][11]
- Intel is merging its Habana and GPU departments to streamline its AI chip strategy, indicating a shift in focus towards inference [10][11]

Group 4: Groq's Focus on Inference
- Groq, originally targeting the training market, has pivoted to provide inference-as-a-service, emphasizing low latency and high throughput [15][12]
- The company has developed an AI inference engine platform that integrates with existing AI ecosystems, aiming to attract industries sensitive to latency [15][12]
- Groq's transition highlights the growing importance of speed and efficiency in the inference market [15][12]

Group 5: SambaNova's Shift
- SambaNova has transitioned from a focus on training to offering inference-as-a-service, allowing users to access AI capabilities without complex hardware [19][16]
- The company is targeting sectors with strict compliance needs, such as government and finance, providing tailored AI solutions [19][16]
- This strategic pivot reflects the broader trend of AI chip companies adapting to market demands for efficient inference solutions [19][16]

Group 6: Inference Market Characteristics
- Inference tasks are less resource-intensive than training, allowing companies with limited capabilities to compete effectively [21][20]
- The shift to inference is characterized by a focus on cost, deployment, and maintainability, moving away from the previous emphasis on raw computational power [23][20]
- The competitive landscape is evolving, with smaller teams and startups finding opportunities in the inference space [23][20]
Zhitong Decision Reference | Hang Seng Index advances steadily; watch robotics and rare-earth themes
Zhi Tong Cai Jing· 2025-05-12 00:51
Group 1: Market Overview
- The recent meetings have played a crucial role in stabilizing the Hong Kong stock market, with the Hang Seng Index continuing to progress steadily [1]
- There are positive developments regarding ceasefire announcements between India and Pakistan, as well as potential progress in Russia-Ukraine negotiations, which may benefit market sentiment [1]
- The key focus is on the US-China talks, which lasted for 8 hours on May 10, indicating a shift towards resolving differences, with constructive progress expected [1]

Group 2: Company Performance
- For 2024, GDS Holdings Limited (万国数据-SW) is projected to achieve revenue of 10.322 billion yuan, a year-on-year increase of 5.5%, and an adjusted EBITDA of 4.876 billion yuan, up 3% [3]
- The company's domestic operational area reached 613,583 square meters by the end of Q4 2024, reflecting 12% year-on-year growth, with a cabinet utilization rate of 73.8% [3]
- GDS's international business, DayOne, has signed contracts totaling 467 MW, with an operational scale of 121 MW, generating revenue of 1.73 million USD and adjusted EBITDA of 0.45 million USD in 2024 [4]

Group 3: Industry Insights
- Chinese construction companies are increasingly competitive in the international market, with several state-owned enterprises ranking among the top 10 in the ENR "Global Top 250 International Contractors" for 2024 [5]
- The demand for construction projects along the Belt and Road Initiative is strong, with significant projects like the Jakarta-Bandung High-Speed Railway and China-Europe Railway Express enhancing infrastructure in participating countries [6]
- The international engineering business is experiencing better conditions than the domestic market, with a notable increase in new contracts signed overseas by major Chinese construction firms [7]
Chip upstarts pivot en masse
半导体行业观察· 2025-05-10 02:53
In the sweeping arena of AI chips, large-scale training, once hailed as the technology's "holy grail", is quietly giving way to the lower-profile but more realistic inference market.

Nvidia still dominates the training-chip market by a wide margin, while Cerebras continues its all-in bet on ultra-large-scale compute platforms. But the other players that once fought fiercely over training chips, Graphcore, Intel Gaudi, SambaNova, and others, are quietly turning to a different battlefield: AI inference.

This trend is no accident.

AI training is a capital-, compute-, and software-ecosystem-intensive industry. Nvidia's CUDA toolchain, mature GPU ecosystem, and broad framework compatibility give it almost complete control of the training-chip conversation. Cerebras has taken a different route with its wafer-scale training platform, but that platform remains confined to research institutions and a handful of commercial deployments.

In this landscape, new chip companies have almost no room to survive in the training market. "The training-chip market is not an arena for most players," one AI-infrastructure founder admits. "Just landing a single large-model training order means burning tens of millions of dollars, and you may not even win."

Precisely for this reason, the startups that once took Nvidia head-on in training chips have begun looking for application paths that are easier to enter and scale. Inference chips became the best option.

Gr ...
In the AI inference era, edge cloud is no longer "on the edge"
Core Insights
- The rise of edge cloud technology is revolutionizing data processing by shifting capabilities closer to the network edge, enhancing real-time data response and processing, particularly in the context of AI inference [1][5]
- The demand for AI inference is significantly higher than for training, with estimates suggesting that inference computing needs could be 10 times greater than training needs [1][3]
- Companies are increasingly focusing on the post-training phase and deployment issues, as edge cloud solutions improve the efficiency and security of AI inference [1][5]

Group 1: AI Inference Demand
- AI inference is expected to account for over 70% of total computing demand for general artificial intelligence, potentially reaching 4.5 times the demand for training [3]
- The founder of NVIDIA predicts that the computational requirements for inference will exceed previous estimates by 100 times [3]
- The transition from pre-training to inference is becoming evident, with industry predictions indicating that future investments in AI inference will surpass those in training by 10 times [4][6]

Group 2: Edge Cloud Advantages
- Edge cloud environments provide significant advantages for AI inference due to their proximity to end-users, which enhances response speed and efficiency [5][6]
- The geographical distribution of edge cloud nodes reduces data transmission costs and improves user experience by shortening interaction chains [5]
- Edge cloud solutions support business continuity and offer additional capabilities such as edge caching and security protection, enhancing the deployment and application of AI models [5][6]

Group 3: Cost and Performance Metrics
- Future market competition will hinge on cost/performance calculations, including inference costs, latency, and throughput [6]
- Running AI applications closer to users improves user experience and operational efficiency, addressing concerns about data sovereignty and high data transmission costs [6]
- The shift in investment focus within the AI sector is moving towards inference capabilities rather than solely on training [6]
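The cost/performance framing above can be made concrete with a back-of-the-envelope calculation. The sketch below is illustrative only; the hourly rental price and sustained throughput are assumed figures, not numbers from the article:

```python
def cost_per_million_tokens(gpu_hourly_usd: float, tokens_per_second: float) -> float:
    """Dollars to generate one million tokens on one accelerator,
    given its rental price and sustained decode throughput."""
    seconds_needed = 1_000_000 / tokens_per_second
    return gpu_hourly_usd * seconds_needed / 3600.0

# Assumed figures: a $3.00/hr GPU sustaining 500 tokens/s.
print(round(cost_per_million_tokens(3.00, 500), 2))  # → 1.67
```

Latency and data-transfer costs would enter the same comparison; the point is that inference economics reduce to a small set of measurable ratios.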
SambaNova lays off staff, abandons training chips
半导体行业观察· 2025-05-06 00:57
Source: compiled from zach; thanks to the author.

In late April, SambaNova Systems, one of the best-funded AI chip startups, sharply deviated from its original goal. Like many other AI chip startups, SambaNova initially hoped to offer a unified architecture for both training and inference. But starting this year, the company abandoned its training ambitions, laid off 15% of its staff, and put all of its effort into AI inference. And it is not the first company to make this shift.

In 2017, Groq was still touting its training performance, but by 2022 it had focused entirely on inference benchmarks. The Cerebras CS-1 was initially aimed mainly at training workloads, but the CS-2 and later versions shifted the emphasis to inference. SambaNova seemed to be the last first-generation AI chip startup still seriously focused on training, but that has finally changed. So why have all of these startups moved from training to inference? Fortunately, as a former SambaNova employee (the author, zach, says he worked at SambaNova Systems from 2019 to 2021), I (the author, zach, throughout) have some insider insight.

SambaNova took training models on its hardware very seriously. They released ...
Over the past four weeks AI inference has exploded, GPUs are burning, and Nvidia is still in short supply
硬AI· 2025-04-29 00:18
According to a report released on the 25th by Morgan Stanley's Joseph Moore team, the main driver of this strong demand is growth in token generation, which has risen more than fivefold since the start of the year. That growth is putting enormous pressure on the ecosystem and driving a surge in investment to handle these workloads.

Morgan Stanley notes that, thanks to large language models' huge demand for inference chips, Nvidia faces a GPU shortage. But citing ongoing supply constraints, gross-margin pressure, and other negatives, the bank slightly lowered its Nvidia price target to $160. Over the long term, the company's growth trajectory remains strong.

Author | Zhang Yaqi
Editor | Hard AI

Over the past four weeks, investor sentiment has deteriorated on macroeconomic and supply-chain risks, yet demand for Nvidia GPUs has soared on the back of major large language models' (LLMs') enormous need for inference chips, and that demand spans every region.

Several AI companies report explosive user growth. Data from API companies such as OpenRouter show that many firms, scrambling to meet massive demand for inference software, are fighting over GPU resources; there were even reports of "the last GB200", with only a single unit left for 2025.

Morgan Stanley believes this demand for inference is the key point. It is driven by the part of the market that actually uses models and generates revenue, proving that the scaling of inference models is real, unlike growth that relies solely on venture ...
Over the past four weeks AI inference has exploded, GPUs are burning, and Nvidia is still in short supply
Hua Er Jie Jian Wen· 2025-04-27 10:38
Group 1
- Investor sentiment has deteriorated due to macroeconomic and supply chain risks, but demand for NVIDIA's GPUs has surged due to the significant need for inference chips driven by large language models (LLMs) [1]
- Token generation has increased over five times since the beginning of the year, creating immense pressure on the ecosystem and driving a surge in investment to handle these workloads [1]
- AI companies are experiencing explosive user growth, with many forced to compete for GPU resources to meet the massive demand for inference software [1]

Group 2
- Morgan Stanley has lowered its target price for NVIDIA to $160 from $162, reflecting overall valuation declines in the peer group rather than changes in the company's fundamentals [2]
- Despite strong demand, supply constraints for NVIDIA's Blackwell chips, particularly the GB200/300 models, are limiting the ability to meet the explosive growth in demand [2][4]
- Morgan Stanley has raised its revenue forecast for fiscal year 2026 by 10.7% and adjusted earnings per share up by 11.9%, indicating that these figures may still be conservative [5]
New breakthrough in memory-compression technology improves AI inference efficiency!
半导体芯闻· 2025-04-25 10:19
Source: compiled from eetimes; thanks to the authors.

ZeroPoint Technologies and Rebellions aim to develop an AI accelerator that lowers the cost and power consumption of AI inference. ZeroPoint Technologies' memory-optimization technology is said to compress data rapidly, increase data-center memory capacity, and improve AI inference performance per watt.

In April 2025, the Swedish memory-optimization IP supplier ZeroPoint Technologies (hereafter ZeroPoint) announced a strategic partnership with Rebellions to jointly develop a next-generation memory-optimized AI accelerator for AI inference. The companies plan to release a new product in 2026 and claim it is "expected to achieve unprecedented tokens/second/watt performance levels."

As part of the collaboration, the two companies will use ZeroPoint's memory compression, compaction, and memory-management technologies to increase memory bandwidth and capacity for foundation-model inference workflows. ZeroPoint CEO Klas Moreau claims the company's hardware-based memory-optimization engine is 1,000 times faster than existing software compression methods.

ZeroPoint's memory-compression IP value proposition: first, compression and decompression. Second, the data generated by compression ...
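The claimed benefit, more effective bandwidth and capacity from inline compression, follows from simple arithmetic. The sketch below is a first-order model with assumed numbers; it ignores codec latency and metadata overhead, which the article does not quantify:

```python
def effective_memory(raw_capacity_gb: float, raw_bandwidth_gbs: float,
                     compression_ratio: float) -> tuple[float, float]:
    """If model weights and activations are stored compressed and
    decompressed inline, both effective capacity and effective bandwidth
    scale by the compression ratio, to first order."""
    return (raw_capacity_gb * compression_ratio,
            raw_bandwidth_gbs * compression_ratio)

# Assumed: an 80 GB, 2000 GB/s memory system with 1.5x compression.
cap, bw = effective_memory(80, 2000, 1.5)
print(cap, bw)  # → 120.0 3000.0
```

This is why the partners frame the gain in tokens/second/watt: more bytes moved per unit of physical bandwidth translates directly into more tokens decoded per joule.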
Large-model all-in-one machines pack in this gaming card, cutting the price by an order of magnitude
量子位· 2025-04-09 08:58
Core Viewpoint
- The article discusses the rising trend of using Intel's Arc graphics cards in large model all-in-one machines, highlighting their cost-effectiveness compared to traditional NVIDIA cards, making them suitable for small to medium-sized teams [2][8][12]

Group 1: Performance Comparison
- A comparison test conducted by Feizhi Cloud showed that an all-in-one machine equipped with four Intel Arc A770 graphics cards took approximately 50 minutes to complete a large task, while a machine with NVIDIA cards took about 30 minutes [6]
- The cost of four Intel Arc graphics cards is significantly lower than that of a single NVIDIA card, making the Intel option more appealing in terms of price-performance ratio [7][8]

Group 2: Market Adoption
- The article notes that many companies are increasingly adopting Intel's combination of Arc graphics cards and Xeon W processors for their all-in-one systems, indicating a shift in the industry towards this more affordable solution [23][33]
- Companies like Chaoyun and Yunjian are developing various devices based on Intel's platform, including workstations and high-end all-in-one machines capable of running large models [28][32]

Group 3: Advantages of All-in-One Machines
- All-in-one machines offer quick deployment and ease of use, allowing businesses to integrate large models into their operations without complex setup [36]
- The low startup costs associated with all-in-one machines enable companies to run large models initially and iterate over time, reducing financial risk [37]
- These machines simplify operations and maintenance by integrating hardware and software into a unified system, thus lowering management complexity and costs [40]

Group 4: Reliability and Flexibility
- The all-in-one systems are designed for stability and reliability, ensuring consistent performance in complex environments, which is crucial for AI applications [41]
- Intel's GPU and CPU combination is adaptable to various applications, supporting a range of open-source models and providing diverse functionality for different business needs [43][44]

Group 5: Industry Impact
- The article suggests that the trend of integrating AI models into various industries is akin to the evolution from mainframe computers to personal computers, with Intel aiming to replicate its past success in the AI domain [45][46]
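The price-performance comparison in Group 1 can be sketched numerically. The task times below are the article's 50- and 30-minute figures, but the hardware prices are hypothetical placeholders; the article states only that four Arc cards cost significantly less than one NVIDIA card:

```python
def dollars_per_task_per_hour(hardware_cost_usd: float, task_minutes: float) -> float:
    """Lower is better: hardware dollars per unit of hourly throughput."""
    tasks_per_hour = 60.0 / task_minutes
    return hardware_cost_usd / tasks_per_hour

arc_rig = dollars_per_task_per_hour(1200.0, 50.0)      # four Arc A770s (assumed price)
nvidia_rig = dollars_per_task_per_hour(12000.0, 30.0)  # one high-end card (assumed price)
print(arc_rig < nvidia_rig)  # → True: slower, but far cheaper per unit of work
```

Under any price gap near an order of magnitude, the Arc setup wins on dollars per unit of throughput even while losing on raw speed, which is the article's core argument.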
AI chips: how is demand?
半导体行业观察· 2025-04-05 02:35
Core Insights
- The article discusses the emergence of GPU cloud providers outside of traditional giants like AWS, Microsoft Azure, and Google Cloud, highlighting a significant shift in AI infrastructure [1]
- Parasail, founded by Mike Henry and Tim Harris, aims to connect enterprises with GPU computing resources, likening its service to that of a utility company [2]

AI and Automation Context
- Customers are seeking simplified and scalable solutions for deploying AI models, often overwhelmed by the rapid release of new open-source models [2]
- Parasail leverages the growth of AI inference providers and on-demand GPU access, partnering with companies like CoreWeave and Lambda Labs to create a contract-free GPU capacity aggregation [2]

Cost Advantages
- Parasail claims that companies transitioning from OpenAI or Anthropic can save 15 to 30 times on costs, while savings compared to other open-source providers range from 2 to 5 times [3]
- The company offers various Nvidia GPUs, with pricing ranging from $0.65 to $3.25 per hour [3]

Deployment Network Challenges
- Building a deployment network is complex due to the varying architectures of GPU clouds, which can differ in computation, storage, and networking [5]
- Kubernetes can address many challenges, but its implementation varies across GPU clouds, complicating the orchestration process [6]

Orchestration and Resilience
- Henry emphasizes the importance of a resilient Kubernetes control plane that can manage multiple GPU clouds globally, allowing for efficient workload management [7]
- The challenge of matching and optimizing workloads is significant due to the diversity of AI models and GPU configurations [8]

Growth and Future Plans
- Parasail has seen increasing demand, with its annual recurring revenue (ARR) exceeding seven figures, and plans to expand its team, particularly in engineering roles [8]
- The company recognizes a paradox in the market where there is a perceived shortage of GPUs despite available capacity, indicating a need for better optimization and customer connection [9]
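The savings claims under "Cost Advantages" reduce to a simple ratio. The per-million-token prices below are hypothetical, chosen only so the result lands inside the article's claimed 15-30x band; they are not actual vendor pricing:

```python
def savings_multiple(incumbent_price: float, alternative_price: float) -> float:
    """How many times cheaper the alternative is for the same workload."""
    return incumbent_price / alternative_price

# Assumed prices per million tokens, not actual vendor pricing.
m = savings_multiple(incumbent_price=15.00, alternative_price=0.75)
print(m)               # → 20.0
print(15 <= m <= 30)   # → True: inside the article's claimed range
```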