AI Inference
KV Cache Acceleration of vLLM using DDN EXAScaler
DDN· 2025-11-11 16:44
AI Inference Challenges & KV Caching Solution
- AI inference faces challenges with large context windows, which increase tokenization cost and latency [1][2]
- Caching context tokens speeds up responsiveness, lowers latency, and allows storing larger amounts of context [4]
- Effective caching requires storage systems with low latency and large capacity at scale [5]

DDN's Solution & Performance
- DDN's EXAScaler platform enables high-performance KV caching for AI inference, improving user concurrency, responsiveness, and user experience [7]
- DDN leverages GPUDirect Storage (GDS) for the cache engine [9]
- Caching demonstrates a 10x performance improvement with larger contexts [14]
- EXAScaler can improve time to first token during inference by 10-25x [16]
- DDN improves response times, provides larger cache repository space, and delivers cost-effective performance and capacity density [17]

Capacity Implications
- KV caching accelerates the end-user experience, putting a premium on high-performance shared storage [16]
- Approximately 200,000 input characters produced a cache of 796 files, totaling almost 13 gigabytes [15]
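The prefix-reuse idea behind KV caching can be sketched in a few lines. This toy `PrefixKVCache` class is a hypothetical illustration, not vLLM's or DDN's actual API: real engines store per-layer key/value tensors in paged GPU or storage-backed blocks, while a plain list stands in for them here.

```python
import hashlib

class PrefixKVCache:
    """Toy prefix KV cache: reuse the expensive prefill computation
    for a prompt prefix that has been seen before (illustrative only)."""

    def __init__(self):
        self.store = {}   # prefix hash -> precomputed KV "blocks"
        self.hits = 0
        self.misses = 0

    def _key(self, tokens):
        # Hash the token sequence to identify an identical prefix.
        return hashlib.sha256(" ".join(tokens).encode()).hexdigest()

    def get_or_compute(self, tokens, compute_kv):
        k = self._key(tokens)
        if k in self.store:
            self.hits += 1          # cache hit: prefill is skipped
            return self.store[k]
        self.misses += 1
        kv = compute_kv(tokens)     # expensive prefill step we want to avoid
        self.store[k] = kv
        return kv

cache = PrefixKVCache()
prompt = ["system:", "you", "are", "helpful"]
compute = lambda toks: [f"kv({t})" for t in toks]  # stands in for prefill

first = cache.get_or_compute(prompt, compute)   # cold: runs prefill
second = cache.get_or_compute(prompt, compute)  # warm: served from cache
```

The time-to-first-token gains cited above come from the warm path: a repeated context skips prefill entirely, which is why low-latency, high-capacity shared storage for the cached blocks matters.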
Scaling AI Inference Performance in the Cloud with Nebius
NVIDIA· 2025-11-10 14:01
When you build AI infrastructure, the most important challenge you always face is that the industry is moving very fast. When it comes to inference, there are many important components. You still need a very reliable underlying physical infrastructure, and it should be more and more performant, and our mission is to provide a scalable, highly performant, highly reliable AI cloud. We brought all our experience in building cloud offerings, but we built the platform from the ground up. So we identified the core AI scenarios ...
Google's Latest AI Chip Puts the Focus on Inference
The Motley Fool· 2025-11-09 11:42
Core Insights
- Google has launched its seventh-generation Tensor Processing Unit (TPU), named Ironwood, designed specifically for AI workloads, marking a significant advancement in AI computing capabilities [1][2][3]
- The new TPU offers a 10X peak performance improvement over the previous generation and more than 4X better performance per chip for both training and inference tasks [3]
- Google is positioning itself in the "age of inference," where the focus shifts from training AI models to utilizing them for practical applications, anticipating a surge in demand for AI computing [5][9]

Product Launch and Features
- Ironwood TPUs will be available for Google Cloud customers soon, alongside new Arm-based Axion virtual machine instances that enhance performance per dollar [2]
- The Ironwood TPU is optimized for high-volume AI inference workloads, which require quick response times and the ability to handle numerous requests [4]

Market Position and Growth
- Google Cloud generated $15.2 billion in revenue in Q3, reflecting a 34% year-over-year increase, with an operating income of $3.6 billion and an operating margin of approximately 24% [8]
- The cloud computing sector is competitive, with Microsoft Azure and Amazon Web Services also expanding their AI capabilities, but Google is leveraging its decade-long experience in TPU development to gain an edge [7][9]

Strategic Partnerships
- AI companies like Anthropic are expanding their use of Google's TPUs, with a new deal granting access to 1 million TPUs, which is crucial for their goal of reaching $70 billion in revenue by 2028 [6]
Akamai(AKAM) - 2025 Q3 - Earnings Call Transcript
2025-11-06 22:30
Financial Data and Key Metrics Changes
- Akamai reported Q3 2025 revenue of $1.055 billion, representing a 5% year-over-year increase as reported and a 4% increase in constant currency [4][20]
- Non-GAAP operating margins improved to 31%, and non-GAAP earnings per share was $1.86, up 17% year-over-year as reported and in constant currency [4][20]
- Non-GAAP net income for Q3 was $269 million, with a non-GAAP EPS of $1.86, exceeding guidance by $0.20 [21][24]

Business Line Data and Key Metrics Changes
- Cloud Infrastructure Services (CIS) revenue was $81 million, up 39% year-over-year as reported and in constant currency, accelerating from a 30% growth rate in Q2 [6][19]
- Security revenue reached $568 million, up 10% year-over-year as reported and 9% in constant currency, with high-growth security products generating $77 million, an increase of 35% year-over-year [20][14]
- Delivery revenue was $306 million, down 4% year-over-year as reported and in constant currency, but showing improved trends [20]

Market Data and Key Metrics Changes
- International revenue was $525 million, up 9% year-over-year, representing 50% of total revenue in Q3 [20]
- Foreign exchange fluctuations positively impacted revenue by $4 million sequentially and $8 million year-over-year [20]

Company Strategy and Development Direction
- Akamai is transitioning from a CDN pioneer to a leader in cloud security and distributed cloud computing, with a focus on AI inference capabilities [5][10]
- The launch of Akamai Inference Cloud aims to support the growing demand for AI inference on the internet, positioning the company to leverage its distributed architecture [7][11]
- The company emphasizes the importance of reliability, aiming for five nines of uptime, which is critical for attracting major clients like banks [75]

Management's Comments on Operating Environment and Future Outlook
- Management expressed confidence in the growth of CIS and high-growth security solutions, anticipating continued strong demand for AI-related services [20][24]
- The company expects Q4 revenue to be in the range of $1.065 billion to $1.085 billion, reflecting a 4%-6% increase as reported [23]
- Management noted that the AI inference market is at a transition point, with significant growth expected as AI systems are adopted at scale [10][12]

Other Important Information
- Akamai's CapEx for Q3 was $224 million, representing 21% of revenue, as the company continues to invest in its CIS business [21]
- The company did not repurchase any shares in Q3 but has spent $800 million year-to-date on share buybacks, marking the largest annual buyback in its history [21][22]

Q&A Session Summary
- Question: Guidance on security and compute growth — Management reiterated security growth at about 10% and compute growth slightly under 15% for the year, with momentum in CIS [28]
- Question: Insights on Akamai Inference Cloud — Management indicated strong interest and demand for AI applications, with many customers looking to adopt inference capabilities [30][32]
- Question: Hiring strategy for sales reps — The company is continuing to hire sales reps to support new business sales in security and compute, with a transformation expected to be largely complete by Q2 next year [36][37]
- Question: Confidence in benefiting from capacity constraints at hyperscalers — Management highlighted Akamai's unique platform and extensive points of presence, which allow it to provide faster services compared to hyperscalers [41][42]
- Question: Opportunities in API Security — Management confirmed ongoing efforts to extend API security into new agentic protocols, with strong interest from customers [44]
- Question: CapEx requirements for inference — Management noted that CapEx will closely follow revenue and demand, with expectations for gross margins similar to current compute margins [46][47]
- Question: Traffic mix and future trends — Management indicated that video delivery currently dominates traffic, but AI applications are expected to increase traffic significantly in the future [68][70]
Supermicro Launches New 6U 20-Node MicroBlade with AMD EPYC 4005
Yahoo Finance· 2025-10-30 13:31
Core Insights
- Super Micro Computer Inc. (NASDAQ:SMCI) is recognized as a promising growth stock for the next five years, particularly following the launch of its new 6U 20-Node MicroBlade system featuring AMD EPYC 4005 Series Processors [1][3]

Group 1: Product Launch and Features
- The new 6U MicroBlade system is designed to be a cost-effective and environmentally friendly solution for Cloud Service Providers, achieving 3.3 times higher density than traditional 1U servers, allowing up to 160 servers and 16 Ethernet switches in a single 48U rack, resulting in up to 2560 CPU cores per rack [2]
- The system utilizes Supermicro's unique building block architecture, providing up to 95% cable reduction, 70% space savings, and 30% energy savings compared to traditional 1U servers, which helps enterprises maximize their Total Cost of Ownership (TCO) savings [3]

Group 2: Company Overview
- Super Micro Computer Inc. and its subsidiaries develop and sell server and storage solutions based on modular and open-standard architecture across the US, Asia, Europe, and internationally [4]
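The density figures above are internally consistent, which a quick back-of-the-envelope check confirms. The 16-cores-per-server assumption is inferred from the stated 2560-core total (EPYC 4005 parts ship in several core counts), so treat it as an illustrative input rather than a spec:

```python
# Sanity-checking Supermicro's stated rack-density arithmetic.
nodes_per_chassis = 20          # one 6U MicroBlade holds 20 nodes
chassis_per_rack = 48 // 6      # a 48U rack fits 8 chassis

servers_per_rack = nodes_per_chassis * chassis_per_rack   # 160 servers
cores_per_server = 16           # assumed, to match the 2560-core claim
cores_per_rack = servers_per_rack * cores_per_server

# A traditional 1U server layout yields 48 servers per 48U rack.
density_vs_1u = servers_per_rack / 48

print(servers_per_rack, cores_per_rack, round(density_vs_1u, 1))
```

The ratio 160/48 ≈ 3.3 matches the "3.3 times higher density" claim in the article.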
Will QCOM's New AI Inference Solutions Boost Growth Prospects?
ZACKS· 2025-10-28 13:36
Core Insights
- Qualcomm has launched AI200 and AI250 chip-based AI accelerator cards and racks, optimized for AI inference in data centers, utilizing its NPU technology [1][9]
- The AI250 features a near-memory computing architecture that provides 10x effective memory bandwidth while optimizing power consumption [2]
- The global AI inference market is projected to reach $97.24 billion in 2024, with a compound annual growth rate of 17.5% from 2025 to 2030, indicating a significant growth opportunity for Qualcomm [3]

Product Offerings
- The AI200 is designed for large language model and multimodal model inference, offering a lower total cost of ownership [2]
- Qualcomm's solutions are characterized by high memory capacity, affordability, and flexibility, making them suitable for modern AI data center needs [4]
- HUMAIN, a global AI company, has chosen Qualcomm's AI200 and AI250 solutions for high-performance AI inference services [4]

Competitive Landscape
- Qualcomm competes with NVIDIA, Intel, and AMD in the AI inference market [5]
- NVIDIA offers a robust portfolio for AI inference infrastructure, including products like Blackwell and H200 [5]
- Intel has launched the Crescent Island GPU optimized for AI inference workloads, while AMD's MI350 Series GPU has set new benchmarks in generative AI [6][7]

Financial Performance
- Qualcomm shares have increased by 9.3% over the past year, compared to the industry's growth of 62% [8]
- The company's shares trade at a forward price/earnings ratio of 15.73, lower than the industry average of 37.93 [10]
- Earnings estimates for 2025 remain unchanged, while estimates for 2026 have improved by 0.25% to $11.91 [11]
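The article gives the market baseline and growth rate but not the implied end-of-period size. Compounding the stated 17.5% CAGR over 2025-2030 from the $97.24 billion 2024 base is a simple extrapolation (my arithmetic, not a figure from the article):

```python
# Extrapolating the AI inference market size from the article's inputs.
base_2024 = 97.24        # market size in $B, per the cited projection
cagr = 0.175             # stated CAGR for 2025 through 2030
years = 6                # six compounding years: 2025, 2026, ... 2030

size_2030 = base_2024 * (1 + cagr) ** years
print(round(size_2030, 1))   # implied 2030 market size in $B
```

At those inputs the market would roughly 2.6x over the period, which is the scale of opportunity the piece argues Qualcomm is targeting.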
Qualcomm announces new data center AI chips to target AI inference
CNBC Television· 2025-10-27 14:25
Well, Qualcomm just announced it's taking on Nvidia in AI chips, a massive pivot for a company that built its empire on smartphones. So, they're launching new data center AI chips starting in 2026. And they also just announced their first major customer. That would be the Saudi-backed AI startup HUMAIN, targeting roughly 200 megawatts of capacity starting in 2026. So, they're essentially not going after AI training. That's the market that made Nvidia worth over $4 trillion. Qualcomm is targeting inference. That's ...
JonesResearch recommends Hold on Cipher, Iren, Mara, CleanSpark and issues Buy Ratings on Hut 8, TeraWulf, Riot
Yahoo Finance· 2025-10-20 14:30
Group 1: Cipher Mining (CIFR)
- Cipher Mining's stock remains stable after modest cuts to Q3 and full-year 2025 revenue and EBITDA forecasts, with strong execution on its Fluidstack/Google lease and potential follow-on deals noted, although much of the 2027 development pipeline is already priced in, trading at about 87% of estimated pipeline equity value versus a 61% coverage average [2]

Group 2: IREN Ltd. (IREN)
- IREN's Hold rating reflects downward revisions to near-term production and cost assumptions, partially offset by raised 2026 estimates due to plans to expand its Canadian AI cloud build-out to 60,000 GPUs, but the firm's bare-metal focus is seen as lacking the software depth and enterprise integration needed for durable returns, with an elevated valuation amid execution and dilution risks [3]

Group 3: Mara Holdings (MARA)
- Mara remains on Hold after reductions to Q3 and 2025 revenue and EBITDA estimates, with skepticism around its ability to monetize power-management services for AI inference and advance off-grid mining growth, compounded by uncertainty over a proposed 64% acquisition of EDF's Exaion, which is under review on sovereignty grounds [4]

Group 4: CleanSpark (CLSK)
- CleanSpark's Hold rating follows reductions to Q3 and 2025 estimates due to lower mining uptime, despite management's appointment of Matt Schultz and renewed optimism around AI/HPC optionality, with shares rallying 94% since the leadership change; the firm prefers to await clearer updates on the scale and timing of CleanSpark's AI/HPC pipeline before any upgrade [5]

Group 5: Hut 8 (HUT)
- Hut 8 earns a Buy rating with a raised price target of $67, reflecting full value for an estimated 530 MW gross AI/HPC leasing pipeline across River Bend, Batavia, and Texas Site 03, valued at $5.85 billion at a 5.5% cap rate, with American Bitcoin's mining operations dominating results and presenting dilution risk, while exposure to AI/HPC colocation supports long-term upside [6]

Group 6: TeraWulf (WULF)
- TeraWulf retains a Buy rating with an increased price target of $24, supported by a sum-of-the-parts valuation of its 886 MW AI/HPC pipeline through 2027, spanning Core42/Fluidstack, Lake Mariner, and Cayuga Lake, valued at $13.85 billion at a 5.5% cap rate, along with modestly raised Q3 revenue and EBITDA forecasts on higher hashprice trends [7]
TrendForce: AI Storage Demand Sparks HDD Substitution Effect as NAND Flash Suppliers Accelerate Shift to High-Capacity Nearline SSDs
Zhitong Finance· 2025-10-14 06:04
Core Insights
- The demand for real-time access and high-speed processing of massive data is rapidly increasing due to AI inference applications, prompting HDD and SSD suppliers to expand their offerings of high-capacity storage products [1][2]
- The HDD market is currently facing a significant supply gap, which is encouraging NAND Flash manufacturers to accelerate the production of ultra-large-capacity Nearline SSDs, such as 122TB and 245TB models [1]
- The HDD industry is undergoing a painful technological transition, with high initial costs of the new HAMR technology pushing average selling prices (ASP per GB) from $0.012-$0.013 to $0.015-$0.016, undermining HDD's core cost advantage [1][2]

Industry Dynamics
- SSDs offer significantly higher IOPS and lower latency than HDDs, making them more efficient for AI workloads that involve random data access and quick model-parameter retrieval [2]
- SSD power consumption is much lower than that of HDDs, which can yield substantial savings in electricity, cooling costs, and rack space for large data centers, offsetting SSDs' higher initial purchase costs [2]
- As the HDD industry upgrades to HAMR technology and achieves economies of scale, there is room for cost optimization; however, NAND Flash's structural advantages in cost reduction and capacity expansion remain significant [2]

Market Opportunities
- The emerging Nearline SSD market presents a significant opportunity for NAND Flash suppliers seeking to diversify beyond smartphone and PC demand, positioning them well for the competition in data center storage architecture over the next decade [2]
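The ASP ranges quoted above imply a concrete erosion of HDD's cost advantage. Taking the midpoint of each range (my simplification of the article's figures):

```python
# HAMR transition effect on HDD cost per gigabyte, using range midpoints.
old_asp = (0.012 + 0.013) / 2    # $/GB before the HAMR ramp
new_asp = (0.015 + 0.016) / 2    # $/GB during the HAMR ramp

increase = new_asp / old_asp - 1  # fractional rise in $/GB
print(f"{increase:.0%}")          # roughly a quarter more per gigabyte
```

A ~24% jump in $/GB narrows the price gap that nearline HDDs have traditionally held over SSDs, which is the opening TrendForce sees for high-capacity Nearline SSDs.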
NVIDIA Blackwell Sets New Standard in AI Inference with 15X ROI and $75 Million Revenue
NVIDIA· 2025-10-09 23:43
Performance Benchmarks
- Blackwell achieved breakthrough performance on leading open-source models such as DeepSeek R1, GPT-OSS, and Llama, based on the InferenceMAX benchmark [1]
- The new benchmark is designed to capture not only performance but also cost and efficiency, reflecting what it takes to deploy inference at scale [2]
- A single GB200 NVL72 system can generate enough tokens to produce $75 million in revenue, a 15x return on investment (based on GPT-OSS) [2]
- With the latest TensorRT-LLM software improvements, each GPU can generate 60,000 tokens per second [3]
- For dense open models like Llama, each GPU can generate 10,000 tokens per second, 4x the previous-generation Hopper platform [3]

Efficiency Improvements
- In power-constrained data centers, Blackwell delivers 10x the performance per megawatt of the previous-generation Hopper platform [3]
- More tokens translate into more revenue [4]

Future Expectations
- New results are expected for Blackwell Ultra, along with further software improvements and enhancements that will raise AI factory performance and efficiency [4]
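The headline numbers can be tied together with simple arithmetic. Assuming "15x ROI" means revenue divided by system investment (an interpretation, not stated in the piece), the implied system cost and aggregate throughput follow directly:

```python
# Back-of-the-envelope check on the GB200 NVL72 revenue/ROI claims.
revenue = 75e6           # $75M in token revenue per system, per the piece
roi = 15                 # stated 15x return on investment

implied_cost = revenue / roi          # implied per-system investment, $
gpus_per_system = 72                  # NVL72 = 72 Blackwell GPUs
tokens_per_gpu_s = 60_000             # per-GPU rate with TensorRT-LLM

system_tokens_s = gpus_per_system * tokens_per_gpu_s  # aggregate rate
print(implied_cost, system_tokens_s)
```

Under that reading, the claim implies roughly a $5 million per-system outlay and an aggregate throughput in the low millions of tokens per second.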