Inference
X @Avi Chawla
Avi Chawla· 2025-09-12 06:31
Inference/Generation Process - Autoregressive generation is used step-by-step during inference [1] - The encoder runs once, while the decoder runs multiple times [1] - Each step utilizes previous predictions to generate the next token [1]
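The steps above can be sketched with a toy encoder-decoder loop. This is a minimal illustration, not a real model: `encode` and `decode_step` are deterministic stand-ins, chosen only to show that the encoder runs once while the decoder runs once per token, feeding previous predictions back in.

```python
# Toy sketch of autoregressive generation: the "encoder" runs once,
# the "decoder" runs once per generated token, and each step uses the
# previously generated tokens to produce the next one.

def encode(src_tokens):
    # Runs exactly once per input sequence (stand-in for encoder memory).
    return sum(src_tokens)

def decode_step(memory, generated):
    # Stand-in decoder: deterministic next-token rule for illustration.
    return (memory + len(generated)) % 10

def generate(src_tokens, max_len=5, eos=0):
    memory = encode(src_tokens)       # encoder: one pass
    generated = []
    for _ in range(max_len):          # decoder: one pass per token
        nxt = decode_step(memory, generated)
        generated.append(nxt)
        if nxt == eos:                # stop on end-of-sequence token
            break
    return generated

print(generate([3, 4]))  # [7, 8, 9, 0]
```

The loop structure, not the arithmetic, is the point: each iteration conditions on everything generated so far, which is why decoding cost grows with output length while encoding cost does not.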
Jensen Huang & Alex Bouzari: CUDA + NIMs Are Accelerating AI
DDN· 2025-09-05 18:41
I mean, the other thing I think is extremely enabling is the CUDA ecosystem, which you fostered and nurtured and helped people really embark on. Now, with CUDA, I think it is opening all kinds of possibilities, because people can now tie into this and apply it. The combination of CUDA and NIMs, the inference part of it, for specific industries: life sciences, financial services, autonomous driving, and so on and so forth. You take all these things, you tie them together with the advances that will be made in t ...
Nvidia wants to be the Ferraris of computing.
Yahoo Finance· 2025-09-03 17:36
Nvidia's Supply and Demand - Nvidia's products are currently sold out, indicating high demand and limited supply [1] - The primary concern is the allocation of available products to different sectors [1] Future Market Focus - The industry anticipates that inference will become a larger business than model training [2] - Inference is likened to the use of electricity, while model training is compared to building a power plant, suggesting a shift in focus towards application [2] Nvidia's Strategy - Nvidia aims to provide the most powerful chips globally, positioning them as the "Ferraris of computing" [3] - Nvidia believes that high-performance chips are the optimal solution for computing needs [3]
Alibaba's AI Chip A Big Deal?
Forbes· 2025-09-03 09:06
Core Insights - Alibaba's stock increased nearly 13% to approximately $135 per share, with a year-to-date rise of close to 60%, following a favorable Q1 earnings report highlighting growth in its cloud business [2] - The company has developed a new AI chip for its cloud computing division, aimed at securing a supply of AI semiconductors amid U.S. export restrictions, while enhancing its cloud competitiveness [2][4] Chip Development - Alibaba's T-Head unit has been developing AI chips for several years, with the new chip designed for inference workloads, focusing on large language and diffusion models [3] - The new chip is expected to be manufactured using a 7 nanometer process, enhancing its capabilities compared to the previous Hanguang chip, and is rumored to be compatible with Nvidia's software ecosystem [4] Market Context - The development of Alibaba's chip occurs amid geopolitical tensions, with the U.S. restricting leading-edge chip exports to China, prompting Alibaba to reduce reliance on U.S. suppliers [4] - The AI market is shifting focus from training to inference, with Alibaba targeting the inference segment, which is less intensive per task but scales across millions of users [5] Strategic Approach - Alibaba plans to leverage its new chip to enhance Alibaba Cloud, allowing customers to rent computational power, thereby deepening customer dependency and generating recurring revenues [6] - The company is committing 380 billion yuan (approximately $53 billion) towards AI infrastructure over the next three years, motivated by a 26% year-on-year growth in its cloud division [6] Competitive Landscape - Alibaba's new chips are expected to supplement Nvidia's GPUs in its AI strategy, with the company likely to continue using Nvidia hardware for training while focusing its own chips on cloud-based inference [7] - Other Chinese companies, including Baidu and Huawei, are also developing AI chips, but Alibaba's established cloud presence provides a distribution advantage [7]
CoreWeave CEO: Inference More Than 50% of AI Workloads
Bloomberg Technology· 2025-08-13 14:21
Market Dynamics & Industry Trends - Demand is outpacing supply in the computing infrastructure space, particularly for parallelized computing, driven by the traction in artificial intelligence for both training new models and inference [1][3][4] - The industry is experiencing a systemic imbalance, leading to a planetary-scale buildout of computing infrastructure expected to support AI for the next 50 years [8][9][10] - Inference demand is continually increasing and now accounts for over 50% of compute usage, while training demand remains strong [12][13][28] - The newest GPU architectures are used for bleeding-edge training, with prior generations being utilized for inference, showing a generational shift in usage [15] Company Strategy & Business Model - The company focuses on delivering comprehensive supercomputer solutions, emphasizing that delivering 97% is insufficient [4][5] - The company is diversifying its client base beyond hyperscalers like Microsoft and penetrating additional layers of the enterprise space, including VFX and life sciences [16][17][18] - The company structures its business around long-term contracts to insulate from short-term spot price variance in computing [25] - The company has seen a 900 basis points (9%) decrease in the cost of capital in its latest delayed draw facility for non-investment grade clients [26] Operational Challenges & Infrastructure - The key bottleneck is the powered shell, encompassing building, cooling, and electricity distribution [5] - The company anticipates facing a cycle of shortages, moving from powered shell bottlenecks to silicon or networking shortages [6] - Hyperscale clients are extending and broadening their contractual relationships, indicating a broader effort to address the systemic imbalance [22][23]
AMD Shares Sink Despite Strong Growth. Is It Time to Buy the Dip?
The Motley Fool· 2025-08-09 11:05
Core Viewpoint - Advanced Micro Devices (AMD) has experienced solid growth despite temporary challenges from the Chinese export ban, with a year-to-date stock increase of approximately 30% following a recent dip after Q2 earnings results [1] Group 1: Financial Performance - AMD's overall revenue increased by 32% to $7.69 billion in Q2, but adjusted earnings per share (EPS) fell by 30% to $0.48, missing analyst expectations [8] - The data center segment, AMD's primary growth driver, saw a revenue increase of 14% to $3.2 billion, impacted by the inability to sell MI308 GPUs in China [3][8] - The client and gaming segment experienced a significant revenue surge of 69% to $3.6 billion, driven by strong CPU share gains and demand for new gaming GPUs [6] - The embedded segment reported a 4% revenue decline to $824 million, with expectations for sequential growth in the second half of the year [7] Group 2: Market Dynamics - AMD's data center revenue would have grown approximately 39% if not for the $700 million negative impact from the Chinese export restrictions [10] - The company is seeing increasing adoption of its MI300 and MI325 GPUs, with seven out of ten top model builders and AI companies utilizing its products [4] - AMD's CPUs are gaining market share in the server space, driven by rising demand for cloud and on-premises computing and investments in AI infrastructure [5] Group 3: Future Outlook - AMD projects Q3 revenue growth of 28% to $8.7 billion, excluding potential revenue from MI308 shipments to China [8] - The company is on track to introduce its MI400 chip, aiming to compete with Nvidia's next-generation Rubin chip, indicating future growth potential in the AI inference market [10][11] - The stock trades at a forward price-to-earnings ratio of 27.5 times 2026 analyst estimates, suggesting potential upside if AMD becomes a significant player in the AI inference market [11]
Iron Mountain(IRM) - 2025 Q2 - Earnings Call Transcript
2025-08-06 13:30
Financial Data and Key Metrics Changes - Revenue increased by 12% to $1.7 billion, adjusted EBITDA grew by 15% to $628 million, and AFFO increased by 15% to $370 million [5][20][21] - Adjusted EBITDA margin was 36.7%, up 120 basis points year on year, reflecting improved margins across all business segments [21][22] Business Line Data and Key Metrics Changes - Global Records and Information Management (RIM) business achieved record revenue of $1.32 billion, up $73 million year on year, with organic storage revenue up 6% [23][24] - Data center revenue was $189 million, an increase of $37 million year on year, with organic storage rental growth of 26% [25][26] - Asset Lifecycle Management (ALM) revenue was $153 million, a 70% increase year on year, with 42% organic growth [28] Market Data and Key Metrics Changes - The data center market remains strong, with pricing trends showing renewal pricing spreads of 13-20% on cash and GAAP basis [26] - The company expects data center revenue growth in excess of 25% in 2026, driven by a strong leasing backlog [27][31] Company Strategy and Development Direction - The company is focused on driving double-digit revenue growth supported by strong cross-selling opportunities in fragmented markets [31][33] - The acquisition of CRC India is expected to enhance the company's digital product portfolio and capitalize on growth opportunities in India [12][31] Management's Comments on Operating Environment and Future Outlook - Management expressed confidence in sustaining double-digit revenue and profit growth, supported by strong customer relationships and operational execution [18][31] - The company is increasing its financial guidance for the year based on strong second-quarter performance and positive outlook [31][32] Other Important Information - The company invested $477 million in the second quarter, with $442 million allocated to growth CapEx [29] - The quarterly dividend declared is $0.785 per share, with a payout ratio of 63% [29]
Q&A Session Summary Question: Data center signings came in lighter than expected; can you elaborate on the slowdown? - Management noted that while the market remains strong, customers have been prioritizing large campuses for AI, which has affected leasing activity [35][36] Question: Is the slowdown in data center leasing just timing? - Management indicated that the focus on large language models has shifted back to their core markets, which should improve leasing activity going forward [38][40] Question: Can you break down the ALM growth in the quarter? - ALM growth was balanced between enterprise and data center, with volume being the primary driver of growth [45][48] Question: What are the dynamics in the hyperscale decommissioning sector? - Management highlighted their competitive advantage in providing secure and flexible decommissioning services, which has led to recent wins [52][54] Question: Can you discuss the margin trajectory and flow-through? - Management confirmed a 47% flow-through margin, driven by strong performance in the global RIM and data center businesses [60][62] Question: Can you clarify the revenue from the treasury contract? - Management stated that only $1 million of revenue was recognized in Q2, with expectations for more significant revenue in 2026 [64][69] Question: What are the targets for megawatts this year? - The expected range for new lease signings is 30 to 80 megawatts, with year-to-date signings at about 6 megawatts [72][74] Question: How is the company positioned in the data center ecosystem? - Management emphasized their focus on AI inference and cloud infrastructure, highlighting strong demand in key markets [78][82] Question: Can you elaborate on the growth in the digital business? - The digital business is experiencing strong growth due to unique capabilities in managing unstructured data, with a projected run rate of over $540 million [87][88]
Hacking the Inference Pareto Frontier - Kyle Kranen, NVIDIA
AI Engineer· 2025-08-01 13:45
Challenges in LLM Inference - LLM inference systems face challenges related to latency, cost, and output quality, impacting user experience, profitability, and applicability [1] - The trade-offs between cost, throughput, latency, and quality define a Pareto frontier, limiting the successful application of LLM systems [1] NVIDIA Dynamo and Inference Techniques - NVIDIA Dynamo, a datacenter-scale distributed inference framework, aims to improve the Pareto frontier of inference systems [1] - Techniques employed include disaggregation (separating LLM generation phases), speculation (predicting multiple tokens per cycle), KV routing, storage, and manipulation (avoiding redundant work), and pipelining improvements for agents (accelerating workflows) [1] Key Inference Optimization Strategies - Disaggregation enhances efficiency by separating phases of LLM generation [1] - Speculation predicts multiple tokens per cycle to improve throughput [1] - KV routing, storage, and manipulation prevent redoing work, optimizing resource utilization [1] - Pipelining improvements for agents accelerate workflows by leveraging agent information [1]
Jensen Huang on Why Data Intelligence is the Future of AI
DDN· 2025-07-31 16:13
AI Applications and Data - AI applications are shifting from training models to using frontier models to solve large problems [1] - The importance of data in the application phase is underestimated; AI needs access to information, not raw data [1] - The industry is refactoring the storage of objects and raw data into data intelligence [1] Storage and Compute - Data intelligence provides the information structure that enterprises worldwide need to run AI [1] - This represents an extraordinary refactoring of the relationship between compute and storage [1]
X @Avi Chawla
Avi Chawla· 2025-07-29 06:30
Performance Comparison - LitServe is reported to be 2x faster than FastAPI [2] Key Features - LitServe offers full control over inference processes [2] - The platform supports serving various model types, including LLMs, vision, audio, and multimodal models [2] - LitServe enables the composition of agents, RAG (Retrieval-Augmented Generation), and pipelines within a single file [2]
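The "full control" point refers to serving frameworks like LitServe exposing overridable hooks for each stage of a request. The dependency-free toy below mirrors that hook-based shape (setup, decode, predict, encode); the class and function names are illustrative, not LitServe's actual API, and no server is involved.

```python
# Dependency-free toy mirroring the hook-based pattern that LitServe-style
# serving frameworks expose: each request is decoded, run through the
# model, and encoded back, with every stage user-overridable.

class ToyInferenceAPI:
    def setup(self):
        # Load the "model" once at startup (a stand-in lambda here).
        self.model = lambda x: x * 2

    def decode_request(self, request):
        # Map the wire format to model input.
        return request["input"]

    def predict(self, x):
        # Full control over the inference call itself.
        return self.model(x)

    def encode_response(self, output):
        # Map model output back to the wire format.
        return {"output": output}

def handle(api, request):
    # One request's path through the hooks, in order.
    return api.encode_response(api.predict(api.decode_request(request)))

api = ToyInferenceAPI()
api.setup()
print(handle(api, {"input": 21}))  # {'output': 42}
```

Because each stage is a separate hook, the same structure accommodates LLM, vision, audio, or multimodal models, and composing agents or RAG pipelines amounts to doing more work inside `predict`.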