Inference
CoreWeave CEO: Inference More Than 50% of AI Workloads
Bloomberg Technology· 2025-08-13 14:21
Michael, to me the situation is pretty clear, right: demand is outpacing supply, and CoreWeave is trying to scale as fast as it can to redress that balance. Could you give specifics on what that looks like, how you're trying to scale your offering, both in terms of literal infrastructure build-out and in keeping pace with technology that suits you? Thank you for giving me the opportunity to speak with you. You know, we just had our quarterly earnings call yesterday and we reported an abs ...
AMD Shares Sink Despite Strong Growth. Is It Time to Buy the Dip?
The Motley Fool· 2025-08-09 11:05
Core Viewpoint
- Advanced Micro Devices (AMD) has delivered solid growth despite temporary headwinds from the Chinese export ban, with a year-to-date stock gain of approximately 30% following a dip after Q2 earnings results [1]

Group 1: Financial Performance
- AMD's overall revenue increased 32% to $7.69 billion in Q2, but adjusted earnings per share (EPS) fell 30% to $0.48, missing analyst expectations [8]
- The data center segment, AMD's primary growth driver, grew revenue 14% to $3.2 billion, held back by the inability to sell MI308 GPUs in China [3][8]
- The client and gaming segment saw revenue surge 69% to $3.6 billion, driven by strong CPU share gains and demand for new gaming GPUs [6]
- The embedded segment reported a 4% revenue decline to $824 million, with sequential growth expected in the second half of the year [7]

Group 2: Market Dynamics
- AMD's data center revenue would have grown approximately 39% absent the $700 million negative impact from the Chinese export restrictions [10]
- Adoption of the MI300 and MI325 GPUs is increasing, with seven of the ten top model builders and AI companies using AMD products [4]
- AMD's CPUs are gaining share in the server market, driven by rising demand for cloud and on-premises computing and investments in AI infrastructure [5]

Group 3: Future Outlook
- AMD projects Q3 revenue growth of 28% to $8.7 billion, excluding potential revenue from MI308 shipments to China [8]
- The company is on track to introduce its MI400 chip to compete with Nvidia's next-generation Rubin chip, indicating future growth potential in the AI inference market [10][11]
- The stock trades at a forward price-to-earnings ratio of 27.5 times 2026 analyst estimates, suggesting potential upside if AMD becomes a significant player in the AI inference market [11]
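The article's ex-China growth figure can be sanity-checked with back-of-the-envelope arithmetic. The $700 million impact and 14% reported growth come from the article; the year-ago base is derived here, not reported:

```python
# Back-of-the-envelope check of AMD's data center growth ex-China (figures in $B).
reported_q2 = 3.2          # reported data center revenue
reported_growth = 0.14     # reported year-over-year growth
china_impact = 0.7         # estimated lost MI308 revenue

year_ago_base = reported_q2 / (1 + reported_growth)   # implied year-ago revenue
ex_china_growth = (reported_q2 + china_impact) / year_ago_base - 1

print(f"Implied year-ago base: ${year_ago_base:.2f}B")
print(f"Growth ex-China: {ex_china_growth:.0%}")  # roughly 39%, matching the article
```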
Iron Mountain(IRM) - 2025 Q2 - Earnings Call Transcript
2025-08-06 13:30
Financial Data and Key Metrics Changes
- Revenue increased by 12% to $1.7 billion, adjusted EBITDA grew by 15% to $628 million, and AFFO increased by 15% to $370 million [5][20][21]
- Adjusted EBITDA margin was 36.7%, up 120 basis points year on year, reflecting improved margins across all business segments [21][22]

Business Line Data and Key Metrics Changes
- The Global Records and Information Management (RIM) business achieved record revenue of $1.32 billion, up $73 million year on year, with organic storage revenue up 6% [23][24]
- Data center revenue was $189 million, an increase of $37 million year on year, with organic storage rental growth of 26% [25][26]
- Asset Lifecycle Management (ALM) revenue was $153 million, a 70% increase year on year, with 42% organic growth [28]

Market Data and Key Metrics Changes
- The data center market remains strong, with renewal pricing spreads of 13-20% on a cash and GAAP basis [26]
- The company expects data center revenue growth in excess of 25% in 2026, driven by a strong leasing backlog [27][31]

Company Strategy and Development Direction
- The company is focused on driving double-digit revenue growth, supported by strong cross-selling opportunities in fragmented markets [31][33]
- The acquisition of CRC India is expected to enhance the company's digital product portfolio and capitalize on growth opportunities in India [12][31]

Management's Comments on Operating Environment and Future Outlook
- Management expressed confidence in sustaining double-digit revenue and profit growth, supported by strong customer relationships and operational execution [18][31]
- The company is raising its financial guidance for the year based on strong second-quarter performance and a positive outlook [31][32]

Other Important Information
- The company invested $477 million in the second quarter, with $442 million allocated to growth CapEx [29]
- The quarterly dividend declared is $0.785 per share, with a payout ratio of 63% [29]

Q&A Session Summary
Question: Data center signings came in lighter than expected; can you elaborate on the slowdown?
- Management noted that while the market remains strong, customers have been prioritizing large campuses for AI, which has affected leasing activity [35][36]
Question: Is the slowdown in data center leasing just timing?
- Management indicated that the focus on large language models has shifted back to their core markets, which should improve leasing activity going forward [38][40]
Question: Can you break down the ALM growth in the quarter?
- ALM growth was balanced between enterprise and data center, with volume the primary driver of growth [45][48]
Question: What are the dynamics in the hyperscale decommissioning sector?
- Management highlighted a competitive advantage in providing secure and flexible decommissioning services, which has led to recent wins [52][54]
Question: Can you discuss the margin trajectory and flow-through?
- Management confirmed a 47% flow-through margin, driven by strong performance in the global RIM and data center businesses [60][62]
Question: Can you clarify the revenue from the treasury contract?
- Management stated that only $1 million of revenue was recognized in Q2, with more significant revenue expected in 2026 [64][69]
Question: What are the targets for megawatts this year?
- The expected range for new lease signings is 30 to 80 megawatts, with year-to-date signings at about 6 megawatts [72][74]
Question: How is the company positioned in the data center ecosystem?
- Management emphasized a focus on AI inference and cloud infrastructure, highlighting strong demand in key markets [78][82]
Question: Can you elaborate on the growth in the digital business?
- The digital business is growing strongly on the back of unique capabilities in managing unstructured data, with a projected run rate of over $540 million [87][88]
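The 47% flow-through margin cited in the Q&A can be roughly reconciled against the headline figures (flow-through is incremental EBITDA divided by incremental revenue). The year-ago bases below are derived from the reported growth rates, not stated on the call:

```python
# Rough flow-through margin check: incremental EBITDA / incremental revenue (in $M).
revenue_now, revenue_growth = 1700, 0.12
ebitda_now, ebitda_growth = 628, 0.15

delta_revenue = revenue_now - revenue_now / (1 + revenue_growth)   # ~$182M
delta_ebitda = ebitda_now - ebitda_now / (1 + ebitda_growth)       # ~$82M

flow_through = delta_ebitda / delta_revenue
print(f"Approximate flow-through margin: {flow_through:.0%}")
```

This lands around 45%, in the ballpark of the 47% management cited; the gap is consistent with rounding in the headline growth rates.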
Hacking the Inference Pareto Frontier - Kyle Kranen, NVIDIA
AI Engineer· 2025-08-01 13:45
Challenges in LLM Inference
- LLM inference systems face challenges related to latency, cost, and output quality, impacting user experience, profitability, and applicability [1]
- The trade-offs between cost, throughput, latency, and quality define a Pareto frontier, limiting the successful application of LLM systems [1]

NVIDIA Dynamo and Inference Techniques
- NVIDIA Dynamo, a datacenter-scale distributed inference framework, aims to improve the Pareto frontier of inference systems [1]
- Techniques employed include disaggregation (separating LLM generation phases), speculation (predicting multiple tokens per cycle), KV routing, storage, and manipulation (avoiding redundant work), and pipelining improvements for agents (accelerating workflows) [1]

Key Inference Optimization Strategies
- Disaggregation enhances efficiency by separating the phases of LLM generation [1]
- Speculation predicts multiple tokens per cycle to improve throughput [1]
- KV routing, storage, and manipulation avoid redundant work, optimizing resource utilization [1]
- Pipelining improvements for agents accelerate workflows by leveraging agent information [1]
Jensen Huang on Why Data Intelligence is the Future of AI
DDN· 2025-07-31 16:13
training a model, to the journey now of us taking advantage of these incredible frontier models and AI models and turning them into AI applications that are for inference and solving large problems. One of the most important things people forgot is the importance of data that is necessary during application, not just during training. And so of course you want to train on a vast amount of data for pre-training, but during use the AI has to access information, and AI would like ...
X @Avi Chawla
Avi Chawla· 2025-07-29 06:30
The image below shows an interaction. I like LitServe because:
- It's 2x faster than FastAPI.
- It gives full control over inference.
- We can serve any model (LLM, vision, audio, multimodal).
- We can compose agents, RAG & pipelines in one file.
Check this👇 https://t.co/yjW0Hh0DqF ...
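Much of the "serve any model" flexibility the post praises comes from a small request-handling contract. Below is a framework-free sketch of that pattern; the hook names mirror LitServe's `setup` / `decode_request` / `predict` / `encode_response` API, but this toy dispatcher is not LitServe itself:

```python
# Framework-free sketch of the decode -> predict -> encode serving pattern.
# Any model (LLM, vision, audio) plugs in by implementing the three hooks.

class EchoUpperAPI:
    """Stand-in 'model' API: uppercases the input text."""

    def setup(self):
        self.model = str.upper          # load real weights here in practice

    def decode_request(self, request):  # JSON-ish dict -> model input
        return request["input"]

    def predict(self, x):               # run the model
        return self.model(x)

    def encode_response(self, output):  # model output -> JSON-ish dict
        return {"output": output}

def serve_once(api, request):
    """What a server's request handler does for each incoming call."""
    return api.encode_response(api.predict(api.decode_request(request)))

api = EchoUpperAPI()
api.setup()
print(serve_once(api, {"input": "hello"}))   # {'output': 'HELLO'}
```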
POC to PROD: Hard Lessons from 200+ Enterprise GenAI Deployments - Randall Hunt, Caylent
AI Engineer· 2025-07-23 15:50
Core Business & Services
- Caylent builds custom solutions for clients ranging from Fortune 500 companies to startups, focusing on app development and database migrations [1][2]
- The company leverages generative AI to automate business functions, such as intelligent document processing for logistics management, achieving faster and better results than human annotators [20][21]
- Caylent offers services ranging from chatbot and co-pilot development to AI agent creation, tailoring solutions to specific client needs [16]

Technology & Architecture
- The company utilizes multimodal search and semantic understanding of videos, employing models like Nova Pro and Titan v2 for indexing and searching video content [6][7]
- Caylent uses various databases, including Postgres with pgvector and OpenSearch, for vector search implementations [13]
- The company builds AI systems on AWS, utilizing services like Bedrock and SageMaker, and custom silicon like Trainium and Inferentia for price-performance improvements of approximately 60% over Nvidia GPUs [27]

AI Development & Strategy
- Prompt engineering has proven highly effective, sometimes negating the need for fine-tuning models [40]
- Context management is crucial for differentiating applications, leveraging user data and history to make strategic inferences [33][34]
- UX design is important for mitigating the slowness of inference, with techniques like caching and UI spinners improving user experience [36][37]
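The caching technique mentioned in the last point can be as simple as memoizing repeated prompts so identical requests skip the model entirely. A minimal sketch, with the slow model call simulated:

```python
# Minimal response cache for an inference endpoint: identical prompts
# skip the (slow) model call entirely. The model call is simulated here.
from functools import lru_cache

CALLS = 0  # counts how often the "model" actually runs

@lru_cache(maxsize=1024)
def cached_generate(prompt: str) -> str:
    global CALLS
    CALLS += 1
    return f"answer to: {prompt}"     # imagine an expensive LLM call here

cached_generate("What is RAG?")       # model runs (cache miss)
cached_generate("What is RAG?")       # served from cache, no model call
cached_generate("What is KV cache?")  # new prompt, model runs again
print(CALLS)  # 2
```

Production systems typically key on a normalized prompt plus sampling parameters and add eviction or TTLs; semantic caches go further by matching near-duplicate prompts via embeddings.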
X @Avi Chawla
Avi Chawla· 2025-07-17 06:30
Model Performance
- The student model's inference runs about 35% faster than the teacher model's [1]
- That 35% speed increase costs only a 1-2% drop in performance [1]
Advanced Insights S2E4: Deploying Intelligence at Scale
AMD· 2025-06-25 17:00
AI Infrastructure & Market Perspective
- Oracle views AI as being at an inflection point, suggesting significant growth and change in the industry [1]
- The discussion highlights that it's a great time to be an AI customer, implying increased options and competitive pricing [1]
- Enterprise AI adoption is underway, but the extent of adoption is still being evaluated [1]
- The future of AI training and inference is a key area of focus, indicating ongoing development and innovation [1]

Technology & Partnerships
- Oracle emphasizes making AI easy for enterprise adoption, suggesting user-friendly solutions and services [1]
- AMD and Oracle have a performance-driven partnership, indicating collaboration to optimize AI infrastructure [1]
- Cross-collaboration across the AI ecosystem is considered crucial for advancement [1]
- Co-innovation on the MI355 and future roadmaps between AMD and Oracle is underway [1]
- Openness and freedom from lock-in are promoted, suggesting a preference for flexible and interoperable AI solutions [1]

Operational Considerations
- Training large language models at scale requires evolving compute needs and energy efficiency [1]
- Operating in a scarce environment is a challenge, potentially referring to resource constraints like compute power or data [1]
- Edge inference can be enabled with fewer GPUs, suggesting advancements in efficient AI deployment [1]

Ethical & Societal Impact
- Societal impact, guardrails, and responsibility are important considerations in the development and deployment of AI [1]
Altimeter's Brad Gerstner's offers his AI playbook
CNBC Television· 2025-06-12 17:50
You know, I had Jensen Huang on the BG2 podcast last year. It's when people were saying, "Oh, this training is hitting an upper bound and all this AI is overblown." Remember, Nvidia fell and all these AI stocks were falling, and Jensen Huang said on the podcast, he said, you know, we are now moving into inference-time reasoning, where the machines begin to recursively think for themselves. And he said at that moment, inference isn't going to 10x. It isn't going to 100x. It isn't going ...