Inference
Altimeter's Brad Gerstner offers his AI playbook
CNBC Television· 2025-06-12 17:50
You know, I had Jensen Huang on the BG2 podcast last year. It's when people were saying, "Oh, this training is hitting an upper bound and all this AI is overblown." And remember, Nvidia fell and all these AI stocks were falling. And Jensen Huang said on the podcast, he said, you know, "We are now moving into inference-time reasoning, where the machines begin to recursively think for themselves." And he said at that moment, inference isn't going to 10x. It isn't going to 100x. It isn't going ...
BofA-NVIDIA Meeting Minutes
BofA· 2025-06-05 06:42
Investment Rating
- The report indicates a bullish sentiment towards NVIDIA and its leadership in the inference market, particularly with the introduction of DeepSeek, which is expected to significantly expand the inference revenue pool [1][2].

Core Insights
- NVIDIA is positioned as a leader in the inference market, with the DeepSeek model democratizing reasoning capabilities and potentially increasing token consumption per user by 13-20 times, leading to a roughly twenty-fold expansion in the inference revenue pool [1].
- The GB200 architecture is designed for multi-GPU reasoning, transforming training chips into inference workhorses, which aligns with the growing demand for long-context inference [2].
- The report emphasizes that inference is eclipsing training in profit potential, prompting a shift in cloud capital expenditure towards sustained demand for accelerators [2].
- Open-source acceleration through DeepSeek is expected to optimize future models for NVIDIA's NVLink from inception, reinforcing its platform advantage [3].
- A breakthrough combining Mixture-of-Experts with MLA compression is noted, which reduces memory usage while maintaining accuracy, bending the cost curve for large language models (LLMs) [4].

Summary by Sections
NVIDIA
- The DeepSeek model, at 671 billion parameters, is a significant catalyst for expanding reasoning capabilities, enhancing user engagement and revenue potential [1].
- The GB200 architecture is tailored for multi-GPU reasoning, strengthening NVIDIA's position in the inference market [2].
- The report highlights the shift in profit pools from training to inference, indicating a robust future for NVIDIA's cloud services [2].
- Open-source initiatives are expected to solidify NVIDIA's competitive edge by ensuring new models are optimized for its technology [3].
- Innovations in memory compression techniques are set to lower costs while preserving performance in LLMs [4].
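The report's revenue-pool claim is simple proportionality: if reasoning models raise per-user token consumption 13-20x while price per token and user count hold roughly steady, revenue scales by the same multiple. A back-of-the-envelope sketch (all baseline numbers below are illustrative assumptions, not figures from the report):

```python
# Sketch of the inference revenue-pool expansion claim. Baseline token
# consumption, price, and user count are assumed for illustration only.
baseline_tokens_per_user = 10_000   # tokens/user/day (assumed)
price_per_million_tokens = 2.00     # USD per 1M tokens (assumed)
users = 1_000_000                   # assumed user base

def daily_revenue(tokens_per_user: float) -> float:
    """Revenue = users * tokens consumed * price per token."""
    return users * tokens_per_user * price_per_million_tokens / 1_000_000

base = daily_revenue(baseline_tokens_per_user)
low = daily_revenue(baseline_tokens_per_user * 13)   # 13x token consumption
high = daily_revenue(baseline_tokens_per_user * 20)  # 20x token consumption

print(f"baseline ${base:,.0f}/day -> reasoning ${low:,.0f}-${high:,.0f}/day")
# Revenue scales by the same 13-20x multiple as token consumption.
```

Whatever the baseline, the multiplier on revenue equals the multiplier on tokens, which is why the report treats per-user token growth as a direct proxy for revenue-pool growth.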
ServiceNow
- ServiceNow is experiencing tangible AI successes, with significant transformations in major accounts demonstrating real value [5].
- The company is rapidly monetizing its AI capabilities through Pro Plus and turnkey agents, indicating a shift in market perception of AI as essential [6].
- Federal revenue growth of 30% year-over-year showcases the strength of ServiceNow's vertical strategies [7].
- The company has engaged with over 40 Fortune 500 companies in AI design sessions, indicating strong demand for its services [7].

Twilio
- Twilio's focus on AI-driven efficiency and high-margin products is expected to sustain margin growth [8].
- The company has achieved a significant increase in operating margins, with automation and AI expected to further enhance operational leverage [9].
- New software-centric products are projected to improve gross margins over time [10].
- Twilio's strategy emphasizes partnerships over building proprietary models, allowing for rapid deployment of AI solutions [11].

Booking Holdings
- Booking Holdings aims for over 8% growth in bookings and revenue, with a focus on alternative accommodations outpacing traditional hotels [14][15].
- The company has successfully expanded into the flight booking sector, demonstrating rapid growth potential [35].
- Attractions revenue has surged by 92% year-over-year, indicating strong market demand [36].

Microchip Technology
- Microchip Technology expresses optimism in the analog sector, supported by record bookings and backlog growth [16].
- The company has raised its revenue and EPS guidance based on strong demand data [17].
- Inventory management strategies are in place to recover gross margins to targeted levels [17][20].

Cisco Systems
- Cisco Systems is positioned for a durable growth cycle, with a focus on disciplined spending and transparency [27].
- The company reports steady demand despite tariff concerns, with significant year-over-year growth in product bookings [28].
- Cisco is capitalizing on a multi-year AI networking cycle, with opportunities in various sectors [29].
Building Scalable Foundations for Large Language Models
DDN· 2025-05-27 22:00
AI Infrastructure & Market Trends
- Modern AI applications are expanding across sectors such as finance, energy, healthcare, and research [3].
- The industry is evolving from initial LLM training to Retrieval-Augmented Generation (RAG) pipelines and agentic AI [3].
- Vultr is positioned as an alternative hyperscaler, offering cloud infrastructure with 50-90% cost savings compared to traditional providers [4].
- A new 10-year cycle requires rethinking infrastructure to support global AI model deployment, necessitating AI-native architectures [4].

Vultr & DDN Partnership
- Vultr and DDN share a vision for radically rethinking the infrastructure landscape to support global AI deployment [4].
- The partnership aims to build a data pipeline that brings data to GPU clusters for training, tuning, and deploying models [4].
- Vultr provides the compute infrastructure, while DDN offers the data intelligence platform to move data [4].

Scalability & Flexibility
- Enterprises need composable infrastructure for cost-efficient AI model delivery at scale, including automated provisioning of GPUs, models, networking, and storage [2].
- Elasticity is crucial to scale GPU and storage resources up and down based on demand, avoiding over-provisioning [3].
- Vultr's worldwide serverless inference infrastructure scales GPU resources to meet peak demand in different regions, optimizing costs [3].

Performance & Customer Experience
- Improving customer experience requires lightning-fast, relevant responses, making time to first token and tokens per second critical metrics [4].
- Consistency in response times is essential, even with thousands of concurrent users [4].
- The fastest response for a customer is the ultimate measure of customer satisfaction [4].

Data Intelligence Platform
- DDN's EXAScaler offers high throughput for training, with up to 16x faster data loading and checkpointing compared to other parallel file systems [5].
- DDN's Infinia provides low latency for tokenization, vector search, and RAG lookups, with up to 30% lower latency [5].
- The DDN data intelligence platform speeds up data response times, helping keep GPUs saturated and responses fast [6].
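The two latency metrics named above can be measured from any streaming inference endpoint: time to first token (TTFT) is the wall-clock delay until the first token arrives, and tokens per second is the decode rate after it. A minimal sketch, where the fake token generator is a hypothetical stand-in for a real streaming client:

```python
import time
from typing import Iterable, Tuple

def measure_latency(token_stream: Iterable[str]) -> Tuple[float, float]:
    """Return (time_to_first_token_s, tokens_per_second) for one response."""
    start = time.perf_counter()
    first_token_at = None
    count = 0
    for _tok in token_stream:
        now = time.perf_counter()
        if first_token_at is None:
            first_token_at = now  # the TTFT clock stops at the first token
        count += 1
    end = time.perf_counter()
    ttft = (first_token_at if first_token_at is not None else end) - start
    # Decode rate is conventionally measured over the tokens that arrive
    # after the first one, so a single-token response has no decode rate.
    if first_token_at is not None and count > 1:
        tps = (count - 1) / max(end - first_token_at, 1e-9)
    else:
        tps = 0.0
    return ttft, tps

# Hypothetical usage with a fake stream that yields 5 tokens, 10 ms apart:
def fake_stream():
    for tok in ["Hello", " ", "world", "!", "<eos>"]:
        time.sleep(0.01)
        yield tok

ttft, tps = measure_latency(fake_stream())
print(f"TTFT={ttft * 1000:.1f} ms, throughput={tps:.0f} tok/s")
```

Measuring both separately matters because they stress different parts of the stack: TTFT is dominated by queueing and prompt processing, while sustained tokens per second reflects decode throughput under load.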
Pay Close Attention to This Crucial Revenue Source for Artificial Intelligence (AI) Giant IBM
The Motley Fool· 2025-04-06 13:00
Core Insights
- IBM is often overlooked in discussions about the future of artificial intelligence, but investors should consider it a potential player in the AI space [2][3].

Company Overview
- IBM's primary profit center is software, which constitutes over 40% of its revenue and nearly two-thirds of its gross profits [4].
- Hardware sales are crucial for generating software and consulting revenue: every $1 spent on IBM's cloud hardware leads to an additional $3 to $5 on software and $6 to $8 on services [5].

Market Position
- While IBM's share of the AI data center market is significantly smaller than Nvidia's, which generated over $35 billion in AI data center revenue, IBM is still a company to watch as the AI sector evolves [3][6].
- The enterprise infrastructure business is currently stagnant, primarily due to weak sales of its Z series mainframes [7].

Technological Advancements
- IBM's Z16 and upcoming Z17 platforms excel at inference, a type of machine-learning workload that is becoming increasingly relevant in the AI industry [9][11].
- The global AI inference server market is projected to grow at an annualized rate of over 18% through 2034, indicating potential demand for IBM's mainframe servers [12].

Investment Potential
- IBM is positioned as a strong player in the inference market, with its Telum II processors capable of 24 TOPS, which could lead to significant revenue growth [14].
- Increased AI server revenue could translate into higher-margin software revenue for IBM, with much of this new revenue being recurring [15].
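The hardware-to-software attach rate cited in the company overview is what makes stagnant mainframe sales matter: each hardware dollar carries a multiple of higher-margin revenue behind it. A quick illustrative calculation (the $100M hardware figure is an assumption for the example, not a number from the article):

```python
# Sketch of the attach-rate claim: every $1 of IBM cloud hardware pulls
# through $3-5 of software and $6-8 of services revenue.
# The hardware spend below is an illustrative assumption.
hardware_spend = 100_000_000  # assumed: $100M of cloud hardware sales

software_low, software_high = hardware_spend * 3, hardware_spend * 5
services_low, services_high = hardware_spend * 6, hardware_spend * 8

total_low = hardware_spend + software_low + services_low
total_high = hardware_spend + software_high + services_high

print(f"total pull-through: ${total_low / 1e6:,.0f}M-${total_high / 1e6:,.0f}M")
# Each hardware dollar becomes $10-14 of total revenue, with the added
# $9-13 landing in the higher-margin software and services segments.
```

This is why the article frames Z-series demand as a leading indicator: a recovery in hardware would show up multiplied in the software and services lines.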
2 reasons why Nvidia's Jensen Huang isn't worried
Business Insider· 2025-03-19 15:17
Core Insights
- Jensen Huang, CEO of Nvidia, is optimistic about continued spending on Nvidia's products, driven by new powerful chips and the industry's shift towards inference in AI [1][8].
- Nvidia is set to release a new generation of Rubin GPUs next year, which are expected to significantly outperform previous models [2][3].
- Huang highlighted that the demand for computation in AI has increased dramatically, requiring 100 times more than previously anticipated, which supports the need for Nvidia's advanced chips [5].

Company Developments
- The upcoming Rubin chips will succeed the Blackwell and Hopper lines, with the ultra version of Rubin projected to deliver 14 times the performance of the ultra version of Blackwell [2].
- Nvidia is also teasing a future line of chips named after physicist Richard Feynman, expected to surpass the performance of the Rubin chips [3].

Market Reactions
- Despite Huang's confidence, Nvidia's shares fell by over 3.4% following his keynote, indicating investor concerns about the implications of DeepSeek for chip demand and slowing revenue growth [6][8].
- Nvidia's fourth-quarter revenue of $39.3 billion reflected a 78% increase year-on-year, but this growth rate is lower than the 262% seen in the first quarter [6].