Inference
X @Polyhedra
Polyhedra· 2025-09-25 12:00
6/ Currently working on Gemma3 quantization, focusing on:
- Learning the new model architecture
- Adding KV cache support (which accelerates inference)
- Implementing quantization support for some new operators
Full operator support will require 1+ additional day, plus more time for accuracy testing. Stay tuned for more updates 🔥 ...
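A minimal sketch of why a KV cache accelerates autoregressive inference (toy code, not Polyhedra's implementation; `project` is a hypothetical stand-in for the per-token key/value projection): without a cache, every generation step re-projects the entire prefix, while with a cache each step projects only the one new token.

```python
def project(token):
    """Stand-in for the key/value projection of a single token."""
    return token * 2, token * 3  # (key, value)

def generate_no_cache(prompt, steps):
    seq = list(prompt)
    projections = 0
    for _ in range(steps):
        kv = [project(tok) for tok in seq]      # re-project the whole prefix
        projections += len(kv)
        seq.append(sum(v for _, v in kv) % 7)   # toy "next token" rule
    return seq, projections

def generate_with_cache(prompt, steps):
    seq = list(prompt)
    cache = [project(tok) for tok in seq]       # fill the cache once
    projections = len(cache)
    for _ in range(steps):
        seq.append(sum(v for _, v in cache) % 7)
        cache.append(project(seq[-1]))          # only the new token is projected
        projections += 1
    return seq, projections

a, cost_a = generate_no_cache([1, 2, 3], steps=4)
b, cost_b = generate_with_cache([1, 2, 3], steps=4)
assert a == b  # identical outputs; the cached variant does far less work
```

Per-step work drops from O(t) re-projection (O(t²) total) to O(1) per step, which is the speedup the post refers to.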
X @Avi Chawla
Avi Chawla· 2025-09-22 19:59
Dropout Mechanism
- During training, the average neuron input is significantly lower than during inference, potentially causing numerical instability due to activation scale misalignment [1]
- Dropout addresses this by multiplying inputs during training by a factor of 1/(1-p), where 'p' is the dropout rate [2]
- For example, with a dropout rate of 50%, an input of 50 is scaled to 100 (50 / (1 - 0.5) = 100) [2]
- This scaling ensures coherence between the training and inference stages of the neural network [2]
Training vs Inference
- Consider a layer with 100 neurons, each with an activation value of 1, and a weight of 1 from each neuron to neuron 'A' in the next layer [2]
- With a 50% dropout rate, approximately 50 neurons are active during training [2]
- During inference, all 100 neurons are active since Dropout is not used [2]
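The worked example above can be reproduced in a few lines (an illustrative sketch, not a framework API; the neuron count and dropout rate follow the post): plain dropout leaves neuron 'A' with a training-time input of roughly 50 versus 100 at inference, and the 1/(1-p) rescaling closes that gap.

```python
import random

random.seed(0)
p = 0.5                       # dropout rate from the example
activations = [1.0] * 100     # 100 neurons, all activations 1, weights 1

# Plain dropout: during training, roughly half the inputs are zeroed,
# so A's input is ~50, versus 100 at inference.
dropped = [a if random.random() > p else 0.0 for a in activations]
plain_input = sum(dropped)

# Inverted dropout: surviving activations are scaled by 1/(1-p),
# so A's expected training-time input matches the inference value.
scaled_input = sum(d / (1 - p) for d in dropped)

inference_input = sum(activations)  # no dropout at inference -> 100
```

With p = 0.5 the scaling factor is 1/(1 - 0.5) = 2, which is exactly the 50 → 100 correction described in the post.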
CoreWeave CEO: Building AI infrastructure will require trillions in public-private investment
CNBC Television· 2025-09-22 15:45
AI Infrastructure Investment & Scale
- Building planetary-scale AI infrastructure requires both private and public sector resources [2]
- The AI infrastructure buildout, encompassing energy exploration, power generation, transmission, data centers, supercomputers, and application layers, is estimated to be a multi-trillion dollar investment [5]
- This infrastructure is considered a fundamental component of the future economy for the next 50 years [5]
Demand & Monetization
- The large deals announced by AI labs and hyperscalers indicate significant demand for compute [4]
- The majority of compute being built is to serve inference, which represents the monetization of AI [11]
- Hyperscalers are being paid to build this infrastructure to serve current and projected client demand [12]
Bubble Concerns & Commercial Activity
- The question of whether the large capital investments in AI will produce a significant return is being raised [10]
- The flow of money into AI is supported by broad-based demand across the technology space as businesses integrate AI into their workflows [13]
- The companies deploying capital in AI are among the largest and most successful, differentiating this from the dot-com bubble [14]
Comparison to Dot-com Era
- Unlike the dot-com era, which was a bolt-on to existing infrastructure, AI requires a new layer of power generation due to increased power consumption [7]
- The disruption caused by AI is considered to be on the same order of magnitude as the advent of the internet [8]
X @Avi Chawla
Avi Chawla· 2025-09-22 06:39
Here's a hidden detail about Dropout that many people don't know. Assume that:
- There are 100 neurons in a layer, and all activation values are 1.
- The weight from each of the 100 neurons to a neuron 'A' in the next layer is 1.
- Dropout rate = 50%
Computing the input of neuron 'A':
- During training → approx. 50 (since ~50% of values will be dropped).
- During inference → 100 (since we don't use Dropout during inference).
So essentially, during training, the average neuron input is significantly lower than that during infer ...
Prediction: These AI Chip Stocks Could Soar (Hint: It's Not Nvidia or Broadcom)
Yahoo Finance· 2025-09-20 19:05
Core Insights
- Nvidia and Broadcom are leading the headlines due to significant data center revenue growth driven by strong demand for AI infrastructure, but other chipmakers like AMD and Marvell also have substantial opportunities ahead [2]
Group 1: Advanced Micro Devices (AMD)
- AMD has historically been a secondary player to Nvidia in the GPU market, but it has a chance to gain market share as the focus shifts towards inference [3]
- The demand for chips that handle inference is expected to rise as AI models grow larger and are deployed more widely, with AMD already serving a significant portion of inference traffic for major AI companies [4]
- AMD's ROCm software platform has improved and allows for competitive pricing and efficiency, which could enable AMD to capture market share from Nvidia by lowering total costs for customers [5]
- The UALink Consortium, founded by AMD, offers an open standard alternative to Nvidia's NVLink, potentially allowing for greater flexibility in multi-GPU systems [6]
- Even small market share gains would be impactful for AMD, given its much smaller revenue base compared to Nvidia, which reported over $40 billion in data center revenue last quarter compared to AMD's approximately $3 billion [7]
Group 2: Marvell Technology
- Marvell is also positioned in the AI infrastructure market, winning custom AI chip designs with various customers, although it currently operates under the shadow of Nvidia and Broadcom [8]
Groq Hits $6.9 Billion Valuation as Inference Demand Surges
Bloomberg Technology· 2025-09-17 18:44
I always like to go back to basics. Jonathan, how does Groq stack up against Nvidia and Google's TPU? And you're largely focused on inference. Why did you raise this round in that context? What is it going to allow you to do to be more competitive against those two giants? Well, the bottom line is that the demand for inference is insatiable. The total amount of capacity that people are trying to deploy is mind-boggling. The numbers that people are putting up. And that's only growing. And in our case, we don't f ...
Equinix CEO: AI inference in business process needs connectivity which we do
CNBC Television· 2025-09-15 19:38
Joined on set by Adair Fox-Martin, CEO of Equinix. Adair, thank you for coming in. >> Thank you so much for having me. >> To the wilds of New Jersey. >> Delighted to be here. >> Fantastic. We're delighted you're here. So, you gave me a really good example before the show began of what you guys do. You compared it to an airport. Explain what that means and who your company is. >> Yeah. So, I think in the world of data centers, there's a homogeneous view of data centers, but actually there are different types. Um, and Equ ...
X @Avi Chawla
Avi Chawla· 2025-09-12 06:31
Inference/Generation Process
- Autoregressive generation is used step-by-step during inference [1]
- The encoder runs once, while the decoder runs multiple times [1]
- Each step utilizes previous predictions to generate the next token [1]
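The bullets above can be sketched as a toy loop (hypothetical `encode`/`decode_step` stand-ins, not a real model): the encoder runs exactly once over the input, and each decoder step feeds the previous predictions back in to produce the next token.

```python
def encode(src):
    """Runs once: stand-in for the encoder pass over the full input."""
    return sum(src)  # toy 'memory' of the input sequence

def decode_step(memory, generated):
    """Runs every step: predicts the next token from memory + prior outputs."""
    return (memory + len(generated)) % 10

def generate(src, max_len, eos=0):
    memory = encode(src)          # single encoder pass
    out = []
    for _ in range(max_len):      # decoder loop, one token per iteration
        tok = decode_step(memory, out)
        if tok == eos:            # stop on the end-of-sequence token
            break
        out.append(tok)           # this prediction feeds the next step
    return out

tokens = generate([1, 2, 3], max_len=5)
```

The key asymmetry is that `encode` sits outside the loop while `decode_step` sits inside it, which is why inference cost is dominated by the decoder.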
Jensen Huang & Alex Bouzari: CUDA + NIMs Are Accelerating AI
DDN· 2025-09-05 18:41
I mean, the other thing I think is extremely enabling is the CUDA ecosystem, which you fostered and nurtured and helped people embark on. Now, with CUDA OBJ, I think it is opening all kinds of possibilities, because people can now tie into this and apply it. The combination of CUDA OBJ and NIMs, you know, the inference part of it, for specific industries: life sciences, financial services, autonomous driving, and so on and so forth. You take all these things, you tie them together with the advances that will be made in t ...
Nvidia wants to be the Ferraris of computing.
Yahoo Finance· 2025-09-03 17:36
Nvidia's Supply and Demand
- Nvidia's products are currently sold out, indicating high demand and limited supply [1]
- The primary concern is the allocation of available products to different sectors [1]
Future Market Focus
- The industry anticipates that inference will become a larger business than model training [2]
- Inference is likened to the use of electricity, while model training is compared to building a power plant, suggesting a shift in focus towards application [2]
Nvidia's Strategy
- Nvidia aims to provide the most powerful chips globally, positioning them as the "Ferraris of computing" [3]
- Nvidia believes that high-performance chips are the optimal solution for computing needs [3]