AMD
AAI 2025 | Powering AI at Scale: OCI Superclusters with AMD
AMD· 2025-07-15 16:01
AI Workload Challenges & Requirements
- AI workloads differ from traditional cloud workloads in needing both high throughput and low latency, especially in large language model training, where thousands of GPUs communicate with each other [2][3][4]
- Network glitches such as packet drops, congestion, or added latency can slow down the entire training process, increasing training time and cost [3][5]
- Networks must support clusters from small to large for both inference and training workloads, which requires high performance and reliability [8]
- Networks should scale up within racks and scale out across data halls and data centers, while being autonomous and resilient, with auto-recovery capabilities [9][10]
- Networks need to support increasing East-West traffic, accommodating data transfer from sources such as on-premises data centers and other cloud locations; this traffic is expected to grow 30% to 40% [10]

OCI's Solution: Backend and Frontend Networks
- OCI addresses AI workload requirements with a two-part network architecture: a backend network for high-performance AI and a frontend network for data ingestion [11][12]
- The backend network, designed for RDMA-intensive workloads, supports AI, HPC, Oracle databases, and recommendation engines [13]
- The frontend network provides high-throughput, reliable connectivity within OCI and to external networks, facilitating data transfer from various locations [14]

OCI's RDMA Network Performance & Technologies
- OCI uses RDMA over Converged Ethernet v2 (RoCEv2), which carries high-performance, low-latency RDMA traffic on standard Ethernet hardware [18]
- OCI's network supports multi-class RDMA workloads using queuing techniques in the switches, accommodating the different requirements of training, HPC, and databases on the same physical network [20]
- Independent studies show OCI's RDMA network achieving near line-rate throughput (100 Gb/s) with round-trip delays under 10 microseconds for HPC workloads [23]
- OCI testing demonstrates close to 96% of line rate (400 Gb/s throughput) with MI300 clusters, showing efficient network utilization [25]

Future Roadmap: Zettascale Clusters with AMD
- OCI is partnering with AMD to build a zettascale MI300X cluster with over 131,000 GPUs, offering nearly triple the compute power and 50% higher memory bandwidth [26]
- The MI300X cluster will feature 288 GB of HBM3 memory, enabling customers to train larger models and improve inferencing [26]
- The new system will use AMD AI NICs, enabling standards-based RoCE networking at peak performance [27]
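As a rough illustration of what the reported figure of close to 96% of a 400-gigabit line rate means in practice, the sketch below converts a utilization fraction into achieved throughput and then into a transfer time. The function names and the 288 GB payload are hypothetical choices made for this example; they are not taken from the talk or from any OCI or AMD tooling, and the calculation ignores protocol overhead.

```python
def effective_throughput_gbps(line_rate_gbps: float, utilization: float) -> float:
    """Achieved throughput in Gb/s for a given fraction of line rate."""
    if not 0.0 <= utilization <= 1.0:
        raise ValueError("utilization must be between 0 and 1")
    return line_rate_gbps * utilization


def transfer_time_seconds(payload_gigabytes: float, throughput_gbps: float) -> float:
    """Seconds needed to move a payload (GB, 10^9 bytes) at a throughput in Gb/s."""
    if throughput_gbps <= 0:
        raise ValueError("throughput must be positive")
    # Convert gigabytes to gigabits (x8) before dividing by the rate.
    return payload_gigabytes * 8 / throughput_gbps


# Figures from the summary: a 400 Gb/s link at roughly 96% utilization.
achieved = effective_throughput_gbps(400, 0.96)
print(f"Achieved throughput: {achieved:.0f} Gb/s")

# Hypothetical payload of 288 GB, chosen only to make the arithmetic concrete.
seconds = transfer_time_seconds(288, achieved)
print(f"Time to move 288 GB: {seconds:.1f} s")
```

At these assumed values a single link sustains about 384 Gb/s, so the remaining 4% of line rate is the margin lost to everything the network does not deliver as payload throughput.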
Vik Malyala, Supermicro President & Managing Director, EMEA; SVP Technology & AI
AMD· 2025-07-15 14:30
Innovation doesn’t happen in isolation. It takes a village. Vik Malyala, Managing Director and President, EMEA, SVP Technology and AI at Supermicro, shares how open collaboration, with AMD and across the ecosystem, is key to bringing real value to customers and improving the human experience. #AdvancingAI
AMD Ryzen AI Max PRO Series: A New Generation of AI PC Workstations to Supercharge AI Workflows
AMD· 2025-07-14 17:00
Industry Trends & Technological Advancement
- The automotive industry is in constant evolution, with a growing need for advanced tools and applications, including AI, to enhance design work [1]
- AI is being integrated into workflows to automate tasks, allowing designers to focus on creative aspects rather than administrative or menial tasks [2]

AMD's Impact
- AMD processors facilitate AI-assisted research and software optimization within the automotive design process [2]
- AMD workstations are considered compact and a potential "automotive workstation of the future," offering a game-changing solution for designers [3]
Ravi Kuppuswamy, AMD SVP of Server Product & Engineering
AMD· 2025-07-14 15:01
AI is moving fast. Leaders need to keep up. Ravi Kuppuswamy, Senior Vice President, Server Product & Engineering at AMD, shares 3 principles to follow:
1. Set a clear mission with simple, trackable steps
2. Use AI as a strategic tool
3. Upskill and empower teams with accountability
#AdvancingAI
AMD | Your AI Future Starts with a Partner You Trust
AMD· 2025-07-14 14:01
Trust has always been at the heart of progress. It's how ideas take flight, from test to triumph. But trust also asks us to believe in one another, and that's why trust has to be earned. It's earned by relentlessly working together to solve the most important challenges and drive results. That's why as we advance into the AI future, trust has to lead the way. So yes, we've built a roadmap we deliver on, on time. Yes, we build for the open ecosystem. And yes, we've built the broadest AI portfolio with CPUs, GPUs ...
AMD | We Co-Innovate to Bring Your AI Vision Into Focus
AMD· 2025-07-14 14:01
We co-innovate with partners to advance their AI ambitions and solve the world's most important challenges. So you can trust our solutions will match your vision. ...
AMD | Power Your Full AI Potential With Our Full Portfolio
AMD· 2025-07-14 14:01
Core Business & Technology
- The company offers a broad portfolio of CPU, GPU, and FPGA solutions to power the full potential of AI [1]
- The company aims to provide the right solution for each customer's specific AI needs [1]
AMD | Your AI Future Starts With a Partner You Trust
AMD· 2025-07-14 14:01
Trust has always been at the heart of progress. It's how ideas take flight. That's why as we advance into the AI future, trust has to lead the way. So yes, we've built a roadmap we deliver on, on time. We build for the open ecosystem. And we've built the broadest AI portfolio with CPUs, GPUs and FPGAs. But of all the things we build, trust is the most important. ...
AMD | Break Barriers to Build AI Advancements
AMD· 2025-07-14 14:00
We build for the open ecosystem to break down barriers to AI breakthroughs. So you can trust that you'll have the full freedom to innovate. Now and in the future. ...
AMD | Drive Your AI Ambitions With Our Reliable Roadmap
AMD· 2025-07-14 14:00
We've built a reliable roadmap that we deliver on, on time. So you can trust we'll deliver on your AI plans, no matter what road you take. ...