Inference Chips for Agent Workflows
Y Combinator·2026-05-04 20:11

Most AI chips are designed for a world where inference means prompt in, response out. Agents don't work that way. They loop, calling tools, branching, backtracking, holding context across dozens of steps. That's a completely different hardware problem. Current GPUs hit 30 to 40% of peak utilization on these workloads because the work is bursty, bouncing between memory-bound model calls, IO-bound tool use, and CPU-bound orchestration. That gap is where purpose-built silicon wins. Nvidia ...
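The loop described above can be sketched in a few lines. This is a minimal illustration, not any real agent framework: all names (`run_model`, `call_tool`, `agent_loop`) are hypothetical stand-ins, and the comments mark which phase of the bursty workload each step corresponds to.

```python
# Minimal sketch of an agent loop, illustrating why the workload is bursty:
# the hardware alternates between three very different kinds of work.
# All function names here are hypothetical stand-ins, not a real API.

def run_model(context):
    """Stand-in for a memory-bound model forward pass."""
    # Decide the next action from the accumulated context.
    if len(context) >= 3:
        return ("answer", "done")
    return ("tool", "search")

def call_tool(name, arg):
    """Stand-in for an IO-bound tool call (HTTP request, DB query, shell)."""
    return f"{name}-result-{arg}"

def agent_loop(task, max_steps=10):
    """CPU-bound orchestration: loop, branch, and hold context across steps."""
    context = [task]
    for step in range(max_steps):
        action, payload = run_model(context)      # memory-bound phase
        if action == "answer":
            return payload, step + 1
        result = call_tool(payload, step)         # IO-bound phase
        context.append(result)                    # orchestration carries state forward
    return None, max_steps
```

Each iteration touches all three regimes, which is why a chip optimized only for dense matrix math sits idle between phases.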
