CPU to GPU transfers
X @Avi Chawla
Avi Chawla · 2025-10-19 06:31
Here's a neural net optimization trick that leads to ~4x faster CPU to GPU transfers.

Imagine an image classification task.
- We define the network, load the data and transform it.
- In the training loop, we transfer the data to the GPU and train.

Here's the problem with this:

If you look at the profiler:
- Most of the time/resources will be allocated to the kernel (the actual training code).
- However, a significant amount of time will also be dedicated to data transfer from CPU to GPU (this appears under cudaMem ...
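To make the baseline setup concrete, here is a rough sketch of the kind of training loop described above, with the CPU to GPU copy timed separately from the training step. It assumes PyTorch, which this excerpt does not name; the synthetic data, tensor shapes, tiny model, and the helper names `make_loader` and `train_one_epoch` are all hypothetical stand-ins. It only illustrates the baseline and how to see the transfer cost, not the optimization itself.

```python
import time

import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset


def make_loader(n=256, batch_size=64):
    # Synthetic stand-in for a transformed image dataset (hypothetical shapes).
    images = torch.randn(n, 1, 28, 28)
    labels = torch.randint(0, 10, (n,))
    return DataLoader(TensorDataset(images, labels), batch_size=batch_size)


def train_one_epoch(model, loader, device):
    optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
    loss_fn = nn.CrossEntropyLoss()
    transfer_s, compute_s = 0.0, 0.0

    for images, labels in loader:
        # Step 1: move the batch from host memory to the target device.
        t0 = time.perf_counter()
        images, labels = images.to(device), labels.to(device)
        if device.type == "cuda":
            torch.cuda.synchronize()  # wait for the copy before reading the clock
        t1 = time.perf_counter()

        # Step 2: the actual training step (forward, backward, update).
        optimizer.zero_grad()
        loss = loss_fn(model(images), labels)
        loss.backward()
        optimizer.step()
        if device.type == "cuda":
            torch.cuda.synchronize()  # wait for the kernels before reading the clock
        t2 = time.perf_counter()

        transfer_s += t1 - t0
        compute_s += t2 - t1

    return transfer_s, compute_s


# Falls back to the CPU when no GPU is present, so the sketch still runs.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = nn.Sequential(
    nn.Flatten(),
    nn.Linear(28 * 28, 128),
    nn.ReLU(),
    nn.Linear(128, 10),
).to(device)

transfer_s, compute_s = train_one_epoch(model, make_loader(), device)
print(f"transfer: {transfer_s:.4f}s, compute: {compute_s:.4f}s")
```

On a machine with a GPU, the `transfer` figure gives a crude view of the time spent on host-to-device copies; a profiler such as `torch.profiler` would break this down per operation. On a CPU-only machine the `.to(device)` call is effectively a no-op, so the split is not meaningful there.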