Data Transfer
X @Avi Chawla
Avi Chawla · 2025-10-19 20:32
RT Avi Chawla (@_avichawla)

Here's a neural net optimization trick that leads to ~4x faster CPU to GPU transfers.

Imagine an image classification task.
- We define the network, load the data and transform it.
- In the training loop, we transfer the data to the GPU and train.

Here's the problem with this:

If you look at the profiler:
- Most of the time/resources will be allocated to the kernel (the actual training code).
- However, a significant amount of time will also be dedicated to data transfer from CPU to GPU ...
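The post is cut off before it names the trick, so the sketch below only reproduces the setup it describes: define a network, load data, and move each batch to the GPU inside the training loop. It uses PyTorch with a synthetic dataset and a tiny model, all of which are hypothetical stand-ins. The `pin_memory` and `non_blocking` options are a commonly used PyTorch way to speed up host-to-device copies; they are included as an assumption for comparison, not as the author's confirmed method, and no particular speedup is implied.

```python
import time

import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset

# Fall back to CPU so the sketch still runs without a GPU (no transfer benefit there).
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
use_cuda = device.type == "cuda"

# Synthetic stand-in for an image dataset (hypothetical: 28x28 grayscale, 10 classes).
images = torch.randn(512, 1, 28, 28)
labels = torch.randint(0, 10, (512,))
dataset = TensorDataset(images, labels)


def train_one_epoch(pin_memory: bool, non_blocking: bool) -> dict:
    """Run one epoch and report wall-clock time and number of batches."""
    loader = DataLoader(
        dataset,
        batch_size=64,
        shuffle=True,
        # Pinned (page-locked) host memory only matters when copying to a GPU.
        pin_memory=pin_memory and use_cuda,
    )
    model = nn.Sequential(
        nn.Flatten(),
        nn.Linear(28 * 28, 128),
        nn.ReLU(),
        nn.Linear(128, 10),
    ).to(device)
    optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
    loss_fn = nn.CrossEntropyLoss()

    batches = 0
    start = time.perf_counter()
    for x, y in loader:
        # The CPU-to-GPU transfer the post refers to happens here.
        x = x.to(device, non_blocking=non_blocking)
        y = y.to(device, non_blocking=non_blocking)

        optimizer.zero_grad()
        loss = loss_fn(model(x), y)
        loss.backward()
        optimizer.step()
        batches += 1

    if use_cuda:
        # Wait for queued GPU work so the timing covers the whole epoch.
        torch.cuda.synchronize()
    return {"seconds": time.perf_counter() - start, "batches": batches}


baseline = train_one_epoch(pin_memory=False, non_blocking=False)
pinned = train_one_epoch(pin_memory=True, non_blocking=True)
print(f"baseline: {baseline['seconds']:.4f}s over {baseline['batches']} batches")
print(f"pinned + non_blocking: {pinned['seconds']:.4f}s over {pinned['batches']} batches")
```

On a machine with a GPU, wrapping the loop in `torch.profiler.profile` would show how the time splits between kernels and memory copies, which is the profiler view the post mentions. With a dataset this small the two timings are likely to be dominated by noise, so treat the printout as a template rather than a benchmark.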