X @Avi Chawla - Reportify

The core engineering behind @UnslothAI has always been impressive!

Instead of relying on PyTorch's default autograd for backpropagation, Unsloth built their own backprop kernels from scratch in OpenAI's Triton language (a Python-based language for writing GPU kernels without needing to write raw CUDA C++).

One of the reasons to do this is that the default autograd runs each operation as a separate GPU call, and each call reads and writes data back to global memory before the next one can start.

Across dozens o ...
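
To make the global-memory point concrete, here is a toy model in plain Python. It is not Unsloth's code and it does not run on a GPU: the `GlobalMemory` counter, `run_unfused`, `run_fused`, and the three example ops are all hypothetical names made up for this sketch. It only counts how many elements get read and written when each op is its own call versus when the ops are chained inside a single call.

```python
import math


class GlobalMemory:
    """Hypothetical stand-in for GPU global memory; counts element reads/writes."""

    def __init__(self):
        self.reads = 0
        self.writes = 0

    def load(self, buf):
        self.reads += len(buf)
        return list(buf)

    def store(self, values):
        self.writes += len(values)
        return list(values)


def run_unfused(x, ops, mem):
    """One simulated call per op: every call loads its input and stores its output."""
    buf = x
    for op in ops:
        data = mem.load(buf)
        buf = mem.store([op(v) for v in data])
    return buf


def run_fused(x, ops, mem):
    """One simulated call in total: intermediates stay in local variables."""
    data = mem.load(x)
    out = []
    for v in data:
        for op in ops:
            v = op(v)
        out.append(v)
    return mem.store(out)


# Three arbitrary elementwise ops, chosen only for illustration.
ops = [lambda v: v * 2.0, lambda v: v + 1.0, math.tanh]
x = [0.1 * i for i in range(1024)]

unfused_mem, fused_mem = GlobalMemory(), GlobalMemory()
y_unfused = run_unfused(x, ops, unfused_mem)
y_fused = run_fused(x, ops, fused_mem)

unfused_traffic = unfused_mem.reads + unfused_mem.writes
fused_traffic = fused_mem.reads + fused_mem.writes
print(unfused_traffic, fused_traffic)  # 6144 2048
```

Both paths produce the same values, but in this model the one-call-per-op path moves 3x as many elements through "global memory" for a 3-op chain, and the gap grows with the number of ops. Real kernels have many other costs (launch overhead, caching, occupancy), so treat this as an illustration of the argument, not a benchmark.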