X @Avi Chawla
Avi Chawla · 2025-11-11 20:14
RT Avi Chawla (@_avichawla)
Transformer and Mixture of Experts in LLMs, explained visually!

Mixture of Experts (MoE) is a popular architecture that uses different experts to improve Transformer models.

Transformer and MoE differ in the decoder block:
- Transformer uses a single feed-forward network.
- MoE uses experts, which are also feed-forward networks, but smaller than the one in the Transformer.

During inference, only a subset of experts is selected. This makes inference faster in MoE. Also, since the network has multiple decod ...
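
A minimal sketch of the difference, assuming PyTorch-style modules (the expert count, hidden sizes, and top-2 routing below are illustrative assumptions, not details from the thread):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class FeedForward(nn.Module):
    """The dense feed-forward network in a standard Transformer decoder block."""
    def __init__(self, d_model=512, d_hidden=2048):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(d_model, d_hidden), nn.GELU(), nn.Linear(d_hidden, d_model)
        )

    def forward(self, x):
        return self.net(x)

class MoELayer(nn.Module):
    """Replaces the single dense FFN with several smaller experts; a router
    picks the top-k experts per token, so only a subset runs at inference."""
    def __init__(self, d_model=512, d_hidden=256, num_experts=8, k=2):
        super().__init__()
        # Each expert is a smaller FFN than the dense Transformer FFN above.
        self.experts = nn.ModuleList(
            [FeedForward(d_model, d_hidden) for _ in range(num_experts)]
        )
        self.router = nn.Linear(d_model, num_experts)  # gating network
        self.k = k

    def forward(self, x):                       # x: (tokens, d_model)
        gate_logits = self.router(x)            # (tokens, num_experts)
        weights, indices = gate_logits.topk(self.k, dim=-1)
        weights = F.softmax(weights, dim=-1)    # normalize over the chosen k
        out = torch.zeros_like(x)
        for slot in range(self.k):
            for e, expert in enumerate(self.experts):
                mask = indices[:, slot] == e    # tokens routed to expert e
                if mask.any():
                    out[mask] += weights[mask, slot].unsqueeze(-1) * expert(x[mask])
        return out

tokens = torch.randn(4, 512)                    # 4 tokens, d_model = 512
print(MoELayer()(tokens).shape)                 # torch.Size([4, 512])
```

Because only k of the experts run per token, the compute per token stays small even though the total parameter count grows with the number of experts.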