Avi Chawla·2026-03-08 06:33
Transformer and Mixture of Experts, explained visually!

Mixture of Experts (MoE) is a popular architecture that uses different experts to improve Transformer models.

Transformer and MoE differ in the decoder block:
- Transformer uses a single feed-forward network.
- MoE uses several experts, which are also feed-forward networks but smaller than the one in a Transformer.

During inference, only a subset of experts is selected for each token. This makes inference faster in MoE.

Also, since the network has multiple decoder layers:
- The text passes through ...
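To make the idea concrete, here is a minimal PyTorch sketch of an MoE layer with top-k routing. It is an illustration, not any specific model's implementation: the names (`Expert`, `MoELayer`), the sizes, and the choice of `top_k = 2` are assumptions for the example.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class Expert(nn.Module):
    """A small feed-forward network, one of several inside an MoE layer."""
    def __init__(self, d_model, d_hidden):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(d_model, d_hidden),
            nn.GELU(),
            nn.Linear(d_hidden, d_model),
        )

    def forward(self, x):
        return self.net(x)

class MoELayer(nn.Module):
    """Replaces the Transformer's single feed-forward block with several
    smaller experts plus a router that picks top_k experts per token."""
    def __init__(self, d_model, d_hidden, num_experts=8, top_k=2):
        super().__init__()
        self.experts = nn.ModuleList(
            Expert(d_model, d_hidden) for _ in range(num_experts)
        )
        self.router = nn.Linear(d_model, num_experts)  # gating network
        self.top_k = top_k

    def forward(self, x):
        # x: (batch, seq_len, d_model) -> flatten tokens for routing
        batch, seq_len, d_model = x.shape
        tokens = x.reshape(-1, d_model)

        # The router scores each token against every expert; keep the top_k.
        logits = self.router(tokens)                        # (tokens, num_experts)
        weights, indices = logits.topk(self.top_k, dim=-1)  # (tokens, top_k)
        weights = F.softmax(weights, dim=-1)

        out = torch.zeros_like(tokens)
        # Only the selected experts run on each token (sparse compute).
        for e, expert in enumerate(self.experts):
            mask = indices == e                       # (tokens, top_k) bool
            token_ids, slot = mask.nonzero(as_tuple=True)
            if token_ids.numel() == 0:
                continue
            expert_out = expert(tokens[token_ids])
            out[token_ids] += weights[token_ids, slot].unsqueeze(-1) * expert_out

        return out.reshape(batch, seq_len, d_model)

# Example usage: only top_k of num_experts experts run for each token.
layer = MoELayer(d_model=512, d_hidden=1024, num_experts=8, top_k=2)
y = layer(torch.randn(2, 16, 512))  # (2, 16, 512)
```

The design trade-off this sketch shows: the layer holds many expert parameters in total, but each token only pays the compute cost of `top_k` small experts, which is why inference is faster than running one large feed-forward network.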