Mixtral 8x7B
X @Avi Chawla
Avi Chawla · 2025-11-11 20:14
RT Avi Chawla (@_avichawla): Transformer and Mixture of Experts in LLMs, explained visually! Mixture of Experts (MoE) is a popular architecture that uses different experts to improve Transformer models. Transformer and MoE differ in the decoder block:
- Transformer uses a single feed-forward network.
- MoE uses multiple experts, which are feed-forward networks smaller than the Transformer's.
During inference, only a subset of experts is selected for each token, which makes inference faster in MoE. Also, since the network has multiple decod ...
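
The routing idea described above can be sketched in a few lines of PyTorch. This is a minimal illustration, not Mixtral's actual implementation: the dimensions, expert size, and top-2 routing choice below are assumptions made for the example.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class Expert(nn.Module):
    """A small feed-forward network used in place of the single dense FFN
    of a standard Transformer decoder block."""

    def __init__(self, d_model: int, d_hidden: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(d_model, d_hidden),
            nn.GELU(),
            nn.Linear(d_hidden, d_model),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.net(x)


class MoELayer(nn.Module):
    """Routes each token to its top-k experts and mixes their outputs
    with the router's softmax weights."""

    def __init__(self, d_model: int, d_hidden: int, n_experts: int = 8, top_k: int = 2):
        super().__init__()
        self.experts = nn.ModuleList(Expert(d_model, d_hidden) for _ in range(n_experts))
        self.router = nn.Linear(d_model, n_experts)
        self.top_k = top_k

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq, d_model) -> flatten to (tokens, d_model)
        tokens = x.reshape(-1, x.shape[-1])
        logits = self.router(tokens)                    # (tokens, n_experts)
        weights, idx = logits.topk(self.top_k, dim=-1)  # pick top-k experts per token
        weights = F.softmax(weights, dim=-1)

        out = torch.zeros_like(tokens)
        for slot in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = idx[:, slot] == e                # tokens routed to expert e in this slot
                if mask.any():
                    out[mask] += weights[mask, slot, None] * expert(tokens[mask])
        return out.reshape(x.shape)


# Usage: only 2 of the 8 experts run for any given token.
layer = MoELayer(d_model=64, d_hidden=256)
print(layer(torch.randn(2, 10, 64)).shape)  # torch.Size([2, 10, 64])
```

Each token still passes through a feed-forward computation, but only the selected experts' weights are touched, which is where the inference speedup comes from.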
X @Avi Chawla
Avi Chawla · 2025-06-14 06:30
Model Architecture
- Mixture of Experts (MoE) models activate only a fraction of their parameters during inference, leading to faster inference [1]
- Mixtral 8x7B by MistralAI is a popular MoE-based Large Language Model (LLM) [1]
- Llama 4 is another popular MoE-based LLM [1]
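
To make the "fraction of parameters" point concrete, here is a back-of-the-envelope sketch. The parameter counts below are hypothetical placeholders, not Mixtral 8x7B's published figures; only the 8-expert / top-2 routing ratio drives the conclusion.

```python
# Illustrative arithmetic: with 8 experts per layer but only 2 routed per
# token, just 2/8 of the expert weights participate in each forward pass.
n_experts, top_k = 8, 2
expert_params = 5.0e9   # hypothetical: total expert parameters per model
shared_params = 2.0e9   # hypothetical: attention, embeddings, routers

total  = shared_params + expert_params
active = shared_params + expert_params * top_k / n_experts
print(f"total ~ {total / 1e9:.1f}B params, active per token ~ {active / 1e9:.1f}B params")
```

The total parameter count determines memory footprint, while the active count per token determines compute per inference step; MoE trades the former for a reduction in the latter.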