X @Avi Chawla
Avi Chawla · 2026-03-16 09:17
Big release from Kimi! They just released a new way to handle residual connections in Transformers.

In a standard Transformer, every sub-layer (attention or MLP) computes an output and adds it back to the input via a residual connection. If you consider this across 40+ layers, the hidden state at any layer is just the equal-weighted sum of all previous layer outputs. Every layer contributes with weight = 1, so every layer gets equal importance.

This creates a problem called PreNorm dilution, where as the hidden st ...
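The equal-weighting claim above can be verified with a tiny sketch (my own illustration, not Kimi's code): with the standard update `h = h + sublayer(h)`, the hidden state after L sub-layers is exactly the input plus the unweighted sum of every sub-layer output.

```python
# Minimal sketch of a pre-norm residual stream. `sublayer` is a
# hypothetical stand-in for attention/MLP; real models apply it to a
# normalized copy of h, but the residual arithmetic is the same.

def sublayer(i, h):
    # Toy layer-dependent update (placeholder for attn/MLP output).
    return 0.1 * (i + 1)

def forward(x0, num_layers):
    h = x0
    outputs = []
    for i in range(num_layers):
        out = sublayer(i, h)
        outputs.append(out)
        h = h + out  # residual connection: implicit weight = 1 per layer
    return h, outputs

h, outs = forward(1.0, 4)
# Final hidden state = input + equal-weighted sum of all sub-layer outputs.
assert abs(h - (1.0 + sum(outs))) < 1e-12
```

Because every `out` enters the sum with the same weight of 1, no layer's contribution can dominate or fade by design, which is the setup for the dilution problem the post describes.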