Avi Chawla
X @Avi Chawla
Avi Chawla· 2025-12-05 13:42
If you found it insightful, reshare it with your network. Find me → @_avichawla. Every day, I share tutorials and insights on DS, ML, LLMs, and RAGs. https://t.co/xGHVOyNLz1
Avi Chawla (@_avichawla): An MCP server that detects production-grade code quality issues in real-time! Even though AI is now generating code at light speed, the engineering bottleneck has just moved from writing to reviewing, and now devs spend 90% of their debugging time on AI-generated code. AI https://t.co/MoAUkPZQiF ...
X @Avi Chawla
Avi Chawla· 2025-12-05 06:31
SonarQube MCP server (don't forget to star it ⭐): https://t.co/oqcTUdZsWE ...
X @Avi Chawla
Avi Chawla· 2025-12-05 06:31
An MCP server that detects production-grade code quality issues in real-time! Even though AI is now generating code at light speed, the engineering bottleneck has just moved from writing to reviewing, and now devs spend 90% of their debugging time on AI-generated code. AI reviewers aren't that reliable either, because they share the same fundamental blind spots as AI generators do:
- They pattern match, not proof check.
- They validate syntax, not system behavior.
- They review code, not consequences.
I have been ...
X @Avi Chawla
Avi Chawla· 2025-12-04 19:38
LLM Fine-tuning Techniques
- Traditional fine-tuning is impractical for LLMs due to the large number of parameters (billions) and data size (hundreds of GBs), leading to the development of parameter-efficient fine-tuning (PEFT) [1]
- PEFT techniques involve finding a lower-rank adaptation of LLM weight matrices [2]

Specific PEFT Techniques
- **LoRA (Low-Rank Adaptation):** Adds two low-rank trainable matrices (A and B) alongside the weight matrices and adjusts these low-rank matrices instead of fine-tuning the original weights, significantly reducing memory usage [3] (see the sketch after this list)
- **LoRA-FA (Frozen-A):** Freezes matrix A in LoRA and only updates matrix B, further reducing activation memory requirements [4]
- **VeRA:** Freezes matrices A and B, shares them across all layers, and learns layer-specific scaling vectors instead [4]
- **Delta-LoRA:** Also tunes the original weight matrix W by adding the difference (delta) between the product of matrices A and B in two consecutive training steps [4][5]
- **LoRA+:** Sets a higher learning rate for matrix B than for matrix A in LoRA, resulting in better convergence [6]
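The core mechanic is easy to see in code. Below is a minimal PyTorch sketch of the LoRA idea, not code from the post: the class name `LoRALinear`, the rank `r=8`, and the `alpha` scaling are illustrative choices, but the structure (a frozen base weight W plus a trainable low-rank product A·B added to the output) follows the description above.

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Frozen linear layer plus a trainable low-rank (LoRA) update."""
    def __init__(self, base: nn.Linear, r: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False          # the original weights W stay frozen
        # A projects down to rank r, B projects back up; only these two are trained
        self.A = nn.Parameter(torch.randn(base.in_features, r) * 0.01)
        self.B = nn.Parameter(torch.zeros(r, base.out_features))  # zero init -> no change at step 0
        self.scale = alpha / r

    def forward(self, x):
        # W x  +  (x A) B * scale  -> the learned update has rank at most r
        return self.base(x) + (x @ self.A @ self.B) * self.scale

layer = LoRALinear(nn.Linear(1024, 1024), r=8)
trainable = sum(p.numel() for p in layer.parameters() if p.requires_grad)
print(trainable)  # 2 * 1024 * 8 = 16,384 trainable params vs. ~1.05M in the frozen base layer
```

Under this sketch, LoRA-FA would amount to additionally freezing `self.A`, and LoRA+ to giving `self.B` a higher learning rate than `self.A` via optimizer parameter groups.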
X @Avi Chawla
Avi Chawla· 2025-12-04 06:30
I have been fine-tuning LLMs for over 2 years now! Here are the top 5 LLM fine-tuning techniques, explained with visuals.
First of all, what's so different about LLM fine-tuning? Traditional fine-tuning is impractical for LLMs (billions of params; 100s of GBs). Since this kind of compute isn't accessible to everyone, parameter-efficient fine-tuning (PEFT) came into existence.
Before we go into details of each technique, here's some background that will help you better understand these techniques: LLM weights are matrices ...
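To see why the lower-rank route saves so much, here is a quick back-of-the-envelope count; the hidden size of 4096 and the rank of 8 are assumptions for illustration, not numbers from the post.

```python
# Illustrative parameter count for a single weight matrix (dimensions are assumptions)
d = 4096                          # hidden size of one LLM layer
r = 8                             # adapter rank

full_update = d * d               # updating W directly: 16,777,216 trainable params
low_rank_update = d * r + r * d   # training A (d x r) and B (r x d): 65,536 trainable params

print(f"reduction: {full_update / low_rank_update:.0f}x")  # ~256x fewer trainable params per matrix
```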
X @Avi Chawla
Avi Chawla· 2025-12-03 19:06
RT Avi Chawla (@_avichawla): Bias-variance tradeoff has a missing detail! Not many ML engineers know about it. Consider fitting a polynomial regression model on some dummy dataset, say, y = sin(x) + noise. As shown in the first plot in the image, as we increase the degree (m):
- The training loss will go down to zero.
- The test (or validation) loss will decrease and then increase.
But notice what happens as we continue to increase the degree (m):
↳ Test loss decreases again (shown in the second plot)
This is called th ...
X @Avi Chawla
Avi Chawla· 2025-12-03 13:19
If you found it insightful, reshare it with your network. Find me → @_avichawla. Every day, I share tutorials and insights on DS, ML, LLMs, and RAGs. https://t.co/pFxYUsWxlv
Avi Chawla (@_avichawla): Bias-variance tradeoff has a missing detail! Not many ML engineers know about it. Consider fitting a polynomial regression model on some dummy dataset, say, y = sin(x) + noise. As shown in the first plot in the image, as we increase the degree (m): - The training loss will go https://t.co/BIdCfkZRHO ...
X @Avi Chawla
Avi Chawla· 2025-12-03 06:44
Here's the exact time stamp where Ilya Sutskever talks about it on Lex Fridman's podcast: https://t.co/HermsBI3eB https://t.co/3A3EtGpDqx ...
X @Avi Chawla
Avi Chawla· 2025-12-03 06:44
Bias-variance tradeoff has a missing detail! Not many ML engineers know about it. Consider fitting a polynomial regression model on some dummy dataset, say, y = sin(x) + noise. As shown in the first plot in the image, as we increase the degree (m):
- The training loss will go down to zero.
- The test (or validation) loss will decrease and then increase.
But notice what happens as we continue to increase the degree (m):
↳ Test loss decreases again (shown in the second plot)
This is called the “double descent phenomenon” ...
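A minimal way to reproduce the setup described above is to sweep the polynomial degree past the interpolation threshold and track train/test error. The sketch below is an illustration under stated assumptions, not the author's exact experiment: it uses a Legendre feature basis and NumPy's minimum-norm least-squares solution, and the exact shape of the second descent depends on the noise level, sample size, and basis.

```python
import numpy as np

rng = np.random.default_rng(0)
n_train, n_test = 20, 200
x_train = rng.uniform(-3, 3, n_train)
x_test = rng.uniform(-3, 3, n_test)
y_train = np.sin(x_train) + 0.1 * rng.normal(size=n_train)
y_test = np.sin(x_test)                     # noiseless target for measuring test error

def features(x, m):
    # Legendre polynomial features up to degree m, with x rescaled to [-1, 1] for conditioning
    return np.polynomial.legendre.legvander(x / 3.0, m)

for m in [1, 3, 5, 10, 19, 30, 100, 300]:
    Xtr, Xte = features(x_train, m), features(x_test, m)
    # lstsq returns the minimum-norm solution once m + 1 > n_train (the overparameterized regime)
    w, *_ = np.linalg.lstsq(Xtr, y_train, rcond=None)
    train_mse = np.mean((Xtr @ w - y_train) ** 2)
    test_mse = np.mean((Xte @ w - y_test) ** 2)
    print(f"degree {m:>3}: train MSE {train_mse:.4f}  test MSE {test_mse:.4f}")
```

Classically, test error peaks when the number of features (m + 1) approaches the number of training points, and can fall again beyond it, which is the second descent the post refers to.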
X @Avi Chawla
Avi Chawla· 2025-12-02 19:45
RT Avi Chawla (@_avichawla): Few people know this about L2 regularization (hint: it is NOT just a regularization technique). Most models use L2 regularization for just one thing:
↳ Reduce overfitting.
However, L2 regularization is also a great remedy for multicollinearity. Multicollinearity arises when:
→ Two (or more) features are highly correlated, OR
→ Two (or more) features can predict another feature.
To understand how L2 regularization addresses multicollinearity, consider a dataset with two features and ...
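A small sketch of the multicollinearity point (the data-generating process and the penalty strength `lam=1.0` below are illustrative assumptions, not from the post): with two nearly identical features, plain least squares can only pin down the sum of the two coefficients, so the individual estimates swing unstably, while the L2 term makes the normal equations well-conditioned.

```python
import numpy as np

rng = np.random.default_rng(42)
n = 200
x1 = rng.normal(size=n)
x2 = x1 + 0.01 * rng.normal(size=n)          # x2 is almost a copy of x1 -> multicollinearity
X = np.column_stack([x1, x2])
y = 3 * x1 + 2 * x2 + rng.normal(size=n)     # true coefficients: (3, 2)

def fit(X, y, lam):
    # Closed-form solution of (X^T X + lam * I) w = X^T y; lam = 0 gives plain OLS
    return np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ y)

print("OLS  :", fit(X, y, lam=0.0))   # X^T X is nearly singular: estimates can drift far from (3, 2)
print("Ridge:", fit(X, y, lam=1.0))   # the L2 penalty stabilizes the split between the two features
```

The λ added to the diagonal of X^T X is exactly what keeps the inversion stable when the two feature columns are nearly identical.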