Avi Chawla
X @Avi Chawla
Avi Chawla· 2025-10-07 19:17
Technology & Performance
- LLM (Large Language Model) inference speed is affected by the use of KV caching [1]
- The tweet shares a resource comparing LLM inference speed with and without KV caching (see the sketch below) [1]
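To make the comparison concrete, here is a minimal sketch of KV caching during autoregressive decoding. The thread itself contains no code; the single-head attention, toy dimensions, and numpy setup are assumptions for illustration (real decoders cache per layer and per head):

```python
import numpy as np

def attention(q, K, V):
    # q: (d,), K: (t, d), V: (t, d) -> softmax(K @ q / sqrt(d)) @ V
    scores = K @ q / np.sqrt(q.shape[0])
    weights = np.exp(scores - scores.max())
    weights = weights / weights.sum()
    return weights @ V

d = 8
rng = np.random.default_rng(0)
Wq, Wk, Wv = rng.normal(size=(3, d, d))  # toy projection matrices

K_cache, V_cache = [], []
for step in range(5):
    x = rng.normal(size=d)   # hidden state of the newest token (toy stand-in)
    # With KV caching: compute K and V only for the new token and append.
    K_cache.append(Wk @ x)
    V_cache.append(Wv @ x)
    out = attention(Wq @ x, np.stack(K_cache), np.stack(V_cache))
    # Without the cache, K and V for every previous token would be recomputed
    # from scratch at each step, which is the slowdown the resource compares.
```

The cache turns per-step K/V computation into work on the newest token only; uncached decoding redoes that work over the entire prefix at every step.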
X @Avi Chawla
Avi Chawla· 2025-10-07 06:31
If you found it insightful, reshare it with your network. Find me → @_avichawla. Every day, I share tutorials and insights on DS, ML, LLMs, and RAGs.

Avi Chawla (@_avichawla): LLM inference speed with vs. without KV caching: https://t.co/ReVMJMteKa ...
X @Avi Chawla
Avi Chawla· 2025-10-07 06:31
The visual explains the underlying details of KV caching. I also wrote a detailed explainer thread on KV caching a few months back, if you want to learn more. Check below 👇 https://t.co/e4KILO0cEe

Avi Chawla (@_avichawla): KV caching in LLMs, clearly explained (with visuals): ...
X @Avi Chawla
Avi Chawla· 2025-10-07 06:31
Inference Optimization
- Comparison of LLM inference speed with and without KV caching [1]
X @Avi Chawla
Avi Chawla· 2025-10-06 19:22
Model Training Strategy
- The initial approach of capturing user images and training a binary classifier for face unlock is flawed due to the need for on-device training and the difficulty of obtaining "Class 0" samples [1][2]
- A Siamese Network trained via contrastive learning offers a more suitable solution for face unlock systems [2]
- Contrastive learning maps data points to a shared embedding space, where low distance indicates similarity and high distance indicates dissimilarity [3]
- The system creates a dataset of face pairs, labeling pairs of the same person as 0 and different people as 1, then trains a supervised model [3]
- A neural network generates embeddings for each image, and the distance between embeddings is minimized for similar faces and maximized for dissimilar faces using contrastive loss [4]
- The contrastive loss function, L = (1-y)*D^2 + y*max(0, margin-D)^2, guides the model to produce low distances for similar inputs and high distances for dissimilar inputs (see the sketch after this list) [5]

Face Unlock System Implementation
- During setup, the user's facial data generates a reference embedding, and subsequent unlocks compare new embeddings against this reference embedding without further training [6]
- New identities can be added by creating additional reference embeddings [6]
- During unlock, the incoming user's embedding is compared against all reference embeddings [7]
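As a minimal sketch of the loss above, assuming numpy, Euclidean distance for D, and the y=0 (same person) / y=1 (different people) labeling convention from the summary; the margin value and toy embeddings are made up for illustration:

```python
import numpy as np

def contrastive_loss(emb_a, emb_b, y, margin=1.0):
    # y = 0 for a same-person pair, y = 1 for a different-person pair,
    # matching the labeling convention described in the summary.
    D = np.linalg.norm(emb_a - emb_b)  # Euclidean distance between embeddings
    # Same pair (y=0): penalize distance squared, pulling embeddings together.
    # Different pair (y=1): penalize only while D is inside the margin,
    # pushing embeddings at least `margin` apart.
    return (1 - y) * D**2 + y * max(0.0, margin - D)**2

# Toy usage with 2-D embeddings
a = np.array([0.10, 0.90])
b = np.array([0.12, 0.88])
print(contrastive_loss(a, b, y=0))  # small loss: similar pair, low distance
print(contrastive_loss(a, b, y=1))  # large loss: labeled different but too close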
X @Avi Chawla
Avi Chawla· 2025-10-06 06:31
You're in an ML Engineer interview at Apple.

The interviewer asks: "You have to build an ML-based face unlock system for iPhones. How would you train the model?"

You: "I will capture the user's images & train a binary classifier on them."

Interview over. Here's what you missed:

There are multiple issues with capturing the user's images and training a classifier.
> Firstly, you'd need on-device training, which can be expensive.
> All images provided by the user will be "Class 1" samples. To train a binary classifier, where ...
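A minimal sketch of the unlock check described in the summary above, where the incoming embedding is compared against all stored reference embeddings; the threshold value, function name, and toy vectors are assumptions for illustration:

```python
import numpy as np

def is_match(incoming, references, threshold=0.5):
    # Unlock if the incoming embedding is within `threshold` of any stored
    # reference embedding (one per enrolled identity); no training happens here.
    return any(np.linalg.norm(incoming - ref) < threshold for ref in references)

references = [np.array([0.2, 0.7, 0.1])]  # created once during setup
probe = np.array([0.22, 0.68, 0.12])      # embedding produced at unlock time
print(is_match(probe, references))        # True: distance below threshold
```

Adding a new identity is just appending another reference embedding to the list, which is why no on-device training is needed.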
X @Avi Chawla
Avi Chawla· 2025-10-05 19:18
RT Avi Chawla (@_avichawla): JSON prompting for LLMs, clearly explained: ...
X @Avi Chawla
Avi Chawla· 2025-10-05 06:30
That's a wrap! If you found it insightful, reshare it with your network. Find me → @_avichawla. Every day, I share tutorials and insights on DS, ML, LLMs, and RAGs.

Avi Chawla (@_avichawla): JSON prompting for LLMs, clearly explained: ...
X @Avi Chawla
Avi Chawla· 2025-10-05 06:30
To summarise: Structured (JSON) prompting for LLMs is like writing modular code; it brings clarity of thought, makes adding new requirements effortless, & creates better communication with AI. It's not just a technique; it's a habit worth developing for cleaner AI interactions. ...
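As a hypothetical illustration of what such a structured prompt can look like (the field names and schema here are my assumptions; the thread does not prescribe a specific format):

```python
import json

prompt = {
    "task": "summarize",
    "input": "<article text>",
    "constraints": {"max_words": 100, "tone": "neutral"},
    "output_format": {"summary": "string", "key_points": ["string"]},
}
# Each requirement is an explicit field, so adding a new one is just adding
# a key: the "modular code" property the summary describes.
print(json.dumps(prompt, indent=2))
```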
X @Avi Chawla
Avi Chawla· 2025-10-05 06:30
JSON prompting for LLMs, clearly explained: ...