Distillation
X @Avi Chawla
Avi Chawla· 2026-03-05 06:31
You're in a Research Scientist interview at DeepMind. The interviewer asks: "Our investors want us to contribute to open-source. Gemini crushed benchmarks. But we'll lose our competitive edge by open-sourcing it. What do we do?"
You: "Release a research paper."
Here's what you missed: LLMs today don't just learn from raw text; they also learn from each other. For example:
- Llama 4 Scout & Maverick were trained using Llama 4 Behemoth.
- Gemma 2 and 3 were trained using Gemini.
Distillation helps us do so, and the visual expla ...
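The "learning from each other" the post describes is knowledge distillation: a student model is trained to match the teacher's temperature-softened output distribution rather than only hard labels. A minimal sketch of the classic soft-label loss in pure Python (the function names and example logits here are illustrative, not from the post):

```python
import math

def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax over a list of logits."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(teacher_logits, student_logits, temperature=2.0):
    """KL(teacher || student) on temperature-softened distributions.

    The T^2 factor is the usual gradient-scale correction so the soft
    loss stays comparable across temperatures.
    """
    p = softmax(teacher_logits, temperature)  # soft teacher targets
    q = softmax(student_logits, temperature)  # student predictions
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
    return temperature ** 2 * kl

teacher = [2.0, 0.5, -1.0]
# A student that already matches the teacher incurs ~zero loss;
# a mismatched one incurs a positive loss to minimize.
assert distillation_loss(teacher, teacher) < 1e-9
assert distillation_loss(teacher, [0.0, 0.0, 0.0]) > 0.0
```

In practice the student minimizes a weighted sum of this soft loss and the ordinary cross-entropy on ground-truth labels; a higher temperature exposes more of the teacher's "dark knowledge" about relative class similarities.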
X @Anthropic
Anthropic· 2026-02-23 18:15
Distillation can be legitimate: AI labs use it to create smaller, cheaper models for their customers.But foreign labs that illicitly distill American models can remove safeguards, feeding model capabilities into their own military, intelligence, and surveillance systems. ...
X @Avi Chawla
Avi Chawla· 2025-09-29 06:33
You're in a Research Scientist interview at OpenAI. The interviewer asks: "Our investors want us to contribute to open-source. o3 crushed benchmarks. But we could lose our competitive edge by open-sourcing it. What do we do?"
You: "Release the research paper."
Interview over. You forgot that LLMs don't just learn from raw text; they also learn from each other. For example:
- Llama 4 Scout & Maverick were trained using Llama 4 Behemoth.
- Gemma 2 and 3 were trained using Gemini.
Distillation helps us do so, and the visual e ...
360Brew: LLM-based Personalized Ranking and Recommendation - Hamed and Maziar, LinkedIn AI
AI Engineer· 2025-07-16 17:59
Model Building and Training
- LinkedIn leverages large language models (LLMs) for personalization and ranking tasks, aiming to use one model for all tasks [2][3]
- User information is converted into prompts, a method called "promptification" [8]
- LinkedIn builds a large foundation model, Blue XL, with 150 billion parameters, then distills it to smaller, more efficient models, such as a 3B model for production [12]
- Distilling from a large model is more effective than training a small model from scratch [14]
- Increasing data, model size (up to 8x22B), and context length can improve model performance, but longer contexts may require model adjustments [17][18][19]

Model Performance and Generalization
- The model improves performance for cold-start users, with the gap over production models growing as user interactions decrease [21]
- The model generalizes to new domains, performing on par with or better than task-specific production models on out-of-domain tasks [23]

Model Serving and Optimization
- LinkedIn focuses on model sparsification, pruning, and quantization to improve throughput and reduce latency in production [26]
- Gradual pruning combined with distillation is more effective than aggressive one-shot pruning, minimizing information loss [29][30]
- Mixed precision, using FP8 for activations and model parameters but FP32 for the LM head, is crucial for maintaining prediction precision [31][32]
- Sparsifying attention scores can reduce latency by scoring multiple candidate items in one pass without the items attending to each other [34][35]
- LinkedIn achieved a 7x reduction in latency and a 30x increase in throughput per GPU through these optimization techniques [36]
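The talk's point about gradual pruning beating aggressive pruning can be sketched with a standard cubic sparsity schedule plus magnitude pruning. This is a toy illustration of the general technique, not LinkedIn's implementation; the function names, schedule shape, and example weights are assumptions:

```python
def sparsity_at(step, total_steps, final_sparsity=0.8, initial_sparsity=0.0):
    """Cubic ramp from initial to final sparsity: prune a little at each
    step so the model can recover (e.g. via distillation) in between,
    instead of removing 80% of weights in one aggressive shot."""
    frac = min(step / total_steps, 1.0)
    return final_sparsity + (initial_sparsity - final_sparsity) * (1 - frac) ** 3

def magnitude_prune(weights, sparsity):
    """Zero out the smallest-magnitude fraction of a flat weight list."""
    k = int(len(weights) * sparsity)
    if k == 0:
        return list(weights)
    threshold = sorted(abs(w) for w in weights)[k - 1]
    return [0.0 if abs(w) <= threshold else w for w in weights]

weights = [0.1, -0.5, 0.3, 2.0, -0.05]
# Early in training almost nothing is pruned; by the end, 80% is.
assert sparsity_at(0, 100) == 0.0
assert abs(sparsity_at(100, 100) - 0.8) < 1e-9
# At 40% sparsity the two smallest-magnitude weights are zeroed.
assert magnitude_prune(weights, 0.4).count(0.0) == 2
```

In a real pipeline, each pruning step would be followed by a few epochs of distillation against the unpruned teacher, which is what lets the gradual schedule retain accuracy that one-shot pruning loses.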