Llama 4 Scout
X @Avi Chawla
Avi Chawla · 2026-03-05 20:00
RT Avi Chawla (@_avichawla): You're in a Research Scientist interview at DeepMind. The interviewer asks: "Our investors want us to contribute to open-source. Gemini crushed benchmarks. But we'll lose competitive edge by open-sourcing it. What to do?" You: "Release a research paper." Here's what you missed: LLMs today don't just learn from raw text; they also learn from each other. For example:
- Llama 4 Scout & Maverick were trained using Llama 4 Behemoth.
- Gemma 2 and 3 were trained using Gemini.
Distillation helps us ...
CICC (Ten-Year AI Outlook): Crossing the Boundary of "Forgetting": The Three-Layer Architecture of Model Memory and Industry Opportunities
CICC · 2026-02-24 14:20
Investment Rating
- The report maintains the profit forecasts, target prices, and ratings for relevant companies unchanged [6]

Core Insights
- The evolution of large models is fundamentally a history of combating "forgetting." The lack of a memory retention architecture leads to costly "repeated calculations" each time historical information is processed. Current models face the physical limits of memory walls and context windows. The report suggests that the AI infrastructure battlefield will increasingly focus on "model memory" starting in 2026 [3][14]
- The report introduces a three-layer memory framework: short-term, medium-term, and long-term memory, each corresponding to different software and hardware requirements. This framework aims to provide a structured analysis paradigm for investment logic in AI infrastructure [14][20]

Summary by Sections

Short-term Memory
- Short-term memory constitutes the "current view" of a large model during a single inference task. It is characterized by high-frequency read/write and extreme sensitivity to latency. The core challenge lies in the dual occupation of memory capacity and bandwidth by the KV Cache. Software optimizations include PagedAttention virtualization and cutting-edge architectures like Infini-attention to support million-token context windows. Key hardware elements include HBM and on-chip SRAM [4][30][50]

Medium-term Memory
- Medium-term memory ensures situational continuity across sessions and is foundational for agents. The need for cross-session windows indicates a shift from stateless short-term intelligence to a complex system capable of dynamic "storage-retrieval-update-forget" management. Software advancements like GraphRAG and MemoryOS facilitate this transition, while hardware requirements include large-capacity DRAM and enterprise-grade SSDs to address high-concurrency random read/write bottlenecks [4][56]

Long-term Memory
- Long-term memory supports the transition from pre-training to "continuous evolution." The need for real-time updates blurs the line between model training and inference. Long-term memory aims to break the limitation of pre-training cut-off dates, allowing continuous knowledge accumulation through implicit parameters, explicit semantics, and parameterized lookup tables. This new paradigm will drive demand for various databases and compute-storage hardware [5][21]

Hardware and Software Requirements
- The report outlines the hardware and software requirements for each memory layer, emphasizing the need for high-bandwidth memory (HBM), large-capacity DRAM, and enterprise SSDs. It also highlights the importance of software solutions like KV Cache management and advanced attention mechanisms to optimize memory usage and enhance performance [16][50][64]
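The KV Cache pressure on short-term memory is easy to make concrete with a back-of-envelope estimate. A minimal sketch (the 70B-class model shape below is illustrative, not taken from the report):

```python
def kv_cache_bytes(n_layers, n_kv_heads, head_dim, seq_len, dtype_bytes=2):
    """Per-request KV cache size: a K and a V tensor (hence the factor 2)
    are stored for every layer at every token position."""
    return 2 * n_layers * n_kv_heads * head_dim * seq_len * dtype_bytes

# Illustrative 70B-class config: 80 layers, 8 KV heads (GQA), head_dim 128, fp16.
size = kv_cache_bytes(n_layers=80, n_kv_heads=8, head_dim=128, seq_len=128_000)
print(f"KV cache at a 128k context: {size / 2**30:.1f} GiB")  # ~39.1 GiB
```

At million-token contexts the same arithmetic lands in the hundreds of GiB per request, which is why the report pairs HBM capacity with software such as PagedAttention.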
2026年投资峰会速递:AI产业新范式
HTSC · 2025-11-10 12:07
Investment Rating
- The report maintains an "Overweight" rating for the technology and computer sectors [7].

Core Insights
- The AI industry is entering a new paradigm characterized by Scaling Law 2.0, in which synthetic data raises the training-data ceiling and the Mid Training paradigm reshapes model evolution paths [2][3].
- The commercial application of AI is transitioning into a scaling phase, with the integration of agent capabilities and transaction loops accelerating industry adoption [2][6].

Summary by Sections

Models
- Computing power expansion remains the core growth engine: training compute for representative models grew at an annual rate of 4-5x from 2010 to 2024, with leading models reaching up to 9x [3][13].
- The cost of a complete training run for frontier models is projected to reach the billion-dollar level by 2027 [3][13].

Training
- The Mid Training paradigm expands training boundaries by integrating reinforcement learning (RL) into the middle stage, enhancing data generation and optimal allocation [4][16].
- This approach significantly increases data utilization efficiency and is expected to break traditional performance limits [4][16].

Agents
- GPT-5 establishes a "unified system" direction, promoting standardization in agent architecture through adaptive collaboration between fast and deep thinking [5][19].
- A real-time router dynamically allocates computing resources based on task complexity, improving response efficiency and stability in complex scenarios [5][19].

Applications
- The integration of agent capabilities into commercial transactions marks a new phase of AI applications, with OpenAI's Agentic Commerce Protocol enabling AI agents to execute purchases directly [6][22].
- The global AI application landscape is evolving through three stages: productization in 2023, commercialization trials in 2024, and scaled deployment in 2025 [25][26].
- Domestic AI applications are accelerating, with significant advances in commercial capabilities following the release of models like DeepSeek-R1 [26].
X @Avi Chawla
Avi Chawla · 2025-09-29 06:33
You're in a Research Scientist interview at OpenAI. The interviewer asks: "Our investors want us to contribute to open-source. o3 crushed benchmarks. But we can lose a competitive edge by open-sourcing it. What do we do?" You: "Release the research paper." Interview over. You forgot that LLMs don't just learn from raw text; they also learn from each other. For example:
- Llama 4 Scout & Maverick were trained using Llama 4 Behemoth.
- Gemma 2 and 3 were trained using Gemini.
Distillation helps us do so, and the visual e ...
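The distillation the thread refers to trains a small student to match a large teacher's output distribution instead of hard labels alone. A minimal sketch of the classic soft-label objective (the logits below are made up for illustration; this shows the generic technique, not any lab's actual recipe):

```python
import numpy as np

def softmax(logits, T=1.0):
    """Temperature-softened softmax; higher T spreads probability mass."""
    z = np.asarray(logits, dtype=float) / T
    e = np.exp(z - z.max())
    return e / e.sum()

def distill_loss(student_logits, teacher_logits, T=2.0):
    """KL(teacher || student) on temperature-softened distributions,
    scaled by T**2 so gradient magnitudes stay comparable across T."""
    p = softmax(teacher_logits, T)  # soft targets from the teacher
    q = softmax(student_logits, T)  # student's predictions
    return T**2 * float(np.sum(p * (np.log(p) - np.log(q))))

teacher = [4.0, 1.0, 0.2]  # hypothetical teacher logits over 3 tokens
student = [2.5, 1.5, 0.5]  # hypothetical student logits
print(f"distillation loss: {distill_loss(student, teacher):.4f}")
```

The loss is zero when the student exactly matches the teacher and positive otherwise, which is what lets a Behemoth-scale teacher shape a Scout-scale student.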
Reshaping the Memory Architecture: LLMs Are Getting an "Operating System"
机器之心 · 2025-07-16 04:21
Core Viewpoint
- The article discusses the limitations of large language models (LLMs) regarding their context window and memory management, emphasizing the need for improved memory systems to enhance their long-term interaction capabilities [5][6][9].

Context Window Evolution
- Modern LLMs typically have a limited context window: early models like GPT-3 handled around 2,048 tokens, while newer models like Meta's Llama 4 Scout claim to manage up to 10 million tokens [2][4].

Memory Management in LLMs
- LLMs face an inherent "memory defect" due to their limited context window, which hampers their ability to maintain consistency in long-term interactions [5][6].
- Recent research has focused on memory management systems like MemOS, which treat memory as a critical resource alongside computational power, allowing for continuous updates and self-evolution of LLMs [9][49].

Long Context Processing Capabilities
- Long context processing encompasses:
  - Length generalization, which allows models to extrapolate to sequences longer than those seen during training [12].
  - Efficient attention mechanisms that reduce computational and memory costs [13].
  - Information retention, the model's capacity to use distant information effectively [14].
  - Prompt design that maximizes the advantages of long context [15].

Types of Memory in LLMs
- Memory can be categorized into:
  - Event memory, which records past interactions and actions [18].
  - Semantic memory, encompassing accessible external knowledge and understanding of the model's own capabilities [19].
  - Procedural memory, related to the operational structure of the system [20].

Methods to Enhance Memory and Context
- Several methods improve LLM memory and context capabilities:
  - Retrieval-augmented generation (RAG), which enhances knowledge retrieval for LLMs [27][28].
  - Hierarchical summarization, which recursively summarizes content to manage inputs exceeding the model's context length [31].
  - Sliding window inference, which processes long texts in overlapping segments [32].

Memory System Design
- Memory systems in LLMs are akin to databases, integrating lifecycle management and persistent representation capabilities [47][48].
- Recent advancements include memory operating systems like MemOS, which use a layered memory architecture to manage short-term, medium-term, and long-term memory [52][54].

Innovative Memory Approaches
- New memory systems such as MIRIX and Larimar draw inspiration from human memory structures, enhancing LLMs' ability to update and generalize knowledge rapidly [58][60].
- These systems aim to improve memory efficiency and model inference performance through flexible memory mechanisms [44].
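Of the methods listed above, sliding window inference is the simplest to show concretely. A minimal chunking sketch (window and overlap sizes are illustrative; a real system would measure them in model tokens):

```python
def sliding_windows(tokens, window=8, overlap=2):
    """Split a long sequence into overlapping segments so each fits the
    model's context limit; the overlap carries local context across cuts."""
    step = window - overlap
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + window])
        if start + window >= len(tokens):
            break
    return chunks

doc = list(range(20))  # stand-in for a 20-token document
for chunk in sliding_windows(doc):
    print(chunk)  # each model call would see one of these segments
```

Hierarchical summarization then composes on top of this: summarize each segment, concatenate the summaries, and recurse until the result fits in one window.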
With AI Competition Bearing Down, Meta Finally Moves into Venture Capital
虎嗅APP · 2025-07-07 10:36
Core Viewpoint
- Meta's CEO Mark Zuckerberg is under pressure to enhance the company's AI capabilities and is adopting a more hands-on approach to management, including establishing a Corporate Venture Capital (CVC) unit to attract top talent and improve performance in the AI sector [2][8].

Group 1: Meta's Current Challenges
- Zuckerberg's recent management style has shifted to a more direct, micro-level approach, reallocating resources to the GenAI team to boost the performance of LLaMA [2][4].
- Talent retention is a growing concern at Meta, with reports of AI engineers leaving for competitors like OpenAI and Anthropic, often with offers exceeding $2 million [6][7].
- The AI landscape is increasingly competitive, with Meta's LLaMA struggling to keep pace with rivals like Qwen and DeepSeek, leading to a perception of stagnation in Meta's AI initiatives [6][12].

Group 2: Establishment of a CVC
- Historically, Meta has not had a dedicated CVC, relying instead on its corporate development teams for acquisitions [4][5].
- The decision to form a CVC is part of Zuckerberg's broader strategy to create a "superintelligence unit" aimed at revitalizing Meta's AI efforts [8][10].
- Meta's investment in the venture fund NFDG, led by Daniel Gross, is a strategic move to gain access to top talent and innovative projects in AI [9][12].

Group 3: Financial Implications and Market Dynamics
- The AI investment landscape is currently dominated by corporate investment, which accounted for approximately 75% of total funding in 2023, indicating a scarcity of high-quality targets [12][13].
- Meta's recent $14.8 billion acquisition of a 49% stake in Scale AI is seen as a critical step in its strategy to bolster its AI capabilities [7][12].
- The number of AI startups has decreased significantly, with a reported 81% drop in new AI companies since the 2021 peak, complicating Meta's efforts to secure talent and technology [12][13].
A 13-Trillion-Yuan Giant Moves into CVC
36Kr · 2025-07-05 02:33
Core Insights
- Meta's CEO Mark Zuckerberg is frustrated as the company struggles to keep pace with competitors in AI, particularly in light of its underwhelming performance in the metaverse and AR/VR sectors [1][2].
- Despite Meta's strong financial performance and a stock price near historical highs, anxiety is growing about the company's future direction and competitiveness in AI [1][2].

Group 1: Management Changes and Strategies
- Zuckerberg has taken a hands-on approach to AI management, reallocating resources from foundational AI research to the GenAI team to enhance the performance of LLaMA [2].
- The restructuring includes demoting the head of the GenAI team and splitting it into two groups, reflecting Zuckerberg's intense pressure to deliver results [2].
- Meta's lack of a dedicated Corporate Venture Capital (CVC) team has prompted Zuckerberg to consider establishing one to better compete in the AI landscape [4][7].

Group 2: Talent Acquisition Challenges
- Meta faces significant talent retention issues, with reports of AI engineers leaving for competitors like OpenAI and Anthropic, often with offers exceeding $2 million [6].
- Zuckerberg's ambitious "superintelligence unit" plan aims to recruit top industry talent, offering salaries that could reach nine figures [6][7].
- The difficulty in attracting talent is compounded by the competitive landscape, where even substantial financial incentives have not been enough to secure top candidates [10][12].

Group 3: Investment and Acquisition Strategies
- Meta's $14.8 billion acquisition of a 49% stake in Scale AI is part of a broader strategy to bolster its AI capabilities and leadership [6][12].
- The company is also investing in Daniel Gross's venture fund, NFDG, to gain access to top talent and expertise in AI [7][8].
- The overall AI investment landscape is becoming increasingly competitive, with a significant drop in the number of new AI startups and rising costs for quality acquisitions [11][12].
Quick Take | A $215 Million Bet on AI Slimming: Multiverse Compresses LLMs by 95% and Puts Llama on a Raspberry Pi
Z Potentials · 2025-06-13 03:17
Core Viewpoint
- Multiverse Computing has raised €189 million (approximately $215 million) in a Series B funding round, leveraging its "CompactifAI" technology to compress large language models (LLMs) significantly while maintaining performance [1][2].

Group 1: Funding and Investment
- The Series B round was led by Bullhound Capital, with participation from investors including HP Tech Ventures, SETT, Forgepoint Capital International, CDP Venture Capital, Santander Climate VC, Toshiba, and a Basque venture capital group [1].
- To date, the company has raised approximately $250 million in total funding [2].

Group 2: Technology and Product Offering
- CompactifAI is a compression technology inspired by quantum computing, capable of reducing the size of LLMs by up to 95% without compromising model performance [2].
- Multiverse offers compressed versions of well-known open-source LLMs, including Llama 4 Scout, Llama 3.3 70B, Llama 3.1 8B, and Mistral Small 3.1, with plans to release more models soon [2].
- The company claims its models run 4 to 12 times faster than uncompressed versions, with inference costs reduced by 50% to 80% [3].

Group 3: Market Applications and Accessibility
- Some of Multiverse's models are compact and energy-efficient enough to run on personal computers, smartphones, cars, drones, and even Raspberry Pi devices [3].
- The Llama 4 Scout Slim version costs $0.10 per million tokens on AWS, compared to $0.14 for the original version, a meaningful per-token saving [3].

Group 4: Leadership and Expertise
- The company is backed by strong technical expertise: co-founder and CTO Román Orús is known for pioneering research on tensor networks, tools for simulating quantum computers on conventional machines [4].
- Co-founder and CEO Enrique Lizaso Olmos has a background in mathematics and extensive banking experience, having previously served as Deputy CEO of Unnim Banc [4].
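The quoted AWS prices pin down the per-token saving directly; a quick check of the article's figures (prices per million tokens, as quoted):

```python
def pct_saving(original_price, compressed_price):
    """Percentage cost reduction from switching to the compressed model."""
    return 100 * (1 - compressed_price / original_price)

scout, scout_slim = 0.14, 0.10  # USD per million tokens, as quoted
print(f"Scout Slim saving: {pct_saving(scout, scout_slim):.1f}%")  # ~28.6%
```

Note this is the per-token list price alone; the claimed 50-80% overall inference-cost reduction would additionally reflect throughput and hardware differences.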
Spending Hundreds of Billions, Poaching a 28-Year-Old Chinese-American Prodigy CEO, and Hiring Google and OpenAI Staff at Premium Salaries: Meta Is Reportedly Restructuring Its AI R&D Organization
36Kr · 2025-06-11 23:33
Group 1
- Meta is establishing a new lab focused on "superintelligence" to develop AI systems that surpass human intelligence in reasoning, problem-solving, creativity, and decision-making [1][3]
- Meta has agreed to acquire 49% of Scale AI for $14.8 billion, approximately 106.14 billion RMB [1][3]
- Alexandr Wang, the 28-year-old CEO of Scale AI, has been invited to join Meta's new lab, highlighting Meta's strategy of attracting top AI talent [1][4]

Group 2
- Meta is offering compensation packages ranging from seven to nine figures to recruit top researchers from companies like OpenAI and Google, and some have already agreed to join [4][9]
- Scale AI, founded in 2016, provides data-labeling solutions and reported revenue of $870 million last year, with expectations of more than doubling to over $2 billion this year [3][9]
- Meta's AI efforts are led by two groups: a generative AI team and a fundamental AI research lab, the latter overseen by Turing Award winner Yann LeCun [4][9]

Group 3
- Meta's recent AI model testing drew criticism, with external researchers questioning the objectivity of its benchmark results [5][8]
- The company aims to regain its competitive edge in AI, especially after the rise of ChatGPT intensified competition across the tech industry [9][10]
- Meta's previous focus on open-source large models and social-platform AI tools led to a fragmented strategy, prompting the need for a more cohesive approach [10]
Meta delays release of flagship 'Behemoth' AI model as engineers struggle: report
New York Post · 2025-05-15 23:15
Core Insights
- Meta Platforms is delaying the release of its "Behemoth" AI model due to concerns about its capabilities and the significance of improvements over earlier versions [1][3]
- The initial release was scheduled for April to align with Meta's first AI conference but has now been postponed to fall or later [2][3]

Development Timeline
- Behemoth was originally set for an April release, later pushed to June, and is now delayed further [2][3]
- The company had previously described Behemoth as "one of the smartest LLMs in the world" and its most powerful model to date [3][5]

Recent Developments
- In April, Meta released the latest versions of its LLMs, Llama 4 Scout and Llama 4 Maverick, while previewing Behemoth [5]