Language Models
Google DeepMind researchers react to Nano Banana demos 🍌
Google DeepMind· 2025-09-24 17:26
I think the fact that people surprise us with a model we built is the best thing. So this is a demo with Nano Banana hooked up into, I think, an AI Studio demo. It's hooked onto a canvas and you can drag these isometric shapes around. Oh, that's so cool. I mean, we often thought of Nano Banana as a single tool, as a single thing, but now it actually becomes more part of a pipeline. Wait, San Francisco. They merged San Francisco and New York halfway. What. Oh, no way. Oh, wow. Is that the B ...
X @The Economist
The Economist· 2025-09-14 14:40
Market Trends
- Corporate demand for small language models is projected to grow twice as fast as demand for large models [1]
- The growth of small language models is starting from a much lower base [1]
X @The Economist
The Economist· 2025-09-13 14:20
Technology Trends
- Small language models becoming more reliable could justify device-makers' decisions not to invest in larger models [1]
X @The Economist
The Economist· 2025-09-03 07:40
Isambard, Britain’s latest supercomputer, is not big enough to train the largest language models. It will, however, enable other research breakthroughs https://t.co/DT9HDZrmvj ...
Why LLMs are like Power Plants
Infrastructure Importance
- Countries benefit from having infrastructure within their borders [1]
- Power plants, such as nuclear and hydroelectric plants, are considered beneficial infrastructure [1]
Language Models as Infrastructure
- Language models are viewed as a similar kind of infrastructure [1]
OpenAI Goes OPEN-SOURCE! gpt-oss is HERE!
Matthew Berman· 2025-08-05 22:09
Model Release
- OpenAI released gpt-oss, state-of-the-art open-source models in 120-billion-parameter and 20-billion-parameter versions [1]
- These are open-weight language models, meaning the model weights themselves are released [1]
Performance Benchmarks
- With tool use, the 120-billion-parameter gpt-oss scores 2622 on Codeforces, very close to the frontier model's 2706 [2]
- With tool use, the 20-billion-parameter version scores 2516, which is equally impressive given its size [2]
- On coding, these models score higher than most people on Earth [2]
OpenAI Dropped a FRONTIER Open-Weights Model
Matthew Berman· 2025-08-05 17:17
Model Release & Capabilities
- OpenAI released gpt-oss, state-of-the-art open-weight language models in 120 billion and 20 billion parameter versions [1]
- The models outperform similarly sized open-source models on reasoning tasks and demonstrate strong tool-use capabilities [3]
- The models are optimized for efficient deployment on consumer hardware: the 120-billion-parameter version runs on a single 80 GB GPU, and the 20-billion-parameter version runs on edge devices with 16 GB of memory [4][5]
- The models excel at tool use, few-shot learning, function calling, chain-of-thought reasoning, and health-issue diagnosis [8]
- The models support context lengths of up to 128,000 tokens [12]
Training & Architecture
- The models were trained using a mix of reinforcement learning and techniques informed by OpenAI's most advanced internal models [3]
- The models use a transformer architecture with a mixture of experts, reducing the number of active parameters needed to process each token (a minimal routing sketch follows below) [10][11]
- The 120-billion-parameter version activates only about 5 billion parameters per token, while the 20-billion-parameter version activates about 3.6 billion [11][12]
- The models employ alternating dense and locally banded sparse attention patterns, grouped multi-query attention, and RoPE for positional encoding [12]
Safety & Security
- OpenAI did not apply any direct supervision to the chain of thought of either gpt-oss model [21]
- Pre-training data was filtered to remove harmful chemical, biological, radiological, and nuclear content [22]
- Even with robust fine-tuning, maliciously fine-tuned models were unable to reach high capability levels under OpenAI's preparedness framework [23]
- OpenAI is hosting a red-teaming challenge with $500,000 in awards to identify safety issues with the models [24]
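The low active-parameter count comes from mixture-of-experts routing: each token is sent to only a few of the many expert sub-networks. Below is a minimal sketch of top-k routing with toy dimensions and plain NumPy; it is not the gpt-oss implementation, just an illustration of why total and active parameter counts differ.

```python
# Minimal sketch of top-k mixture-of-experts routing: a model with many total
# parameters touches only a few of them per token. Toy sizes, plain NumPy.
import numpy as np

rng = np.random.default_rng(0)

d_model, n_experts, top_k = 64, 8, 2  # assumed toy hyperparameters
experts = [rng.standard_normal((d_model, d_model)) * 0.02 for _ in range(n_experts)]
router = rng.standard_normal((d_model, n_experts)) * 0.02

def moe_layer(x: np.ndarray) -> np.ndarray:
    """x: (d_model,) hidden state for one token -> (d_model,) output."""
    logits = x @ router                      # one router score per expert
    chosen = np.argsort(logits)[-top_k:]     # keep only the top-k experts
    weights = np.exp(logits[chosen])
    weights /= weights.sum()                 # softmax over the chosen experts
    # Only top_k of n_experts weight matrices are used for this token, so the
    # active parameters per token are a small fraction of the total.
    return sum(w * (x @ experts[i]) for w, i in zip(weights, chosen))

token = rng.standard_normal(d_model)
out = moe_layer(token)
print(out.shape, "active experts per token:", top_k, "of", n_experts)
```

In gpt-oss this kind of routing is what lets a 120-billion-parameter model activate only a few billion parameters for any given token.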
X @Anthropic
Anthropic· 2025-08-01 16:23
New Anthropic research: Persona vectors.Language models sometimes go haywire and slip into weird and unsettling personas. Why? In a new paper, we find “persona vectors"—neural activity patterns controlling traits like evil, sycophancy, or hallucination. https://t.co/PPX1oXj9SQ ...
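One common recipe behind such trait directions, sketched below with placeholder data, is a difference of mean activations between trait-exhibiting and neutral responses, which can then be added to or subtracted from a hidden state to steer behavior. This is an illustrative sketch of that general activation-steering idea, not necessarily the paper's exact procedure.

```python
# Illustrative difference-of-means "persona vector" sketch with placeholder
# activations; real use would capture hidden states from a language model on
# trait-exhibiting vs. neutral prompts and responses.
import numpy as np

rng = np.random.default_rng(0)
d_model = 512

acts_trait = rng.standard_normal((100, d_model)) + 0.3   # e.g. sycophantic outputs
acts_neutral = rng.standard_normal((100, d_model))       # ordinary outputs

persona_vector = acts_trait.mean(axis=0) - acts_neutral.mean(axis=0)

def steer(hidden_state: np.ndarray, alpha: float) -> np.ndarray:
    """Add (alpha > 0) or suppress (alpha < 0) the trait direction."""
    return hidden_state + alpha * persona_vector

h = rng.standard_normal(d_model)
h_more_trait, h_less_trait = steer(h, +4.0), steer(h, -4.0)
```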
X @Anthropic
Anthropic· 2025-07-22 16:32
Model Behavior & Transfer Learning
- Language models can transfer traits to other models through seemingly meaningless data [1]
- LLMs can transmit traits to other models via hidden signals in data [2]
- Datasets consisting only of 3-digit numbers can transmit specific preferences or tendencies [2]
What every AI engineer needs to know about GPUs — Charles Frye, Modal
AI Engineer· 2025-07-20 07:00
AI Engineering & GPU Utilization
- AI engineering is shifting towards tighter integration and self-hosting of language models, increasing the need to understand GPU hardware [6][7]
- The industry should focus on high bandwidth, not low latency, when utilizing GPUs [8]
- GPUs optimize for math bandwidth over memory bandwidth, emphasizing computational operations [9]
- Low-precision matrix-matrix multiplications are key to fully utilizing GPU potential [10]
- Tensor cores, specialized for low-precision matrix-matrix multiplication, are crucial for efficient GPU usage [6][37]
Hardware & Performance
- GPUs achieve parallelism far exceeding CPUs: the Nvidia H100 SXM GPU runs over 16,000 parallel threads at roughly 0.05 watts per thread, compared with an AMD EPYC CPU's two threads per core at approximately 1 watt per thread [20][21]
- GPUs offer faster context switching than CPUs, happening every clock cycle [23]
- Bandwidth tends to improve at roughly the square of the rate at which latency improves, which favors bandwidth-oriented hardware [25][26]
Model Optimization
- Small models can be more hardware-sympathetic, potentially matching the quality of larger models with techniques like verification and multiple generations [32][33]
- Multi-token prediction and multi-sample queries can become nearly "free" due to tensor core capabilities [36]
- Generating multiple samples or tokens can improve performance by leveraging matrix-matrix operations (see the sketch below) [39]
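The "nearly free" claim rests on restructuring the work: k matrix-vector products against the same weights can be fused into one matrix-matrix product, so the weights are read from memory once and the extra work lands on compute (tensor cores) rather than memory traffic. A minimal NumPy sketch with assumed toy dimensions:

```python
# Sketch of why several samples per prompt can be nearly "free": batching turns
# memory-bound matrix-vector products into one matrix-matrix product that reuses
# the same weights. Toy sizes; on a GPU the batched form maps to tensor cores.
import numpy as np

rng = np.random.default_rng(0)
d_model, vocab, k = 1024, 4096, 8  # assumed toy dimensions
W = rng.standard_normal((d_model, vocab)).astype(np.float32)  # shared weights

hidden_single = rng.standard_normal((1, d_model)).astype(np.float32)
hidden_batch = rng.standard_normal((k, d_model)).astype(np.float32)  # k samples

logits_single = hidden_single @ W  # matrix-vector: W is read to produce 1 output
logits_batch = hidden_batch @ W    # matrix-matrix: W is read once for k outputs

print(logits_single.shape, logits_batch.shape)  # (1, 4096) vs (8, 4096)
```

Roughly, the arithmetic done per byte of weights loaded grows with the number of samples, which is why multi-sample and multi-token generation stay cheap until compute, not memory, becomes the bottleneck.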