Language Models
X @Anthropic
Anthropic· 2025-11-04 00:32
Current language models struggle to reason in ciphered language, in research led by Jeff Guo. Training or prompting LLMs to obfuscate their reasoning by encoding it using simple ciphers significantly reduces their reasoning performance. https://t.co/uqTCGWqmSa
Jeff Guo (@Jeff_Guo_): New Anthropic research: All Code, No Thought: Current Language Models Struggle to Reason in Ciphered Language. Can LLMs do math when thinking in ciphered text? Across 10 LLMs & 28 ciphers, they only reason accurately in simple ciphers but easily ...
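To make "reasoning in ciphered text" concrete, here is a minimal sketch (not from the paper) of the kind of simple substitution cipher involved, using ROT13; the example reasoning string and the decode step are assumptions for illustration only:

```python
import codecs

def rot13(text: str) -> str:
    # Simple substitution cipher: rotate each letter 13 places; digits are unchanged.
    return codecs.encode(text, "rot13")

# Hypothetical probe: have the model carry out its chain of thought in ciphered
# text, then decode it for grading.
reasoning = "17 times 24 equals 408, so the answer is 408."
ciphered = rot13(reasoning)
print(ciphered)          # "17 gvzrf 24 rdhnyf 408, fb gur nafjre vf 408."
print(rot13(ciphered))   # ROT13 is its own inverse, recovering the original text.
```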
New DeepSeek just did something crazy...
Matthew Berman· 2025-10-22 17:15
DeepSeek just did it again. They just dropped a new paper and model, DeepSeek OCR. OCR is basically image recognition. But why is that a big deal? Image recognition has been around forever, right? Well, they discovered something completely novel that has the potential to make language models, text-based models, so much more powerful. Let me show you. This is the new paper from DeepSeek. Now, like I said, image recognition has been around for a long time. It's nothing special. We've seen it. It's been done a mil ...
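For context on what plain OCR looks like in practice, here is a minimal sketch using the open-source pytesseract wrapper (not DeepSeek's model); the file name is a placeholder:

```python
from PIL import Image
import pytesseract  # wraps the Tesseract OCR engine, not DeepSeek-OCR

# Hypothetical input: an image of a scanned page.
page = Image.open("scanned_page.png")

# Basic OCR: extract plain text from the image.
text = pytesseract.image_to_string(page)
print(text)
```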
Why we're giving AI too much credit | Morten Rand-Hendriksen | TEDxSurrey
TEDx Talks· 2025-10-16 17:01
[Music] Have you ever been lied to by artificial intelligence? The other day our family sat down for a home-cooked meal. The food looked great and smelled delicious, and tasted wrong in the weirdest way. My wife, usually a magician in the kitchen, pulled out her phone and said, look, look, I asked an AI what I could make with our ingredients, and it said this was a popular recipe. Why would it lie? Why would it lie? Language is fascinating. Build a sentence one word at a time and meaning emerges. Put the right words in the ri ...
Google DeepMind researchers react to Nano Banana demos 🍌
Google DeepMind· 2025-09-24 17:26
I think the fact that people surprise us with a model we built is the best idea. So this is like a demo with Nano Banana hooked up into, I think, a studio demo. It's hooked onto a canvas and you can drag these isometric shapes around. Oh, that's so cool. I mean, we often thought of Nano Banana as a single tool, as a single thing, but now actually this becomes more part of a pipeline. Wait, San Francisco. They merged San Francisco and New York halfway. What? Oh, no way. Oh, wow. Is that the B ...
X @The Economist
The Economist· 2025-09-14 14:40
Market Trends
- Corporate demand for small language models is projected to grow twice as fast as demand for large models [1]
- The growth of small language models is starting from a much lower base [1]
X @The Economist
The Economist· 2025-09-13 14:20
Technology Trends
- Small language models becoming more reliable could justify device-makers' decisions not to invest in larger models [1]
X @The Economist
The Economist· 2025-09-03 07:40
Isambard, Britain’s latest supercomputer, is not big enough to train the largest language models. It will, however, enable other research breakthroughs https://t.co/DT9HDZrmvj ...
Why LLMs are like Power Plants
Infrastructure Importance
- Countries benefit from having infrastructure within their borders [1]
- Power plants, such as nuclear and hydroelectric plants, are considered beneficial infrastructure [1]

Language Models as Infrastructure
- Language models are viewed as similar to infrastructure [1]
OpenAI Goes OPEN-SOURCE! gpt-oss is HERE!
Matthew Berman· 2025-08-05 22:09
Model Release
- OpenAI released gpt-oss, state-of-the-art open-source models in 120-billion-parameter and 20-billion-parameter versions [1]
- These are open-weight language models, meaning the model weights are also released [1]

Performance Benchmarks
- With tool use, the 120-billion-parameter gpt-oss scores 2622 on the Codeforces competition, very close to frontier models (2706) [2]
- With tool use, the 20-billion-parameter version scores 2516, equally impressive given its size [2]
- At competitive programming, these models score higher than most people on Earth [2]
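A minimal sketch of running the smaller open-weight model locally with Hugging Face transformers; the model ID openai/gpt-oss-20b, the prompt, and the generation settings are assumptions for illustration, not taken from the video:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Assumed Hugging Face model ID for the 20-billion-parameter release.
model_id = "openai/gpt-oss-20b"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

# Chat-style prompt; apply_chat_template formats it the way the model expects.
messages = [{"role": "user", "content": "Write a function that reverses a string."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=200)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```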
OpenAI Dropped a FRONTIER Open-Weights Model
Matthew Berman· 2025-08-05 17:17
Model Release & Capabilities
- OpenAI released gpt-oss, state-of-the-art open-weight language models in 120-billion and 20-billion parameter versions [1]
- The models outperform similarly sized open-source models on reasoning tasks and demonstrate strong tool-use capabilities [3]
- The models are optimized for efficient deployment on consumer hardware: the 120-billion-parameter version runs efficiently on a single 80 GB GPU and the 20-billion-parameter version on edge devices with 16 GB of memory [4][5]
- The models excel at tool use, few-shot learning, function calling, chain-of-thought reasoning, and health-issue diagnosis [8]
- The models support context lengths of up to 128,000 tokens [12]

Training & Architecture
- The models were trained using a mix of reinforcement learning and techniques informed by OpenAI's most advanced internal models [3]
- The models use a transformer architecture with a mixture of experts, reducing the number of parameters active when processing input [10][11]
- The 120-billion-parameter version activates only about 5 billion parameters per token, while the 20-billion-parameter version activates about 3.6 billion [11][12]
- The models employ alternating dense and locally banded sparse attention patterns, grouped multi-query attention, and RoPE for positional encoding [12]

Safety & Security
- OpenAI did not apply any direct supervision to the chain of thought for either gpt-oss model [21]
- Pre-training data was filtered to remove harmful material related to chemical, biological, radiological, and nuclear threats [22]
- Even with robust fine-tuning, maliciously fine-tuned models were unable to reach high capability levels under OpenAI's preparedness framework [23]
- OpenAI is hosting a red-teaming challenge with $500,000 in awards to identify safety issues with the models [24]
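A minimal PyTorch sketch of the mixture-of-experts idea described above: a router picks a few experts per token, so only a fraction of the layer's total parameters are active for any given input. The layer sizes, number of experts, and top-k value are illustrative assumptions, not the gpt-oss configuration:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyMoELayer(nn.Module):
    """Toy mixture-of-experts feed-forward layer (illustrative sizes, not gpt-oss)."""

    def __init__(self, d_model=64, d_hidden=256, num_experts=8, top_k=2):
        super().__init__()
        self.top_k = top_k
        self.router = nn.Linear(d_model, num_experts)  # scores each expert per token
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_hidden), nn.GELU(), nn.Linear(d_hidden, d_model))
            for _ in range(num_experts)
        )

    def forward(self, x):                                    # x: (tokens, d_model)
        scores = self.router(x)                              # (tokens, num_experts)
        weights, indices = scores.topk(self.top_k, dim=-1)   # keep only the top-k experts per token
        weights = F.softmax(weights, dim=-1)
        out = torch.zeros_like(x)
        for slot in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = indices[:, slot] == e                 # tokens routed to expert e in this slot
                if mask.any():
                    out[mask] += weights[mask, slot:slot + 1] * expert(x[mask])
        return out

tokens = torch.randn(10, 64)
print(TinyMoELayer()(tokens).shape)                          # torch.Size([10, 64])
```

Only the selected experts run for each token, which is how a model with a large total parameter count can keep its per-token compute close to that of a much smaller dense model.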