Language Models

OpenAI Goes OPEN-SOURCE! gpt-oss is HERE!
Matthew Berman· 2025-08-05 22:09
OpenAI has delivered on their promise to release a state-of-the-art open-source model. This is gpt-oss. It comes in two sizes, a 120 billion parameter version and a 20 billion parameter version. These are state-of-the-art open-weight language models. Open weight. So, not just open-source, but they are actually releasing the weights to these models. Now, for some benchmarks, here is the Codeforces competition. The 120 billion parameter version with tools scores 2622. That is compared to o3, a frontier model ...
OpenAI Dropped a FRONTIER Open-Weights Model
Matthew Berman· 2025-08-05 17:17
OpenAI has delivered on their promise to release a state-of-the-art open-source model. This is gpt-oss. Now, I think the mystery model Horizon Alpha that was on OpenRouter is actually this open-source model from OpenAI, although they have not confirmed that to me, but we do have an incredible model. Let me tell you about all the details. First, it comes in two sizes, a 120 billion parameter version and a 20 billion parameter version. These are state-of-the-art open-weight language models. Open weight. So ...
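Since the release is open weight, the model can be pulled down and run locally. Below is a minimal sketch of what that could look like with Hugging Face transformers; the repo id "openai/gpt-oss-20b" and the prompt are assumptions for illustration, not details confirmed in the videos.

```python
# Minimal sketch: running an open-weight model locally with Hugging Face transformers.
# The repo id "openai/gpt-oss-20b" is an assumption for illustration; substitute the
# actual id once you locate the published weights.
from transformers import pipeline

generator = pipeline(
    "text-generation",
    model="openai/gpt-oss-20b",   # assumed repo id
    device_map="auto",            # spread layers across available GPUs/CPU
    torch_dtype="auto",
)

out = generator(
    "Explain the difference between open-source and open-weight models.",
    max_new_tokens=128,
)
print(out[0]["generated_text"])
```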
X @Anthropic
Anthropic· 2025-08-01 16:23
New Anthropic research: Persona vectors. Language models sometimes go haywire and slip into weird and unsettling personas. Why? In a new paper, we find “persona vectors”—neural activity patterns controlling traits like evil, sycophancy, or hallucination. https://t.co/PPX1oXj9SQ ...
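The tweet does not spell out how such a vector is extracted. A common way to get a direction like this is a difference of mean activations between prompts that elicit a trait and prompts that do not; the sketch below illustrates that general "steering vector" idea, not Anthropic's actual procedure, and get_hidden_state is a hypothetical helper.

```python
# Rough sketch of extracting a trait direction as a difference of mean activations.
# Illustrates the general steering-vector idea, not Anthropic's published method.
# get_hidden_state() is a hypothetical helper returning a model's residual-stream
# activation (a 1-D numpy array) for a given prompt.
import numpy as np

def persona_vector(trait_prompts, neutral_prompts, get_hidden_state):
    trait_acts = np.stack([get_hidden_state(p) for p in trait_prompts])
    neutral_acts = np.stack([get_hidden_state(p) for p in neutral_prompts])
    # The vector points from neutral behavior toward the trait of interest.
    return trait_acts.mean(axis=0) - neutral_acts.mean(axis=0)

def trait_score(prompt, vector, get_hidden_state):
    # Project a new prompt's activation onto the trait direction to monitor drift.
    v = vector / np.linalg.norm(vector)
    return float(get_hidden_state(prompt) @ v)
```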
X @Anthropic
Anthropic· 2025-07-22 16:32
In a joint paper with @OwainEvans_UK as part of the Anthropic Fellows Program, we study a surprising phenomenon: subliminal learning. Language models can transmit their traits to other models, even in what appears to be meaningless data. https://t.co/oeRbosmsbH Owain Evans (@OwainEvans_UK): New paper & surprising result. LLMs transmit traits to other models via hidden signals in data. Datasets consisting only of 3-digit numbers can transmit a love for owls, or evil tendencies. 🧵 https://t.co/ewIxfzXOe3 ...
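A toy sketch of the setup as described in the thread: a teacher model with some trait generates data that looks like nothing but 3-digit numbers, and a student is fine-tuned on it. The objects `teacher`, `student`, and `finetune` are hypothetical stand-ins, not a real API or the paper's code.

```python
# Toy sketch of the subliminal-learning setup described in the thread.
# `teacher`, `student`, and `finetune` are hypothetical stand-ins.
import re

def build_number_dataset(teacher, n_samples=1000):
    dataset = []
    for _ in range(n_samples):
        completion = teacher.generate("Continue this sequence of numbers:")
        # Keep only completions that are purely 3-digit numbers, so the data
        # carries no overt trace of the teacher's trait.
        if re.fullmatch(r"(\d{3}[,\s]*)+", completion.strip()):
            dataset.append(completion.strip())
    return dataset

# student = finetune(student, build_number_dataset(teacher))
# The surprising result: the student can still pick up the teacher's traits.
```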
What every AI engineer needs to know about GPUs — Charles Frye, Modal
AI Engineer· 2025-07-20 07:00
What I wanted to talk about today is what every AI engineer needs to know about GPUs. So far, in the last couple of years, most of what people have built as AI applications has been built on top of model APIs: AI engineers use the OpenAI API, the Anthropic API, or the DeepSeek API, and build an application on top of that. That goes back to the initial diagram that swyx put out on the rise of the AI engineer ...
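For engineers who have only ever worked through model APIs, a first step toward the talk's subject is simply inspecting the hardware you are running on. The snippet below is not from the talk; it is a small PyTorch sketch for listing the GPUs visible to your process.

```python
# Quick way to see which GPUs you are running on and how much memory each has.
# Not from the talk itself; just a starting point for engineers who have only
# ever used model APIs and never looked at the hardware underneath.
import torch

if torch.cuda.is_available():
    for i in range(torch.cuda.device_count()):
        props = torch.cuda.get_device_properties(i)
        print(f"GPU {i}: {props.name}")
        print(f"  memory: {props.total_memory / 1e9:.1f} GB")
        print(f"  streaming multiprocessors: {props.multi_processor_count}")
else:
    print("No CUDA device found; running on CPU.")
```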
Benchmarks Are Memes: How What We Measure Shapes AI—and Us - Alex Duffy
AI Engineer· 2025-07-15 17:05
Benchmarks as Memes in AI
- Benchmarks are presented as memes that shape AI development, influencing what models are trained and tested on [1][3][8]
- The AI industry faces a problem of benchmark saturation: as models become too good at existing benchmarks, their value diminishes [5][6]
- There is an opportunity for individuals to create new benchmarks that define what AI models should excel at, shaping the future of AI capabilities (a minimal harness is sketched after this list) [7][13]

The Lifecycle and Impact of Benchmarks
- The typical benchmark lifecycle involves an idea spreading, becoming a meme, and eventually being saturated as models train on it [8]
- Benchmarks can have unintended consequences, such as reinforcing biases if not designed thoughtfully, as seen with ChatGPT's thumbs-up/thumbs-down feedback being used as a benchmark [14]
- The industry should focus on creating benchmarks that empower people and promote agency, rather than treating them as mere data points [16]

Qualities of Effective Benchmarks
- Great benchmarks should be multifaceted, rewarding creativity, accessible to both small and large models, generative, evolutionary, and experiential [17][18][19]
- The industry needs more "squishy," non-static benchmarks for areas like ethics, society, and art, requiring subject-matter expertise [34][35]
- Benchmarks can build trust in AI by letting people define goals, provide feedback, and see AI improve, fostering a sense of importance and control [37]

AI Diplomacy Benchmark
- AI Diplomacy is presented as an example of a benchmark that mimics real-world situations, testing models' abilities to negotiate, form alliances, and betray each other [20][22][23]
- The AI Diplomacy benchmark revealed interesting personality traits in different models, such as o3 being a schemer and Claude models being naively optimistic [24][25][30]
- The AI Diplomacy benchmark highlighted the importance of social aspects and convincing others, with models like Llama performing well due to their social skills [31]
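In the spirit of the talk's point that anyone can define a new benchmark, here is a minimal sketch of a custom benchmark harness: a task set, a model callable, and a grading function. The task data and function names are illustrative assumptions, not anything from the talk.

```python
# Minimal sketch of a custom benchmark harness: define your own tasks and grading
# criteria instead of reusing a saturated benchmark. `model` is any callable that
# maps a prompt string to a response string.
from typing import Callable, List, Dict

def run_benchmark(model: Callable[[str], str],
                  tasks: List[Dict],
                  grade: Callable[[str, Dict], float]) -> float:
    """Average a per-task score in [0, 1] over the whole task set."""
    scores = [grade(model(task["prompt"]), task) for task in tasks]
    return sum(scores) / len(scores)

# Example: a tiny, easily extended task set with an exact-match grader.
tasks = [
    {"prompt": "What is 12 * 12?", "answer": "144"},
    {"prompt": "Name the capital of France.", "answer": "Paris"},
]

def exact_match(response: str, task: Dict) -> float:
    return 1.0 if task["answer"].lower() in response.lower() else 0.0

# score = run_benchmark(my_model, tasks, exact_match)
```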
X @Anthropic
Anthropic· 2025-07-08 22:11
New Anthropic research: Why do some language models fake alignment while others don't? Last year, we found a situation where Claude 3 Opus fakes alignment. Now, we've done the same analysis for 25 frontier LLMs—and the story looks more complex. https://t.co/2XNEDtWpIP ...
From Quora to Poe: Adam D'Angelo on Building Platforms for LLMs and Agents | LangChain Interrupt
LangChain· 2025-06-27 16:44
AI Platform & Business Model
- The Poe platform lets users access many language models and agents through a subscription [1]
- Poe's bot creators earn millions of dollars per year [1]
- Reasoning models are driving growth [1]

Consumer AI Usage
- Reveals surprising patterns in how consumers are using AI [1]

AI Development Challenges
- Building products in the fast-moving AI field presents unique challenges [1]
- Planning cycles have shrunk from several years to just two months [1]
Just do it. (let your tools think for themselves) - Robert Chandler
AI Engineer· 2025-06-10 17:30
Hi, I'm Robert, co-founder and CTO at Wordware. At Wordware, I've personally helped hundreds of teams build reliable AI agents. I'm here to share a few of the insights we've gained, especially when it comes to tools: truly agentic MCPs, and giving your tools time to think. Before I worked on LLMs and agents, I worked on self-driving cars, so building highly reliable systems is in my blood. The promise of agents is automated systems that can take ...
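One way to read "giving your tools time to think" is to make the tool schema itself demand a written plan before anything executes, so the model reasons inside the tool call rather than acting blindly. The sketch below uses the OpenAI function-calling schema shape as an illustration of that idea; it is an assumption on my part, not Wordware's actual implementation, and the tool name is hypothetical.

```python
# Sketch of "giving your tools time to think": the tool schema requires the model
# to write out its reasoning as part of the call, before any side effect happens.
# Illustrative only; follows the OpenAI function-calling schema shape.
search_tool = {
    "type": "function",
    "function": {
        "name": "search_knowledge_base",
        "description": "Search internal docs. Think before you call: state what "
                       "you expect to find and why this query is the right one.",
        "parameters": {
            "type": "object",
            "properties": {
                "reasoning": {
                    "type": "string",
                    "description": "Your step-by-step plan for this search.",
                },
                "query": {
                    "type": "string",
                    "description": "The actual search query.",
                },
            },
            "required": ["reasoning", "query"],
        },
    },
}
```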