Language Models
X @Anthropic
Anthropic· 2025-08-01 16:23
New Anthropic research: Persona vectors. Language models sometimes go haywire and slip into weird and unsettling personas. Why? In a new paper, we find “persona vectors”—neural activity patterns controlling traits like evil, sycophancy, or hallucination. https://t.co/PPX1oXj9SQ ...
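The idea of a trait-controlling direction in activation space can be illustrated with a toy activation-steering sketch. This is a generic illustration of the broader technique, not Anthropic's actual method; the dimensionality and all activation data below are synthetic:

```python
import numpy as np

rng = np.random.default_rng(0)
d = 64  # hidden-state dimensionality (illustrative)

# Hypothetical hidden states collected from a model: one batch of
# activations on trait-eliciting prompts, one on neutral prompts.
trait_acts = rng.normal(0.5, 1.0, size=(100, d))
neutral_acts = rng.normal(0.0, 1.0, size=(100, d))

# A "persona vector" as the difference of mean activations,
# normalized to unit length (a common construction in the
# activation-steering literature).
v = trait_acts.mean(axis=0) - neutral_acts.mean(axis=0)
v /= np.linalg.norm(v)

def steer(hidden, alpha):
    """Add (alpha > 0) or subtract (alpha < 0) the persona direction."""
    return hidden + alpha * v

h = rng.normal(size=d)
# The projection onto v grows with alpha: steering toward the trait
# amplifies it, and a negative alpha would suppress it.
before = h @ v
after = steer(h, alpha=4.0) @ v
```

Since `v` is unit-norm, steering with `alpha=4.0` shifts the projection onto the persona direction by exactly 4.0; in a real model the vector would be added to a chosen layer's residual stream during generation.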
X @Anthropic
Anthropic· 2025-07-22 16:32
Model Behavior & Transfer Learning
- Language models can transfer traits to other models through seemingly meaningless data [1]
- LLMs can transmit traits to other models via hidden signals in data [2]
- Datasets consisting only of 3-digit numbers can transmit specific preferences or tendencies [2]
What every AI engineer needs to know about GPUs — Charles Frye, Modal
AI Engineer· 2025-07-20 07:00
AI Engineering & GPU Utilization
- AI engineering is shifting towards tighter integration and self-hosting of language models, increasing the need to understand GPU hardware [6][7]
- The industry should focus on high bandwidth, not low latency, when utilizing GPUs [8]
- GPUs optimize for math bandwidth over memory bandwidth, emphasizing computational operations [9]
- Low-precision matrix-matrix multiplications are key to fully utilizing GPU potential [10]
- Tensor cores, specialized for low-precision matrix-matrix multiplication, are crucial for efficient GPU usage [6][37]
Hardware & Performance
- GPUs achieve parallelism far exceeding CPUs: the Nvidia H100 SXM GPU supports over 16,000 parallel threads at 5 cents per thread, compared to an AMD EPYC CPU's two threads per core at approximately 1 watt per thread [20][21]
- GPUs context-switch faster than CPUs, switching every clock cycle [23]
- Bandwidth improvement grows as the square of latency improvement, favoring bandwidth-oriented hardware [25][26]
Model Optimization
- Small models can be more hardware-sympathetic, potentially matching the quality of larger models with techniques like verification and multiple generations [32][33]
- Multi-token prediction and multi-sample queries can become nearly "free" thanks to tensor core capabilities [36]
- Generating multiple samples or tokens can improve performance by leveraging matrix-matrix operations [39]
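The "math bandwidth over memory bandwidth" point, and why extra samples or tokens are nearly free, can be made concrete with a toy roofline calculation. The 4096-wide weight matrix and fp16 element size are assumptions for illustration, not figures from the talk:

```python
# Toy roofline arithmetic: compare the arithmetic intensity
# (FLOPs per byte moved) of decoding with batch 1 vs batch 64.

def matmul_stats(m, n, k, bytes_per_elem=2):  # fp16 elements
    flops = 2 * m * n * k                      # multiply-adds
    bytes_moved = bytes_per_elem * (m * k + k * n + m * n)
    return flops, flops / bytes_moved          # arithmetic intensity

d = 4096  # hypothetical d x d weight matrix

# Decoding one token for one sequence: a matrix-vector product.
flops1, ai1 = matmul_stats(1, d, d)
# Decoding for 64 sequences (or 64 sampled continuations) at once:
flops64, ai64 = matmul_stats(64, d, d)

# Reading the weights dominates memory traffic either way, so the
# FLOPs grow 64x while bytes moved barely change: arithmetic
# intensity rises almost linearly with batch size.
print(f"batch  1: {ai1:.2f} FLOP/byte")
print(f"batch 64: {ai64:.2f} FLOP/byte")
```

With these numbers, batch 1 lands around 1 FLOP/byte (hopelessly memory-bound) while batch 64 reaches roughly 62 FLOP/byte, which is the sense in which tensor cores make the 63 extra samples nearly free.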
Benchmarks Are Memes: How What We Measure Shapes AI—and Us - Alex Duffy
AI Engineer· 2025-07-15 17:05
Benchmarks as Memes in AI
- Benchmarks are presented as memes that shape AI development, influencing what models are trained and tested on [1][3][8]
- The AI industry faces a problem of benchmark saturation: as models become too good at existing benchmarks, their value diminishes [5][6]
- There's an opportunity for individuals to create new benchmarks that define what AI models should excel at, shaping the future of AI capabilities [7][13]
The Lifecycle and Impact of Benchmarks
- The typical benchmark lifecycle: an idea spreads, becomes a meme, and is eventually saturated as models train on it [8]
- Benchmarks can have unintended consequences, such as reinforcing biases if not designed thoughtfully, as seen with ChatGPT's thumbs-up/thumbs-down feedback [14]
- The industry should focus on creating benchmarks that empower people and promote agency, rather than treating them as mere data points [16]
Qualities of Effective Benchmarks
- Great benchmarks should be multifaceted, reward creativity, be accessible to both small and large models, and be generative, evolutionary, and experiential [17][18][19]
- The industry needs more "squishy," non-static benchmarks for areas like ethics, society, and art, requiring subject matter expertise [34][35]
- Benchmarks can build trust in AI by letting people define goals, provide feedback, and see AI improve, fostering a sense of importance and control [37]
AI Diplomacy Benchmark
- AI Diplomacy is presented as an example of a benchmark that mimics real-world situations, testing models' abilities to negotiate, form alliances, and betray each other [20][22][23]
- The AI Diplomacy benchmark revealed distinct personality traits in different models, such as o3 being a schemer and Claude models being naively optimistic [24][25][30]
- It also highlighted the importance of social aspects and persuasion, with models like Llama performing well thanks to their social skills [31]
X @Anthropic
Anthropic· 2025-07-08 22:11
New Anthropic research: Why do some language models fake alignment while others don't? Last year, we found a situation where Claude 3 Opus fakes alignment. Now, we’ve done the same analysis for 25 frontier LLMs—and the story looks more complex. https://t.co/2XNEDtWpIP ...
From Quora to Poe: Adam D'Angelo on Building Platforms for LLMs and Agents | LangChain Interrupt
LangChain· 2025-06-27 16:44
AI Platform & Business Model
- The Poe platform gives users subscription-based access to a wide range of language models and agents [1]
- Poe's bot creators earn millions of dollars per year [1]
- Reasoning models are driving growth [1]
Consumer AI Usage
- Reveals surprising patterns in how consumers are using AI [1]
AI Development Challenges
- Building products in the fast-changing AI field presents unique challenges [1]
- Planning cycles have shrunk from several years to just two months [1]
Just do it. (let your tools think for themselves) - Robert Chandler
AI Engineer· 2025-06-10 17:30
Hi, I'm Robert. I'm the co-founder and CTO at Wordware, where I've personally helped hundreds of teams build reliable AI agents. I'm here to share a few of the insights we've gained, especially when it comes to tools: truly agentic MCPs, and giving your tools time to think. Before I worked on LLMs and agents, I worked on self-driving cars, so building highly reliable systems is in my blood. So, here we go. The promise of agents is automated systems that can take ...