LLMs

X @Sam Altman
Sam Altman· 2025-08-05 17:27
Model Release
- Company releases two open-weight LLMs: gpt-oss-120b (120 billion parameters) and gpt-oss-20b (20 billion parameters) [1]
- The models demonstrate strong performance and agentic tool use [1]
Safety Analysis
- Company conducted a safety analysis by fine-tuning the models to maximize their bio and cyber capabilities [1]
SEMrush (SEMR) - 2025 Q2 - Earnings Call Transcript
2025-08-05 13:30
Financial Data and Key Metrics Changes
- Revenue for the quarter was $108.9 million, representing 20% year-over-year growth [4][13]
- Non-GAAP operating margin was 11%, down approximately 240 basis points year-over-year due to a weaker U.S. dollar [16][22]
- Annual recurring revenue (ARR) grew 15.3% year-over-year to $435.3 million, with average ARR per paying customer increasing to $3,756, over 15% growth versus the same quarter last year (see the arithmetic check after this summary) [17][18]
Business Line Data and Key Metrics Changes
- The Enterprise segment is now the largest contributor to overall company growth, with enterprise SEO solutions growing to 260 customers at an average ARR of approximately $60,000 [4][5]
- The AI Toolkit, launched at the end of Q1, became the fastest-growing product in the company's history, reaching $3 million in ARR within a few months [6][8]
- ARR from enterprise and AI products is expected to approach $50 million by the end of the year [8][19]
Market Data and Key Metrics Changes
- Approximately 116,000 paying customers were reported, down sequentially from the prior quarter, primarily due to softness among freelancers and less sophisticated customer segments [14]
- Dollar-based net revenue retention was 105%, with retention in the Enterprise segment consistently above 120% [14][19]
Company Strategy and Development Direction
- The company is focusing on high-growth areas, specifically enterprise and AI search, reallocating resources away from lower-value customer segments [9][20]
- Management decided not to increase marketing spend in response to rising customer acquisition costs at the lower end of the market, instead prioritizing investments in enterprise and AI products [9][20]
- The company announced a $150 million share repurchase program, reflecting confidence in its business and valuation [25]
Management's Comments on Operating Environment and Future Outlook
- Management expressed optimism about growth potential in the enterprise and AI segments, despite softness in the lower end of the market [10][12]
- The company believes the shift to AI and large language models (LLMs) presents significant growth opportunities [11][12]
- Management views current pressures at the lower end of the market as temporary and expects stabilization [36][64]
Other Important Information
- The company adjusted its full-year 2025 revenue guidance to a range of $443 million to $446 million, approximately 18% growth at the midpoint [21]
- Non-GAAP operating margin guidance remains at 12%, despite the reduced revenue outlook and foreign exchange headwinds [21][24]
Q&A Session Summary
Question: Pressures in the low-end customer segment
- Management indicated the pressures are fairly contained to freelancers and less sophisticated customers, primarily impacted by rising cost per click [28][29]
Question: Liquidity of the stock and buyback program
- The share repurchase program is seen as a way to express confidence in the company's future potential and momentum in enterprise and AI [30][32]
Question: Down-market weakness and macro factors
- Management believes the weakness is contained to the low-end segment and is not reflective of broader macroeconomic conditions [36][38]
Question: Customer acquisition costs and market dynamics
- The increase in customer acquisition costs is primarily affecting the low-end segment, while other segments continue to perform well [51][56]
Question: Future trajectory of the low-end customer base
- Management expects stabilization in the low-end segment, with ongoing strength in the SMB and enterprise segments [62][64]
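For readers checking the math, the average-ARR figure follows roughly from the other two reported metrics; the small gap versus the reported $3,756 comes from the ~116,000 customer count being approximate.

```python
# Rough sanity check on the reported SEMrush metrics (approximate inputs).
arr_total = 435_300_000      # annual recurring revenue, USD
paying_customers = 116_000   # approximate paying-customer count
arr_per_customer = arr_total / paying_customers
print(f"implied ARR per paying customer: ${arr_per_customer:,.0f}")  # ~ $3,753 vs reported $3,756
```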
X @Avi Chawla
Avi Chawla· 2025-08-05 06:35
LLM Evaluation
- The industry is focusing on evaluating conversational LLM applications like ChatGPT in a multi-turn context (a minimal sketch follows below) [1]
- Unlike single-turn tasks, conversations require LLMs to maintain consistency, compliance, and context-awareness across multiple messages [1]
Key Considerations
- LLM behavior should be consistent, compliant, and context-aware across turns, not just accurate in one-shot output [1]
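As a concrete companion to the points above, here is a minimal multi-turn evaluation sketch. The `chat_stub` function, the conversation script, and the budget-retention rule are all illustrative assumptions rather than anything from the original post; swap the stub for your provider's chat-completion call.

```python
# Minimal multi-turn consistency check (illustrative sketch).
from typing import Callable, Dict, List

Message = Dict[str, str]


def chat_stub(messages: List[Message]) -> str:
    """Placeholder model: always honors the budget. Replace with a real API call."""
    return "Given your $500 budget, I recommend a refurbished mid-range laptop."


def run_multi_turn_eval(chat: Callable[[List[Message]], str]) -> bool:
    """Replay a scripted conversation and check that turn-1 context
    (a $500 budget) still constrains the answer at the final turn."""
    history: List[Message] = [
        {"role": "user", "content": "My budget is $500. I need a laptop for note-taking."}
    ]
    history.append({"role": "assistant", "content": chat(history)})

    history.append({"role": "user", "content": "Actually, I also want it to be lightweight."})
    history.append({"role": "assistant", "content": chat(history)})

    history.append({"role": "user", "content": "So what exactly should I buy?"})
    final_answer = chat(history)

    # Context-awareness rule: the final recommendation must still honor the budget.
    return "500" in final_answer


if __name__ == "__main__":
    print("context retained:", run_multi_turn_eval(chat_stub))
```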
X @Demis Hassabis
Demis Hassabis· 2025-08-04 18:26
To kick off, @Kaggle is hosting a 3-day exhibition chess tournament with matches between some of the top LLMs - w/commentary from chess legends @MagnusCarlsen, @GMHikaru, @GothamChess. Tune in at 10:30am PT starting tmrw (Aug 5th), should be a lot of fun: https://t.co/PNTk1vLlp2 ...
X @Demis Hassabis
Demis Hassabis· 2025-08-04 18:26
Thrilled to announce the @Kaggle Game Arena, a new leaderboard testing how modern LLMs perform on games (spoiler: not very well atm!). AI systems play each other, making it an objective & evergreen benchmark that will scale in difficulty as they improve. https://t.co/0e2dF2pbtX ...
X @CoinGecko
CoinGecko· 2025-08-04 07:20
Product Features
- CoinGecko MCP enables LLMs to access real-time market data, including token prices, market capitalization, and trading volume (see the hedged sketch below) [1]
- The guide details the features of CoinGecko's MCP, setup instructions, and use cases for enhancing crypto research [1]
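To show the kind of data such a tool surfaces, below is a minimal sketch that queries CoinGecko's public REST API (`/simple/price`) for price, market cap, and 24-hour volume. This illustrates the underlying data, not the MCP server setup itself, which the guide covers; the coin id and parameters are just examples.

```python
# Sketch: fetch the kind of market data an LLM tool could surface,
# using CoinGecko's public /simple/price endpoint (rate limits apply).
import requests


def fetch_market_snapshot(coin_id: str = "bitcoin", vs: str = "usd") -> dict:
    resp = requests.get(
        "https://api.coingecko.com/api/v3/simple/price",
        params={
            "ids": coin_id,
            "vs_currencies": vs,
            "include_market_cap": "true",
            "include_24hr_vol": "true",
        },
        timeout=10,
    )
    resp.raise_for_status()
    return resp.json()[coin_id]


if __name__ == "__main__":
    snapshot = fetch_market_snapshot()
    print(f"price: {snapshot['usd']:,} USD")
    print(f"market cap: {snapshot['usd_market_cap']:,.0f} USD")
    print(f"24h volume: {snapshot['usd_24h_vol']:,.0f} USD")
```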
Vision AI in 2025 — Peter Robicheaux, Roboflow
AI Engineer· 2025-08-03 17:45
AI Vision Challenges & Opportunities
- Computer vision lags behind human vision and language models in intelligence and in leveraging large-scale pre-training [3][8][11]
- Current vision evaluations like ImageNet and COCO are saturated and primarily measure pattern matching, hindering the development of true visual intelligence [5][22]
- Vision models struggle with tasks requiring visual understanding, such as determining the time on a watch or understanding spatial relationships in images [9][10]
- Vision-language pre-training, exemplified by CLIP, may fail to capture subtle visual details not explicitly included in image captions [14][15]
Roboflow's Solution & Innovation
- Roboflow introduces RF-DETR, a real-time object detection model leveraging the DINOv2 pre-trained backbone to address the underutilization of large pre-training in visual models [20]
- Roboflow created RF100-VL, a new benchmark comprising 100 diverse object detection datasets, to better measure the intelligence and domain adaptability of visual models [24][25]
- RF100-VL includes challenging domains like aerial imagery, microscopy, and X-rays, and incorporates visual-language tasks to assess contextual understanding [25][26][27][28][29]
- Roboflow's benchmark reveals that current vision-language models struggle to generalize in the visual domain compared to the linguistic domain [30]
- Fine-tuning a YOLOv8-nano model from scratch on 10-shot examples outperforms zero-shot Grounding DINO on RF100-VL, highlighting the need for improved visual generalization (see the sketch after this list) [30][36][37]
Industry Trends & Future Directions
- Transformers are proving more effective than convolutional models at leveraging large pre-training datasets for vision tasks [18]
- The scale of pre-training in the vision world is significantly smaller than in the language world, indicating room for growth [19]
- Roboflow makes its platform freely available to researchers, encouraging open-source data contributions to the community [33]
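To make the 10-shot comparison concrete, here is a hedged sketch of the fine-tuning side using the `ultralytics` package. The dataset YAML, hyperparameters, and starting checkpoint are placeholders for illustration; the talk's exact training setup (including whether the backbone starts from random weights) is not reproduced here.

```python
# Sketch: fine-tune a small YOLOv8 model on a few-shot detection dataset
# (paths and hyperparameters are illustrative placeholders, not the talk's setup).
from ultralytics import YOLO

# Start from the nano checkpoint; the talk contrasts a small model like this,
# trained on ~10 labeled examples per class, with zero-shot Grounding DINO.
model = YOLO("yolov8n.pt")

# `data` points to a standard YOLO dataset YAML (train/val paths + class names).
model.train(
    data="fewshot_dataset.yaml",  # hypothetical 10-shot dataset
    epochs=100,
    imgsz=640,
    batch=8,
)

# Evaluate on the held-out split to get mAP for the comparison.
metrics = model.val()
print(metrics.box.map)  # mAP50-95
```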
Using LLMs Instead of Government Consulting
Y Combinator· 2025-08-03 15:54
Government Consulting Market & Trends
- The US government spends hundreds of billions of dollars annually on consulting [1]
- Political pressure exists to cut wasteful consulting and spending [1]
- Government increasingly relies on software, often custom-built [1]
LLM Impact & Opportunities
- LLMs are capable of performing tasks currently done by consulting firms [2]
- Funding is being directed towards startups assisting with government sales approvals (FedRAMP) [2][3]
- Funding is also supporting companies using LLMs to improve government regulation and policy legality [3]
Investment Focus
- The company aims to fund startups developing LLM software for government consulting tasks [3]
Alphabet: Why An Antitrust Breakup Is Good
Seeking Alpha· 2025-08-02 14:21
Core Viewpoint
- Alphabet's defeat in antitrust court and the perceived threat from large language models (LLMs) to its search engine advertising revenue contribute to a narrative of an existential crisis for the company [1]
Group 1: Antitrust Issues
- Alphabet has faced a significant defeat in antitrust court, which raises concerns about its market position and regulatory challenges [1]
Group 2: Impact of LLMs
- The rise of LLMs is viewed as potentially positive for the industry, suggesting that these technologies could enhance overall market dynamics rather than pose a direct threat to Alphabet [1]
The 2025 AI Engineering Report — Barr Yaron, Amplify
AI Engineer· 2025-08-01 22:51
AI Engineering Landscape
- The AI engineering community is broad, technical, and growing, with the "AI Engineer" title expected to gain more ground [5]
- Many seasoned software developers are AI newcomers: nearly half of those with 10+ years of experience have worked with AI for three years or less [7]
LLM Usage and Customization
- Over half of respondents are using LLMs for both internal and external use cases, with OpenAI models dominating external, customer-facing applications [8]
- LLM users leverage them across multiple use cases, with 94% using them for at least two and 82% for at least three [9]
- Retrieval-Augmented Generation (RAG) is the most popular customization method, used by 70% of respondents (a minimal sketch follows at the end of this summary) [10]
- Parameter-efficient fine-tuning methods like LoRA/QLoRA are strongly preferred, mentioned by 40% of fine-tuners [12]
Model and Prompt Management
- Over 50% of respondents update their models at least monthly, with 17% doing so weekly [14]
- 70% of respondents update prompts at least monthly, and 10% do so daily [14]
- A significant 31% of respondents lack any system for managing their prompts [15]
Multimodal AI and Agents
- Image, video, and audio usage lag text usage significantly, indicating a "multimodal production gap" [16][17]
- Audio has the highest intent to adopt among those not currently using it, with 37% planning to adopt it eventually [18]
- While 80% of respondents say LLMs are working well, less than 20% say the same about agents [20]
Monitoring and Evaluation
- Most respondents use multiple methods to monitor their AI systems, with 60% using standard observability and over 50% relying on offline evaluation [22]
- Human review remains the most popular method for evaluating model and system accuracy and quality [23]
- 65% of respondents use a dedicated vector database [24]
Industry Outlook
- The mean guess for the percentage of the US Gen Z population that will have AI girlfriends/boyfriends is 26% [27]
- Evaluation is the number one most painful aspect of AI engineering today [28]
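Because RAG tops the customization list, a minimal sketch of the pattern is included below. The embedding model, in-memory document store, and prompt-assembly step are placeholder assumptions rather than tools named in the report; the assembled prompt would be passed to whichever LLM the team uses.

```python
# Minimal RAG sketch: embed documents, retrieve the closest one for a query,
# and prepend it to the prompt. The embedding model and generation step are
# illustrative placeholders, not tools named in the survey.
import numpy as np
from sentence_transformers import SentenceTransformer

docs = [
    "Retrieval-Augmented Generation is the most popular LLM customization method.",
    "Parameter-efficient fine-tuning methods like LoRA are preferred by fine-tuners.",
]

encoder = SentenceTransformer("all-MiniLM-L6-v2")
doc_vecs = encoder.encode(docs, normalize_embeddings=True)


def retrieve(query: str, k: int = 1) -> list[str]:
    """Return the k documents most similar to the query."""
    q = encoder.encode([query], normalize_embeddings=True)[0]
    scores = doc_vecs @ q  # cosine similarity (vectors are normalized)
    return [docs[i] for i in np.argsort(-scores)[:k]]


def build_prompt(query: str) -> str:
    """Assemble the retrieval-augmented prompt; pass it to your LLM of choice."""
    context = "\n".join(retrieve(query))
    return f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"


print(build_prompt("Which customization method do most respondents use?"))
```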