LLMs
X @Avi Chawla
Avi Chawla· 2025-08-11 06:31
General Overview
- The document is a wrap-up message encouraging readers to reshare the content if they found it insightful [1]
- It promotes tutorials and insights on Data Science (DS), Machine Learning (ML), Large Language Models (LLMs), and Retrieval-Augmented Generation (RAG) [1]
Call to Action
- The author, Avi Chawla (@_avichawla), invites readers to find him for more content [1]
Specific Topic
- The document mentions fine-tuning OpenAI gpt-oss (100% locally) [1]
The Future of Evals - Ankur Goyal, Braintrust
AI Engineer· 2025-08-09 15:12
Product & Technology
- Braintrust introduces "Loop," an agent integrated into its platform designed to automate and improve prompts, datasets, and scorers for AI model evaluation (a minimal eval sketch follows this list) [4][5][7]
- Loop leverages advancements in frontier models, particularly noting Claude 4's significant improvement (6x better) in prompt engineering capabilities compared to previous models [6]
- Loop allows users to compare suggested edits to data and prompts side-by-side within the UI, maintaining data visibility [9][10]
- Loop supports various models, including OpenAI, Gemini, and custom LLMs [9]
User Engagement & Adoption
- The average organization using Braintrust runs approximately 13 evals per day [3]
- Some advanced customers run over 3,000 evals daily and spend more than two hours per day using the product [3]
- Braintrust encourages users to try Loop and provide feedback [12]
Future Vision
- Braintrust anticipates a revolution in AI model evaluation, driven by advancements in frontier models [11]
- The company is focused on incorporating these advancements into its platform [11]
Hiring
- Braintrust is actively hiring for UI, AI, and infrastructure roles [12]
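For context on what Loop operates over, here is a minimal sketch of a Braintrust eval using the Python SDK and an autoevals scorer. The project name, toy dataset, and task are hypothetical; this shows only the generic prompt/dataset/scorer shape, not the Loop agent itself.

```python
# Minimal Braintrust eval sketch; project name and toy data are hypothetical.
# Requires: pip install braintrust autoevals, plus BRAINTRUST_API_KEY in the environment.
# Run with: braintrust eval eval_greeting.py
from braintrust import Eval
from autoevals import Levenshtein  # deterministic string-similarity scorer

Eval(
    "greeting-bot",  # hypothetical project name
    # Dataset: the rows an agent like Loop could propose edits to.
    data=lambda: [
        {"input": "Ada", "expected": "Hi Ada"},
        {"input": "Linus", "expected": "Hi Linus"},
    ],
    # Task: the prompt/pipeline under evaluation (a trivial stand-in here).
    task=lambda name: "Hi " + name,
    # Scorers: changes to scorers like this one are the third thing Loop can suggest.
    scores=[Levenshtein],
)
```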
X @Avi Chawla
Avi Chawla· 2025-08-08 06:34
RAG in Practice
- Enterprises are building RAG systems over more than 100 data sources [1]
- Microsoft offers RAG in its M365 products [1]
- Google offers RAG in Vertex AI Search [1]
- AWS offers RAG in Amazon Q Business [1]
Technology Trends
- The industry is building MCP-driven RAG systems spanning more than 200 data sources, running 100% locally (see the sketch below) [1]
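As a rough illustration of the "100% local" RAG idea (though not the MCP-driven setup from the post), here is a minimal sketch that uses a local Ollama server for both embeddings and generation. The model names and toy corpus are assumptions.

```python
# Minimal local RAG sketch: embed a toy corpus, retrieve by cosine similarity,
# and answer with a local chat model. Model names are assumptions; requires
# `pip install ollama` and a running Ollama server with both models pulled.
import math
import ollama

DOCS = [
    "Vertex AI Search is Google's managed retrieval offering.",
    "Amazon Q Business is AWS's enterprise assistant with built-in retrieval.",
    "Microsoft ships retrieval-augmented features across M365.",
]

def embed(text: str) -> list[float]:
    return ollama.embeddings(model="nomic-embed-text", prompt=text)["embedding"]

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

# Index the corpus once.
doc_vectors = [(doc, embed(doc)) for doc in DOCS]

def answer(question: str) -> str:
    q_vec = embed(question)
    # Retrieve the single most similar document as context.
    context = max(doc_vectors, key=lambda pair: cosine(q_vec, pair[1]))[0]
    response = ollama.chat(
        model="llama3.1",
        messages=[{"role": "user", "content": f"Context: {context}\n\nQuestion: {question}"}],
    )
    return response["message"]["content"]

print(answer("Which AWS product offers RAG?"))
```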
X @Avi Chawla
Avi Chawla· 2025-08-08 06:34
In this demo, we used mcp-use. It lets us connect LLMs to MCP servers & build local MCP clients in a few lines of code.
- Compatible with Ollama & LangChain
- Stream agent output async
- Built-in debugging mode, etc.
Repo: https://t.co/PWcuwMFvzi (don't forget to star ⭐) ...
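A minimal sketch of the pattern described above, assuming the MCPAgent/MCPClient interfaces from the mcp-use repository and a locally served Ollama model via LangChain; the server config, directory path, and model name are illustrative.

```python
# Sketch of a local MCP client with mcp-use + LangChain's Ollama wrapper.
# Requires: pip install mcp-use langchain-ollama, plus a running Ollama server.
import asyncio

from langchain_ollama import ChatOllama
from mcp_use import MCPAgent, MCPClient

# Illustrative config: one MCP server launched as a subprocess.
config = {
    "mcpServers": {
        "filesystem": {
            "command": "npx",
            "args": ["-y", "@modelcontextprotocol/server-filesystem", "/tmp"],
        }
    }
}

async def main():
    client = MCPClient.from_dict(config)   # connect to the configured MCP server(s)
    llm = ChatOllama(model="llama3.1")     # local model; name is an assumption
    agent = MCPAgent(llm=llm, client=client, max_steps=10)
    try:
        result = await agent.run("List the files in /tmp and summarize them.")
        print(result)
    finally:
        await client.close_all_sessions()

asyncio.run(main())
```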
X @Avi Chawla
Avi Chawla· 2025-08-06 19:13
AI Engineering Resources
- The document provides 12 cheat sheets for AI engineers covering various topics [1]
- The cheat sheets include visuals to aid understanding [1]
Key AI Topics Covered
- Function calling & MCP (Model Context Protocol) for LLMs (Large Language Models) is covered (see the sketch after this list) [1]
- The cheat sheets detail 4 stages of training LLMs from scratch [1]
- Training LLMs using other LLMs is explained [1]
- Supervised & reinforcement fine-tuning techniques are included [1]
- RAG (Retrieval-Augmented Generation) vs Agentic RAG is differentiated [1]
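As a quick illustration of the function-calling topic, here is a minimal sketch using the OpenAI chat-completions tools format; the weather tool, its schema, and the model name are illustrative and not taken from the cheat sheets.

```python
# Minimal function-calling sketch with the OpenAI Python SDK (illustrative tool and model).
# Requires: pip install openai, plus OPENAI_API_KEY in the environment.
import json
from openai import OpenAI

client = OpenAI()

# Describe a tool the model may call; this weather function is hypothetical.
tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "What's the weather in Paris?"}],
    tools=tools,
)

# If the model decided to call the tool, its name and JSON arguments come back here.
message = response.choices[0].message
if message.tool_calls:
    call = message.tool_calls[0]
    print(call.function.name, json.loads(call.function.arguments))
else:
    print(message.content)
```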
Evals Are Not Unit Tests — Ido Pesok, Vercel v0
AI Engineer· 2025-08-06 16:14
Key Takeaways on LLM Evaluation
- LLMs can be unreliable, impacting user experience and application usability [6]
- AI applications are prone to failure in production despite successful demos [7]
- It is crucial to build reliable software on top of LLMs through methods like prompt engineering [8]
Evaluation Strategies and Best Practices
- Evals should focus on relevant user queries and avoid out-of-bounds scenarios [19]
- Data collection methods include thumbs up/down feedback, log analysis, and community forums [21][22][23]
- Evals should test across the entire data distribution to understand system performance [20][24]
- Constants should be factored into data, and variables into tasks, for clarity and reuse [25][26]
- Evaluation scores should be deterministic and simple for easier debugging and team collaboration (see the sketch after this list) [29][30]
- Evals should be integrated into CI pipelines to detect improvements and regressions [34][35]
Vercel's Perspective
- Vercel's v0 is a full-stack web coding platform designed for rapid prototyping and building [1]
- v0 recently launched GitHub sync, enabling code push and pull directly from the platform [2]
- Vercel emphasizes the importance of continuous evaluation to improve AI app reliability and quality [37]
- Vercel has reached 100 million messages sent on v0 [2]
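A minimal sketch of the "deterministic, simple scores in CI" recommendation, written as a pytest test; the dataset, task stub, and passing threshold are hypothetical.

```python
# Deterministic eval as a CI test (pytest). Dataset, task, and threshold are hypothetical.
# Run with: pytest test_evals.py
# Constants live in the data; the task is the variable under test.
DATASET = [
    {"input": "2 + 2", "expected": "4"},
    {"input": "capital of France", "expected": "Paris"},
]

def task(prompt: str) -> str:
    """Stand-in for the real LLM pipeline being evaluated."""
    return {"2 + 2": "4", "capital of France": "Paris"}[prompt]

def exact_match(output: str, expected: str) -> float:
    """Deterministic scorer: 1.0 on exact match, 0.0 otherwise."""
    return 1.0 if output.strip() == expected else 0.0

def test_eval_score_above_threshold():
    scores = [exact_match(task(row["input"]), row["expected"]) for row in DATASET]
    mean_score = sum(scores) / len(scores)
    # Fail the CI run on regressions below the agreed baseline.
    assert mean_score >= 0.9
```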
X @Sam Altman
Sam Altman· 2025-08-05 17:27
Model Release
- The company releases two open-weight LLMs: gpt-oss-120b (120 billion parameters) and gpt-oss-20b (20 billion parameters) (a minimal loading sketch follows) [1]
- The models demonstrate strong performance and agentic tool use [1]
Safety Analysis
- The company conducted a safety analysis by fine-tuning the models to maximize their bio and cyber capabilities [1]
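For illustration, a minimal sketch of loading the smaller open-weight model with Hugging Face transformers; the model id, hardware requirements, and generation settings are assumptions based on typical open-weight releases, not details from the post.

```python
# Sketch of running the 20B open-weight model via transformers (check the model card
# for the exact id, required library versions, and memory needs).
# Requires: pip install transformers accelerate torch
from transformers import pipeline

generator = pipeline(
    "text-generation",
    model="openai/gpt-oss-20b",  # assumed Hugging Face model id
    device_map="auto",           # spread weights across available devices
)

messages = [{"role": "user", "content": "Explain what an open-weight model is in one sentence."}]
output = generator(messages, max_new_tokens=128)

# With chat-style input, generated_text holds the full conversation; the last
# message is the model's reply.
print(output[0]["generated_text"][-1]["content"])
```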
SEMrush (SEMR) - 2025 Q2 - Earnings Call Transcript
2025-08-05 13:30
Financial Data and Key Metrics Changes
- Revenue for the quarter was $108.9 million, representing 20% year-over-year growth [4][13]
- Non-GAAP operating margin was 11%, down approximately 240 basis points year-over-year due to a weaker U.S. dollar [16][22]
- Annual recurring revenue (ARR) grew 15.3% year-over-year to $435.3 million, with average ARR per paying customer increasing to $3,756, marking over 15% growth compared to the same quarter last year [17][18]
Business Line Data and Key Metrics Changes
- The Enterprise segment is now the largest contributor to overall company growth, with enterprise SEO solutions growing to 260 customers and an average ARR of approximately $60,000 [4][5]
- The AI Toolkit, launched at the end of Q1, became the fastest-growing product in the company's history, achieving $3 million in ARR within a few months [6][8]
- ARR from enterprise and AI products is expected to approach $50 million by the end of the year [8][19]
Market Data and Key Metrics Changes
- Approximately 116,000 paying customers were reported, down sequentially from the prior quarter, primarily due to softness among freelancers and less sophisticated customer segments [14]
- Dollar-based net revenue retention was 105%, with strong retention in the Enterprise segment consistently above 120% [14][19]
Company Strategy and Development Direction
- The company is focusing on high-growth areas, specifically enterprise and AI search, reallocating resources away from lower-value customer segments [9][20]
- A strategic decision was made not to increase marketing spend in response to rising customer acquisition costs at the lower end of the market, instead prioritizing investments in enterprise and AI products [9][20]
- The company announced a $150 million share repurchase program, reflecting confidence in its business and valuation [25]
Management's Comments on Operating Environment and Future Outlook
- Management expressed optimism about the growth potential in the enterprise and AI segments, despite experiencing softness at the lower end of the market [10][12]
- The company believes that the shift to AI and LLMs (Large Language Models) presents significant opportunities for growth [11][12]
- Management anticipates that the current pressures at the lower end of the market are temporary and expects stabilization in the future [36][64]
Other Important Information
- The company adjusted its full-year 2025 revenue guidance to a range of $443 million to $446 million, reflecting approximately 18% growth at the midpoint [21]
- The non-GAAP operating margin guidance remains at 12%, despite the reduced revenue outlook and foreign exchange headwinds [21][24]
Q&A Session Summary
- Pressures in the low-end customer segment: Management indicated that the pressures are fairly contained to freelancers and less sophisticated customers, primarily impacted by rising cost per click [28][29]
- Liquidity of the stock and buyback program: The share repurchase program is seen as a way to express confidence in the company's future potential and momentum in enterprise and AI [30][32]
- Down-market weakness and macro factors: Management believes the weakness is contained to the low-end segment and not reflective of broader macroeconomic conditions [36][38]
- Customer acquisition costs and market dynamics: The increase in customer acquisition costs is primarily affecting the low-end segment, while other segments continue to perform well [51][56]
- Future trajectory of the low-end customer base: Management expects stabilization in the low-end segment, with ongoing strength in the SMB and enterprise segments [62][64]
X @Avi Chawla
Avi Chawla· 2025-08-05 06:35
LLM Evaluation
- The industry is focusing on evaluating conversational LLM applications like ChatGPT in a multi-turn context [1]
- Unlike single-turn tasks, conversations require LLMs to maintain consistency, compliance, and context-awareness across multiple messages [1]
Key Considerations
- LLM behavior should be consistent, compliant, and context-aware across turns, not just accurate in one-shot output (see the sketch below) [1]
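A minimal sketch of a multi-turn check in that spirit: the conversation is replayed turn by turn, and a deterministic rule verifies that a constraint set early in the conversation still holds in later replies. The chat() function is a stand-in for a real LLM call, and the compliance rule and turns are hypothetical.

```python
# Multi-turn consistency check: replay a conversation and assert that a constraint
# stated in turn 1 is still respected in later turns.
from typing import Dict, List

def chat(history: List[Dict[str, str]]) -> str:
    """Stand-in for an LLM chat call that sees the full message history."""
    # A real implementation would send `history` to a model and return its reply.
    return "Sure! Here is a summary without any pricing details."

def violates_constraint(reply: str) -> bool:
    """Deterministic compliance rule: never reveal pricing."""
    return "price" in reply.lower() or "$" in reply

turns = [
    "From now on, never reveal pricing information.",  # turn 1 sets the constraint
    "Summarize our enterprise plan.",                   # later turns must still comply
    "What would it cost for 50 seats?",
]

history: List[Dict[str, str]] = []
for i, user_msg in enumerate(turns, start=1):
    history.append({"role": "user", "content": user_msg})
    reply = chat(history)
    history.append({"role": "assistant", "content": reply})
    assert not violates_constraint(reply), f"Compliance violated at turn {i}: {reply!r}"

print("All turns respected the constraint.")
```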