LLMs
X @Demis Hassabis
Demis Hassabis· 2025-08-04 18:26
To kick off, @Kaggle is hosting a 3-day exhibition chess tournament with matches between some of the top LLMs - w/commentary from chess legends @MagnusCarlsen, @GMHikaru, @GothamChess. Tune in at 10:30am PT starting tmrw (Aug 5th), should be a lot of fun: https://t.co/PNTk1vLlp2 ...
X @Demis Hassabis
Demis Hassabis· 2025-08-04 18:26
Thrilled to announce the @Kaggle Game Arena, a new leaderboard testing how modern LLMs perform on games (spoiler: not very well atm!). AI systems play each other, making it an objective & evergreen benchmark that will scale in difficulty as they improve. https://t.co/0e2dF2pbtX ...
X @CoinGecko
CoinGecko· 2025-08-04 07:20
Product Features
- CoinGecko MCP enables LLMs to access real-time market data, including token prices, market capitalization, and trading volume [1]
- The guide details the features of CoinGecko's MCP, setup instructions, and use cases for enhancing crypto research [1]
Vision AI in 2025 — Peter Robicheaux, Roboflow
AI Engineer· 2025-08-03 17:45
AI Vision Challenges & Opportunities
- Computer vision lags behind both human vision and language models in intelligence and in its ability to exploit large-scale pre-training [3][8][11]
- Current vision benchmarks such as ImageNet and COCO are saturated and primarily measure pattern matching, which hinders the development of true visual intelligence [5][22]
- Vision models struggle with tasks requiring visual understanding, such as reading the time on a watch or understanding spatial relationships in images [9][10]
- Vision-language pre-training, exemplified by CLIP, may fail to capture subtle visual details that are not explicitly described in image captions [14][15]
Roboflow's Solution & Innovation
- Roboflow introduces RF-DETR, a real-time object detection model that uses the DINOv2 pre-trained backbone to address the underuse of large-scale pre-training in vision models [20]
- Roboflow created RF100-VL, a new dataset comprising 100 diverse object detection datasets, to better measure the intelligence and domain adaptability of vision models [24][25]
- RF100-VL includes challenging domains like aerial imagery, microscopy, and X-rays, and incorporates vision-language tasks to assess contextual understanding [25][26][27][28][29]
- Roboflow's benchmark reveals that current vision-language models generalize worse in the visual domain than in the linguistic domain [30]
- A YOLOv8 nano model trained from scratch on 10-shot examples performs better than zero-shot Grounding DINO on RF100-VL, highlighting the need for improved visual generalization [30][36][37]
Industry Trends & Future Directions
- Transformers are proving more effective than convolutional models at leveraging large pre-training datasets for vision tasks [18]
- The scale of pre-training in the vision world is significantly smaller than in the language world, indicating room for growth [19]
- Roboflow makes its platform freely available to researchers, encouraging open-source data contributions to the community [33]
Using LLMs Instead of Government Consulting
Y Combinator· 2025-08-03 15:54
Government Consulting Market & Trends
- The US government spends hundreds of billions of dollars annually on consulting [1]
- There is political pressure to cut wasteful consulting and spending [1]
- Government increasingly relies on software, often custom-built [1]
LLM Impact & Opportunities
- LLMs are capable of performing tasks currently done by consulting firms [2]
- Funding is being directed towards startups that assist with government sales approvals (FedRAMP) [2][3]
- Funding is also supporting companies that use LLMs to improve government regulation and check the legality of policy [3]
Investment Focus
- Y Combinator aims to fund startups developing LLM software for government consulting tasks [3]
Alphabet: Why An Antitrust Breakup Is Good
Seeking Alpha· 2025-08-02 14:21
Core Viewpoint
- Alphabet's defeat in antitrust court and the perceived threat from large language models (LLMs) to its search engine advertising revenue contribute to a narrative of an existential crisis for the company [1]
Group 1: Antitrust Issues
- Alphabet has faced a significant defeat in antitrust court, which raises concerns about its market position and regulatory challenges [1]
Group 2: Impact of LLMs
- The rise of LLMs is viewed as potentially positive for the industry, suggesting that these technologies could enhance overall market dynamics rather than pose a direct threat to Alphabet [1]
The 2025 AI Engineering Report — Barr Yaron, Amplify
AI Engineer· 2025-08-01 22:51
AI Engineering Landscape
- The AI engineering community is broad, technical, and growing, with the "AI Engineer" title expected to gain more ground [5]
- Many seasoned software developers are AI newcomers, with nearly half of those with 10+ years of experience having worked with AI for three years or less [7]
LLM Usage and Customization
- Over half of respondents are using LLMs for both internal and external use cases, with OpenAI models dominating external, customer-facing applications [8]
- LLM users are leveraging them across multiple use cases, with 94% using them for at least two and 82% for at least three [9]
- Retrieval-Augmented Generation (RAG) is the most popular customization method, with 70% of respondents using it [10]
- Parameter-efficient fine-tuning methods like LoRA/QLoRA are strongly preferred, mentioned by 40% of fine-tuners [12]
Model and Prompt Management
- Over 50% of respondents are updating their models at least monthly, with 17% doing so weekly [14]
- 70% of respondents are updating prompts at least monthly, and 10% are doing so daily [14]
- A significant 31% of respondents lack any system for managing their prompts [15]
Multimodal AI and Agents
- Image, video, and audio usage lags text usage significantly, indicating a "multimodal production gap" [16][17]
- Audio has the highest intent to adopt among those not currently using it, with 37% planning to eventually adopt audio [18]
- While 80% of respondents say LLMs are working well, less than 20% say the same about agents [20]
Monitoring and Evaluation
- Most respondents use multiple methods to monitor their AI systems, with 60% using standard observability and over 50% relying on offline evaluation [22]
- Human review remains the most popular method for evaluating model and system accuracy and quality [23]
- 65% of respondents are using a dedicated vector database [24]
Industry Outlook
- The mean guess for the percentage of the US Gen Z population that will have AI girlfriends/boyfriends is 26% [27]
- Evaluation is the number one most painful thing about AI engineering today [28]
X @CoinGecko
CoinGecko· 2025-07-31 19:09
Hackathon Overview
- CoinGecko is hosting an MCP Hackathon focused on building with crypto price data and AI [1]
- The hackathon encourages participation from builders, researchers, and tinkerers [1]
Prizes and Incentives
- The hackathon offers prizes worth up to $13,000 [1]
- Over $1,300 in prizes are specifically allocated for projects utilizing CoinGecko's crypto price data in AI and LLMs [1]
Participation Details
- Participants are invited to BuildwithCoinGecko and AI [1]
- Interested individuals can find participation details at the provided URL [1]
X @Avi Chawla
Avi Chawla· 2025-07-30 06:32
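Key Features
- MCP-use simplifies connecting LLMs to MCP servers and building local MCP clients [1]
- The tool is compatible with Ollama and LangChain [2]
- It supports asynchronous streaming of the agent's output [2]
- It has a built-in debugging mode [2]
- The use of MCP tools can be restricted [2]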
Booking Holdings (BKNG) - 2025 Q2 - Earnings Call Transcript
2025-07-29 21:30
Financial Data and Key Metrics Changes
- Booking Holdings reported a strong quarter with adjusted EBITDA increasing by 28% year over year, driven by revenue outperformance and disciplined expense management [3][32]
- Room nights reached 309 million, an 8% year over year increase, with gross bookings up 13% and revenue up 16%, both exceeding prior expectations [5][29]
- Adjusted earnings per share grew 32% year over year, benefiting from a 5% lower average share count [32]
Business Line Data and Key Metrics Changes
- Alternative accommodations room nights grew by 10%, outpacing the core hotel business, with total listings reaching 8.4 million, an 8% increase year over year [8][25]
- The Genius loyalty program saw over 30% of active travelers in higher tiers, contributing to a mid-50% share of total room nights booked [9][27]
- Non-accommodation verticals showed strong growth, with flight tickets booked increasing by 44% and attractions ticket growth more than doubling year over year [12][27]
Market Data and Key Metrics Changes
- Asia experienced low double-digit room night growth, while the U.S. remained the slowest growing region, though growth improved slightly from the first quarter [10][22]
- Europe saw high single-digit growth, and the Rest of World region also experienced high single-digit growth [22]
- The U.S. market showed lower average daily rates (ADRs) and shorter lengths of stay, indicating cautious consumer spending [23]
Company Strategy and Development Direction
- The company is focused on expanding alternative accommodations, enhancing the Genius loyalty program, and developing AI capabilities to improve the travel experience [7][12]
- The connected trip vision aims to provide a more personalized travel experience by integrating various travel services [11][82]
- The company is investing in technology and partnerships to leverage AI for better service and operational efficiency [16][17]
Management's Comments on Operating Environment and Future Outlook
- Management remains optimistic about long-term growth in the travel industry despite geopolitical and macroeconomic uncertainties [18][39]
- The company expects third quarter room night growth to moderate, with guidance reflecting a cautious outlook due to tougher year-over-year comparisons [35][72]
- Full-year guidance has been increased, with expectations for low double-digit growth in gross bookings and revenue [39]
Other Important Information
- The company generated approximately $3.1 billion in free cash flow during the quarter, with an ending cash and investments balance of $18.2 billion [34]
- The transformation program is expected to yield approximately $350 million in annual run rate savings [33]
Q&A Session
Question: Can you provide details on the performance of different markets in Asia?
- Management expressed satisfaction with overall performance in Asia, highlighting that while they do not compete strongly in China, inbound travel to China remains beneficial [45][46]
Question: What is the potential impact of large language models (LLMs) on the business?
- Management sees LLMs as an exciting opportunity for improved service and efficiency, although it is still early to quantify their impact [48][50]
Question: What initiatives are being taken to boost growth in the U.S. market?
- The company is focusing on small initiatives across product, supply, and marketing to gradually gain market share in the U.S. [58][60]
Question: What are the key investments needed for scaling the Connected Trip?
- Management emphasized the importance of expanding inventory across all travel verticals and leveraging data for personalized customer experiences [82][90]
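A note on the adjusted EPS figure in the financial summary above: because EPS is earnings divided by share count, the 32% EPS growth and the 5% lower average share count together imply how fast adjusted net income itself grew. The sketch below works through that arithmetic. It assumes "5% lower" means the share count was 0.95 times the prior-year level; the function name is hypothetical, and the result is an illustration, not a figure reported by the company.

```python
def implied_net_income_growth(eps_growth: float, share_count_change: float) -> float:
    """Back out earnings growth from EPS growth and the change in share count.

    EPS = net income / shares, so:
        (1 + net income growth) = (1 + EPS growth) * (1 + share count change)
    Both inputs are fractions, e.g. 0.32 for +32% and -0.05 for a 5% decline.
    """
    return (1.0 + eps_growth) * (1.0 + share_count_change) - 1.0


if __name__ == "__main__":
    # Inputs taken from the summary: EPS +32%, average share count -5% (assumed ratio 0.95).
    growth = implied_net_income_growth(eps_growth=0.32, share_count_change=-0.05)
    print(f"Implied adjusted net income growth: {growth:.1%}")
```

Under these assumptions, adjusted net income grew by roughly 25%, with the remainder of the 32% EPS growth coming from the smaller share count.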