How Intuit uses LLMs to explain taxes to millions of taxpayers - Jaspreet Singh, Intuit

Intuit's Use of LLMs in TurboTax - Intuit successfully processed 44 million tax returns for tax year 2023, aiming to provide users with high confidence in their tax filings and ensure they receive the best deductions [2] - Intuit's Geni experiences are built on GenOS, a proprietary generative OS platform designed to address the limitations of out-of-the-box tooling, especially concerning regulatory compliance, safety, and security in the tax domain [4][5] - Intuit uses Claude (Anthropic) for static queries related to tax refunds and OpenAI's GPT-4 for dynamic question answering, such as user-specific tax inquiries [9][10][12] - Intuit is one of the biggest users of Claude, with a multi-million dollar contract [9][10] Development and Evaluation - Intuit emphasizes a phased evaluation system, starting with manual evaluations by tax analysts and transitioning to automated evaluations using LLM as a judge [16][17] - Tax analysts also serve as prompt engineers, leveraging their expertise to ensure accurate evaluations and prompt design [16][17] - Key evaluation pillars include accuracy, relevancy, and coherence, with a strong focus on tax accuracy [20][24] - Intuit uses AWS Ground Truth for creating golden datasets for evaluations [22] Challenges and Learnings - LLM contracts are expensive, and long-term contracts are slightly cheaper but create vendor lock-in [25][26] - LLM models have higher latency compared to backend services (3-10 seconds), which can be exacerbated during peak tax season [27][28] - Intuit employs safety guardrails and ML models to prevent hallucination of numbers in LLM responses, ensuring data accuracy [40][41] - Graph RAG outperforms regular RAG in providing personalized and helpful answers to users [42][43]