Large Language Models
Make your LLM app a Domain Expert: How to Build an Expert System — Christopher Lovejoy, Anterior
AI Engineer· 2025-07-28 19:55
Core Problem & Solution
- Vertical AI applications face a "last mile problem" in understanding industry-specific context and workflows, which is more critical than model sophistication [4][6]
- Anterior proposes an "adaptive domain intelligence engine" to convert customer-specific domain insights into performance improvements [17]
- The engine consists of measurement (performance evaluation) and improvement (iterative refinement) components [17]

Measurement & Metrics
- Defining key performance metrics that users care about is crucial, such as minimizing false approvals in healthcare or preventing dollar loss from fraud [18][19][20]
- Developing a failure mode ontology helps categorize and analyze different ways the AI can fail, enabling targeted improvements [21][22]
- Combining metric tracking with failure mode analysis allows prioritization of development efforts based on the impact on key metrics [26][27] (see the sketch after this entry)

Iteration & Improvement
- Failure mode labeling creates ready-made datasets for iterative model improvement, using production data to ensure relevance [29]
- Domain experts can suggest changes to the application pipeline and provide new domain knowledge to enhance performance [32][33]
- This process enables rapid iteration, potentially fixing issues the same day by adding relevant domain knowledge and validating with evals [37]

Domain Expertise
- The level of domain expertise required depends on the specific workflow and optimization goals, with clinical reasoning requiring experienced doctors [38][39]
- Bespoke tooling is recommended for integrating domain expert feedback into the platform and workflows [41]
- Domain expert reviews provide performance metrics, failure modes, and suggested improvements, all in one [38]

Results & Performance
- Anterior achieved a 95% accuracy baseline in approving care requests, which was further improved to 99% through iterative refinement using the described system [14][15]
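A minimal sketch of the measurement-and-prioritization loop described above, using only the Python standard library. The names (`ReviewedCase`, the failure-mode tags, false approvals as the key metric) are illustrative assumptions, not Anterior's actual tooling or schema.

```python
# Hypothetical sketch: tag each expert-reviewed case with a failure mode, track the
# key metric (false approvals), and rank failure modes by their impact on that metric
# so engineering effort can be prioritized. All names here are illustrative.
from collections import Counter
from dataclasses import dataclass
from typing import Optional

@dataclass
class ReviewedCase:
    case_id: str
    model_decision: str            # "approve" or "deny"
    expert_decision: str           # ground truth from a domain-expert review
    failure_mode: Optional[str]    # e.g. "missed_guideline", "misread_lab_value"

def false_approval_rate(cases: list[ReviewedCase]) -> float:
    """Fraction of expert-denied cases that the model wrongly approved."""
    denials = [c for c in cases if c.expert_decision == "deny"]
    false_approvals = [c for c in denials if c.model_decision == "approve"]
    return len(false_approvals) / max(len(denials), 1)

def prioritize_failure_modes(cases: list[ReviewedCase]) -> list[tuple[str, int]]:
    """Count how often each failure mode produced a false approval, most frequent first."""
    harmful = [
        c.failure_mode for c in cases
        if c.failure_mode and c.model_decision == "approve" and c.expert_decision == "deny"
    ]
    return Counter(harmful).most_common()

cases = [
    ReviewedCase("a1", "approve", "deny", "missed_guideline"),
    ReviewedCase("a2", "approve", "approve", None),
    ReviewedCase("a3", "approve", "deny", "misread_lab_value"),
    ReviewedCase("a4", "deny", "deny", None),
]
print(false_approval_rate(cases))        # ~0.67: two of three expert denials were falsely approved
print(prioritize_failure_modes(cases))   # failure modes ranked by impact on the key metric
```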
AI: Inclusive and Transformative | Manish Gupta | TEDxIITGandhinagar
TEDx Talks· 2025-07-28 16:02
AI Development & Applications
- DeepMind's mission is to build AI responsibly for the benefit of humanity; deep learning has become the best approach to problems such as image classification, speech recognition, and machine translation [5][6]
- The Transformer architecture enabled the construction of large language models, which are trained on vast amounts of public data and can tackle a wide range of problems [8]
- Modern foundation models (LLMs) have moved beyond text to become multimodal, handling typed text, handwriting, and images, opening up possibilities such as personalized tutoring [11][12]
- Gemini 1.5 Pro can handle a context window of up to 1 million multimodal tokens, allowing large volumes of information to be supplied as input [15]
- AI agents go beyond simple chatbots: they can interact by voice and even interact in real time within 3D worlds [16]

AI Inclusivity & Accessibility
- The industry is working to close the gap in AI capability between English and other languages (particularly Indian languages), with the goal of developing models that understand more than 125 Indian languages [19][20][21][22]
- The Vani project, in collaboration with the Indian Institute of Science, is collecting speech data from every corner of India, aiming to gather data from every district to cover more zero-corpus languages [24][25]

AI in Specific Domains
- The industry is building the foundational layer of a digital agriculture stack, using satellite imagery to identify field boundaries, crop types, and water sources, enabling personalized services for farmers such as crop insurance [26][27][28]
- AlphaFold predicts protein structures, shrinking what used to take 5 years of research to a few seconds; it predicted 200 million protein structures in under a year and released the data for free, greatly accelerating scientific discovery [29][30][31][32]

Outlook
- The hope is that AI will enable many more people to make Nobel Prize-level contributions [35]
X @Avi Chawla
Avi Chawla· 2025-07-28 06:30
Overview
- Taipy is an open-source Python AI & data web application builder [1]
- Taipy can build both prototypes and robust, production-ready data apps [1]

Technology & Features
- Taipy eliminates the need to learn JavaScript, CSS, or HTML (see the minimal sketch after this entry) [1]
- Taipy's VS Code extension provides no-code functionality for building data apps [2]
- Taipy is presented as a more robust version of Streamlit [1]
- There is a noticeable latency difference between Taipy apps and comparable apps built with other frameworks [1]

Community & Adoption
- Taipy is fully open source, with over 18 thousand stars [2]
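A minimal sketch of what a Taipy app typically looks like, assuming `pip install taipy`; the visual-element syntax shown here is illustrative and may differ across Taipy versions.

```python
# Hypothetical minimal Taipy GUI app: pages are written in Markdown extended with
# Taipy visual elements, so no JavaScript, CSS, or HTML is required.
# Treat the element syntax as an approximation, not a version-pinned reference.
from taipy.gui import Gui

value = 50  # state variable bound to the page below

page = """
# Hello Taipy

Pick a value: <|{value}|slider|min=0|max=100|>

Current value: <|{value}|text|>
"""

if __name__ == "__main__":
    Gui(page=page).run()  # serves the app locally in the browser
```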
Strategies for LLM Evals (GuideLLM, lm-eval-harness, OpenAI Evals Workshop) — Taylor Jordan Smith
AI Engineer· 2025-07-27 16:15
LLM Evaluation Challenges
- Traditional benchmarks often fail to reflect real-world LLM performance, reliability, and user satisfaction [1]
- Evaluating reasoning quality, agent consistency, MCP integration, and user-focused outcomes requires going beyond standard benchmarks [1]
- Benchmarks and leaderboards rarely reflect the realities of production AI [1]

Evaluation Strategies & Frameworks
- The industry needs tangible evaluation strategies using open-source frameworks like GuideLLM and lm-eval-harness [1]
- Custom eval suites tailored to specific use cases are crucial for accurate assessment (a sketch follows this entry) [1]
- Integrating human-in-the-loop feedback is essential for better user-aligned outcomes [1]

Key Evaluation Areas
- Evaluating reasoning skills, consistency, and reliability in agentic AI applications is critical [1]
- Validating MCP (Model Context Protocol) and agent interactions with practical reliability tests is necessary [1]
- Agent reliability checks should reflect production conditions [1]

Deployment Considerations
- Robust evaluation is critical for confidently deploying LLMs in real-world applications like chatbots, copilots, or autonomous AI agents [1]
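A small sketch of what a use-case-specific eval with a consistency check might look like, independent of any particular framework; `fake_model` and the ticket-classification cases are invented placeholders, and a real deployment would swap in its own model client (or express the cases as GuideLLM / lm-eval-harness tasks instead of this hand-rolled harness).

```python
# Hypothetical custom eval: run production-style prompts several times per case and
# measure both correctness and answer consistency, rather than relying on a generic
# leaderboard benchmark. The cases and the stand-in model are illustrative only.
from collections import Counter

EVAL_CASES = [
    {"prompt": "Classify the ticket: 'My card was charged twice.'", "expected": "billing"},
    {"prompt": "Classify the ticket: 'The app crashes on login.'", "expected": "bug"},
]

def evaluate(cases, call_model, runs_per_case: int = 5):
    results = []
    for case in cases:
        answers = [call_model(case["prompt"]).strip().lower() for _ in range(runs_per_case)]
        majority, majority_count = Counter(answers).most_common(1)[0]
        results.append({
            "prompt": case["prompt"],
            "correct": majority == case["expected"],           # graded on the majority answer
            "consistency": majority_count / runs_per_case,     # stability across repeated runs
        })
    accuracy = sum(r["correct"] for r in results) / len(results)
    mean_consistency = sum(r["consistency"] for r in results) / len(results)
    return accuracy, mean_consistency, results

# Stand-in model so the sketch runs as-is; replace with the real client in practice.
def fake_model(prompt: str) -> str:
    return "billing" if "charged" in prompt else "bug"

accuracy, consistency, _ = evaluate(EVAL_CASES, fake_model)
print(f"accuracy={accuracy:.2f}, mean consistency={consistency:.2f}")
```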
Waymo's EMMA: Teaching Cars to Think - Jyh Jing Hwang, Waymo
AI Engineer· 2025-07-26 17:00
Autonomous Driving History and Challenges
- Autonomous driving research started in the 1980s with simple neural networks and had evolved to end-to-end driving models by 2020 [2]
- Scaling autonomous driving presents challenges, requiring solutions for long-tail events and rare scenarios [5][7]
- Foundation models, like Gemini, show promise in generalizing to rare driving events and providing appropriate responses [8][9][10][11]

EMMA: A Multimodal Large Language Model for Autonomous Driving
- The company is exploring EMMA, a driving system built on Gemini that uses routing text and camera input to predict future waypoints [11][12][13][14] (a sketch of this text-and-camera-in, waypoints-out framing follows this entry)
- EMMA is self-supervised, camera-only, and high-definition-map-free, achieving state-of-the-art quality on the nuScenes benchmark [15][16][17]
- Chain-of-thought reasoning is incorporated into EMMA, allowing the model to explain its driving decisions and improving performance on a 100k dataset [17]

Evaluation and Validation
- Evaluation is crucial for the success of autonomous driving models, including open-loop evaluation, simulation, and real-world testing [25]
- Generative models are being explored for sensor simulation, to evaluate the planner under various conditions like rain and different times of day [26][27][28]

Future Directions
- The company aims to improve generalization and scale autonomous driving by leveraging foundation models [30]
- Training on larger datasets improves the quality of the planner [19][20]
- The company is exploring training on various tasks, such as 3D detection and road graph estimation, to create a more generalizable model [21][22][23][24]
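A rough sketch of the framing described above: routing instructions and ego history go in as text alongside camera frames, and the model emits future waypoints as text that is parsed back into coordinates. The prompt wording, the `Waypoint` type, and the `multimodal_llm` call are assumptions for illustration, not Waymo's actual interface.

```python
# Hypothetical EMMA-style setup: waypoint prediction cast as text generation by a
# multimodal LLM. Everything here (prompt wording, coordinate format) is illustrative.
from dataclasses import dataclass

@dataclass
class Waypoint:
    x: float  # meters ahead of the ego vehicle
    y: float  # meters to the left (+) or right (-)

def build_prompt(routing_text: str, past_waypoints: list[Waypoint]) -> str:
    history = "; ".join(f"({w.x:.1f}, {w.y:.1f})" for w in past_waypoints)
    return (
        f"Routing instruction: {routing_text}\n"
        f"Past ego waypoints: {history}\n"
        "Predict the next 5 future waypoints as (x, y) pairs in meters."
    )

def parse_waypoints(model_output: str) -> list[Waypoint]:
    """Parse text like '(4.0, 0.1); (8.1, 0.3)' back into waypoints."""
    points = []
    for chunk in model_output.split(";"):
        chunk = chunk.strip().strip("()")
        if chunk:
            x, y = (float(v) for v in chunk.split(","))
            points.append(Waypoint(x, y))
    return points

# Usage (camera frames would be passed to the multimodal model alongside the prompt):
# prompt = build_prompt("continue straight, then turn right at the intersection",
#                       [Waypoint(0.0, 0.0), Waypoint(2.0, 0.0)])
# waypoints = parse_waypoints(multimodal_llm(images=camera_frames, text=prompt))
print(parse_waypoints("(4.0, 0.1); (8.1, 0.3); (12.3, 0.8)"))
```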
X @The Wall Street Journal
The Wall Street Journal· 2025-07-23 10:59
Vulnerability & Mitigation
- Large language models like Grok have vulnerabilities that need to be addressed immediately [1]
- Addressing these vulnerabilities is crucial as the models gain capabilities beyond language generation [1]
X @The Wall Street Journal
The Wall Street Journal· 2025-07-22 19:57
Large language models aren’t replacing traditional browsers anytime soon, but they have become another responsibility for brands https://t.co/n8m7uemRHr ...
X @Bloomberg
Bloomberg· 2025-07-22 11:22
Technology & Finance Convergence
- Large language models are predicted to possess the technical capability to make real investment decisions for clients within five years [1]
X @Avi Chawla
Avi Chawla· 2025-07-21 06:39
LLM Development Stages
- The post outlines four stages for building Large Language Models (LLMs) from scratch for real-world applications [1]
- The stages are pre-training, instruction fine-tuning, preference fine-tuning, and reasoning fine-tuning (outlined in the sketch after this entry) [1]

Techniques Overview
- The techniques are summarized visually in the accompanying graphic [1]
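A compact, hypothetical summary of the four stages named above, expressed as plain data so it runs as-is; the example datasets and objectives listed per stage are typical choices in the literature, not details taken from the post.

```python
# Illustrative outline of the four LLM-building stages; data sources and objectives
# are common defaults, not prescriptions from the original post.
LLM_BUILD_STAGES = [
    {
        "stage": "pre-training",
        "data": "large unlabeled text corpus (web, books, code)",
        "objective": "next-token prediction (cross-entropy)",
        "output": "base model with broad language and world knowledge",
    },
    {
        "stage": "instruction fine-tuning",
        "data": "(instruction, response) pairs",
        "objective": "supervised next-token prediction on the response tokens",
        "output": "model that follows instructions",
    },
    {
        "stage": "preference fine-tuning",
        "data": "(prompt, chosen, rejected) comparisons",
        "objective": "preference optimization, e.g. RLHF or DPO",
        "output": "model aligned with human preferences",
    },
    {
        "stage": "reasoning fine-tuning",
        "data": "problems with verifiable answers (math, code)",
        "objective": "reinforce correct multi-step reasoning, e.g. RL with verifiable rewards",
        "output": "model with stronger step-by-step reasoning",
    },
]

for i, stage in enumerate(LLM_BUILD_STAGES, start=1):
    print(f"{i}. {stage['stage']}: {stage['data']} -> {stage['objective']}")
```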
X @Avi Chawla
Avi Chawla· 2025-07-20 06:34
Expertise & Focus
- The author has 9 years of experience training neural networks [1]
- The content focuses on optimizing model training across Data Science (DS), Machine Learning (ML), Large Language Models (LLMs), and Retrieval-Augmented Generation (RAG) [1]

Content Type
- The author shares daily tutorials and insights on DS, ML, LLMs, and RAG [1]
- This post covers 16 ways to actively optimize model training (two common examples are sketched after this entry) [1]
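The 16 techniques themselves aren't enumerated here, so below is a hedged PyTorch sketch of two widely used training optimizations, mixed-precision training and gradient accumulation; they are typical examples and may or may not be among the post's 16.

```python
# Illustrative PyTorch training step combining mixed precision and gradient
# accumulation; hyperparameters and the toy model are placeholders.
import torch
from torch import nn

device = "cuda" if torch.cuda.is_available() else "cpu"
model = nn.Linear(512, 10).to(device)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3)
scaler = torch.cuda.amp.GradScaler(enabled=(device == "cuda"))  # keeps fp16 gradients stable
accumulation_steps = 4  # simulate a 4x larger batch without extra memory

def training_step(step: int, batch: torch.Tensor, target: torch.Tensor) -> None:
    with torch.cuda.amp.autocast(enabled=(device == "cuda")):   # forward pass in mixed precision
        loss = nn.functional.cross_entropy(model(batch), target)
    scaler.scale(loss / accumulation_steps).backward()          # accumulate scaled gradients
    if (step + 1) % accumulation_steps == 0:
        scaler.step(optimizer)                                  # unscale and apply the update
        scaler.update()
        optimizer.zero_grad(set_to_none=True)                   # free gradient memory

# Usage with random data (falls back to full precision on CPU):
for step in range(8):
    batch = torch.randn(32, 512, device=device)
    target = torch.randint(0, 10, (32,), device=device)
    training_step(step, batch, target)
```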