LLMs
X @Avi Chawla
Avi Chawla· 2025-11-11 20:14
Mixture of Experts (MoE) Architecture
- MoE is a popular architecture that uses multiple experts to enhance Transformer models [1]
- MoE differs from a standard Transformer in the decoder block: it replaces the single feed-forward network with several experts (smaller feed-forward networks) [2][3]
- During inference, only a subset of experts is selected, which speeds up inference [4]
- A router (a multi-class classifier) selects the top K experts by producing softmax scores [5]
- The router is trained jointly with the network to learn the best expert selection [5]

Training Challenges and Solutions
- Challenge 1: a few experts can be selected disproportionately often, leaving the others under-trained [5]
- Solution 1: add noise to the router's feed-forward output and set all but the top K logits to negative infinity, so other experts also get trained [5][6] (see the sketch after this list)
- Challenge 2: some experts see far more tokens than others, which again leads to under-trained experts [6]
- Solution 2: cap the number of tokens an expert can process; once the cap is reached, the token is passed to the next-best expert [6]

MoE Characteristics and Examples
- Text passes through different experts across layers, and the chosen experts differ between tokens [7]
- MoEs have more parameters to load, but only a fraction are activated during inference, which keeps inference fast [9]
- Mixtral 8x7B and Llama 4 are examples of popular MoE-based LLMs [9]
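As a rough illustration of the noisy top-K routing in Solution 1 above, here is a minimal numpy sketch; the router weight matrix `w_router`, the toy shapes, and the noise scale are assumptions made up for the example, not details from the post.

```python
import numpy as np

def noisy_top_k_router(x, w_router, k=2, noise_std=1.0):
    """Score experts per token, add training-time noise, keep only the
    top-k logits, and softmax over those to get gating weights."""
    logits = x @ w_router                                   # (tokens, num_experts)
    logits = logits + np.random.randn(*logits.shape) * noise_std
    top_k = np.argsort(logits, axis=-1)[:, -k:]             # indices of the k largest logits
    masked = np.full_like(logits, -np.inf)                  # all but top-k -> -inf
    np.put_along_axis(masked, top_k,
                      np.take_along_axis(logits, top_k, axis=-1), axis=-1)
    weights = np.exp(masked - masked.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)          # softmax over selected experts only
    return top_k, weights

# Toy usage: 4 tokens, hidden size 8, 4 experts, each token routed to 2 experts.
rng = np.random.default_rng(0)
tokens = rng.normal(size=(4, 8))
w_router = rng.normal(size=(8, 4))
chosen_experts, gate_weights = noisy_top_k_router(tokens, w_router, k=2)
```

Masking the non-selected logits to negative infinity makes their softmax weight exactly zero, while the injected noise occasionally promotes otherwise-ignored experts so they still receive gradient updates.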
X @mert | helius.dev
mert | helius.dev· 2025-11-09 18:58
technological paradigm shifts are what define entire markets but they're rare
in crypto:
1st shift: bitcoin
2nd: programmability (ethereum)
3rd: scale (solana)
4th: privacy (zcash)
note how privacy also coincides with the paradigm shift of AI and LLMs
trillions ...
X @Avi Chawla
Avi Chawla· 2025-11-08 06:31
AI Agent Workflow Platforms
- Sim AI is a user-friendly, open-source platform for building AI agent workflows, supporting major LLMs, MCP servers, and vector DBs [1]
- Transformer Lab offers tools like RAGFlow for deep document understanding and AutoAgent, a zero-code framework for building and deploying Agents [2]
- Anything LLM is an all-in-one AI app for chatting with documents and using AI Agents, designed for multi-user environments and local operation [6]

Open-Source LLM Tools
- Llama Factory allows training and fine-tuning of open-source LLMs and VLMs without coding, supporting over 100 models [6]
- RAGFlow is a RAG engine for building enterprise-grade RAG workflows on complex documents with citations, supporting multimodal data [2][4]
- AutoAgent is a zero-code framework for building and deploying Agents using natural language, with universal LLM support and a native vector DB [2][5]

Key Features & Technologies
- Sim AI's Finance Agent uses Firecrawl for web searches and Alpha Vantage's API for stock data via MCP servers [1]
- RAGFlow supports multimodal data and deep research capabilities [2]
- AutoAgent features function-calling and ReAct interaction modes [5]

Community & Popularity
- Sim AI is 100% open-source with 18 thousand stars [1]
- Transformer Lab is 100% open-source with over 68 thousand stars [2]
- LLaMA-Factory is 100% open-source with 62 thousand stars [6]
- Anything LLM is 100% open-source with 48 thousand stars [6]
- One project is 100% open-source with 8 thousand stars [3]
X @Avi Chawla
Avi Chawla· 2025-11-07 19:00
RT Avi Chawla (@_avichawla)
5 Agentic AI design patterns, explained visually!
Agentic behaviors allow LLMs to refine their output by incorporating self-evaluation, planning, and collaboration!
The visual depicts the 5 most popular design patterns for building AI Agents.
1️⃣ Reflection pattern
The AI reviews its own work to spot mistakes and iterate until it produces the final response.
2️⃣ Tool use pattern
Tools allow LLMs to gather more information by:
- Querying a vector database
- Executing Python scripts
- Invoki ...
X @Avi Chawla
Avi Chawla· 2025-11-07 06:36
AI Agent Design Patterns
- Agentic AI refines its output by incorporating self-evaluation, planning, and collaboration [1]
- The report describes the 5 most popular design patterns for building AI Agents [1]

Key Design Patterns
- Reflection pattern: the AI reviews its own work to spot mistakes and iterates until it produces the final response [1]
- Tool use pattern: tools allow the LLM to gather more information by querying a vector database, executing Python scripts, calling APIs, and so on [1][3]
- ReAct (Reason and Act) pattern: ReAct combines the reflection pattern and the tool use pattern [2]
- Planning pattern: the AI creates a roadmap by breaking the task into subtasks and outlining the goals, making the task easier to solve [2]
- Multi-agent pattern: multiple agents, each with a specific role and task; each agent can also access tools [3]

Frameworks and Implementation
- Frameworks like CrewAI largely default to the ReAct pattern [2]
- In CrewAI, specify `planning=True` to use the planning pattern [2] (see the sketch after this list)
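A minimal CrewAI sketch of the `planning=True` flag mentioned above; the agent roles, goals, and task wording are placeholder assumptions for illustration, and a configured LLM API key is assumed to be present in the environment.

```python
from crewai import Agent, Task, Crew

# Placeholder agents: roles/goals/backstories are made up for the example.
researcher = Agent(
    role="Researcher",
    goal="Collect background material on a topic",
    backstory="An analyst who gathers and condenses sources.",
)
writer = Agent(
    role="Writer",
    goal="Turn the research into a short summary",
    backstory="A technical writer focused on clarity.",
)

research = Task(
    description="Research the topic: {topic}",
    expected_output="A bullet list of key findings",
    agent=researcher,
)
summarize = Task(
    description="Summarize the research into one paragraph",
    expected_output="A single-paragraph summary",
    agent=writer,
)

crew = Crew(
    agents=[researcher, writer],
    tasks=[research, summarize],
    planning=True,  # enables the planning pattern instead of the ReAct default noted above
)
# result = crew.kickoff(inputs={"topic": "Mixture of Experts"})
```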
X @Avi Chawla
Avi Chawla· 2025-11-06 11:53
AI Engineering & RAG
- The document discusses building a unified query engine over data spread across several sources, using a vector DB and RAG (Retrieval-Augmented Generation) [1] (a minimal sketch follows below)
- It presents a scenario of an AI engineer interview at Google, focusing on querying data from sources like Gmail and Drive [1]
- The author shares tutorials and insights on DS (Data Science), ML (Machine Learning), LLMs (Large Language Models), and RAGs [1]
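As a rough sketch of the unified-query-engine idea (not the author's actual tutorial code): the source loaders and the `embed` function below are stand-ins for real Gmail/Drive connectors and an embedding model, and the retrieval step is plain cosine similarity over one shared index.

```python
import numpy as np

# Hypothetical loaders and embedding model, assumed for illustration only.
def load_gmail_docs():  return [{"source": "gmail", "text": "Q3 budget thread ..."}]
def load_drive_docs():  return [{"source": "drive", "text": "Design doc for the query engine ..."}]
def embed(text):        return np.random.default_rng(abs(hash(text)) % 2**32).normal(size=64)

# 1) Pull documents from every source into one index in a shared embedding space.
docs = load_gmail_docs() + load_drive_docs()
index = [(d, embed(d["text"])) for d in docs]

# 2) Retrieve: rank all documents by cosine similarity to the query embedding.
def retrieve(query, k=3):
    q = embed(query)
    scored = [(float(v @ q / (np.linalg.norm(v) * np.linalg.norm(q) + 1e-9)), d)
              for d, v in index]
    return [d for _, d in sorted(scored, key=lambda s: s[0], reverse=True)[:k]]

# 3) Generate: the retrieved snippets become the context for the LLM prompt.
context = "\n".join(f"[{d['source']}] {d['text']}" for d in retrieve("What is the Q3 budget?"))
prompt = f"Answer using only this context:\n{context}\n\nQuestion: What is the Q3 budget?"
```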
X @Avi Chawla
Avi Chawla· 2025-11-03 12:42
Product Update
- Firecrawl's v2 endpoint enables scraping visual content for multimodal LLM applications [1]
- The new endpoint supports filtering by resolution, aspect ratio, and image type [1]

Community Engagement
- The project has garnered over 66 thousand stars on GitHub [1]

Application
- The scraped images can be used to fine-tune LLMs [1]
X @TechCrunch
TechCrunch· 2025-11-01 15:02
Research Focus
- AI researchers at Andon Labs tested LLMs in robot vacuums to assess how ready they are for embodied intelligence [1]
- The study observed some interesting behaviors once the LLMs were embedded in the robots [1]
X @Elon Musk
Elon Musk· 2025-10-31 21:35
RT DataRepublican (small r) (@DataRepublican)
This is how fast Grok has evolved in 10 months.
I like to evaluate LLMs by asking them to do SVG images. Why? SVG is a markup language - you have to reason around it. It's not generative AI photos. You have to write code that says "draw a circle for the cat's face, draw two triangles for the ear" and so on. You have to be able to describe what a cat looks like at an abstract level.
The left is Grok creating an SVG of a Siamese cat from December 2024. The right is Gr ...
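To make the evaluation concrete, here is a small Python sketch that assembles the kind of SVG the tweet describes; the shapes, coordinates, and colors are arbitrary assumptions, not Grok's actual output.

```python
# A literal take on the tweet's example: build a cat face from SVG primitives,
# i.e. describe the cat at an abstract level ("a circle for the face, two
# triangles for the ears") and emit markup for it.
face = '<circle cx="100" cy="110" r="60" fill="#c8a165"/>'
left_ear = '<polygon points="55,70 75,20 95,65" fill="#c8a165"/>'
right_ear = '<polygon points="105,65 125,20 145,70" fill="#c8a165"/>'
eyes = ('<circle cx="80" cy="100" r="6" fill="#222"/>'
        '<circle cx="120" cy="100" r="6" fill="#222"/>')

svg = (f'<svg xmlns="http://www.w3.org/2000/svg" width="200" height="200">'
       f'{left_ear}{right_ear}{face}{eyes}</svg>')

with open("cat.svg", "w") as f:
    f.write(svg)  # open in a browser to view the rendered cat face
```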
X @mert | helius.dev
mert | helius.dev· 2025-10-30 10:45
Negative Impact of LLMs on Social Media
- LLMs significantly increased the number of bots on social media platforms [1]
- LLMs have led to a surge in misinformation and falsehoods [1]
- LLMs amplified the presence of individuals affected by the Dunning-Kruger effect who falsely believe they are technical experts [1]

Concerns Regarding LLM Reliability
- LLMs are prone to hallucinations, raising concerns about their reliability [1]