LLMs
Search documents
X @Avi Chawla
Avi Chawla· 2025-11-24 06:31
There are primarily 4 stages of building LLMs from scratch:- Pre-training- Instruction fine-tuning- Preference fine-tuning- Reasoning fine-tuningLet's understand each of them!0️⃣ Randomly initialized LLMAt this point, the model knows nothing.You ask it “What is an LLM?” and get gibberish like “try peter hand and hello 448Sn”.It hasn’t seen any data yet and possesses just random weights.1️⃣ Pre-trainingThis stage teaches the LLM the basics of language by training it on massive corpora to predict the next tok ...
How Generative AI Could Change Shopping Forever — With Rubail Birwadker
Alex Kantrowitz· 2025-11-20 17:30
Let's talk about whether AI will have us all transacting within chat bots and whether your purchase data might help AI agents give you better recommendations. We're joined today by Rubel Burwalker, the global head of growth products and strategic partnerships of Visa in a conversation brought to you by Visa and Rubel. It's so great to see you.Welcome. >> Thanks for having Alex. Okay, so let me start with this question of conversational commerce or um you know that's a jargon way of saying maybe we'll just a ...
Zai GLM 4.6: What We Learned From 100 Million Open Source Downloads — Yuxuan Zhang, Z.ai
AI Engineer· 2025-11-20 14:14
Model Performance & Ranking - GLM 4.6 is currently ranked 1 on the LMSYS Chatbot Arena, on par with GPT-4o and Claude 3.5 Sonnet [1] - The GLM family of models has achieved over 100 million downloads [1] Training & Architecture - zAI utilized a single-stage Reinforcement Learning (RL) approach for training GLM 4.6 [1] - zAI developed the "SLIME" RL framework for handling complex agent trajectories [1] - The pre-training data for GLM 4.6 consisted of 15 trillion tokens [1] - zAI filters 15T tokens, moves to repo-level code contexts, and integrates agentic reasoning data [1] - Token-Weighted Loss is used for coding [1] Multimodal Capabilities - GLM 4.5V features native resolution processing to improve UI navigation and video understanding [1] Deployment & Integration - GLM models can be deployed using vLLM, SGLang, and Hugging Face [1] Research & Development - zAI is actively researching models such as GLM-4.5, GLM-4.5V, CogVideoX, and CogAgent [1] - zAI is researching the capabilities of model Agents and integration with Agent frameworks like langchain-chatchat and chatpdf [1]
X @Nick Szabo
Nick Szabo· 2025-11-20 06:10
RT Nick Szabo (@NickSzabo4)The problem isn't so much responsibility, it's legal barriers preventing end users from effective use of the automation, usually via preventing effective supply of these needs, to artificially protect the supposedly "responsible" professionals. Recent changes to ChatGPT, Grok, etc. regarding legal advice and personalized education, and medical advice and personalized education, are a good case in point. These changes will deprive billions of people of effective, very-low-cost, at- ...
X @Avi Chawla
Avi Chawla· 2025-11-18 19:15
You're in an AI engineer interview at OpenAI.The interviewer asks:"We're ready to launch GPT-5.How would you make sure it's secure and bias-free?"You: "I'll fine-tune it on safe datasets and validate outputs."Interview over!The post below explains what you missed:Avi Chawla (@_avichawla):OpenAI.Google.Meta.Everyone's facing the same problem with LLMs:How to prevent them from adversarial attacks via prompts.OpenAI even paid $500k in a Kaggle contest to find vulnerabilities in gpt-oss-20b.Why?Because despite ...
X @Avi Chawla
Avi Chawla· 2025-11-18 12:19
If you found it insightful, reshare it with your network.Find me → @_avichawlaEvery day, I share tutorials and insights on DS, ML, LLMs, and RAGs. https://t.co/ly7ZBo29vdAvi Chawla (@_avichawla):OpenAI.Google.Meta.Everyone's facing the same problem with LLMs:How to prevent them from adversarial attacks via prompts.OpenAI even paid $500k in a Kaggle contest to find vulnerabilities in gpt-oss-20b.Why?Because despite evaluating LLMs against correctness, https://t.co/QIb6V28KgQ ...
X @Avi Chawla
Avi Chawla· 2025-11-18 06:31
LLM Security Challenges - LLMs face adversarial attacks via prompts, requiring focus on security beyond correctness, faithfulness, and factual accuracy [1] - A well-crafted prompt can lead to PII leakage, bypassing safety filters, and generating harmful content [2] - Red teaming is core to model development, demanding SOTA adversarial strategies like prompt injections and jailbreaking [2] Red Teaming and Vulnerability Detection - Evaluating LLM responses against PII leakage, bias, toxic outputs, unauthorized access, and harmful content generation is crucial [3] - Single-turn and multi-turn chatbots require different tests, focusing on immediate jailbreaks versus conversational grooming, respectively [3] - DeepTeam, an open-source framework, performs end-to-end LLM red teaming, detecting 40+ vulnerabilities and simulating 10+ attack methods [4][6] DeepTeam Framework Features - DeepTeam automatically generates prompts to detect specified vulnerabilities and produces detailed reports [5] - The framework implements SOTA red teaming techniques and offers guardrails to prevent issues in production [5] - DeepTeam dynamically simulates adversarial attacks at run-time based on specified vulnerabilities, eliminating the need for datasets [6] Core Insight - LLM security is a red teaming problem, not a benchmarking problem; thinking like an attacker from day one is essential [6]
X @Avi Chawla
Avi Chawla· 2025-11-15 12:22
If you found it insightful, reshare it with your network.Find me → @_avichawlaEvery day, I share tutorials and insights on DS, ML, LLMs, and RAGs. https://t.co/pxlp7JJJ4VAvi Chawla (@_avichawla):How to build a RAG app on AWS!The visual below shows the exact flow of how a simple RAG system works inside AWS, using services you already know.At its core, RAG is a two-stage pattern:- Ingestion (prepare knowledge)- Querying (use knowledge)Below is how each stage works https://t.co/YcTgvXbJlb ...
X @Avi Chawla
Avi Chawla· 2025-11-11 20:14
Mixture of Experts (MoE) Architecture - MoE is a popular architecture leveraging different experts to enhance Transformer models [1] - MoE differs from Transformer in the decoder block, utilizing experts (smaller feed-forward networks) instead of a single feed-forward network [2][3] - During inference, only a subset of experts are selected in MoE, leading to faster inference [4] - A router, a multi-class classifier, selects the top K experts by producing softmax scores [5] - The router is trained with the network to learn the best expert selection [5] Training Challenges and Solutions - Challenge 1: Some experts may become under-trained due to the overselection of a few experts [5] - Solution 1: Add noise to the router's feed-forward output and set all but the top K logits to negative infinity to allow other experts to train [5][6] - Challenge 2: Some experts may be exposed to more tokens than others, leading to under-trained experts [6] - Solution 2: Limit the number of tokens an expert can process; if the limit is reached, the token is passed to the next best expert [6] MoE Characteristics and Examples - Text passes through different experts across layers, and chosen experts differ between tokens [7] - MoEs have more parameters to load, but only a fraction are activated during inference, resulting in faster inference [9] - Mixtral 8x7B and Llama 4 are examples of popular MoE-based LLMs [9]