LLMs
Search documents
Compilers in the Age of LLMs — Yusuf Olokoba, Muna
AI Engineer· 2025-11-24 20:16
AI Model Deployment Challenges - AI 工程团队面临着基础设施复杂性的问题,需要在不同平台和模型之间进行部署,并希望简化流程,使用户能够使用统一的客户端访问任何模型,而无需复杂的代码更改 [2][3][4] - 行业需要一种简单且标准化的方法,使开发人员能够轻松地将其内部构建的 AI 模型或在 GitHub 上找到的开源模型集成到其代码库中,并易于执行 [7] - 行业预测 AI 部署的未来是混合推理,即小型模型在本地或边缘位置与大型云 AI 模型协同工作,因此开发人员需要转向更低级别、更接近硬件且响应更快的解决方案 [8][9] Python Compiler Solution - 该方案构建了一个 Python 编译器,允许开发人员编写简单的 Python 代码,并将其转换为可在任何地方运行的微型自包含二进制文件,包括云、Apple 芯片等 [5] - 该编译器使用 LLM 在编译管道中生成 C++ 和 Rust 代码,从而能够运行各种 AI 模型,并扩展到服务器端以外的更多位置 [6][33] - 编译器通过 tracing 技术生成函数内部所有操作的图表示,最初尝试使用 PyTorch 的 Torch FX,但由于其对 PyTorch 代码的关注和对 fake 输入的依赖而放弃,转而使用 LLM 生成 traces,最终通过分析 Python 代码的抽象语法树并使用内部启发式方法构建内部表示 [13][14][15][16][17][18] - 编译器采用类型传播技术,通过分析 Python 函数的签名和 C++ 的原生类型信息,推断并约束生成代码中的变量类型,从而解决 Python 动态类型与 C++ 静态类型之间的差异 [25][26][27][28] Implementation and Usage - 通过类型信息传播,编译器能够生成正确的 C++ 代码,并将其编译为可在任何设备或平台上本地运行的动态库 [34][35][36] - 可以使用 FFI(外部函数接口)从 JavaScript 和 Node.js 调用编译后的库,从而允许在各种环境中使用编译后的 AI 模型 [37][38][39] - 通过创建一个 OpenAI 风格的客户端,可以将编译后的嵌入模型暴露给用户,从而使用户能够像使用官方 OpenAI 客户端一样访问任何开源模型 [40][41]
X @Avi Chawla
Avi Chawla· 2025-11-24 06:31
There are primarily 4 stages of building LLMs from scratch:- Pre-training- Instruction fine-tuning- Preference fine-tuning- Reasoning fine-tuningLet's understand each of them!0️⃣ Randomly initialized LLMAt this point, the model knows nothing.You ask it “What is an LLM?” and get gibberish like “try peter hand and hello 448Sn”.It hasn’t seen any data yet and possesses just random weights.1️⃣ Pre-trainingThis stage teaches the LLM the basics of language by training it on massive corpora to predict the next tok ...
How Generative AI Could Change Shopping Forever — With Rubail Birwadker
Alex Kantrowitz· 2025-11-20 17:30
Conversational Commerce & AI - Conversational commerce, facilitated by generative AI, is poised to become a significant channel for transactions, complementing existing online and physical commerce [3][4][9] - Visa views the agentic era as the next evolution of commerce, emphasizing autonomy and personalization through better data and improved user identification [5][6] - The key to success lies in personalization and context, leveraging LLMs to understand user preferences and connect them to the outside world [13][14] - Visa is focused on ensuring that Visa cards work seamlessly within these new platforms, regardless of what consumers are buying, from whom, and where [30] - Visa is exploring how personalization, with consumer consent, can enhance the discovery process by leveraging spending preferences to provide more relevant recommendations [31][36][37] Security & Fraud Prevention - Visa is prioritizing security in agentic commerce by implementing high authentication measures, biometric authentication, and payment instructions to minimize fraud [45][47][48] - Visa is developing the Trusted Agent Protocol to differentiate between good and bad bots in the e-commerce infrastructure, enabling merchants to provide better experiences to legitimate agents [57][59][60] - Visa is actively monitoring agentic transactions to identify patterns and behaviors, allowing for continuous improvement in fraud prevention and dispute resolution [48][49][51] Market Trends & Future Outlook - Visa has observed a 1,200% increase in agentic traffic to merchant websites, which has further quadrupled in the last 5 months, indicating growing consumer interest [67] - Visa anticipates statistically relevant data on agentic commerce within approximately 6 months, which will provide insights into adoption rates and inform product development [53] - Visa believes agentic commerce will be transformative for the payments industry, driven by increased traffic and the conversion of discovery into actual purchases [66][69]
Zai GLM 4.6: What We Learned From 100 Million Open Source Downloads — Yuxuan Zhang, Z.ai
AI Engineer· 2025-11-20 14:14
Model Performance & Ranking - GLM 4.6 is currently ranked 1 on the LMSYS Chatbot Arena, on par with GPT-4o and Claude 3.5 Sonnet [1] - The GLM family of models has achieved over 100 million downloads [1] Training & Architecture - zAI utilized a single-stage Reinforcement Learning (RL) approach for training GLM 4.6 [1] - zAI developed the "SLIME" RL framework for handling complex agent trajectories [1] - The pre-training data for GLM 4.6 consisted of 15 trillion tokens [1] - zAI filters 15T tokens, moves to repo-level code contexts, and integrates agentic reasoning data [1] - Token-Weighted Loss is used for coding [1] Multimodal Capabilities - GLM 4.5V features native resolution processing to improve UI navigation and video understanding [1] Deployment & Integration - GLM models can be deployed using vLLM, SGLang, and Hugging Face [1] Research & Development - zAI is actively researching models such as GLM-4.5, GLM-4.5V, CogVideoX, and CogAgent [1] - zAI is researching the capabilities of model Agents and integration with Agent frameworks like langchain-chatchat and chatpdf [1]
X @Nick Szabo
Nick Szabo· 2025-11-20 06:10
Regulatory & Ethical Concerns - Legal barriers prevent end users from effectively utilizing automation, hindering the supply of these needs and protecting professionals [1] - Changes to ChatGPT, Grok, etc, regarding legal, educational, and medical advice will deprive billions (1 billion = 10^9) of people of personalized knowledge [1] Impact of AI on Healthcare - Millions (1 million = 10^6) of deaths will needlessly result from restrictions on AI in healthcare [1] - LLMs surpass doctors in ultra-personalized, very-low-cost, and at-home healthcare [1]
X @Avi Chawla
Avi Chawla· 2025-11-18 19:15
Security Concerns - The industry faces challenges in preventing adversarial attacks via prompts in LLMs [1] - OpenAI paid $500k in a Kaggle contest to find vulnerabilities in gpt-oss-20b [1] Model Evaluation - LLMs are evaluated against correctness [1]
X @Avi Chawla
Avi Chawla· 2025-11-18 12:19
LLM Security Concerns - The industry faces a common challenge: preventing adversarial attacks on LLMs via prompts [1] - OpenAI invested $500 thousand in a Kaggle contest to identify vulnerabilities in gpt-oss-20b [1] Key Players - OpenAI, Google, and Meta are all grappling with prompt-based adversarial attacks on LLMs [1]
X @Avi Chawla
Avi Chawla· 2025-11-18 06:31
LLM Security Challenges - LLMs face adversarial attacks via prompts, requiring focus on security beyond correctness, faithfulness, and factual accuracy [1] - A well-crafted prompt can lead to PII leakage, bypassing safety filters, and generating harmful content [2] - Red teaming is core to model development, demanding SOTA adversarial strategies like prompt injections and jailbreaking [2] Red Teaming and Vulnerability Detection - Evaluating LLM responses against PII leakage, bias, toxic outputs, unauthorized access, and harmful content generation is crucial [3] - Single-turn and multi-turn chatbots require different tests, focusing on immediate jailbreaks versus conversational grooming, respectively [3] - DeepTeam, an open-source framework, performs end-to-end LLM red teaming, detecting 40+ vulnerabilities and simulating 10+ attack methods [4][6] DeepTeam Framework Features - DeepTeam automatically generates prompts to detect specified vulnerabilities and produces detailed reports [5] - The framework implements SOTA red teaming techniques and offers guardrails to prevent issues in production [5] - DeepTeam dynamically simulates adversarial attacks at run-time based on specified vulnerabilities, eliminating the need for datasets [6] Core Insight - LLM security is a red teaming problem, not a benchmarking problem; thinking like an attacker from day one is essential [6]
X @Avi Chawla
Avi Chawla· 2025-11-15 12:22
If you found it insightful, reshare it with your network.Find me → @_avichawlaEvery day, I share tutorials and insights on DS, ML, LLMs, and RAGs. https://t.co/pxlp7JJJ4VAvi Chawla (@_avichawla):How to build a RAG app on AWS!The visual below shows the exact flow of how a simple RAG system works inside AWS, using services you already know.At its core, RAG is a two-stage pattern:- Ingestion (prepare knowledge)- Querying (use knowledge)Below is how each stage works https://t.co/YcTgvXbJlb ...