nanoGPT

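Your own ChatGPT in 4 hours: Karpathy is at it again, reveals that agents got in the way and that he hand-wrote 8,000 lines of code; netizens: "finish the run and you are a machine learning engineer"
36Kr · 2025-10-14 12:52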
Core Insights
- Andrej Karpathy, former director of AI at Tesla and a co-founder of OpenAI, has released a new open-source project called nanochat, which has gained 7.9k stars on GitHub [1]
- nanochat is a minimalistic end-to-end training and inference toolchain designed to replicate a simplified version of ChatGPT. It differs from Karpathy's previous project, nanoGPT, which covered only the pre-training phase [1][6]

Project Overview
- nanochat lets users train a conversational language model for approximately $100; a longer run of about 12 hours surpasses GPT-2 on the CORE metric [2][3]
- To start, users launch a cloud GPU server and run a single script, then interact with the trained model through a web interface [2]

Technical Specifications
- The project consists of around 8,000 lines of code, primarily handwritten by Karpathy, with an emphasis on clear code structure [7]
- The architecture is similar to the Llama model but deliberately simpler, and incorporates elements from modded-nanoGPT [7][8]
- Key features include a dense transformer, rotary embeddings, and an optimizer setup that combines Muon and AdamW [8][9]

Performance Metrics
- Scores are reported for each training stage and show improvements on CORE, ARC-Challenge, ARC-Easy, GSM8K, HumanEval, and MMLU [5]

Community Impact
- The release has generated significant interest on social media, with users excited about its potential to democratize access to language model training [10]
- The project is expected to serve as a valuable resource for researchers and machine learning enthusiasts, making it easier to experiment with language models [10]
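For readers unfamiliar with the rotary embeddings mentioned above, the idea can be sketched in a few lines. This is a minimal pure-Python illustration, not nanochat's code: the function names, the pairing of channel `i` with channel `i + half`, and the default `base` of 10000 are assumptions chosen for the example.

```python
import math


def rotary_embed(vec, position, base=10000.0):
    """Rotate channel pairs of `vec` by angles that grow with `position`."""
    dim = len(vec)
    if dim % 2 != 0:
        raise ValueError("head dimension must be even")
    half = dim // 2
    out = [0.0] * dim
    for i in range(half):
        # Each channel pair (i, i + half) gets its own rotation frequency.
        theta = position * base ** (-i / half)
        c, s = math.cos(theta), math.sin(theta)
        x1, x2 = vec[i], vec[i + half]
        out[i] = x1 * c - x2 * s
        out[i + half] = x1 * s + x2 * c
    return out


def dot(a, b):
    """Plain dot product of two equal-length lists."""
    return sum(x * y for x, y in zip(a, b))


# Hypothetical query and key vectors for a single attention head.
q = [1.0, 2.0, 3.0, 4.0]
k = [0.5, -1.0, 2.0, 0.25]

# The attention score depends only on the relative offset (here 2),
# not on the absolute positions.
score_near = dot(rotary_embed(q, 5), rotary_embed(k, 3))
score_far = dot(rotary_embed(q, 12), rotary_embed(k, 10))
```

Because each pair is only rotated, vector norms are unchanged, and `score_near` and `score_far` agree up to floating-point error. That relative-position property is the usual reason for choosing rotary embeddings.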

AI luminary Karpathy's open-source project goes viral: cloning ChatGPT in just 4 hours and 8,000 lines of code
36Kr · 2025-10-14 09:28
Core Insights
- Andrej Karpathy launched a new open-source project called "nanochat," which he describes as one of his most unrestrained projects. It provides a simplified full-stack training and inference process for creating a ChatGPT-like model from scratch [2][5].

Summary by Sections

Project Overview
- nanochat is a minimalistic, full-stack project that lets users create a chatbot by renting a cloud GPU server and running a single script, and then interact with the trained large language model (LLM) within approximately four hours [2][10].

Key Components of Nanochat
1. **Data Preparation**: Trains a tokenizer on raw web text so that large amounts of text can be converted into numerical data [5].
2. **Model Pre-training**: A foundational Transformer model is trained on large datasets to learn language syntax, facts, and basic reasoning. This is the most time-consuming and critical step [5].
3. **Alignment Fine-tuning**:
   - **Instruction Fine-tuning**: Uses high-quality Q&A and dialogue data to teach the model to follow instructions and converse like an assistant [6].
   - **Reinforcement Learning**: An optional stage that improves performance on specific tasks through rewards and penalties [6].
4. **Model Inference**: Provides an efficient engine for real-time interaction with the trained model via the command line or a web interface [6].
5. **Evaluation**: Automatically generates a detailed report of the model's performance across various standard tests [6].

Educational and Research Significance
- nanochat serves as an educational tool: developers and researchers can build their own small chat models at low cost and experience the entire process from raw text to dialogue assistant [7].
- It gives researchers a lightweight, controllable, and reproducible platform for testing new model architectures and training methods without expensive computational resources [7].
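The data preparation step in the list above turns raw text into token ids. A minimal sketch follows, assuming a byte-pair-encoding (BPE) style tokenizer; it is an illustrative pure-Python version rather than nanochat's own tokenizer code, and all function names are hypothetical.

```python
from collections import Counter


def merge(ids, pair, new_id):
    """Replace every non-overlapping occurrence of `pair` in `ids` with `new_id`."""
    out, i = [], 0
    while i < len(ids):
        if i + 1 < len(ids) and (ids[i], ids[i + 1]) == pair:
            out.append(new_id)
            i += 2
        else:
            out.append(ids[i])
            i += 1
    return out


def train_bpe(text, num_merges):
    """Learn up to `num_merges` merges, starting from raw UTF-8 bytes (ids 0..255)."""
    ids = list(text.encode("utf-8"))
    merges = {}  # (left_id, right_id) -> new token id, in learned order
    for _ in range(num_merges):
        pairs = Counter(zip(ids, ids[1:]))
        if not pairs:
            break
        pair, count = pairs.most_common(1)[0]
        if count < 2:
            break  # nothing left worth merging
        new_id = 256 + len(merges)
        merges[pair] = new_id
        ids = merge(ids, pair, new_id)
    return merges


def encode(text, merges):
    """Convert text to token ids by replaying the merges in learned order."""
    ids = list(text.encode("utf-8"))
    for pair, new_id in merges.items():
        ids = merge(ids, pair, new_id)
    return ids


def decode(ids, merges):
    """Convert token ids back to text by expanding each id to its bytes."""
    vocab = {i: bytes([i]) for i in range(256)}
    for (left, right), new_id in merges.items():
        vocab[new_id] = vocab[left] + vocab[right]
    return b"".join(vocab[i] for i in ids).decode("utf-8", errors="replace")


sample = "low lower lowest low low"
merges = train_bpe(sample, num_merges=5)
token_ids = encode(sample, merges)
```

Decoding `token_ids` recovers `sample` exactly, and the id sequence is shorter than the raw byte sequence because frequent pairs such as `l`+`o` have been merged into single tokens.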
Cost and Efficiency
- The total cost to train a small ChatGPT clone using nanochat is approximately $100, with a training duration of about four hours on an 8xH100 node [10].
- Training for around 12 hours can surpass GPT-2 on the CORE metric. With a budget of about $1,000, the model becomes more coherent and can solve simple math and programming problems [14].

Technical Insights
- The architecture of nanochat is similar to the Meta Llama model but simplified, aiming to establish a robust baseline for models of this scale [15].
- Key features include a Muon + AdamW optimizer and various design choices that improve model performance [16][20].
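One common way to combine Muon with AdamW, used in modded-nanoGPT-style setups, is to apply Muon to the 2-D weight matrices inside the transformer blocks and AdamW to everything else, such as embeddings and the output head. The sketch below only illustrates that routing decision. The parameter names, shapes, and the `split_param_groups` helper are hypothetical, and the article does not confirm that nanochat splits its parameters exactly this way.

```python
def split_param_groups(param_shapes):
    """Route each parameter to 'muon' or 'adamw' by its name and shape."""
    groups = {"muon": [], "adamw": []}
    for name, shape in param_shapes.items():
        is_matrix = len(shape) == 2
        is_embedding_or_head = name.startswith(("embed", "lm_head"))
        if is_matrix and not is_embedding_or_head:
            groups["muon"].append(name)   # hidden weight matrices
        else:
            groups["adamw"].append(name)  # embeddings, head, vectors, scalars
    return groups


# Hypothetical parameter names and shapes for a tiny model.
shapes = {
    "embed.weight": (65536, 512),
    "blocks.0.attn.q_proj.weight": (512, 512),
    "blocks.0.mlp.fc.weight": (2048, 512),
    "blocks.0.norm.scale": (512,),
    "lm_head.weight": (65536, 512),
}
groups = split_param_groups(shapes)
```

In a real training script each group would then be handed to its own optimizer instance, and both optimizers would be stepped once per iteration.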

$100 and just 8,000 lines of code to reproduce ChatGPT; Karpathy: "This is the craziest project I have ever written"
Founder Park · 2025-10-14 04:18
Core Insights
- The article discusses the launch of "nanochat," an open-source project by Andrej Karpathy that lets users build a ChatGPT-like model with minimal resources [3][10].
- The project aims to democratize access to large language model (LLM) research, so that anyone can train their own model easily [12][22].

Project Overview
- "nanochat" is described as a complete training framework for creating a ChatGPT-like model from scratch, consisting of approximately 8,000 lines of clean code [6][26].
- The entire system runs on a single cloud GPU machine, requiring only about 4 hours of training time and costing around $100 [10][13].
- The project covers all stages of model development, from data preparation to fine-tuning and deployment [6][12].

Performance Metrics
- A model trained for about 12 hours can surpass GPT-2 on the CORE metric, while a 24-hour run uses training compute roughly comparable to GPT-3 Small [11][13].
- Reported results include scores on benchmarks such as MMLU and GSM8K, which indicate the model's capabilities in reasoning and code generation [11][27].

Development Philosophy
- Karpathy emphasizes making LLM research accessible and reproducible, as he did in his previous work on nanoGPT [12][22].
- The project is seen as a potential baseline for future research and experimentation within the open-source community [8][16].

Community Engagement
- The article mentions a growing community around AI products, with over 15,000 members in the "AI Product Marketplace" group, highlighting the interest in AI applications [9].

Hand-building ChatGPT with $100 and 8,000 lines of code: Karpathy's latest open-source project goes viral, nearly 5k stars overnight
36Kr · 2025-10-14 02:25
Core Insights
- Andrej Karpathy has released a new open-source project called nanochat, which lets users build a ChatGPT-like model from scratch for approximately $100 [2][5]
- The project consists of around 8,000 lines of code and was quickly adopted by the community, gaining over 4,500 stars on GitHub within 12 hours [2][5]
- nanochat provides a complete training and inference pipeline for large language models (LLMs), unlike Karpathy's previous project, nanoGPT, which covered only the pre-training phase [2][5]

Project Details
- Users can train their own LLM by running a script on a cloud GPU machine and have a functional model in about 4 hours [2][3]
- The project includes a new Rust-based tokenizer, an efficient inference engine, and automatic generation of Markdown scorecards that summarize the training process [3][5]
- Karpathy estimates that a budget of about $1,000 (roughly 41.6 hours of training) yields significant improvements in model coherence and in performance on various tasks [4][5]

Performance Metrics
- An initial CORE score of 0.2219 was recorded for the model, with improvements noted across the later training phases [7]
- After sufficient training, benchmark scores reach 40+ on MMLU and 70+ on ARC-Easy [4][7]

Community and Future Development
- Karpathy envisions nanochat evolving into a research platform or standard benchmark, similar to nanoGPT, and encourages community collaboration on further improvements [5][8]
- Karpathy cautions that, despite its capabilities, nanochat is not suitable for personalized applications without significant additional work and data preparation [9][10]
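The cost figures quoted in this coverage can be cross-checked with simple arithmetic. The sketch below assumes cost scales linearly with training time and derives the hourly rate from the $1,000 / 41.6-hour estimate; the 8-GPU node size and the 12-hour run come from the other summaries on this page, and the variable names are illustrative.

```python
def implied_hourly_rate(total_cost_usd, hours):
    """Hourly node price implied by a total budget and a run length."""
    return total_cost_usd / hours


# Figures quoted in the coverage: about $1,000 for roughly 41.6 hours.
node_rate = implied_hourly_rate(1000, 41.6)  # ≈ 24.04 USD per node-hour

# Assumption: cost scales linearly with training time.
four_hour_cost = node_rate * 4               # ≈ 96.15 USD, i.e. "about $100"
twelve_hour_cost = node_rate * 12            # ≈ 288.46 USD

# Assumption: the node has 8 GPUs (the 8xH100 node mentioned elsewhere).
gpu_rate = node_rate / 8                     # ≈ 3.00 USD per GPU-hour
```

Under these assumptions the "$100 in about 4 hours" headline and the "$1,000 for 41.6 hours" estimate are consistent with the same node price of roughly $24 per hour.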

Hand-building ChatGPT with $100 and 8,000 lines of code: Karpathy's latest open-source project goes viral, nearly 5k stars overnight
Jiqizhixin (机器之心) · 2025-10-14 02:06
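"This is one of the craziest pieces of code I have ever written."

This Monday, AI luminary Andrej Karpathy released his latest open-source project, which immediately drew the attention of the whole community.

The project, named nanochat, is said to teach you how to build your own ChatGPT from scratch for $100. It covers both LLM training and inference, so following along walks you through every step of building a large model.

It totals 8,000 lines of code, and less than 12 hours after it was published on GitHub its star count had already passed 4,500:

GitHub link: https://github.com/karpathy/nanochat

Unlike Karpathy's earlier nanoGPT repository (which covered only the pre-training stage), nanochat is a minimal but complete, from-scratch training and inference pipeline for a ChatGPT clone, with everything in a single, clean codebase that has very few dependencies.

You only need to launch a cloud GPU machine and run one script; about 4 hours later you can chat with your own LLM in a ChatGPT-style web interface.

The repository is about 8,000 lines of code, yet it already implements all of the following:

- Training the tokenizer using a new Rust implementation.
- On Fi ...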