Jeff Dean's Latest Interview: Future Developers Will Each Manage 50 Agents, and Writing Requirements Becomes the Core Skill
量子位 (QbitAI) · 2026-03-10 02:13
Core Insights
- Google's Chief AI Scientist Jeff Dean predicts that in the future each engineer may manage 50 AI agents, completing many tasks in parallel with higher communication efficiency than human teams [1]
- The most important future skill will be "writing clear requirements," since the output quality of AI agents depends entirely on how well problems are defined [2][3]

Group 1: AI Model Development
- Google follows a Pareto-frontier strategy, offering both high-end models for complex tasks and cost-effective models for low-latency scenarios [3][19]
- The Gemini 3 Flash model achieves both speed and intelligence through distillation, which lets smaller models closely match the performance of larger ones [5][6][8]
- Distillation enables small models to learn from large models' outputs, inheriting their refined behaviors and capabilities [7][24][25]

Group 2: Low Latency and Multi-Modal Models
- Jeff Dean emphasizes the value of low latency, arguing that cutting latency by a factor of 20-50 will significantly enhance the user experience [9][153]
- The Gemini model is designed to be multi-modal, understanding not only human-perceived modalities such as text and images but also "non-human" modalities such as LIDAR and medical imaging data [39][44][46]

Group 3: Future of AI and Engineering
- Engineers will spend more time on design and specifications, as clear communication will be crucial for effective AI collaboration [144][150]
- The ability to express requirements clearly will become a core skill, affecting not just software engineering but any complex task [145][146]
- Dean predicts that truly personalized models, capable of understanding individual users' contexts and histories, will become extremely important [156]

Group 4: Hardware and Efficiency
- Close collaboration between hardware design and machine learning is essential for optimizing performance and efficiency [80][84]
- Future advances in specialized hardware will bring significant reductions in model latency and improvements in capability, transforming many application scenarios [158]
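The distillation idea discussed in the interview — a small "student" model trained to match a large "teacher" model's full output distribution rather than just its top answer — can be sketched with a temperature-scaled softmax and a KL-divergence loss. This is a minimal, generic illustration, not Google's actual training recipe; all names here are invented for the example.

```python
import math

def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax; a higher temperature softens the
    distribution, exposing the teacher's relative preferences."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """KL divergence from the teacher's soft targets to the student's
    predictions: the student learns the whole distribution, not just
    the teacher's single top-1 label."""
    p = softmax(teacher_logits, temperature)  # teacher soft targets
    q = softmax(student_logits, temperature)  # student predictions
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

# Toy logits over 3 classes: the teacher is confident in class 0 but
# still assigns noticeable mass to class 2.
teacher = [4.0, 0.5, 2.0]
aligned = [3.8, 0.6, 1.9]     # student that mimics the teacher
misaligned = [0.5, 4.0, 2.0]  # student with a different distribution
assert distillation_loss(aligned, teacher) < distillation_loss(misaligned, teacher)
```

In practice this loss term is combined with the ordinary cross-entropy on ground-truth labels, but the core idea — matching softened teacher probabilities — is the part that lets a small model inherit a large model's behavior.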
Hand-Build ChatGPT for $100 in 8,000 Lines of Code: Karpathy's Latest Open-Source Project Goes Viral, Nearly 5k Stars Overnight
机器之心 (Synced) · 2025-10-14 02:06
Core Insights
- The article discusses Andrej Karpathy's new open-source project, nanochat, which lets users build a ChatGPT-like model from scratch for approximately $100 [2][5].
- The project consists of around 8,000 lines of code and provides a complete training and inference pipeline for a simplified version of ChatGPT [2][4].
- Users can set up a cloud GPU machine, run a single script, and after about 4 hours of training interact with their own language model (LLM) through a web interface [3][5].

Project Features
- nanochat includes a new Rust implementation for training tokenizers and pre-trains a Transformer LLM on the FineWeb dataset, evaluating it across multiple metrics [4].
- The project supports fine-tuning and evaluating the model on a range of tasks, including world-knowledge multiple-choice questions, mathematics, and coding [4][5].
- Karpathy aims to create a unified, readable, and easily modifiable codebase that can serve as a strong baseline for future LLM development [5][6].

Performance Metrics
- Initial training costs around $100 and yields a model that can hold basic conversations and perform simple tasks [5].
- With a $1,000 budget and longer training time, the model's coherence improves significantly, enabling it to tackle basic math and coding tasks [5].
- A model trained for 24 hours can score above 40 on MMLU and above 70 on ARC-Easy [5][10].

Community and Development
- Karpathy envisions nanochat evolving into a research platform or standard benchmark, encouraging community collaboration for iterative improvement [6].
- The project is positioned as the capstone of an upcoming LLM101n course, which is still under development [5].

Limitations and Considerations
- Karpathy cautions that nanochat is not designed for personalized applications and should be viewed as a rudimentary model lacking advanced intelligence [12][13].
- Effective personalization would require significant additional steps: data preparation, synthetic data generation, and fine-tuning with robust models [13].
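Benchmarks like MMLU and ARC-Easy, mentioned in the performance metrics above, are commonly scored by having the model rank the candidate answers by likelihood rather than generate free text. A minimal sketch of that evaluation loop, assuming a hypothetical `score_completion(prompt, completion) -> log-probability` callable (invented for illustration, not nanochat's actual API):

```python
def pick_answer(score_completion, question, options):
    """Multiple-choice evaluation in the MMLU/ARC style: score each
    candidate answer's log-likelihood under the model and return the
    index of the highest-scoring option."""
    scores = [score_completion(question, opt) for opt in options]
    return max(range(len(options)), key=lambda i: scores[i])

# Toy stand-in for a real model's scoring function: rewards completions
# containing "Paris" and penalizes length (a crude length-normalization).
def toy_score(prompt, completion):
    return (10.0 if "Paris" in completion else 0.0) - len(completion)

idx = pick_answer(toy_score, "Capital of France?", ["Lyon", "Paris", "Marseille"])
assert idx == 1  # "Paris" wins
```

Accuracy on the benchmark is then just the fraction of questions where the top-ranked option matches the reference answer, which is how headline numbers like "above 40 on MMLU" are computed.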