nanoGPT
No PhD, No Papers: The "Unconventional Paths" These People Took Into OpenAI and Other Top AI Labs
机器之心· 2026-01-25 04:01
Core Insights
- The article emphasizes that individuals without traditional academic backgrounds can still secure opportunities in leading AI research labs like OpenAI through personal effort and strategic action [2][25].

Group 1: Success Stories
- Keller Jordan, who graduated from UC San Diego without any published papers, improved a research paper by a Google researcher, which led to a collaboration and a published paper [5][6].
- Keller's NanoGPT speedrun project gained significant attention in the community, showcasing his ability to optimize a Transformer model and document his work thoroughly [6][7].
- Sholto Douglas transitioned from McKinsey to AI by engaging in independent research and asking insightful questions on GitHub, which caught the attention of a Google engineer and led to an interview opportunity [10][11].
- Andy L. Jones, a semi-retired quantitative trader, wrote a self-published paper that impressed xAI's Igor Babuschkin, leading to his recruitment at Anthropic [14][19].
- Kevin Wang, a student with a strong recommendation and a notable paper at NeurIPS, successfully joined OpenAI, highlighting the importance of mentorship in the recruitment process [21][23].

Group 2: Industry Trends
- AI research is becoming increasingly closed, with fewer public projects, but improving existing work remains a viable way to demonstrate capability [6].
- Many successful AI researchers are not active on social media or traditional academic platforms, yet they contribute significantly to advancements in the field [13].
- The current era presents unique opportunities in AI research, where individuals can influence technology development while also receiving competitive compensation [26][28].
- A PhD is not a strict requirement for becoming a successful researcher or engineer; proactive engagement and impactful independent projects are key [28][29].
World's First Space AI Is Born: Trained In Orbit on an H100, Enthusiastically Praised by Musk
36Kr · 2025-12-11 03:46
Core Insights
- The first AI model trained in space using NVIDIA's H100 GPU has been successfully developed, marking a significant technological milestone [1][3][9].
- Google's Gemma model has also successfully operated in space, sending its first greeting message to Earth [1][11].

Group 1: Space AI Development
- The Starcloud-1 satellite, equipped with an H100 GPU, achieved computational power 100 times greater than any previous GPU sent to space [9].
- The AI model trained in space is based on Karpathy's nanoGPT and utilizes Shakespearean texts for its training, allowing it to converse in a Renaissance language style [12][4].
- The satellite has demonstrated real-time intelligence analysis capabilities, such as identifying wildfire signals and providing situational updates [16].

Group 2: Industry Implications
- Starcloud aims to establish space as a viable location for data centers, addressing the increasing pressure on Earth's data infrastructure [17][19].
- The company plans to leverage solar energy to significantly reduce operational costs, projecting costs at one-tenth of terrestrial data centers [20].
- Starcloud's long-term vision includes a 5 GW orbital data center with extensive solar panels and cooling systems [20][22].

Group 3: Competitive Landscape
- The space computing race is intensifying, with major players like Google, SpaceX, and Blue Origin entering the field [25][26].
- Google's Project Suncatcher aims to deploy solar-powered GPU satellites, with plans for early testing by 2027 [26].
- Musk's Starlink V3 satellites are expected to form a backbone for orbital computing infrastructure, potentially exceeding the average U.S. electricity consumption within two years [30].
Your Own ChatGPT in 4 Hours: Karpathy Strikes Again, Admits AI Agents Hindered More Than Helped as He Hand-Wrote 8,000 Lines of Code; Netizens: Finish the Run and You're a Machine Learning Engineer
36Kr · 2025-10-14 12:52
Core Insights
- Andrej Karpathy, former AI director at Tesla and co-founder of OpenAI, has released a new open-source project called nanochat, which has gained 7.9k stars on GitHub [1].
- Nanochat is a minimalistic end-to-end training and inference toolchain designed to replicate a simplified version of ChatGPT, differing from Karpathy's previous project, nanoGPT [1][6].

Project Overview
- Nanochat allows users to train a conversational language model for approximately $100, achieving performance that surpasses GPT-2 on the CORE metric after about 12 hours of training [2][3].
- The project can be initiated by launching a cloud GPU server and running a script, enabling users to interact with their trained model via a web interface [2].

Technical Specifications
- The project consists of around 8,000 lines of code, primarily handwritten by Karpathy, emphasizing a clear code structure [7].
- The architecture of nanochat is similar to the Llama model but is designed to be simpler, incorporating elements from modded-nanoGPT [7][8].
- Key features include dense transformers, rotary embeddings, and a unique optimizer combining Muon and AdamW [8][9].

Performance Metrics
- Performance metrics for various training stages are provided, showing improvements in CORE, ARC-Challenge, ARC-Easy, GSM8K, HumanEval, and MMLU scores [5].

Community Impact
- The release of nanochat has generated significant interest on social media, with users expressing excitement about its potential to democratize access to language model training [10].
- The project is expected to serve as a valuable resource for researchers and machine learning enthusiasts, enabling them to experiment with language models more easily [10].
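Among the features listed above, rotary position embeddings (RoPE) encode token position by rotating pairs of query/key features through position-dependent angles, so relative position shows up as a phase difference in attention dot products. A minimal pure-Python sketch of the idea (illustrative only, not nanochat's actual implementation):

```python
import math

def rope_rotate(vec, pos, base=10000.0):
    """Apply a rotary position embedding to one head vector.

    Each feature pair (vec[2i], vec[2i+1]) is rotated by an angle
    pos * base**(-2i/d); the rotation preserves the vector's norm.
    """
    d = len(vec)
    assert d % 2 == 0, "head dimension must be even"
    out = []
    for i in range(d // 2):
        theta = pos * base ** (-2 * i / d)
        c, s = math.cos(theta), math.sin(theta)
        x, y = vec[2 * i], vec[2 * i + 1]
        out.extend([x * c - y * s, x * s + y * c])
    return out

# Position 0 applies the identity rotation.
v = [1.0, 0.0, 0.5, -0.5]
assert rope_rotate(v, 0) == v
```

Because only the relative angle between two positions matters in a dot product, attention scores depend on relative rather than absolute position, which is the property that motivates RoPE in Llama-style architectures.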
AI Guru Karpathy's Open-Source Project Goes Viral: A ChatGPT Clone in Just 4 Hours and 8,000 Lines of Code
36Kr · 2025-10-14 09:28
Core Insights
- Andrej Karpathy launched a new open-source project called "nanochat," which he describes as one of his most unrestrained projects, providing a simplified full-stack training and inference process for creating a ChatGPT-like model from scratch [2][5].

Summary by Sections

Project Overview
- Nanochat is a minimalistic, full-stack project that allows users to create a chatbot by renting a cloud GPU server and running a single script, enabling interaction with a trained large language model (LLM) within approximately four hours [2][10].

Key Components of Nanochat
1. **Data Preparation**: Involves creating a tokenizer from raw web text to convert vast amounts of text into numerical data [5].
2. **Model Pre-training**: A foundational Transformer model is trained on large datasets to learn language syntax, facts, and basic reasoning; this is the most time-consuming and critical step [5].
3. **Alignment Fine-tuning**:
   - **Instruction Fine-tuning**: Uses high-quality Q&A and dialogue data to teach the model to follow instructions and converse like an assistant [6].
   - **Reinforcement Learning**: An optional stage to enhance model performance on specific tasks through rewards and penalties [6].
4. **Model Inference**: Provides an efficient engine for real-time interaction with the trained model via command line or a web interface [6].
5. **Evaluation**: Automatically generates a detailed report showcasing the model's performance across various standard tests [6].

Educational and Research Significance
- Nanochat serves as an educational tool, allowing developers and researchers to build their own small chat models at low cost, experiencing the entire process from raw text to intelligent dialogue assistant [7].
- It provides a lightweight, controllable, and reproducible experimental platform for researchers to test new model architectures and training methods without needing expensive computational resources [7].
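The data-preparation step above trains a tokenizer on raw text (nanochat's is a Rust byte-pair-encoding implementation). The core BPE idea, repeatedly merging the most frequent adjacent token pair, can be sketched in a few lines of plain Python; this is an illustrative toy over characters, not nanochat's tokenizer:

```python
from collections import Counter

def most_frequent_pair(tokens):
    """Return the most common adjacent token pair in the sequence."""
    pairs = Counter(zip(tokens, tokens[1:]))
    return max(pairs, key=pairs.get)

def merge_pair(tokens, pair, new_token):
    """Replace every occurrence of `pair` with `new_token`."""
    out, i = [], 0
    while i < len(tokens):
        if i + 1 < len(tokens) and (tokens[i], tokens[i + 1]) == pair:
            out.append(new_token)
            i += 2
        else:
            out.append(tokens[i])
            i += 1
    return out

def train_bpe(text, num_merges):
    """Learn `num_merges` BPE merges over a character sequence."""
    tokens, merges = list(text), []
    for _ in range(num_merges):
        pair = most_frequent_pair(tokens)
        merges.append(pair)
        tokens = merge_pair(tokens, pair, pair[0] + pair[1])
    return tokens, merges

tokens, merges = train_bpe("low lower lowest", 2)
# Frequent fragments like "lo" and then "low" get merged into single tokens.
```

Real tokenizers work over bytes, cap the vocabulary at tens of thousands of tokens, and are written in fast compiled code because this counting loop runs over billions of characters.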
Cost and Efficiency
- The total cost to train a small ChatGPT clone using nanochat is approximately $100, with a training duration of about four hours on an 8×H100 node [10].
- Training for around 12 hours can surpass GPT-2 on the CORE metric, and with a budget of about $1,000 the model becomes more coherent and capable of solving simple math and programming problems [14].

Technical Insights
- The architecture of nanochat is similar to the Meta Llama model but simplified, aiming to establish a robust baseline for models of this scale [15].
- Key features include the use of a Muon + AdamW optimizer and various design choices that enhance model performance [16][20].
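The "Muon + AdamW" combination mentioned above works by routing different parameter groups to different optimizers: matrix-shaped hidden-layer weights suit Muon's orthogonalized updates, while embeddings, norms, and other non-matrix parameters stay on AdamW. A toy sketch of that routing decision (the parameter names, shapes, and exact rule here are illustrative assumptions, not nanochat's code):

```python
def split_params(named_shapes):
    """Route 2-D matrix parameters to Muon and everything else to AdamW.

    `named_shapes` maps parameter name -> shape tuple. Embedding tables
    are excluded from Muon even though they are 2-D, since their rows
    are updated sparsely.
    """
    muon, adamw = [], []
    for name, shape in named_shapes.items():
        if len(shape) == 2 and "embed" not in name:
            muon.append(name)
        else:
            adamw.append(name)
    return muon, adamw

# Hypothetical parameter shapes for a small transformer:
shapes = {
    "token_embed.weight": (50304, 768),  # embedding table -> AdamW
    "attn.qkv.weight": (2304, 768),      # dense matrix    -> Muon
    "mlp.fc.weight": (3072, 768),        # dense matrix    -> Muon
    "norm.weight": (768,),               # 1-D gain        -> AdamW
}
muon, adamw = split_params(shapes)
```

In a real training loop each group would then be handed to its own optimizer instance, with both stepped together each iteration.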
$100 and Just 8,000 Lines of Code to Reproduce ChatGPT; Karpathy: "This Is the Craziest Project I've Ever Written"
Founder Park· 2025-10-14 04:18
Core Insights
- The article discusses the launch of "nanochat," an open-source project by Andrej Karpathy that allows users to build a ChatGPT-like model with minimal resources [3][10].
- The project aims to democratize access to large language model (LLM) research, enabling anyone to train their own models easily [12][22].

Project Overview
- "nanochat" is described as a complete training framework for creating a ChatGPT-like model from scratch, consisting of approximately 8,000 lines of clean code [6][26].
- The entire system can be set up on a single GPU machine, requiring only about 4 hours of training time and costing around $100 [10][13].
- The project includes all stages of model development, from data preparation to fine-tuning and deployment [6][12].

Performance Metrics
- A model trained for about 12 hours can surpass the core metrics of GPT-2, while a 24-hour training session can achieve performance comparable to GPT-3 Small [11][13].
- Specific performance metrics include scores on benchmarks such as MMLU and GSM8K, indicating the model's capabilities in reasoning and code generation [11][27].

Development Philosophy
- Karpathy emphasizes a philosophy of making LLM research accessible and reproducible, similar to his previous work with nanoGPT [12][22].
- The project is seen as a potential baseline for future research and experimentation within the open-source community [8][16].

Community Engagement
- The article mentions a growing community around AI products, with over 15,000 members in the "AI Product Marketplace" group, highlighting the interest in AI applications [9].
Hand-Building ChatGPT with $100 and 8,000 Lines of Code: Karpathy's Latest Open-Source Project Goes Viral, Nearly 5k Stars Overnight
36Kr · 2025-10-14 02:25
Core Insights
- Andrej Karpathy has released a new open-source project called nanochat, which allows users to build a ChatGPT-like model from scratch for approximately $100 [2][5].
- The project consists of around 8,000 lines of code and was quickly adopted by the community, gaining over 4,500 stars on GitHub within 12 hours [2][5].
- nanochat provides a complete training and inference pipeline for large language models (LLMs), differing from Karpathy's previous project, nanoGPT, which only covered the pre-training phase [2][5].

Project Details
- Users can train their own LLM by running a script on a cloud GPU machine, achieving a functional model in about 4 hours [2][3].
- The project includes features such as a new Rust-based tokenizer, a high-efficiency inference engine, and automatic generation of Markdown scorecards summarizing the training process [3][5].
- Karpathy estimates that with a budget of $1,000 and 41.6 hours of training, users can achieve significant improvements in model coherence and performance on various tasks [4][5].

Performance Metrics
- The model's initial CORE score was recorded at 0.2219, with improvements noted during different training phases [7].
- With sufficient training, the model scores 40+ on MMLU and 70+ on ARC-Easy [4][7].

Community and Future Development
- Karpathy envisions nanochat evolving into a research platform or standard benchmark, similar to nanoGPT, and encourages community collaboration for further improvements [5][8].
- Despite its capabilities, Karpathy cautions that nanochat is not suitable for personalized applications without significant additional work and data preparation [9][10].
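The "Markdown scorecard" mentioned above is an auto-generated report summarizing a run's metrics. As a rough illustration of the idea only (the metric names and values below are examples drawn from the article, and the table layout is an assumption, not nanochat's actual report format):

```python
def scorecard(metrics):
    """Render a metric-name -> score dict as a Markdown table."""
    lines = ["| Metric | Score |", "| --- | --- |"]
    for name, value in metrics.items():
        lines.append(f"| {name} | {value:.4f} |")
    return "\n".join(lines)

report = scorecard({"CORE": 0.2219, "MMLU": 0.4010, "ARC-Easy": 0.7000})
print(report)
```

Emitting results as Markdown makes a run directly pasteable into a GitHub issue or README, which fits the project's goal of easy community comparison.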
Hand-Building ChatGPT with $100 and 8,000 Lines of Code: Karpathy's Latest Open-Source Project Goes Viral, Nearly 5k Stars Overnight
机器之心· 2025-10-14 02:06
Core Insights
- The article discusses Andrej Karpathy's new open-source project, nanochat, which allows users to build a ChatGPT-like model from scratch for approximately $100 [2][5].
- The project consists of around 8,000 lines of code and provides a complete training and inference process for a simplified version of ChatGPT [2][4].
- Users can set up a cloud GPU machine and run a script to interact with their own language model (LLM) via a web interface after about 4 hours of training [3][5].

Project Features
- nanochat includes a new Rust implementation for training tokenizers and pre-trains a Transformer LLM on the FineWeb dataset, evaluating its performance across multiple metrics [4].
- The project allows for fine-tuning and evaluation of the model on various tasks, including world-knowledge multiple-choice questions, mathematics, and coding [4][5].
- Karpathy aims to create a unified, readable, and easily modifiable codebase that can serve as a strong baseline for future developments in LLMs [5][6].

Performance Metrics
- Initial training costs around $100, producing a model that can engage in basic conversations and perform simple tasks [5].
- With a budget of $1,000 and extended training time, the model's coherence improves significantly, enabling it to tackle basic math and coding tasks [5].
- A model trained for 24 hours can achieve scores above 40 on MMLU and 70 on ARC-Easy, showcasing its capabilities [5][10].

Community and Development
- Karpathy envisions nanochat evolving into a research platform or standard benchmark, encouraging community collaboration for iterative improvements [6].
- The project is positioned as a capstone for an upcoming LLM101 course, which is still under development [5].

Limitations and Considerations
- Karpathy cautions that nanochat is not designed for personalized applications and should be viewed as a rudimentary model lacking advanced intelligence [12][13].
- To achieve effective personalization, significant steps involving data preparation, synthetic data generation, and fine-tuning with robust models are necessary [13].