nanoGPT
No PhD, No Papers: The Unconventional Paths That Got These People into OpenAI and Other Top AI Labs
机器之心· 2026-01-25 04:01
Recently, OpenAI senior research scientist Noam Brown shared several true stories on X showing that, with personal effort and clever strategy, it is possible to land opportunities even without a traditional academic record.

Editor | Yang Wen

Many people dream of doing research at a frontier lab like OpenAI, yet for those without a conventional academic background, such as published papers or a recommendation from a well-known advisor, the path can seem especially hard.

**Keller Jordan: Starting by Improving Someone Else's Paper**

When Keller Jordan graduated from UC San Diego, his resume listed no publications; at the time he was working at a startup doing AI content moderation. The conventional route into a top lab like OpenAI requires at least a PhD from a prestigious school, several top-conference papers, and ideally a recommendation from a well-known researcher. Keller had none of these. But he did one crucial thing: he reached out cold to Behnam Neyshabur, then a researcher at Google, with an idea for improving Neyshabur's latest paper. The cold outreach got a positive response: Behnam agreed to mentor him, and they eventually co-authored an ICLR paper. Noam Brown stressed in his post that although AI research is becoming more closed and public projects are fewer, improving on others' published work remains an excellent way to demonstrate ability. ...
The World's First Space-Trained AI Is Born, Forged in Orbit on an H100, and Musk Raves About It
36Kr · 2025-12-11 03:46
**Core Insights**
- The first AI model trained in space, on an NVIDIA H100 GPU, has been successfully developed, marking a significant technological milestone [1][3][9]
- Google's Gemma model has also operated successfully in space, sending its first greeting message to Earth [1][11]

**Group 1: Space AI Development**
- The Starcloud-1 satellite, equipped with an H100 GPU, carries roughly 100 times the computational power of any GPU previously sent to space [9]
- The space-trained AI model is based on Karpathy's nanoGPT and was trained on Shakespearean texts, allowing it to converse in a Renaissance style [12][4]
- The satellite has demonstrated real-time intelligence analysis, such as identifying wildfire signals and providing situational updates [16]

**Group 2: Industry Implications**
- Starcloud aims to establish space as a viable location for data centers, addressing the growing pressure on Earth's data infrastructure [17][19]
- The company plans to leverage solar energy to cut operating costs, projecting them at one-tenth those of terrestrial data centers [20]
- Starcloud's long-term vision is a 5 GW orbital data center with extensive solar panels and cooling systems [20][22]

**Group 3: Competitive Landscape**
- The space computing race is intensifying, with major players such as Google, SpaceX, and Blue Origin entering the field [25][26]
- Google's Project Suncatcher aims to deploy solar-powered GPU satellites, with early testing planned for 2027 [26]
- Musk's Starlink V3 satellites are expected to form a backbone for orbital computing infrastructure, with power capacity potentially exceeding average U.S. electricity consumption within two years [30]
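The orbital model above is described as nanoGPT trained on Shakespearean text. As a rough illustration of how such character-level training data is prepared, here is a minimal sketch in the style of nanoGPT's Shakespeare character example; the tiny `text` stand-in and helper names are illustrative, not Starcloud's actual code.

```python
# Minimal character-level data preparation, in the spirit of
# nanoGPT's shakespeare_char example (illustrative sketch only).

text = "To be, or not to be"  # stand-in for the full Shakespeare corpus

# Build the character vocabulary and the encode/decode maps.
chars = sorted(set(text))
stoi = {ch: i for i, ch in enumerate(chars)}
itos = {i: ch for ch, i in stoi.items()}

def encode(s: str) -> list[int]:
    """Map a string to a list of integer token ids."""
    return [stoi[c] for c in s]

def decode(ids: list[int]) -> str:
    """Map token ids back to a string."""
    return "".join(itos[i] for i in ids)

# Round trip: encoding then decoding recovers the original text.
assert decode(encode(text)) == text
```

The real pipeline feeds the encoded id stream into a Transformer that predicts the next character, which is why the trained model "speaks" in the style of its corpus.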
Your Own ChatGPT in 4 Hours: Karpathy Strikes Again, Admits AI Agents Were More Hindrance Than Help as He Hand-Wrote 8,000 Lines of Code; Netizens: Finish the Run and You're a Machine Learning Engineer
36Kr · 2025-10-14 12:52
**Core Insights**
- Andrej Karpathy, former AI director at Tesla and co-founder of OpenAI, has released a new open-source project called nanochat, which has gained 7.9k stars on GitHub [1]
- nanochat is a minimalistic end-to-end training and inference toolchain designed to replicate a simplified version of ChatGPT, distinct from Karpathy's earlier project, nanoGPT [1][6]

**Project Overview**
- nanochat lets users train a conversational language model for approximately $100, surpassing GPT-2 on the CORE metric after about 12 hours of training [2][3]
- The project is started by launching a cloud GPU server and running a single script, after which users can interact with their trained model via a web interface [2]

**Technical Specifications**
- The codebase is around 8,000 lines, primarily handwritten by Karpathy, with an emphasis on clear structure [7]
- The architecture is similar to the Llama model but simpler, incorporating elements from modded-nanoGPT [7][8]
- Key features include dense transformers, rotary embeddings, and an optimizer combining Muon and AdamW [8][9]

**Performance Metrics**
- Metrics are reported for each training stage, showing improvements on CORE, ARC-Challenge, ARC-Easy, GSM8K, HumanEval, and MMLU [5]

**Community Impact**
- The release has generated significant interest on social media, with users excited about its potential to democratize access to language model training [10]
- The project is expected to be a valuable resource for researchers and machine learning enthusiasts experimenting with language models [10]
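Among the features listed above are rotary position embeddings (RoPE). The idea is to rotate consecutive pairs of query/key features by a position-dependent angle. The following is a minimal plain-Python sketch of that rotation; it illustrates the general RoPE technique, not nanochat's actual implementation, and the function name and base constant are assumptions.

```python
import math

def rope(x: list[float], pos: int, base: float = 10000.0) -> list[float]:
    """Apply a rotary position embedding to one feature vector.

    Consecutive pairs (x[2i], x[2i+1]) are rotated by an angle that
    depends on position `pos` and pair index i. Illustrative sketch
    of the general RoPE idea, not nanochat's code.
    """
    d = len(x)
    out = []
    for i in range(0, d, 2):
        theta = pos * base ** (-i / d)
        c, s = math.cos(theta), math.sin(theta)
        out.extend([x[i] * c - x[i + 1] * s,
                    x[i] * s + x[i + 1] * c])
    return out

# At position 0 every angle is zero, so the rotation is the identity.
assert rope([1.0, 0.0, 0.5, 0.5], pos=0) == [1.0, 0.0, 0.5, 0.5]
```

Because each step is a pure rotation, the vector's norm is preserved at every position, which is part of why RoPE interacts well with attention dot products.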
AI Guru Karpathy's Open-Source Project Goes Viral: Cloning ChatGPT in Just 4 Hours and 8,000 Lines of Code
36Kr · 2025-10-14 09:28
**Core Insights**
- Andrej Karpathy launched a new open-source project called nanochat, which he describes as one of his most unrestrained projects, providing a simplified full-stack training and inference pipeline for creating a ChatGPT-like model from scratch [2][5]

**Project Overview**
- nanochat is a minimalistic, full-stack project: users rent a cloud GPU server, run a single script, and can interact with a trained large language model (LLM) within approximately four hours [2][10]

**Key Components of nanochat**
1. **Data Preparation**: Builds a tokenizer from raw web text to convert large volumes of text into numerical data [5]
2. **Model Pre-training**: Trains a foundational Transformer model on large datasets to learn language syntax, facts, and basic reasoning; the most time-consuming and critical step [5]
3. **Alignment Fine-tuning**:
   - **Instruction Fine-tuning**: Uses high-quality Q&A and dialogue data to teach the model to follow instructions and converse like an assistant [6]
   - **Reinforcement Learning**: An optional stage that sharpens model performance on specific tasks through rewards and penalties [6]
4. **Model Inference**: Provides an efficient engine for real-time interaction with the trained model via the command line or a web interface [6]
5. **Evaluation**: Automatically generates a detailed report of the model's performance across various standard tests [6]

**Educational and Research Significance**
- nanochat serves as an educational tool, letting developers and researchers build their own small chat models at low cost and experience the entire path from raw text to intelligent dialogue assistant [7]
- It provides a lightweight, controllable, and reproducible experimental platform for testing new model architectures and training methods without expensive computational resources [7]
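The pre-training stage described above boils down to minimizing a next-token cross-entropy loss over the token stream. A minimal numerically stable sketch of that loss for a single prediction follows; the function name is illustrative and this is the generic quantity, not nanochat's exact training code.

```python
import math

def next_token_loss(logits: list[float], target: int) -> float:
    """Cross-entropy loss for one next-token prediction.

    Computes log-sum-exp of the logits (with max-subtraction for
    numerical stability) minus the logit of the correct token.
    Generic sketch of the pre-training objective, not nanochat's code.
    """
    m = max(logits)
    log_z = m + math.log(sum(math.exp(l - m) for l in logits))
    return log_z - logits[target]

# Uniform logits over a 3-token vocabulary give loss ln(3).
assert abs(next_token_loss([0.0, 0.0, 0.0], 0) - math.log(3)) < 1e-9
```

Pre-training sums this loss over every position in every sequence of the corpus and follows its gradient, which is what makes it the dominant cost in the pipeline.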
**Cost and Efficiency**
- The total cost to train a small ChatGPT clone using nanochat is approximately $100, for a training run of about four hours on an 8×H100 node [10]
- Training for around 12 hours surpasses GPT-2 on the CORE metric; with a budget of about $1,000, the model becomes more coherent and can solve simple math and programming problems [14]

**Technical Insights**
- The nanochat architecture is similar to Meta's Llama model but simplified, aiming to establish a robust baseline for models of this scale [15]
- Key features include a Muon + AdamW optimizer and various design choices that enhance model performance [16][20]
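The cost figures above are easy to sanity-check: four hours on an 8×H100 node at a few dollars per GPU-hour lands near $100. The $3/GPU-hour rate below is an assumed rental price for illustration, not a number from the article.

```python
# Back-of-the-envelope check of the quoted training budgets.
# The $3/GPU-hour rate is an assumption, not a quoted price.

gpus = 8            # one 8xH100 node
rate = 3.0          # assumed USD per GPU-hour

speedrun = gpus * 4 * rate      # 4-hour run: ~96 USD, close to the ~$100 figure
long_run = gpus * 41.6 * rate   # 41.6-hour run: ~998 USD, close to the ~$1,000 figure
```

The same arithmetic explains why the 12-hour GPT-2-surpassing run sits around $300: the budget scales linearly with wall-clock hours on a fixed node.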
Reproducing ChatGPT with $100 and Just 8,000 Lines of Code; Karpathy: "The Craziest Project I've Ever Written"
Founder Park· 2025-10-14 04:18
**Core Insights**
- The article discusses the launch of nanochat, an open-source project by Andrej Karpathy that allows users to build a ChatGPT-like model with minimal resources [3][10]
- The project aims to democratize access to large language model (LLM) research, enabling anyone to train their own models easily [12][22]

**Project Overview**
- nanochat is a complete training framework for creating a ChatGPT-like model from scratch, consisting of approximately 8,000 lines of clean code [6][26]
- The entire system can be set up on a single GPU machine, requiring only about 4 hours of training time and costing around $100 [10][13]
- The project includes all stages of model development, from data preparation to fine-tuning and deployment [6][12]

**Performance Metrics**
- A model trained for about 12 hours can surpass GPT-2 on core metrics, while a 24-hour training run can achieve performance comparable to GPT-3 Small [11][13]
- Reported metrics include scores on benchmarks such as MMLU and GSM8K, indicating the model's capabilities in reasoning and code generation [11][27]

**Development Philosophy**
- Karpathy emphasizes making LLM research accessible and reproducible, in the same spirit as his previous work on nanoGPT [12][22]
- The project is seen as a potential baseline for future research and experimentation within the open-source community [8][16]

**Community Engagement**
- The article mentions a growing community around AI products, with over 15,000 members in the "AI Product Marketplace" group, highlighting the interest in AI applications [9]
Hand-Rolling ChatGPT with $100 and 8,000 Lines of Code: Karpathy's Latest Open-Source Project Goes Viral, Nearly 5k Stars Overnight
36Kr · 2025-10-14 02:25
**Core Insights**
- Andrej Karpathy has released a new open-source project called nanochat, which allows users to build a ChatGPT-like model from scratch for approximately $100 [2][5]
- The project consists of around 8,000 lines of code and was quickly adopted by the community, gaining over 4,500 stars on GitHub within 12 hours [2][5]
- nanochat provides a complete training and inference pipeline for large language models (LLMs), unlike Karpathy's previous project, nanoGPT, which covered only the pre-training phase [2][5]

**Project Details**
- Users can train their own LLM by running a script on a cloud GPU machine, achieving a functional model in about 4 hours [2][3]
- Features include a new Rust-based tokenizer, a high-efficiency inference engine, and automatically generated Markdown scorecards summarizing the training run [3][5]
- Karpathy estimates that with a budget of $1,000 and 41.6 hours of training, users can achieve significant improvements in model coherence and task performance [4][5]

**Performance Metrics**
- Initial CORE scores for the model were recorded at 0.2219, improving across the different training phases [7]
- After sufficient training, the model scores 40+ on MMLU and 70+ on ARC-Easy [4][7]

**Community and Future Development**
- Karpathy envisions nanochat evolving into a research platform or standard benchmark, similar to nanoGPT, and encourages community collaboration for further improvements [5][8]
- Despite its capabilities, Karpathy cautions that nanochat is not suitable for personalized applications without significant additional work and data preparation [9][10]
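The Rust-based tokenizer mentioned above trains a byte-pair-encoding (BPE) vocabulary. The core of BPE training is repeatedly finding the most frequent adjacent token pair and merging it into a new token. The sketch below shows one such merge step in Python; it illustrates the generic BPE algorithm under assumed helper names, not nanochat's actual Rust implementation.

```python
from collections import Counter

def most_frequent_pair(tokens: list[str]) -> tuple[str, str]:
    """Return the most common adjacent token pair."""
    pairs = Counter(zip(tokens, tokens[1:]))
    return max(pairs, key=pairs.get)

def merge_pair(tokens: list[str], pair: tuple[str, str]) -> list[str]:
    """Replace every occurrence of `pair` with a single merged token."""
    out, i = [], 0
    while i < len(tokens):
        if i + 1 < len(tokens) and (tokens[i], tokens[i + 1]) == pair:
            out.append(tokens[i] + tokens[i + 1])
            i += 2
        else:
            out.append(tokens[i])
            i += 1
    return out

# One merge step on "banana": the pair ('a', 'n') is merged into "an".
toks = list("banana")
toks = merge_pair(toks, most_frequent_pair(toks))
```

A full BPE trainer simply repeats this step until the vocabulary reaches a target size; production tokenizers do this in a compiled language (hence the Rust implementation) because the pair counting is the hot loop.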
Hand-Rolling ChatGPT with $100 and 8,000 Lines of Code: Karpathy's Latest Open-Source Project Goes Viral, Nearly 5k Stars Overnight
机器之心· 2025-10-14 02:06
**Core Insights**
- The article discusses Andrej Karpathy's new open-source project called nanochat, which allows users to build a ChatGPT-like model from scratch for approximately $100 [2][5]
- The project consists of around 8,000 lines of code and provides a complete training and inference process for a simplified version of ChatGPT [2][4]
- Users can set up a cloud GPU machine, run a script, and interact with their own language model (LLM) via a web interface after about 4 hours of training [3][5]

**Project Features**
- nanochat includes a new Rust implementation for training tokenizers and pre-trains a Transformer LLM on the FineWeb dataset, evaluating its performance across multiple metrics [4]
- The project allows for fine-tuning and evaluation of the model on various tasks, including world-knowledge multiple-choice questions, mathematics, and coding [4][5]
- Karpathy aims to create a unified, readable, and easily modifiable codebase that can serve as a strong baseline for future developments in LLMs [5][6]

**Performance Metrics**
- Initial training costs around $100, yielding a model that can engage in basic conversations and perform simple tasks [5]
- With a budget of $1,000 and extended training time, the model's coherence improves significantly, enabling it to tackle basic math and coding tasks [5]
- A model trained for 24 hours can achieve scores above 40 on MMLU and above 70 on ARC-Easy [5][10]

**Community and Development**
- Karpathy envisions nanochat evolving into a research platform or standard benchmark, encouraging community collaboration for iterative improvements [6]
- The project is positioned as a capstone for an upcoming LLM101 course, which is still under development [5]

**Limitations and Considerations**
- Karpathy cautions that nanochat is not designed for personalized applications and should be viewed as a rudimentary model lacking advanced intelligence [12][13]
- Effective personalization would require significant data preparation, synthetic data generation, and fine-tuning with robust models [13]