nanochat
X @Nick Szabo
Nick Szabo· 2025-11-06 05:37
RT TuringPost (@TheTuringPost): @karpathy's nanochat is bigger than you think. He calls it a ramp, but it's actually a lab of its own – a miniature system where anyone can experiment. And most importantly – it's deeply connected to education, allowing us to understand machine intelligence through a tiny model: 1. What is nanochat and how can you use it? It's a miniature LM that costs as little as $100 (~4 hours on an 8XH100 node) to train and behaves like a small, curious creature. Karpathy described it as a "kind ...
A heavyweight fires away: agents are all for show, reinforcement learning is terrible, and AGI won't arrive even in ten years
自动驾驶之心· 2025-10-22 00:03
Core Insights
- The article discusses the current state and future of AI, focusing on the limitations of reinforcement learning and the timeline for achieving Artificial General Intelligence (AGI) [5][6][10].

Group 1: AGI and AI Development
- AGI is expected to take about ten years to develop, contrary to the belief that this year would be the year of agents [12][13].
- Current AI agents, such as Claude and Codex, are impressive but still lack essential capabilities, including multi-modal abilities and continuous learning [13][14].
- The industry has been overly optimistic about the pace of AI development, leading to inflated expectations [12][15].

Group 2: Limitations of Reinforcement Learning
- Reinforcement learning is criticized as inadequate for replicating human learning processes, as it often relies on trial and error without a deep understanding of the problem [50][51].
- Reinforcement learning can introduce noise into the learning process, as it weights every action based on the final outcome rather than the quality of the steps taken (see the sketch after this summary) [51][52].
- Human learning involves a more complex reflection on successes and failures, which current AI models do not replicate [52][53].

Group 3: Future of AI and Learning Mechanisms
- The future of AI may involve more sophisticated attention mechanisms and learning algorithms that better mimic human cognitive processes [33][32].
- AI models need mechanisms for long-term memory and knowledge retention, which are currently lacking [31][32].
- The integration of AI into programming and development processes is seen as a continuous evolution rather than a sudden leap to superintelligence [45][47].
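A minimal sketch of the outcome-weighting the critique above targets (an illustration of the general REINFORCE-style update, not Karpathy's code or any specific system): one scalar final reward is applied uniformly to every step of a trajectory, so good intermediate steps in a failed attempt are penalized and noisy steps in a lucky success are reinforced.

```python
import torch

def reinforce_loss(step_logps: torch.Tensor, final_reward: float, baseline: float = 0.0) -> torch.Tensor:
    """step_logps: log-probabilities of the actions actually taken, shape (T,)."""
    advantage = final_reward - baseline       # a single scalar for the whole trajectory
    return -(advantage * step_logps).sum()    # every step gets the same weight, good or bad

# Toy usage: a 4-step attempt that happened to end in success (reward = 1.0).
# In a real policy-gradient setup these log-probs would come from the policy network.
step_logps = torch.log(torch.tensor([0.9, 0.2, 0.7, 0.4]))
print(reinforce_loss(step_logps, final_reward=1.0).item())
```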
Karpathy pours cold water: AGI is 10 years away, and there is no "Year of the Agent" at all
36Kr· 2025-10-21 02:15
Core Insights
- Andrej Karpathy discusses the future of AGI and AI over the next decade, emphasizing that current "agents" are still in their early stages and require significant development [1][3][4].
- He predicts that the core architecture of AI will likely remain similar to Transformer models, albeit with some evolution [8][10].

Group 1: Current State of AI
- Karpathy is skeptical of the notion of an "agent era," suggesting it should be termed "the decade of agents," since agents still need about ten years of research to become truly functional [4][5].
- He identifies key issues with current agents, including lack of intelligence, weak multimodal capabilities, and an inability to operate computers autonomously [4][5].
- The cognitive limitations of these agents stem from their inability to learn continuously, which Karpathy believes will take approximately ten years to address [5][6].

Group 2: AI Architecture and Learning
- Karpathy predicts that the fundamental architecture of AI will still be based on Transformer models in the next decade, although it may evolve (a minimal attention sketch follows this summary) [8][10].
- He emphasizes that advances in algorithms, data, hardware, and software systems are all equally crucial for progress [12].
- The best way to learn about AI, according to Karpathy, is hands-on experience building systems rather than purely theoretical study [12].

Group 3: Limitations of Current Models
- Karpathy critiques current large models for fundamental cognitive limitations, noting that complex work often still requires manual coding rather than relying solely on AI assistance [13][18].
- He categorizes coding approaches into three types: fully manual, manual with auto-completion, and fully AI-driven, with the last being less effective for complex tasks [15][18].
- The industry is moving too quickly, sometimes producing subpar results while claiming significant advances [19].

Group 4: Reinforcement Learning Challenges
- Karpathy acknowledges that while reinforcement learning is not perfect, it remains the best available approach compared to previous methods [22].
- He highlights the challenges of reinforcement learning, including the complexity of problem-solving and the unreliability of evaluation models [23][24].
- Future improvements may require higher-level "meta-learning" or synthetic-data mechanisms, but no successful large-scale implementations exist yet [26].

Group 5: Human vs. Machine Learning
- Karpathy contrasts human learning, which involves reflection and integration of knowledge, with current models that lack such processes [28][30].
- He argues that true intelligence lies in understanding and generalization rather than mere memory retention [30].
- The future of AI should focus on reducing mechanical memorization and enhancing cognitive processes closer to human learning [30].

Group 6: AI's Role in Society
- Karpathy views AI as an extension of computation and believes that AGI will be capable of performing any economically valuable task [31].
- He emphasizes the importance of AI complementing human work rather than replacing it, suggesting a collaborative approach [34][36].
- The emergence of superintelligence is seen as a natural extension of societal automation, leading to a world where human understanding and control may diminish [37][38].
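For reference on the "architecture stays Transformer-like" prediction, here is a minimal sketch of the scaled dot-product attention at the heart of such models; it is a generic single-head illustration with a causal mask and no learned projections, not code from any particular model.

```python
import torch
import torch.nn.functional as F

def causal_attention(q: torch.Tensor, k: torch.Tensor, v: torch.Tensor) -> torch.Tensor:
    """q, k, v: (seq_len, d). Returns attended values of shape (seq_len, d)."""
    d = q.size(-1)
    scores = q @ k.transpose(-2, -1) / d ** 0.5                     # pairwise similarities
    mask = torch.triu(torch.ones_like(scores), diagonal=1).bool()   # hide future positions
    scores = scores.masked_fill(mask, float("-inf"))
    return F.softmax(scores, dim=-1) @ v                            # weighted mix of values

x = torch.randn(5, 16)                  # a 5-token sequence of 16-dim vectors
print(causal_attention(x, x, x).shape)  # torch.Size([5, 16])
```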
Andrej Karpathy fires away: agents are all for show, reinforcement learning is terrible, and AGI won't arrive even in ten years
机器之心· 2025-10-18 05:44
Core Viewpoint
- AI is projected to contribute an annual GDP increase of 2%, but the current state of the industry is criticized as overly optimistic and disconnected from reality [2][5].

Group 1: AGI and Learning
- AGI is expected to take about ten years to develop, as current AI agents lack the necessary cognitive abilities and continuous-learning capabilities [9][11].
- Current AI models, particularly large language models (LLMs), exhibit cognitive deficiencies that hinder their performance [34][36].
- Reinforcement learning is deemed inadequate for replicating human learning processes, as it oversimplifies the complexity of human decision-making [44][46].

Group 2: AI Development and Challenges
- The industry is in a phase of rapid development, but there is skepticism about the actual capabilities of AI models, which are often overhyped [5][41].
- Current AI agents struggle to understand and integrate unusual coding implementations, leading to inefficiencies and misunderstandings in code generation [36][41].
- The reliance on pre-trained models and the limitations of current AI tools highlight the need for further advances in AI technology [20][42].

Group 3: Future of AI
- The future of AI is expected to involve more sophisticated attention mechanisms and potentially a shift toward more efficient learning algorithms [29][30].
- While AI will continue to evolve, it is expected to keep relying on foundational principles such as gradient descent for training large neural networks (see the sketch after this summary) [29][30].
- Ongoing improvements in AI tools and models suggest a continuous integration of new techniques and methodologies to enhance performance [42][43].
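To ground the "foundational principles such as gradient descent" point, here is a minimal gradient-descent loop fitting a one-parameter linear model with mean squared error; the data and learning rate are invented for illustration, and the same update rule, scaled up enormously, is what training large neural networks still relies on.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=100)
y = 2.0 * x + rng.normal(scale=0.1, size=100)   # noisy observations of y = 2x

w, lr = 0.0, 0.1
for _ in range(200):
    grad = 2.0 * np.mean((w * x - y) * x)       # gradient of the mean squared error w.r.t. w
    w -= lr * grad                              # step against the gradient
print(round(w, 3))                              # converges to roughly 2.0
```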
Karpathy hand-builds ChatGPT in 8,000 lines of code for just $100; after 12 hours of training, its CORE score surpasses GPT-2
程序员的那些事· 2025-10-15 00:44
Core Insights
- The article discusses the launch of "nanochat," a simplified version of ChatGPT created by Andrej Karpathy, which can be built with minimal resources and code [1][2][4].
- The project aims to provide an accessible framework for training language models, emphasizing ease of use and modification [11][13].

Project Overview
- "nanochat" is a full-stack training and inference pipeline that allows users to create a basic ChatGPT-like model with approximately 8,000 lines of code [2][4].
- The total cost to train this model is around $100, using a cloud GPU server for about 4 hours [4][16].
- The project includes a custom tokenizer implemented in Rust, with training conducted on the FineWeb dataset [5][19].

Performance Metrics
- After approximately 12 hours of training, the model's performance on the CORE metric surpasses that of GPT-2 [8].
- Specific scores: CORE 0.2219, ARC-Easy 0.3876, GSM8K 0.0758, HumanEval 0.0854, MMLU 0.3151 [7][56].

Training Process
- Training proceeds in several stages: pre-training, mid-training, supervised fine-tuning (SFT), and reinforcement learning (RL) [45][50].
- The pre-training phase uses a large dataset to teach the model about the world, while mid-training adapts the model for conversational tasks [28][45].
- The SFT phase further refines the model on high-quality dialogue data (a loss-masking sketch of such a stage follows this summary) [48].

Community Engagement
- The project gained over 4.8k stars on GitHub shortly after release, indicating strong community interest [14].
- The framework is designed to be easily modifiable, allowing users to experiment with different parameters and configurations [59].

Future Potential
- Karpathy envisions "nanochat" evolving into a research tool or benchmark framework, similar to previous projects like nanoGPT [13].
- The project is still in its early stages, with room for further optimization and enhancement [13][50].
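As a rough illustration of how an SFT stage on dialogue data is commonly implemented (a sketch under that general assumption, not nanochat's actual training code), the cross-entropy loss can be masked so that only the assistant's tokens contribute, which teaches the model to produce replies without training it to imitate user turns.

```python
import torch
import torch.nn.functional as F

def sft_loss(logits: torch.Tensor, targets: torch.Tensor, assistant_mask: torch.Tensor) -> torch.Tensor:
    """logits: (T, vocab); targets: (T,); assistant_mask: (T,), 1.0 where the assistant speaks."""
    per_token = F.cross_entropy(logits, targets, reduction="none")    # loss at every position
    return (per_token * assistant_mask).sum() / assistant_mask.sum()  # average over assistant tokens only

# Toy usage: a 10-token dialogue window over a 50-token vocabulary,
# where the last 6 positions belong to the assistant's reply.
logits = torch.randn(10, 50)
targets = torch.randint(0, 50, (10,))
mask = torch.tensor([0, 0, 0, 0, 1, 1, 1, 1, 1, 1], dtype=torch.float)
print(sft_loss(logits, targets, mask).item())
```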
Tencent Research Institute AI Digest 20251015
腾讯研究院· 2025-10-14 16:01
Group 1: Nvidia's AI Supercomputer
- Nvidia has launched the DGX Spark personal AI supercomputer priced at $3,999, featuring the Grace Blackwell GB10 super chip and delivering 1 petaflop of AI compute with 128GB of unified memory [1]
- The device uses NVLink-C2C technology for seamless CPU-GPU connection, with five times the bandwidth of PCIe 5; it can run 200-billion-parameter models locally, and two units can handle 400-billion-parameter models [1]
- It ships with the complete NVIDIA AI software stack, including CUDA and TensorRT, and is available for purchase starting October 15 through Nvidia's website and global partners [1]

Group 2: Karpathy's Open Source Project
- AI expert Andrej Karpathy has released the open-source project nanochat, which implements a ChatGPT clone from scratch in 8,000 lines of code and gained nearly 5,000 GitHub stars within 12 hours [2]
- The project covers everything from tokenizer training, pre-training, fine-tuning, and reinforcement learning to the inference engine, with a training cost of only $100 (8×H100 for 4 hours) to create a mini chat model [2]
- Karpathy emphasizes that the project is better suited to learning and research than to personalized applications, since personalization would require complex synthetic-data generation and extensive pre-training data [2]

Group 3: Microsoft's Text-to-Image Model
- Microsoft AI has introduced its first fully self-developed text-to-image model, MAI-Image-1, which ranks 9th on the LMArena text-to-image leaderboard with a score of 1096 [3]
- The model excels at generating hyper-realistic images, particularly lighting effects and natural landscapes, with a focus on avoiding content repetition and homogenization [3]
- MAI-Image-1 will be integrated into Microsoft's core products such as Copilot and Bing Image Creator, marking a significant step in building a multi-modal autonomous technology matrix in AI [3]

Group 4: Tencent's Youtu-Embedding
- Tencent's Youtu Lab has officially open-sourced the Youtu-Embedding model, which handles six mainstream tasks including text retrieval, intent understanding, and similarity judgment, and addresses the "negative transfer" dilemma [4]
- The model was trained from scratch on 3 trillion tokens of Chinese and English corpus using an innovative "collaborative-discriminative fine-tuning framework," achieving a top score of 77.46 on the CMTEB Chinese semantic evaluation benchmark [4]
- It can be integrated into mainstream frameworks such as LangChain and LlamaIndex, lowering development barriers, and is particularly suited to building enterprise-level RAG (retrieval-augmented generation) systems (a retrieval sketch follows this digest) [4]

Group 5: AI Research on Communication Style
- Research from Penn State University indicates that phrasing questions to LLMs in a rude tone yields a higher accuracy of 84.8% for GPT-4o, compared to 80.8% with a polite tone [5]
- The researchers suggest that direct phrasing helps the model grasp the core task more accurately, while polite phrasing may introduce unnecessary distractions [5]

Group 6: QQ Browser AI Upgrade
- QQ Browser version 19.7.5 introduces a "Serious AI" feature that draws on Tencent News' 10 years of fact-checking experience and a database of millions of debunked claims to quickly assess information credibility [7]
- The "AI Video Assistant" feature supports intelligent summarization, recognition and translation across 16 languages, and one-click export of subtitled videos, addressing the difficulty of understanding foreign-language videos [7]
- Both features are now available for free in the QQ Browser Agent Center, targeting the pain points of information verification and efficient video-content retrieval [7]

Group 7: SpaceX Starship Test
- SpaceX has completed the eleventh integrated flight test of Starship, using the reused booster B15.2 and the S38 spacecraft; this was the final flight of the second-generation Starship and collected landing-burn configuration and propulsion data for the third generation [8]
- The booster validated a configuration switch from 13 engines for initial ignition to 5 engines for steering and 3 engines for hovering, while the spacecraft completed dynamic tilt maneuvers, in-space re-ignition, and thermal-limit tests [8]
- The third-generation Starship will exceed 124 meters in height, use third-generation Raptor engines with 280 tons of thrust each, and carry an effective payload of up to 100 tons, with ground testing expected to begin by the end of 2025 [8]

Group 8: Tencent's Qinyun Scholarship
- Tencent has launched the "Qinyun Scholarship" aimed at top AI talent, targeting master's and doctoral students in cutting-edge AI research; the first round is expected to award 15 outstanding students up to 500,000 yuan each [9]
- The scholarship comprises a 200,000-yuan cash award and 300,000 yuan in heterogeneous cloud computing resources, and winners also have the opportunity to intern or work at Tencent [9]
- The initiative focuses on students in computer science, artificial intelligence, and related fields, encouraging engagement in frontier research directions [9]

Group 9: Cathie Wood's Predictions
- Cathie Wood, founder of ARK Invest, predicts that global real GDP growth will rise from 3% to over 7% in the next decade, with inflation potentially dropping to 0% or even turning negative [10]
- She believes the simultaneous maturation of five key technology platforms (AI, robotics, blockchain, energy storage, and multi-omics sequencing) will redefine productivity, with "technological convergence" pushing each S-curve into an explosive growth phase [10]
- Wood anticipates that truly disruptive innovation assets could deliver annualized returns of 40%-50% in capital markets over the next five years, with her bull-market forecast for Bitcoin reaching $1.5 million per coin [10]

Group 10: n8n's AI Opportunity
- Jan Oberhauser, founder of n8n, reports a fourfold increase in company revenue within eight months, which he attributes to a strategy shift from chasing potential customers to focusing on community building [12]
- He views the AI wave as either a major opportunity or a potential company-ending threat, with n8n enabling users to build AI-driven applications rather than merely adding AI features [12]
- n8n uses a dual "open source but non-commercial" licensing model and emphasizes a bottom-up approach from the builder market, noting that no one has successfully won the entire race by starting from the enterprise market [12]
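To illustrate the retrieval step in a RAG system of the kind described for Youtu-Embedding, here is a minimal sketch; the embed() function is a hypothetical bag-of-words stand-in rather than the actual Youtu-Embedding API, and any sentence-embedding model that maps text to a fixed-size vector would take its place.

```python
import numpy as np

def embed(text: str, dim: int = 64) -> np.ndarray:
    """Hypothetical stand-in embedder: hash words into a fixed-size vector and normalize."""
    v = np.zeros(dim)
    for word in text.lower().split():
        v[hash(word) % dim] += 1.0
    return v / (np.linalg.norm(v) + 1e-9)

def retrieve(query: str, documents: list[str], top_k: int = 2) -> list[str]:
    """Rank documents by cosine similarity to the query (unit-norm vectors, so a dot product suffices)."""
    q = embed(query)
    scores = [float(q @ embed(d)) for d in documents]
    best = np.argsort(scores)[::-1][:top_k]
    return [documents[i] for i in best]

docs = ["nanochat trains a small ChatGPT clone",
        "DGX Spark is a desktop AI supercomputer",
        "Youtu-Embedding handles text retrieval and similarity tasks"]
print(retrieve("which model is for text retrieval?", docs, top_k=1))
```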
Your own ChatGPT in 4 hours: Karpathy is at it again, admits the Agent got in the way, hand-writes 8,000 lines of code; netizens: finish the run and you're a machine learning engineer
36Kr· 2025-10-14 12:52
Core Insights
- Andrej Karpathy, former AI director at Tesla and co-founder of OpenAI, has released a new open-source project called nanochat, which has gained 7.9k stars on GitHub [1]
- nanochat is a minimalistic end-to-end training and inference toolchain designed to replicate a simplified version of ChatGPT, in contrast to Karpathy's previous project, nanoGPT [1][6]

Project Overview
- nanochat allows users to train a conversational language model for approximately $100, surpassing GPT-2 on the CORE metric after about 12 hours of training [2][3]
- The project can be started by launching a cloud GPU server and running a single script, after which users can interact with their trained model via a web interface [2]

Technical Specifications
- The project consists of around 8,000 lines of code, largely handwritten by Karpathy, with an emphasis on a clear code structure [7]
- The architecture is similar to the Llama model but deliberately simpler, incorporating elements from modded-nanoGPT [7][8]
- Key features include dense transformers, rotary embeddings (a RoPE sketch follows this summary), and an optimizer that combines Muon and AdamW [8][9]

Performance Metrics
- Performance metrics for the various training stages show improvements in CORE, ARC-Challenge, ARC-Easy, GSM8K, HumanEval, and MMLU scores [5]

Community Impact
- The release of nanochat has generated significant interest on social media, with users excited about its potential to democratize access to language-model training [10]
- The project is expected to serve as a valuable resource for researchers and machine learning enthusiasts, enabling them to experiment with language models more easily [10]
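Among the features listed above, rotary embeddings (RoPE) can be sketched in a few lines; this is a generic illustration rather than nanochat's implementation: each pair of channels in a query or key vector is rotated by an angle that grows with the token's position, so attention dot products end up encoding relative offsets.

```python
import torch

def apply_rope(x: torch.Tensor, base: float = 10000.0) -> torch.Tensor:
    """x: (seq_len, d) with d even. Returns x with rotary position encoding applied."""
    seq_len, d = x.shape
    pos = torch.arange(seq_len, dtype=torch.float32).unsqueeze(1)       # (seq_len, 1)
    freqs = base ** (-torch.arange(0, d, 2, dtype=torch.float32) / d)   # one frequency per channel pair
    angles = pos * freqs                                                # (seq_len, d/2)
    cos, sin = angles.cos(), angles.sin()
    x1, x2 = x[:, 0::2], x[:, 1::2]                                     # interleaved channel pairs
    out = torch.empty_like(x)
    out[:, 0::2] = x1 * cos - x2 * sin                                  # 2-D rotation of each pair
    out[:, 1::2] = x1 * sin + x2 * cos
    return out

q = torch.randn(8, 64)       # 8 query vectors of width 64
print(apply_rope(q).shape)   # torch.Size([8, 64])
```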
Your own ChatGPT in 4 hours: Karpathy is at it again! Admits the Agent got in the way, hand-writes 8,000 lines of code; netizens: finish the run and you're a machine learning engineer
AI前线· 2025-10-14 09:46
Core Insights
- The article discusses the launch of "nanochat," an open-source project by Andrej Karpathy that allows users to train a simplified version of ChatGPT with minimal resources [2][4][6]
- Karpathy claims that with just $100 and approximately 4 hours of training on a cloud GPU server, users can create a conversational model that surpasses GPT-2 in performance [6][7]

Project Overview
- "nanochat" is a streamlined training and inference toolchain built from scratch, unlike Karpathy's previous project, "nanoGPT," which covered only pre-training [2][5]
- The entire codebase consists of around 8,000 lines of code, emphasizing clarity and simplicity, which makes it well suited to modification and branch development [11][12]

Technical Specifications
- The project uses a new tokenizer implemented in Rust and pre-trains a Transformer-based language model on the FineWeb dataset [5]
- Key features include instruction fine-tuning, optional reinforcement learning, and an efficient inference engine with a user-friendly interface (a sketch of the sampling step follows this summary) [6][9]

Performance Metrics
- After approximately 12 hours of training, the model's performance metrics exceed those of GPT-2, with specific scores on benchmarks such as MMLU and GSM8K [7][8]
- CORE scores after the different training stages are provided, showing improvements across the various metrics [8]

Community and Future Development
- Karpathy envisions "nanochat" as the core project for an upcoming course and a potential research-tool framework, and invites community contributions for further enhancement [9][14]
- The project has generated significant interest on social media, with users excited about its potential for machine learning education and experimentation [14]
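The "efficient inference engine" ultimately boils down to sampling one token at a time from the model's output logits; the sketch below shows a common temperature plus top-k sampling step and is a generic illustration with made-up logits, not nanochat's engine.

```python
import torch

def sample_next_token(logits: torch.Tensor, temperature: float = 0.8, top_k: int = 50) -> int:
    """logits: (vocab_size,) raw next-token scores from one forward pass of the model."""
    logits = logits / temperature                                  # sharpen or flatten the distribution
    vals, idx = torch.topk(logits, k=min(top_k, logits.numel()))   # keep only the k most likely tokens
    probs = torch.softmax(vals, dim=-1)
    choice = torch.multinomial(probs, num_samples=1)               # draw one token from the truncated distribution
    return int(idx[choice])

fake_logits = torch.randn(1000)   # stand-in for the model's output over a 1000-token vocabulary
print(sample_next_token(fake_logits))
```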
AI guru Karpathy's open-source project goes viral: ChatGPT cloned in just 4 hours and 8,000 lines of code
36Kr· 2025-10-14 09:28
Core Insights
- Andrej Karpathy launched a new open-source project called "nanochat," which he describes as one of his most unrestrained projects, providing a simplified full-stack training and inference pipeline for creating a ChatGPT-like model from scratch [2][5].

Summary by Sections

Project Overview
- nanochat is a minimalistic, full-stack project that lets users create a chatbot by renting a cloud GPU server and running a single script, enabling interaction with a trained large language model (LLM) within approximately four hours [2][10].

Key Components of Nanochat
1. **Data Preparation**: A tokenizer is created from raw web text to convert vast amounts of text into numerical data (a byte-pair-encoding sketch follows this summary) [5].
2. **Model Pre-training**: A foundational Transformer model is trained on large datasets to learn language syntax, facts, and basic reasoning; this is the most time-consuming and critical step [5].
3. **Alignment Fine-tuning**:
   - **Instruction Fine-tuning**: High-quality Q&A and dialogue data teach the model to follow instructions and converse like an assistant [6].
   - **Reinforcement Learning**: An optional stage that improves model performance on specific tasks through rewards and penalties [6].
4. **Model Inference**: An efficient engine provides real-time interaction with the trained model via the command line or a web interface [6].
5. **Evaluation**: A detailed report is generated automatically, showing the model's performance across various standard tests [6].

Educational and Research Significance
- nanochat serves as an educational tool, allowing developers and researchers to build their own small chat models at low cost and experience the entire process from raw text to an intelligent dialogue assistant [7].
- It provides a lightweight, controllable, and reproducible experimental platform for researchers to test new model architectures and training methods without needing expensive computational resources [7].

Cost and Efficiency
- The total cost of training a small ChatGPT clone with nanochat is approximately $100, with a training duration of about four hours on an 8XH100 node [10].
- Training for around 12 hours surpasses GPT-2 on the CORE metric, and with a budget of about $1,000 the model becomes more coherent and can solve simple math and programming problems [14].

Technical Insights
- The architecture of nanochat is similar to Meta's Llama model but simplified, aiming to establish a robust baseline for models of this scale [15].
- Key features include a Muon + AdamW optimizer and various design choices that enhance model performance [16][20].
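Step 1 above trains a tokenizer; nanochat's is written in Rust, but the byte-pair-encoding idea it builds on can be sketched generically in a few lines (an illustration, not the project's actual code): repeatedly merge the most frequent adjacent pair of symbols so that common character sequences become single vocabulary entries.

```python
from collections import Counter

def train_bpe(text: str, num_merges: int = 10) -> list[tuple[str, str]]:
    """Return the ordered list of pair merges learned from the text."""
    tokens = list(text)                               # start from individual characters
    merges = []
    for _ in range(num_merges):
        pairs = Counter(zip(tokens, tokens[1:]))      # count adjacent symbol pairs
        if not pairs:
            break
        best = pairs.most_common(1)[0][0]             # the most frequent pair becomes a new token
        merges.append(best)
        merged, i = [], 0
        while i < len(tokens):
            if i + 1 < len(tokens) and (tokens[i], tokens[i + 1]) == best:
                merged.append(tokens[i] + tokens[i + 1])
                i += 2
            else:
                merged.append(tokens[i])
                i += 1
        tokens = merged
    return merges

print(train_bpe("low lower lowest low low", num_merges=5))
```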
$100 and just 8,000 lines of code to reproduce ChatGPT; Karpathy: this is the craziest project I've ever written
Founder Park· 2025-10-14 04:18
Core Insights
- The article discusses the launch of "nanochat," an open-source project by Andrej Karpathy that allows users to build a ChatGPT-like model with minimal resources [3][10].
- The project aims to democratize access to large language model (LLM) research, enabling anyone to train their own models easily [12][22].

Project Overview
- "nanochat" is described as a complete training framework for creating a ChatGPT-like model from scratch, consisting of approximately 8,000 lines of clean code [6][26].
- The entire system can be set up on a single GPU machine, requiring only about 4 hours of training time and costing around $100 [10][13].
- The project covers all stages of model development, from data preparation to fine-tuning and deployment [6][12].

Performance Metrics
- A model trained for about 12 hours can surpass the core metrics of GPT-2, while a 24-hour training run can reach performance comparable to GPT-3 Small [11][13].
- Specific performance metrics include scores on benchmarks such as MMLU and GSM8K, indicating the model's capabilities in reasoning and code generation [11][27].

Development Philosophy
- Karpathy emphasizes a philosophy of making LLM research accessible and reproducible, similar to his previous work with nanoGPT [12][22].
- The project is seen as a potential baseline for future research and experimentation within the open-source community [8][16].

Community Engagement
- The article mentions a growing community around AI products, with over 15,000 members in the "AI Product Marketplace" group, highlighting interest in AI applications [9].