nanochat

AI Heavyweight Fires Off: Agents Are Just Putting on a Show, Reinforcement Learning Is Terrible, and AGI Won't Arrive Even in Ten Years
自动驾驶之心· 2025-10-22 00:03
Compiled by Zenan and Yang Wen | Source: 机器之心 | Original article: "Andrej Karpathy Fires Off: Agents Are Just Putting on a Show". Andrej Karpathy is a researcher the artificial intelligence community knows well: he earned his PhD from Stanford University in 2016 under Fei-Fei Li, became a founding member of OpenAI, and later joined Tesla as Director of AI. After a brief return to OpenAI, he is now the founder of the AI education company Eureka Labs. In an interview with the well-known podcast host Dwarkesh Patel, Andrej weighed in on the questions the AI field currently cares about most: why reinforcement learning is terrible, why model collapse keeps LLMs from learning the way humans do, why AGI will blend into roughly 2% GDP growth, why autonomous driving is taking so long to arrive, and what he sees as the future of education. Less than half a day after going online, the video had already racked up more than 1.3 million views. Timestamps: AI will bring the world 2% GDP per year ...
Karpathy Pours Cold Water: AGI Is Still 10 Years Away, and There Is No "Year of the Agent" at All
36Kr· 2025-10-21 02:15
In a recent interview, Andrej Karpathy dug into AGI, agents, and where AI is headed over the next decade. He believes today's "agents" are still at an early stage, and that reinforcement learning, imperfect as it is, remains the best option currently available. He predicts that the dominant AI architecture ten years from now will most likely still be a huge Transformer-like neural network. In the latest episode of the Dwarkesh Podcast, Andrej Karpathy, founding member of OpenAI, former head of Tesla's AI team, and a leading figure in deep learning, spoke with host Dwarkesh Patel for more than two hours. Karpathy laid out his views on the core questions of the AI world: the AGI timeline, agents, how large models think, and reinforcement learning. Andrej Karpathy (left) with host Dwarkesh Patel (right). AGI is still a decade away. The conversation opened with today's red-hot "agents". Faced with the industry's widely touted "year of the agent", Karpathy stays decidedly cool. He believes the industry is over-forecasting; a more accurate label would be "the decade of agents". By "the decade of agents", Karpathy means these agents still need roughly ten more years of sustained research before they become genuinely usable. He lists the main problems with today's agents: not smart enough, weak multimodal abilities, unable to operate a computer autonomously, and more ...
Andrej Karpathy Fires Off: Agents Are Just Putting on a Show, Reinforcement Learning Is Terrible, and AGI Won't Arrive Even in Ten Years
机器之心· 2025-10-18 05:44
Core Viewpoint - AI is projected to contribute an annual GDP increase of 2%, but the current state of the industry is criticized for being overly optimistic and disconnected from reality [2][5]. Group 1: AGI and Learning - AGI is expected to take about ten years to develop, as current AI agents lack the necessary cognitive abilities and continuous learning capabilities [9][11]. - Current AI models, particularly large language models (LLMs), exhibit cognitive deficiencies that hinder their performance [34][36]. - The concept of reinforcement learning is deemed inadequate for replicating human learning processes, as it oversimplifies the complexity of human decision-making [44][46]. Group 2: AI Development and Challenges - The industry is experiencing a phase of rapid development, but there is skepticism about the actual capabilities of AI models, which are often overhyped [5][41]. - Current AI agents struggle with understanding and integrating unique coding implementations, leading to inefficiencies and misunderstandings in code generation [36][41]. - The reliance on pre-trained models and the limitations of current AI tools highlight the need for further advancements in AI technology [20][42]. Group 3: Future of AI - The future of AI is expected to involve more sophisticated attention mechanisms and potentially a shift towards more efficient learning algorithms [29][30]. - There is a belief that while AI will continue to evolve, it will still rely on foundational principles such as gradient descent for training large neural networks [29][30]. - The ongoing improvements in AI tools and models suggest a continuous integration of new techniques and methodologies to enhance performance [42][43].
Karpathy Hand-Builds ChatGPT in 8,000 Lines of Code for Only $100; 12 Hours of Training Beats GPT-2 on CORE
程序员的那些事· 2025-10-15 00:44
Core Insights - The article discusses the launch of "nanochat," a simplified version of ChatGPT created by Andrej Karpathy, which can be built with minimal resources and code [1][2][4]. - The project aims to provide an accessible framework for training language models, emphasizing ease of use and modification [11][13]. Project Overview - "Nanochat" is a full-stack training and inference pipeline that allows users to create a basic ChatGPT-like model with approximately 8000 lines of code [2][4]. - The total cost to train this model is around $100, using a cloud GPU server for about 4 hours [4][16]. - The model is built using Rust and includes a custom tokenizer, with training conducted on the FineWeb dataset [5][19]. Performance Metrics - After approximately 12 hours of training, the model's performance on the CORE metric surpasses that of GPT-2 [8]. - Specific performance metrics include: - CORE: 0.2219 - ARC-Easy: 0.3876 - GSM8K: 0.0758 - HumanEval: 0.0854 - MMLU: 0.3151 [7][56]. Training Process - The training process involves several stages: pre-training, mid-training, supervised fine-tuning (SFT), and reinforcement learning (RL) [45][50]. - The pre-training phase utilizes a large dataset to teach the model about the world, while mid-training focuses on adapting the model for conversational tasks [28][45]. - The SFT phase further refines the model using high-quality dialogue data [48]. Community Engagement - The project has gained significant attention, with over 4.8k stars on GitHub shortly after its release, indicating strong community interest [14]. - The framework is designed to be easily modifiable, allowing users to experiment with different parameters and configurations [59]. Future Potential - Karpathy envisions "nanochat" evolving into a research tool or benchmark framework, similar to previous projects like nanoGPT [13]. - The project is still in its early stages, with potential for further optimization and enhancement [13][50].
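The four training stages listed above run in sequence, with each stage starting from the previous stage's checkpoint. Below is a minimal Python sketch of that flow; the stage names and data descriptions follow the summary above, while the function and its structure are illustrative and not nanochat's actual API.

```python
# Illustrative sketch of the staged training flow described above:
# pre-training -> mid-training -> SFT -> optional RL.
from dataclasses import dataclass

@dataclass
class Stage:
    name: str
    data: str   # corpus the stage consumes
    goal: str   # what the stage teaches the model

PIPELINE = [
    Stage("pretrain", "FineWeb web text", "general language modeling (next-token prediction)"),
    Stage("midtrain", "conversation-formatted data", "chat turns and multiple-choice format"),
    Stage("sft", "curated dialogue pairs", "follow instructions with high-quality answers"),
    Stage("rl", "verifiable task rewards (e.g. GSM8K)", "optional reinforcement learning polish"),
]

def describe_pipeline(stages=PIPELINE):
    # In the real project each stage is a separate training script that loads the
    # previous checkpoint, trains on its own dataset, and writes a new checkpoint.
    for i, s in enumerate(stages, 1):
        print(f"stage {i}: {s.name:8s} | data: {s.data} | goal: {s.goal}")

if __name__ == "__main__":
    describe_pipeline()
```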
Tencent Research Institute AI Digest 20251015
腾讯研究院· 2025-10-14 16:01
Group 1: Nvidia's AI Supercomputer - Nvidia has launched the DGX Spark personal AI supercomputer priced at $3999, featuring the Grace Blackwell GB10 super chip, delivering 1 Petaflop AI computing performance and 128GB unified memory [1] - The device utilizes NVLink-C2C technology for seamless CPU-GPU connection, with a bandwidth five times that of PCIe 5, capable of running 200 billion parameter models locally, and two units can handle 400 billion parameter models [1] - It comes pre-installed with the complete NVIDIA AI software stack, including CUDA and TensorRT, available for purchase starting October 15 through Nvidia's website and global partners [1] Group 2: Karpathy's Open Source Project - AI expert Andrej Karpathy has released the open-source project nanochat, which implements a ChatGPT clone from scratch in 8000 lines of code, gaining nearly 5000 stars on GitHub within 12 hours [2] - The project encompasses all functionalities including tokenizer training, pre-training, fine-tuning, reinforcement learning, and inference engine, with a training cost of only $100 (8×H100 for 4 hours) to create a mini chat model [2] - Karpathy emphasizes that the project is more suitable for learning and research rather than personalized applications, as achieving personalization requires complex synthetic data generation and extensive pre-training data [2] Group 3: Microsoft's Text-to-Image Model - Microsoft AI has introduced its first fully self-developed text-to-image model, MAI-Image-1, which ranks 9th on the LMArena text-to-image leaderboard with a score of 1096 [3] - The model excels in generating hyper-realistic images, particularly in lighting effects and natural landscapes, with a focus on avoiding content repetition and homogenization [3] - MAI-Image-1 will be integrated into Microsoft's core products such as Copilot and Bing Image Creator, marking a significant step in building a multi-modal autonomous technology matrix in AI [3] Group 4: Tencent's Youtu-Embedding - Tencent's Youtu Lab has officially open-sourced the Youtu-Embedding model, capable of handling six mainstream tasks including text retrieval, intent understanding, and similarity judgment, addressing the "negative transfer" dilemma [4] - The model was trained from scratch using 3 trillion tokens of Chinese and English corpus, employing an innovative "collaborative-discriminative fine-tuning framework," achieving a top score of 77.46 on the CMTEB Chinese semantic evaluation benchmark [4] - It supports integration into mainstream frameworks like LangChain and LlamaIndex, lowering development barriers and is particularly suitable for building enterprise-level RAG (retrieval-augmented generation) systems [4] Group 5: AI Research on Communication Style - Research from Penn State University indicates that using a rude tone when questioning LLMs results in a higher accuracy rate of 84.8% for GPT-4o, compared to 80.8% when using a polite tone [5] - Researchers explain that direct expressions help AI grasp core tasks more accurately, while polite expressions may introduce unnecessary distractions [5] Group 6: QQ Browser AI Upgrade - QQ Browser has introduced the "Serious AI" feature in version 19.7.5, leveraging Tencent News' 10 years of verification experience and a database of millions of debunked claims to quickly assess information credibility [7] - The "AI Video Assistant" feature supports intelligent summarization, recognition and translation in 16 languages, and one-click export of subtitled videos, addressing challenges in understanding foreign language videos [7] - Both features are now available in the QQ Browser Agent Center for free, targeting the pain points of information verification and efficient video content retrieval [7] Group 7: SpaceX Starship Test - SpaceX has completed the eleventh integrated flight test of the Starship, utilizing a second-hand booster B15.2 and S38 spacecraft, which serves as the final flight for the second-generation Starship, collecting landing burn configuration and propulsion data for the third generation [8] - The booster validated the configuration switch for 13 engine initial ignitions, 5 engine steering, and 3 engine hovering, while the spacecraft completed dynamic tilt maneuvers, in-space ignition, and thermal limit tests [8] - The third-generation Starship will exceed 124 meters in height, using third-generation Raptor engines with a single thrust of 280 tons and an effective payload capacity of 100 tons, with ground testing expected to commence by the end of 2025 [8] Group 8: Tencent's Qinyun Scholarship - Tencent has launched the "Qinyun Scholarship" aimed at top AI talents, targeting master's and doctoral students in cutting-edge AI research, with the first selection expected to award 15 outstanding students, each receiving up to 500,000 yuan [9] - The scholarship includes a cash reward of 200,000 yuan and 300,000 yuan in cloud heterogeneous computing resources, with winners also having the opportunity for internships or employment at Tencent [9] - This initiative focuses on students in computer science, artificial intelligence, and related fields, encouraging engagement in frontier research directions [9] Group 9: Cathie Wood's Predictions - Cathie Wood, founder of ARK Invest, predicts that the global real GDP growth rate will increase from 3% to over 7% in the next decade, with inflation rates potentially dropping to 0% or even negative [10] - She believes that the simultaneous maturation of five key technology platforms (AI, robotics, blockchain, energy storage, and multi-omics sequencing) will redefine productivity, with "technological convergence" accelerating the transition of each S-curve into an explosive growth phase [10] - Wood anticipates that truly disruptive innovation assets could achieve annualized returns of 40%-50% in capital markets over the next five years, with her bull-market forecast for Bitcoin reaching $1.5 million per coin [10] Group 10: n8n's AI Opportunity - Jan Oberhauser, founder of n8n, reported a fourfold increase in company revenue within eight months, attributing this to a strategy shift from targeting potential customers to focusing on community building [12] - He views the AI wave as either a significant opportunity or a potential company-ending threat, with n8n enabling users to build AI-driven applications rather than merely adding AI features [12] - n8n employs a dual licensing model of "open source but non-commercial," emphasizing a bottom-up approach from the builder market, noting that no one has successfully won the entire race starting from the enterprise market [12]
Your Own ChatGPT in 4 Hours as Karpathy Strikes Again: He Admits the Agent Got in the Way and Hand-Wrote 8,000 Lines of Code; Netizens: Finish the Run and You're a Machine Learning Engineer
36Kr· 2025-10-14 12:52
Core Insights - Andrej Karpathy, former AI director at Tesla and co-founder of OpenAI, has released a new open-source project called nanochat, which has gained 7.9k stars on GitHub [1] - Nanochat is a minimalistic end-to-end training and inference toolchain designed to replicate a simplified version of ChatGPT, differing from Karpathy's previous project, nanoGPT [1][6] Project Overview - Nanochat allows users to train a conversational language model for approximately $100, achieving performance that surpasses GPT-2's CORE metric after about 12 hours of training [2][3] - The project can be initiated by launching a cloud GPU server and running a script, enabling users to interact with their trained model via a web interface [2] Technical Specifications - The project consists of around 8000 lines of code, primarily handwritten by Karpathy, emphasizing a clear code structure [7] - The architecture of nanochat is similar to the Llama model but is designed to be simpler, incorporating elements from modded-nanoGPT [7][8] - Key features include dense transformers, rotary embeddings, and a unique optimizer combining Muon and AdamW [8][9] Performance Metrics - Performance metrics for various training stages are provided, showing improvements in CORE, ARC-Challenge, ARC-Easy, GSM8K, HumanEval, and MMLU scores [5] Community Impact - The release of nanochat has generated significant interest on social media, with users expressing excitement about its potential to democratize access to language model training [10] - The project is expected to serve as a valuable resource for researchers and machine learning enthusiasts, enabling them to experiment with language models more easily [10]
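One of the architectural choices mentioned above is rotary position embeddings. The standalone PyTorch sketch below shows the basic idea of rotating query/key channel pairs by position-dependent angles; it illustrates the general technique and is not code taken from the nanochat repository.

```python
# Minimal sketch of rotary position embeddings (RoPE), applied to a toy tensor.
import torch

def rope(x: torch.Tensor, base: float = 10000.0) -> torch.Tensor:
    """Apply rotary embeddings to x of shape (batch, seq_len, n_heads, head_dim)."""
    _, t, _, d = x.shape
    assert d % 2 == 0, "head_dim must be even"
    # one rotation frequency per pair of channels
    inv_freq = 1.0 / (base ** (torch.arange(0, d, 2, dtype=torch.float32) / d))
    pos = torch.arange(t, dtype=torch.float32)
    angles = torch.einsum("t,f->tf", pos, inv_freq)            # (seq_len, d/2)
    cos = angles.cos()[None, :, None, :]
    sin = angles.sin()[None, :, None, :]
    x1, x2 = x[..., 0::2], x[..., 1::2]                        # split channel pairs
    out = torch.empty_like(x)
    # rotate each (x1, x2) pair by its position-dependent angle
    out[..., 0::2] = x1 * cos - x2 * sin
    out[..., 1::2] = x1 * sin + x2 * cos
    return out

q = torch.randn(2, 16, 4, 64)   # toy query tensor
print(rope(q).shape)            # torch.Size([2, 16, 4, 64])
```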
Your Own ChatGPT in 4 Hours as Karpathy Strikes Again! He Admits the Agent Got in the Way and Hand-Wrote 8,000 Lines of Code; Netizens: Finish the Run and You're a Machine Learning Engineer
AI前线· 2025-10-14 09:46
Core Insights - The article discusses the launch of "nanochat," an open-source project by Andrej Karpathy, which allows users to train a simplified version of ChatGPT with minimal resources [2][4][6] - Karpathy claims that with just $100 and approximately 4 hours of training on a cloud GPU server, users can create a working conversational model, and that a longer run of roughly 12 hours surpasses GPT-2 in performance [6][7] Project Overview - "nanochat" is a streamlined training and inference toolchain built from scratch, differing from Karpathy's previous project, "nanoGPT," which only included pre-training functionalities [2][5] - The entire codebase consists of around 8000 lines of code, emphasizing clarity and simplicity, making it suitable for modification and branch development [11][12] Technical Specifications - The project utilizes a new tokenizer implemented in Rust and pre-trains a Transformer-based language model on the FineWeb dataset [5] - Key features include instruction fine-tuning, reinforcement learning options, and an efficient inference engine with a user-friendly interface [6][9]; a decoding-loop sketch follows this summary Performance Metrics - After approximately 12 hours of training, the model's performance metrics exceed those of GPT-2, with specific scores on various benchmarks such as MMLU and GSM8K [7][8] - The CORE score for the model after different training stages is provided, showing improvements across various metrics [8] Community and Future Development - Karpathy envisions "nanochat" as a core project for an upcoming course and a potential research tool framework, inviting community contributions for further enhancements [9][14] - The project has generated significant interest on social media, with users expressing excitement about its potential for machine learning education and experimentation [14]
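The inference engine mentioned above ultimately runs a standard autoregressive decoding loop: sample one token at a time, append it, and stop at an end-of-sequence marker. Below is a minimal sketch of such a loop with temperature and top-k sampling; the `next_token_logits` stub stands in for a real forward pass (which would use a KV cache), and the vocabulary size and special-token id are illustrative values, not nanochat's.

```python
# Minimal sketch of autoregressive decoding with temperature and top-k sampling.
import torch

VOCAB_SIZE, EOS_ID = 50304, 0

def next_token_logits(tokens: list[int]) -> torch.Tensor:
    # placeholder for a real model forward pass with a KV cache; returns random logits
    torch.manual_seed(len(tokens))
    return torch.randn(VOCAB_SIZE)

def generate(prompt: list[int], max_new: int = 32,
             temperature: float = 0.8, top_k: int = 50) -> list[int]:
    tokens = list(prompt)
    for _ in range(max_new):
        logits = next_token_logits(tokens) / temperature
        topk = torch.topk(logits, top_k)                 # keep only the k best candidates
        probs = torch.softmax(topk.values, dim=-1)
        next_id = topk.indices[torch.multinomial(probs, 1)].item()
        if next_id == EOS_ID:                            # stop at end-of-sequence
            break
        tokens.append(next_id)
    return tokens

print(generate([1, 2, 3])[:10])
```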
AI Guru Karpathy's Open-Source Project Goes Viral: Cloning ChatGPT with Just 4 Hours and 8,000 Lines of Code
36Kr· 2025-10-14 09:28
Core Insights - Andrej Karpathy launched a new open-source project called "nanochat," which he describes as one of his most unrestrained projects, providing a simplified full-stack training and inference process for creating a ChatGPT-like model from scratch [2][5]. Summary by Sections Project Overview - Nanochat is a minimalistic, full-stack project that allows users to create a chatbot by renting a cloud GPU server and running a single script, enabling interaction with a trained large language model (LLM) within approximately four hours [2][10]. Key Components of Nanochat 1. **Data Preparation**: Involves creating a tokenizer from raw web text to convert vast amounts of text into numerical data [5]. 2. **Model Pre-training**: A foundational Transformer model is trained on large datasets to learn language syntax, facts, and basic reasoning, which is the most time-consuming and critical step [5]. 3. **Alignment Fine-tuning**: - **Instruction Fine-tuning**: Uses high-quality Q&A and dialogue data to teach the model to follow instructions and converse like an assistant [6]. - **Reinforcement Learning**: An optional stage to enhance model performance on specific tasks through rewards and penalties [6]. 4. **Model Inference**: Provides an efficient engine for real-time interaction with the trained model via command line or a web interface [6]. 5. **Evaluation**: Automatically generates a detailed report showcasing the model's performance across various standard tests [6]. Educational and Research Significance - Nanochat serves as an educational tool, allowing developers and researchers to build their own small chat models at a low cost, experiencing the entire process from raw text to intelligent dialogue assistant [7]. - It provides a lightweight, controllable, and reproducible experimental platform for researchers to test new model architectures and training methods without needing expensive computational resources [7]. Cost and Efficiency - The total cost to train a small ChatGPT clone using nanochat is approximately $100, with a training duration of about four hours on an 8XH100 node [10]. - Training for around 12 hours can surpass GPT-2 on the CORE metric, and with a budget of about $1000, the model can become more coherent and capable of solving simple math and programming problems [14]. Technical Insights - The architecture of nanochat is similar to the Meta Llama model but simplified, aiming to establish a robust baseline for models of this scale [15]. - Key features include the use of a Muon + AdamW optimizer and various design choices that enhance model performance [16][20].
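The Muon + AdamW combination mentioned under Technical Insights boils down to routing different kinds of parameters to different optimizers: matrix-shaped weights inside the blocks go to Muon, while embeddings and other parameters go to AdamW. Muon is not part of torch.optim, so the sketch below uses AdamW as a stand-in for the Muon group; only the parameter split is meant to illustrate the design, and the learning rates are illustrative, not nanochat's settings.

```python
# Sketch of splitting parameters across two optimizers, as described above.
import torch
import torch.nn as nn

# toy stand-in for a transformer: an embedding table plus two linear layers
model = nn.Sequential(nn.Embedding(1000, 64), nn.Linear(64, 64), nn.Linear(64, 1000))

# 2-D "matrix" weights in the blocks vs. the embedding table and 1-D params (biases, norms)
matrix_params = [p for n, p in model.named_parameters()
                 if p.ndim == 2 and not n.startswith("0.")]   # skip the embedding table
other_params = [p for n, p in model.named_parameters()
                if p.ndim != 2 or n.startswith("0.")]

muon_like = torch.optim.AdamW(matrix_params, lr=2e-2)  # Muon would handle this group
adamw = torch.optim.AdamW(other_params, lr=3e-4)

tokens = torch.randint(0, 1000, (8, 16))
loss = model(tokens).mean()
loss.backward()
for opt in (muon_like, adamw):
    opt.step()
    opt.zero_grad()
```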
$100 and Just 8,000 Lines of Code to Reproduce ChatGPT; Karpathy: This Is the Craziest Project I've Ever Written
Founder Park· 2025-10-14 04:18
Core Insights - The article discusses the launch of "nanochat," an open-source project by Andrej Karpathy, which allows users to build a ChatGPT-like model with minimal resources [3][10]. - The project aims to democratize access to large language model (LLM) research, enabling anyone to train their own models easily [12][22]. Project Overview - "nanochat" is described as a complete training framework for creating a ChatGPT-like model from scratch, consisting of approximately 8000 lines of clean code [6][26]. - The entire system can be set up on a single multi-GPU node (8×H100), requiring only about 4 hours of training time and costing around $100 [10][13]. - The project includes all stages of model development, from data preparation to fine-tuning and deployment [6][12]. Performance Metrics - A model trained for about 12 hours can surpass GPT-2 on the CORE metric, while a 24-hour training session can achieve performance comparable to GPT-3 Small [11][13]. - Specific performance metrics include scores on various benchmarks such as MMLU and GSM8K, indicating the model's capabilities in reasoning and code generation [11][27]; a sketch of multiple-choice scoring follows this summary. Development Philosophy - Karpathy emphasizes a philosophy of making LLM research accessible and reproducible, similar to his previous work with nanoGPT [12][22]. - The project is seen as a potential baseline for future research and experimentation within the open-source community [8][16]. Community Engagement - The article mentions a growing community around AI products, with over 15,000 members in the "AI Product Marketplace" group, highlighting the interest in AI applications [9].
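Multiple-choice benchmarks such as MMLU are commonly scored by asking the model for a likelihood of each candidate answer and counting the highest-scoring option as its prediction. The sketch below shows that harness logic with an assumed `option_logprob` stub in place of a real forward pass; it is a generic illustration of multiple-choice evaluation, not nanochat's evaluation code.

```python
# Generic sketch of multiple-choice benchmark scoring (argmax over option log-probs).
import random

def option_logprob(question: str, option: str) -> float:
    # stub for a real forward pass: returns a pretend log-probability
    random.seed(hash((question, option)) % (2**32))
    return -random.uniform(1.0, 10.0)

def accuracy(dataset: list[dict]) -> float:
    correct = 0
    for item in dataset:
        scores = [option_logprob(item["question"], o) for o in item["options"]]
        if scores.index(max(scores)) == item["answer"]:   # argmax over the options
            correct += 1
    return correct / len(dataset)

toy_set = [{"question": "2 + 2 = ?", "options": ["3", "4", "5", "22"], "answer": 1}]
print(f"accuracy = {accuracy(toy_set):.3f}")
```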
Karpathy Hand-Builds ChatGPT in 8,000 Lines of Code for Only $100; 12 Hours of Training Beats GPT-2 on CORE, with a Step-by-Step Tutorial
36Kr· 2025-10-14 03:40
Core Insights - The article discusses the launch of "nanochat," a simplified version of ChatGPT created by Andrej Karpathy, a former AI director at Tesla and co-founder of OpenAI, aimed at educational use [1][57]. - The project allows users to build a basic conversational AI model with a cost of approximately $100 and a training time of about 4 hours on a cloud GPU server [1][10]. Project Overview - "nanochat" consists of around 8000 lines of code, featuring a custom tokenizer implemented in Rust, a pre-trained Transformer model, and various training datasets [2][3]. - The model can perform basic conversational tasks, generate stories and poems, and answer simple questions [2][4]. Performance Metrics - After approximately 12 hours of training, the model's performance on the CORE metric surpasses that of GPT-2 [4][52]. - The model's performance metrics include CORE scores, ARC-Easy, GSM8K, and HumanEval, with notable improvements observed during different training phases [3][52]. Training Phases - The training process includes pre-training, mid-training, supervised fine-tuning (SFT), and reinforcement learning (RL) stages, each contributing to the model's capabilities [41][46]. - Mid-training focuses on adapting the model for multi-turn conversations and teaching it to handle multiple-choice questions [35][36]; a conversation-formatting sketch follows this summary. Community Engagement - The project has gained significant attention on GitHub, with over 4.8k stars shortly after its release, indicating strong community interest and potential for further optimization [8][7]. - The codebase is designed to be user-friendly, allowing modifications and enhancements by the community [54][55]. Educational Impact - Karpathy aims to integrate this technology into a broader educational framework, potentially transforming how AI can assist in learning [62]. - The project is part of a larger initiative to create a symbiotic relationship between teachers and AI, enhancing the learning experience [62].
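Adapting a model to multi-turn conversation during mid-training usually comes down to rendering each chat into a single token stream with special markers around every turn, so the model learns where a speaker's turn begins and ends. The sketch below illustrates that rendering step; the marker strings are placeholders, not the exact special tokens nanochat uses.

```python
# Sketch of flattening a multi-turn chat into one string with turn markers.
SPECIAL = {"bos": "<|bos|>", "user": "<|user|>", "assistant": "<|assistant|>", "end": "<|end|>"}

def render_conversation(turns: list[dict]) -> str:
    parts = [SPECIAL["bos"]]
    for turn in turns:
        parts.append(SPECIAL[turn["role"]])   # who is speaking
        parts.append(turn["content"])
        parts.append(SPECIAL["end"])          # turn delimiter the model learns to emit
    return "".join(parts)

chat = [
    {"role": "user", "content": "Write a haiku about autumn."},
    {"role": "assistant", "content": "Leaves drift past the door / ..."},
]
print(render_conversation(chat))
```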