Karpathy Hand-Builds ChatGPT in 8,000 Lines of Code for Just $100; After 12 Hours of Training It Beats GPT-2 on CORE, with a Step-by-Step Tutorial
量子位· 2025-10-14 02:19
Core Insights
- The article covers the launch of "nanochat," a simplified ChatGPT-style system created by Andrej Karpathy that can be built with minimal cost and code [1][2][4].

Project Overview
- "nanochat" is a full-stack training and inference pipeline that lets users build a basic ChatGPT-like model in approximately 8,000 lines of code [2][4].
- The entire project can be run on a cloud GPU server for about $100, with the fastest run completing in roughly 4 hours [3][4][16].

Technical Specifications
- The pipeline includes a tokenizer implemented in Rust, a pre-trained Transformer architecture, and the associated training datasets [5].
- It supports efficient inference with features such as KV caching, plus a lightweight Python interpreter for tool use [5][43].

Performance Metrics
- After about 12 hours of training, the model surpasses GPT-2 on the CORE metric [8].
- As a concrete example, a model trained for 24 hours can score over 40 on MMLU and over 70 on ARC-Easy [10].

Development Goals
- Karpathy aims to create a unified, simple, and modifiable codebase that can serve as a strong baseline for future work [11][13].
- The project is intended as the capstone of the upcoming LLM101n course on building large language models [12].

Community Engagement
- The project has drawn significant attention, reaching 4.8k GitHub stars shortly after release [14].
- Users are encouraged to optimize and modify the codebase, enabling collaborative improvement [59].

Training Process
- Training proceeds in stages: pre-training, mid-training, supervised fine-tuning (SFT), and reinforcement learning (RL) [45][48][51].
- Excluding RL, the full run takes approximately 3 hours and 51 minutes at a total cost of about $92.40 [57].

Final Remarks
- The article emphasizes nanochat's potential as a research tool and benchmarking framework, in the spirit of the earlier nanoGPT [13].
- The project is still at an early stage, with ample room for further optimization and enhancement [13][50].
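The summary notes that nanochat's inference supports KV caching. A minimal single-head sketch of the idea, using toy stand-in projections rather than nanochat's actual code: keys and values for already-generated tokens are cached once, so each decoding step attends over the cache instead of re-encoding the whole prefix.

```python
import numpy as np

# Minimal KV-cache sketch (an illustration of the mechanism the summary
# attributes to nanochat, not its real implementation): K/V rows for past
# tokens are stored once, so each decoding step only computes its own
# projections and attends over the cache.
class KVCache:
    def __init__(self, d_model):
        self.keys = np.zeros((0, d_model))
        self.values = np.zeros((0, d_model))

    def append(self, k, v):
        # One new row per generated token; earlier rows are never recomputed.
        self.keys = np.vstack([self.keys, k])
        self.values = np.vstack([self.values, v])

def attend(query, cache):
    # Scaled dot-product attention over every cached position.
    scores = cache.keys @ query / np.sqrt(query.shape[0])
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()
    return weights @ cache.values

d = 4
cache = KVCache(d)
for step in range(3):
    k = v = np.full(d, float(step))  # stand-in K/V projections for token `step`
    cache.append(k, v)
out = attend(np.ones(d), cache)
print(cache.keys.shape)  # (3, 4): one cached row per generated token
```

This is why caching matters: without it, every decoding step would recompute keys and values for the full prefix, making generation quadratic in sequence length rather than linear.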
A Solution to a Problem That Humanity Forgot Has Been Rediscovered by GPT-5
量子位· 2025-10-13 10:00
Xifeng, from Aofeisi | QbitAI. A solution to a problem that humanity had forgotten has been rediscovered by GPT-5 Pro! The story centers on Erdős problem #339, one of nearly a thousand problems posed or relayed by the famous mathematician Paul Erdős and collected on erdosproblems.com. The site tracks the current status of every problem: roughly a third are solved, while most remain open. Notably, GPT-5 Pro located the key reference from nothing more than an image of Erdős problem #339. The problem had been marked "open" and was considered an unsolved mathematical question that people were still actively studying. Only recently, after a search with GPT-5 Pro, did someone discover that it had in fact been solved back in 2003. When OpenAI researcher Sebastien Bubeck shared the story, it immediately drew wide attention. By the way, one of Terence Tao's famous results was settling the Erdős discrepancy problem, a conjecture that had stumped mathematicians for decades, using tools from ergodic theory. Problem Details: specifically, Erdős problem #339 is a classic question about additive bases in number theory, stated as: let A ⊆ N be a basis of order r (i.e., every sufficiently large integer can be written as a sum of r elements of A ...
Front-End Developers Beware! Gemini 3 Beta Results Win Unanimous Praise: "The Strongest Front-End Development Model Ever"
量子位· 2025-10-13 10:00
Core Viewpoint
- Google's next-generation flagship model, Gemini 3, has drawn significant attention even before its official release thanks to impressive reported capabilities across a range of tasks [1][8].

Group 1: Performance and Features
- Gemini 3 reportedly excels at front-end code and SVG vector-graphics generation, showing enhanced multimodal capabilities [3][19].
- The model can generate a personal-introduction webpage and visualize complex concepts such as black holes from minimal input [4][8].
- It has also composed original piano music, earning high praise from testers [8].

Group 2: Technical Specifications
- Gemini 3.0 Pro is said to use an MoE architecture with trillions of total parameters while activating only 15-20 billion per query, with a context window expanded from 1 million tokens to several million [13].
- On the challenging ARC-AGI-2 general-intelligence test, Gemini 3.0 reportedly reached nearly 35% accuracy, outperforming other models [15].
- It reportedly scored 32.4% on the Humanity's Last Exam (HLE) benchmark, surpassing GPT-5 and Grok 4 [16].

Group 3: User Experience and Applications
- Testers report that Gemini 3.0 is particularly strong at programming and interface design, producing visually appealing results for projects such as an ancient-art museum website [20][21].
- The model generated a demonstration website themed on a Kardashev Scale Type III civilization, showcasing its advanced capabilities [23][24].
- Gemini 3.0 has rendered complex images well, including high-quality game backgrounds and intricate SVG graphics [31][35].

Group 4: Anticipated Release
- Speculation puts the release of Gemini 3.0 around October 22, following earlier predictions that proved incorrect [42][45].
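The rumored "15-20 billion active out of trillions of parameters" figure describes mixture-of-experts (MoE) routing, where a learned router sends each token to only a few experts. A toy sketch of top-k routing follows; sizes and names are illustrative assumptions and reflect nothing about Gemini's real architecture.

```python
import numpy as np

# Toy top-k mixture-of-experts layer. All dimensions are illustrative.
rng = np.random.default_rng(0)
d, n_experts, top_k = 8, 4, 2
experts = [rng.standard_normal((d, d)) for _ in range(n_experts)]
router = rng.standard_normal((d, n_experts))

def moe_layer(x):
    logits = x @ router
    chosen = np.argsort(logits)[-top_k:]       # route to the top-k experts only
    gates = np.exp(logits[chosen] - logits[chosen].max())
    gates /= gates.sum()                       # softmax over the chosen experts
    # Only top_k of n_experts run for this token, so the active parameter
    # count is a small fraction of the total -- the property the leak
    # attributes to Gemini 3.
    return sum(g * (x @ experts[i]) for g, i in zip(gates, chosen))

y = moe_layer(rng.standard_normal(d))
print(y.shape)  # (8,)
```

The design trade-off is that total capacity scales with the number of experts while per-token compute scales only with `top_k`.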
The 2025 Artificial Intelligence Annual Awards Are Open! Five Award Categories Across Three Dimensions, Seeking the Leaders of the AI+ Era
量子位· 2025-10-13 08:47
Organizing Committee, from Aofeisi | QbitAI. To let more practitioners feel the leap of the intelligence wave, and to offer applause and encouragement to fellow travelers, we are officially opening registration for the 2025 Artificial Intelligence Annual Awards. This is the 8th year of QbitAI's annual AI awards. Over eight years we have witnessed technological breakthroughs and deployments, the integration and reshaping of industries, and wave after wave of companies, people, and products driving the era forward. In an age where artificial intelligence is redefining everything, intelligent technology is no longer a single tool but a driving force for the co-evolution of industry and society. Through this annual selection, we hope to discover and honor the explorers and practitioners who truly lead change and push boundaries. The awards span three dimensions, companies, products, and people, with five award categories. Companies are warmly invited to register! Let us witness the stars of the year together and light the way forward. Company List: 2025 AI Annual Leading Company; 2025 AI Annual High-Potential Startup. Product List. People List: 2025 AI Annual Person of Focus. Detailed criteria and registration details follow. Criteria for Leading Company: 1. Business capability | market share and revenue scale, business model and profitability, customer count and industry coverage, growth potential and sustainability, etc.; 2. Technical capability | research strength and technical achievements, R&D investment ratio, core technical competitiveness, innovation cases and real-world deployment, etc.; ...
No More "Entropy Collapse" or "Entropy Explosion"! This Research Teaches Large Models "Precise Exploration," and Reasoning Scores Soar
量子位· 2025-10-13 08:47
Core Insights
- The article discusses advances in training large language models (LLMs) with RLVR (Reinforcement Learning with Verifiable Rewards), which since 2024 has driven significant breakthroughs on mathematical, coding, and scientific reasoning tasks [1][2].

Group 1: Challenges in RLVR Training
- RLVR faces a critical bottleneck known as "exploration imbalance": exploration can be too limited, leading to entropy collapse, or too uncontrolled, leading to entropy explosion [2][9].
- Traditional entropy regularization encourages exploration, but it can drive either rapid convergence to a deterministic policy or chaotic outputs caused by excessive uncertainty [6][10].

Group 2: Proposed Solution - SIREN
- The research team introduces a selective entropy regularization method (SIREN) built on three mechanisms: bounding the exploration range, focusing on key decision points, and stabilizing training [14][18].
- SIREN restricts the entropy calculation to a core set of high-probability tokens, so exploration happens only among semantically reasonable candidates [14][15].
- It identifies key decision points in the generated sequence, positions whose entropy is significantly above average, and concentrates the exploration incentive there [16].
- It also adjusts the entropy target to keep it within a reasonable range, preventing training instability [17].

Group 3: Experimental Validation
- Experiments show that SIREN significantly improves performance across models and datasets, reaching an average majority-vote accuracy (maj@k) of 54.6% on Qwen2.5-Math-7B, 4.8 points above the strongest baseline [22][24].
- The effective exploration enabled by SIREN yields a qualitative improvement over traditional entropy regularization [25][32].
- SIREN maintains answer diversity and avoids collapse, making training smoother and more controllable [28][30].

Group 4: Future Implications
- The study stresses that stable, controllable, and efficient exploration is key to unlocking the potential of large models and overcoming performance bottlenecks [35].
- The proposed selective exploration-control mechanism offers a feasible way to refine exploration strategies in future reasoning-model training paradigms [35].
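Two of SIREN's described mechanisms, computing entropy only over each step's high-probability tokens and rewarding exploration only at high-entropy decision points, can be sketched as follows. This is a hypothetical simplification for illustration; the function names, the top-k restriction, and the mean-entropy threshold are assumptions, not the paper's implementation.

```python
import numpy as np

def token_entropy(probs):
    p = probs[probs > 0]
    return float(-(p * np.log(p)).sum())

def siren_like_bonus(step_probs, k=3, alpha=1.0):
    """Selective entropy bonus: (1) entropy over each step's top-k tokens
    only, (2) bonus applied only at steps whose entropy exceeds the sequence
    mean. A hypothetical simplification of SIREN, not the paper's code."""
    ents = []
    for probs in step_probs:
        top = np.sort(probs)[-k:]
        top = top / top.sum()          # renormalize over the top-k support
        ents.append(token_entropy(top))
    ents = np.array(ents)
    mask = ents > ents.mean()          # reward only the key decision points
    return alpha * float((ents * mask).sum())

# Two near-deterministic steps and one genuinely uncertain middle step.
steps = [np.array([.96, .01, .01, .01, .01]),
         np.array([.40, .30, .20, .05, .05]),
         np.array([.97, .01, .01, .005, .005])]
bonus = siren_like_bonus(steps)
print(bonus > 0)  # True: only the uncertain middle step contributes
```

Restricting the support to top-k tokens is what prevents the bonus from rewarding mass on nonsensical tokens, the failure mode behind entropy explosion.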
Real AI Competitiveness Hides in the "Post-Training" Step of Large Models
量子位· 2025-10-13 08:47
Core Insights
- The article emphasizes Post-Training as a transformative approach in AI, moving beyond simple model optimization toward specialized intelligent engines tailored to specific business needs [1][4].
- It highlights the evolution of Post-Training technology, a shift from supervised fine-tuning (SFT) to reinforcement learning (RL) methods that better match complex business requirements [2][4].

Summary by Sections

Post-Training Evolution
- The industry's initial approach was SFT, which lets models learn domain knowledge and dialogue style [2].
- SFT proved insufficient for teaching complex value judgments and strategic choices, which are critical in real business scenarios [3].
- The focus has shifted to RL, evolving from human-dependent methods (RLHF) to automated systems (RLVR) and the innovative use of natural-language rewards [4][5].

Implementation Pathway
- The article outlines a four-step pathway for enterprises to implement Post-Training effectively, addressing challenges such as data quality, high labeling costs, and the definition of reward signals [5][8].
- Case studies from companies such as Zhihu, AutoHome, and Weibo illustrate these steps in practice, showing improvements in data quality and model performance [7][8].

Step 1: Data Preparation
- High-quality data is the cornerstone of successful Post-Training; companies spend 60-70% of their time on data preparation [10].
- Zhihu improves data quality through pre-labeling, while AutoHome leverages structured data [11][13].

Step 2: Model Selection
- Choosing the right base model is crucial; many companies opt for the Tongyi Qianwen series for its performance and Post-Training support [14][16].
- The model's architecture and open-source ecosystem ease the adoption of Post-Training techniques [15][18].

Step 3: Reward Mechanism Design
- A reward mechanism is essential for aligning model outputs with business objectives, transitioning from human feedback to automated verification systems [24][25].
- Companies such as Yingmi Fund are exploring ways to encode expert decision-making frameworks into their models [26].

Step 4: Evaluation System
- A robust evaluation system is needed to measure the effectiveness of Post-Training; Yingmi Fund has developed benchmarks for real-world performance [27][28].
- Successful implementations have markedly improved model accuracy and business outcomes, as seen at Baifeng Cloud and Quark [30][32].

Conclusion
- The true competitive advantage in AI lies in how companies leverage their unique data and business insight through Post-Training to build proprietary intelligent engines [32].
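The shift from human feedback (RLHF) to automated verification (RLVR) described above replaces a learned reward model with a programmatic check. A minimal sketch, assuming a hypothetical "#### answer" output format rather than any company's actual pipeline:

```python
import re

def verifiable_reward(completion: str, gold: str) -> float:
    """Sketch of an RLVR-style reward: a programmatic verifier replaces a
    human labeler. The check here is exact-match on a '#### <answer>' final
    line -- a hypothetical format chosen for illustration."""
    m = re.search(r"####\s*(.+?)\s*$", completion.strip())
    if m is None:
        return 0.0                    # unparseable output earns nothing
    return 1.0 if m.group(1) == gold else 0.0

print(verifiable_reward("reasoning...\n#### 42", "42"))  # 1.0
print(verifiable_reward("reasoning...\n#### 41", "42"))  # 0.0
```

Because the signal is computed mechanically, it scales to millions of rollouts without labeling cost, which is precisely why the article frames RLVR as the successor to RLHF for verifiable domains.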
OpenAI's Altman: Jobs That ChatGPT Can Eliminate Were Never Real Jobs
量子位· 2025-10-13 08:47
Core Insights
- The discussion highlights the evolving role of AI in the workplace, suggesting that many current jobs may not represent "real work" as AI capabilities advance [30].
- The conversation also touches on the development of GPT-6 and the potential for AI to reach AGI (artificial general intelligence) [18][19].

Group 1: AI Development and Applications
- Sam Altman expresses excitement about integrating applications into ChatGPT, emphasizing the potential for developers to build innovative solutions with the Agent Builder and Agent Kit [5][6].
- ChatGPT has reached 800 million weekly active users, positioning it as a new distribution platform for developers [5].
- Altman notes significant advances in model capability over the past two years, enabling easier development of complex systems with minimal coding [7][8].

Group 2: Future of Work and AI Impact
- The number of software applications created will increase dramatically, while the time required to test and refine ideas will shrink significantly [9].
- Altman predicts the first billion-dollar company operated by agents is still a few years away, though the technology is progressing rapidly [11][12].
- The conversation covers "workslop," AI-generated content that requires additional human editing, and the need for education on effective AI use [21][22].

Group 3: AGI and Its Implications
- Altman defines AGI as AI surpassing human capability on high-value economic tasks, noting that current AI can already make novel discoveries, albeit small ones [19][20].
- He emphasizes recognizing both the potential and the limits of current advances, framing progress toward AGI as gradual [18][19].

Group 4: AI in Communication and Interaction
- Altman argues that voice may not be the ultimate interface for AI, suggesting that various modes of communication will coexist [39][40].
- Real-time video interaction is highlighted as a valuable path toward AGI [26].

Group 5: Business Models and Future Directions
- The discussion includes potential revenue models for new applications such as Sora, weighing user engagement against monetization strategies [27][28].
- Altman is optimistic about AI's ability to create new opportunities, while acknowledging the need for a global framework to manage the risks of powerful models [33].
Sora 2 "Resurrects" Deceased Celebrities, and Their Families Strongly Object
量子位· 2025-10-13 08:47
Core Viewpoint
- The rapid rise of Sora 2 has put portrait rights back in focus, particularly the use of deceased celebrities' likenesses in AI-generated content [1][18].

Group 1: Reactions from Family Members
- Relatives of deceased celebrities, such as Robin Williams' daughter, have voiced strong discontent with AI videos that use their loved ones' likenesses, calling them disrespectful and painful [4][20].
- Zelda Williams has publicly asked people to stop sending her AI videos of her father, stressing that such content is not what he would have wanted [5][6][20].
- Family members of other deceased public figures have echoed these sentiments, reflecting a broader concern about this use of AI [24].

Group 2: Legal and Ethical Considerations
- A consensus is forming that the portrait rights of deceased celebrities should pass to their relatives or designated organizations, underscoring the need to update copyright law in light of rapid AI advances [8][10].
- OpenAI acknowledges the free-speech value of depicting historical figures but holds that public figures and their families should ultimately control how their likenesses are used [25][26].
- The Motion Picture Association has reported a surge in infringement of members' works since Sora 2 launched, signaling a pressing need for stronger copyright protection [27][28].

Group 3: Future Implications
- The ongoing debate over Sora 2's copyright issues raises questions about the future of AI-generated content and the rights of creators and their estates [29][30].
A Freshly Nobel-Awarded Result Has Already Been Made into a Chip
量子位· 2025-10-13 03:35
Luyu, from Aofeisi | QbitAI. Who says the Nobel-Prize-winning MOFs (metal-organic frameworks) are "useless"? This new class of materials, dismissed decades ago as "all theory, no practical application," was turned into a chip right on the heels of its Nobel recognition! (Impressive foresight from the Nobel committee.) This is the latest result just published by scientists at Monash University: ultra-miniature fluidic chips made from MOFs. Unlike conventional chips, they can not only perform ordinary computation but also remember previous voltage changes, forming a short-term memory similar to that of brain neurons. As the authors put it, this may be a template for a new generation of computers: "If we can engineer functional materials like MOFs that are only a few nanometers thick, we can build advanced fluidic chips that complement, or even overcome, some of the limitations of today's electronic chips." A nanofluidic chip with "brain-like" memory pathways: ion-selective transport under nanoconfinement is showing promise for mimicking biological mechanisms, ion separation, and iontronic devices, but because high-precision nanochannel devices are hard to fabricate, achieving tunable nonlinear ion transport is genuinely difficult. Nanofluidic chips made from MOF materials solve exactly this. MOFs have well-defined channel structures and accommodate a wide range of chemistries, enabling atomic-scale precision in regulating molecular and ion transport. Building on this, the researchers constructed a hierarchical nanofluidic transistor device, h-MOF ...
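The "short-term memory" behavior described, a conductance that rises with recent voltage pulses and then relaxes back toward baseline, resembles a leaky integrator. A toy simulation under assumed parameter values follows; everything here is illustrative, not a measurement or model from the Monash paper.

```python
# Toy model of history-dependent conductance in a memristor-like channel:
# conductance is driven up by applied voltage and decays back toward a
# baseline, so recent voltage history is briefly "remembered".
# Parameter values (g0, gain, decay) are illustrative assumptions.
def simulate(voltages, g0=1.0, gain=0.5, decay=0.8):
    g = g0
    trace = []
    for v in voltages:
        g = g0 + decay * (g - g0) + gain * v   # potentiation + relaxation
        trace.append(g)
    return trace

pulsed = simulate([1, 1, 1, 0, 0, 0])
print(pulsed[2] > pulsed[0])  # True: repeated pulses build up conductance
print(pulsed[5] < pulsed[2])  # True: the memory fades once the stimulus stops
```

The same two properties, accumulation under stimulation and decay at rest, are what make such devices candidates for neuron-like short-term memory elements.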
Meta's "Segment Anything" 3.0 Revealed! Semantic Segmentation Gains Concept Prompts. Great Fun, and Set to Blow Up
量子位· 2025-10-13 03:35
Core Viewpoint
- The article introduces SAM 3, a third-generation segmentation model whose interactive segmentation now understands natural-language prompts, enabling more intuitive and flexible image and video segmentation [3][6][10].

Group 1: Model Features
- SAM 3 introduces a new task paradigm called Promptable Concept Segmentation (PCS), which segments instances in images or videos from short phrases or image exemplars [11][12].
- The model supports an open vocabulary, allowing users to supply any noun phrase as the segmentation target, and it maintains identity consistency across video frames [17].
- The architecture includes a Presence Head module that decouples object recognition from localization, improving multi-instance segmentation performance [16][17].

Group 2: Data Engine and Benchmark
- A scalable data engine was built for PCS, generating a training dataset with 4 million unique concept labels and 52 million verified masks [19].
- The SA-Co benchmark was introduced to evaluate open-vocabulary segmentation, containing 214,000 unique concepts, about 50 times the coverage of existing benchmarks [23][24].

Group 3: Performance Metrics
- SAM 3 achieved 47.0% accuracy on zero-shot segmentation on the LVIS dataset, surpassing the previous state of the art (SOTA) of 38.5% [28].
- On the new SA-Co benchmark, SAM 3 performed at least twice as well as baseline methods [29].
- The model outperforms its predecessor, SAM 2, on video segmentation tasks [30].

Group 4: Real-time Processing
- SAM 3 processes images containing over 100 entities in roughly 30 milliseconds on H200 GPUs, maintaining near-real-time performance for about five concurrent targets in video [35].

Group 5: Limitations
- The model struggles to generalize zero-shot to specialized domains such as medical and thermal imaging [36].
- In multi-target video scenarios, real-time performance may degrade, necessitating multi-GPU parallel processing [37].
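The PCS paradigm described above, where a single text phrase selects every matching instance at once, contrasts with SAM 1/2's one-click-one-mask prompting. A hypothetical interface sketch follows; the names and the substring matching are invented for illustration, since the real model scores phrase-image similarity with learned encoders.

```python
from dataclasses import dataclass

# Hypothetical sketch of a Promptable-Concept-Segmentation-style interface.
# Nothing here is Meta's actual API.
@dataclass
class Instance:
    label: str
    mask_area: int   # pixel count of the instance mask

def segment_by_concept(instances, phrase):
    # Open-vocabulary matching reduced to a substring check for the sketch;
    # the key property is that ALL matching instances are returned, not one.
    return [inst for inst in instances if phrase in inst.label]

scene = [Instance("striped cat", 1200), Instance("black cat", 900),
         Instance("red bicycle", 2000)]
cats = segment_by_concept(scene, "cat")
print(len(cats))  # 2: every instance matching the concept is segmented
```

This all-instances behavior is also why the Presence Head matters: the model must first decide whether the concept appears at all before localizing each occurrence.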