$100 and 8,000 Lines of Code to Build ChatGPT by Hand: Karpathy's Latest Open-Source Project Goes Viral, Nearing 5k Stars Overnight
机器之心· 2025-10-14 02:06
Core Insights
- The article discusses Andrej Karpathy's new open-source project, nanochat, which lets users build a ChatGPT-like model from scratch for approximately $100 [2][5].
- The project consists of around 8,000 lines of code and provides a complete training and inference pipeline for a simplified version of ChatGPT [2][4].
- Users can set up a cloud GPU machine and run a single script; after about 4 hours of training they can interact with their own LLM via a web interface [3][5].

Project Features
- nanochat includes a new Rust implementation for training tokenizers and pre-trains a Transformer LLM on the FineWeb dataset, evaluating its performance across multiple metrics [4].
- The project supports fine-tuning and evaluation on various tasks, including world-knowledge multiple-choice questions, mathematics, and coding [4][5].
- Karpathy aims to create a unified, readable, and easily modifiable codebase that can serve as a strong baseline for future LLM development [5][6].

Performance Metrics
- Initial training costs around $100 and yields a model that can hold basic conversations and perform simple tasks [5].
- With a $1,000 budget and extended training time, the model's coherence improves significantly, enabling it to tackle basic math and coding tasks [5].
- A model trained for 24 hours can score above 40 on MMLU and above 70 on ARC-Easy, showcasing its capabilities [5][10].

Community and Development
- Karpathy envisions nanochat evolving into a research platform or standard benchmark, encouraging community collaboration for iterative improvement [6].
- The project is positioned as the capstone of the upcoming LLM101n course, which is still under development [5].

Limitations and Considerations
- Karpathy cautions that nanochat is not designed for personalized applications and should be viewed as a rudimentary model lacking advanced intelligence [12][13].
- Effective personalization requires significant additional steps: data preparation, synthetic data generation, and fine-tuning with robust models [13].
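The budget-to-time arithmetic in the summary can be made concrete with a tiny helper. The $24/hour rate for an 8-GPU cloud node is an assumed figure for illustration, not a number from the article:

```python
def training_hours(budget_usd: float, node_usd_per_hour: float = 24.0) -> float:
    """Hours of training a dollar budget buys at a flat hourly node rate.

    The default $24/hr rate for an 8-GPU node is an illustrative assumption.
    """
    return budget_usd / node_usd_per_hour

# A $100 budget at the assumed rate buys roughly 4 hours, matching the
# ~4-hour speedrun described above; $1,000 buys roughly 40 hours.
hours_small = training_hours(100)    # ~4.17
hours_large = training_hours(1000)   # ~41.7
```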
NeurIPS 25 | An Upgraded GRPO: GVPO Reshapes the Post-Training Paradigm for Large Models
机器之心· 2025-10-14 02:06
Core Viewpoint
- Post-training of large models is becoming a key aspect of AI evolution, focusing on enhancing reasoning capabilities, aligning with human preferences, and maintaining stability and efficiency [1].

Summary by Sections

GVPO Introduction
- A team from Zuoyebang and the Hong Kong University of Science and Technology proposed GVPO (Group Variance Policy Optimization), a new method that addresses the instability of GRPO (Group Relative Policy Optimization) [2].

Design Motivation
- Inspired by DPO (Direct Preference Optimization), the team maximizes reward under a KL constraint in the GRPO setting, where each prompt is sampled multiple times [5].

Practical Challenges
- A key obstacle is the normalizer Z(x), an expectation over all possible samples that is practically intractable to compute. The team found that if the gradient weights of all samples under the same prompt sum to zero, Z(x) cancels out, sidestepping the computation entirely [6].

Key Advantages of GVPO
1. **Unique Optimal Solution Guarantee**: GVPO's MSE form comes with a strict mathematical proof that the loss attains its unique optimum exactly when the implicit reward R_θ equals the actual reward R, ensuring effectiveness and stability [13].
2. **No Need for Importance Sampling**: GVPO's optimal solution places minimal restrictions on the sampling distribution, enabling off-policy training without the instability commonly caused by importance sampling [14].

Analytical Perspectives
- GVPO can be understood from three complementary perspectives, each corresponding to an equivalent loss function:
1. **Negative Log-Likelihood (NLL)**: the loss can be viewed as a weighted negative log-likelihood, allowing flexible integration of historical and heterogeneous data sources [17].
2. **Mean Squared Error (MSE)**: the objective minimizes the deviation between implicit and actual rewards, guaranteeing convergence to a unique global optimum under the KL constraint [18].
3. **Reinforcement Learning (RL)**: this view highlights the three components of the GVPO loss and the balance between actual and predicted rewards [19].

Experimental Results
- On mathematical reasoning tasks, GVPO outperformed GRPO and its improved variant Dr.GRPO across five benchmarks, significantly boosting the base model's performance [21].
- Ablation studies show GVPO is insensitive to the hyperparameter β and scales well with the number of samples, allowing smaller models to match larger ones [23].

Significance and Future Prospects
- GVPO represents a paradigm shift in post-training, from experience-driven approaches to ones with theoretical guarantees, improving the stability, flexibility, and efficiency of large-model training [25][26].
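The MSE perspective and the weights-sum-to-zero trick can be sketched in plain Python for a single prompt's group of samples. This is an illustrative reading of the description above, not the paper's reference implementation; the exact placement of β and any additional terms are assumptions:

```python
def gvpo_group_loss(logp_theta, logp_ref, rewards, beta=0.1):
    """Illustrative GVPO-style loss for one prompt's group of samples.

    The implicit reward of sample i is beta * (log pi_theta - log pi_ref).
    Both implicit and actual rewards are centered within the group, then
    matched with an MSE. Group-centering makes the per-sample weights sum
    to zero, which is what lets the intractable normalizer Z(x) cancel.
    """
    n = len(rewards)
    implicit = [beta * (lt - lr) for lt, lr in zip(logp_theta, logp_ref)]
    mean_imp = sum(implicit) / n
    mean_rew = sum(rewards) / n
    return sum(((iv - mean_imp) - (r - mean_rew)) ** 2
               for iv, r in zip(implicit, rewards)) / n
```

Note that the loss is zero whenever the implicit rewards match the actual rewards up to a per-prompt constant, consistent with the unique-optimum claim above.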
Breaking: OpenAI Announces In-House Chip Effort, Teaming Up with Broadcom to Build 10 Gigawatts of AI Accelerators
机器之心· 2025-10-13 23:56
In the early hours of this morning, OpenAI made big news again: the AI giant announced a strategic partnership with Broadcom, one of the world's leading chipmakers, to jointly deploy 10 gigawatts of AI accelerators designed by OpenAI. The gigawatt is a unit of power: 1 gigawatt equals 1,000,000 kilowatts. For reference, a typical household's peak power draw is around 10 kilowatts, so 1 gigawatt can power roughly 100,000 homes simultaneously.

The two companies expect to begin deploying racks equipped with the AI accelerators and networking systems in the second half of 2026, completing the rollout by the end of 2029.

Just last month, OpenAI announced a strategic partnership with NVIDIA to deploy NVIDIA systems at the same 10-gigawatt scale. Partnering with Broadcom on custom chips will reduce OpenAI's heavy dependence on NVIDIA GPUs, shifting toward a diversified "in-house + partnership" compute strategy.

As one netizen put it: "OpenAI simply couldn't wait for NVIDIA any longer, so it's building its own chips."

The full announcement follows. OpenAI will design the accelerators and systems, and will develop and deploy them jointly with Broadcom. By designing its own chips and systems, OpenAI can embed the experience gained from building frontier models and products directly into the hardware, unlocking new levels of capability and intelligence. Today, OpenAI and Broadcom announced a partnership to jointly ...
With Just 1/4 the Budget, Performance Beats the Baseline: Alibaba Gaode (Amap) Proposes Tree-GRPO to Efficiently Crack Agentic RL
机器之心· 2025-10-13 23:56
Core Insights
- The article discusses Tree-GRPO, proposed by Alibaba Gaode (Amap), which improves reinforcement learning (RL) for agents by replacing independent chain sampling with tree search at the agent-step level, addressing high rollout costs and sparse reward signals [2][4][23].

Group 1: Agentic RL Challenges
- Agentic RL faces two main challenges: rollouts are expensive, involving thousands of tokens and tool calls, and supervision is sparse, since only the final outcome is rewarded, making it hard to attribute success or failure to individual actions [12][19].
- Existing tree-search RL methods typically operate at the token or sentence level, which is ill-suited to agents with a clear step-level semantic structure [8][19].

Group 2: Tree-GRPO Methodology
- Tree-GRPO uses "agent steps" as tree nodes, where each node corresponds to a complete think-act-observe step, allowing more effective trajectory sampling within a given budget [6][8].
- The method initializes multiple independent trajectories, then samples nodes to expand the tree, generating diverse agent trajectories under the same rollout budget [8][19].

Group 3: Performance and Results
- Across 11 knowledge-intensive question-answering tasks, Tree-GRPO consistently outperformed chain-based RL methods, achieving significant gains such as a 69% relative improvement in multi-hop QA on the smaller Qwen2.5-1.5B model [15][19].
- Under extremely tight budgets, the method improved on chain-based baselines by 112%, demonstrating its efficiency [19][20].

Group 4: Future Directions
- Tree-GRPO offers a new approach to agentic RL that addresses high rollout budgets and sparse supervision, enabling more efficient and stable RL training in multi-turn agent tasks [23][24].
- The team emphasizes dynamically balancing exploration and exploitation during RL training to optimize learning outcomes [24].
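The step-level tree sampling described above can be sketched minimally. Here `sample_step` is a hypothetical stand-in for one complete think-act-observe step, and the expansion policy (branching from a uniformly random prefix) is an assumption for illustration, not the paper's exact strategy:

```python
import random

def tree_rollout(sample_step, max_depth, n_init, n_expand, seed=0):
    """Sketch of step-level tree sampling under a fixed rollout budget.

    Each node is one agent step. Initialization produces independent
    chains; expansion branches from a shared prefix of an existing
    trajectory, so the shared steps cost no new sampling.
    """
    rng = random.Random(seed)
    trajectories, steps_used = [], 0
    # Phase 1: initialize independent chains.
    for _ in range(n_init):
        traj = []
        while len(traj) < max_depth:
            traj.append(sample_step(traj))
            steps_used += 1
        trajectories.append(traj)
    # Phase 2: expand by branching from random prefixes.
    for _ in range(n_expand):
        base = rng.choice(trajectories)
        cut = rng.randrange(len(base))
        traj = base[:cut]              # shared prefix: no new steps sampled
        while len(traj) < max_depth:
            traj.append(sample_step(traj))
            steps_used += 1
        trajectories.append(traj)
    return trajectories, steps_used

# Toy step function: the "step" is just the current depth.
trajs, used = tree_rollout(lambda prefix: len(prefix),
                           max_depth=4, n_init=2, n_expand=3)
# 5 full trajectories for at most 20 newly sampled steps; typically fewer,
# because expanded trajectories reuse shared prefixes.
```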
After CoT, How Does CoF Turn Inter-Frame Logic from "Implicit Alignment" into "Explicit Thinking"?
机器之心· 2025-10-13 09:24
Group 1
- The article discusses the limitations of Chain-of-Thought (CoT) reasoning in language models, suggesting that it may not reflect true reasoning but rather a superficial narrative [5][6].
- Researchers have introduced the Chain-of-Frames (CoF) concept in the visual domain, which applies a CoT-like reasoning framework to improve temporal consistency in video generation and understanding [6][9].
- CoF lets video models "watch and think": they not only fill in visual details but also consolidate reasoning logic through the continuous evolution of each frame [6][9].

Group 2
- CoF provides a natural temporal reasoning framework for video models, enabling frame-by-frame reasoning that addresses temporal-consistency issues in video generation and understanding [11].
- Unlike traditional methods that rely on implicit feature alignment or smooth transitions, CoF ensures each frame follows a logical evolution, reducing inconsistencies and detail loss across frames [12].
- Integrating frame-level semantic information into video models significantly enhances their reasoning capabilities and cross-frame consistency [13].
10x Faster Inference: Ant Group Open-Sources dInfer, the Industry's First High-Performance Inference Framework for Diffusion Language Models
机器之心· 2025-10-13 09:24
Core Insights
- Ant Group has launched dInfer, the industry's first high-performance inference framework for diffusion large language models (dLLMs), achieving over 10x the inference speed of Fast-dLLM [2][29].
- dInfer set a new performance milestone, reaching 1,011 tokens per second in single-batch inference and surpassing highly optimized autoregressive (AR) models [29].

Group 1: dInfer Framework
- dInfer supports multiple dLLM architectures, including LLaDA, LLaDA-MoE, and LLaDA-MoE-TD, emphasizing modularity and scalability [9][20].
- The framework integrates four core modules: Model, KV Cache Manager, Iteration Manager, and Decoder, allowing developers to customize and combine optimization strategies [11][13].
- dInfer addresses three core challenges in dLLM inference: high computational cost, KV-cache invalidation, and the difficulty of parallel decoding [12][19].

Group 2: Performance Enhancements
- dInfer employs a "vicinity KV-cache refresh" strategy that selectively recomputes KV caches, reducing computational cost while preserving generation quality [15][17].
- System-level optimizations bring the dLLM forward pass up to the speed of AR models [18].
- It introduces hierarchical and credit decoding algorithms that maximize the number of tokens decoded in parallel without additional training [19][20].

Group 3: Performance Metrics
- On 8 NVIDIA H800 GPUs, dInfer achieved an average inference speed of 681 tokens per second, 10.7x faster than Fast-dLLM [29].
- Combined with trajectory distillation, dInfer's average inference speed rose to 847 tokens per second, more than 3x faster than AR models [24][29].
- dInfer set a record in code-generation tasks, demonstrating a clear speed advantage in latency-sensitive scenarios [29].

Group 4: Open Source and Community Engagement
- The release of dInfer marks a significant step toward practical efficiency for diffusion language models, inviting global developers and researchers to help build a more efficient and open AI ecosystem [28][25].
- The complete code, technical report, and experimental configurations for dInfer v0.1 have been open-sourced [27][28].
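The vicinity-refresh idea can be illustrated with a toy position-selection helper: only positions within a window around the currently active decoding block get their KV entries recomputed, while distant positions reuse the slightly stale cache. The window size and block notation here are illustrative assumptions, not dInfer's actual API:

```python
def refresh_positions(active_block, window, seq_len):
    """Positions whose KV entries are recomputed this iteration.

    Sketch of a vicinity refresh: the active block plus a window on each
    side is recomputed; all other positions reuse their cached KV entries.
    """
    lo, hi = active_block            # half-open interval [lo, hi)
    start = max(0, lo - window)
    end = min(seq_len, hi + window)
    return set(range(start, end))

# With a 64-position window around an active block [256, 288) in a
# 1024-token sequence, only 160 of 1024 positions are recomputed.
refreshed = refresh_positions((256, 288), 64, 1024)
```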
Changing the Reinforcement-Learning Paradigm: Meta's New Work Echoes Sutton's "Era of Experience" Prediction
机器之心· 2025-10-13 06:37
Core Insights
- The article discusses the transition from the data era to the experience era in AI, emphasizing that AI agents must learn from interactions with their environment rather than relying solely on data [1][2].
- Meta's research introduces a new paradigm called "early experience," in which agents learn from their own actions and the resulting states, yielding supervisory signals without external rewards [2][3].

Group 1: Early Experience Paradigm
- The early experience paradigm bridges imitation learning and reinforcement learning, letting agents learn both from curated data and from their own experience in the environment [2][3].
- Meta's implementation improved task-completion success rates by 9.6% and out-of-distribution generalization by 9.4%, a significant advance in AI training methodology [3][25].

Group 2: Methodologies
- Two strategies were explored within the early experience framework: implicit world modeling and self-reflection [3][18].
- Implicit world modeling uses collected states to predict future states, letting agents internalize environment dynamics without separate modules [10][12].
- Self-reflection has agents compare expert actions with their own generated actions and produce explanations, improving decision-making and learning [13][14].

Group 3: Experimental Results
- Benchmarks show the early-experience methods outperform traditional imitation learning across scenarios, with both implicit world modeling and self-reflection yielding notable improvements [21][22].
- In out-of-distribution evaluations, early-experience methods significantly narrowed performance gaps, demonstrating adaptability to unseen environments [23].

Group 4: Conclusion
- Starting training with early experience raises the performance ceiling of subsequent reinforcement-learning phases, acting as a bridge between the data and experience eras [25][26].
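The reward-free supervision behind implicit world modeling can be sketched in a few lines: from each state the agent visits, execute its own proposed actions and record the resulting next states as prediction targets. `propose_actions` and `env_step` are hypothetical stand-ins for the agent's policy and the environment:

```python
def build_world_model_data(expert_states, propose_actions, env_step):
    """Assemble (state, action, next_state) triples for next-state
    prediction. No reward signal is required: the environment's response
    to the agent's own actions is the supervision.
    """
    data = []
    for state in expert_states:
        for action in propose_actions(state):
            data.append((state, action, env_step(state, action)))
    return data

# Toy environment: the state is an integer, an action adds to it.
triples = build_world_model_data(
    expert_states=[0, 5],
    propose_actions=lambda s: [1, -1],
    env_step=lambda s, a: s + a,
)
# 2 states x 2 alternative actions -> 4 training triples
```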
LLaVA-OneVision-1.5 Open-Sourced End to End: 8B Model Pre-Trains in Just 4 Days for $16,000
机器之心· 2025-10-13 06:37
Core Insights
- LLaVA represents a significant milestone in democratizing multimodal capabilities: it efficiently aligns open-source visual encoders with large language models, enabling a "see - understand - converse" loop in an open ecosystem [2][5].

Group 1: LLaVA Development and Features
- LLaVA-1.5 improves understanding with larger, cleaner datasets and high-resolution inputs, while LLaVA-NeXT expands capabilities in OCR, mathematics, and multi-scenario tasks [5].
- The LLaVA-OneVision framework unifies modalities including images, documents, charts, and videos, balancing effectiveness and efficiency [5][7].
- The framework stresses reproducibility of the open-source path, highlighting the gap between merely open weights and fully reproducible models [5][6].

Group 2: Performance Metrics
- LLaVA-OV-1.5 outperforms Qwen2.5-VL on several benchmarks, showing competitive or superior performance across a range of multimodal tasks [7][25].
- Average scores across benchmarks indicate strong capabilities, with notable results on General VQA and OCR & Chart tasks [6][19].

Group 3: Data and Training Strategies
- Training follows a three-stage process: language-image alignment, high-quality knowledge injection, and visual instruction alignment, using 85 million pre-training samples and 22 million instruction samples in total [20][25].
- Data construction uses a concept-balancing strategy to mitigate sparse long-tail concepts and noisy original captions, significantly improving performance [12][13].
- Offline parallel data packing improves token utilization and reduces padding waste, achieving up to an 11x reduction in padding tokens [21][22].

Group 4: Engineering Optimizations
- The model uses mixed parallelism and native-resolution strategies to optimize training efficiency and preserve structural detail in dense text regions [23][24].
- The entire pipeline is designed for straightforward reproduction, with all data, tools, scripts, and configurations openly available [26].
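The effect of offline packing on padding can be illustrated with a simple first-fit-decreasing sketch. The report does not specify the exact packing algorithm, so this is an assumed strategy for illustration only:

```python
def pack_samples(lengths, max_len):
    """First-fit-decreasing packing of sample lengths into fixed-size
    sequences, so several short samples share one sequence instead of
    each being padded out to max_len on its own.
    """
    bins = []
    for n in sorted(lengths, reverse=True):
        for b in bins:
            if sum(b) + n <= max_len:
                b.append(n)
                break
        else:
            bins.append([n])
    return bins

lengths = [900, 120, 300, 80, 500, 60]
bins = pack_samples(lengths, max_len=1024)
padding = sum(1024 - sum(b) for b in bins)
unpacked_padding = sum(1024 - n for n in lengths)
# Packing drops padding from 4184 tokens (one sample per sequence)
# to 88 tokens across 2 packed sequences.
```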
NeurIPS 2025 Spotlight | GeoSVR: Unlocking Sparse Voxels for High-Precision 3D Surface Reconstruction Beyond the 3DGS Family
机器之心· 2025-10-13 04:21
Core Viewpoint
- The article introduces GeoSVR (Geometric Sparse Voxel Reconstruction), a new explicit geometric optimization framework that surpasses existing methods in geometric accuracy, detail capture, and completeness for surface reconstruction from multi-view images [2][32].

Methodology
- GeoSVR harnesses sparse voxels through two main designs:
1. Voxel-Uncertainty Depth Constraint, which models uncertainty and weights depth constraints accordingly to improve geometric accuracy [8][10].
2. Sparse Voxel Surface Regularization, which employs several regularization strategies to maintain global consistency and prevent overfitting [14][22].

Experimental Results
- GeoSVR significantly outperforms existing methods across multiple datasets, achieving a notably better Chamfer distance than state-of-the-art methods with a training time of only 0.8 hours, versus over 12 hours for prior methods [24][30].
- On the DTU dataset, GeoSVR achieved a mean Chamfer distance of 0.32, demonstrating superior geometric precision and reconstruction quality [23][30].
- On the Mip-NeRF 360 dataset, GeoSVR achieved an F1-score of 0.56, making it the highest-precision method currently available [27].

Significance and Future Outlook
- GeoSVR showcases the potential of sparse voxels for high-quality surface reconstruction, laying a foundation for robotics perception, autonomous driving, digital twins, and virtual reality [32][33].
- Future research will focus on large-scale scene reconstruction and support for complex light-path conditions [33].
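The idea of an uncertainty-weighted depth constraint can be sketched as follows. The weighting form `1/(1 + u)` is an assumed placeholder, not the paper's formulation; the point is only that high-uncertainty pixels contribute less, so unreliable depth priors cannot drag the surface away from the photometric evidence:

```python
def weighted_depth_loss(pred_depth, prior_depth, uncertainty):
    """Sketch of an uncertainty-weighted depth constraint.

    Each pixel's depth residual is down-weighted where voxel uncertainty
    is high; the weighting form is an illustrative assumption.
    """
    total, weight_sum = 0.0, 0.0
    for d, p, u in zip(pred_depth, prior_depth, uncertainty):
        w = 1.0 / (1.0 + u)          # assumed weighting, not the paper's
        total += w * abs(d - p)
        weight_sum += w
    return total / weight_sum
```

With zero uncertainty everywhere this reduces to a plain mean absolute depth error; raising the uncertainty of an erring pixel shrinks its influence on the loss.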
Unshackling MoE: New "Expert-as-a-Service" Inference Architecture Released, with Ultra-Fine-Grained Scaling Cutting Costs by 37.5%
机器之心· 2025-10-13 04:21
Core Viewpoint
- The article discusses challenges and innovations in large-language-model inference, focusing on the Mixture-of-Experts (MoE) architecture and the Expert-as-a-Service (EaaS) model introduced to improve efficiency, scalability, and robustness [2][4][25].

Group 1: Challenges in MoE Inference
- Inference costs for large language models have grown exponentially, creating pressure to cut costs [2].
- Existing MoE frameworks scale poorly because they require large-scale synchronous communication, wasting resources [2].
- MoE systems have low fault tolerance: a single node failure can force the entire serving cluster to restart, interrupting service [3].
- Load imbalance arises because expert activation is dynamically sparse, leaving some GPU nodes overloaded while others sit idle [4].

Group 2: Introduction of EaaS
- EaaS recasts MoE inference as a microservices-style architecture, enabling flexible scheduling and independent scaling of expert services [7].
- The architecture decouples the expert layers from the attention layers, enabling asynchronous processing and better pipeline utilization [10].
- EaaS uses a dynamic batching mechanism and a custom communication library built on InfiniBand GPUDirect Async (IBGDA) to minimize communication latency and kernel-launch overhead [14].

Group 3: Performance and Scalability
- EaaS demonstrates better scalability and fault tolerance than traditional MoE inference systems, sustaining throughput even when GPU nodes fail [15][20].
- Fine-grained resource allocation lets cloud providers adjust compute dynamically based on real-time load [18].
- EaaS can save up to 37.5% of GPU resources while matching the performance of static architectures [18].

Group 4: Future Potential
- EaaS shows significant promise for cloud-based large-model inference and model-as-a-service (MaaS) scenarios, matching the needs of multi-tenant environments and continuous delivery [25].
- Its modular design enables independent upgrades and maintenance, letting the system evolve with changing model scales and application demands [25].
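The per-expert batching at the heart of the microservice view can be sketched minimally: tokens from many concurrent requests are grouped by their routed expert id, so each expert service processes one dense micro-batch independently of the others. Names here are illustrative, not the EaaS API:

```python
from collections import defaultdict

def batch_by_expert(routed_tokens):
    """Group (token, expert_id) pairs into per-expert micro-batches.

    Decoupling tokens by destination expert is what lets each expert run
    as an independently scalable service rather than in lockstep.
    """
    batches = defaultdict(list)
    for token, expert_id in routed_tokens:
        batches[expert_id].append(token)
    return dict(batches)

routed = [("t0", 2), ("t1", 0), ("t2", 2), ("t3", 5)]
batches = batch_by_expert(routed)
# {2: ["t0", "t2"], 0: ["t1"], 5: ["t3"]}
```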