Reinforcement Learning
Karpathy hand-builds ChatGPT in 8,000 lines of code for just $100; after 12 hours of training it beats GPT-2 on CORE, step-by-step tutorial included
量子位· 2025-10-14 02:19
Core Insights
- The article discusses the launch of "nanochat," a simplified version of ChatGPT created by Andrej Karpathy, which can be built with minimal cost and code [1][2][4]

Project Overview
- "nanochat" is a full-stack training and inference pipeline that lets users create a basic ChatGPT-like model in approximately 8,000 lines of code [2][4]
- The entire project can be executed on a cloud GPU server for about $100, taking as little as 4 hours to set up and run [3][4][16]

Technical Specifications
- The project includes a tokenizer implemented in Rust, a pre-trained Transformer architecture, and various training datasets [5]
- It supports efficient inference with features like KV caching and a lightweight Python interpreter for tool usage [5][43]

Performance Metrics
- After about 12 hours of training, the model's performance on the CORE metric surpasses that of GPT-2 [8]
- A model trained for 24 hours can score over 40 on MMLU and over 70 on ARC-Easy [10]

Development Goals
- Karpathy aims to create a unified, simple, and modifiable codebase that can serve as a strong baseline for future development [11][13]
- The project is intended as the capstone of the upcoming LLM101n course, which focuses on building large language models [12]

Community Engagement
- The project has gained significant attention, with GitHub stars reaching 4.8k shortly after release, indicating strong community interest [14]
- Users are encouraged to optimize and modify the codebase, enabling collaborative improvement [59]

Training Process
- Training proceeds in several stages: pre-training, mid-training, supervised fine-tuning (SFT), and reinforcement learning (RL) [45][48][51]
- Excluding RL, the full run takes approximately 3 hours and 51 minutes at a total cost of about $92.4 [57]

Final Remarks
- The article positions "nanochat" as a research tool and benchmarking framework, in the spirit of earlier projects like nanoGPT [13]
- The project is still in its early stages, with many opportunities for further optimization and enhancement [13][50]
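The quoted $92.4 total is consistent with the article's other figures: "about $100" for roughly 4 hours implies an hourly rate near $24 for the GPU node (that rate is an inference from the article, not a stated fact), and 3 h 51 min at that rate lands on the reported cost:

```python
# Sanity-check the quoted training cost. The $24/hr rate for one GPU node
# is an assumption back-derived from "about $100 for ~4 hours".
hourly_rate_usd = 24.0
train_hours = 3 + 51 / 60        # 3 h 51 min, excluding the RL stage

cost = hourly_rate_usd * train_hours
print(f"${cost:.1f}")            # -> $92.4, matching the article's figure
```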
"First-Principles Thinking on Large Models": A Transcript of Li Jianzhong in Dialogue with Lukasz Kaiser, Co-creator of GPT-5 and the Transformer
36Kr· 2025-10-13 10:46
Core Insights
- The rapid development of large intelligent systems is reshaping industry dynamics, exemplified by OpenAI's recent release of Sora 2, which showcases advancements in model capabilities and the complexity of AI evolution [1][2]
- The dialogue between industry leaders, including CSDN's Li Jianzhong and OpenAI's Lukasz Kaiser, focuses on foundational thinking about large models and its implications for future AI development [2][5]

Group 1: Language and Intelligence
- Language plays a crucial role in AI, with some experts arguing that relying solely on language models for AGI is misguided, as language is a low-bandwidth representation of the physical world [6][9]
- Kaiser emphasizes the importance of the temporal dimension in language, suggesting that the ability to generate sequences over time is vital for expressing intelligence [7][9]
- The conversation highlights that while language models can form abstract concepts, these may not fully align with human concepts, particularly regarding physical experience [11][12]

Group 2: Multimodal Models and World Understanding
- The industry trend is toward unified models that handle multiple modalities, and current models like GPT-4 already demonstrate significant multimodal capabilities [12][13]
- Kaiser acknowledges that while modern language models can process multimodal tasks, integrating different modalities remains a challenge [13][15]
- The discussion raises skepticism about whether AI can fully understand the physical world through observation alone, while suggesting that language models may serve as effective world models in certain contexts [14][15]

Group 3: AI Programming and Future Perspectives
- AI programming is emerging as a key application of large language models, with two main perspectives on its future: one advocating natural language as the primary programming interface, the other emphasizing the continued need for traditional programming languages [17][18]
- Kaiser believes that language models will increasingly cover programming tasks, but a solid understanding of programming concepts will remain essential for professional developers [19][20]

Group 4: Agent Models and Generalization Challenges
- "Agent models" face challenges in generalizing to new tasks, raising the question of whether this stems from training methods or inherent limitations [21][22]
- Kaiser suggests that the effectiveness of agent systems relies on their ability to learn from interactions with various tools and environments, which is currently limited [22][23]

Group 5: Scaling Laws and Computational Limits
- The belief in Scaling Laws as the key to stronger AI raises concerns about over-reliance on computational power at the expense of algorithmic and architectural advances [24][25]
- Kaiser differentiates between pre-training and reinforcement learning Scaling Laws, noting that while pre-training has been effective, it may be approaching economic limits [25][26]

Group 6: Embodied Intelligence and Data Efficiency
- The slow progress in embodied intelligence, particularly humanoid robots, is attributed either to data scarcity or to fundamental differences between bits and atoms [29][30]
- Kaiser argues that advances in data efficiency and the development of multimodal models will be crucial for achieving effective embodied intelligence [30][31]

Group 7: Reinforcement Learning and Scientific Discovery
- The shift toward reinforcement-learning-driven reasoning models presents both opportunities for innovation and questions about their effectiveness in generating new scientific insights [32][33]
- Kaiser notes that while reinforcement learning offers high data efficiency, it has limitations compared to traditional gradient-descent methods [33][34]

Group 8: Organizational Collaboration and Future Models
- Achieving large-scale collaboration among agents remains a significant challenge, requiring more parallel processing and effective feedback mechanisms in training [35][36]
- Kaiser emphasizes the need for next-generation reasoning models that operate in a more parallel and efficient manner to enable organizational collaboration [36][37]

Group 9: Memory Mechanisms in AI
- Current AI models' memory is limited by context windows, resembling working memory rather than true long-term memory [37][38]
- Kaiser suggests that future architectures may need more sophisticated memory mechanisms to achieve genuine long-term memory [38][39]

Group 10: Continuous Learning in AI
- The potential for AI models to support continuous learning is being explored, with current models using context as a form of ongoing memory [39][40]
- Kaiser believes that while in-context learning is a step forward, more elegant solutions for continuous learning will be needed [40][41]
Real AI competitiveness hides in the "post-training" step of large models
量子位· 2025-10-13 08:47
Core Insights
- The article emphasizes the importance of post-training as a transformative approach in AI, moving beyond simple model optimization to creating specialized intelligent engines tailored to specific business needs [1][4]
- The evolution of post-training technology shows a shift from Supervised Fine-Tuning (SFT) to Reinforcement Learning (RL) methodologies, which better align with complex business requirements [2][4]

Summary by Sections

Post-Training Evolution
- The industry's initial approach was SFT, which let models learn domain-specific knowledge and dialogue styles [2]
- SFT proved insufficient for teaching models the complex value judgments and strategic choices critical in real business scenarios [3]
- The focus has shifted to RL, evolving from human-dependent methods (RLHF) to automated systems (RLVR) and the innovative use of natural-language rewards [4][5]

Implementation Pathway
- The article outlines a four-step pathway for enterprises to implement post-training effectively, addressing challenges such as data quality, high labeling costs, and defining reward signals [5][8]
- Case studies from companies like Zhihu, AutoHome, and Weibo illustrate these steps in practice, showing improvements in data quality and model performance [7][8]

Step 1: Data Preparation
- High-quality data is the cornerstone of successful post-training, with companies spending 60-70% of their time on data preparation [10]
- Zhihu and AutoHome improve data quality through pre-labeling and structured-data utilization, respectively [11][13]

Step 2: Model Selection
- Choosing the right base model is crucial; many companies opt for the Tongyi Qianwen series for its performance and post-training support [14][16]
- The model's architecture and open-source ecosystem make post-training techniques easier to implement [15][18]

Step 3: Reward Mechanism Design
- A reward mechanism is essential for aligning model outputs with business objectives, transitioning from human feedback to automated verification systems [24][25]
- Companies like Yingmi Fund are exploring ways to encode expert decision-making frameworks into their models [26]

Step 4: Evaluation System
- A robust evaluation system is needed to measure post-training effectiveness; Yingmi Fund has developed benchmarks to assess model performance in real-world scenarios [27][28]
- Successful implementations have produced significant gains in model accuracy and business outcomes, as seen with Baifeng Cloud and Quark [30][32]

Conclusion
- The true competitive advantage in AI lies in how companies leverage their unique data and business insights through post-training to build proprietary intelligent engines [32]
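The RLHF-to-RLVR transition described above replaces human preference labels with programmatic verification. A minimal illustrative sketch (the `####` answer marker and exact-match rule are assumptions for the example, not any specific company's system):

```python
import re

def verifiable_reward(response: str, reference_answer: str) -> float:
    """Toy RLVR-style reward: 1.0 if the model's final answer (after a
    '####' marker) matches the reference exactly, else 0.0.
    No human labeler is needed -- the check is a program."""
    match = re.search(r"####\s*(.+?)\s*$", response.strip())
    if match is None:
        return 0.0                       # unparseable output earns nothing
    return 1.0 if match.group(1) == reference_answer else 0.0

print(verifiable_reward("Step 1... Step 2... #### 42", "42"))  # -> 1.0
print(verifiable_reward("Step 1... Step 2... #### 41", "42"))  # -> 0.0
```

In practice the verifier can be anything checkable by code: a unit test, a database lookup, or a business-rule engine, which is what makes the approach cheaper to scale than RLHF.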
Changing the reinforcement learning paradigm: Meta's new work echoes Sutton's "era of experience" prediction
机器之心· 2025-10-13 06:37
Core Insights
- The article discusses the transition from the data era to the experience era in AI, emphasizing the need for AI agents to learn from interactions with their environment rather than relying solely on data [1][2]
- Meta's research introduces a new paradigm called "early experience," which lets AI agents learn from their own actions and the resulting states, generating supervisory signals without external rewards [2][3]

Group 1: Early Experience Paradigm
- The "early experience" paradigm combines imitation learning and reinforcement learning, enabling agents to learn from both curated data and their own experience in the environment [2][3]
- Meta's implementation improved task-completion success rates by 9.6% and out-of-distribution generalization by 9.4%, a significant advance in AI training methodology [3][25]

Group 2: Methodologies
- Two strategies were explored within the early-experience framework: implicit world modeling and self-reflection [3][18]
- Implicit world modeling uses collected states to predict future states, allowing agents to internalize environment dynamics without separate modules [10][12]
- Self-reflection has agents compare expert actions with their own generated actions, producing explanations that improve decision-making and learning [13][14]

Group 3: Experimental Results
- Benchmark tests showed the early-experience methods outperforming traditional imitation learning across scenarios, with both implicit world modeling and self-reflection yielding notable gains [21][22]
- In out-of-distribution evaluations, early-experience methods significantly reduced performance gaps, demonstrating adaptation to unseen environments [23]

Group 4: Conclusion
- Starting training with early experience raises the performance ceiling of subsequent reinforcement learning phases, acting as a bridge between the data and experience eras [25][26]
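The implicit world modeling idea above can be sketched as a data transformation: each step of an agent's own rollout becomes a next-state-prediction example, which is supervision that requires no external reward. This is a conceptual sketch with made-up field names, not Meta's actual code:

```python
def world_modeling_examples(trajectory):
    """Turn an agent trajectory into next-state-prediction training pairs.
    Each pair asks the model to predict the state that resulted from a
    (possibly self-generated) action -- no reward signal required."""
    examples = []
    for step in trajectory:
        prompt = (f"State: {step['state']}\n"
                  f"Action: {step['action']}\n"
                  "Predict the next state:")
        examples.append({"input": prompt, "target": step["next_state"]})
    return examples

traj = [{"state": "door closed", "action": "open door", "next_state": "door open"}]
print(world_modeling_examples(traj)[0]["target"])  # -> door open
```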
Ditch instant gratification: use small things to regain your entrepreneurial rhythm
36Kr· 2025-10-13 00:20
Core Insights
- The article discusses the detrimental effects of dopamine addiction on entrepreneurs, noting that behaviors perceived as productive may actually stem from reliance on external stimuli for short-term gratification [2][3][5]
- It emphasizes shifting focus from immediate rewards to long-term benefits, suggesting entrepreneurs engage in activities that improve their internal state rather than quick fixes like caffeine or constant data checking [1][7][11]

Summary by Sections

Misconceptions of Productivity
- Many behaviors considered "entrepreneurial hustle" are actually driven by dopamine addiction, producing a cycle of ineffective busyness [2][3]
- Entrepreneurs often mistake external stimuli, such as caffeine and data refreshes, for signs of productivity and resilience [2][3]

Natural vs. Proxy Rewards
- The article distinguishes natural rewards, which improve internal bodily states, from proxy rewards, which provide only temporary satisfaction [4][5][6]
- Engaging in meaningful tasks that address core business issues is a natural reward, while superficial activities like browsing industry news are proxy rewards [6]

Practical Changes for Improvement
- Suggestions include restructuring daily habits to prioritize natural rewards, such as a nutritious breakfast instead of relying on caffeine [8]
- During commutes, entrepreneurs are encouraged to limit passive information consumption and engage instead with content offering real insight [9]
- The article advocates mindful breaks and rest periods that genuinely recharge the body rather than digital distractions [10]

Long-term Strategies
- Listening to bodily signals and making choices aligned with long-term health and productivity is emphasized [11][12]
- Small, mindful practices integrated into daily routines can gradually reduce dependence on immediate stimuli [13][14]
How much innovation do AI Agents really bring?
自动驾驶之心· 2025-10-12 23:33
Author | sunnyzhao  Editor | 大模型之心Tech

1. The planning stage adds enormous latency. Once the number of tools grows, the accuracy of turbo-tier models becomes worrying, so flagship models have to be used instead, which increases latency further.

2. Planning quality is not high enough. The workflows the original task bot used were designed by humans; now the model decides them autonomously, and current testing shows that complex workflows built by the model fall far short of human-level usability. For simple workflows, a small discriminative model actually performs better.

3. Reflection is a strategy that trades time for accuracy, but it very easily falls into repetitive self-churn and infinite loops.

These problems are indeed common ailments of today's AI Agent technology. If you treat an Agent as a simple combination of "LLM + tool calling" without careful engineering, the actual results are not necessarily better than workflow orchestration. Drawing on some published papers and a bit of hands-on experience, here are my views on the three points the questioner raised.

Why planning is slow

Original link: https://www.zhihu.com/question/657739588/ ...
In intelligent driving's final window of opportunity, a new AI player breaks through
远川研究所· 2025-10-12 13:04
Core Insights
- The intelligent assisted-driving industry has seen a stark contrast over the past year: technological advances have driven consumer demand and cost reductions, letting L2+ systems penetrate the mid-to-low-end market [2][4][5]
- Competition is intensifying, leading players are clearly emerging, and companies must adapt to new technological paradigms to stay relevant [2][9]
- The rise of multimodal large models and end-to-end systems is reshaping the industry, with companies like Qianli Technology positioning themselves to leverage these advances [12][21]

Industry Dynamics
- The shift from modular to end-to-end architectures in intelligent driving systems is becoming standard, exemplified by Tesla's FSD V9.0 and its pure-vision approach [4][5][6]
- Software value is projected to exceed 40% of total vehicle value, signaling a significant shift toward software-driven solutions [6][18]
- The competitive landscape mixes vertically integrated companies like Tesla with third-party suppliers, underscoring the importance of collaboration and resource integration [9][18]

Company Developments
- Qianli Technology, founded by AI pioneer Yin Qi, aims to become a platform-level AI company focused on intelligent assisted driving and smart-cockpit solutions [11][21]
- The company has established partnerships with major automotive players, including Geely, to expand its market presence and technological capabilities [17][25]
- Qianli Technology's RLM (Reinforcement Learning-Multimodal) model is gaining attention for improving driving experience and safety through advanced perception and decision-making [21][24]

Future Trends
- Integrating multimodal large models with reinforcement learning is expected to be crucial for future intelligent-driving systems, enhancing their adaptability and safety [20][22]
- The global market for automated and intelligent driving vehicles is projected to reach $1.2 trillion by 2040, with significant growth opportunities for companies like Qianli Technology [25]
- Robotaxi services are a key focus for Qianli Technology, which aims to establish a comprehensive operational framework within 18 months [27]
Being able to "see" and "speak" isn't enough; robots must also "compute"! Tool-Use + reinforcement learning: TIGeR brings precise manipulation to robots
具身智能之心· 2025-10-11 16:02
Core Insights
- The article discusses the limitations of current Vision-Language Models (VLMs) in accurately interpreting and executing spatial commands in robotics, emphasizing the need for precise geometric reasoning and tool integration [2][5]

Group 1: TIGeR Framework
- The Tool-Integrated Geometric Reasoning (TIGeR) framework enhances VLMs by integrating tool usage and reinforcement learning, improving their ability to perform precise calculations in three-dimensional space [2][6]
- TIGeR lets AI models move from qualitative perception to quantitative computation, addressing the core pain points of existing VLMs [2][7]

Group 2: Advantages of TIGeR
- TIGeR provides precise localization by integrating depth information and camera parameters, accurately converting commands like "10 centimeters above" into three-dimensional coordinates [7]
- The framework supports multi-view unified reasoning, merging information from different perspectives into a consistent world coordinate system [7]
- The model's reasoning process is transparent, making it easier to debug and optimize by clearly showing the tools used, the parameters supplied, and the results obtained [7]

Group 3: Training Process
- Training proceeds in two phases: supervised learning to teach basic tool usage and reasoning chains, followed by reinforcement learning that refines tool-usage skills through a hierarchical reward mechanism [8][10]
- The hierarchical reward mechanism evaluates not only the correctness of the final answer but also the accuracy of the process, including tool selection and parameter precision [8]

Group 4: Data Utilization
- The TIGeR-300K dataset of 300,000 samples was created to train the model on geometric problems, ensuring both accuracy and diversity in the tasks covered [10][13]
- Dataset construction combined template-based generation with large-model rewriting to improve generalization and flexibility, so the model can handle complex real-world instructions [13]

Group 5: Performance Metrics
- TIGeR outperforms other leading VLMs on spatial-understanding benchmarks, scoring 93.85 on 2D-Rel and 96.33 on 3D-Depth [10][14]
- Its performance across spatial-reasoning tasks demonstrates the ability to execute operations requiring precise three-dimensional positioning, which other models struggle to achieve [16]
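The "precise localization" described above rests on standard pinhole back-projection: given a pixel, its metric depth, and the camera intrinsics, recover the 3D point in the camera frame. This is the textbook formula, not TIGeR's actual tool code:

```python
def backproject(u, v, depth, fx, fy, cx, cy):
    """Pinhole back-projection: pixel (u, v) plus metric depth -> 3D
    point in the camera frame. A command like '10 centimeters above'
    then becomes a simple offset on the recovered coordinates."""
    x = (u - cx) * depth / fx
    y = (v - cy) * depth / fy
    return (x, y, depth)

# A pixel at the principal point and 0.5 m depth lies on the optical axis:
print(backproject(320, 240, 0.5, fx=600, fy=600, cx=320, cy=240))  # -> (0.0, 0.0, 0.5)
```

Merging multiple views into one world frame, as TIGeR does, additionally requires each camera's extrinsic pose to transform these camera-frame points into shared coordinates.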
Professor Ji Xiaoqiang's lab at CUHK (Shenzhen) is recruiting fully funded PhD students and postdocs
具身智能之心· 2025-10-11 16:02
Core Viewpoint
- The article emphasizes the opportunities in the field of embodied intelligence, highlighting the need for skilled researchers and the benefits of joining a collaborative academic environment focused on artificial intelligence and robotics

Research Content
- Research focuses on interdisciplinary areas such as AI control theory, embodied-intelligence control, and reinforcement learning control [11]
- Candidates are expected to have a deep understanding of and interest in the core research directions, with the ability to conduct theoretical innovation and experimental validation independently [2]

Candidate Requirements
- **Postdoctoral Researchers**: Must hold a PhD in a relevant field from a prestigious institution, with a strong publication record in top-tier journals or conferences [2]
- **PhD Candidates**: Should hold a master's degree or an outstanding bachelor's degree in a related discipline [3]
- **Master's Candidates**: Expected to hold a bachelor's degree in a relevant field from a recognized university [5]
- Candidates should demonstrate a solid foundation in mathematics and programming, with a keen interest in control theory, AI, and robotics [4]

Skills and Experience
- Familiarity with deep learning and AI models such as CLIP, BLIP, and LLaVA is essential [6]
- Experience with classic models like VAE, Transformer, and BERT, along with strong algorithm-design and programming skills, particularly in high-performance languages like C++ or Rust, is preferred [7][8]
- Practical experience in training, tuning, and deploying deep learning models is highly valued [12]

Mentor Introduction
- Professor Ji Xiaoqiang, who holds a PhD from Columbia University, leads the AI Control and Decision Laboratory at The Chinese University of Hong Kong (Shenzhen) [13]
- His research focuses on intelligent control systems, and he has published over 50 papers in top international journals and conferences [13]

Benefits and Compensation
- **Postdoctoral Researchers**: Annual pre-tax living allowance of 210,000 CNY, with additional subsidies and the potential for significant research funding [14]
- **PhD Candidates**: Full or half scholarships available, with top candidates eligible for a principal's scholarship of 180,000 CNY per year [15]
- **Master's Candidates**: Opportunities to transition to PhD programs, with additional living stipends for outstanding candidates [16]

Application Materials
- Applicants must submit a complete CV in both Chinese and English, along with any published papers and evidence of research capability [19]
Tencent open-sources a new reinforcement learning algorithm: agents "teach themselves" without expert demonstrations, with plug-and-play, zero-cost integration
量子位· 2025-10-11 06:04
Contributed by the Youtu-Agent team
量子位 | 公众号 QbitAI

Let agents explore new methods on their own while also imitating their own successful experience.

Tencent Youtu Lab has open-sourced a reinforcement learning algorithm: SPEAR (Self-imitation with Progressive Exploration for Agentic Reinforcement Learning). The whole point is letting AI teach itself.

For the first time, the algorithm gives large language model (LLM)-driven agents an entropy-stable learning process through "self-imitation + progressive exploration," without requiring large amounts of expert demonstrations.

On benchmarks such as ALFWorld, WebShop, and AIME24/25 it improves results by more than 16% on average, setting new state-of-the-art scores and providing a plug-and-play paradigm for training agents in long-horizon, sparse-reward settings.

[Figure: schematic of SPEAR's core concepts]

Simply put, SPEAR can boldly try new methods while also reliably exploiting strategies that have already proven effective, without going to either extreme.

The details follow.

What is traditional self-imitation learning?

Imagine a novice chef: Self-Imitation Learning (SIL) brings this idea of "copying only your own best work" into reinforcement learning:

Self-imitation 2.0: learning from your own "brilliant moves". The terminator of entropy collapse ...
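The "copy only your own best work" idea behind classic SIL can be sketched as a buffer that retains just the agent's own highest-return episodes, which the policy then imitates alongside ordinary RL updates. This is an illustrative sketch of the buffer, not SPEAR's released code:

```python
import heapq

class SelfImitationBuffer:
    """Keep only the agent's own top-k highest-return trajectories --
    the 'best homework' that classic self-imitation learning copies."""
    def __init__(self, capacity=2):
        self.capacity = capacity
        self._heap = []            # min-heap of (return, insertion id, traj)
        self._counter = 0          # tie-breaker so trajs are never compared

    def add(self, trajectory, episode_return):
        item = (episode_return, self._counter, trajectory)
        self._counter += 1
        if len(self._heap) < self.capacity:
            heapq.heappush(self._heap, item)
        elif episode_return > self._heap[0][0]:
            heapq.heapreplace(self._heap, item)   # evict the weakest kept run

    def best(self):
        """Kept trajectories, best first -- the imitation targets."""
        return [t for _, _, t in sorted(self._heap, reverse=True)]

buf = SelfImitationBuffer(capacity=2)
buf.add(["go left"], 1.0)
buf.add(["go right"], 5.0)
buf.add(["wander"], 0.2)          # too weak; never enters the buffer
print(buf.best()[0])              # -> ['go right']
```

SPEAR's contribution, per the article, is layering progressive exploration on top of this so the policy's entropy stays stable instead of collapsing onto the replayed successes.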