Large Language Models
Karpathy hand-builds a ChatGPT in 8,000 lines of code for just $100; after 12 hours of training its CORE score surpasses GPT-2, step-by-step tutorial included
36Ke· 2025-10-14 03:40
Core Insights
- The article discusses the launch of "nanochat," a simplified version of ChatGPT created by Andrej Karpathy, a former Tesla AI director and co-founder of OpenAI, aimed at educational purposes [1][57].
- The project allows users to build a basic conversational AI model for approximately $100 and about 4 hours of training on a cloud GPU server [1][10].

Project Overview
- "nanochat" consists of around 8,000 lines of code, featuring a tokenizer implemented in Rust, a pre-trained Transformer model, and various training datasets [2][3].
- The model can hold basic conversations, generate stories and poems, and answer simple questions [2][4].

Performance Metrics
- After approximately 12 hours of training, the model's performance on the CORE metric surpasses that of GPT-2 [4][52].
- Reported metrics include CORE, ARC-Easy, GSM8K, and HumanEval scores, with notable improvements observed across the different training phases [3][52].

Training Phases
- The training process includes pre-training, mid-training, supervised fine-tuning (SFT), and reinforcement learning (RL) stages, each contributing to the model's capabilities (see the training-step sketch after this summary) [41][46].
- Mid-training focuses on adapting the model to multi-turn conversations and teaching it to handle multiple-choice questions [35][36].

Community Engagement
- The project has gained significant attention on GitHub, with over 4.8k stars shortly after its release, indicating strong community interest and potential for further optimization [8][7].
- The codebase is designed to be user-friendly, allowing modifications and enhancements by the community [54][55].

Educational Impact
- Karpathy aims to integrate this technology into a broader educational framework, potentially transforming how AI can assist in learning [62].
- The project is part of a larger initiative to create a symbiotic relationship between teachers and AI, enhancing the learning experience [62].
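All three supervised stages named above (pre-training, mid-training, SFT) optimize the same next-token objective; what changes between stages is the data mix. Below is a minimal sketch of such a training step, assuming a standard PyTorch causal language model whose forward pass returns logits of shape (batch, seq_len, vocab) and an already-constructed optimizer; it is illustrative only, not nanochat's actual code.

```python
# Minimal next-token training step shared by the pre-training, mid-training and SFT
# stages: only the data fed in changes between stages. Hypothetical sketch, not nanochat code.
import torch
import torch.nn.functional as F

def train_step(model, optimizer, tokens):
    """tokens: LongTensor of shape (batch, seq_len) drawn from the current stage's data mix."""
    inputs, targets = tokens[:, :-1], tokens[:, 1:]   # shift by one for next-token prediction
    logits = model(inputs)                            # (batch, seq_len-1, vocab)
    loss = F.cross_entropy(
        logits.reshape(-1, logits.size(-1)),          # flatten to (N, vocab)
        targets.reshape(-1),                          # flatten to (N,)
    )
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

The RL stage differs in that updates are driven by a reward signal rather than a fixed target sequence.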
Karpathy hand-builds a ChatGPT in 8,000 lines of code for just $100; after 12 hours of training its CORE score surpasses GPT-2, step-by-step tutorial included
量子位· 2025-10-14 02:19
Core Insights
- The article discusses the launch of "nanochat," a simplified version of ChatGPT created by Andrej Karpathy, which can be built with minimal cost and code [1][2][4].

Project Overview
- "nanochat" is a full-stack training and inference pipeline that allows users to create a basic ChatGPT-like model in approximately 8,000 lines of code [2][4].
- The entire project can be executed on a cloud GPU server for about $100, taking as little as 4 hours to set up and run [3][4][16].

Technical Specifications
- The codebase includes a tokenizer implemented in Rust, a pre-trained Transformer architecture, and various training datasets [5].
- It supports efficient inference with features such as KV caching and a lightweight Python interpreter for tool use (a KV-cache decoding sketch follows this summary) [5][43].

Performance Metrics
- After about 12 hours of training, the model's performance on the CORE metric surpasses that of GPT-2 [8].
- As a specific example, a model trained for 24 hours can score over 40 on MMLU and over 70 on ARC-Easy [10].

Development Goals
- Karpathy aims to create a unified, simple, and modifiable codebase that can serve as a strong baseline for future development [11][13].
- The project is intended to be the capstone of the upcoming LLM101n course, which focuses on building large language models [12].

Community Engagement
- The project has gained significant attention, with GitHub stars reaching 4.8k shortly after its release, indicating strong community interest [14].
- Users are encouraged to optimize and modify the codebase, enabling a collaborative improvement process [59].

Training Process
- Training proceeds in several stages: pre-training, mid-training, supervised fine-tuning (SFT), and reinforcement learning (RL) [45][48][51].
- The total training time, excluding RL, is approximately 3 hours and 51 minutes, at a total cost of about $92.40 [57].

Final Remarks
- The article emphasizes the potential of "nanochat" as a research tool and benchmarking framework, similar to earlier projects like nanoGPT [13].
- The project is still in its early stages, with many opportunities for further optimization and enhancement [13][50].
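To illustrate the KV caching mentioned under Technical Specifications: during decoding, the keys and values of all previous positions are cached so that each new step only processes the newest token. The sketch below assumes a Hugging Face-style causal LM interface (`past_key_values`, `use_cache`); it is a generic illustration, not nanochat's inference engine.

```python
# Greedy decoding loop with a KV cache: after the prompt pass, each step feeds only the
# newest token and reuses cached keys/values, keeping per-step cost roughly constant.
# Assumes a Hugging Face-style causal LM; this is not nanochat's code.
import torch

@torch.no_grad()
def generate(model, input_ids, max_new_tokens=64, eos_id=None):
    past = None
    tokens = input_ids                      # (1, prompt_len)
    cur = input_ids
    for _ in range(max_new_tokens):
        out = model(input_ids=cur, past_key_values=past, use_cache=True)
        past = out.past_key_values          # cached K/V for all positions seen so far
        next_id = out.logits[:, -1].argmax(dim=-1, keepdim=True)
        tokens = torch.cat([tokens, next_id], dim=1)
        if eos_id is not None and next_id.item() == eos_id:
            break
        cur = next_id                       # only the new token is fed on the next step
    return tokens
```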
No more "entropy collapse" or "entropy explosion": this research teaches large models "precise exploration", and reasoning scores soar
量子位· 2025-10-13 08:47
Core Insights
- The article discusses advancements in large language models (LLMs) trained with RLVR (Reinforcement Learning with Verifiable Rewards), which has driven significant breakthroughs in mathematical, coding, and scientific reasoning tasks since 2024 [1][2].

Group 1: Challenges in RLVR Training
- RLVR faces a critical bottleneck known as "exploration imbalance": exploration can be too limited, leading to entropy collapse, or too uncontrolled, resulting in entropy explosion [2][9].
- Traditional entropy regularization encourages exploration but can produce either rapid convergence to a deterministic policy or chaotic outputs caused by excessive uncertainty [6][10].

Group 2: Proposed Solution - SIREN
- The research team introduced a selective entropy regularization method (SIREN) built on three mechanisms: defining the exploration range, focusing on key decision points, and stabilizing the training process (a schematic sketch of these mechanisms follows this summary) [14][18].
- SIREN restricts entropy calculations to a core set of high-probability tokens, ensuring that exploration happens only among semantically reasonable candidates [14][15].
- It identifies key decision points in the generation sequence where entropy is significantly higher than average and concentrates the exploration incentive on these critical positions [16].
- The method regulates entropy toward a target value within a reasonable range, preventing training instability [17].

Group 3: Experimental Validation
- Experiments show that SIREN significantly improves performance across various models and datasets, achieving an average majority-vote accuracy (maj@k) of 54.6% on Qwen2.5-Math-7B and surpassing the strongest baseline by 4.8% [22][24].
- The effective exploration enabled by SIREN yields a fundamental improvement over traditional entropy regularization methods [25][32].
- The research indicates that SIREN maintains diversity in answers while avoiding collapse into degenerate outputs, contributing to a smoother and more controllable training process [28][30].

Group 4: Future Implications
- The study emphasizes the importance of stable, controllable, and efficient exploration for unlocking the potential of large models and overcoming performance bottlenecks [35].
- The proposed selective exploration control mechanism offers a feasible path toward refining exploration strategies in future reasoning-model training paradigms [35].
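A rough sketch of how the three mechanisms could fit together: restrict the entropy computation to the top-k candidate tokens, apply it only at positions whose entropy is well above average, and regularize toward a target value rather than simply maximizing entropy. This is a paraphrase of the mechanisms described above under stated assumptions (the hyperparameters k, tau, and target are placeholders), not the authors' released implementation.

```python
# Sketch of selective entropy regularization: restrict the entropy bonus to the top-k
# candidate tokens and to unusually uncertain positions, then pull it toward a target.
# Paraphrase of the ideas above; hyperparameters are placeholders, not the paper's values.
import torch
import torch.nn.functional as F

def selective_entropy_bonus(logits, k=20, tau=1.5, target=0.8):
    """logits: (batch, seq_len, vocab) from the policy model."""
    topk_logits, _ = logits.topk(k, dim=-1)               # keep only plausible candidates
    probs = F.softmax(topk_logits, dim=-1)
    ent = -(probs * probs.clamp_min(1e-9).log()).sum(-1)  # (batch, seq_len) token entropy

    # Focus on "key decision points": positions whose entropy is well above the mean.
    mask = ent > tau * ent.mean(dim=-1, keepdim=True)
    sel_ent = (ent * mask).sum() / mask.sum().clamp_min(1)

    # Stabilize: penalize deviation from a target entropy instead of maximizing it,
    # discouraging both collapse (too low) and explosion (too high).
    return -(sel_ent - target).pow(2)
```

In an RLVR setup, the returned bonus would be added to the policy objective alongside the verifiable reward.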
Musk's AI company is developing a "world model", poaching experts from NVIDIA and planning to release a game
Feng Huang Wang· 2025-10-13 03:21
Core Insights
- xAI, led by Elon Musk, is intensifying efforts to develop a "world model" to compete with Meta and Google on the next generation of AI systems capable of autonomous navigation and design in physical environments [1][2].
- A world model is a generative AI model that understands dynamic features of the real world, including physical and spatial properties, drawing on multiple types of input data [1].
- xAI has hired experts from NVIDIA to advance the development of these models, which are expected to push AI capabilities beyond current large language models [1][2].

Company Developments
- xAI has recruited two AI researchers, Zeeshan Patel and Ethan He, who have experience in world model development [2].
- The company plans to launch an AI-generated game by the end of next year, reaffirming its commitment to this goal [2].
- xAI recently released an upgraded image and video generation model, now available to users for free [2].

Industry Context
- Other leading AI labs, including Google and Meta, are also working on world models, indicating a competitive landscape [3].
- The potential market for world models is suggested to be close to the size of the current global economy, highlighting significant commercial interest [2].
- Challenges remain in finding sufficient data to simulate the real world and train these models, which is both difficult and costly [3].
Interview with the AirPods team: how does a tiny earbud learn to track 50 kinds of workouts?
36Ke· 2025-10-13 02:31
Core Insights
- The article discusses the advances in heart rate monitoring introduced with AirPods Pro 3, which achieves accuracy comparable to traditional chest straps [1][4][10].
- It highlights the use of the ear canal as a more effective physiological signal collection point than the wrist, leveraging infrared-light PPG technology [5][7][8].
- The integration of multiple sensors and algorithms allows AirPods Pro 3 to track a wide range of physical activities and heart rate in real time [15][16][17].

Group 1: Technology and Innovation
- AirPods Pro 3 can monitor heart rate with precision matching that of chest straps, especially during steady-state and interval running [1][3].
- The device uses infrared-light PPG for heart rate monitoring, which is more effective than the green LED light sources used in most wearables [7][10].
- Combining heart rate data with motion data from the accelerometer and gyroscope improves the accuracy of readings during various physical activities [8][15].

Group 2: Market Position and Competitive Edge
- Apple aims to position AirPods Pro 3 as a comprehensive fitness device, comparable to the Apple Watch, by understanding user activities and calorie expenditure [14][16].
- A Motion Foundation Model, trained on extensive real-world data, enables the device to recognize more than 50 different types of exercise [16][17].
- AirPods and Apple Watch are designed to complement each other, providing a more complete digital representation of the user's body [9][10].

Group 3: User Experience and Design
- The design of AirPods Pro 3 focuses on achieving a snug fit, which is crucial for accurate physiological monitoring [10][11].
- The device's ability to filter out external noise while monitoring internal signals reflects Apple's philosophy of technology augmenting human perception [17].
- The advances in sound quality and physiological monitoring suggest a shift in how audio devices are perceived: from mere sound output to sensors for self-awareness [17].
The robotics market through the lens of global AI data
2025-10-13 01:00
Summary of Conference Call on the AI and Robotics Industry

Industry Overview
- The AI industry is still in its early stages, with major companies committing investments of hundreds of billions to trillions of dollars, indicating substantial growth potential [1][3].
- AI-related computing power currently represents a small fraction of the overall economy, suggesting significant room for expansion [1][4].

Key Insights and Arguments
- The ratio of training to inference computing power is currently 1:1, indicating that the industry is still in the early investment phase [1][4].
- Robotics, as an application of AI, is accelerating, with companies such as Figure starting mass production of advanced robots [1][5].
- The U.S. market shows strong consumer willingness to spend on technology products, benefiting both the robotics and electric vehicle sectors [1][8].

Market Dynamics
- In the U.S. market, companies such as Taotao and Ecovacs are noteworthy for their strong channel transformation capabilities, while Chinese companies like Yushu are making inroads into North America [1][6].
- Average annual capital expenditure for U.S. tech giants ranges from $27 billion to $68 billion, with a return on investment (ROI) of roughly 40% to 50%, significantly higher than that of Chinese companies [1][6].

Economic Implications
- The rapid growth of the U.S. AI industry has driven up wages for AI-related personnel, contributing to inflation and creating a positive ROI cycle [1][7].
- Rising labor costs make AI technology more attractive to companies, further driving investment in AI and robotics [1][7].

Future Projections
- The electric vehicle market is expected to grow significantly, with projections of more than 10 million units sold by 2025 [1][12].
- The robotics sector is also anticipated to expand, with potentially high demand as the technology advances [1][12].

Investment Considerations
- When selecting stocks in the North American market, focus on companies with strong channel capabilities and those actively expanding into North America [1][9].
- Ongoing AI investment, projected to reach $60 billion annually by U.S. companies, is likely to trigger a wave of white-collar job replacement that eventually extends to blue-collar jobs [1][11].

Conclusion
- The AI and robotics sectors are poised for significant growth, driven by technological advances, strong consumer demand, and substantial investment from major companies [1][12].
Andrew Ng's new Agentic AI course: a hands-on guide to building agent workflows, where GPT-3.5 beating GPT-4 is almost a side effect
量子位· 2025-10-12 04:07
Core Concept
- The article discusses Andrew Ng's new course on Agentic AI, which emphasizes building workflows that mimic human-like task execution through decomposition, reflection, and optimization [1][9][74].

Summary by Sections

Agentic AI Overview
- Agentic AI focuses on breaking tasks down into manageable steps and improving them iteratively, rather than generating a single one-shot output [5][14][74].
- The course presents a systematic methodology behind Agentic AI, highlighting the importance of task decomposition and continuous optimization [9][10][74].

Core Design Patterns
- The course identifies four core design patterns for building agentic workflows: Reflection, Tool Usage, Planning, and Multi-agent Collaboration [3][17][44].

Reflection
- Reflection has the model assess its own outputs and consider improvements, which can be strengthened by pairing multiple models (a schematic reflection loop follows this summary) [18][21].
- Objective evaluation criteria can be established for assessing outputs, improving the quality of the model's self-correction [23][27].

Tool Usage
- Tool usage lets the model decide autonomously which functions to call, improving efficiency over traditional approaches in which developers wire in tools manually [28][34].
- The article notes the importance of a unified protocol for tool calls, which simplifies integrating a variety of tools [41][43].

Planning
- Planning enables the model to adjust the sequence of tool executions for different requests, optimizing performance and resource use [46][48].
- A practical technique is to convert execution steps into JSON or code form for clearer, more reliable task execution [47].

Multi-agent Collaboration
- Multi-agent collaboration creates multiple agents with different areas of expertise to tackle complex tasks, improving overall efficiency [51][52].
- This structured collaboration mirrors organizational structures, improving task division and scalability [52].

Iterative Improvement Process
- The article outlines a feedback loop for building agentic workflows consisting of sampling, evaluation, and improvement [59][60].
- Error analysis is crucial for optimizing the system, allowing targeted improvements based on specific performance issues [61][66].

Practical Insights
- The course offers practical guidance on selecting and testing different models, emphasizing iterative refinement in workflow design [68][70].
- Agentic AI represents a significant opportunity for developers to explore more complex, multi-step workflows, moving beyond traditional end-to-end agents [80].
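As a concrete reading of the Reflection pattern, the loop below drafts an answer, asks the model to critique it against explicit criteria, and then revises. The `chat` helper is a placeholder for whatever LLM client is in use; this is a schematic sketch, not code from the course.

```python
# Schematic reflection loop (draft -> critique -> revise), one of the four agentic
# design patterns discussed above. `chat` stands in for any LLM completion call.
def chat(prompt: str) -> str:
    raise NotImplementedError("plug in your LLM client here")

def reflect_and_revise(task: str, rounds: int = 2) -> str:
    draft = chat(f"Complete the task:\n{task}")
    for _ in range(rounds):
        # Critique step: ask for concrete problems measured against explicit criteria.
        critique = chat(
            "Review the answer below against the task. List concrete problems "
            f"(correctness, completeness, clarity).\nTask: {task}\nAnswer:\n{draft}"
        )
        # Revision step: rewrite the draft to address the listed problems.
        draft = chat(
            f"Revise the answer to fix these problems.\nTask: {task}\n"
            f"Answer:\n{draft}\nProblems:\n{critique}"
        )
    return draft
```

Using a second, different model for the critique step is one way to strengthen the pattern, as the course's pairing idea suggests.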
Feng Shuaizhang: some universities' degree programs are out of step with real-world demand
经济观察报· 2025-10-11 09:15
Core Viewpoint
- The employment situation, particularly for young people, is a social concern, but there is no need for excessive anxiety. The job market is relatively stable this year, with enterprises, graduates, and schools actively adjusting to the new employment landscape. Future attention should focus on the quality, rather than the quantity, of higher education expansion [1][2][5][7].

Employment Market Overview
- The number of college graduates is expected to reach a record high of 12.22 million in 2025, an increase of 430,000 over the previous year [2].
- As of August, the unemployment rate for urban labor aged 16-24 reached 18.9%, up 1.1 percentage points from July and the highest since the new statistical standard was introduced in December 2023 [2].
- The overall employment market is stable compared with last year, with no significant fluctuations, which can be read as a positive sign in the current macroeconomic context [5][6].

Higher Education and Employment Quality
- The degree programs offered by existing higher education institutions need significant adjustment to align with actual market demand [7][8].
- Caution is advised on further expanding higher education, with emphasis on maintaining educational quality rather than merely increasing enrollment [7][8].

Recommendations for Graduates
- Graduates are encouraged to seek employment actively with market demand in mind, rather than focusing solely on salary and job stability [9].
- Key strategies for students include consolidating their professional knowledge, embracing new technologies, and taking internships to better understand market needs [9].

Flexible Employment Trends
- The new flexible employment sector divides into two categories: cloud-based work and location-based work. The latter, such as delivery and ride-hailing services, is approaching saturation because local market demand is limited [12][13].
- The total number of platform workers in China has reached 247 million, or 28.6% of the working-age population, with full-time and part-time workers roughly equal in number [18].

Social Security and Policy Recommendations
- There is a pressing need to strengthen social security for flexibly employed groups, particularly in light of an aging population [16].
- Policies should encourage platforms to help flexible workers obtain social insurance, even where formal labor contracts are not in place [17][18].
Peking University & Zuoyebang team propose Interactive-T2S, a new Text-to-SQL framework that tackles wide-table processing and low-resource alignment
AI前线· 2025-10-11 04:14
Core Insights
- The article discusses the Interactive-T2S framework, which turns large language models (LLMs) into intelligent query agents capable of multi-turn interaction with databases, addressing inefficiencies in handling complex, wide tables [2][5][6].

Text-to-SQL Technology
- Text-to-SQL serves as a bridge between natural language and databases, letting users convert natural-language questions into executable SQL without knowing SQL syntax, which is valuable in settings such as enterprise data analysis and public services [4].

Challenges in Current LLM-based Text-to-SQL Methods
- Existing methods face three main challenges: inefficiency when processing wide tables, poor adaptability in low-resource scenarios, and a lack of interpretability in the interaction process [5][8].

Interactive-T2S Framework
- Interactive-T2S treats the LLM as an intelligent query agent and the database as its environment, using multi-turn interaction to generate and validate SQL step by step, and requires only two annotated examples for few-shot learning (an illustrative interaction loop follows this summary) [6][10].

Core Tools of Interactive-T2S
- The framework provides four core tools designed to reduce the reasoning burden on the LLM [7][12]:
  - SearchColumn for semantic column identification
  - SearchValue for fuzzy value search
  - FindShortestPath for table association (finding join paths between tables)
  - ExecuteSQL for real-time execution and validation of SQL queries

Experimental Validation
- Experiments on multiple datasets show that Interactive-T2S outperforms existing methods in execution accuracy and efficiency, particularly in complex and noisy data environments [11][14][15].

Application Value and Future Directions
- Interactive-T2S has potential applications in smart education, enterprise data analysis, and public-service queries, simplifying data retrieval for users [18]. Future work will focus on improving tool efficiency and exploring multimodal data queries [19].
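The interaction loop can be pictured as an agent that repeatedly asks the LLM for the next tool call and feeds the observation back until a query executes cleanly. The four tool names follow the paper; the dispatch logic, JSON action format, and return values below are hypothetical, intended only to convey the shape of the loop.

```python
# Sketch of a multi-turn Text-to-SQL loop: the LLM picks one of four tools per turn
# (SearchColumn, SearchValue, FindShortestPath, ExecuteSQL) and sees each result before
# deciding its next step. Tool names follow the paper; everything else is hypothetical.
import json

def run_interactive_t2s(llm, tools: dict, question: str, max_turns: int = 10):
    history = [{"role": "user", "content": question}]
    for _ in range(max_turns):
        # Assumed convention: the LLM replies with a JSON action,
        # e.g. {"tool": "SearchColumn", "args": {"keyword": "revenue"}}.
        action = json.loads(llm(history))
        tool_name, args = action["tool"], action.get("args", {})
        result = tools[tool_name](**args)
        if tool_name == "ExecuteSQL" and result.get("ok"):
            return args.get("sql"), result.get("rows")   # query executed cleanly: done
        # Otherwise feed the observation back so the next turn can refine the query.
        history.append({"role": "assistant", "content": json.dumps(action)})
        history.append({"role": "tool", "content": json.dumps(result)})
    raise RuntimeError("no valid SQL produced within the turn budget")
```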
Zhongkang Technology Tiangong-1: adaptation to the frontier large language model DeepSeek-V3.2-Exp completed, continuing to deepen an open AI application ecosystem for the health industry
Ge Long Hui· 2025-10-11 02:03
Core Insights
- Zhongkang Technology's Tiangong-1 platform has completed adaptation of the advanced language model DeepSeek-V3.2-Exp, underscoring a dual strategy of technological independence and ecosystem openness [1][2].

Group 1: Technology and Innovation
- The Tiangong-1 platform serves as the AI application capability hub for the health industry, built on a dual-core architecture driven by the self-developed "Zhuomuniao" medical model and the "Tiangong-1" decision-making model [1].
- This architecture combines the domain expertise of the medical field with the broad applicability of business decision-making, securing Tiangong-1's leading position and professional moat in the complex health industry [1].

Group 2: Ecosystem and Product Offering
- The Tiangong-1 agent ecosystem is designed as a combination of "supermarket" and "factory": it provides standardized agent products covering medicine, pharmacy, patients, and management, letting users quickly address common needs [2].
- The platform also offers agent-building tools that let clients customize agents around their own business processes, giving them proprietary agent assets and enabling continuous evolution of core capabilities [2].
- Adapting high-quality third-party models such as DeepSeek-V3.2-Exp significantly enriches the "raw materials" available under the "factory" model, allowing enterprises to mix and match models according to task performance, cost, and efficiency requirements and achieve a "1+1>2" synergy [2].