Large Language Models (LLMs)
The Man Who Taught the World to Talk to AI Officially Joins DeepMind, and Prompt Engineering Reaches Legendary Status
36Kr· 2025-10-24 12:57
Group 1
- The core point of the article is the rise of prompt engineering as a profession, highlighted by Riley Goodside recently joining Google DeepMind, a significant milestone for the field [1][6][12]
- Riley Goodside became famous for reportedly earning over one million dollars a year conversing with AI, particularly ChatGPT, which popularized the role of the prompt engineer [1][6][12]
- The profession of prompt engineering has gained legitimacy and importance over the past three years, contrary to initial skepticism about its sustainability [12][9]

Group 2
- DeepMind CEO Demis Hassabis and product lead Logan Kilpatrick publicly welcomed Goodside, signaling the significance of his role within the company [2][3]
- Goodside holds a computer science degree from PennWest California and held data-related roles at various companies, showcasing his expertise in the field [8]
- The article traces the evolution of prompt engineering, emphasizing its role as a frontier in the development of large language models (LLMs) and the importance of effective prompt design [13][12]

Group 3
- Goodside's notable contributions include designing advanced prompts that extend what AI models can do, demonstrating prompt engineering's potential to unlock models' full capabilities [19][10]
- The article discusses "glitch tokens": specific tokens in AI models that can trigger unexpected outputs, illustrating the low-level intricacies prompt engineers work with (see the sketch after this list) [15][16]
- Goodside's work bridges traditional programming and the newer paradigm of steering AI through natural-language prompts [9][13]
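For readers curious what hunting glitch tokens looks like in practice, below is a minimal sketch, assuming the `transformers` library and a GPT-2 checkpoint, of one published heuristic: tokens whose embeddings sit unusually close to the vocabulary centroid tend to be under-trained and can behave erratically (this is how tokens like " SolidGoldMagikarp" were originally surfaced). It is an illustrative probe, not Goodside's actual methodology.

```python
# Probe for glitch-token candidates: tokens whose embeddings cluster near the
# vocabulary centroid were rarely updated during training, a known heuristic
# for finding tokens that elicit erratic model behavior.
import torch
from transformers import GPT2Tokenizer, GPT2LMHeadModel

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")

emb = model.transformer.wte.weight.detach()   # (vocab_size, d_model) embedding matrix
centroid = emb.mean(dim=0)
dist = (emb - centroid).norm(dim=1)           # each token's distance to the centroid

# The tokens nearest the centroid are the glitch-token candidates.
for idx in dist.argsort()[:10]:
    print(f"{float(dist[idx]):.3f}  {tokenizer.decode([int(idx)])!r}")
```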
Karpathy Praises DeepSeek-OCR for "Retiring" the Tokenizer! Hands-On: Using Claude Code to Run the New Model on an NVIDIA GPU
AI前线· 2025-10-21 04:54
Core Insights
- DeepSeek has released a new model, DeepSeek-OCR, a 6.6GB model fine-tuned specifically for OCR that achieves near-lossless compression at 10× and retains about 60% accuracy at 20× compression [2]
- The model introduces DeepEncoder to navigate the trade-offs among high resolution, low memory, and few tokens, achieving state-of-the-art performance in practical scenarios with minimal token consumption [2][4]
- The model's architecture is lightweight, consisting of only 12 layers, which suits the pattern-recognition nature of OCR tasks [5]

Model Innovations
- DeepSeek-OCR renders source content as images before input, yielding more efficient information compression and a richer information flow (see the sketch after this list) [6]
- The model eliminates the need for tokenizers, which have been criticized for inefficiency and historical baggage, enabling a more seamless end-to-end process [6]
- It employs a Mixture-of-Experts design, activating only 500 million parameters during inference, allowing efficient processing of large datasets [7]

Market Position and Future Implications
- Alexander Doria, co-founder of Pleiasfr, views DeepSeek-OCR as a milestone achievement that lays a foundation for future OCR systems [4][8]
- The training pipeline includes a significant amount of synthetic and simulated data; while the model balances inference efficiency and performance, further domain-specific customization is needed for large-scale real-world applications [8]

Developer Engagement
- The release has attracted many developers; Simon Willison got the model running on an NVIDIA Spark in about 40 minutes, showcasing its accessibility and ease of use [9][21]
- Willison emphasized that a clear environment and task definition were key to the successful setup, highlighting the model's practical utility [24]
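As a loose illustration of the optical-compression idea, here is a back-of-envelope sketch, with an assumed patch size and downsampling factor rather than DeepSeek-OCR's real pipeline, comparing the vision-token budget of a rendered page against the text-token cost of the same passage.

```python
# Back-of-envelope optical compression: render text to an image, then compare
# a vision-token budget (one token per patch after downsampling -- an assumed
# scheme, not DeepSeek-OCR's exact pipeline) against the rough text-token cost.
from PIL import Image, ImageDraw

text = ("DeepSeek-OCR renders documents as images and feeds the vision tokens "
        "to the decoder, so one page costs far fewer tokens than its raw text. ") * 20

img = Image.new("RGB", (1024, 1024), "white")
ImageDraw.Draw(img).multiline_text((8, 8), text, fill="black")

patch, downsample = 16, 4                       # assumed patching scheme
vision_tokens = (img.width // (patch * downsample)) ** 2
text_tokens = int(len(text.split()) * 1.3)      # crude words-to-tokens ratio

print(f"text tokens ~{text_tokens}, vision tokens {vision_tokens}, "
      f"compression ~{text_tokens / vision_tokens:.1f}x")
```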
Musk Personally Calls on Karpathy to Face Off Against Grok 5; Don't Mythologize LLMs, AGI Is Still a Decade Away
36Kr· 2025-10-21 02:21
Core Insights
- A path to Artificial General Intelligence (AGI) is acknowledged to exist but is fraught with challenges, with a timeline of approximately 10 years suggested for its realization [1][3][12]

Group 1: Challenges in Achieving AGI
- Karpathy highlights several significant obstacles, including sparse reinforcement-learning signals, the risk of model collapse, and the need for better environments and evaluation frameworks [2][3]
- He critiques the current hype surrounding AI, arguing that the industry has overestimated the intelligence of existing systems [1][3]

Group 2: Perspectives on AGI Timeline
- The 10-year timeline is conservative compared to the current hype, reflecting a more realistic setting of expectations in the field [12][15]
- Karpathy believes that despite substantial progress in large language models (LLMs), considerable work remains before a fully autonomous AGI can outperform humans at all tasks [17][18]

Group 3: Reinforcement Learning and Learning Paradigms
- Karpathy is skeptical about the effectiveness of traditional reinforcement learning (RL), suggesting it may not be the complete solution for developing AGI [21][24]
- He advocates alternative learning paradigms, such as "agentic interaction", which could give LLMs better opportunities to engage with their environments [24][25]

Group 4: Collaboration vs. Competition
- In a notable exchange, Elon Musk challenged Karpathy to a programming duel against Grok 5, which Karpathy declined, preferring collaboration over competition [4][5]
- This reflects a broader industry sentiment that refining tools and methodologies matters more than competitive showdowns [9][32]

Group 5: Future of AI and Automation
- Karpathy expects AI to raise productivity across sectors, with automation complementing human roles rather than completely replacing them [34]
- He suggests the future of AI will involve a careful balance of human oversight and AI capability, particularly in programming and decision-making [32][33]
World Models: Can Machines Understand Reality?
36Kr· 2025-10-20 13:01
Core Concept
- The article discusses "world models" in artificial intelligence (AI): internal representations of the environment that AI systems use to evaluate predictions and decisions before executing tasks (see the planning sketch after this section) [1][4]

Group 1: Definition and Importance of World Models
- World models are considered essential for building intelligent, scientific, and safe AI systems, as emphasized by leading figures in deep learning [1]
- The idea has historical roots in Kenneth Craik's 1943 proposal of a "small-scale model" in the brain that lets organisms simulate various scenarios before acting [2]

Group 2: Historical Context and Evolution
- Early AI systems such as SHRDLU demonstrated world models but struggled with the scale and complexity of real-world environments [3]
- The rise of machine learning and deep learning revitalized the concept, allowing AI to build internal approximations of environments through trial and error [3]

Group 3: Current Challenges and Perspectives
- Despite their potential, researchers still lack consensus on the definition, contents, and verification of world models [2]
- Current generative AI models, including large language models (LLMs), exhibit heuristic rules but lack a coherent, unified world model, leading to inconsistent outputs [4][6]

Group 4: Future Directions and Research Focus
- Researchers are exploring how to build robust, verifiable world models that could improve AI reliability and interpretability [6][7]
- Opinions differ on how to get there: some expect world models to emerge naturally from sufficient multimodal training data, while others advocate entirely new architectures [7]
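To make the core concept concrete, here is a minimal planning sketch in which toy stand-ins for a learned dynamics model and a learned reward model (`dynamics` and `reward`, both assumptions) let an agent score candidate actions inside its internal model of the world before acting.

```python
# Minimal world-model planning loop: the agent imagines rollouts with its
# internal model of the environment and picks the best-scoring action.
import random

def dynamics(state: float, action: float) -> float:
    """Learned transition model (toy stand-in): predicts the next state."""
    return state + action + random.gauss(0, 0.01)

def reward(state: float) -> float:
    """Learned reward model (toy stand-in): prefers states near a goal of 1.0."""
    return -abs(1.0 - state)

def plan(state: float, candidates: list[float], horizon: int = 5) -> float:
    """Pick the action whose imagined rollout scores best in the world model."""
    def rollout_value(action: float) -> float:
        s = state
        for _ in range(horizon):
            s = dynamics(s, action)
        return reward(s)
    return max(candidates, key=rollout_value)

print(plan(0.0, candidates=[-0.2, 0.0, 0.2]))  # picks the action that moves toward 1.0
```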
LLM Memory Management Finally Doesn't Need Hand-Holding: A New Framework Lets Agents Manage Their Memory Systems Autonomously
量子位· 2025-10-20 10:29
Core Insights
- The article introduces Mem-α, an innovative reinforcement-learning framework designed to let large language models (LLMs) autonomously manage complex memory systems, moving away from manual design and predefined instructions [2][4][14]

Memory Management Challenges
- Traditional memory-enhanced agents depend on predefined instructions and tools for memory updates, which can lead to suboptimal memory construction and information loss, particularly in long-term interactions [7][9][8]
- Because LLM context windows are finite, external memory systems are crucial for understanding long-term information [5][6]

Mem-α Framework
- Mem-α recasts memory construction as a sequential decision-making problem optimized through reinforcement learning, allowing agents to explore optimal memory-management strategies while processing information [14][16]
- The framework incorporates a memory system inspired by cognitive science, consisting of core, episodic, and semantic memory, each supporting its own memory operations (see the sketch after this list) [22][20]

Training and Evaluation
- Mem-α optimizes memory construction with a multi-dimensional reward function covering accurate retrieval, test-time learning, long-range understanding, and conflict resolution [18][28]
- Experimental results show that Mem-α significantly outperforms existing methods, achieving higher accuracy and efficient memory usage while maintaining performance [35][36]

Key Findings
- Mem-α leads across all tasks, particularly accurate retrieval and long-range understanding, indicating strong generalization [35]
- It reduces memory usage by approximately 50% compared to traditional methods while improving performance, validating its semantic-compression mechanism [35]
- Its structured architecture proves essential for processing complex information, highlighting the limits of flat memory representations [35]
- Mem-α generalizes robustly to documents exceeding 400K tokens despite being trained on documents averaging under 30K tokens [35]
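The three-tier layout described above can be pictured with a short sketch; the class and operation names below are illustrative assumptions, not Mem-α's actual API.

```python
# Illustrative three-tier agent memory (core / episodic / semantic) with the
# kinds of operations the article describes: insert, update, and compression.
from dataclasses import dataclass, field

@dataclass
class AgentMemory:
    core: str = ""                                           # persona/task facts, always in context
    episodic: list[str] = field(default_factory=list)        # time-ordered events
    semantic: dict[str, str] = field(default_factory=dict)   # distilled facts

    def insert_event(self, event: str) -> None:
        self.episodic.append(event)

    def update_fact(self, key: str, value: str) -> None:
        # Conflict resolution: later writes overwrite stale facts.
        self.semantic[key] = value

    def compress(self, keep_last: int = 3) -> None:
        # Semantic compression: fold old episodes into facts, then drop them.
        for i, ev in enumerate(self.episodic[:-keep_last]):
            self.semantic[f"episode_{i}"] = ev[:40]   # crude summary stand-in
        self.episodic = self.episodic[-keep_last:]

mem = AgentMemory(core="User is a vegetarian software engineer.")
for day in range(6):
    mem.insert_event(f"day {day}: discussed project status")
mem.compress()
print(len(mem.episodic), len(mem.semantic))  # 3 episodes kept, 3 folded into facts
```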
Microsoft's BitDistill Compresses LLMs to 1.58 Bits: 10x Memory Savings, 2.65x Faster CPU Inference
机器之心· 2025-10-20 07:48
Core Insights
- The article discusses the challenges of deploying large language models (LLMs) efficiently in downstream applications, particularly on resource-constrained devices like smartphones, due to high memory and computational costs [1][7]
- A new approach called BitDistill is introduced, which compresses existing pre-trained LLMs into a 1.58-bit BitNet model while minimizing performance loss and training costs [4][19]

Group 1: Challenges and Solutions
- Deployment challenges grow with model scale, and quantizing to low-bit representations can destabilize training and degrade performance [2][10]
- Extreme low-bit LLMs such as BitNet reduce memory usage and accelerate inference, but matching the accuracy of high-precision models normally requires extensive pre-training [1][4]

Group 2: BitDistill Framework
- BitDistill consists of three key stages: model refinement, continued pre-training, and distillation-based fine-tuning (see the sketch after this list) [8][12]
- The first stage addresses activation-variance issues in low-bit models by adding normalization layers that stabilize optimization [9][30]
- The second stage continues training on a small amount of pre-training data to adapt the model to the 1.58-bit representation before fine-tuning on specific tasks [11][32]
- The third stage uses knowledge distillation to align the quantized model's performance with that of the full-precision teacher [13][27]

Group 3: Experimental Results
- BitDistill scales well, matching full-precision baselines while delivering roughly 2x faster inference and nearly 10x lower memory usage [19][20]
- On text classification and summarization, the 1.58-bit model maintains high accuracy and quality across various model sizes [16][21]
- The method generalizes across architectures, remaining stable with different pre-trained backbones [22]

Group 4: Ablation Studies
- Ablations indicate every stage is crucial for the efficiency-accuracy balance; removing any stage causes significant performance drops [25][26]
- Combining logits and attention distillation yields the best results, highlighting the value of multiple strategies for mitigating quantization challenges [27][29]
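Two of the mechanisms above can be sketched compactly: the ternary ("1.58-bit") weight quantizer used by BitNet-style models (absolute-mean scaling with round-and-clip), and a fine-tuning loss that distills both logits and attention maps from a full-precision teacher. Function names are illustrative, not Microsoft's code.

```python
# Sketch of 1.58-bit quantization and combined logits/attention distillation.
import torch
import torch.nn.functional as F

def ternary_quantize(w: torch.Tensor) -> torch.Tensor:
    """Map weights to {-1, 0, +1} * scale, with scale = mean(|w|)."""
    scale = w.abs().mean()
    return torch.clamp((w / (scale + 1e-8)).round(), -1, 1) * scale

def distill_loss(student_logits, teacher_logits,
                 student_attn, teacher_attn, tau: float = 2.0):
    """Logits KL plus attention-map MSE, the combination the ablations favor."""
    kl = F.kl_div(F.log_softmax(student_logits / tau, dim=-1),
                  F.softmax(teacher_logits / tau, dim=-1),
                  reduction="batchmean") * tau ** 2
    return kl + F.mse_loss(student_attn, teacher_attn)

print(ternary_quantize(torch.randn(4, 4)))            # entries are in {-s, 0, +s}
s_logits, t_logits = torch.randn(2, 10), torch.randn(2, 10)
s_attn, t_attn = torch.rand(2, 4, 4), torch.rand(2, 4, 4)
print(distill_loss(s_logits, t_logits, s_attn, t_attn))
```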
Karpathy: Reinforcement Learning Is Terrible, but Everything Else Is Worse
量子位· 2025-10-18 09:30
Group 1
- The core viewpoint of the article is that achieving Artificial General Intelligence (AGI) will take at least another decade, as current AI systems need significant improvements to reach their full potential [5][10][28]
- Karpathy emphasizes that existing AI systems lack maturity, multimodal capability, and the ability to learn continuously, all essential for effective collaboration with humans [8][9][10]
- He critiques the current state of large language models (LLMs), which suffer from cognitive deficits and whose capabilities are widely overestimated, requiring substantial enhancement [16][18]

Group 2
- Karpathy argues that reinforcement learning is more flawed than commonly perceived: it reinforces every step taken on the way to a correct answer regardless of each step's validity, which makes learning inefficient (see the sketch after this list) [20][21][23]
- He believes AGI will not produce a sudden leap in productivity but will follow a gradual growth pattern, similar to the historical 2% GDP growth trend observed with the internet [25][29]
- The lengthy development of autonomous driving reflects the high stakes involved: even minor errors can have severe consequences, necessitating extensive reliability work [30][32][33]

Group 3
- As a full-time educator, Karpathy aims to establish a leading-edge educational institution offering a distinctive mentorship experience, focused on personalized learning and advanced AI education [34][36]
- He highlights the importance of tailored teaching methods, which current LLMs cannot replicate, emphasizing the need for human instructors who can give students appropriately calibrated challenges [36][38]
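The credit-assignment complaint can be shown in a few lines: in outcome-based policy-gradient training, a single terminal reward is broadcast uniformly across the trajectory, so a bad intermediate step in a lucky rollout is reinforced just as strongly as the good ones. The numbers below are purely illustrative.

```python
# Toy REINFORCE-style update: one scalar reward at the end is shared by every
# step, so the bad detour (step 2) gets the same credit as the good steps.
import torch

log_probs = torch.tensor([-0.5, -2.0, -0.1], requires_grad=True)  # step 2 was a detour
reward = 1.0                                                      # final answer was right

loss = -(reward * log_probs).sum()
loss.backward()
print(log_probs.grad)  # tensor([-1., -1., -1.]): equal credit, good steps or bad
```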
The Latest Survey on Self-Evolution! From Static Models to Lifelong Evolution...
自动驾驶之心· 2025-10-17 00:03
Core Viewpoint
- The article discusses the limitations of current AI agents, which rely heavily on static configurations and struggle to adapt to dynamic environments, and introduces "self-evolving AI agents" as a solution, providing a systematic framework for their development and implementation [1][5][6]

Summary by Sections

Need for Self-Evolving AI Agents
- The rapid development of large language models (LLMs) has shown the potential of AI agents in various fields, but they remain fundamentally limited by their dependence on manually designed static configurations [5][6]

Definition and Goals
- Self-evolving AI agents are defined as autonomous systems that continuously and systematically optimize their internal components through interaction with their environment, adapting to changes in tasks, context, and resources while ensuring safety and performance [6][12]

Three Laws and Evolution Stages
- The article outlines three laws for self-evolving AI agents, inspired by Asimov's laws, which serve as constraints during design [8][12], and describes a four-stage evolution of LLM-driven agents from static models to self-evolving systems [9]

Four-Component Feedback Loop
- A unified technical framework is proposed, consisting of four components: system inputs, agent systems, environments, and optimizers, which operate as a feedback loop that drives agent evolution (see the sketch after this section) [10][11]

Technical Framework and Optimization
- Optimization of self-evolving AI is categorized into three main directions: single-agent, multi-agent, and domain-specific, with techniques and methodologies detailed for each [20][21][30]

Domain-Specific Applications
- The paper highlights applications in biomedicine, programming, finance, and law, each demanding tailored approaches to its unique challenges [30][31][33]

Evaluation and Safety
- The article stresses the need for evaluation methods that measure the effectiveness of self-evolution and addresses associated safety concerns, proposing continuous monitoring and auditing mechanisms [34][40]

Future Challenges and Directions
- Key challenges include balancing safety with evolution efficiency, improving evaluation systems, and enabling cross-domain adaptability [41][42]

Conclusion
- The ultimate goal of self-evolving AI agents is systems that collaborate with humans as partners rather than merely executing commands, marking a significant shift in the understanding and application of AI technology [42]
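A minimal sketch of that feedback loop follows; the toy agent, environment, and optimizer below (including the prompt-mutation step) are assumptions made for exposition, not the survey's concrete algorithms.

```python
# Four-component feedback loop: system inputs -> agent system -> environment
# -> optimizer, with the optimizer evolving the agent's configuration.
def agent(prompt_template: str, task: str) -> str:
    return prompt_template.format(task=task)           # stand-in for an LLM call

def environment(answer: str) -> float:
    return 1.0 if "step by step" in answer else 0.3    # stand-in reward signal

def optimizer(template: str, score: float) -> str:
    # Evolve the agent's configuration when environment feedback is poor.
    return template + " Think step by step." if score < 0.5 else template

template = "Solve: {task}."
for _ in range(3):                                     # the feedback loop
    answer = agent(template, task="2 + 2")
    score = environment(answer)
    template = optimizer(template, score)
print(template)  # the prompt has evolved in response to environment feedback
```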
After Sutton Declared "LLMs Are a Dead End", a New Interview Reveals AI's Predicament
机器之心· 2025-10-15 07:33
Core Viewpoint
- The article discusses Rich Sutton's critical perspective on large language models (LLMs): they may not align with the principles outlined in his essay "The Bitter Lesson", and they are limited in learning from real-world interactions [1][3][22]

Group 1: Limitations of LLMs
- Sutton argues that LLMs have significant flaws, particularly their inability to learn from ongoing interaction with the environment [3][21]
- He emphasizes that true intelligence should emerge from continual reinforcement learning through dynamic interaction, rather than from extensive pre-training and supervised fine-tuning [3][4][22]
- Reliance on human knowledge and data may limit scalability and disappoint expectations, since LLMs are fundamentally bounded by the biases in their training data [24][25][26]

Group 2: Alternative Perspectives on Intelligence
- Experts in the discussion, including Suzanne Gildert and Niamh Gavin, are skeptical that pure reinforcement learning is achievable, noting that current systems often revert to imitation learning because universal reward functions are hard to define [7][11]
- The conversation highlights the need for systems that learn autonomously in new environments, the way a squirrel learns to cache nuts, rather than relying solely on pre-existing data [8][10]
- There is consensus that while LLMs exhibit impressive capabilities, they do not amount to true intelligence, as they cannot effectively explore and learn from their environment [33][35]

Group 3: The Future of AI Development
- The AI field is at a crossroads where the dominance of certain paradigms may hinder innovation and feed a cycle of self-limitation [28][29]
- Sutton warns that the current trajectory of LLMs, heavily reliant on imitating humans, may not yield the breakthroughs needed for genuine understanding and reasoning [22][24]
- The discussion points toward more robust learning mechanisms that prioritize experience and exploration over mere data absorption [28][30]
Your Own ChatGPT in 4 Hours: Karpathy Is at It Again, Admits Agents Were More Hindrance Than Help and Hand-Wrote 8,000 Lines of Code; Netizens: Finish the Run and You're a Machine Learning Engineer
36Kr· 2025-10-14 12:52
Core Insights
- Andrej Karpathy, former AI director at Tesla and co-founder of OpenAI, has released a new open-source project called nanochat, which has gained 7.9k stars on GitHub [1]
- Nanochat is a minimalist end-to-end training and inference toolchain designed to replicate a simplified version of ChatGPT, distinct from Karpathy's previous project, nanoGPT [1][6]

Project Overview
- Nanochat lets users train a conversational language model for approximately $100, surpassing GPT-2 on the CORE metric after about 12 hours of training [2][3]
- The project can be initiated by launching a cloud GPU server and running a script, after which users can interact with their trained model via a web interface [2]

Technical Specifications
- The project consists of around 8,000 lines of code, primarily handwritten by Karpathy, with an emphasis on a clear code structure [7]
- The architecture is similar to the Llama model but simpler, incorporating elements from modded-nanoGPT [7][8]
- Key features include dense transformers, rotary embeddings, and an optimizer that pairs Muon with AdamW (see the sketch after this list) [8][9]

Performance Metrics
- Metrics across training stages show improvements on CORE, ARC-Challenge, ARC-Easy, GSM8K, HumanEval, and MMLU [5]

Community Impact
- The release has generated significant interest on social media, with users excited about its potential to democratize access to language-model training [10]
- The project is expected to serve as a valuable resource for researchers and machine-learning enthusiasts experimenting with language models [10]
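The Muon-plus-AdamW pairing can be sketched as a parameter-group split: a Muon-style orthogonalized-momentum update for 2-D weight matrices and AdamW for everything else. The Newton-Schulz coefficients follow Keller Jordan's public Muon write-up; treat the learning rates and the exact grouping as assumptions rather than nanochat's actual configuration.

```python
# Parameter-group split: Muon-style updates for weight matrices, AdamW for the rest.
import torch

def newton_schulz_orthogonalize(g: torch.Tensor, steps: int = 5) -> torch.Tensor:
    """Approximately orthogonalize the gradient/momentum matrix (core of Muon)."""
    a, b, c = 3.4445, -4.7750, 2.0315
    x = g / (g.norm() + 1e-7)
    for _ in range(steps):
        s = x @ x.T
        x = a * x + (b * s + c * (s @ s)) @ x
    return x

model = torch.nn.Sequential(torch.nn.Linear(8, 8), torch.nn.LayerNorm(8))
matrix_params = [p for p in model.parameters() if p.ndim == 2]
other_params = [p for p in model.parameters() if p.ndim != 2]
adamw = torch.optim.AdamW(other_params, lr=1e-3)   # norms, biases (and embeddings)

loss = model(torch.randn(4, 8)).square().mean()    # toy training step
loss.backward()
for p in matrix_params:                            # Muon-style matrix update
    if p.grad is not None:
        p.data -= 0.02 * newton_schulz_orthogonalize(p.grad)
adamw.step()                                       # AdamW handles the rest
```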