Information Theory
The Underlying Logic of Agents, Explained in One Article
Hu Xiu· 2025-10-22 14:47
This article grew out of a year and a half of hands-on AI development, and out of a strong impression left by nearly two months of intensive conversations with many teams since I left my job. I found that when we discuss Agents, we often fall into one of two misconceptions: some people mystify them, believing they can do anything; others oversimplify them as "just calling ChatGPT a few more times." Because people lack both a felt sense of the agentic loop and an understanding of its principles, our mental models end up misaligned and communication becomes costly.

That is why I wrote this long piece: to give practitioners like us a shared intuition and common ground about Agents. The qualitative leap in AI Agent capability comes not only from the ever-growing intelligence of the underlying large models; more critically, it comes from the effective "cognitive workflow" we design around the model.

The article runs close to ten thousand characters and is devoted to building that intuition and fully dissecting this "workflow." You can use this guide to jump straight to the parts that interest you:

Part 1: Building an intuitive understanding. Here I use the metaphor of "the five growth stages of a top student" to describe how an Agent's core capabilities evolve. We also analyze the "travel planning" case that the industry uses so widely. It works like a standard exam question: in the comparison, we can clearly see the essential difference between a dynamic process and one-shot generation.

Part 2: The core for developers. Section 4 is the technical heart of the article. It dissects in detail the threefold value the "workflow" brings: how to use ...
A Year and a Half of Agent Development in Review: Our Understanding of Agents Is Misaligned, and an Effective "Cognitive Workflow" Is Key
Founder Park· 2025-10-22 12:46
Core Insights
- The article emphasizes the importance of understanding AI Agents and their cognitive processes, arguing that the true power of AI Agents lies not in the models themselves but in the effective cognitive workflows designed around them [1][2][3].

Group 1: Understanding AI Agents
- The author identifies two common misconceptions about AI Agents: one is the mystification of their capabilities, and the other is the oversimplification of their functions [1][2].
- A unified context is proposed to help practitioners understand what is meant by "Agentic" discussions, focusing on the cognitive processes that enhance AI capabilities [2][3].

Group 2: Development Framework
- The article outlines a comprehensive framework for understanding the evolution of AI Agents, using a metaphor of a student's growth stages to illustrate the development of core capabilities [3][15].
- It discusses the transition from "prompt engineers" to "Agent process architects," highlighting the need for structured cognitive workflows that enhance AI performance [5][62].

Group 3: Cognitive Processes
- The article breaks down the cognitive processes into several key components: Planning, Chain of Thought (CoT), Self-Reflection, and Tool Use, each contributing to the overall effectiveness of AI Agents (a minimal loop sketch follows this summary) [4][20][24].
- The importance of iterative processes is emphasized, showcasing how reflection and memory compression can lead to improved decision-making and learning [40][43].

Group 4: Practical Applications
- A detailed comparison is made between traditional chatbots and AI Agents using a travel planning example, illustrating how AI Agents can dynamically adjust plans based on real-time information [27][30].
- The article highlights the significance of structured workflows in achieving high-quality, reliable outcomes, contrasting the static nature of traditional chatbots with the dynamic capabilities of AI Agents [35][36].

Group 5: Theoretical Foundations
- The effectiveness of AI Agents is linked to foundational theories in Cybernetics and Information Theory, which explain how feedback loops and information acquisition reduce uncertainty in problem-solving [50][59].
- The article argues that the closed-loop nature of AI Agents allows them to continuously refine their actions based on observed outcomes, enhancing their ability to achieve set goals [55][58].

Group 6: Future Directions
- The article concludes with a call for a shift in focus from merely crafting prompts to designing intelligent processes that enable AI to self-plan, self-correct, and self-iterate [62][70].
- It emphasizes the need for performance engineering to address the challenges of execution efficiency while maintaining high-quality outcomes in AI applications [70][72].
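The contrast the article draws between a one-shot chatbot reply and an agent's closed loop is easiest to see in code. Below is a minimal sketch of a plan-act-observe-reflect cycle of the kind described above; the `call_llm` stub, the JSON action format, and the toy tools are illustrative assumptions, not the author's implementation or any particular vendor's API.

```python
# Minimal agentic loop sketch: decide -> act (tool call) -> observe -> repeat.
# `call_llm` is a placeholder for a chat-completion call; the tools are toys.
import json

def call_llm(messages):
    """Placeholder LLM call; a real agent would send `messages` to a model and
    get back a JSON-encoded action such as {"action": "search", "input": "..."}."""
    return json.dumps({"action": "finish", "answer": "stub answer"})

TOOLS = {
    "search": lambda q: f"top results for {q!r}",   # stand-in web search
    "calculator": lambda expr: str(eval(expr)),     # toy calculator (demo only)
}

def run_agent(goal, max_steps=5):
    messages = [
        {"role": "system", "content": "Plan, call tools, reflect on results, then answer."},
        {"role": "user", "content": goal},
    ]
    for _ in range(max_steps):                      # a loop, not a single generation
        decision = json.loads(call_llm(messages))
        if decision["action"] == "finish":
            return decision["answer"]
        observation = TOOLS[decision["action"]](decision["input"])
        # Feed the observation back so the next step can self-correct (closed loop).
        messages.append({"role": "assistant", "content": json.dumps(decision)})
        messages.append({"role": "user", "content": f"Observation: {observation}"})
    return "step budget exhausted"

print(run_agent("Plan a 3-day Kyoto trip within budget."))
```

The point of the sketch is structural: each tool observation re-enters the context before the next decision, which is what lets an agent re-plan around real-time information instead of committing to a one-shot answer.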
Lossless Compression Beyond ZIP: University of Washington Turns Large Language Models into Lossless Text Compressors
36Ke· 2025-10-11 10:47
As large language models generate ever more data, the problem of storing it follows close behind. To address this, researchers at the University of Washington (UW) SyFI lab have proposed an innovative solution: LLMc, an engine that uses a large language model itself to perform lossless text compression.

Benchmark results show that across a variety of datasets, including Wikipedia, fiction, and scientific abstracts, LLMc achieves better compression ratios than traditional compression tools such as ZIP and LZMA. Compared with other closed-source LLM-based compression systems, LLMc also delivers comparable or better performance.

Rather than storing each token directly (for example, its ID), LLMc stores the token's "rank" in the model's probability-sorted candidate list. These ranks are typically very small integers and therefore take up very little space (a toy sketch of this rank trick appears at the end of this excerpt).

Notably, the project has been open-sourced. Its lead author is Yi Pan, an undergraduate from the ACM class at Shanghai Jiao Tong University who is currently interning at the University of Washington.

How LLMc compresses

LLMc grew out of an internal lab discussion a year ago. At the time, the researchers faced a core challenge: the kernel operations involved in LLM inference are highly non-deterministic, which makes exact, reproducible compression and decompression difficult. But once the industry made breakthroughs in deterministic LLM inference, this obstacle disappeared and the path was clear for the new engine. The team quickly built a prototype of LLMc and successfully demonstrated that efficient compression with an LLM is ...
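To make the rank idea concrete, here is a toy, self-contained sketch in which a deterministic stand-in predictor plays the role of the LLM. It illustrates the principle only and is not the LLMc codebase; in the real system the ranking comes from deterministic LLM inference over actual token probabilities, and the small ranks are then further entropy-coded.

```python
# Toy rank coding: store each symbol's rank in a deterministic prediction list
# instead of the symbol itself. Because the decoder can rebuild the same list,
# the ranks can be inverted exactly, so the round trip is lossless.

def ranking(prefix):
    """Stand-in for an LLM: return all byte values sorted from most to least
    'likely'. This toy model simply prefers recently seen bytes (deterministic)."""
    recent = list(dict.fromkeys(reversed(prefix)))        # most recent first, deduplicated
    rest = [b for b in range(256) if b not in recent]
    return recent + rest

def encode(data):
    ranks, prefix = [], []
    for b in data:
        ranks.append(ranking(prefix).index(b))            # usually a small integer
        prefix.append(b)
    return ranks

def decode(ranks):
    out = []
    for r in ranks:
        out.append(ranking(out)[r])                       # invert the rank
    return out

data = list(b"abracadabra abracadabra")
ranks = encode(data)
assert decode(ranks) == data                              # lossless round trip
print(ranks)  # mostly tiny numbers, which are cheap to store
```

With a real language model the predictions are far sharper, so the ranks cluster even more tightly around zero, which is where the compression gains over ZIP and LZMA come from.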
Major Finding: Large Models' "Aha Moments" Are Not for Show, Internal Information Surges Severalfold
机器之心· 2025-07-03 04:14
Core Insights
- The article discusses a groundbreaking study that reveals the reasoning dynamics of large language models (LLMs) through the lens of mutual information, identifying "thinking tokens" as critical indicators of information peaks during reasoning [3][4][24].

Group 1: Key Findings
- The study uncovers the phenomenon of "information peaks" in the reasoning trajectories of LLMs, indicating that the presence of thinking tokens correlates with a significant increase in information about the correct answer (a rough formulation follows this summary) [3][4][5].
- Researchers demonstrated that higher accumulated mutual information during reasoning yields a tighter bound on the probability of answering correctly, thus enhancing the model's performance [6][8].
- The research indicates that reasoning models exhibit more pronounced mutual information peaks than non-reasoning models, suggesting that enhanced training improves the encoding of relevant information [9][10].

Group 2: Thinking Tokens
- Thinking tokens, which include phrases like "Hmm," "Wait," and "Therefore," are identified as linguistic manifestations of information peaks, playing a crucial role in guiding the model's reasoning process [10][11][15].
- Experimental results show that suppressing the generation of thinking tokens significantly degrades the model's performance on mathematical reasoning datasets, confirming their importance for effective reasoning [16][25].

Group 3: Applications
- Two novel methods are proposed to enhance LLM reasoning performance: Representation Recycling (RR) and Thinking Token based Test-time Scaling (TTTS), both of which leverage the insights gained from the study [18][26].
- The RR method re-inputs the representations associated with thinking tokens for additional computation, leading to improved performance on various reasoning benchmarks [20][26].
- The TTTS method encourages the model to generate thinking tokens when additional computation resources are available, resulting in sustained performance improvements across different datasets [21][22][26].
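For readers who want the tracked quantity spelled out, a rough formulation (the notation is ours, chosen for illustration, and may differ from the paper's) is:

$$ I(Z_t; Y) \;=\; H(Y) \;-\; H(Y \mid Z_t) $$

where $Z_t$ is the model's internal representation after the $t$-th reasoning token and $Y$ is the correct answer. An "information peak" is then a step $t$ at which $I(Z_t; Y)$ jumps sharply, and those steps tend to coincide with thinking tokens such as "Hmm" or "Wait."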
New Finding: At 3.6 Bits per Parameter, This Is the Most a Language Model Can Memorize
机器之心· 2025-06-04 04:41
Core Insights
- The memory capacity of GPT-series models is approximately 3.6 bits per parameter, a limit beyond which models stop memorizing and begin to generalize [1][4][27].

Group 1: Memory and Generalization
- The research distinguishes between two types of memory: unintended memorization (information specific to the dataset) and generalization (understanding of the true data-generating process) [5][7].
- A new method was proposed to estimate a model's knowledge of specific data points, which helps measure the capacity of modern language models [2][8].

Group 2: Model Capacity and Measurement
- The study defines model capacity as the total amount of memory that can be stored across all parameters of a given language model [17][18].
- Maximum memory capacity is reached when the model no longer memorizes more as the dataset grows, indicating saturation [19][28].
- Experiments showed that memory capacity scales with the number of parameters, with a stable 3.5 to 3.6 bits per parameter observed (a back-of-the-envelope calculation follows this summary) [27][28].

Group 3: Experimental Findings
- The research involved training hundreds of transformer language models with parameter counts ranging from 500,000 to 1.5 billion, yielding insights into scaling laws relating model capacity and data size [6][25].
- Results indicated that even across different dataset sizes, the number of memorized bits remained consistent, reinforcing the relationship between model capacity and parameter count [28][29].
- The impact of precision on capacity was analyzed, revealing that increasing precision from bfloat16 to float32 slightly improved capacity, with average values rising from 3.51 bits/parameter to 3.83 bits/parameter [31][32].
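As a back-of-the-envelope illustration (our arithmetic, not a figure reported by the study), the largest model in the experiments, at 1.5 billion parameters, would saturate at roughly:

$$ 3.6 \;\tfrac{\text{bits}}{\text{parameter}} \times 1.5 \times 10^{9} \;\text{parameters} \;\approx\; 5.4 \times 10^{9} \;\text{bits} \;\approx\; 675 \;\text{MB} $$

which is far less raw text than a typical training corpus, consistent with the claim that once this capacity is saturated the model must generalize rather than keep memorizing.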
When Answers Become Cheap, Good Questions Are the New Scarcity
36Ke· 2025-05-04 00:03
Group 1
- The core argument of the article is that in an era where answers are easily accessible, the value lies in asking the right questions, which can reshape understanding and drive creativity [1][4][19].
- The invention of photography in the 1830s challenged traditional artistic standards, leading artists to focus on subjective experiences rather than mere replication of reality [3][10][11].
- The emergence of large language models (LLMs) has made obtaining answers cheaper, but this has led to a decline in the quality of inquiry and an increase in the cost of asking good questions [15][17][26].

Group 2
- The article emphasizes that the value of information is proportional to the uncertainty it eliminates, as illustrated by Claude Shannon's information theory (see the worked example after this summary) [21][22][23].
- It argues that in a world of information overload, the challenge is not the lack of facts but the misalignment of attention, leading to a focus on quantity over quality in answers [31][32][46].
- The piece highlights the importance of redefining problems and frameworks to navigate structural uncertainties effectively, suggesting that good questions can expand the boundaries of understanding [37][38][39].
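The Shannon point is easy to make precise. The standard measure of the information carried by an answer $x$ is its self-information:

$$ I(x) = -\log_2 p(x) $$

so learning the outcome of a fair coin toss ($p = 0.5$) is worth exactly 1 bit, while confirming something you already believed with probability 0.99 is worth only about 0.014 bits. In these terms, a good question is one whose possible answers you genuinely cannot predict in advance.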