Information Theory
Daily Note: Why Is the Rebalancing Strategy Called the Free Lunch of Investing?
银行螺丝钉· 2025-12-20 14:02
By 银行螺丝钉 (please credit the source when reposting)

Many investors begin their investing journey with index funds. But how can you invest in index funds and actually earn good returns?

#螺丝钉小知识 Why is the rebalancing strategy called the free lunch of investing?

Every stock-bond allocation strategy runs into the same problem: stocks and bonds do not rise and fall in sync, so however you set the initial ratio, it will drift over time. For example, suppose you start with 50% of your capital in asset A and 50% in asset B. Because A and B move out of sync, after a while A may have risen more and its share will exceed 50%. Restoring the target ratio is what rebalancing means.

The benefits of rebalancing were studied long ago, and by a very famous master: Claude Shannon, the founder of information theory. Shannon is one of the foremost scientists of modern history; he single-handedly created information theory, which in turn enabled modern communications, computing, the internet, and a whole series of information technologies, and he was also keenly interested in investing. From 1966 to 1971, Shannon gave several public lectures on investing at MIT, on the theme of using stock ...
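The rebalancing mechanics described above can be sketched in a few lines of Python. The 50/50 target, the asset names, and the period returns below are invented for illustration; they are not figures from the article.

```python
# Sketch of a 50/50 rebalancing strategy; all prices/returns are made up.

def rebalance(value_a: float, value_b: float, target_a: float = 0.5):
    """Restore the target weight by shifting value between the two assets."""
    total = value_a + value_b
    return total * target_a, total * (1 - target_a)

# Start with 50% in asset A and 50% in asset B.
a, b = 50.0, 50.0

# Hypothetical period returns: A and B move out of sync.
periods = [(0.20, -0.05), (-0.10, 0.08), (0.15, 0.00)]

for ret_a, ret_b in periods:
    a *= 1 + ret_a          # A drifts above 50% when it outperforms...
    b *= 1 + ret_b
    a, b = rebalance(a, b)  # ...so sell the winner and buy the laggard.

print(f"final portfolio value: {a + b:.2f}")
```

The key property, the one Shannon's lectures exploited, is that systematically selling the asset that drifted up and buying the one that drifted down harvests volatility without forecasting either asset.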
Do LLMs "Think Themselves Wrong"? An RUC & Tencent Team Uses Information Theory to Reveal When to Think, and When Not To
机器之心· 2025-12-19 06:38
The first author, Yong Xixian, is a PhD student at Renmin University of China whose research focuses on Humanoid AI and LLM Cognition & Reasoning. The corresponding authors are Associate Professor Zhou Xiao of Renmin University of China and Wu Xian of Tencent Jarvis Lab.

"Reasoning ability" has become one of the hottest terms in the large-model industry. Reinforcement-learning (RL) reasoning models such as o1, R1, and QwQ let models "think": they decompose complex problems and even write long reasoning traces (Chain-of-Thought, CoT) the way a person would, showing strong multi-step reasoning on math, logic, and commonsense tasks.

They look powerful, but a problem follows: on many questions the model seems able to "guess 80% of the answer at a glance", yet it still insists on writing hundreds or thousands of tokens of reasoning, sometimes getting messier, and more wrong, the longer it thinks. If you have used these models, you will have felt this yourself.

A research team from Renmin University of China, Tencent Jarvis Lab, and Westlake University identified the core issue behind this: the current "reasoning mechanism" of large models is highly inefficient and often manufactures its own noise. So the team approached the problem from another angle: information theory. Using low-level information metrics such as entropy and mutual information, they re-measure the value of a model's thinking.

Paper title: Think or Not? Explori ...
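The "entropy lens" described above can be made concrete with a toy calculation: how uncertain is a model about its final answer before versus after a thinking step? The two distributions below are invented for illustration and are not taken from the paper.

```python
# Toy illustration of measuring "thinking value" with Shannon entropy.
# The answer distributions are invented, not from the paper.
import math

def entropy(p):
    """Shannon entropy in bits of a discrete distribution."""
    return -sum(x * math.log2(x) for x in p if x > 0)

# Distribution over 4 candidate answers before any chain-of-thought
# (the "guesses 80% at a glance" case):
before = [0.80, 0.10, 0.05, 0.05]
# ...and after a long chain-of-thought that added noise:
after = [0.55, 0.25, 0.10, 0.10]

print(f"H(before) = {entropy(before):.3f} bits")
print(f"H(after)  = {entropy(after):.3f} bits")
# If thinking RAISES entropy over the answer, it destroyed information;
# this is the kind of signal entropy/MI-based metrics are built to detect.
```

In this toy case the long reasoning trace makes the model less certain, which is exactly the "the more it thinks, the more wrong it gets" failure mode the article describes.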
The Underlying Logic of Agents, Explained in One Article
Hu Xiu· 2025-10-22 14:47
Core Insights
- The article emphasizes the importance of understanding AI Agents beyond mere API calls, highlighting the need for a structured cognitive process that enhances their capabilities [3][15][56]

Group 1: Understanding AI Agents
- The article identifies two common misconceptions about AI Agents: one that mystifies their capabilities and another that oversimplifies them as just repeated calls to ChatGPT [1][2]
- It aims to establish a consensus on the cognitive processes that underpin AI Agents, asserting that their effectiveness lies in the design of these processes rather than just the underlying models [3][4]

Group 2: Development Insights
- The article outlines a structured approach to developing AI Agents, detailing the transition from "prompt engineers" to "Agent process architects" [7][72]
- It discusses the threefold value of structured processes: providing a framework for thought, creating memory compression algorithms, and enabling interaction with the real world [6][55][66]

Group 3: Theoretical Foundations
- The article connects the effectiveness of the "Think -> Act -> Observe" cycle to foundational theories in cybernetics and information theory, explaining how feedback mechanisms enhance goal attainment and reduce uncertainty [74][75][91]
- It illustrates the evolution from open-loop systems to closed-loop systems, emphasizing the importance of feedback in achieving reliable outcomes [77][84]

Group 4: Practical Applications
- The article uses a travel planning example to contrast the static outputs of traditional chatbots with the dynamic, iterative processes of AI Agents, showcasing the latter's ability to produce actionable and reliable results [40][48]
- It highlights the significance of structured workflows in enhancing the quality and reliability of AI outputs, moving beyond mere text generation to a more interactive and iterative approach [55][68]

Group 5: Future Directions
- The article discusses the future role of developers as "Agent process architects," focusing on designing cognitive workflows, empowering AI with tools, and constructing decision-making contexts [100][102]
- It emphasizes the need for advanced cognitive architectures that can manage complex tasks and improve execution efficiency while maintaining high-quality outcomes [106][111]
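The "Think -> Act -> Observe" cycle the article centers on can be sketched as a minimal loop. The goal, the fake search tool, and the stopping rule below are hypothetical stand-ins, not the article's implementation.

```python
# Minimal Think -> Act -> Observe loop. The "thinking" policy and the
# tool are hypothetical stand-ins so the closed-loop structure is visible.

def think(goal, observations):
    """Decide the next action from the goal and what has been seen so far."""
    if observations and observations[-1] >= goal:
        return ("finish", observations[-1])
    return ("search", len(observations) + 1)

def act(action):
    """Execute a tool call; here, a fake search whose results improve each round."""
    _kind, attempt = action
    return attempt * 10  # pretend result quality grows with each attempt

goal, observations = 25, []
while True:
    action = think(goal, observations)   # Think
    if action[0] == "finish":
        break
    observations.append(act(action))     # Act, then Observe: feed results back

print(f"rounds needed: {len(observations)}")
```

The point of the structure, as opposed to a single model call, is that each observation re-enters the next round of thinking, so the loop keeps correcting until the goal is actually met.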
A Year and a Half of Agent Development in Retrospect: People's Understanding of Agents Is Misaligned, and an Effective "Cognitive Process" Is Key
Founder Park· 2025-10-22 12:46
Core Insights
- The article emphasizes the importance of understanding AI Agents and their cognitive processes, arguing that the true power of AI Agents lies not in the models themselves but in the effective cognitive workflows designed around them [1][2][3].

Group 1: Understanding AI Agents
- The author identifies two common misconceptions about AI Agents: one is the mystification of their capabilities, and the other is the oversimplification of their functions [1][2].
- A unified context is proposed to help practitioners understand what is meant in "Agentic" discussions, focusing on the cognitive processes that enhance AI capabilities [2][3].

Group 2: Development Framework
- The article outlines a comprehensive framework for understanding the evolution of AI Agents, using the metaphor of a student's growth stages to illustrate the development of core capabilities [3][15].
- It discusses the transition from "prompt engineers" to "Agent process architects," highlighting the need for structured cognitive workflows that enhance AI performance [5][62].

Group 3: Cognitive Processes
- The article breaks the cognitive process down into several key components: Planning, Chain of Thought (CoT), Self-Reflection, and Tool Use, each contributing to the overall effectiveness of AI Agents [4][20][24].
- The importance of iterative processes is emphasized, showcasing how reflection and memory compression can lead to improved decision-making and learning [40][43].

Group 4: Practical Applications
- A detailed comparison is made between traditional chatbots and AI Agents using a travel planning example, illustrating how AI Agents can dynamically adjust plans based on real-time information [27][30].
- The article highlights the significance of structured workflows in achieving high-quality, reliable outcomes, contrasting the static nature of traditional chatbots with the dynamic capabilities of AI Agents [35][36].

Group 5: Theoretical Foundations
- The effectiveness of AI Agents is linked to foundational theories in Cybernetics and Information Theory, which explain how feedback loops and information acquisition reduce uncertainty in problem-solving [50][59].
- The article argues that the closed-loop nature of AI Agents allows them to continuously refine their actions based on observed outcomes, enhancing their ability to achieve set goals [55][58].

Group 6: Future Directions
- The article concludes with a call to shift focus from merely writing prompts to designing intelligent processes that enable AI to self-plan, self-correct, and self-iterate [62][70].
- It emphasizes the need for performance engineering to address the challenges of execution efficiency while maintaining high-quality outcomes in AI applications [70][72].
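The cybernetics point, that a closed loop refines its actions from observed outcomes while an open loop cannot, can be shown with a toy controller. The target value, starting point, and correction gain below are illustrative assumptions.

```python
# Open-loop vs. closed-loop: both aim at a target, but only the
# closed-loop version observes its error and corrects. Numbers are toy.

TARGET = 100.0

def open_loop():
    """Fire one pre-planned action with no feedback."""
    return 80.0  # a plan that happens to undershoot; it can never know

def closed_loop(steps=10, gain=0.5):
    """Repeatedly observe the remaining error and correct a fraction of it."""
    value = 80.0
    for _ in range(steps):
        error = TARGET - value   # Observe the outcome
        value += gain * error    # Act on the observed error
    return value

print(f"open-loop error:   {abs(TARGET - open_loop()):.3f}")
print(f"closed-loop error: {abs(TARGET - closed_loop()):.3f}")
```

The open-loop error stays fixed at whatever the initial plan missed by, while the closed-loop error shrinks geometrically with each feedback round, which is the article's argument for why Agents outperform single-shot generation on goal-directed tasks.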
Lossless Compression Beyond ZIP: The University of Washington Turns Large Language Models into Lossless Text Compressors
36Ke· 2025-10-11 10:47
Core Insights
- The article discusses the data-storage challenges arising from the massive data generated by large language models (LLMs) and introduces an innovative solution called LLMc, which uses LLMs for lossless text compression [2][5].

Group 1: LLMc Overview
- LLMc has demonstrated superior compression rates compared to traditional compression tools like ZIP and LZMA across various datasets, including Wikipedia, novels, and scientific abstracts [2].
- The project has been open-sourced; the main author is Yi Pan, an undergraduate from Shanghai Jiao Tong University currently interning at the University of Washington [4].

Group 2: Compression Mechanism
- The compression mechanism of LLMc is based on rank-based encoding: the model predicts the next possible token and generates a probability distribution over candidates [6].
- Instead of storing the token itself, LLMc stores the rank of the token in that probability list, which typically requires minimal storage space [6].
- During decompression, the same LLM and context are used to recreate the probability distribution, allowing the original text to be recovered exactly from the stored ranks [6].

Group 3: Challenges and Limitations
- The research team identified several challenges with the current version of LLMc, including efficiency issues due to the quadratic relationship between LLM inference complexity and sequence length [7].
- The processing speed of LLMc is currently much lower than that of traditional compression algorithms because of its heavy reliance on large-model inference [7].
- To ensure deterministic decompression, the system requires special kernels and integer encoding of token ranks instead of logarithmic probabilities [8].
- The current implementation focuses on natural language; extending it to other modalities such as images, video, or binary data is left for future exploration [9].
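The rank-based scheme in Group 2 can be demonstrated end to end with a toy deterministic "predictor" standing in for a real LLM (the vocabulary and ranking rule below are invented). The structure mirrors the description: store each token's rank in the prediction, then replay the same predictions on the same context to invert it exactly.

```python
# Rank-based lossless coding with a toy predictor standing in for an LLM.
# Both sides run the same deterministic predictor on the same context, so
# storing only each token's rank in the prediction suffices to recover it.

VOCAB = ["the", "cat", "sat", "on", "mat", "dog"]

def predict_ranking(context):
    """Toy stand-in for an LLM: rank vocab tokens by plausibility.
    Deterministic given the context (frequency in context, then vocab order)."""
    return sorted(VOCAB, key=lambda t: (-context.count(t), VOCAB.index(t)))

def compress(tokens):
    ranks, context = [], []
    for tok in tokens:
        ranking = predict_ranking(context)
        ranks.append(ranking.index(tok))  # store the rank, not the token
        context.append(tok)
    return ranks  # a good predictor yields mostly small ints: cheap to store

def decompress(ranks):
    context = []
    for r in ranks:
        ranking = predict_ranking(context)  # same model, same context
        context.append(ranking[r])          # rank -> token, exactly
    return context

msg = ["the", "cat", "sat", "on", "the", "mat"]
assert decompress(compress(msg)) == msg   # lossless round trip
print(compress(msg))
```

This also makes the determinism requirement in Group 3 concrete: if the two sides' predictions diverged even slightly, a stored rank would map back to the wrong token, which is why LLMc needs special kernels and integer rank encoding.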
Major Finding: The "Aha Moment" in Large Models Is No Act; Internal Information Surges Severalfold
机器之心· 2025-07-03 04:14
Core Insights
- The article discusses a study that reveals the reasoning dynamics of large language models (LLMs) through the lens of mutual information, identifying "thinking tokens" as critical indicators of information peaks during reasoning [3][4][24].

Group 1: Key Findings
- The study uncovers the phenomenon of "information peaks" in the reasoning trajectories of LLMs: the appearance of thinking tokens correlates with a significant increase in information about the correct answer [3][4][5].
- Researchers demonstrated that higher accumulated mutual information during reasoning tightens the bound on the probability of answering correctly, thus enhancing the model's performance [6][8].
- Reasoning models exhibit more pronounced mutual information peaks than non-reasoning models, suggesting that enhanced training improves the encoding of relevant information [9][10].

Group 2: Thinking Tokens
- Thinking tokens, which include phrases like "Hmm," "Wait," and "Therefore," are identified as linguistic manifestations of information peaks, playing a crucial role in guiding the model's reasoning process [10][11][15].
- Experimental results show that suppressing the generation of thinking tokens significantly degrades the model's performance on mathematical reasoning datasets, confirming their importance in effective reasoning [16][25].

Group 3: Applications
- Two novel methods are proposed to enhance LLM reasoning performance: Representation Recycling (RR) and Thinking Token based Test-time Scaling (TTTS), both of which leverage the study's insights [18][26].
- The RR method re-inputs representations associated with thinking tokens for additional computation, improving performance on various reasoning benchmarks [20][26].
- The TTTS method encourages the model to generate thinking tokens when additional computation resources are available, yielding sustained performance improvements across datasets [21][22][26].
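Mutual information, the quantity the study tracks along the reasoning trajectory, is easy to compute for discrete variables. The joint distributions below are invented to show what a "peak" looks like: the mid-reasoning step shares the most information with the correct answer.

```python
# Discrete mutual information over a reasoning trajectory.
# The joint probability tables are purely illustrative, not from the paper.
import math

def mutual_information(joint):
    """I(X;Y) in bits from a joint probability table joint[x][y]."""
    px = [sum(row) for row in joint]
    py = [sum(col) for col in zip(*joint)]
    mi = 0.0
    for i, row in enumerate(joint):
        for j, p in enumerate(row):
            if p > 0:
                mi += p * math.log2(p / (px[i] * py[j]))
    return mi

# Joint distributions of (step representation, correct answer) at three
# reasoning steps; the "peak" row mimics a thinking-token moment:
steps = {
    "early": [[0.30, 0.20], [0.20, 0.30]],  # weakly informative
    "peak":  [[0.45, 0.05], [0.05, 0.45]],  # strongly aligned with the answer
    "late":  [[0.35, 0.15], [0.15, 0.35]],
}
for name, joint in steps.items():
    print(f"{name}: I = {mutual_information(joint):.3f} bits")
```

Plotting I against step index for a real trace is how the study surfaces information peaks, and the finding is that those peaks coincide with tokens like "Hmm" and "Wait".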
New Finding: 3.6 Bits per Parameter Is the Most a Language Model Can Memorize
机器之心· 2025-06-04 04:41
Core Insights
- The memory capacity of GPT-series models is approximately 3.6 bits per parameter, indicating a limit beyond which models stop memorizing and begin to generalize [1][4][27].

Group 1: Memory and Generalization
- The research distinguishes between two types of memory: unintended memorization (specific dataset information) and generalization (understanding of the true data-generating process) [5][7].
- A new method was proposed to estimate a model's understanding of specific data points, which helps measure the capacity of modern language models [2][8].

Group 2: Model Capacity and Measurement
- The study defines model capacity as the total amount of memory that can be stored across all parameters of a specific language model [17][18].
- The maximum memory capacity is reached when the model no longer memorizes more as the dataset grows, indicating saturation [19][28].
- Experiments showed that memory capacity scales with the number of parameters, with a stable 3.5 to 3.6 bits per parameter observed [27][28].

Group 3: Experimental Findings
- The research trained hundreds of transformer language models with 500,000 to 1.5 billion parameters, yielding insights on scaling laws relating model capacity and data size [6][25].
- Even with different dataset sizes, the memorized bits remained consistent, reinforcing the relationship between model capacity and parameter count [28][29].
- Analyzing the impact of precision on capacity revealed that moving from bfloat16 to float32 slightly improved capacity, raising the average from 3.51 to 3.83 bits per parameter [31][32].
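The 3.6 bits/parameter figure supports a back-of-envelope capacity estimate. The model sizes below span the study's stated 500K to 1.5B range; the conversion itself is just arithmetic, not a result from the paper.

```python
# Back-of-envelope memorization capacity from ~3.6 bits per parameter.

BITS_PER_PARAM = 3.6

def capacity_megabytes(n_params: int) -> float:
    """Total memorization capacity in MB implied by the bits-per-parameter rate."""
    return n_params * BITS_PER_PARAM / 8 / 1e6  # bits -> bytes -> MB

for n in (500_000, 100_000_000, 1_500_000_000):
    print(f"{n:>13,} params -> ~{capacity_megabytes(n):,.2f} MB memorized")
# Past this capacity the model is saturated: feeding it more data forces
# generalization rather than memorization, per the article's core claim.
```

So even the largest model in the study's range can memorize only on the order of hundreds of megabytes, far less than its training corpus, which is why saturation and generalization are observable at all.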
When Answers Become Cheap, Good Questions Are the New Scarcity
36Ke· 2025-05-04 00:03
Group 1
- The core argument of the article is that in an era where answers are easily accessible, the value lies in asking the right questions, which can reshape understanding and drive creativity [1][4][19]
- The invention of photography in the 1830s challenged traditional artistic standards, leading artists to focus on subjective experiences rather than mere replication of reality [3][10][11]
- The emergence of large language models (LLMs) has made obtaining answers cheaper, but this has led to a decline in the quality of inquiry and an increase in the cost of asking good questions [15][17][26]

Group 2
- The article emphasizes that the value of information is proportional to the uncertainty it eliminates, as illustrated by Claude Shannon's information theory [21][22][23]
- It argues that in a world of information overload, the challenge is not a lack of facts but a misalignment of attention, leading to a focus on quantity over quality in answers [31][32][46]
- The piece highlights the importance of redefining problems and frameworks to navigate structural uncertainties effectively, suggesting that good questions can expand the boundaries of understanding [37][38][39]
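Shannon's point that information value is proportional to the uncertainty eliminated can be made concrete: a good yes/no question is one whose answer removes the most entropy. The probabilities below are illustrative, not from the article.

```python
# A "good question" in Shannon's sense removes the most uncertainty.
# Expected information from a yes/no question is the entropy of its answer.
import math

def expected_information(p_yes: float) -> float:
    """Expected bits gained from a yes/no question answered 'yes' with prob p."""
    bits = 0.0
    for p in (p_yes, 1 - p_yes):
        if p > 0:
            bits -= p * math.log2(p)
    return bits

print(f"balanced question (p=0.5): {expected_information(0.5):.3f} bits")
print(f"lopsided question (p=0.9): {expected_information(0.9):.3f} bits")
# A question whose answer is nearly predictable teaches almost nothing:
# with 16 equally likely candidates (4 bits of uncertainty), four balanced
# questions suffice, while lopsided ones need more on average.
```

This is the quantitative version of the article's thesis: when answers are cheap, the scarce skill is posing the question that splits the remaining uncertainty most evenly.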