Information Theory
First Principles of Large Models (II): Signal Processing
机器之心· 2026-01-30 08:49
Core Viewpoint
- The article discusses the transformation of natural language processing problems into signal processing problems through semantic vectorization, emphasizing the importance of token embedding in large models and its connection to signal processing and information theory [2][32].
Semantic Embedding / Vectorization
- The concept of using vectors to model semantics dates back to Luhn's 1953 paper, but the significant breakthrough came in 2013, when Mikolov and others successfully trained neural network models to convert tokens into semantic vectors [6][9].
- Ideal semantic vectorization has not been fully realized, but the inner product of semantic vectors can represent semantic relevance at the token level [7][11].
- The semantic vector space can be modeled as a probability-inner-product space, balancing complexity and effectiveness by restricting the space to a unit sphere [8][10].
Optimal Semantic Vectorization
- Optimal semantic encoding is tied to the downstream task, which here is predicting the next token: the semantic encoder should maximize the conditional mutual information between the next token and the encoded current sequence [13][14].
- The article notes that existing methods such as Contrastive Predictive Coding (CPC) optimize a bound on this objective and therefore may not reach the optimal encoder [15][19].
Transformer as a Nonlinear Time-Varying Vector Autoregressive Time Series
- The Transformer is an autoregressive large language model that predicts the next token from the input token sequence and the previously generated tokens [21][30].
- The attention mechanism in Transformers can be written as a nonlinear, time-varying vector autoregression over the token sequence, which is what drives next-token prediction (see the sketch after this summary) [22][24].
Signal Processing and Information Theory
- The article relates signal processing to information theory, arguing that signal processing implements information-theoretic principles in specific computational architectures [32][33].
- The shift from the BIT of the information age to the TOKEN of the AI era is proposed as the way to carry Shannon's information theory over to the mathematical principles behind large models [36].
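The two technical claims in this summary, the mutual-information objective for the encoder and attention read as a time-varying vector autoregression, are compact enough to write out. Below is a sketch in LaTeX of how this framing is commonly expressed; the symbols ($f$, $a_{t,i}$, $W_Q$, $W_K$, $W_V$, $d$) are notational assumptions for illustration, not taken from the article itself.

```latex
% Optimal semantic encoder: choose f to maximize the mutual information
% between the encoded prefix and the next token (illustrative notation).
\[
  f^{*} = \arg\max_{f}\; I\bigl(x_{t+1};\, f(x_{1:t})\bigr)
\]
% Single-head attention read as a nonlinear time-varying vector autoregression:
% the "AR coefficients" a_{t,i} are themselves functions of the data.
\[
  \hat{h}_{t+1} = \sum_{i=1}^{t} a_{t,i}(x_{1:t})\, W_V x_i,
  \qquad
  a_{t,i} = \mathrm{softmax}_i\!\left(\frac{(W_Q x_t)^{\top} W_K x_i}{\sqrt{d}}\right)
\]
```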
Google's AI Chief and Nobel Laureate Demis: AGI Needs to Break Free of "Goldfish Memory," and Google Wins Whether or Not the Bubble Bursts
AI科技大本营· 2026-01-29 10:05
Author | Big Technology Podcast · Compiled by | Wang Qilong · Produced by | AI 科技大本营 (ID: rgznai100)
If Sam Altman is the evangelist of the AI era, skilled at igniting the public imagination with grand visions, then Demis Hassabis is more like a scientist peering through a microscope in the lab: calm, rigorous, and naturally immune to hype.
A year ago, when all of Silicon Valley was anxious that the ChatGPT dividend seemed to have peaked, and had even begun debating whether large language models (LLMs) had "hit a wall," Demis was puzzled. In his view, progress had never stopped. Google DeepMind, which he leads, had just come off the high point of AlphaFold 3 and was trying to extend AI's reach from simple chatbots into the depths of biology, physics, and even materials science.
In a wood-paneled conference room in Davos, Demis recently sat down for an interview with the Big Technology podcast. What made the conversation notable is that he did not dodge the pointed questions: Does today's AI have only a "goldfish memory"? Will Google stuff Gemini with ads for the sake of its earnings? Is "AGI" a marketing term or a scientific definition?
Most striking was his assertion about the substrate of intelligence. In the documentary "The Think ...
How Information Theory Became a Core Tool of Complex Systems Science
36Kr· 2025-12-24 08:51
Group 1
- The article discusses the importance of information theory as a foundational tool for understanding complex systems, emphasizing its ability to quantify interactions among components and their environment [1][2].
- Information theory is increasingly recognized as essential in the study of complex systems due to its capacity to describe, quantify, and understand emergent phenomena [1][2].
- The article aims to elaborate on why and how information theory serves as a cornerstone for complex systems science, detailing its core concepts, advanced tools, and practical applications [1].
Group 2
- The article introduces key metrics of information theory, starting with entropy, which quantifies uncertainty in a random variable [3][5].
- Joint entropy and conditional entropy are explained, highlighting their roles in measuring uncertainty over multiple random variables [6].
- Mutual information is presented as a measure of statistical dependence between variables, capable of capturing non-linear relationships (a numerical sketch follows this summary) [7][8].
Group 3
- Transfer entropy is introduced as a dynamic measure of information flow in time series, useful for determining causal relationships in complex systems [13][14].
- Active information storage (AIS) quantifies how much past information influences a system's current state, with implications for predicting future behavior [17].
- Integrated information theory, proposed by Giulio Tononi, attempts to measure consciousness by the degree of information integration within a system [19][20].
Group 4
- Partial information decomposition (PID) is discussed as a method for analyzing the information shared among multiple variables, distinguishing redundancy from synergy [26][27].
- Statistical complexity is introduced as the minimum information required to predict future states from historical data [22][23].
- The article emphasizes the significance of network representations in modeling complex systems, differentiating physical from statistical networks [34][35].
Group 5
- The balance of integration and separation in complex systems is highlighted, with examples from neuroscience and economics illustrating why this balance matters [36].
- The article discusses the practical challenges of applying information theory, particularly estimating probability distributions from limited data [41][42].
- Suggested future directions include using neural networks to estimate information metrics and to guide evolutionary algorithms [43][44].
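Since this summary walks through entropy, conditional entropy, and mutual information, a small numerical sketch may help fix the definitions. The joint distribution below is invented purely for illustration and is not from the article.

```python
import numpy as np

# Hypothetical joint distribution p(x, y) over two binary variables.
p_xy = np.array([[0.4, 0.1],
                 [0.1, 0.4]])

p_x = p_xy.sum(axis=1)          # marginal p(x)
p_y = p_xy.sum(axis=0)          # marginal p(y)

def entropy(p):
    """Shannon entropy in bits, ignoring zero-probability cells."""
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

H_x = entropy(p_x)                      # H(X)
H_y = entropy(p_y)                      # H(Y)
H_xy = entropy(p_xy.ravel())            # joint entropy H(X, Y)
H_y_given_x = H_xy - H_x                # conditional entropy H(Y|X)
I_xy = H_x + H_y - H_xy                 # mutual information I(X; Y)

print(f"H(X)={H_x:.3f}  H(Y|X)={H_y_given_x:.3f}  I(X;Y)={I_xy:.3f}")
```

For this toy distribution the two variables share about 0.28 bits of information, even though each marginal is maximally uncertain on its own.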
How Information Theory Became a Core Tool of Complex Systems Science
腾讯研究院· 2025-12-24 08:33
Core Concept
- The article discusses the significance of information theory as a foundational tool for understanding complex systems, emphasizing its ability to quantify interactions among components and the system's environment [2][3].
Group 1: Key Metrics in Information Theory
- Entropy is introduced as the fundamental measure of uncertainty, quantifying the expected level of surprise about the outcome of a random variable [5][7].
- Joint entropy measures the uncertainty of two random variables taken together, while conditional entropy reflects the uncertainty of one variable given the other [9].
- Mutual information quantifies the amount of information gained about one variable by observing another, capturing both linear and non-linear dependencies [10].
Group 2: Dynamic Features of Complex Systems
- Transfer entropy extends mutual information to time series, measuring the directed information flow between variables, which is crucial for understanding causal relationships (a plug-in estimate is sketched after this summary) [16].
- Active information storage quantifies how much past information influences the current state of a system, indicating memory capacity [18].
- Integrated information theory, proposed by Giulio Tononi, attempts to measure consciousness by the degree of information integration among system components [20].
Group 3: Information Decomposition
- Partial information decomposition (PID) aims to break the total information shared between variables into redundancy, unique information, and synergy [29].
- Statistical complexity measures the minimum amount of information required to predict future states from historical data, reflecting the internal structure and dynamics of a system [25].
Group 4: Network Representation of Complex Systems
- Networks serve as a universal language for modeling complex systems, with edges representing statistical dependencies; they can be categorized into physical and statistical networks [40].
- The balance between integration and segregation within a system is crucial for its functionality, as seen in examples from neuroscience and economics [42].
Group 5: Practical Applications and Challenges
- The article highlights the challenge of estimating probability distributions and information measures from limited data, which can bias results [49].
- Future directions include neural information estimators for large and complex datasets, as well as applying information theory in machine learning and evolutionary algorithms [52][53].
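Transfer entropy is the least familiar of the quantities listed above, so a rough plug-in estimator for discrete series is sketched below. The history length of one step, the synthetic data, and the estimator itself are illustrative assumptions, not the article's implementation.

```python
import numpy as np
from collections import Counter

def transfer_entropy(x, y):
    """Plug-in estimate of T_{X->Y} = I(Y_{t+1}; X_t | Y_t) in bits,
    for discrete series x (source) and y (target) of equal length."""
    triples = list(zip(y[1:], y[:-1], x[:-1]))   # (y_next, y_now, x_now)
    n = len(triples)
    p_nyx = Counter(triples)                      # counts of (y_next, y_now, x_now)
    p_yx = Counter((yn, xn) for _, yn, xn in triples)
    p_ny = Counter((ynext, yn) for ynext, yn, _ in triples)
    p_y = Counter(yn for _, yn, _ in triples)
    te = 0.0
    for (ynext, yn, xn), c in p_nyx.items():
        p_joint = c / n
        cond_full = c / p_yx[(yn, xn)]            # p(y_next | y_now, x_now)
        cond_red = p_ny[(ynext, yn)] / p_y[yn]    # p(y_next | y_now)
        te += p_joint * np.log2(cond_full / cond_red)
    return te

rng = np.random.default_rng(0)
x = rng.integers(0, 2, 5000)
y = np.roll(x, 1)                                 # y copies x with a one-step lag
print(transfer_entropy(x, y))                     # close to 1 bit: strong X -> Y flow
print(transfer_entropy(y, x))                     # close to 0: little flow backwards
```

The asymmetry of the two printed values is the point: unlike mutual information, transfer entropy is directional.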
每日钉一下 (Why is the rebalancing strategy called the free lunch of investing?)
银行螺丝钉· 2025-12-20 14:02
Group 1
- Many investors start their investment journey with index funds and look for ways to earn good returns with them [2].
- A free, limited-time course is available that introduces investment techniques for index funds, along with course notes and mind maps for efficient learning [2].
Group 2
- The rebalancing strategy is called a free lunch in investing because stock and bond prices do not move in sync [6].
- Rebalancing means adjusting the portfolio back to its target proportions when market fluctuations pull the allocation away from the initial split (a toy example follows this summary) [7].
- Shannon, the scientist best known for information theory, was also interested in investing and gave public lectures on profiting from stock volatility [8].
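The "free lunch" is easiest to see with numbers. The toy sketch below uses an invented oscillating price path and a 50/50 target split; neither comes from the article, and real portfolios would not behave this neatly.

```python
# Toy illustration of Shannon-style rebalancing between a volatile asset and
# cash; the price path and the 50/50 target are invented for illustration.
prices = [1.0, 0.5, 1.0, 0.5, 1.0, 0.5, 1.0]

cash, shares = 50.0, 50.0            # $100 split 50/50 at the starting price of 1.0
for p in prices[1:]:
    total = cash + shares * p        # portfolio value at the new price
    cash = total / 2                 # rebalance back to 50% cash ...
    shares = (total / 2) / p         # ... and 50% asset

rebalanced = cash + shares * prices[-1]
buy_and_hold = 100.0 * prices[-1] / prices[0]
print(f"rebalanced: {rebalanced:.1f}  buy-and-hold: {buy_and_hold:.1f}")
```

On this mean-reverting path the rebalanced portfolio ends near $142 while buy-and-hold ends flat at $100: each rebalance sells after a rise and buys after a fall, which is the volatility harvesting the summary refers to.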
Do Large Models "Get It More Wrong the More They Think"? A Renmin University & Tencent Team Uses Information Theory to Reveal When to Think and When Not To
机器之心· 2025-12-19 06:38
Core Insights
- The article examines the inefficiencies in the reasoning of large models, arguing for a more effective approach to when and how much they should reason [4][10][46].
- The proposed solution, Adaptive Think, lets a model stop reasoning automatically once it reaches a sufficient level of confidence, improving both efficiency and accuracy [7][28][45].
Group 1: Inefficiencies in Current Models
- Current large models tend to overthink: longer reasoning chains often accumulate noise and reduce accuracy [3][19].
- Research indicates that longer reasoning chains do not necessarily yield better results, showing diminishing returns and higher computational cost [19][20][36].
- The study uses information-theoretic metrics such as entropy and mutual information to evaluate reasoning efficiency [6][12].
Group 2: Adaptive Think Mechanism
- The Adaptive Think strategy lets models monitor their own reasoning process and terminate once confidence is sufficiently high (a hypothetical stopping-rule sketch follows this summary) [28][29].
- Experimental results show that Adaptive Think significantly reduces token consumption while maintaining or improving accuracy across tasks [33][36].
- The mechanism adjusts reasoning depth dynamically with task difficulty, improving both speed and reliability [31][45].
Group 3: Experimental Findings
- On the GSM8K dataset, Adaptive Think reduced average token usage by over 40% while improving accuracy by 0.93% compared with the standard approach [33].
- The approach was effective across multiple reasoning tasks, with notable efficiency gains on common-sense reasoning [36][37].
- The findings suggest that many models can reach correct answers with fewer reasoning steps, challenging the assumption that longer reasoning is inherently better [38][46].
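A confidence-based stopping rule of the kind described above can be sketched as an entropy threshold on the model's current answer distribution. Everything below is hypothetical: the function names (`next_token`, `answer_distribution`, `final_answer`), the threshold, and the check interval are assumptions, not the paper's implementation.

```python
import math

def answer_entropy(probs):
    """Shannon entropy (bits) of the model's current answer distribution."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

def adaptive_think(model, prompt, max_steps=512, check_every=32, threshold=0.3):
    """Hypothetical sketch: keep generating reasoning tokens, but stop early
    once the model's confidence in its answer is high (entropy is low)."""
    chain = []
    for step in range(max_steps):
        token = model.next_token(prompt, chain)               # assumed API
        chain.append(token)
        if (step + 1) % check_every == 0:
            probs = model.answer_distribution(prompt, chain)  # assumed API
            if answer_entropy(probs) < threshold:
                break                                         # confident enough: stop thinking
    return model.final_answer(prompt, chain)                  # assumed API
```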
The Underlying Logic of Agents, Explained in One Article
Hu Xiu· 2025-10-22 14:47
Core Insights
- The article emphasizes the importance of understanding AI Agents beyond mere API calls, highlighting the need for a structured cognitive process that enhances their capabilities [3][15][56].
Group 1: Understanding AI Agents
- The article identifies two common misconceptions about AI Agents: one mystifies their capabilities, the other oversimplifies them as repeated calls to ChatGPT [1][2].
- It aims to establish a shared understanding of the cognitive processes that underpin AI Agents, asserting that their effectiveness lies in the design of these processes rather than in the underlying models alone [3][4].
Group 2: Development Insights
- The article outlines a structured approach to developing AI Agents, describing the transition from "prompt engineers" to "Agent process architects" [7][72].
- It presents the threefold value of structured processes: providing a framework for thought, acting as a memory-compression mechanism, and enabling interaction with the real world [6][55][66].
Group 3: Theoretical Foundations
- The article connects the effectiveness of the "Think -> Act -> Observe" cycle to foundational theories in cybernetics and information theory, explaining how feedback mechanisms enhance goal attainment and reduce uncertainty (a minimal loop is sketched after this summary) [74][75][91].
- It traces the evolution from open-loop to closed-loop systems, emphasizing that feedback is what makes outcomes reliable [77][84].
Group 4: Practical Applications
- A travel-planning example contrasts the static output of a traditional chatbot with the dynamic, iterative process of an AI Agent, showing how the latter produces actionable and reliable results [40][48].
- The article highlights how structured workflows improve the quality and reliability of AI outputs, moving beyond text generation to an interactive, iterative approach [55][68].
Group 5: Future Directions
- The article discusses the future role of developers as "Agent process architects" who design cognitive workflows, equip AI with tools, and construct decision-making contexts [100][102].
- It stresses the need for cognitive architectures that can manage complex tasks and improve execution efficiency while maintaining high-quality outcomes [106][111].
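As a concrete anchor for the "Think -> Act -> Observe" cycle, here is a minimal closed-loop sketch. The tool registry, the stopping condition, and every function name (`plan_next_step`, `summarize`) are hypothetical placeholders, not the article's code.

```python
def run_agent(llm, tools, goal, max_turns=10):
    """Hypothetical ReAct-style loop: the model proposes an action, the
    environment returns an observation, and the loop closes the feedback path."""
    history = [f"Goal: {goal}"]
    for _ in range(max_turns):
        thought, action, args = llm.plan_next_step(history)   # assumed API
        history.append(f"Thought: {thought}")
        if action == "finish":                                # model decides it is done
            return args                                       # final answer
        observation = tools[action](**args)                   # act on the real world
        history.append(f"Action: {action}({args}) -> Observation: {observation}")
    return llm.summarize(history)                             # assumed fallback
```

The design point the article makes is visible in the structure: each observation feeds the next round of thinking, so errors can be corrected instead of compounding as they do in a single open-loop generation.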
A Year and a Half of Agent Development in Review: People's Understanding of Agents Is Misaligned, and an Effective "Cognitive Workflow" Is Key
Founder Park· 2025-10-22 12:46
Core Insights
- The article emphasizes the importance of understanding AI Agents and their cognitive processes, arguing that the true power of AI Agents lies not in the models themselves but in the cognitive workflows designed around them [1][2][3].
Group 1: Understanding AI Agents
- The author identifies two common misconceptions about AI Agents: one mystifies their capabilities, the other oversimplifies what they do [1][2].
- A unified framing is proposed to help practitioners understand what "Agentic" discussions actually refer to, focusing on the cognitive processes that amplify model capabilities [2][3].
Group 2: Development Framework
- The article lays out a framework for the evolution of AI Agents, using the metaphor of a student's growth stages to illustrate how core capabilities develop [3][15].
- It discusses the transition from "prompt engineers" to "Agent process architects," highlighting the need for structured cognitive workflows that improve AI performance [5][62].
Group 3: Cognitive Processes
- The cognitive process is broken into key components: Planning, Chain of Thought (CoT), Self-Reflection, and Tool Use, each contributing to the overall effectiveness of an AI Agent [4][20][24].
- The importance of iteration is emphasized: reflection and memory compression lead to better decision-making and learning (a memory-compression sketch follows this summary) [40][43].
Group 4: Practical Applications
- A detailed travel-planning comparison between a traditional chatbot and an AI Agent shows how Agents adjust plans dynamically based on real-time information [27][30].
- The article highlights how structured workflows achieve high-quality, reliable outcomes, contrasting the static nature of chatbots with the dynamic capabilities of AI Agents [35][36].
Group 5: Theoretical Foundations
- The effectiveness of AI Agents is linked to foundational theories in cybernetics and information theory, which explain how feedback loops and information acquisition reduce uncertainty during problem-solving [50][59].
- The closed-loop nature of AI Agents lets them keep refining their actions based on observed outcomes, improving their ability to reach set goals [55][58].
Group 6: Future Directions
- The article calls for a shift from merely writing prompts to designing intelligent processes that let AI plan, correct, and iterate on its own [62][70].
- It stresses the need for performance engineering to address execution-efficiency challenges while maintaining high-quality outcomes in AI applications [70][72].
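Of the components listed above, memory compression is the one that can be sketched independently of the main loop. The token budget, the crude token counter, and the `summarize` call below are assumptions for illustration only.

```python
def compress_memory(llm, history, budget_tokens=2000):
    """Hypothetical sketch of memory compression: when the running history
    grows past a budget, fold the oldest entries into a short summary."""
    def count_tokens(text):
        return len(text.split())                     # crude stand-in for a real tokenizer

    while sum(count_tokens(h) for h in history) > budget_tokens and len(history) > 4:
        oldest, history = history[:4], history[4:]   # peel off the oldest entries
        summary = llm.summarize("\n".join(oldest))   # assumed API
        history.insert(0, f"Summary of earlier steps: {summary}")
    return history
```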
Lossless Compression That Beats ZIP: The University of Washington Turns Large Models into Lossless Text Compressors
36Kr· 2025-10-11 10:47
Core Insights
- The article discusses the storage challenge created by the massive output of large language models (LLMs) and introduces LLMc, a system that uses LLMs themselves for lossless text compression [2][5].
Group 1: LLMc Overview
- LLMc has demonstrated better compression ratios than traditional tools such as ZIP and LZMA on datasets including Wikipedia, novels, and scientific abstracts [2].
- The project is open source; its lead author, Yi Pan, is an undergraduate from Shanghai Jiao Tong University currently interning at the University of Washington [4].
Group 2: Compression Mechanism
- LLMc is built on rank-based encoding: the model predicts the next token and produces a probability distribution over candidates [6].
- Instead of storing the token itself, LLMc stores the token's rank in that probability-ordered list, which typically needs very little space (a simplified sketch follows this summary) [6].
- During decompression, the same LLM and context reproduce the same probability distribution, so the stored rank recovers the original token exactly [6].
Group 3: Challenges and Limitations
- The team notes efficiency limits in the current version, since LLM inference cost grows quadratically with sequence length [7].
- Because it depends heavily on large-model inference, LLMc currently runs much slower than traditional compression algorithms [7].
- To guarantee deterministic decompression, the system relies on special kernels and integer encoding of token ranks rather than working with log-probabilities [8].
- The current implementation focuses on natural language; extending it to other modalities such as images, video, or binary data is left for future work [9].
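The rank-based mechanism described above is simple to sketch. The model interface (`next_token_distribution` returning a token-to-probability dict), the context handling, and the lack of entropy coding are all simplifying assumptions; the real LLMc additionally handles batching, deterministic kernels, and efficient encoding of the ranks.

```python
def compress(lm, tokens):
    """Hypothetical sketch of rank-based lossless compression with an LM:
    store each token's rank under the model's prediction instead of the token."""
    ranks, context = [], []
    for tok in tokens:
        dist = lm.next_token_distribution(context)        # assumed API: token -> prob
        order = sorted(dist, key=dist.get, reverse=True)  # most likely token first;
        ranks.append(order.index(tok))                    # ties must break deterministically
        context.append(tok)
    return ranks                                          # then entropy-code the ranks

def decompress(lm, ranks):
    """Reverse the process: the same model and context recreate the same
    ordering, so each stored rank maps back to exactly one token."""
    tokens = []
    for r in ranks:
        dist = lm.next_token_distribution(tokens)
        order = sorted(dist, key=dist.get, reverse=True)
        tokens.append(order[r])
    return tokens
```

The sketch also makes the article's determinism caveat concrete: compression and decompression only round-trip if the model produces bit-identical distributions and tie-breaking both times.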
Major Finding: A Large Model's "Aha Moment" Is Not for Show; Internal Information Surges Severalfold
机器之心· 2025-07-03 04:14
Core Insights
- The article covers a study that analyzes the reasoning dynamics of large language models (LLMs) through the lens of mutual information, identifying "thinking tokens" as markers of information peaks during reasoning [3][4][24].
Group 1: Key Findings
- The study uncovers "information peaks" in LLM reasoning trajectories: when thinking tokens appear, the information the model carries about the correct answer rises sharply [3][4][5].
- The researchers show that higher accumulated mutual information during reasoning tightens the bound on the probability of answering correctly, improving model performance [6][8].
- Reasoning-tuned models exhibit more pronounced mutual-information peaks than non-reasoning models, suggesting that such training improves how relevant information is encoded [9][10].
Group 2: Thinking Tokens
- Thinking tokens, including phrases like "Hmm," "Wait," and "Therefore," are identified as the linguistic surface of information peaks and play a crucial role in guiding the model's reasoning process [10][11][15].
- Experiments show that suppressing the generation of thinking tokens significantly degrades performance on mathematical reasoning datasets, confirming their importance for effective reasoning [16][25].
Group 3: Applications
- Two methods are proposed to improve LLM reasoning performance: Representation Recycling (RR) and Thinking Token based Test-time Scaling (TTTS), both building on the study's findings [18][26].
- RR re-feeds the representations associated with thinking tokens for additional computation, improving performance on several reasoning benchmarks [20][26].
- TTTS encourages the model to keep generating thinking tokens when extra compute budget is available, yielding consistent gains across datasets (a hypothetical sketch follows this summary) [21][22][26].
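Of the two methods above, TTTS is the easier one to sketch: when token budget remains, nudge the model to keep reasoning by appending a thinking token. Everything here is an illustrative assumption; the `generate` signature, the budget accounting, and the nudging schedule are not taken from the paper.

```python
THINKING_TOKENS = ["Hmm,", "Wait,", "Therefore,"]  # example phrases cited in the article

def tt_test_time_scaling(model, prompt, budget=1024, rounds=3):
    """Hypothetical sketch of TTTS: after a normal pass, spend any remaining
    token budget by prompting the model to 'think again' with a thinking token."""
    text, used = model.generate(prompt, max_new_tokens=budget)       # assumed API
    for i in range(rounds):
        remaining = budget - used
        if remaining <= 0:
            break
        nudge = THINKING_TOKENS[i % len(THINKING_TOKENS)]
        more, spent = model.generate(prompt + text + "\n" + nudge,
                                     max_new_tokens=remaining)       # assumed API
        text, used = text + "\n" + nudge + more, used + spent
    return text
```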