Information Theory
ICLR 2026 Oral | Simplicity at its finest: Stanford, NVIDIA, and NUS jointly launch InfoTok, redefining efficient video tokenization with information theory
机器之心· 2026-03-30 06:52
Core Insights
- The article discusses the introduction of InfoTok, an adaptive video tokenizer that uses information theory to optimize token allocation based on video content complexity, achieving a 2.3 times compression rate and 11 times faster inference speed compared to similar adaptive solutions [2][41].

Group 1: Motivation and Theory
- Current visual tokenizers apply a fixed compression rate, leading to inefficient token allocation regardless of video complexity [9][10].
- InfoTok addresses this inefficiency by leveraging Shannon's information theory, in which the amount of information dictates the number of tokens required for encoding [11][12].
- The ideal video tokenizer should achieve high compression rates, maintain high fidelity, and capture semantically meaningful content [12].

Group 2: Methodology
- InfoTok employs two main components: the ELBO router, which determines the number of tokens to allocate, and the adaptive compressor, which encodes the data into a variable-length token sequence [19][23].
- The ELBO router uses a computable proxy for the predictability of video content, allowing near-optimal token allocation based on content complexity (a minimal sketch of this allocation idea follows below) [20][21].
- The adaptive compressor packages fixed-length embeddings into a variable-length token sequence, balancing information retention against compression [25][26].

Group 3: Experimental Results
- InfoTok demonstrated superior performance on video reconstruction benchmarks, achieving lossless reconstruction while saving 20% of tokens and outperforming ElasticTok at a 2.3 times compression rate [41][44].
- The framework consistently outperformed heuristic methods across all compression levels, with significant improvements in reconstruction quality and inference efficiency [44][45].
- Visual results indicated that InfoTok dynamically adjusts token allocation based on scene complexity, effectively balancing compression and quality [38][39].

Group 4: Future Prospects
- The principles behind InfoTok's information-theoretic framework could extend beyond video to other domains such as images and 3D scenes, suggesting broader applications of adaptive tokenization [48].
- Integrating adaptive tokenization into video generation pipelines could enhance both quality and efficiency, marking a significant advance in AI video generation [48].
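To make the content-adaptive allocation idea concrete, here is a minimal hypothetical sketch (not the paper's ELBO router; the frame-difference entropy proxy, `allocate_tokens`, and all thresholds are illustrative assumptions). It assigns each video chunk a token budget proportional to a crude predictability proxy, so low-motion content receives fewer tokens than complex content.

```python
import numpy as np

def chunk_entropy(frames: np.ndarray, bins: int = 32) -> float:
    """Crude predictability proxy: entropy of frame-difference magnitudes."""
    diffs = np.abs(np.diff(frames.astype(np.float32), axis=0))
    hist, _ = np.histogram(diffs, bins=bins)
    p = hist / hist.sum()
    p = p[p > 0]
    return float(-(p * np.log2(p)).sum())

def allocate_tokens(chunks, total_budget: int, min_tokens: int = 4) -> list[int]:
    """Split a global token budget across chunks in proportion to their complexity proxy."""
    scores = np.array([chunk_entropy(c) for c in chunks])
    if scores.sum() == 0:                      # every chunk is trivially predictable
        scores = np.ones_like(scores)
    weights = scores / scores.sum()
    budgets = np.maximum(min_tokens, np.round(weights * total_budget).astype(int))
    return budgets.tolist()

# Example: a static chunk versus two high-motion chunks (16 frames of 64x64 grayscale)
rng = np.random.default_rng(0)
static = np.repeat(rng.integers(0, 256, (1, 64, 64)), 16, axis=0)
noisy = rng.integers(0, 256, (16, 64, 64))
print(allocate_tokens([static, noisy, noisy], total_budget=256))  # static chunk gets the floor
```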
First Principles of Large Models (III): Information Theory
机器之心· 2026-03-04 09:15
Group 1
- The article introduces the concept of Semantic Information Theory, which adapts Shannon's information theory to explain the underlying principles of large models by shifting the focus from BIT to TOKEN [2][22].
- Shannon's information theory established a mathematical framework for reliable communication in noisy digital systems, which is foundational for understanding modern digital communication [2][4].
- The three main conclusions of Shannon's theory are the source coding theorem, the noisy-channel coding theorem, and the separation theorem, which collectively provide a comprehensive understanding of communication systems [7][14][15].

Group 2
- The article discusses the Rate-Distortion function, which characterizes lossy compression performance, and emphasizes the central role of mutual information in this context (standard definitions of this and of directed information are reproduced below) [24][25].
- Directed Information is introduced as a measure of the statistical dependence between inputs and outputs in communication systems, particularly in feedback scenarios [27][33].
- Directed Information Density is also presented, extending the idea of directed information to a random-variable setting and providing a framework for analyzing information flow in large models [36][39].

Group 3
- Large models can be viewed as stateful channels with feedback, in which the flow of semantic information can be quantified using directed information [42][44].
- For the training phase, the article defines performance metrics including the directed rate-distortion function, which characterizes the relationship between input and output in terms of human preferences [45][49].
- The inference phase is characterized by the semantic information flow, which measures the information transferred from input tokens to output tokens, highlighting the model's predictive capabilities [50][56].

Group 4
- The article concludes that the core concept of the AI era is TOKEN, which connects experience and reasoning, much as BIT defined the information age [67][68].
- It emphasizes the need for new research and development frameworks centered on this concept to advance toward Artificial General Intelligence (AGI) and Artificial Super Intelligence (ASI) [68].
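For reference, the two quantities named above have standard textbook definitions (reproduced here from classical information theory, not quoted from the article): the rate-distortion function gives the minimum coding rate achievable at a given expected distortion, and Massey's directed information captures causal dependence from an input sequence to an output sequence.

```latex
% Rate-distortion function: minimum rate at expected distortion no greater than D
R(D) = \min_{p(\hat{x}\mid x)\,:\,\mathbb{E}[d(X,\hat{X})] \le D} I(X;\hat{X})

% Directed information from X^n to Y^n (Massey), causally conditioned on Y's past
I(X^n \to Y^n) = \sum_{i=1}^{n} I\bigl(X^i; Y_i \mid Y^{i-1}\bigr)
```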
First Principles of Large Models (II): Signal Processing
机器之心· 2026-01-30 08:49
Core Viewpoint
- The article discusses the transformation of natural language processing problems into signal processing problems through semantic vectorization, emphasizing the importance of token embedding in large models and its connection to signal processing and information theory [2][32].

Semantic Embedding / Vectorization
- The concept of using vectors to model semantics dates back to Luhn's 1953 paper, but the significant breakthrough came in 2013, when Mikolov and others successfully trained neural network models that convert tokens into semantic vectors [6][9].
- Ideal semantic vectorization has not been fully realized, but the inner product of semantic vectors can represent semantic relevance at the token level [7][11].
- The semantic vector space can be modeled as a probability-inner-product space, balancing complexity and effectiveness by defining the space on a unit sphere [8][10].

Optimal Semantic Vectorization
- Optimal semantic encoding is closely tied to the downstream task of predicting the next token: the semantic encoder should maximize the conditional mutual information between the next token and the current sequence [13][14].
- Existing methods such as Contrastive Predictive Coding (CPC) optimize an upper bound of the semantic-encoder objective but may not achieve the optimal solution [15][19].

Transformer as a Nonlinear Time-Varying Vector Autoregressive Time Series
- The Transformer is an autoregressive large language model that predicts the next token based on the input token sequence and previously generated tokens [21][30].
- The attention mechanism in Transformers can be expressed mathematically as a nonlinear time-varying vector autoregressive time series, which is crucial for predicting the next token (a small numerical sketch of this view follows below) [22][24].

Signal Processing and Information Theory
- The article establishes a relationship between signal processing and information theory, noting that signal processing implements information-theoretic principles in specific computational architectures [32][33].
- The transition from BIT in the information age to TOKEN in the AI era is proposed as the way to apply Shannon's information theory to the mathematical principles behind large models [36].
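To illustrate the "nonlinear time-varying vector autoregression" reading of attention, here is a small numpy sketch of standard single-head causal attention (a generic illustration, not code or notation from the article): the output at step t is a weighted sum of past value vectors, with weights that depend nonlinearly on the data at step t.

```python
import numpy as np

def causal_attention(X, Wq, Wk, Wv):
    """Single-head causal attention, written to expose the autoregressive structure.

    For each position t the output is sum_{s <= t} a_t[s] * (X[s] @ Wv):
    a vector autoregression whose coefficients a_t vary with time and data.
    """
    T, d = X.shape
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    out = np.zeros_like(V)
    for t in range(T):
        scores = Q[t] @ K[: t + 1].T / np.sqrt(d)   # data-dependent, time-varying weights
        a = np.exp(scores - scores.max())
        a /= a.sum()                                 # softmax over positions s <= t
        out[t] = a @ V[: t + 1]                      # weighted sum of past value vectors
    return out

rng = np.random.default_rng(0)
T, d = 5, 8
X = rng.normal(size=(T, d))
Wq, Wk, Wv = (rng.normal(size=(d, d)) for _ in range(3))
print(causal_attention(X, Wq, Wk, Wv).shape)  # (5, 8)
```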
Google's AI chief and Nobel laureate Demis: AGI needs to break free of "goldfish memory", and Google will be a winner whether or not the bubble bursts
AI科技大本营· 2026-01-29 10:05
Core Insights
- Demis Hassabis emphasizes that AI progress has not stalled, countering the "hitting a wall" narrative prevalent in the industry, and highlights ongoing advancements in AI capabilities, particularly through optimization of existing architectures and data [4][5][6]
- The concept of AGI (Artificial General Intelligence) is defined scientifically by Hassabis, who argues that it should be able to perform all cognitive tasks that humans can, and not merely be a marketing term [10][12]
- The future of AI is envisioned to be embodied in smart glasses, which would serve as a universal digital assistant, enhancing user experience by providing hands-free interaction with the environment [20][22]

Group 1: AI Progress and Challenges
- Hassabis clarifies that concerns about data exhaustion and the limitations of current AI models are overstated, as there is still significant potential for improvement through existing technologies [5][6]
- The debate on whether scaling existing models is sufficient or if new architectures are needed is ongoing, with Hassabis leaning towards the necessity for new inventions to achieve AGI [6][7]
- Current AI systems exhibit limitations in continuous learning and memory retention, which are critical for achieving AGI [8][11]

Group 2: AGI Definition and Future
- AGI is defined as a system capable of performing all cognitive tasks, with an emphasis on the need for continuous learning and the ability to understand and interact with the physical world [10][12][13]
- Hassabis believes that true AGI is still 5 to 10 years away, and it must include capabilities beyond human intelligence, such as understanding complex physical interactions [14][15]
- The integration of multimodal models is crucial for developing a comprehensive understanding of the world, which is necessary for achieving AGI [15][19]

Group 3: Trust and Commercialization
- The introduction of advertisements in AI systems poses a risk to user trust, which is essential for the effectiveness of AI assistants [22][23]
- Hassabis stresses the importance of maintaining user trust and ensuring that AI serves the user's best interests without commercial interference [23][24]
- The potential for AI to enhance existing products and services is highlighted, with a focus on integrating AI into Google's extensive product ecosystem [26][27]

Group 4: Philosophical Perspective on Information
- Hassabis presents a philosophical view that information is the fundamental unit of the universe, suggesting that AI acts as an accelerator in understanding and processing this information [29][30]
- The application of AI in solving complex problems, such as protein folding and climate change, illustrates its potential to revolutionize various fields [30][32]
- The future of work is envisioned as a collaboration between humans and AI, where AI handles mundane tasks, allowing humans to focus on higher-level creative and scientific endeavors [32]
How Information Theory Became a Core Tool of Complex Systems Science
36Kr· 2025-12-24 08:51
Group 1
- The article discusses the importance of information theory as a foundational tool for understanding complex systems, emphasizing its ability to quantify interactions among components and their environment [1][2]
- Information theory is increasingly recognized as essential in the study of complex systems due to its capacity to describe, quantify, and understand emergent phenomena [1][2]
- The article aims to elaborate on why and how information theory serves as a cornerstone for complex systems science, detailing its core concepts, advanced tools, and practical applications [1]

Group 2
- The article introduces key metrics of information theory, starting with entropy, which quantifies uncertainty in a random variable [3][5]
- Joint entropy and conditional entropy are explained, highlighting their roles in measuring uncertainty across multiple random variables [6]
- Mutual information is presented as a measure of statistical dependence between variables, capable of capturing non-linear relationships (a small worked example follows below) [7][8]

Group 3
- Transfer entropy is introduced as a dynamic measure of information flow in time series, useful for determining causal relationships in complex systems [13][14]
- Active information storage (AIS) quantifies how much past information influences a system's current state, with implications for predicting future behavior [17]
- Integrated information theory, proposed by Giulio Tononi, attempts to measure consciousness based on the degree of information integration within a system [19][20]

Group 4
- The article discusses partial information decomposition (PID) as a method to analyze shared information among multiple variables, distinguishing between redundancy and synergy [26][27]
- The concept of statistical complexity is introduced, measuring the minimum information required to predict future states based on historical data [22][23]
- The article emphasizes the significance of network representations in modeling complex systems, differentiating between physical and statistical networks [34][35]

Group 5
- The balance of integration and segregation in complex systems is highlighted, with examples from neuroscience and economics illustrating the importance of this dynamic [36]
- The article discusses the challenges of applying information theory in practice, particularly in estimating probability distributions from limited data [41][42]
- Future directions are suggested, including the use of neural networks for estimating information metrics and for guiding evolutionary algorithms [43][44]
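As a concrete illustration of the basic quantities listed above (a generic worked example, not code from the article), the sketch below estimates entropy and mutual information from the empirical joint distribution of two discrete variables:

```python
import numpy as np

def entropy(p: np.ndarray) -> float:
    """Shannon entropy H = -sum p*log2(p), skipping zero-probability entries."""
    p = p[p > 0]
    return float(-(p * np.log2(p)).sum())

def mutual_information(joint: np.ndarray) -> float:
    """I(X;Y) = H(X) + H(Y) - H(X,Y), computed from a joint probability (or count) table."""
    joint = joint / joint.sum()
    px, py = joint.sum(axis=1), joint.sum(axis=0)
    return entropy(px) + entropy(py) - entropy(joint.flatten())

# Strongly dependent binary variables: the joint counts concentrate on the diagonal
counts = np.array([[40.0, 10.0],
                   [10.0, 40.0]])
print(round(mutual_information(counts), 3))   # clearly positive: the variables share information

# Independent variables: the joint factorizes, so mutual information is zero
indep = np.outer([0.5, 0.5], [0.5, 0.5])
print(round(mutual_information(indep), 3))    # 0.0
```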
How Information Theory Became a Core Tool of Complex Systems Science
腾讯研究院· 2025-12-24 08:33
Core Concept
- The article discusses the significance of information theory as a foundational tool for understanding complex systems, emphasizing its ability to quantify interactions among components and the system's environment [2][3].

Group 1: Key Metrics in Information Theory
- Entropy is introduced as a fundamental measure of uncertainty, quantifying the expected level of surprise regarding the outcome of a random variable [5][7].
- Joint entropy measures the uncertainty of two random variables together, while conditional entropy reflects the uncertainty of one variable given the other [9].
- Mutual information quantifies the amount of information gained about one variable through the observation of another, capturing both linear and non-linear dependencies [10].

Group 2: Dynamic Features of Complex Systems
- Transfer entropy extends mutual information to time series, measuring the directed information flow between variables, which is crucial for understanding causal relationships (its standard definition is reproduced below) [16].
- Active information storage quantifies how much past information influences the current state of a system, indicating memory capacity [18].
- Integrated information theory, proposed by Giulio Tononi, attempts to measure consciousness based on the degree of information integration among system components [20].

Group 3: Information Decomposition
- Partial information decomposition (PID) aims to break down the total information shared between variables into components such as redundancy, unique information, and synergy [29].
- Statistical complexity measures the minimum amount of information required to predict future states based on historical data, reflecting the internal structure and dynamics of a system [25].

Group 4: Network Representation of Complex Systems
- Networks serve as a universal language for modeling complex systems, with edges representing statistical dependencies, and can be categorized into physical and statistical networks [40].
- The balance between integration and segregation within a system is crucial for its functionality, as seen in examples from neuroscience and economics [42].

Group 5: Practical Applications and Challenges
- The article highlights the challenges of estimating probability distributions and information measures from limited data, which can lead to biases in results [49].
- Future directions include the use of neural information estimators to handle large and complex datasets, as well as the application of information theory in machine learning and evolutionary algorithms [52][53].
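For reference, the usual definitions of transfer entropy and active information storage (standard forms from the literature, not notation quoted from the article), with $X_t^{(k)}$ and $Y_t^{(l)}$ denoting the recent histories of the source and target processes, are:

```latex
% Transfer entropy (Schreiber): what X's past adds about Y's next value,
% beyond what Y's own past already provides
T_{X \to Y} = I\bigl(Y_{t+1};\, X_t^{(k)} \,\big|\, Y_t^{(l)}\bigr)

% Active information storage (Lizier): what a process's own past carries about its next value
A_Y = I\bigl(Y_{t+1};\, Y_t^{(l)}\bigr)
```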
Daily Pin (Why is the rebalancing strategy called the free lunch of investing?)
银行螺丝钉· 2025-12-20 14:02
Group 1
- Many investors start their investment journey with index funds and seek ways to achieve good returns through them [2]
- A free limited-time course is available that introduces investment techniques for index funds, along with course notes and mind maps for efficient learning [2]

Group 2
- The rebalancing strategy is referred to as a free lunch in investing because stock and bond prices do not move in step with each other [6]
- Rebalancing involves adjusting holdings back toward the original allocation whenever market fluctuations pull the proportions away from it (a minimal sketch follows below) [7]
- Shannon, the noted scientist behind information theory, also had an interest in investment and gave public lectures on profiting from stock volatility [8]
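A minimal, hypothetical illustration of the mechanics (generic portfolio arithmetic, not material from the course): rebalancing computes the trades needed to restore a drifted portfolio to its target weights.

```python
def rebalance(values: dict[str, float], targets: dict[str, float]) -> dict[str, float]:
    """Return the cash amount to buy (+) or sell (-) per asset to restore target weights."""
    total = sum(values.values())
    return {asset: targets[asset] * total - values[asset] for asset in values}

# A 50/50 stock/bond portfolio that drifted after a stock rally
holdings = {"stocks": 70_000.0, "bonds": 50_000.0}
target = {"stocks": 0.5, "bonds": 0.5}
print(rebalance(holdings, target))  # sell 10,000 of stocks, buy 10,000 of bonds
```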
Do large models "get it more wrong the more they think"? A Renmin University & Tencent team uses information theory to reveal when to think and when not to
机器之心· 2025-12-19 06:38
Core Insights
- The article discusses the inefficiencies in the reasoning capabilities of large models, highlighting the need for a more effective approach to reasoning in AI systems [4][10][46]
- The proposed solution, Adaptive Think, allows models to automatically stop reasoning when they reach a sufficient level of confidence, thus improving efficiency and accuracy [7][28][45]

Group 1: Inefficiencies in Current Models
- Current large models exhibit a tendency to overthink, leading to longer reasoning chains that often result in noise and decreased accuracy [3][19]
- Research indicates that longer reasoning chains do not necessarily yield better results, as they can lead to diminishing returns and increased computational costs [19][20]
- The study employs information theory metrics such as entropy and mutual information to evaluate the reasoning efficiency of models [6][12]

Group 2: Adaptive Think Mechanism
- The Adaptive Think strategy enables models to self-monitor their reasoning process, terminating when confidence is sufficiently high (a hypothetical sketch of such a stopping rule follows below) [28][29]
- Experimental results show that Adaptive Think significantly reduces token consumption while maintaining or improving accuracy across various tasks [33][36]
- The mechanism allows for dynamic adjustment of reasoning depth based on task difficulty, enhancing both speed and reliability [31][45]

Group 3: Experimental Findings
- In tests on the GSM8K dataset, Adaptive Think reduced average token usage by over 40% while improving accuracy by 0.93% compared to traditional methods [33]
- The approach demonstrated effectiveness across multiple reasoning tasks, with notable improvements in efficiency for common-sense reasoning tasks [36][37]
- The findings suggest that many models can achieve correct answers with fewer reasoning steps, challenging the notion that longer reasoning is inherently better [38][46]
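The description above suggests a confidence-based stopping rule. The sketch below is a hypothetical illustration of that idea, not the authors' Adaptive Think implementation: after each reasoning step it checks the entropy of the model's answer distribution and stops thinking once the entropy drops below a threshold. The `answer_distribution` callback and the threshold value are illustrative assumptions.

```python
import math

def entropy(probs):
    """Shannon entropy (bits) of a distribution over candidate answers."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

def reason_with_early_stop(answer_distribution, max_steps=16, entropy_threshold=0.3):
    """Keep reasoning only while the model is still uncertain about its answer.

    `answer_distribution(step)` stands in for querying the model after `step`
    reasoning steps and reading off its distribution over candidate answers.
    """
    for step in range(1, max_steps + 1):
        probs = answer_distribution(step)
        if entropy(probs) < entropy_threshold:   # confident enough: stop thinking
            return step, probs
    return max_steps, probs                      # otherwise use the full budget

# Toy stand-in where confidence grows with each reasoning step
def fake_answer_distribution(step):
    p_correct = min(0.99, 0.5 + 0.1 * step)
    return [p_correct, 1.0 - p_correct]

steps_used, final_probs = reason_with_early_stop(fake_answer_distribution)
print(steps_used, final_probs)   # stops well before the 16-step budget
```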
The underlying logic of Agents, explained in one article
Huxiu· 2025-10-22 14:47
Core Insights
- The article emphasizes the importance of understanding AI Agents beyond mere API calls, highlighting the need for a structured cognitive process that enhances their capabilities [3][15][56]

Group 1: Understanding AI Agents
- The article identifies two common misconceptions about AI Agents: one that mystifies their capabilities and another that oversimplifies them as just repeated calls to ChatGPT [1][2]
- It aims to establish a consensus on the cognitive processes that underpin AI Agents, asserting that their effectiveness lies in the design of these processes rather than just the underlying models [3][4]

Group 2: Development Insights
- The article outlines a structured approach to developing AI Agents, detailing the transition from "prompt engineers" to "Agent process architects" [7][72]
- It discusses the threefold value of structured processes: providing a framework for thought, creating memory compression algorithms, and enabling interaction with the real world [6][55][66]

Group 3: Theoretical Foundations
- The article connects the effectiveness of the "Think -> Act -> Observe" cycle to foundational theories in cybernetics and information theory, explaining how feedback mechanisms enhance goal attainment and reduce uncertainty (a minimal sketch of this loop follows below) [74][75][91]
- It illustrates the evolution from open-loop systems to closed-loop systems, emphasizing the importance of feedback in achieving reliable outcomes [77][84]

Group 4: Practical Applications
- The article uses a travel planning example to contrast the static outputs of traditional chatbots with the dynamic, iterative processes of AI Agents, showcasing the latter's ability to produce actionable and reliable results [40][48]
- It highlights the significance of structured workflows in enhancing the quality and reliability of AI outputs, moving beyond mere text generation to a more interactive and iterative approach [55][68]

Group 5: Future Directions
- The article discusses the future role of developers as "Agent process architects," focusing on designing cognitive workflows, empowering AI with tools, and constructing decision-making contexts [100][102]
- It emphasizes the need for advanced cognitive architectures that can manage complex tasks and improve execution efficiency while maintaining high-quality outcomes [106][111]
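As a minimal sketch of the closed loop (a generic illustration, not the article's implementation; `think`, `act`, and `goal_reached` are hypothetical stand-ins for a model call, a tool call, and a success check), the agent repeats Think -> Act -> Observe, feeding each observation back into the next decision until the goal is met:

```python
from typing import Any, Callable

def agent_loop(
    think: Callable[[str, list], str],     # decide the next action from goal + history
    act: Callable[[str], Any],             # execute the action in the environment
    goal_reached: Callable[[list], bool],  # success check on the accumulated feedback
    goal: str,
    max_iters: int = 10,
) -> list:
    """Closed-loop Think -> Act -> Observe cycle: observations feed back into thinking."""
    history: list = []
    for _ in range(max_iters):
        action = think(goal, history)          # Think: plan the next step
        observation = act(action)              # Act: run the step (tool call, API, ...)
        history.append((action, observation))  # Observe: keep feedback for the next round
        if goal_reached(history):              # feedback closes the loop
            break
    return history

# Toy usage: keep "searching" until the stand-in environment reports success
history = agent_loop(
    think=lambda goal, hist: f"search attempt {len(hist) + 1} for: {goal}",
    act=lambda action: {"ok": action.startswith("search attempt 3")},
    goal_reached=lambda hist: hist[-1][1]["ok"],
    goal="book a flight",
)
print(len(history))  # 3 iterations before the goal check passes
```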
A year and a half of Agent development in review: people's understanding of Agents is misaligned, and an effective "cognitive process" is key
Founder Park· 2025-10-22 12:46
Core Insights
- The article emphasizes the importance of understanding AI Agents and their cognitive processes, arguing that the true power of AI Agents lies not in the models themselves but in the effective cognitive workflows designed around them [1][2][3].

Group 1: Understanding AI Agents
- The author identifies two common misconceptions about AI Agents: one is the mystification of their capabilities, and the other is the oversimplification of their functions [1][2].
- A unified context is proposed to help practitioners understand what is meant by "Agentic" discussions, focusing on the cognitive processes that enhance AI capabilities [2][3].

Group 2: Development Framework
- The article outlines a comprehensive framework for understanding the evolution of AI Agents, using the metaphor of a student's growth stages to illustrate the development of core capabilities [3][15].
- It discusses the transition from "prompt engineers" to "Agent process architects," highlighting the need for structured cognitive workflows that enhance AI performance [5][62].

Group 3: Cognitive Processes
- The article breaks down the cognitive processes into several key components: Planning, Chain of Thought (CoT), Self-Reflection, and Tool Use, each contributing to the overall effectiveness of AI Agents [4][20][24].
- The importance of iterative processes is emphasized, showcasing how reflection and memory compression can lead to improved decision-making and learning [40][43].

Group 4: Practical Applications
- A detailed comparison is made between traditional chatbots and AI Agents using a travel planning example, illustrating how AI Agents can dynamically adjust plans based on real-time information [27][30].
- The article highlights the significance of structured workflows in achieving high-quality, reliable outcomes, contrasting the static nature of traditional chatbots with the dynamic capabilities of AI Agents [35][36].

Group 5: Theoretical Foundations
- The effectiveness of AI Agents is linked to foundational theories in Cybernetics and Information Theory, which explain how feedback loops and information acquisition reduce uncertainty in problem-solving (a tiny illustration of this uncertainty reduction follows below) [50][59].
- The article argues that the closed-loop nature of AI Agents allows them to continuously refine their actions based on observed outcomes, enhancing their ability to achieve set goals [55][58].

Group 6: Future Directions
- The article concludes with a call for a shift in focus from merely creating prompts to designing intelligent processes that enable AI to self-plan, self-correct, and self-iterate [62][70].
- It emphasizes the need for performance engineering to address the challenges of execution efficiency while maintaining high-quality outcomes in AI applications [70][72].
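To make the "information acquisition reduces uncertainty" point concrete, here is a tiny generic example (not taken from the article): an observation shrinks the entropy of the agent's belief over candidate hypotheses, and the size of the drop is the information gained.

```python
import math

def entropy(probs):
    """Shannon entropy (bits) of a belief distribution."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

prior = [0.25, 0.25, 0.25, 0.25]      # four equally likely hypotheses before acting
posterior = [0.5, 0.5, 0.0, 0.0]      # an observation (e.g. a tool call) rules out two

info_gain = entropy(prior) - entropy(posterior)
print(info_gain)  # 1.0 bit of uncertainty removed by the observation
```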