Semantic Vectorization
First Principles of Large Models (II): Signal Processing
机器之心 · 2026-01-30 08:49
Core Viewpoint
- The article recasts natural language processing as a signal processing problem via semantic vectorization, emphasizing the central role of token embedding in large models and its connection to signal processing and information theory [2][32].

Semantic Embedding / Vectorization
- The idea of modeling semantics with vectors dates back to Luhn's 1953 paper, but the decisive breakthrough came in 2013, when Mikolov and colleagues successfully trained neural network models (word2vec) that map tokens to semantic vectors [6][9].
- Ideal semantic vectorization has not been fully realized, but the inner product of semantic vectors can already represent semantic relevance at the token level [7][11].
- The semantic vector space can be modeled as a probability inner-product space; restricting embeddings to the unit sphere balances model complexity against effectiveness (see the first sketch below) [8][10].

Optimal Semantic Vectorization
- Optimal semantic encoding is tied to the downstream task, here next-token prediction: the encoder should maximize the conditional mutual information between the next token and the encoded current sequence [13][14].
- Existing methods such as Contrastive Predictive Coding (CPC) optimize a tractable contrastive bound on this mutual information but do not necessarily attain the optimal encoder (see the second sketch below) [15][19].

Transformer as a Nonlinear Time-Varying Vector Autoregression
- The Transformer is an autoregressive large language model: it predicts the next token from the input token sequence together with the tokens it has already generated [21][30].
- The attention mechanism can be written mathematically as a nonlinear time-varying vector autoregression whose coefficients are computed from the data itself, and this structure is what drives next-token prediction (see the third sketch below) [22][24].

Signal Processing and Information Theory
- The article relates signal processing to information theory: signal processing realizes information-theoretic principles within concrete computational architectures [32][33].
- It proposes shifting the basic unit from the BIT of the information age to the TOKEN of the AI era, so that Shannon's information theory can be applied to the mathematical principles behind large models [36].
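To make the unit-sphere construction concrete, here is a minimal sketch of token embeddings constrained to the unit sphere, where the inner product of two embeddings is read as token-level semantic relevance. The vocabulary, dimension, and embedding table below are hypothetical stand-ins; in a trained model such as word2vec the table would be learned rather than random.

```python
import numpy as np

rng = np.random.default_rng(0)
vocab = ["king", "queen", "apple", "banana"]   # hypothetical toy vocabulary
d = 8                                          # illustrative embedding dimension
E = rng.normal(size=(len(vocab), d))           # stand-in for a learned table

# Project every embedding onto the unit sphere, as in the article's
# probability inner-product space: with ||u|| = ||v|| = 1, the inner
# product <u, v> equals the cosine of the angle between them.
E = E / np.linalg.norm(E, axis=1, keepdims=True)

def relevance(w1: str, w2: str) -> float:
    """Inner product of two unit embeddings, read as semantic relevance."""
    u = E[vocab.index(w1)]
    v = E[vocab.index(w2)]
    return float(u @ v)                        # lies in [-1, 1] on the sphere

print(relevance("king", "queen"))              # near-random here; high once trained
```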
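The contrastive bound that CPC optimizes can be sketched as an InfoNCE loss over a batch of (context, next-token) embedding pairs. The `ctx` and `nxt` arrays, batch size, and temperature are illustrative assumptions, not the article's implementation; minimizing this loss tightens a bound on the mutual information between the encoded context and the next token.

```python
import numpy as np

def info_nce(ctx: np.ndarray, nxt: np.ndarray, tau: float = 0.1) -> float:
    """InfoNCE over a batch: row i's positive is column i, the rest are negatives."""
    # Normalize so that scores are inner products on the unit sphere.
    ctx = ctx / np.linalg.norm(ctx, axis=1, keepdims=True)
    nxt = nxt / np.linalg.norm(nxt, axis=1, keepdims=True)
    logits = ctx @ nxt.T / tau                         # (B, B) similarity matrix
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-np.mean(np.diag(log_probs)))         # cross-entropy on positives

rng = np.random.default_rng(0)
ctx = rng.normal(size=(16, 32))    # hypothetical context encodings
nxt = rng.normal(size=(16, 32))    # hypothetical next-token encodings
print(info_nce(ctx, nxt))
```

Using the rest of the batch as negatives is what makes the objective tractable; it is also why the bound need not be tight, which is the gap the article points to.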
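Finally, a sketch of single-head causal self-attention rewritten in the article's vector-autoregression form, y_t = sum over s <= t of A[t, s] * (Wv @ x_s). The weight matrices and dimensions are hypothetical; the point is that the coefficient matrix A is computed from the sequence itself, which is what makes the autoregression nonlinear and time-varying.

```python
import numpy as np

rng = np.random.default_rng(0)
T, d = 6, 8                                  # sequence length, model width
X = rng.normal(size=(T, d))                  # token embeddings x_1..x_T
Wq, Wk, Wv = (rng.normal(size=(d, d)) for _ in range(3))  # hypothetical weights

Q, K, V = X @ Wq, X @ Wk, X @ Wv
scores = Q @ K.T / np.sqrt(d)                # scaled dot-product scores
scores[np.triu_indices(T, k=1)] = -np.inf    # causal mask: only s <= t
A = np.exp(scores - scores.max(axis=1, keepdims=True))
A = A / A.sum(axis=1, keepdims=True)         # softmax rows: data-dependent coefficients
Y = A @ V                                    # y_t = sum_s A[t, s] * (Wv x_s)
print(Y.shape)                               # (6, 8)
```

Read this way, each softmax row A[t, :] plays the role of autoregressive coefficients that are recomputed at every position from the tokens themselves, rather than being fixed in advance as in a classical VAR model.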