From Token to 词元: Foundation Models and Interaction Entry Points in the Omni-Modal Era
量子位·2026-03-27 05:10

Core Viewpoint
- The article discusses the National Bureau of Statistics' establishment of "词元" as the standard Chinese translation of "Token", highlighting the large daily volume of Tokens consumed in China and the shift in AI systems from discrete text to continuous perception [1][37].

Group 1: Token Standardization and Industry Trends
- Professor Qiu Xipeng of Fudan University promoted the term "词元" in 2021, stressing that it names a fundamental unit of language processing while avoiding confusion with natural-language "words" [3].
- The deployment of Agents in multi-modal scenarios is changing how Tokens are generated and consumed, which affects the capabilities and cost structures of next-generation AI systems [1][10].
- Companies focused on unified Token structures and contextual intelligence are attracting significant capital, as shown by the recent funding of MoSi Intelligent [4][36].

Group 2: Technological Pathways and Innovations
- MoSi Intelligent is taking a less common path: it starts with voice technology and works toward a unified Token structure for multi-modal information processing [7][9].
- Voice was chosen as the breakthrough point because it carries higher information density and aligns naturally with real-world human-computer interaction [9][10].
- SpeechGPT and SpeechTokenizer demonstrate that continuous speech signals can be integrated into a unified Token space, so that a single model can understand both spoken and written language [14][17].

Group 3: Advancements and Future Directions
- The release of AnyGPT is a significant step toward unifying voice, text, images, and video in a single discrete Token system, paving the way for comprehensive multi-modal models [18][19].
- MoSi Intelligent's later releases, such as MOSS-TTSD and NEX, show the competitive edge of a unified architecture that extends to Agent and productivity scenarios [21][22].
- The company is building a team with deep research and engineering capabilities, supported by the Shanghai Institute of Intelligent Technology, which speeds up its conversion of research into products [27][31].

Group 4: Market Positioning and Commercialization
- MoSi Intelligent's open platform for multi-modal models is in full public testing and provides API services for enterprise-level demand across various sectors [35][36].
- The company emphasizes integrated capability from foundation models to vertical applications, aiming to create a dual-driven growth model through Token production, distribution, and application [36][38].
- The official recognition of "词元" signals a shift toward a more standardized industry, in which future model capabilities will depend increasingly on architectural innovation and talent density rather than on parameter scaling alone [37][38].
