Multimodal Collaboration

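Current State of Core Technologies in China's Multimodal Large Model Industry in 2025: Representation, Translation, Alignment, Fusion, and Collaboration Are Key [Image Set]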
Qian Zhan Wang · 2025-06-03 05:12
Core Insights
- The article discusses the core technologies of multimodal large models, focusing on representation learning, translation, alignment, fusion, and collaborative learning [1][2][7][11][14].

Representation Learning
- Representation learning is fundamental for multimodal tasks, addressing challenges such as combining heterogeneous data and handling varying noise levels across different modalities [1].
- Prior to the advent of Transformers, different modalities required distinct representation learning models, such as CNNs for computer vision (CV) and LSTMs for natural language processing (NLP) [1].
- The emergence of Transformers has enabled the unification of multiple modalities and cross-modal tasks, leading to a surge in multimodal pre-training models post-2019 [1].

Translation
- Cross-modal translation aims to map source modalities to target modalities, such as generating descriptive sentences from images or vice versa [2].
- The use of syntactic templates allows for structured predictions, where specific words are filled in based on detected attributes [2].
- Encoder-decoder architectures are employed to encode source modality data into latent features, which are then decoded to generate the target modality [2].

Alignment
- Alignment is crucial in multimodal learning, focusing on establishing correspondences between different data modalities to enhance understanding of complex scenarios [7].
- Explicit alignment involves categorizing instances with multiple components and measuring similarity, utilizing both unsupervised and supervised methods [7][8].
- Implicit alignment leverages latent representations for tasks without strict alignment, improving performance in applications like visual question answering (VQA) and machine translation [8].

Fusion
- Fusion combines multimodal data or features for unified analysis and decision-making, enhancing task performance by integrating information from various modalities [11].
- Early fusion merges features at the feature level, while late fusion combines outputs at the decision level, with hybrid fusion incorporating both approaches [11][12].
- The choice of fusion method depends on the task and data, with neural networks becoming a popular approach for multimodal fusion [12].

Collaborative Learning
- Collaborative learning utilizes data from one modality to enhance the model of another modality, categorized into parallel, non-parallel, and hybrid methods [14][15].
- Parallel learning requires direct associations between observations from different modalities, while non-parallel learning relies on overlapping categories [15].
- Hybrid methods connect modalities through shared datasets, allowing one modality to influence the training of another, applicable across various tasks [15].
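
The difference between early (feature-level) and late (decision-level) fusion described in the Fusion section above can be made concrete with a small sketch. This is not taken from the article: the toy linear scorers, the function names, the weights, and the mixing factor `alpha` are all assumptions chosen for illustration, standing in for the neural networks a real system would use.

```python
# Illustrative sketch only: toy linear scorers stand in for real models.
# All names, weights, and the mixing factor `alpha` are hypothetical.
from typing import Sequence


def dot(weights: Sequence[float], features: Sequence[float]) -> float:
    """Linear score of one feature vector."""
    if len(weights) != len(features):
        raise ValueError("weights and features must have the same length")
    return sum(w * x for w, x in zip(weights, features))


def early_fusion(
    image_feat: Sequence[float],
    text_feat: Sequence[float],
    joint_weights: Sequence[float],
) -> float:
    """Feature-level fusion: concatenate modality features, then score once."""
    fused = list(image_feat) + list(text_feat)
    return dot(joint_weights, fused)


def late_fusion(
    image_feat: Sequence[float],
    text_feat: Sequence[float],
    image_weights: Sequence[float],
    text_weights: Sequence[float],
    alpha: float = 0.5,
) -> float:
    """Decision-level fusion: score each modality alone, then mix the scores."""
    image_score = dot(image_weights, image_feat)
    text_score = dot(text_weights, text_feat)
    return alpha * image_score + (1.0 - alpha) * text_score


def hybrid_fusion(early_score: float, late_score: float) -> float:
    """One simple hybrid: average the early-fusion and late-fusion outputs."""
    return 0.5 * (early_score + late_score)


if __name__ == "__main__":
    image = [1.0, 2.0]  # hypothetical image features
    text = [3.0]        # hypothetical text features
    early = early_fusion(image, text, joint_weights=[0.5, 0.5, 1.0])
    late = late_fusion(image, text, image_weights=[1.0, 0.0], text_weights=[2.0])
    print(early, late, hybrid_fusion(early, late))
```

In this sketch, early fusion lets a single scorer weigh interactions across the concatenated features, while late fusion keeps per-modality scorers independent and only mixes their outputs, which is why the choice between them depends on the task and data.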
Report Shows China's Online Literature IP Market Has Jumped to 298.56 Billion RMB
Huan Qiu Wang Zi Xun · 2025-05-09 14:30
Core Insights
- The report indicates that by the end of 2024, the Chinese online literature IP market is expected to reach 298.56 billion RMB, marking a year-on-year growth of 14.6% [1]
- The number of online literature authors has surpassed 30 million for the first time, reaching 31.198 million [1]
- The online literature reading market is projected to reach 43.06 billion RMB, with a year-on-year increase of 6.8% [1]

Market Trends
- The report highlights the emergence of realistic themes and traditional Chinese culture in online literature, serving as a vibrant medium for promoting mainstream social values and cultural innovation [2]
- The classicization of online literature is accelerating, with 81 outstanding works being archived in the National Library of China and 24 intangible cultural heritage-themed works in the Shanghai Library [2]
- The integration of short dramas, games, and AI into the online literature ecosystem is expanding the IP development model, leading to new hits and additional revenue streams [2]

International Expansion
- The overseas market for Chinese online literature is expected to exceed 5 billion RMB in 2024, with 460,000 overseas online writers and over 350 million users across more than 200 countries and regions [2]
- The user base in the Japanese market has surged by 180%, making it the fastest-growing emerging market for global users [2]