Commerce and Retail Industry Weekly Report: Global Large-Model Capabilities Upgraded Again as Alibaba Continues to Invest in Full-Stack AI (2025-12-01)
Shenwan Hongyuan Securities · 2025-12-01 11:57
Investment Rating
- The report maintains a positive outlook on the e-commerce sector, particularly companies such as Alibaba, JD.com, Meituan, and Pinduoduo, which are expected to benefit from advances in AI technology and the growth of instant retail [4][11].

Core Insights
- The report highlights the significant performance upgrade of Google's Gemini 3 model, which improves mathematical reasoning, code generation, and cross-modal understanding, marking a shift in AI competition from "scale" to "execution capability" [4][7].
- Alibaba has consolidated its consumer AI products under the "Qwen" brand; the Qwen App surpassed 10 million downloads in its first week, indicating strong potential in the consumer AI market [4][16].
- AI image capabilities are transitioning from consumer entertainment to business production, suggesting structural cost reductions and greater efficiency in global retail marketing systems [11][9].

Market Performance Overview
- From November 24 to November 28, 2025, the commerce retail index rose 3.45%, outperforming the CSI 300 index by 1.81 percentage points and ranking 9th among Shenwan's primary industries [4][25].
- The social services index rose 3.92%, outperforming the CSI 300 index by 2.28 percentage points and ranking 6th among Shenwan's primary industries [4][25].

Company Updates
- Alibaba has positioned the Qwen App as an "AI super entrance" for consumer-facing applications, integrating high-frequency scenarios across its ecosystem, including maps, shopping, and payment services [22][16].
- Within the commerce retail sector, notable weekly gainers included Maoye Commercial (+51.11%) and Guangbai Co. (+18.72%) [31][38].
Industry Developments
- The report discusses the launch of Alibaba's cross-border e-commerce AI assistant "Ao Xia," which aims to provide a one-stop service for cross-border entrepreneurs, improving product selection and communication efficiency [45][43].
- It highlights the ongoing trend of applying AI across sectors, indicating a shift toward practical implementations that improve operational efficiency and user engagement [4][11].
Core Technologies of China's Multimodal Large-Model Industry in 2025: The Keys Are Representation, Translation, Alignment, Fusion, and Collaboration [Charts]
Qian Zhan Wang · 2025-06-03 05:12
Core Insights
- The article discusses the core technologies of multimodal large models: representation learning, translation, alignment, fusion, and collaborative learning [1][2][7][11][14].

Representation Learning
- Representation learning is fundamental for multimodal tasks, addressing challenges such as combining heterogeneous data and handling varying noise levels across modalities [1].
- Before the advent of Transformers, each modality required its own representation-learning model, such as CNNs for computer vision (CV) and LSTMs for natural language processing (NLP) [1].
- Transformers enabled the unification of multiple modalities and cross-modal tasks, leading to a surge in multimodal pre-training models after 2019 [1].

Translation
- Cross-modal translation maps a source modality to a target modality, such as generating descriptive sentences from images or vice versa [2].
- Syntactic templates allow structured predictions, with specific words filled in based on detected attributes [2].
- Encoder-decoder architectures encode source-modality data into latent features, which are then decoded to generate the target modality [2].

Alignment
- Alignment establishes correspondences between data from different modalities, enhancing the understanding of complex scenarios [7].
- Explicit alignment categorizes instances with multiple components and measures their similarity, using both unsupervised and supervised methods [7][8].
- Implicit alignment leverages latent representations for tasks without strict alignment, improving performance in applications such as visual question answering (VQA) and machine translation [8].

Fusion
- Fusion combines multimodal data or features for unified analysis and decision-making, improving task performance by integrating information from multiple modalities [11].
- Early fusion merges features at the feature level; late fusion combines outputs at the decision level; hybrid fusion incorporates both approaches [11][12].
- The choice of fusion method depends on the task and data, with neural networks becoming a popular approach to multimodal fusion [12].

Collaborative Learning
- Collaborative learning uses data from one modality to improve the model of another, and is categorized into parallel, non-parallel, and hybrid methods [14][15].
- Parallel learning requires direct associations between observations from different modalities, while non-parallel learning relies on overlapping categories [15].
- Hybrid methods connect modalities through shared datasets, allowing one modality to influence the training of another across a range of tasks [15].
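The template-filling approach to cross-modal translation mentioned above can be sketched in a few lines. Everything here is illustrative: the attribute detector is a stub returning fixed values, and the template slots are invented for the example rather than taken from any real system.

```python
# Template-based cross-modal translation sketch: fill a syntactic caption
# template with attributes "detected" from an image.

TEMPLATE = "a {color} {obj} {action} in the {scene}"

def detect_attributes(image):
    """Stand-in for per-slot visual attribute classifiers (hypothetical).

    A real system would run a detector or classifier per slot; this stub
    returns fixed values so the sketch stays runnable.
    """
    return {"color": "brown", "obj": "dog", "action": "running", "scene": "park"}

def image_to_sentence(image):
    # Structured prediction: the syntax is fixed by the template,
    # only the slot words come from the detector.
    return TEMPLATE.format(**detect_attributes(image))

print(image_to_sentence(None))  # a brown dog running in the park
```

The structure guarantees grammatical output at the cost of flexibility, which is why encoder-decoder models largely replaced templates for open-ended captioning.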
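Explicit alignment by similarity measurement, as described in the Alignment section, reduces to scoring candidate cross-modal pairs in a shared embedding space. Below is a minimal sketch with hand-made 4-dimensional embeddings; the vectors and names are invented for illustration, whereas real systems learn such embeddings (e.g. with contrastive training).

```python
import numpy as np

def cosine_sim(a, b):
    """Cosine similarity between two embedding vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Toy embeddings in a shared space (values invented for the example).
image_emb = {"cat_photo": np.array([0.9, 0.1, 0.0, 0.2]),
             "car_photo": np.array([0.1, 0.8, 0.3, 0.0])}
text_emb = {"a cat": np.array([0.85, 0.05, 0.1, 0.25]),
            "a car": np.array([0.05, 0.9, 0.2, 0.1])}

def align(img_name):
    """Explicit alignment: pick the caption most similar to the image."""
    v = image_emb[img_name]
    return max(text_emb, key=lambda t: cosine_sim(v, text_emb[t]))

print(align("cat_photo"))  # a cat
print(align("car_photo"))  # a car
```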
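The early-versus-late fusion distinction can be made concrete with linear classifiers over toy features. The weights and feature sizes below are arbitrary random values; the point is only where the combination happens, at the feature level or at the decision level.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

# Toy per-modality features for one sample (sizes chosen arbitrarily).
audio_feat = rng.normal(size=4)
video_feat = rng.normal(size=6)

# Early fusion: concatenate features, then apply one joint classifier.
W_joint = rng.normal(size=(3, 10))  # 3 classes over 4 + 6 fused dims
early_probs = softmax(W_joint @ np.concatenate([audio_feat, video_feat]))

# Late fusion: one classifier per modality, decisions averaged.
W_audio = rng.normal(size=(3, 4))
W_video = rng.normal(size=(3, 6))
late_probs = (softmax(W_audio @ audio_feat)
              + softmax(W_video @ video_feat)) / 2

# Both yield a valid probability distribution over the 3 classes.
print(early_probs.sum(), late_probs.sum())
```

Hybrid fusion would combine the two, e.g. by also feeding the fused feature vector into the decision-level average.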