Alibaba Tongyi releases and open-sources the Qwen3-VL-Embedding and Qwen3-VL-Reranker models
智通财经网· 2026-01-09 01:31
Core Insights
- The Qwen3-VL model series handles multiple modalities such as text, images, visual documents, and videos within a unified framework, achieving industry-leading performance in tasks like image-text retrieval and visual question answering [1][8].

Group 1: Multi-Modal Capabilities
- The model series can process diverse input types, including text, images, visual documents, and videos, demonstrating superior performance in tasks like multi-modal content clustering [1].
- Qwen3-VL-Embedding generates rich semantic vector representations, mapping visual and textual information into a shared semantic space for efficient cross-modal similarity computation and retrieval [2][4].

Group 2: Model Architecture and Performance
- Qwen3-VL-Embedding employs a dual-tower architecture that encodes each modality independently, making it well suited to large-scale parallel computation [14].
- Qwen3-VL-Reranker, as a complementary model, uses a single-tower architecture with cross-attention to analyze the semantic relationship between a query and a document, yielding more precise relevance scores [14].
- In benchmark tests like MMEB-v2 and MMTEB, Qwen3-VL-Embedding-8B achieved leading results, surpassing all previous open-source models and closed-source commercial services [8][11].

Group 3: Practical Utility
- The Qwen3-VL series supports over 30 languages, making it suitable for global deployment, and offers flexible vector dimension selection and task instruction customization [6].
- The models maintain excellent performance even after quantization, facilitating integration into existing systems for developers [6].

Group 4: Performance Metrics
- Qwen3-VL-Embedding-2B achieved an average score of 73.4 on MMEB-v2 retrieval tasks, while Qwen3-VL-Reranker-8B scored 79.2, indicating superior performance on most tasks [13].
- Qwen3-VL-Reranker models consistently outperformed baseline models, with the 8B version achieving the best results across various tasks [11].
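The dual-tower embedding stage and single-tower reranking stage described above form a standard two-stage retrieve-then-rerank pipeline. The sketch below illustrates the flow with mock random vectors and a placeholder rescoring function; it does not use the actual Qwen3-VL-Embedding or Qwen3-VL-Reranker APIs, which the article does not show.

```python
import numpy as np

def cosine_sim(a, b):
    # Cosine similarity between two embedding vectors.
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Mock embeddings standing in for Qwen3-VL-Embedding outputs:
# one text query and three "documents" (e.g. images or visual docs),
# all assumed to live in the same shared semantic space.
rng = np.random.default_rng(0)
query = rng.normal(size=8)
docs = [rng.normal(size=8) for _ in range(3)]

# Stage 1: dual-tower retrieval. Query and documents are encoded
# independently, so document vectors can be precomputed and indexed;
# ranking is a cheap similarity comparison.
scores = [(i, cosine_sim(query, d)) for i, d in enumerate(docs)]
candidates = sorted(scores, key=lambda s: s[1], reverse=True)[:2]

def rerank_score(query_vec, doc_vec):
    # Placeholder: a real single-tower reranker would run cross-attention
    # over the (query, document) pair jointly to produce a relevance score.
    return cosine_sim(query_vec, doc_vec)

# Stage 2: rerank only the shortlisted candidates.
reranked = sorted(candidates,
                  key=lambda s: rerank_score(query, docs[s[0]]),
                  reverse=True)
print(reranked[0][0])  # index of the top-ranked document
```

The design trade-off the article alludes to: the dual tower scales to millions of documents because encoding is independent, while the cross-attention reranker is more accurate but must be run per candidate pair, so it is applied only to a small shortlist.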
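The "flexible vector dimension selection" mentioned under Practical Utility is commonly implemented by truncating a full embedding to its leading components and re-normalizing. Whether Qwen3-VL-Embedding uses exactly this mechanism is an assumption; the sketch below only shows the general technique.

```python
import numpy as np

def truncate_embedding(vec, dim):
    # Keep the first `dim` components and re-normalize to unit length.
    # Assumption: the embedding was trained so that leading dimensions
    # carry the most information (the usual flexible-dimension setup).
    truncated = np.asarray(vec, dtype=float)[:dim]
    norm = np.linalg.norm(truncated)
    return truncated / norm if norm > 0 else truncated

# A hypothetical full-size embedding, shrunk for a cheaper vector index.
full = np.random.default_rng(1).normal(size=1024)
small = truncate_embedding(full, 256)
print(small.shape)  # (256,)
```

Smaller vectors cut index storage and similarity-computation cost at some loss of retrieval quality, which is why exposing the dimension as a knob is useful for deployment.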