Core Insights
- 360 Group's open-source vision-language alignment model FG-CLIP2 has drawn significant attention in the global tech community, outperforming Google's SigLIP 2 and Meta's MetaCLIP 2 across 29 authoritative benchmarks and marking a breakthrough for China in AI foundation models [1]

Group 1
- FG-CLIP2 addresses the long-standing "fine-grained recognition" weakness of CLIP models, reporting 96% confidence when recognizing details in complex scenes containing multiple objects [1]
- The model incorporates three core innovations: a hierarchical alignment architecture that perceives both macro scenes and micro details, a dynamic attention mechanism that focuses on key image regions, and a bilingual co-optimization strategy that resolves the imbalance between Chinese and English understanding [1]
360: FG-CLIP2 Model Comprehensively Surpasses International Giants
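The fine-grained recognition task described above can be illustrated with a generic CLIP-style scoring sketch: an image embedding is compared against candidate caption embeddings by cosine similarity, and the similarities are converted into probability-like scores with a temperature-scaled softmax. The vectors below are toy numbers, and nothing here reflects FG-CLIP2's actual API or weights.

```python
# Generic CLIP-style image-text matching sketch (toy data,
# NOT FG-CLIP2's real implementation or embeddings).
import math

def cosine(u, v):
    # Cosine similarity between two embedding vectors.
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def softmax(scores, temperature=0.07):
    # CLIP-family models scale similarities by a (learned)
    # temperature before the softmax; 0.07 is a common value.
    exps = [math.exp(s / temperature) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

image = [0.9, 0.1, 0.3]  # toy image embedding
captions = {
    "a red cup on a desk":  [0.8, 0.2, 0.4],  # fine-grained match
    "a blue cup on a desk": [0.1, 0.9, 0.2],  # differs only in detail
    "an empty desk":        [0.2, 0.1, 0.9],
}

scores = {text: cosine(image, vec) for text, vec in captions.items()}
probs = dict(zip(scores, softmax(list(scores.values()))))
best = max(probs, key=probs.get)
print(best)  # → a red cup on a desk
```

Distinguishing "a red cup" from "a blue cup" is exactly the kind of attribute-level discrimination that standard CLIP models struggle with and that fine-grained variants target.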