Ant Group Open-Sources Omni-Modal Large Model Ming-flash-omni 2.0
Core Insights
- Ant Group has released Ming-flash-omni 2.0, an omni-modal large model that performs strongly across multiple benchmarks and surpasses Gemini 2.5 Pro on some metrics [1]
- The model is described as the industry's first unified audio generation model, able to generate speech, environmental sounds, and music together on a single audio track [1]
- Users can control audio parameters such as tone, speed, pitch, volume, emotion, and dialect with natural-language commands [1]
- The model runs at a low inference frame rate of 3.1 Hz, enabling real-time, high-fidelity generation of minutes-long audio [1]
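To put the 3.1 Hz figure in perspective, the short sketch below works out how many frames a clip of a given length would need. It assumes that "inference frame rate" means the number of audio frames the model produces per second of output, with each frame covering an equal slice of time; the source does not spell this out, and the function names are illustrative only, not part of any Ming-flash-omni API.

```python
# Assumption: 3.1 Hz means 3.1 generated frames per second of output audio.
FRAME_RATE_HZ = 3.1  # figure quoted in the source


def frame_duration_ms(frame_rate_hz: float = FRAME_RATE_HZ) -> float:
    """Milliseconds of audio covered by one frame (hypothetical helper)."""
    return 1000.0 / frame_rate_hz


def frames_for_duration(seconds: float, frame_rate_hz: float = FRAME_RATE_HZ) -> int:
    """Approximate number of frames needed for `seconds` of audio."""
    return round(seconds * frame_rate_hz)


if __name__ == "__main__":
    for seconds in (60, 180):
        print(f"{seconds} s of audio -> about {frames_for_duration(seconds)} frames")
```

Under this reading, one minute of audio corresponds to roughly 186 frames, each spanning about 323 ms, which suggests why a low frame rate keeps minutes-long generation tractable.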