Zhixiang Future Team Wins ACM MM 2025 Best Demonstration Award: Redefining Conversational Visual Creation
Ge Long Hui·2025-11-06 05:23

Group 1
- The 33rd ACM International Conference on Multimedia (ACM MM 2025) was held in Dublin, Ireland, where the Zhixiang Future team won the Best Demonstration Award. This makes Zhixiang Future the first Chinese multimodal generative AI startup to receive the honor, demonstrating its research capability and innovative strength in multimodal AI [1][2]
- The conference, organized by the Association for Computing Machinery (ACM), is one of the most authoritative and influential academic conferences in the multimedia field. Its Best Demonstration Award recognizes technical innovation, practicality, maturity, and quality of presentation [2]
- The award-winning "Inspiration Agent" (Chat Generation) is a unified multimodal agent that turns the creation of complex visual content into an intuitive conversational experience. By integrating text-to-image generation, instruction-based image editing, and text/image-to-video generation in a single interface, it addresses the industry challenge of cross-modal semantic alignment [2][5]

Group 2
- The technology is built on the 17-billion-parameter HiDream-I1 model, which uses a sparse diffusion Transformer (DiT) architecture with a dynamic mixture-of-experts (MoE) design and performs strongly on international benchmarks such as HPS and GenEval [2]
- The agent introduces a new mode of collaborative content creation for accessible, interactive visual storytelling and multimodal generative AI. It lowers the barrier to producing high-quality visual content and significantly shortens iteration cycles, closing the creative loop from idea to quality output within "one conversation" [5]
- A prototype of this technology has been iterated and deployed in the conversational generation feature of vivago.ai, Zhixiang Future's flagship product, giving users a more natural and personalized multimodal conversational experience [5][6]