Alibaba Cloud Launches New Multimodal Interaction Development Kit for AI Glasses, Robots, and More
Zhi Tong Cai Jing·2026-01-08 06:22

Core Insights
- Alibaba Cloud has launched a new multimodal interaction development kit that integrates three foundational models: Qianwen, Wanxiang, and Bailing, enabling devices to listen, see, think, and interact with the physical world [1][2]
- The kit is compatible with over 30 mainstream ARM, RISC-V, and MIPS terminal chip platforms, enabling rapid integration with most hardware devices on the market [1]
- The kit ships with more than ten pre-set Agents and MCP tools covering daily life, work efficiency, and entertainment scenarios, broadening user interaction capabilities [1][2]

Group 1
- The development kit supports full-duplex voice, video, and text interaction, with end-to-end voice latency as low as 1 second and video latency as low as 1.5 seconds [1]
- The kit connects to Alibaba Cloud's Bailian platform ecosystem, allowing users to add third-party Agents and significantly expand application capabilities [2]
- Solutions for smart wearable devices and companion robots have been showcased, including real-time anomaly monitoring and keyword-based video search [2]

Group 2
- In the AI glasses sector, the kit enables simultaneous translation, photo translation, multimodal memos, and audio transcription through a complete interaction chain [2]
- The kit aims to optimize the deployment and inference performance of the Tongyi model family on the RISC-V architecture in collaboration with Xuantie RISC-V [1]
- The pre-set travel planning Agent gives users direct access to route planning, travel guides, and leisure exploration capabilities [1]
