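AI can now watch the stove for you! 面壁智能 open-sources the omni-modal model MiniCPM-o4.5, which watches and listens at the same time and can proactively jump in with an answer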
量子位·2026-02-04 12:31

Core Viewpoint - The article discusses the launch of MiniCPM-o4.5, a new multimodal AI model developed by 面壁智能, which can listen, see, and respond proactively, marking a significant advancement in AI interaction capabilities [2][10][44].

Group 1: Model Capabilities
- MiniCPM-o4.5 can simultaneously listen and observe while actively engaging in conversation, allowing for a more natural interaction experience [10][19].
- The model can recognize changes in the environment, such as elevator floors or cooking timers, and provide timely reminders without needing explicit prompts from users [18][21].
- Unlike traditional AI, which operates in a question-and-answer format, MiniCPM-o4.5 can maintain continuous dialogue and respond to interruptions seamlessly [30][40].

Group 2: Technical Innovations
- The model employs a full-duplex multimodal real-time streaming mechanism, enabling it to process audio and visual inputs while generating outputs concurrently [35][39].
- MiniCPM-o4.5 integrates online versions of its encoders and decoders to support streaming input/output, enhancing its responsiveness and stability [36][42].
- The architecture allows for continuous semantic assessment, enabling the model to decide when to intervene in conversations based on real-time context rather than relying on silence detection [40][41].

Group 3: Market Positioning and Strategy
- 面壁智能 emphasizes a focus on edge AI, aiming to deploy models that operate effectively on local devices rather than relying on cloud infrastructure, addressing privacy and latency concerns [50][54].
- The company has established collaborations with chip manufacturers to ensure that their models are optimized for specific hardware environments from the design phase [58][60].
- MiniCPM-o4.5 is positioned as a foundational model for various applications, including automotive and robotics, highlighting its potential to transform user interactions across different platforms [49][62].
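
To make the idea under Technical Innovations more concrete, here is a minimal toy sketch of a loop that re-evaluates "should I speak now?" after every incoming slice of audio and video, instead of waiting for the user to go silent. It is not MiniCPM-o4.5's implementation or API: `Chunk`, `run_duplex_loop`, `toy_should_speak`, and `toy_respond` are hypothetical names invented for illustration, the text fields stand in for real audio and video features, and the keyword rule stands in for the model's semantic assessment. The loop is also single-threaded, so it interleaves input and output rather than truly overlapping them.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Tuple


@dataclass
class Chunk:
    """One time slice of streamed input (hypothetical structure)."""
    t: float          # timestamp in seconds
    audio_text: str   # stand-in for what was heard during this slice
    visual_text: str  # stand-in for what was seen during this slice


def run_duplex_loop(
    chunks: Iterable[Chunk],
    should_speak: Callable[[List[Chunk]], bool],
    respond: Callable[[List[Chunk]], str],
) -> List[Tuple[float, str]]:
    """Consume chunks one by one and, after each, decide whether to speak.

    The decision is made on the accumulated context at every step,
    not on whether the user has stopped talking.
    """
    context: List[Chunk] = []
    outputs: List[Tuple[float, str]] = []
    for chunk in chunks:
        context.append(chunk)
        if should_speak(context):
            outputs.append((chunk.t, respond(context)))
    return outputs


def toy_should_speak(context: List[Chunk]) -> bool:
    # Assumption: a keyword rule standing in for a learned semantic judgment.
    latest = context[-1]
    return "boiling over" in latest.visual_text or latest.audio_text.endswith("?")


def toy_respond(context: List[Chunk]) -> str:
    latest = context[-1]
    if "boiling over" in latest.visual_text:
        return "The pot is boiling over, turn down the heat."
    return "Let me answer that."


stream = [
    Chunk(0.0, "", "pot on stove"),                          # silent, nothing to say
    Chunk(1.0, "so anyway", "pot on stove"),                 # user talking, no trigger
    Chunk(2.0, "", "pot boiling over"),                      # silent, but worth a reminder
    Chunk(3.0, "what floor is this?", "elevator display"),   # direct question
]
events = run_duplex_loop(stream, toy_should_speak, toy_respond)
for t, text in events:
    print(f"{t:.1f}s: {text}")
```

The slices at 0.0 s and 2.0 s are both silent, yet only the second triggers a response. That is the difference between deciding on context and deciding on silence: a silence detector alone would treat the two slices identically.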