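Just In: ModelBest's "Little Steel Cannon" Open-Sources an Advanced Take on "Her", and a 9B Model Actually Feels Like a Living Person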
机器之心 (Jiqizhixin) · 2026-02-04 11:20
Core Viewpoint
- The article discusses the limitations of traditional AI interactions and introduces MiniCPM-o 4.5, a model that enables real-time, multimodal communication and more human-like interaction [4][12][40].

Group 1: MiniCPM-o 4.5 Features
- MiniCPM-o 4.5 is described as the first model to achieve full-duplex, multimodal capabilities: it can "see, hear, and speak" simultaneously, which enables real-time interaction [4][12].
- The model has 9 billion parameters and achieves state-of-the-art (SOTA) performance across various benchmarks, scoring 77.6 in the OpenCompass comprehensive evaluation [5][9].
- It outperforms top closed-source models such as Gemini 2.5 Flash on key tasks such as visual understanding and document parsing [7].

Group 2: Technical Innovations
- MiniCPM-o 4.5 employs a full-duplex architecture that allows continuous input and output without blocking, so the model can perceive environmental changes while it is generating a response [29][36].
- The model features an autonomous interaction mechanism that lets it decide when to respond based on real-time semantic understanding, eliminating reliance on external tools [33][36].
- It uses time alignment and time-division multiplexing to process multimodal streams in real time, keeping input and output synchronized at the millisecond level [35].

Group 3: User Experience and Comparisons
- User experiences with MiniCPM-o 4.5 demonstrate dynamic interaction, such as real-time feedback during drawing games, unlike traditional models that wait for complete inputs [15][16].
- In practical tests, MiniCPM-o 4.5 engaged proactively by reminding users about tasks, showing that it can maintain context and intervene at the right moment [20][21].
- Comparisons with ChatGPT highlight MiniCPM-o 4.5's superior ability to adapt and respond in real time, making interactions feel more natural and human-like [16][22].

Group 4: Implications for the Future
- The introduction of MiniCPM-o 4.5 signals a shift towards more human-like AI interactions, where AI actively participates in conversations rather than merely responding to prompts [41].
- The model's capabilities suggest potential applications in fields including smart monitoring, human-computer collaboration, and accessibility support for individuals with disabilities [38].
- The advancements in MiniCPM-o 4.5 reflect a broader industry trend towards higher capability density in AI models, moving away from simply increasing parameter counts [40].
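
To make the time alignment and time-division multiplexing idea from Group 2 more concrete, here is a minimal Python sketch. It is not MiniCPM-o 4.5's actual implementation, which the summary does not describe in detail. The `Chunk` type, the `multiplex` and `run_duplex` functions, the `demo_policy` rule, and the 1000 ms slice length are all hypothetical names and values chosen for illustration. The sketch buckets timestamped audio and video chunks into fixed time slices, then, once per slice, lets a policy decide whether to speak or stay silent, which loosely mirrors the autonomous "when to respond" decision.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass(frozen=True)
class Chunk:
    modality: str  # e.g. "audio" or "video" (hypothetical labels)
    t_ms: int      # capture timestamp in milliseconds
    payload: str   # stand-in for real audio/video features


def multiplex(chunks: List[Chunk], slice_ms: int = 1000) -> List[Tuple[int, List[Chunk]]]:
    """Time alignment: bucket chunks into fixed slices, ordered by timestamp."""
    buckets = defaultdict(list)
    for chunk in chunks:
        buckets[chunk.t_ms // slice_ms].append(chunk)
    return [
        (index, sorted(buckets[index], key=lambda c: (c.t_ms, c.modality)))
        for index in sorted(buckets)
    ]


def run_duplex(
    chunks: List[Chunk],
    should_speak: Callable[[List[Chunk]], Optional[str]],
    slice_ms: int = 1000,
) -> List[Tuple[int, Optional[str]]]:
    """Per slice: ingest new input first, then let the policy speak or stay silent."""
    context: List[Chunk] = []
    outputs = []
    for index, slice_chunks in multiplex(chunks, slice_ms):
        context.extend(slice_chunks)  # input keeps arriving every slice
        outputs.append((index, should_speak(context)))  # None means silence
    return outputs


def demo_policy(context: List[Chunk]) -> Optional[str]:
    """Hypothetical rule: reply only when the latest chunk is a spoken question."""
    latest = context[-1]
    if latest.modality == "audio" and latest.payload.endswith("?"):
        return "answering: " + latest.payload
    return None
```

Calling `run_duplex(chunks, demo_policy)` on a list of out-of-order chunks yields one `(slice_index, reply_or_None)` pair per non-empty slice. A real system would replace `demo_policy` with the model's own semantic judgment and would stream output while new input is still arriving, instead of processing a finished list.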