Non-Transformer Architecture Large Models

The King of Non-Transformer Deployment Surfaces at WAIC in Shanghai with Offline Intelligence and Native Memory
量子位 (QbitAI) · 2025-07-26 06:34
Core Viewpoint
- The article discusses RockAI's progress on a new AI model architecture that runs offline and has memory capabilities, a significant departure from traditional Transformer-based models [6][10][12].

Group 1: RockAI's Innovations
- RockAI has introduced the Yan 2.0 Preview model, which features a "native memory" capability that lets it learn and evolve continuously through user interactions [11][12].
- The model operates entirely offline and performs effectively in tasks such as learning new actions and playing games without external control [6][8][11].
- The architecture is designed specifically for edge devices, so it runs efficiently without relying on cloud resources, which is crucial for devices with limited computational power [30][48].

Group 2: Memory Mechanism
- Yan 2.0 Preview incorporates a memory module that supports dynamic updating and retrieval of information, enabling the model to forget outdated knowledge while integrating new insights [20][23].
- The model's memory retrieval mechanism selects the most relevant memories to generate outputs, enhancing its reasoning capabilities [23][24].
- This approach contrasts with traditional models, which are static and cannot learn after deployment, positioning Yan 2.0 as a more adaptive and intelligent system [14][17].

Group 3: Market Position and Future Directions
- RockAI is positioned as a leader in the non-Transformer architecture space, having successfully deployed its models on various edge devices without the need for model compression or quantization [58][60].
- The company aims to create a collective intelligence framework in which multiple models can collaborate and evolve, moving towards a more decentralized AI ecosystem [65][66].
- The shift away from Transformer models is seen as a response to the limitations of current architectures, with RockAI advocating for simpler algorithms that require less computational power and data [28][37].
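
The behaviour described under Group 2 (write new information, let outdated entries fade, retrieve the most relevant memories) can be illustrated with a toy example. The article does not describe how Yan 2.0 Preview implements its memory module, so the sketch below is not RockAI's method. It is a minimal illustration that assumes memories are stored as key vectors with a decaying strength. The names `ToyMemory`, `MemorySlot`, `decay`, and `min_strength` are all hypothetical.

```python
import math
from dataclasses import dataclass
from typing import List


@dataclass
class MemorySlot:
    key: List[float]       # hypothetical embedding of the stored item
    value: str             # payload returned on recall
    strength: float = 1.0  # fades on every write, refreshed when recalled


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class ToyMemory:
    """Illustrative store only; an assumption, not the Yan 2.0 design."""

    def __init__(self, decay: float = 0.9, min_strength: float = 0.2):
        self.decay = decay                # multiplicative fade per write
        self.min_strength = min_strength  # entries below this are forgotten
        self.slots: List[MemorySlot] = []

    def write(self, key: List[float], value: str) -> None:
        # Age every existing memory, forget the ones that faded too far,
        # then store the new item at full strength.
        for slot in self.slots:
            slot.strength *= self.decay
        self.slots = [s for s in self.slots if s.strength >= self.min_strength]
        self.slots.append(MemorySlot(list(key), value))

    def retrieve(self, query: List[float], k: int = 1) -> List[str]:
        # Rank by similarity weighted by strength and return the top k.
        ranked = sorted(
            self.slots,
            key=lambda s: cosine(query, s.key) * s.strength,
            reverse=True,
        )
        top = ranked[:k]
        for slot in top:
            slot.strength = 1.0  # recalled memories are reinforced
        return [slot.value for slot in top]


memory = ToyMemory(decay=0.5, min_strength=0.3)
memory.write([1.0, 0.0], "wave")
memory.write([0.0, 1.0], "jump")
recalled = memory.retrieve([0.9, 0.1], k=1)
```

In this toy setup a memory that is never recalled fades and is eventually dropped, while a recalled one is reinforced. This is one simple way to model "forgetting outdated knowledge" and is chosen here for brevity, not because the source confirms it.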