Huawei upgrades its industry-agent algorithm architecture! MindScale writes its own prompts and workflows; KV Cache cuts generated tokens by 5.7x
Sina Finance · 2026-02-12 12:13

Core Insights
- The article covers the launch of Huawei's MindScale algorithm package, aimed at accelerating the development of industry-specific agents, which are seen as vital for improving productivity and value creation across sectors [1][13].

Group 1: Challenges in Industry Agent Development
- MindScale identifies four core challenges blocking the widespread adoption of industry agents: self-evolving workflows, automated prompt optimization, reuse of historical knowledge, and efficient training and inference [3][16].
- The package includes solutions such as the EvoFabric algorithm for self-evolving agents and SOP2Workflow for generating executable workflows from natural-language documents [3][16].

Group 2: Workflow and Memory Optimization
- The framework provides a state-graph engine that supports deep mixing of multiple agents, tools, and memory forms, enabling rapid copying, migration, and deployment of complex intelligent processes [7][20].
- A memory module improves agent performance over time by combining trajectory memory with evaluation results to build an optimized experience context [7][20].

Group 3: Prompt Optimization Techniques
- The SCOPE algorithm performs online prompt optimization between inference steps, achieving over 20% accuracy improvement in specific reasoning scenarios [7][21].
- The C-MOP model introduces a feedback loop for prompt optimization, resolving conflicts in text gradients and adjusting prompts automatically based on positive and negative feedback [8][21].

Group 4: Efficiency and Hardware Adaptation
- MindScale emphasizes training and inference efficiency, with the TrimR algorithm reducing inference latency by up to 70% in high-concurrency scenarios without compromising accuracy [10][23].
- The introduction of KV-Embeddings redefines the use of KV Cache, allowing for efficient representation reuse during inference, which can reduce the number of generated tokens by up to 5.7 times [12][25].
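The article gives no implementation details for C-MOP's feedback loop, so the following is an illustration only: a minimal greedy sketch of feedback-driven prompt optimization, where candidate prompt edits that improve a score (positive feedback) are kept and those that do not (negative feedback) are discarded. The scorer, the simulated model, and the edit list are all hypothetical stand-ins, not MindScale's actual mechanism.

```python
import random

def evaluate(prompt: str, examples: list[tuple[str, str]]) -> float:
    """Toy scorer: fraction of examples whose expected keyword appears
    in the (simulated) model output. A real system would call an LLM here."""
    hits = 0
    for inp, expected in examples:
        output = f"{prompt} {inp}"  # placeholder for a model call
        hits += expected in output
    return hits / len(examples)

def optimize_prompt(seed: str, edits: list[str],
                    examples: list[tuple[str, str]], rounds: int = 10) -> str:
    """Greedy feedback loop: try a candidate edit each round; keep it on
    positive feedback (score improves), discard it on negative feedback."""
    best, best_score = seed, evaluate(seed, examples)
    for _ in range(rounds):
        candidate = best + " " + random.choice(edits)
        score = evaluate(candidate, examples)
        if score > best_score:  # positive feedback: accept the edit
            best, best_score = candidate, score
        # negative feedback: candidate silently dropped
    return best
```

A production loop would replace the keyword scorer with task-level evaluation and would need the conflict-resolution step the article attributes to C-MOP; this sketch only shows the accept/reject skeleton.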
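TrimR's internals are not described in the article; the general idea of cutting inference latency by trimming redundant reasoning can, however, be sketched. Below is a hypothetical early-stopping monitor that consumes reasoning steps and halts once the extracted intermediate answer has been stable for a few consecutive steps; the `=>` answer format and `patience` threshold are illustrative assumptions.

```python
def trim_reasoning(steps: list[str], patience: int = 2) -> list[str]:
    """Consume reasoning steps in order and stop once the intermediate
    answer repeats `patience` times in a row, trimming 'overthinking'
    tokens that no longer change the result."""
    emitted: list[str] = []
    stable, last = 0, None
    for step in steps:
        emitted.append(step)
        answer = step.split("=>")[-1].strip()  # toy answer extractor
        if answer == last:
            stable += 1
            if stable >= patience:  # answer converged; stop generating
                break
        else:
            stable, last = 0, answer
    return emitted
```

In a real serving stack the same check would run between decode steps and abort generation, which is how latency savings of the magnitude the article cites (up to 70% under high concurrency) could plausibly arise without changing the final answer.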
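The core idea behind KV-Embeddings, reusing cached representations so the model need not regenerate work for repeated context, can be illustrated with a toy prefix cache. Everything below (the class, the hash-based "representation", the word-count cost model) is a hypothetical sketch of representation reuse in general, not Huawei's KV-Embeddings design.

```python
class PrefixKVCache:
    """Toy stand-in for KV-cache reuse: encoding is 'expensive', so a
    stored prefix representation is reused and only the new suffix is
    processed on each request."""

    def __init__(self) -> None:
        self.cache: dict[str, list[int]] = {}  # prefix text -> representation
        self.encode_calls = 0                  # tracks simulated token work

    def _encode(self, text: str) -> list[int]:
        self.encode_calls += len(text.split())  # cost ~ number of tokens
        return [hash(tok) for tok in text.split()]

    def encode_with_reuse(self, prefix: str, suffix: str) -> list[int]:
        if prefix not in self.cache:
            self.cache[prefix] = self._encode(prefix)
        # Reuse the stored prefix representation; pay only for the suffix.
        return self.cache[prefix] + self._encode(suffix)
```

With a five-word shared prefix and two one-word requests, the cached path does 7 units of work instead of 12; scaled to long shared contexts, this is the kind of reuse that could underlie the multi-fold token reduction the article reports.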