Huawei Upgrades Its Industry Agent Algorithm Architecture! MindScale Writes Its Own Prompts and Workflows, and KV Cache Cuts Tokens by 5.7x

Core Viewpoint
- Industry-specific agents are central to turning large models into productivity gains and value creation across sectors [1].

Group 1: Challenges in Industry Agent Development
- The MindScale project identifies four core challenges to the widespread adoption of agents across industries: self-evolving workflows, automated prompt optimization, historical knowledge reuse, and complex reasoning evaluation [4].
- The project aims to address these challenges with solutions developed in collaboration with various partners [4].

Group 2: Workflow Development and Automation
- The algorithm package includes the EvoFabric agent algorithm, which supports self-evolving workflows. Using SOP2Workflow, it rapidly generates executable workflows from natural language documents and historical tool libraries [5][6].
- Traditional, manually maintained workflows rely heavily on expert experience, which makes it hard to reuse historical knowledge and to keep training and inference efficient [7].

Group 3: Prompt Optimization Techniques
- SCOPE, a prompt optimization algorithm, lets developers optimize prompts between inference steps, improving accuracy by more than 20% in specific scenarios [11].
- The C-MOP model introduces a feedback loop for prompt optimization. It addresses conflicts in text gradients and enables automatic prompt optimization based on positive and negative feedback [11][14].

Group 4: Efficiency and Performance Enhancements
- MindScale focuses on optimizing training and inference efficiency for industry-specific models. The TrimR algorithm reduces inference latency by up to 70% in high-concurrency scenarios without compromising accuracy [14][16].
- KV-Embeddings redefines how the KV Cache is used, improving performance in chain-of-embedding scenarios and reducing the number of generated tokens by a factor of up to 5.7 [16].

Group 5: Hardware Adaptation and Implementation
- MindScale includes code implementations compatible with Ascend hardware, so industry developers can build high-precision, efficient agents on domestic computing power [18].
- The TrimR algorithm uses a lightweight verifier to detect and truncate unnecessary intermediate thoughts. It requires no fine-tuning of either the large model or the verifier, which makes it suitable for high-concurrency production environments [19].
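
The verifier-based truncation described for TrimR can be illustrated with a minimal Python sketch. This is not the MindScale implementation: the function names, the `answer:` marker, and the stopping rule (stop once the verifier reports the same answer `patience` times in a row) are all assumptions made for illustration. The summary only states that a lightweight verifier detects and truncates unnecessary intermediate thoughts.

```python
from typing import Callable, Iterable, List, Optional


def truncate_thoughts(
    thoughts: Iterable[str],
    verifier: Callable[[str], Optional[str]],
    patience: int = 2,
) -> List[str]:
    """Keep thoughts until the verifier sees the same answer `patience` times in a row.

    Assumed stopping rule for illustration only, not the published TrimR criterion.
    """
    kept: List[str] = []
    last_answer: Optional[str] = None
    streak = 0
    for thought in thoughts:
        kept.append(thought)
        answer = verifier(thought)
        if answer is not None and answer == last_answer:
            streak += 1
        else:
            # A new answer starts a streak of 1; no answer resets it to 0.
            streak = 1 if answer is not None else 0
        last_answer = answer
        if streak >= patience:
            break  # remaining thoughts are treated as redundant
    return kept


def toy_verifier(thought: str) -> Optional[str]:
    """Hypothetical stand-in for a small verifier model: reads 'answer: X' markers."""
    marker = "answer:"
    if marker in thought:
        return thought.split(marker, 1)[1].strip()
    return None


thoughts = [
    "Let me set up the equation.",
    "So x = 4, answer: 4",
    "Double-checking by substitution, answer: 4",
    "Wait, let me verify once more, answer: 4",
    "Another approach gives answer: 4",
]
kept = truncate_thoughts(thoughts, toy_verifier)
print(len(kept), "of", len(thoughts), "thoughts kept")
```

In this toy run the loop stops after the third thought, because the stand-in verifier has then seen the same answer twice in a row. In a real deployment the verifier would be a small model scoring streamed reasoning segments. Keeping it separate from the large model matches the summary's point that neither model needs fine-tuning.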