Static Scheduling
A RISC-V Chip That Breaks the Mold!
半导体行业观察 (Semiconductor Industry Observation) · 2025-09-01 01:17
Core Insights
- Condor Computing, a subsidiary of Andes Technology, develops licensable RISC-V cores, much as Arm and SiFive do. Its team had RISC-V design experience before the company was founded in 2023 [2]
- The Cuzco core, to be showcased at Hot Chips 2025, is a high-performance RISC-V design with advanced out-of-order execution and sophisticated branch predictors. It is expected to outperform existing RISC-V cores such as Alibaba's T-HEAD C910 and SiFive's P550 [2][6]

Core Overview
- Cuzco is an 8-wide out-of-order core with 256 ROB entries and a 12-stage pipeline, targeting clock speeds of roughly 2 GHz to 2.5 GHz on TSMC's 5nm process [6][10]
- The core uses static scheduling to save power and reduce complexity, and it needs no ISA or compiler changes to reach optimal performance [4][10]

Execution Resources
- Cuzco's execution resources are grouped into multiple slices, each able to execute every supported RISC-V instruction, so the design scales by changing the number of slices [33]
- Each slice has a set of execution queues (XEQ) that hold micro-operations waiting for functional units, and it executes at most two micro-operations per cycle [33]

Branch Prediction
- The core uses a TAGE-SC-L branch predictor, a complex design that selects the most suitable history length for each branch [11][12]
- Cuzco has an 8K-entry branch target buffer (BTB) and a 32-entry return stack for predicting return addresses. Instruction fetch is backed by a 64 KB instruction cache [14]

Load/Store Architecture
- Cuzco's load/store unit includes a 64-entry load queue, a 64-entry store queue, and a 64-entry data cache miss queue, with a maximum load bandwidth of 64 B per cycle [36][38]
- The L1D cache is 64 KB and 8-way set associative, the L2 cache can be configured up to 8 MB, and the L3 cache is shared among eight cores [38][43]

Performance and Efficiency
- Cuzco aims for high performance at low power, focusing on minimizing replay penalties and on optimizing resource utilization through a time resource matrix (TRM) [23][25]
- The core's architecture allows for dynamic scheduling and handles cache misses through instruction replay mechanisms [50][52]
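
The summary above names a time resource matrix (TRM) but does not describe how it works. The sketch below is one plausible reading, not Condor's implementation: a table indexed by (cycle, resource) records which execution slots are already booked, and each micro-operation is given the earliest future cycle at which its operands are expected to be ready and a slot is free. The class names, resource names ("alu", "load"), per-cycle slot counts, and latencies are all hypothetical values chosen for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple


@dataclass
class MicroOp:
    name: str
    dest: Optional[str]       # destination register, or None
    srcs: Tuple[str, ...]     # source registers
    unit: str                 # hypothetical resource class, e.g. "alu" or "load"
    latency: int              # assumed fixed latency in cycles


@dataclass
class TimeResourceMatrix:
    # Assumed capacity: how many ops of each class may start in one cycle.
    slots_per_cycle: Dict[str, int]
    # booked[(cycle, unit)] = number of slots already reserved.
    booked: Dict[Tuple[int, str], int] = field(default_factory=dict)
    # ready_at[reg] = cycle at which the register's value is predicted ready.
    ready_at: Dict[str, int] = field(default_factory=dict)

    def schedule(self, op: MicroOp, now: int) -> int:
        """Reserve the earliest cycle >= now with operands ready and a free slot."""
        cycle = max([now] + [self.ready_at.get(r, now) for r in op.srcs])
        while self.booked.get((cycle, op.unit), 0) >= self.slots_per_cycle[op.unit]:
            cycle += 1
        self.booked[(cycle, op.unit)] = self.booked.get((cycle, op.unit), 0) + 1
        if op.dest is not None:
            self.ready_at[op.dest] = cycle + op.latency
        return cycle


ops = [
    MicroOp("ld  x1", "x1", (), "load", 4),
    MicroOp("add x2", "x2", ("x1",), "alu", 1),
    MicroOp("add x3", "x3", ("x2",), "alu", 1),
    MicroOp("ld  x4", "x4", (), "load", 4),
]
trm = TimeResourceMatrix({"alu": 2, "load": 1})
issue_cycles = [trm.schedule(op, now=0) for op in ops]
for op, cycle in zip(ops, issue_cycles):
    print(f"{op.name}: issue at cycle {cycle}")
```

In this toy model the second load is pushed to cycle 1 because the single load slot at cycle 0 is taken, and the dependent adds wait for the predicted load latency. The schedule is only as good as those predicted latencies: if a load misses the cache, the ready times are wrong, which is the case the replay mechanism mentioned in the summary would have to handle.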