Out-of-Order Execution
A RISC-V Chip That Breaks Convention!
半导体行业观察 (Semiconductor Industry Observation) · 2025-09-01 01:17
Core Insights
- Condor Computing, a subsidiary of Andes Technology, develops licensable RISC-V cores, as Arm and SiFive do, and had RISC-V design experience before it was established in 2023 [2]
- The Cuzco core, to be showcased at Hot Chips 2025, is a high-performance RISC-V design with advanced out-of-order execution and a sophisticated branch predictor; it is expected to outperform existing RISC-V cores such as Alibaba's T-HEAD C910 and SiFive's P550 [2][6]

Core Overview
- Cuzco is an 8-wide out-of-order core with a 256-entry ROB and a 12-stage pipeline, targeting clock speeds of roughly 2 GHz to 2.5 GHz on TSMC's 5nm process [6][10]
- The core uses a static scheduling approach to save power and reduce complexity; the approach requires no changes to the ISA or the compiler to reach optimal performance [4][10]

Execution Resources
- Cuzco's execution resources are grouped into slices, each able to execute every supported RISC-V instruction, so the core scales by changing the number of slices [33]
- Each slice has a set of execution queues (XEQ) that hold micro-operations waiting for functional units, with at most two micro-operations executed per cycle [33]

Branch Prediction
- The core uses a TAGE-SC-L branch predictor, which selects the most suitable history length for each branch [11][12]
- Cuzco has an 8K-entry branch target buffer (BTB) and a 32-entry return stack for predicting return addresses; instruction fetch is backed by a 64 KB instruction cache [14]

Load/Store Architecture
- Cuzco's load/store unit includes a 64-entry load queue, a 64-entry store queue, and a 64-entry data cache miss queue, with a maximum load bandwidth of 64 B per cycle [36][38]
- The L1D cache is 64 KB and 8-way set associative, the L2 cache is configurable up to 8 MB, and the L3 cache is shared among eight cores [38][43]

Performance and Efficiency
- Cuzco aims for high performance at low power, with a focus on minimizing replay penalties and optimizing resource utilization through a time resource matrix (TRM) [23][25]
- The architecture allows for dynamic scheduling and handles cache misses through instruction replay mechanisms [50][52]
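The summary's statement that TAGE-SC-L selects the most suitable history length for each branch can be illustrated with a toy lookup. The sketch below is a simplified TAGE-style predictor only: it omits the statistical corrector (SC) and loop predictor (L), as well as the allocation and update policies. The history lengths, table sizes, hash function, and all names (`ToyTage`, `fold`, `install`) are assumptions for illustration and do not describe Cuzco's actual design.

```python
from dataclasses import dataclass, field

# Assumed values for illustration; not Cuzco's real parameters.
HISTORY_LENGTHS = [4, 8, 16, 32]  # geometric series of history lengths
INDEX_BITS = 6
TAG_BITS = 8


def fold(history: int, length: int, bits: int) -> int:
    """XOR-fold the most recent `length` history bits down to `bits` bits."""
    h = history & ((1 << length) - 1)
    out = 0
    while h:
        out ^= h & ((1 << bits) - 1)
        h >>= bits
    return out


@dataclass
class Entry:
    tag: int
    counter: int  # signed counter; >= 0 means "predict taken"


@dataclass
class ToyTage:
    base: dict = field(default_factory=dict)  # pc -> 2-bit counter (fallback)
    tables: list = field(default_factory=lambda: [dict() for _ in HISTORY_LENGTHS])

    def _index_tag(self, pc: int, history: int, table_no: int):
        length = HISTORY_LENGTHS[table_no]
        index = (pc ^ fold(history, length, INDEX_BITS)) & ((1 << INDEX_BITS) - 1)
        tag = ((pc >> INDEX_BITS) ^ fold(history, length, TAG_BITS)) & ((1 << TAG_BITS) - 1)
        return index, tag

    def install(self, pc: int, history: int, table_no: int, taken: bool) -> None:
        """Hypothetical helper: place an entry directly, bypassing real allocation."""
        index, tag = self._index_tag(pc, history, table_no)
        self.tables[table_no][index] = Entry(tag, 0 if taken else -1)

    def predict(self, pc: int, history: int):
        """Return (taken, history_length_used); length 0 means the base table."""
        # Search from the longest history to the shortest; the first tag
        # match provides the prediction.
        for table_no in reversed(range(len(HISTORY_LENGTHS))):
            index, tag = self._index_tag(pc, history, table_no)
            entry = self.tables[table_no].get(index)
            if entry is not None and entry.tag == tag:
                return entry.counter >= 0, HISTORY_LENGTHS[table_no]
        return self.base.get(pc, 2) >= 2, 0


predictor = ToyTage()
pc, hist = 0x400, 0b101101101101
first = predictor.predict(pc, hist)    # no tagged entry yet: base table answers
predictor.install(pc, hist, 1, False)  # entry keyed on 8 bits of history
second = predictor.predict(pc, hist)
predictor.install(pc, hist, 3, True)   # entry keyed on 32 bits of history
third = predictor.predict(pc, hist)
```

In this sketch the longest-history table with a matching tag overrides shorter ones, which is the sense in which a TAGE-style predictor picks a history length per branch. A real implementation also decides when to allocate longer-history entries and how to train the counters.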
Qualcomm's Server Chip: An In-Depth Analysis
半导体行业观察 (Semiconductor Industry Observation) · 2025-05-30 01:55
Core Insights
- Qualcomm is applying its low-power CPU design experience from the mobile sector to enter the cloud computing market with cost-effective, energy-efficient server chips [1][2]
- The Falkor CPU architecture is designed to meet its performance targets while keeping power consumption and silicon area low [4][7]
- The Centriq 2400 series integrates up to 48 Falkor cores on a 398 mm² die with a thermal design power (TDP) of 120 watts, which works out to no more than 2.5 watts per core [7][8]

Performance and Architecture
- Falkor is a 4-wide core that implements the 64-bit Arm instruction set (aarch64) and inherits features from Qualcomm's earlier mobile cores [4]
- The architecture uses an unusual L0 plus L1 instruction cache arrangement that provides more instruction cache capacity than contemporary cores [9][13]
- In SPEC CPU2017 tests, Falkor leads Arm's Cortex A72 by 21.6% in integer workloads and by 53.4% in floating-point workloads [74][76]

Memory and Cache Design
- Falkor's L2 cache is 512 KB and 8-way set associative, serving as an intermediate cache between L1 and the on-chip network [49]
- Centriq's L3 cache totals 60 MB across 12 slices, with a latency of 40.9 ns, which is relatively high compared to competitors [61][65]
- The memory subsystem is designed for high bandwidth, with a theoretical memory bandwidth of 128 GB/s [69]

System Architecture
- Centriq does not support multi-socket configurations, which limits a system to 48 cores; this is a strategic decision to focus on mainstream cloud applications [71][72]
- The interconnect uses a segmented ring topology, allowing efficient data transfer between cores and caches [56][58]

Future Outlook
- Qualcomm aims to re-enter the server market with new CPU offerings, building on the lessons learned from the Centriq experience [90][91]
- The company is exploring partnerships for AI solutions, indicating a strategic pivot towards integrating AI capabilities into its server offerings [91]
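The per-core figures implied by the Centriq 2400 numbers quoted above can be checked with simple arithmetic. The sketch below uses only values from the summary (48 cores, 120 W TDP, 398 mm² die, 60 MB L3 in 12 slices, 128 GB/s theoretical memory bandwidth); the variable names are illustrative. It assumes the whole-chip budgets are divided evenly across cores, so the power and area results are upper bounds, since the chip also spends both on caches, interconnect, and I/O.

```python
# Chip-level figures taken from the summary.
CORES = 48
TDP_WATTS = 120
DIE_AREA_MM2 = 398
L3_TOTAL_MB = 60
L3_SLICES = 12
MEM_BANDWIDTH_GBPS = 128  # theoretical peak

# Even split across cores; an upper bound on what each core itself gets.
watts_per_core = TDP_WATTS / CORES
area_per_core_mm2 = DIE_AREA_MM2 / CORES

# L3 capacity per slice and per core.
l3_per_slice_mb = L3_TOTAL_MB / L3_SLICES
l3_per_core_mb = L3_TOTAL_MB / CORES

# Peak memory bandwidth available per core if all 48 cores stream at once.
bandwidth_per_core_gbps = MEM_BANDWIDTH_GBPS / CORES

print(f"power budget per core : {watts_per_core:.2f} W")
print(f"die area per core     : {area_per_core_mm2:.2f} mm^2")
print(f"L3 per slice          : {l3_per_slice_mb:.2f} MB")
print(f"L3 per core           : {l3_per_core_mb:.2f} MB")
print(f"bandwidth per core    : {bandwidth_per_core_gbps:.2f} GB/s")
```

The even split reproduces the 2.5 W per core ceiling implied by the TDP, and shows that each of the 12 L3 slices holds 5 MB.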