Rapid Prototyping

AMD Vitis™ Tool: AI Engine Rapid Prototyping
AMD · 2025-08-10 04:55
AI Engine Rapid Prototyping Overview
- AMD introduces Versal AI Engine Rapid Prototyping in the AMD Vitis Unified IDE for early design analysis and risk reduction [1][2][12][13]
- The rapid prototyping feature is available in the Vitis Unified IDE starting with version 2024.2 [13][16]

Key Steps in Rapid Prototyping
- Estimate resources, including tile count, buffer usage, PLIO resources, and stream array traffic [2]
- Assess latency and throughput feasibility with early data flow simulations, prototype kernel coding, and loop initiation interval analysis [2]
- Use the Vitis libraries for block elements that already exist and develop candidate vectorization options for the rest [3]
- Build empty kernel wrappers, build and compile the graph, then simulate and analyze the result to obtain early estimates [12]

Custom Kernel Example: Digital Up Conversion (DUC) Chain
- The DUC chain translates a signal from baseband to an intermediate frequency band and includes a FIR fractional resampler, half-band interpolators, DDS mixer functions, and an adder functional block [5]
- The FIR fractional resampler, the half-band interpolators, and the DDS mixer functions can be implemented using the Vitis DSP Library [5]
- The example focuses on fast prototyping of the custom adder kernel: identifying input/output data types, coefficient types, number of taps, sampling rate, and the kernel function [6]

AI Engine System Mapping
- Identifies hardware resources such as the number of AI Engine tiles, storage, buffers, and connectivity ports [8]
- Considers compute (AIE tiles), storage (buffer size, local memory, DMA size), and input/output bandwidth (PLIO ports, clocking, buffer/stream interfaces) [8]
- The custom adder kernel requires a sampling rate of 1200 MSPS with a latency of less than 500 ns [9]
- The adder is implemented in one tile with two inputs and one output of cint16 type, taking 3 KB of data at a sampling rate of 1200 MSPS [10][11]

Vitis Unified IDE Implementation
- Generates data flow models with parametrized kernel ports, multi-core graph topology, full buffering, stream details, and LUT storage [14]
- Allows exploration of hardware utilization through AI Engine compilation and checks that throughput and latency requirements are met through AI Engine emulation [14]
- Requires creating a new empty AI Engine component and using the "Generate AIE Prototype Code" option [15]
- Involves setting kernel properties such as the name and the input/output port properties (data type cint16, dimension of 384 samples), and enabling "Generate Top Level graph and Simulation code" [19][20]

Simulation and Analysis
- The tool generates the graph .cpp and .h files; the graph .cpp file sets the graph to run for one iteration, which can be increased for better analysis [20][21]
- Simulation requires adding input text files containing the values that represent the cint16 samples delivered per clock on the 64-bit interface [22][23][24][25]
- Simulation results report a throughput of 5000 megabytes/second, or 1250 mega 16-bit complex samples per second, which meets the 1200 MSPS requirement [26]
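
The figures quoted above can be cross-checked with a little arithmetic before any tool is run. The sketch below is a plain Python illustration and is not part of the Vitis flow. All function names are hypothetical. It assumes that a cint16 sample occupies 4 bytes (16-bit real plus 16-bit imaginary), that the 3 KB figure refers to the two 384-sample input buffers, and that a 64-bit interface carries two cint16 samples per line of the input text file, written as `re im re im`. The buffer fill time it computes is only one contribution to latency, not the full kernel latency.

```python
# Back-of-envelope feasibility check for the custom adder prototype.
# Assumption: one cint16 sample = 16-bit real + 16-bit imaginary = 4 bytes.
BYTES_PER_CINT16 = 4


def buffer_bytes(samples, bytes_per_sample=BYTES_PER_CINT16):
    """Size in bytes of one buffer holding `samples` samples."""
    return samples * bytes_per_sample


def fill_time_ns(samples, rate_msps):
    """Time in ns to receive `samples` samples at `rate_msps` mega samples/s."""
    # samples / (rate_msps * 1e6) seconds == samples * 1000 / rate_msps ns
    return samples * 1000 / rate_msps


def throughput_msps(mbytes_per_s, bytes_per_sample=BYTES_PER_CINT16):
    """Convert a byte throughput (MB/s) into mega samples per second."""
    return mbytes_per_s / bytes_per_sample


def format_input_lines(samples, samples_per_line=2):
    """Format (re, im) pairs as text lines, `samples_per_line` samples each.

    Assumption: two cint16 samples (64 bits) per line for a 64-bit interface.
    """
    lines = []
    for i in range(0, len(samples), samples_per_line):
        group = samples[i:i + samples_per_line]
        lines.append(" ".join(f"{re} {im}" for re, im in group))
    return lines


# Numbers taken from the text: 384-sample ports, 1200 MSPS, 500 ns, 5000 MB/s.
one_buffer = buffer_bytes(384)              # bytes in one 384-sample buffer
two_inputs = 2 * one_buffer                 # both adder inputs together
fill_ns = fill_time_ns(384, 1200)           # time to fill one input buffer
measured_msps = throughput_msps(5000)       # simulated throughput in MSPS

print(f"one buffer: {one_buffer} bytes, two inputs: {two_inputs} bytes")
print(f"buffer fill time: {fill_ns} ns (budget 500 ns)")
print(f"simulated throughput: {measured_msps} MSPS (required 1200 MSPS)")
```

Under these assumptions the two input buffers come to 3072 bytes, one buffer fills in 320 ns, and 5000 MB/s corresponds to 1250 MSPS, which is consistent with the numbers reported in the text. `format_input_lines` can be used to write a stimulus file from a list of `(re, im)` pairs; the exact file layout expected by the simulator should be confirmed against the tool documentation.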