The FFT is factored into Radix-4 Butterfly operations. When an odd power of two is required, a small radix-2 “follower” stage performs the final iteration. The radix-2 stage does not require a full complex rotator so its cost is minimal.
The Radix-4 Engine fetches one complex word of data each clock cycle. Four interleaved data words are collected then applied to the t0-t3 inputs. On successive clock cycles the engine calculates the four frequency domain outputs f0-f3. These are then stored back into the Working Buffer.
During the final iteration, the engine produces frequency domain outputs on successive clocks. These arrive in scrambled (digit-reversed) order.
A final pass through the data produces outputs in sorted order.