Overview
Noesis Technologies ntFFT_UHS IP implements a customized, programmable fixed-point FFT/IFFT transform processor based on Decimation in Frequency (DIF). It supports low-latency streaming operation with an ultra-parallel datapath that processes multiple complex samples per clock cycle in natural order. The 2's complement fixed-point precision of the input, internal and output data is fully configurable before IP synthesis.
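The following Python sketch shows what decimation in frequency means in practice. It models a plain radix-2 DIF FFT/IFFT in floating point. The model takes samples in natural order and applies each twiddle factor after the butterfly subtraction. It then reorders the bit-reversed result back to natural order. It is only an illustrative reference model. It assumes power-of-two sizes and does not reproduce the ntFFT_UHS parallel datapath, radix selection or fixed-point arithmetic. The function names `bit_reverse` and `fft_dif_radix2` are hypothetical.

```python
import cmath
import math


def bit_reverse(i: int, bits: int) -> int:
    """Reverse the lowest `bits` bits of index i."""
    r = 0
    for _ in range(bits):
        r = (r << 1) | (i & 1)
        i >>= 1
    return r


def fft_dif_radix2(x, inverse=False):
    """Radix-2 decimation-in-frequency FFT (IFFT when inverse=True).

    Accepts samples in natural order and returns bins in natural order.
    Floating-point reference model only: no fixed-point quantization
    or per-stage scaling is applied.
    """
    n = len(x)
    stages = n.bit_length() - 1
    if n == 0 or (1 << stages) != n:
        raise ValueError("length must be a power of two")
    sign = 1 if inverse else -1
    a = list(x)
    span = n // 2
    while span >= 1:
        # One DIF stage: each butterfly pairs samples `span` apart,
        # and the twiddle multiplies the difference output.
        for start in range(0, n, 2 * span):
            for k in range(span):
                w = cmath.exp(sign * 2j * math.pi * k / (2 * span))
                u = a[start + k]
                v = a[start + k + span]
                a[start + k] = u + v
                a[start + k + span] = (u - v) * w
        span //= 2
    # DIF leaves the bins in bit-reversed order; restore natural order.
    out = [a[bit_reverse(i, stages)] for i in range(n)]
    if inverse:
        out = [v / n for v in out]
    return out
```

For example, `fft_dif_radix2(fft_dif_radix2(x), inverse=True)` returns `x` up to rounding error. A fixed-point hardware datapath would quantize both the samples and the twiddles. It would also often rescale between stages to avoid overflow. This model leaves both steps out so that the DIF structure stays easy to follow.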
A Radix-2, Radix-4 or Mixed-Radix design with parallel butterflies may be selected, depending on the transform sizes to be implemented. Each stage has its own permutation network buffer, which can be implemented as either Register File or BRAM primitives. The fixed-point precision of the twiddle factors is set by parameterization. Their values are precalculated and stored in small distributed LUTs next to the respective butterflies, following a scalable design methodology. The permutation performed by each buffer stage is necessarily custom-made, because it depends on the number of parallel samples per clock cycle and on the supported FFT transform sizes. An optional Circular Shift buffer can be instantiated for applications that need to correct a detected Carrier Frequency Offset in the frequency domain (after the FFT). The range of circular-shift correction depends on both the FFT transform size and the number of parallel samples per clock cycle. Additional Overlap-Save (OLS) method wrappers may be provided to support real-time, high-bandwidth filtering applications.
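Two of the mechanisms described above can be illustrated in a few lines of Python. `twiddle_lut` builds a table of quantized twiddle factors of the kind that could sit next to a butterfly. `circular_shift` rotates FFT output bins, which corrects a frequency offset that is a whole number of bins. Both are hypothetical helpers, not the vendor's implementation. The word format (a signed word with `frac_bits` fractional bits), round-to-nearest, saturation of +1.0, and integer-bin shifts are all assumptions. The identity used here is standard: multiplying x[n] by exp(j2πk0·n/N) moves the content of bin k to bin (k + k0) mod N. Rotating the bins by -k0 therefore undoes the offset.

```python
import cmath
import math


def twiddle_lut(span: int, frac_bits: int):
    """Quantized twiddles W_{2*span}^k = exp(-2j*pi*k/(2*span)), k < span.

    Assumed format: each real/imag part is rounded to an integer with
    `frac_bits` fractional bits (1.0 -> 2**frac_bits) and saturated to
    a (frac_bits + 1)-bit two's complement word, so +1.0 clips to max.
    """
    scale = 1 << frac_bits
    hi, lo = scale - 1, -scale

    def q(v: float) -> int:
        return max(lo, min(hi, round(v * scale)))

    table = []
    for k in range(span):
        w = cmath.exp(-2j * math.pi * k / (2 * span))
        table.append((q(w.real), q(w.imag)))
    return table


def circular_shift(bins, shift: int):
    """Rotate FFT output so that bin k moves to index (k + shift) mod N.

    For an integer carrier offset of k0 bins, shift = -k0 restores the
    original spectrum (fractional offsets are not handled here).
    """
    n = len(bins)
    shift %= n
    return list(bins[-shift:]) + list(bins[:-shift]) if shift else list(bins)
```

For example, `twiddle_lut(4, 15)[0]` is `(32767, 0)`, because +1.0 cannot be represented exactly in a 16-bit Q1.15 word and saturates. In hardware, the supported shift range would be limited by the FFT size and the parallelism, as the text notes. This sketch accepts any integer shift.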