Implementing floating-point DSP on FPGAs
For developers using FPGAs for the implementation of floating-point DSP functions, one key challenge is how to decompose the computation algorithm into sequences of parallel hardware processes while efficiently managing data flow through the parallel pipelines of these processes.
In this article, we'll discuss our experiences exploring architectures with Xilinx PicoBlaze controllers, and present a design strategy employing the ESL techniques of model-based and C-based design to demonstrate how you can rapidly integrate highly parameterizable DSP hardware primitives into power-efficient high-performance implementations in Spartan devices.
Hardware Acceleration and Reuse
High-performance implementations of floating-point DSP algorithms in FPGAs require single-cycle parallel memory accesses and effective use of pipelined arithmetic operators. Many common DSP vector and matrix operations can be split into batch calculations fulfilling these requirements. Our architectures comprise of Xilinx PicoBlaze worker processors, each with a dedicated DSP hardware accelerator (Figure 1). Each worker can do preparatory tasks for the next batch in parallel with its hardware accelerator. Once the DSP hardware accelerator finishes the computation, it issues an interrupt to the worker. The worker's job is to combine the accelerated parts of the computation into a complete DSP algorithm.

Figure 1. PicoBlaze-based architecture for floating-point DSP. The DSP hardware accelerators are modeled and implemented using Celoxica DK.
It is ideal if you limit implementations to the batch operations of each worker starting in a block RAM, performing a relatively simple sequence of pipelined operations at the maximum clock speed and returning the result(s) back to another block RAM. You can effectively map these primitives to hardware, including the complete autonomous data-flow control in hardware. You can also code the related dedicated generators of address counters and control signals in Handel-C, using several synchronized do-while loops. Simulink is effective for fast derivation of bit-exact models of the batch calculations in DSP hardware accelerators.
To read the full article, click here
Related Semiconductor IP
- Post-Quantum Digital Signature IP Core
- Compact Embedded RISC-V Processor
- Power-OK Monitor
- RISC-V-Based, Open Source AI Accelerator for the Edge
- Securyzr™ neo Core Platform
Related White Papers
- Implementing LTE on FPGAs
- Using an interface wrapper module to simplify implementing PCIe on FPGAs
- A Resource-Driven Approach for Implementing CNNs on FPGAs Using Adaptive IPs
- How to implement double-precision floating-point on FPGAs
Latest White Papers
- DRsam: Detection of Fault-Based Microarchitectural Side-Channel Attacks in RISC-V Using Statistical Preprocessing and Association Rule Mining
- ShuffleV: A Microarchitectural Defense Strategy against Electromagnetic Side-Channel Attacks in Microprocessors
- Practical Considerations of LDPC Decoder Design in Communications Systems
- A Direct Memory Access Controller (DMAC) for Irregular Data Transfers on RISC-V Linux Systems
- A logically correct SoC design isn’t an optimized design