Implementing floating-point DSP on FPGAs

By Jiri Kadlec, UTIA Prague, and Stephen Chappell, Celoxica

For developers using FPGAs for the implementation of floating-point DSP functions, one key challenge is how to decompose the computation algorithm into sequences of parallel hardware processes while efficiently managing data flow through the parallel pipelines of these processes.

In this article, we'll discuss our experiences exploring architectures with Xilinx PicoBlaze controllers, and present a design strategy employing the ESL techniques of model-based and C-based design to demonstrate how you can rapidly integrate highly parameterizable DSP hardware primitives into power-efficient high-performance implementations in Spartan devices.

Hardware Acceleration and Reuse
High-performance implementations of floating-point DSP algorithms in FPGAs require single-cycle parallel memory accesses and effective use of pipelined arithmetic operators. Many common DSP vector and matrix operations can be split into batch calculations fulfilling these requirements. Our architectures comprise of Xilinx PicoBlaze worker processors, each with a dedicated DSP hardware accelerator (Figure 1). Each worker can do preparatory tasks for the next batch in parallel with its hardware accelerator. Once the DSP hardware accelerator finishes the computation, it issues an interrupt to the worker. The worker's job is to combine the accelerated parts of the computation into a complete DSP algorithm.

Figure 1. PicoBlaze-based architecture for floating-point DSP. The DSP hardware accelerators are modeled and implemented using Celoxica DK.

It is ideal if you limit implementations to the batch operations of each worker starting in a block RAM, performing a relatively simple sequence of pipelined operations at the maximum clock speed and returning the result(s) back to another block RAM. You can effectively map these primitives to hardware, including the complete autonomous data-flow control in hardware. You can also code the related dedicated generators of address counters and control signals in Handel-C, using several synchronized do-while loops. Simulink is effective for fast derivation of bit-exact models of the batch calculations in DSP hardware accelerators.

To read the full article, click here

Implementing floating-point DSP on FPGAs

Related Semiconductor IP

Related Articles

Latest Articles

Related Articles

Using an interface wrapper module to simplify implementing PCIe on FPGAs

A Resource-Driven Approach for Implementing CNNs on FPGAs Using Adaptive IPs

Implementing floating-point algorithms in FPGAs or ASICs

SNAP-V: A RISC-V SoC with Configurable Neuromorphic Acceleration for Small-Scale Spiking Neural Networks

An FPGA Implementation of Displacement Vector Search for Intra Pattern Copy in JPEG XS

A Persistent-State Dataflow Accelerator for Memory-Bound Linear Attention Decode on FPGA

VMXDOTP: A RISC-V Vector ISA Extension for Efficient Microscaling (MX) Format Acceleration

PDF: PUF-based DNN Fingerprinting for Knowledge Distillation Traceability

Implementing floating-point DSP on FPGAs

Subscribe to the Semi IP Hub Newsletter

Related Semiconductor IP

Related Articles

Latest Articles