How to accelerate algorithms by automatically generating FPGA coprocessors
Recent advances in C-to-FPGA design methodologies and tools facilitate the rapid creation of hardware-accelerated embedded systems.
By Glenn Steiner, Kunal Shenoy, Dan Isaacs (Xilinx), and David Pellerin (ImpulseC)
Today's designers are constrained by space, power, and cost, and they simply cannot afford to implement embedded designs with gigahertz-class computers. Fortunately, in embedded systems, the greatest computational requirements are frequently determined by a relatively small number of algorithms. These algorithms, identified through profiling techniques, can be rapidly converted into hardware coprocessors using design automation tools. The coprocessors can then be efficiently interfaced to the offloaded processor, yielding "gigahertz-class" performance.
In this article, we explore code acceleration and techniques for code conversion to hardware coprocessors. We also demonstrate the process for making trade-off decisions with benchmark data through an actual image-rendering case study involving an auxiliary processor unit (APU)-based technique. The design uses an immersed PowerPC implemented in a platform FPGA.
By Glenn Steiner, Kunal Shenoy, Dan Isaacs (Xilinx), and David Pellerin (ImpulseC)
Today's designers are constrained by space, power, and cost, and they simply cannot afford to implement embedded designs with gigahertz-class computers. Fortunately, in embedded systems, the greatest computational requirements are frequently determined by a relatively small number of algorithms. These algorithms, identified through profiling techniques, can be rapidly converted into hardware coprocessors using design automation tools. The coprocessors can then be efficiently interfaced to the offloaded processor, yielding "gigahertz-class" performance.
In this article, we explore code acceleration and techniques for code conversion to hardware coprocessors. We also demonstrate the process for making trade-off decisions with benchmark data through an actual image-rendering case study involving an auxiliary processor unit (APU)-based technique. The design uses an immersed PowerPC implemented in a platform FPGA.
To read the full article, click here
Related Semiconductor IP
- HBM4 PHY IP
- Ultra-Low-Power LPDDR3/LPDDR2/DDR3L Combo Subsystem
- MIPI D-PHY and FPD-Link (LVDS) Combinational Transmitter for TSMC 22nm ULP
- HBM4 Controller IP
- IPSEC AES-256-GCM (Standalone IPsec)
Related Articles
- How to accelerate genomic sequence alignment 4X using half an FPGA
- How FPGA technology is evolving to meet new mid-range system requirements
- How to accelerate memory bandwidth by 50% with ZeroPoint technology
- Double Duty: FPGA Architecture to Enable Concurrent LUT and Adder Chain Usage
Latest Articles
- ElfCore: A 28nm Neural Processor Enabling Dynamic Structured Sparse Training and Online Self-Supervised Learning with Activity-Dependent Weight Update
- A 14ns-Latency 9Gb/s 0.44mm² 62pJ/b Short-Blocklength LDPC Decoder ASIC in 22FDX
- Pipeline Stage Resolved Timing Characterization of FPGA and ASIC Implementations of a RISC V Processor
- Lyra: A Hardware-Accelerated RISC-V Verification Framework with Generative Model-Based Processor Fuzzing
- Leveraging FPGAs for Homomorphic Matrix-Vector Multiplication in Oblivious Message Retrieval