How to accelerate algorithms by automatically generating FPGA coprocessors
Recent advances in C-to-FPGA design methodologies and tools facilitate the rapid creation of hardware-accelerated embedded systems.
By Glenn Steiner, Kunal Shenoy, Dan Isaacs (Xilinx), and David Pellerin (ImpulseC)
Today's designers are constrained by space, power, and cost, and they simply cannot afford to implement embedded designs with gigahertz-class computers. Fortunately, in embedded systems, the greatest computational requirements are frequently determined by a relatively small number of algorithms. These algorithms, identified through profiling techniques, can be rapidly converted into hardware coprocessors using design automation tools. The coprocessors can then be efficiently interfaced to the offloaded processor, yielding "gigahertz-class" performance.
In this article, we explore code acceleration and techniques for code conversion to hardware coprocessors. We also demonstrate the process for making trade-off decisions with benchmark data through an actual image-rendering case study involving an auxiliary processor unit (APU)-based technique. The design uses an immersed PowerPC implemented in a platform FPGA.
By Glenn Steiner, Kunal Shenoy, Dan Isaacs (Xilinx), and David Pellerin (ImpulseC)
Today's designers are constrained by space, power, and cost, and they simply cannot afford to implement embedded designs with gigahertz-class computers. Fortunately, in embedded systems, the greatest computational requirements are frequently determined by a relatively small number of algorithms. These algorithms, identified through profiling techniques, can be rapidly converted into hardware coprocessors using design automation tools. The coprocessors can then be efficiently interfaced to the offloaded processor, yielding "gigahertz-class" performance.
In this article, we explore code acceleration and techniques for code conversion to hardware coprocessors. We also demonstrate the process for making trade-off decisions with benchmark data through an actual image-rendering case study involving an auxiliary processor unit (APU)-based technique. The design uses an immersed PowerPC implemented in a platform FPGA.
To read the full article, click here
Related Semiconductor IP
- JESD204E Controller IP
- eUSB2V2.0 Controller + PHY IP
- I/O Library with LVDS in SkyWater 90nm
- 50G PON LDPC Encoder/Decoder
- UALink Controller
Related Articles
- How to accelerate genomic sequence alignment 4X using half an FPGA
- How FPGA technology is evolving to meet new mid-range system requirements
- How to accelerate memory bandwidth by 50% with ZeroPoint technology
- Double Duty: FPGA Architecture to Enable Concurrent LUT and Adder Chain Usage
Latest Articles
- System-Level Isolation for Mixed-Criticality RISC-V SoCs: A "World" Reality Check
- CVA6-CFI: A First Glance at RISC-V Control-Flow Integrity Extensions
- Crypto-RV: High-Efficiency FPGA-Based RISC-V Cryptographic Co-Processor for IoT Security
- In-Pipeline Integration of Digital In-Memory-Computing into RISC-V Vector Architecture to Accelerate Deep Learning
- QMC: Efficient SLM Edge Inference via Outlier-Aware Quantization and Emergent Memories Co-Design