Understanding Peak Floating-Point Performance Calculations

Michael Parker, Altera
EETimes (10/20/2014 02:03 PM EDT)

DSPs, GPUs, and FPGAs serve as accelerators for many CPUs, providing both performance and power efficiency benefits. Given the variety of computing architectures available, designers need a uniform method to compare performance and power efficiency. The accepted method is to measure floating-point operations per second (FLOPS), where a floating-point operation is defined as either an addition or a multiplication of single-precision (32-bit) or double-precision (64-bit) numbers in conformance with the IEEE 754 standard. All higher-order functions, such as division, square root, and trigonometric operators, can be constructed using adders and multipliers. Because these operators, as well as other common functions such as fast Fourier transforms (FFTs) and matrix operators, require both adders and multipliers, there is commonly a 1:1 ratio of adders to multipliers in all these architectures.

Let's look at how to compare the performance of DSP, GPU, and FPGA architectures based on their peak FLOPS rating. The peak FLOPS rating is determined by multiplying the total number of adders and multipliers by the maximum operating frequency. This represents the theoretical limit for computations, which can never be achieved in practice, since it is generally not possible to implement useful algorithms that keep all the computational units occupied all the time. It does, however, provide a useful comparison metric.
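The peak-rating arithmetic described above can be sketched in a few lines of Python. This is only an illustration of the stated formula: the function name `peak_flops` and the device figures (1,000 adders, 1,000 multipliers, 500 MHz) are hypothetical and are not taken from any real part or from the article.

```python
def peak_flops(num_adders: int, num_multipliers: int, fmax_hz: float) -> float:
    """Peak FLOPS = (adders + multipliers) * maximum operating frequency."""
    return (num_adders + num_multipliers) * fmax_hz


# Hypothetical device (illustrative numbers only): a 1:1 ratio of
# 1,000 adders to 1,000 multipliers, clocked at 500 MHz.
rating = peak_flops(num_adders=1000, num_multipliers=1000, fmax_hz=500e6)
print(f"{rating / 1e9:.1f} GFLOPS")  # prints "1000.0 GFLOPS"
```

Because the result assumes every adder and multiplier completes one operation on every clock cycle, it is an upper bound rather than an estimate of sustained throughput.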
