FPGA Architectures from 'A' to 'Z' : Part 2
September 08, 2006
Editor's Note: This is Part 2 of an article abstracted from Chapter 4 of my book The Design Warrior's Guide to FPGAs (ISBN: 0750676043), with the kind permission of the publisher. See also Part 1.
Embedded multipliers, adders, MACs, etc.
Some functions, such as multipliers, are inherently slow when they are implemented by connecting a large number of programmable logic blocks together. Because many applications require these functions, many FPGAs incorporate special hard-wired multiplier blocks (Fig 11).

11. Bird's-eye view of chip with columns of embedded multipliers and RAM blocks.
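To see why a multiplier assembled from general-purpose logic is large and slow, consider the classic shift-and-add formulation, sketched below in Python. This is a behavioral illustration only, not how any particular FPGA or synthesis tool maps a multiply. The function names `shift_add_multiply` and `array_multiplier_adder_cells` are hypothetical, and the adder-cell count assumes a textbook unsigned array multiplier.

```python
def shift_add_multiply(a: int, b: int, width: int = 8) -> int:
    """Multiply two unsigned `width`-bit operands by summing shifted partial products."""
    mask = (1 << width) - 1
    a &= mask
    b &= mask
    product = 0
    for i in range(width):
        # Partial product i: `a` gated by bit i of `b` (a row of AND gates), shifted by i.
        if (b >> i) & 1:
            product += a << i  # in hardware, each accumulation is a row of adder cells
    return product


def array_multiplier_adder_cells(width: int) -> int:
    """Approximate count of 1-bit adder cells in a textbook unsigned array multiplier."""
    # (width - 1) adder rows, each `width` cells wide (a mix of full and half adders).
    return width * (width - 1)


if __name__ == "__main__":
    print(shift_add_multiply(13, 11, width=4))  # 143
    for w in (8, 16, 18):
        print(w, array_multiplier_adder_cells(w))  # 56, 240, 306 cells respectively
```

Each partial-product row needs its own rank of adders, so the cell count grows roughly with the square of the operand width, and the signal path through the stacked rows grows with the width as well. A hard-wired multiplier block avoids paying for all of that in programmable logic and routing.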
Similarly, some FPGAs offer dedicated adder blocks. One operation that is very common in DSP-type applications is multiply-and-accumulate. As its name suggests, this function multiplies two numbers together and adds the result to a running total stored in an accumulator. It is commonly referred to as a MAC, which stands for Multiply, Add, and aCcumulate (Fig 12).

12. The core functions forming a MAC.
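The MAC operation itself can be modeled in a few lines of Python. This minimal sketch assumes unsigned operands and an accumulator that wraps modulo 2**acc_width, as a fixed-width register would. The class name `Mac` and the 48-bit default width are illustrative assumptions, not taken from any specific device.

```python
class Mac:
    """Behavioral model of a multiply-and-accumulate unit with a fixed-width accumulator."""

    def __init__(self, acc_width: int = 48) -> None:
        self.acc_width = acc_width
        self.acc = 0  # running total held in the accumulator register

    def clear(self) -> None:
        """Reset the running total, e.g. at the start of a new dot product."""
        self.acc = 0

    def step(self, a: int, b: int) -> int:
        """Multiply a by b, add the product into the accumulator, and return the new total."""
        # Unsigned model: the total wraps modulo 2**acc_width, like a register of that width.
        self.acc = (self.acc + a * b) & ((1 << self.acc_width) - 1)
        return self.acc


if __name__ == "__main__":
    mac = Mac()
    for coeff, sample in zip([1, 2, 3], [4, 5, 6]):
        mac.step(coeff, sample)
    print(mac.acc)  # 32, i.e. 1*4 + 2*5 + 3*6
```

Feeding the unit one coefficient/sample pair per step computes a dot product, which is the inner loop of, for example, an FIR filter.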
If the FPGA you are working with supplies only embedded multipliers, you would have to implement this function by combining the multiplier with an adder formed from a number of programmable logic blocks, and the result would be stored in some associated flip-flops, in a block RAM, or in a number of distributed RAMs. Life becomes a little easier if the FPGA also provides embedded adders, and some FPGAs provide entire MACs as embedded functions.
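The "embedded multipliers only" case can be sketched in Python under a few stated assumptions. Python's `*` stands in for the hard multiplier block. The soft adder is a ripple-carry chain of 1-bit full adders whose sum and carry outputs come from small truth tables, mimicking the lookup tables in programmable logic cells. The accumulator is a plain variable playing the role of a bank of flip-flops. All names here (`SUM_LUT`, `CARRY_LUT`, `ripple_carry_add`, `embedded_multiply`, `soft_mac`) are hypothetical.

```python
# Truth tables for a 1-bit full adder, indexed by (a << 2) | (b << 1) | carry_in.
# Each table stands in for a small lookup table in one programmable logic cell.
SUM_LUT = [0, 1, 1, 0, 1, 0, 0, 1]    # a XOR b XOR carry_in
CARRY_LUT = [0, 0, 0, 1, 0, 1, 1, 1]  # majority(a, b, carry_in)


def ripple_carry_add(x: int, y: int, width: int) -> int:
    """Add two unsigned values bit by bit through the LUT-based full adders."""
    carry = 0
    result = 0
    for i in range(width):
        idx = (((x >> i) & 1) << 2) | (((y >> i) & 1) << 1) | carry
        result |= SUM_LUT[idx] << i
        carry = CARRY_LUT[idx]
    return result  # the final carry-out is dropped, so the sum wraps modulo 2**width


def embedded_multiply(a: int, b: int) -> int:
    """Stand-in for a hard-wired multiplier block."""
    return a * b


def soft_mac(pairs: list[tuple[int, int]], acc_width: int = 32) -> int:
    """MAC built from an 'embedded' multiplier, a soft adder, and an accumulator register."""
    acc = 0  # plays the role of the accumulator flip-flops
    for a, b in pairs:
        acc = ripple_carry_add(acc, embedded_multiply(a, b), acc_width)
    return acc


if __name__ == "__main__":
    print(soft_mac([(1, 4), (2, 5), (3, 6)]))   # 32
    print(ripple_carry_add(200, 100, width=8))  # 44: the carry out of bit 7 is dropped
```

Each iteration of the loop in `ripple_carry_add` corresponds to one bit's worth of programmable logic, so a 32-bit accumulator implies 32 full-adder stages built from general-purpose cells. An embedded adder, or a complete embedded MAC, replaces that chain with dedicated circuitry.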