Reducing Avoidable Memory Trips In HBM Systems

Picture a highway during rush hour. When a road has limited capacity, traffic backs up quickly because only so many cars can move through at once. Adding more lanes increases capacity, but it does not always guarantee a smoother commute. If cars keep flooding onto the highway, if exits are poorly placed, or if drivers have to stay on the road for long distances, congestion can still build. More lanes help, but the system still depends on how efficiently traffic moves.

Memory systems face many of the same challenges. High-bandwidth memory (HBM) gives advanced AI accelerators and high-performance systems-on-chip (SoCs) the capacity to move large data sets quickly, much as extra lanes add capacity to a highway.

When bandwidth is not enough

Bandwidth determines how much data can move per unit of time, while latency determines how quickly the system can respond to any single request. Higher memory bandwidth does not eliminate that delay. Even when total throughput is high, each round trip to external memory adds time before the compute engine can continue, creating idle cycles that can become a performance bottleneck. Generous bandwidth headroom can also mask inefficient data movement: an HBM system may appear to keep up while still suffering from poor data reuse, unpredictable access patterns, and repeated trips off the compute die. This is where the memory hierarchy becomes important.
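The bandwidth-versus-latency distinction can be made concrete with a deliberately simple model. The sketch below is an illustrative assumption, not the article's analysis: it treats every request as a fixed round-trip latency plus the time needed to stream its bytes, and assumes dependent requests (for example, pointer chasing) cannot overlap. The function names and the 100 ns latency, 800 GB/s, and 1600 GB/s figures are hypothetical placeholders, not measured HBM numbers.

```python
# Minimal latency-vs-bandwidth model (hypothetical, for illustration only):
# each request pays a fixed round-trip latency plus the time to stream its bytes.


def request_time_ns(size_bytes: int, latency_ns: float, bandwidth_gb_s: float) -> float:
    """Time for one request: round-trip latency plus serialization time."""
    # 1 GB/s equals 1 byte per ns (decimal units), so bytes / (GB/s) yields ns.
    return latency_ns + size_bytes / bandwidth_gb_s


def dependent_chain_ns(num_requests: int, size_bytes: int,
                       latency_ns: float, bandwidth_gb_s: float) -> float:
    """Total time for requests that must run back to back (no overlap assumed)."""
    return num_requests * request_time_ns(size_bytes, latency_ns, bandwidth_gb_s)


if __name__ == "__main__":
    LATENCY_NS = 100.0  # hypothetical round-trip latency to external memory
    for size in (64, 1 << 20):  # one cache line vs. a 1 MiB bulk transfer
        base = dependent_chain_ns(1000, size, LATENCY_NS, 800.0)
        doubled = dependent_chain_ns(1000, size, LATENCY_NS, 1600.0)
        print(f"{size:>8} B requests: 2x bandwidth gives {base / doubled:.3f}x speedup")
```

Under these assumptions, doubling bandwidth nearly halves the time for 1 MiB transfers but barely changes a chain of 64-byte dependent requests, because the fixed latency of each trip dominates. Avoiding the trip altogether is what removes that cost.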

A practical answer is to keep more reusable data on chip. A last-level cache (LLC) helps because it sits between the compute engines and external memory, as shown in Figure 1. CPUs, GPUs, NPUs, and other accelerators typically include their own local caches to reduce access latency for frequently used data. However, when data must be shared across engines or no longer fits in those smaller caches, the LLC provides a common cache layer that can satisfy the requests before they reach external memory.
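To see how a shared LLC cuts external-memory trips, here is a rough sketch in Python. It assumes a simplified, non-inclusive hierarchy with line-granular addresses and least-recently-used (LRU) replacement at every level; `LRUCache`, `external_trips`, and the cache sizes are hypothetical names and numbers chosen for illustration, not a description of any particular LLC design.

```python
from collections import OrderedDict
from typing import Dict, List, Optional, Tuple


class LRUCache:
    """Fixed-capacity cache of line addresses with least-recently-used eviction."""

    def __init__(self, capacity_lines: int) -> None:
        self.capacity = capacity_lines
        self.lines: "OrderedDict[int, None]" = OrderedDict()

    def access(self, line: int) -> bool:
        """Return True on a hit; on a miss, fill the line and evict the LRU line if full."""
        if line in self.lines:
            self.lines.move_to_end(line)  # mark as most recently used
            return True
        self.lines[line] = None
        if len(self.lines) > self.capacity:
            self.lines.popitem(last=False)  # drop the least recently used line
        return False


def external_trips(trace: List[Tuple[int, int]], private_lines: int,
                   llc_lines: Optional[int]) -> int:
    """Count requests that reach external memory.

    trace holds (engine_id, line_address) pairs. Each engine has its own
    private cache; a private miss is looked up in the shared LLC when
    llc_lines is not None. Only misses that get past every cache count.
    """
    private: Dict[int, LRUCache] = {}
    llc = LRUCache(llc_lines) if llc_lines is not None else None
    trips = 0
    for engine, line in trace:
        if engine not in private:
            private[engine] = LRUCache(private_lines)
        if private[engine].access(line):
            continue  # served by the engine's local cache
        if llc is not None and llc.access(line):
            continue  # served on chip by the shared LLC
        trips += 1  # had to go to external memory
    return trips


if __name__ == "__main__":
    # Two engines repeatedly scan the same 8-line shared working set, which is
    # larger than each 4-line private cache, so every private lookup misses.
    trace = [(engine, line) for _ in range(4) for line in range(8) for engine in (0, 1)]
    print("external trips without LLC:", external_trips(trace, 4, None))
    print("external trips with 16-line LLC:", external_trips(trace, 4, 16))
```

With these toy numbers the trace issues 64 requests. Without the LLC, all 64 go off chip; with it, only the 8 first-touch misses do, because the second engine's requests and every later pass hit the shared layer. Real LLCs add associativity, coherence, and more elaborate replacement policies, so the ratio here illustrates the mechanism rather than predicting a result.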

To read the full article on Semiconductor Engineering, click here.

