Memory Systems for AI: Part 3

In part two of this series, we took a closer look at how the upcoming deployment of 5G technology will enable processing at the edge, and how the industry is further refining the edge into the near edge and the far edge. The near edge is closer to the cloud, while the far edge is closer to the endpoints. As we noted, we expect to see a full range of AI solutions spanning the near and far edge. Specifically, at the near edge, closest to the cloud, AI solutions and memory systems will likely resemble those found in cloud data centers, including on-chip memory, HBM, and GDDR. At the far edge, AI memory solutions will likely be similar to those deployed in endpoint devices, including on-chip memory, LPDDR, and perhaps even DDR.

In this blog post, we’ll explore how to determine whether a specific AI architecture is limited by its compute performance or by its memory bandwidth, using the Roofline model. Put simply, the Roofline model illustrates how an application performs on a given processor architecture by plotting performance (operations per second) on the y-axis against the amount of data reuse (also called “operational intensity,” measured in operations per byte of memory traffic) on the x-axis.
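To make the model concrete, here is a minimal sketch in Python of the Roofline calculation. The peak compute and bandwidth figures below are hypothetical placeholders, not the specifications of any real processor: attainable performance is simply the lesser of the compute ceiling and the memory ceiling (bandwidth times operational intensity).

```python
# Illustrative Roofline sketch. The peak numbers below are hypothetical
# placeholders, not tied to any real processor.

PEAK_GFLOPS = 100.0   # assumed peak compute (GFLOP/s)
PEAK_BW_GBS = 25.0    # assumed peak memory bandwidth (GB/s)

def attainable_gflops(operational_intensity: float) -> float:
    """Roofline: min(compute ceiling, bandwidth * operational intensity).

    operational_intensity is in FLOPs per byte moved from memory.
    """
    return min(PEAK_GFLOPS, PEAK_BW_GBS * operational_intensity)

# Below the "ridge point" (PEAK_GFLOPS / PEAK_BW_GBS = 4 FLOPs/byte here),
# the memory system is the limiter; above it, compute is.
for oi in [0.5, 1, 2, 4, 8, 16]:
    bound = "memory-bound" if PEAK_BW_GBS * oi < PEAK_GFLOPS else "compute-bound"
    print(f"OI = {oi:>4} FLOPs/byte -> {attainable_gflops(oi):6.1f} GFLOP/s ({bound})")
```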

The operational intensity of an application on a particular processor is a measure of how many times each piece of data is reused for computations once it’s retrieved from the memory system. If an application has a high operational intensity, each piece of data is reused many, many times in various calculations once it’s retrieved, which puts less stress on the memory system. In contrast, applications with low operational intensity can be bottlenecked by the memory systems of the processors they run on, because they demand much more memory bandwidth to achieve high performance.
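As a concrete illustration, consider dense matrix multiplication, a core operation in AI workloads. The sketch below estimates the operational intensity of a naive N×N single-precision matrix multiply, assuming each matrix moves between memory and the processor exactly once; real traffic depends on caching and blocking, so treat this as a rough best case.

```python
# Sketch: estimating operational intensity for a naive N x N matrix multiply
# in single precision, assuming each matrix is read or written exactly once.
# Actual memory traffic depends on cache behavior and blocking.

def matmul_operational_intensity(n: int, bytes_per_element: int = 4) -> float:
    flops = 2 * n**3                             # one multiply + one add per inner step
    bytes_moved = 3 * n**2 * bytes_per_element   # read A, read B, write C once each
    return flops / bytes_moved

# Larger matrices reuse each loaded element more often, raising intensity
# and shifting the workload toward the compute-bound region of the Roofline.
for n in [64, 256, 1024]:
    print(f"N = {n:>5}: ~{matmul_operational_intensity(n):.1f} FLOPs/byte")
```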
