Innovative Memory Architectures for AI

One of the biggest trends in the industry today is the shift towards AI computing at the edge. For many years the expectation was that the huge datacenters on the cloud would be the ones performing all the AI tasks, and the edge devices would only collect the raw data and send it to the cloud, potentially receiving the end directives after the analysis was done.

More recently, however, it has become more and more evident that this can’t work. While the learning task is a strong fit for the cloud, performing inference on the cloud is less optimal.

With the promise of lower latency, lower power and better security, we are seeing AI inference in a growing number of edge applications, from IoT and smart home devices all the way up to critical applications like automotive, medical, and aerospace and defense.

Since edge devices are often small, battery-powered, and resource-constrained, edge AI computing resources must be low-power, high-performance, and low-cost. This is a challenge considering power-hungry AI workloads, which must rely on the storage of large amounts of data in memory and the ability to quickly access it. Some models have millions of parameters (e.g., weights and biases), which must be continually read from memory for processing. This creates a fundamental challenge in terms of power consumption and latency in computing hardware.

Data movement is a key contributor to power consumption. Within chips, significant power is consumed while accessing the memory arrays in which the data is stored and while transferring the data over the on-chip interconnect. The memory access and speed of the interconnect also contribute to latency, which limits the speed of the AI computation. Speed and power both get significantly worse when the data needs to be moved between two separate chips.

To keep edge computing resources low-power and low-latency, hardware must be designed so that memory is as close as possible to the computing resources.

The continuous move to smaller process geometries has helped to keep power consumption to a minimum and has also reduced latency for AI tasks. But while computing resources continually scale to more advanced nodes, Flash memory hasn’t been able to keep pace. Because of this, it isn’t possible to integrate Flash and an AI inference engine in a single SoC at 28nm and below for edge AI.

To read the full article, click here

×
Semiconductor IP