Innovative Memory Architectures for AI
One of the biggest trends in the industry today is the shift towards AI computing at the edge. For many years the expectation was that the huge datacenters on the cloud would be the ones performing all the AI tasks, and the edge devices would only collect the raw data and send it to the cloud, potentially receiving the end directives after the analysis was done.
More recently, however, it has become more and more evident that this can’t work. While the learning task is a strong fit for the cloud, performing inference on the cloud is less optimal.
With the promise of lower latency, lower power and better security, we are seeing AI inference in a growing number of edge applications, from IoT and smart home devices all the way up to critical applications like automotive, medical, and aerospace and defense.
Since edge devices are often small, battery-powered, and resource-constrained, edge AI computing resources must be low-power, high-performance, and low-cost. This is a challenge considering power-hungry AI workloads, which must rely on the storage of large amounts of data in memory and the ability to quickly access it. Some models have millions of parameters (e.g., weights and biases), which must be continually read from memory for processing. This creates a fundamental challenge in terms of power consumption and latency in computing hardware.
Data movement is a key contributor to power consumption. Within chips, significant power is consumed while accessing the memory arrays in which the data is stored and while transferring the data over the on-chip interconnect. The memory access and speed of the interconnect also contribute to latency, which limits the speed of the AI computation. Speed and power both get significantly worse when the data needs to be moved between two separate chips.
To keep edge computing resources low-power and low-latency, hardware must be designed so that memory is as close as possible to the computing resources.
The continuous move to smaller process geometries has helped to keep power consumption to a minimum and has also reduced latency for AI tasks. But while computing resources continually scale to more advanced nodes, Flash memory hasn’t been able to keep pace. Because of this, it isn’t possible to integrate Flash and an AI inference engine in a single SoC at 28nm and below for edge AI.
To read the full article, click here
Related Semiconductor IP
- Root of Trust (RoT)
- Fixed Point Doppler Channel IP core
- Multi-protocol wireless plaform integrating Bluetooth Dual Mode, IEEE 802.15.4 (for Thread, Zigbee and Matter)
- Polyphase Video Scaler
- Compact, low-power, 8bit ADC on GF 22nm FDX
Related Blogs
- The Importance of Memory Architecture for AI SoCs
- Redefining XPU Memory for AI Data Centers Through Custom HBM4 – Part 1
- Redefining XPU Memory for AI Data Centers Through Custom HBM4 – Part 2
- Redefining XPU Memory for AI Data Centers Through Custom HBM4 – Part 3
Latest Blogs
- Cadence Announces Industry's First Verification IP for Embedded USB2v2 (eUSB2v2)
- The Industry’s First USB4 Device IP Certification Will Speed Innovation and Edge AI Enablement
- Understanding Extended Metadata in CXL 3.1: What It Means for Your Systems
- 2025 Outlook with Mahesh Tirupattur of Analog Bits
- eUSB2 Version 2 with 4.8Gbps and the Use Cases: A Comprehensive Overview