Innovative Memory Architectures for AI
One of the biggest trends in the industry today is the shift toward AI computing at the edge. For many years, the expectation was that large cloud data centers would perform all AI tasks, while edge devices would simply collect raw data, send it to the cloud, and potentially receive directives back once the analysis was done.
More recently, however, it has become increasingly clear that this approach doesn't hold up. While training is a strong fit for the cloud, performing inference in the cloud is often far from optimal.
With the promise of lower latency, lower power and better security, we are seeing AI inference in a growing number of edge applications, from IoT and smart home devices all the way up to critical applications like automotive, medical, and aerospace and defense.
Since edge devices are often small, battery-powered, and resource-constrained, edge AI computing resources must be low-power, high-performance, and low-cost. This is challenging because AI workloads are inherently power-hungry: they depend on storing large amounts of data in memory and accessing it quickly. Some models have millions of parameters (weights and biases) that must be read from memory again and again during processing, creating a fundamental power-consumption and latency challenge for the computing hardware.
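To put that in concrete terms, here is a back-of-envelope sketch in Python of the weight storage and memory-read traffic for a hypothetical edge model; the parameter count, 8-bit precision, and 30-inferences-per-second rate are illustrative assumptions, not figures from the article.

```python
# Back-of-envelope estimate of weight storage and memory-read traffic for a
# hypothetical edge-AI model. All figures below are illustrative assumptions.

params = 10_000_000        # hypothetical model: 10 M parameters (weights + biases)
bytes_per_param = 1        # 8-bit quantized weights
inferences_per_sec = 30    # e.g., one inference per camera frame at 30 fps

weight_footprint_mb = params * bytes_per_param / 1e6

# Worst case: every weight is re-read from memory on every inference
# (no on-chip caching or weight reuse).
read_traffic_mb_per_s = weight_footprint_mb * inferences_per_sec

print(f"Weight storage:      {weight_footprint_mb:.1f} MB")     # 10.0 MB
print(f"Weight read traffic: {read_traffic_mb_per_s:.1f} MB/s") # 300.0 MB/s
```

Even this modest example generates a steady stream of memory reads before activations are counted, so where those weights live largely determines the power and latency of the system.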
Data movement is a key contributor to power consumption. Within a chip, significant power is consumed accessing the memory arrays where the data is stored and transferring that data over the on-chip interconnect. Memory access time and interconnect speed also add latency, which limits the speed of the AI computation. Both power and speed get significantly worse when the data has to move between two separate chips.
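Continuing the sketch above, the following Python estimate compares the power spent just fetching those weights from on-chip SRAM versus off-chip DRAM. The per-access energies are illustrative order-of-magnitude assumptions, not values from the article.

```python
# Rough comparison of the power spent fetching one inference's worth of weights
# (10 MB, packed into ~2.5 M 32-bit words) from on-chip SRAM vs. off-chip DRAM.
# Per-access energies are assumed, order-of-magnitude figures.

PJ_PER_WORD_SRAM = 5.0     # on-chip SRAM read, per 32-bit word (assumed)
PJ_PER_WORD_DRAM = 640.0   # off-chip DRAM read, per 32-bit word (assumed)

words_per_inference = 2_500_000   # 10 MB of 8-bit weights, 4 weights per 32-bit word
inferences_per_sec = 30

def weight_read_power_mw(pj_per_word: float) -> float:
    """Average power (mW) spent only on reading weights at the given access cost."""
    joules_per_inference = words_per_inference * pj_per_word * 1e-12  # pJ -> J
    return joules_per_inference * inferences_per_sec * 1e3            # W -> mW

print(f"On-chip SRAM:  {weight_read_power_mw(PJ_PER_WORD_SRAM):6.2f} mW")  # ~0.38 mW
print(f"Off-chip DRAM: {weight_read_power_mw(PJ_PER_WORD_DRAM):6.2f} mW")  # ~48 mW
```

Whatever the exact numbers, the roughly two-orders-of-magnitude gap between on-chip and off-chip accesses is why keeping weights next to the compute matters so much for battery-powered edge devices.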
To keep edge computing resources low-power and low-latency, hardware must be designed so that memory is as close as possible to the computing resources.
The continual move to smaller process geometries has helped keep power consumption to a minimum and has also reduced latency for AI tasks. But while computing resources keep scaling to more advanced nodes, embedded Flash memory hasn't been able to keep pace. As a result, it isn't possible to integrate Flash and an AI inference engine in a single SoC at 28nm and below for edge AI.