Assessing Design Space for the Device-Circuit Codesign of Nonvolatile Memory-Based Compute-in-Memory Accelerators

By Ashwin Sanjay Lele, Bo Zhang, Win-San Khwa, Meng-Fan Chang (TSMC)

Abstract

The unprecedented penetration of artificial intelligence (AI) algorithms has brought about rapid innovations in electronic hardware, including new memory devices. Nonvolatile memory (NVM) devices offer one such attractive alternative with ∼2× density and data retention after powering off. Compute-in-memory (CIM) architectures further improve energy efficiency by fusing the computation operations with AI model storage. Electronic characteristics of NVM devices, like resistance in the two resistance states, directly affect the circuit designers’ decisions and result in the varying performance of NVM-CIM chips. In this mini review, we assess the bounds on device resistances for accuracy and circuit performance to suggest recommendations to device engineers for frictionless device–circuit–system interactions. Furthermore, we review challenges in reliably programming NVM devices, followed by benchmarking recent NVM-CIM chips. Our literature review and analytical modeling reveal that a high resistance ratio and low variability are favored, and that the resistance in the low resistance state is bounded by accuracy and circuit performance constraints.

KEYWORDS: compute-in-memory, nonvolatile memory, RRAM, PCM, MRAM

Introduction

AI workloads require models to be stored close to the compute units to avoid data movement and thereby improve both energy efficiency and throughput. SRAM is conventionally used for on-chip storage because of its CMOS compatibility, reliable operation, and scaling to advanced technology nodes. SRAM is complemented by DRAM, which provides larger and denser off-chip memory hosted on a separate die. The difference in fabrication process prevents DRAM integration in the CMOS flow, while the larger area of the 6-transistor (6T) SRAM bitcell hinders large on-chip storage. Nonvolatile memory (NVM) devices combine density, like DRAM, with CMOS compatibility for on-die integration, like SRAM. 1-transistor-1-resistor (1T1R) bitcells of RRAM and MRAM demonstrate significantly higher density than 6T SRAM bitcells. Their memory operation stems from physical mechanisms such as material crystallization in phase change memory (PCM), conductive filament formation in RRAM, and spin alignment in MRAM. These mechanisms give rise to two distinct resistance states, the low resistance state (LRS) and the high resistance state (HRS). These binary states produce distinguishable currents when sensed using a read voltage (Vread) and can be used to store 0/1. NVMs have been integrated within the CMOS process at various nodes, such as 12, 22, and 40 nm for RRAM; 14, 22, and 18 nm for MRAM; and 14 and 40 nm for PCM.
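The read operation described above can be sketched in a few lines of Python. This is a minimal illustration, not taken from the article: the read voltage, the two resistance values, the midpoint reference current, and the choice to read LRS as 1 are all assumed for the example.

```python
"""Single-cell read of a binary NVM device (illustrative values only)."""

V_READ = 0.2        # read voltage in volts (assumed)
R_LRS = 10e3        # low resistance state in ohms (assumed)
R_HRS = 100e3       # high resistance state in ohms (assumed, so K = 10)


def read_current(resistance_ohm: float, v_read: float = V_READ) -> float:
    """Ohm's law: current through the cell at the read voltage."""
    return v_read / resistance_ohm


def sense_bit(resistance_ohm: float) -> int:
    """Compare the cell current with a reference midway between the two states."""
    i_lrs = read_current(R_LRS)
    i_hrs = read_current(R_HRS)
    i_ref = 0.5 * (i_lrs + i_hrs)  # hypothetical reference choice
    # LRS (large current) is read as 1, HRS (small current) as 0.
    return 1 if read_current(resistance_ohm) > i_ref else 0


if __name__ == "__main__":
    print(f"I_LRS = {read_current(R_LRS) * 1e6:.1f} uA")
    print(f"I_HRS = {read_current(R_HRS) * 1e6:.1f} uA")
    print(sense_bit(R_LRS), sense_bit(R_HRS))
```

With these assumed values the two states differ by a factor of ten in current, which is what makes a single reference comparison sufficient for a 0/1 read.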

Additionally, they retain the stored data even when they are isolated from the power supply, resulting in additional energy savings in standby operation. Typical resistance characteristics and operational principles of popular NVMs are shown in Figure 1a.

Nevertheless, the energy overheads of fetching and storing data between memory and compute persist if NVMs are used only for on-chip storage. Figure 1c shows a conventional von Neumann architecture in which an AI model is stored in memory and a physically separate compute unit carries out multiply-and-accumulate (MAC) operations. Constant data movement between memory and compute causes significant energy and latency overhead. NVM storage improves memory density and may reduce transfers to and from DRAM (Figure 1d). Compute-in-memory (CIM) takes a more aggressive approach and merges a part of the MAC operation into the memory array, reducing data movement further for efficiency and speed. Figure 1e shows an example of an analog CIM array with NVM devices like RRAM storing the weights, using HRS and LRS as bits 0/1, respectively. The wordline (WL) is driven by the input activation through a wordline driver (WLD), and the NVM device passes a current that depends on its resistance state. Bit-wise multiplication happens within the NVM devices, whereas accumulation is carried out over the bitline/source line (BL/SL). The sum of the currents represents the MAC result between activations and weights, and it is converted to a digital code for post-MAC processing by an analog-to-digital converter (ADC). By avoiding extra data movement and accessing multiple rows at once, the CIM macro usually exhibits high energy efficiency (tera-operations/sec/watt) and high compute density (tera-operations/sec/mm²).
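The column-level MAC described above can be modeled behaviorally as follows. This is a rough sketch under stated assumptions, not the circuit from the article: the device values are illustrative, the activations are binary, wire resistance and transistor effects are ignored, and the ADC is an idealized quantizer whose step is one LRS cell current.

```python
"""Behavioral model of one analog CIM column (illustrative values only)."""

V_READ = 0.2     # read voltage in volts (assumed)
R_LRS = 10e3     # ohms, stores weight bit 1 (assumed)
R_HRS = 100e3    # ohms, stores weight bit 0 (assumed)


def bitline_current(activations, weights):
    """Sum the cell currents of all rows whose wordline is driven high."""
    total = 0.0
    for a, w in zip(activations, weights):
        resistance = R_LRS if w == 1 else R_HRS
        # A row contributes current only when its activation bit is 1.
        total += a * V_READ / resistance
    return total


def adc_code(current, n_rows):
    """Idealized ADC: quantize the current in units of one LRS cell current."""
    i_lrs = V_READ / R_LRS
    code = round(current / i_lrs)
    return max(0, min(n_rows, code))  # clip to the valid MAC range


if __name__ == "__main__":
    acts = [1, 1, 0, 1, 1, 0, 1, 1]
    wts = [1, 0, 1, 1, 0, 0, 1, 0]
    ideal = sum(a * w for a, w in zip(acts, wts))
    i_bl = bitline_current(acts, wts)
    print("ideal MAC:", ideal, "ADC code:", adc_code(i_bl, len(acts)))
```

In this example the ideal MAC result and the ADC code agree. The model also shows why a finite resistance ratio matters: activated HRS cells still conduct, so with the assumed ratio of 10, six active HRS rows add about 0.6 of an LRS cell current and can push the quantizer to the wrong code.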

Numerous material and device candidates have been proposed in recent years for NVM-CIM operation with different switching materials and electrodes; e.g., the resistance in LRS may vary from 700 Ω to 930 MΩ for RRAM and from 900 Ω to 6 MΩ for MRAM. However, the resistance ratio (K = RHRS/RLRS) remains relatively constant, as shown in Figure 1a. Different devices provide a large range of resistance values (LRS/HRS), write characteristics, and endurance performance. The MAC operation in CIM is known to be affected by the resistance ratio (K = RHRS/RLRS), the read current in LRS (ILRS = Vread/RLRS), and process-induced variability (σ) (Figure 1b). These parameters affect CIM accuracy, energy consumption, and compute latency at the circuit level and, in turn, at the system level. Therefore, early identification of the device-parameter design space for given circuit/system specifications helps material/device researchers make design choices in the development of these resistive memory devices. In this mini review, we survey the recent literature to provide an analytical model of how device parameters affect circuit designs for CIM readout and suggest recommendations to the device and material engineering community for seamless device–circuit interactions. We focus on maintaining readout accuracy and minimizing the energy-delay product of the CIM array to identify bounds on the device parameters. Our modeling framework may be useful for early design decisions by materials, device, and circuit engineers, while the summary of the literature shows upcoming challenges and research trends that may push the viability of NVM-CIM for commercial applications.
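To make the roles of K, ILRS, and σ concrete, the sketch below estimates the worst-case separation between adjacent MAC current levels. It is not the analytical model of the article, only a simplified illustration: it assumes independent Gaussian per-cell current spread with relative standard deviation `sigma_rel`, a hypothetical `n_sigma` guard band, and all rows activated.

```python
"""Sensing margin between adjacent MAC levels versus K and sigma (a rough sketch)."""
import math


def level_current(m, n_active, i_lrs, k):
    """Mean bitline current when m of n_active rows hold LRS and the rest HRS."""
    return m * i_lrs + (n_active - m) * i_lrs / k


def level_sigma(m, n_active, i_lrs, k, sigma_rel):
    """Std. dev. of that current, assuming independent per-cell relative spread."""
    var = (m * (sigma_rel * i_lrs) ** 2
           + (n_active - m) * (sigma_rel * i_lrs / k) ** 2)
    return math.sqrt(var)


def worst_case_margin(n_active, i_lrs, k, sigma_rel, n_sigma=3.0):
    """Smallest gap between adjacent levels after subtracting n_sigma spreads."""
    margins = []
    for m in range(n_active):
        gap = (level_current(m + 1, n_active, i_lrs, k)
               - level_current(m, n_active, i_lrs, k))
        spread = n_sigma * (level_sigma(m, n_active, i_lrs, k, sigma_rel)
                            + level_sigma(m + 1, n_active, i_lrs, k, sigma_rel))
        margins.append(gap - spread)
    return min(margins)


if __name__ == "__main__":
    i_lrs = 20e-6  # amperes, i.e. Vread / RLRS with an assumed 0.2 V and 10 kOhm
    for k in (2, 10, 100):
        for sigma_rel in (0.01, 0.05):
            margin = worst_case_margin(8, i_lrs, k, sigma_rel)
            print(f"K={k:>3} sigma={sigma_rel:.2f} margin={margin * 1e6:+.2f} uA")
```

Under these assumptions the gap between adjacent levels is ILRS × (1 − 1/K), so a larger K widens the gap while a larger σ eats into it. A negative margin indicates that neighboring MAC levels overlap within the chosen guard band.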

