Assessing Design Space for the Device-Circuit Codesign of Nonvolatile Memory-Based Compute-in-Memory Accelerators
By Ashwin Sanjay Lele, Bo Zhang, Win-San Khwa, Meng-Fan Chang (TSMC)
Abstract
Unprecedented penetration of artificial intelligence (AI) algorithms has brought about rapid innovations in electronic hardware, including new memory devices. Nonvolatile memory (NVM) devices offer one such attractive alternative with ∼2× density and data retention after powering off. Compute-in-memory (CIM) architectures further improve energy efficiency by fusing the computation operations with AI model storage. Electronic characteristics of NVM devices, like resistance in the two resistance states, directly affect the circuit designers’ decisions and result in the varying performance of NVM-CIM chips. In this mini review, we assess the bounds on device resistances for accuracy and circuit performance to suggest recommendations to device engineers for frictionless device–circuit–system interactions. Furthermore, we review challenges in reliably programming NVM devices, followed by benchmarking recent NVM-CIM chips. Our literature review and analytical modeling reveal that a high resistance ratio and low variability are favored, and the resistance in a low resistance state is bound by accuracy and circuit performance constraints.
KEYWORDS: compute-in-memory, nonvolatile memory, RRAM, PCM, MRAM
Introduction
AI workloads require models to be stored close to the compute units so that reduced data movement improves both energy efficiency and throughput. SRAM is conventionally used for on-chip storage because of its CMOS compatibility, reliable operation, and scaling to advanced technology nodes. SRAM is complemented by DRAM, which provides larger and denser off-chip memory hosted on a separate die. The difference in fabrication process prevents DRAM integration in the CMOS flow, while the larger area of the 6-transistor (6T) SRAM bitcell hinders large on-chip storage. Nonvolatile memory (NVM) devices combine the density advantage of DRAM with the CMOS compatibility and on-die integration of SRAM. 1-transistor-1-resistor (1T1R) bitcells of RRAM and MRAM demonstrate significantly higher density than 6T SRAM bitcells. Their memory operation stems from physical mechanisms such as material crystallization in phase change memory (PCM), conductive filament formation in RRAM, and spin alignment in MRAM. These mechanisms give rise to two distinct resistance states, the low resistance state (LRS) and the high resistance state (HRS). These binary states produce distinguishable currents when sensed using a read voltage (Vread) and can be used to store 0/1. NVMs have been integrated within the CMOS process at various nodes, such as 12, 22, and 40 nm for RRAM; 14, 18, and 22 nm for MRAM; and 14 and 40 nm for PCM. Additionally, they retain the stored data even when isolated from the power supply, resulting in additional energy savings in standby operation. Typical resistance characteristics and operating principles of popular NVMs are shown in Figure 1a.
Nevertheless, the energy overhead of moving data between memory and compute persists if NVMs are used only for on-chip storage. Figure 1c shows a conventional von Neumann architecture in which the AI model is stored in memory and a physically separate compute unit carries out the multiply-and-accumulate (MAC) operations. Constant data movement between memory and compute causes significant energy and latency overhead. NVM storage improves memory density and may reduce transfers to and from DRAM (Figure 1d). Compute-in-memory (CIM) takes a more aggressive approach and merges part of the MAC operation into the memory array, reducing data movement further for efficiency and speed. Figure 1e shows an example of an analog CIM array with NVM devices like RRAM storing the weights using HRS and LRS as bits 0/1, respectively. The wordline (WL) is driven by the input activation through a wordline driver (WLD), and the NVM device passes a current that depends on its resistance state. Bit-wise multiplication happens within the NVM devices, whereas accumulation is carried out over the bitline/source line (BL/SL). The sum of the currents represents the MAC result between activation and weight, and it is converted to a digital code for post-MAC processing by an analog-to-digital converter (ADC). Without extra data movement and with multirow access, the CIM macro usually exhibits high energy efficiency (tera-operations/sec/watt) and high compute density (tera-operations/sec/mm²).
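The current summation described above can be illustrated with a minimal sketch of a single CIM column. It assumes binary activations, ideal wires and access transistors, and illustrative values for the read voltage, the LRS resistance, and the resistance ratio; none of these numbers or function names come from the article.

```python
def ideal_mac(activations, weights):
    """Digital reference: sum of bit-wise products a_i * w_i."""
    return sum(a * w for a, w in zip(activations, weights))


def column_current(activations, weights, v_read=0.2, r_lrs=10e3, k=10.0):
    """Total current (in amperes) collected on one bitline.

    Assumed values: v_read = 0.2 V, r_lrs = 10 kOhm, k = R_HRS / R_LRS = 10.
    A weight bit of 1 is stored as LRS and a weight bit of 0 as HRS.
    Only rows whose wordline is driven (activation = 1) conduct.
    """
    r_hrs = k * r_lrs
    total = 0.0
    for a, w in zip(activations, weights):
        if a:  # wordline asserted, so the cell contributes current
            total += v_read / (r_lrs if w else r_hrs)
    return total


activations = [1, 1, 0, 1]
weights = [1, 0, 1, 1]
mac = ideal_mac(activations, weights)          # two active LRS cells
current = column_current(activations, weights)  # plus one leaking HRS cell
```

In this toy case the two active LRS cells contribute 20 µA each, while the one active HRS cell adds 2 µA of leakage that the ADC has to tolerate. This is why the HRS current cannot simply be treated as zero.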
Numerous material and device candidates have been proposed in recent years for NVM-CIM operations with different switching materials and electrodes; e.g., resistance in LRS may vary from 700 Ω to 930 MΩ for RRAM and from 900 Ω to 6 MΩ for MRAM. However, the resistance ratio (K = RHRS/RLRS) remains relatively constant, as shown in Figure 1a. Different devices provide a large range of resistance values (LRS/HRS), write characteristics, and endurance performance. MAC operation in CIM is known to be affected by the resistance ratio (K = RHRS/RLRS), the read current in LRS (ILRS = Vread/RLRS), and process-induced variability (σ) (Figure 1b). These parameters affect CIM accuracy, energy consumption, and compute latency at the circuit level and, in turn, at the system level. Therefore, identifying the device-parameter design space early, given circuit/system specifications, helps material and device researchers make design choices in the development of these resistive memory devices. In this mini review, we survey the recent literature to provide an analytical model of how device parameters affect circuit designs for CIM readout and suggest recommendations to the device and material engineering community for seamless device–circuit interactions. We focus on maintaining accuracy for readout and minimizing the energy-delay product for the CIM array to identify bounds on the device parameters. Our modeling framework may be useful for early design decisions by materials, device, and circuit engineers, while the summary of literature shows upcoming challenges and research trends that may push the viability of NVM-CIM for commercial applications.
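To build intuition for how K and σ enter the readout, the following Monte Carlo sketch may help. It is not the analytical model of this review. The Gaussian relative spread on each cell current, the ideal rounding ADC, the uniformly random activation and weight bits, and all function names and default values are assumptions made for illustration.

```python
import random


def worst_case_margin(n_rows, k):
    """Gap between MAC = 0 with every row leaking through HRS and MAC = 1
    with no leakage, in units of the nominal I_LRS = Vread / R_LRS."""
    return 1.0 - n_rows / k


def readout_error_rate(n_rows=8, k=10.0, sigma=0.05, trials=2000, seed=0):
    """Fraction of trials in which the rounded column current differs
    from the ideal MAC value (toy model, see assumptions above)."""
    rng = random.Random(seed)
    errors = 0
    for _ in range(trials):
        ideal = 0
        current = 0.0  # normalized to the nominal I_LRS
        for _ in range(n_rows):
            a = rng.randint(0, 1)  # activation bit
            w = rng.randint(0, 1)  # weight bit: 1 = LRS, 0 = HRS
            if not a:
                continue  # wordline not driven, no current
            nominal = 1.0 if w else 1.0 / k
            # Assumed variability model: Gaussian relative spread, clipped at 0.
            current += max(0.0, nominal * (1.0 + rng.gauss(0.0, sigma)))
            ideal += w
        if int(current + 0.5) != ideal:  # ideal ADC: round to nearest level
            errors += 1
    return errors / trials


for k in (2.0, 10.0, 100.0):
    print(k, worst_case_margin(8, k), readout_error_rate(k=k))
```

Under these assumptions, a larger K shrinks the accumulated HRS leakage and a smaller σ tightens the spread around each MAC level. This matches the qualitative preference for a high resistance ratio and low variability stated in the abstract.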