Sensitivity-Aware Mixed-Precision Quantization for ReRAM-based Computing-in-Memory
By Guan-Cheng Chen 1, Chieh-Lin Tsai 2, Pei-Hsuan Tsai 1, Yuan-Hao Chang 2
1 National Cheng Kung University, Tainan, Taiwan
2 National Taiwan University, Taipei, Taiwan

Abstract
Compute-In-Memory (CIM) systems, particularly those utilizing ReRAM and memristive technologies, offer a promising path toward energy-efficient neural network computation. However, conventional quantization and compression techniques often fail to fully optimize performance and efficiency in these architectures. In this work, we present a structured quantization method that combines sensitivity analysis with mixed-precision strategies to enhance weight storage and computational performance on ReRAM-based CIM systems. Our approach improves ReRAM Crossbar utilization, significantly reducing power consumption, latency, and computational load, while maintaining high accuracy. Experimental results show 86.33% accuracy at 70% compression, alongside a 40% reduction in power consumption, demonstrating the method's effectiveness for power-constrained applications.
To read the full article, click here
Related Semiconductor IP
- 64-bit, RISC-V, ultra-high performance processors
- 64-bit, RISC-V, performance and data computation processors
- 32-bit, RISC-V, deeply embedded processors
- Verification IP for eUSB 2 v2 and USB 2.0
- AFDX 1G Switch IP
Related Articles
- VitaLLM: A Versatile and Tiny Accelerator for Mixed-Precision LLM Inference on Edge Devices
- Pyramid Vector Quantization and Bit Level Sparsity in Weights for Efficient Neural Networks Inference
- A Time for Rebalancing Global Patent Strategies in the Semiconductor Market?
- ARCANE: Adaptive RISC-V Cache Architecture for Near-memory Extensions
Latest Articles
- Design and Development of a Neuromorphic Silicon Suite: PVT Sensing, Stochastic LIF Inference, On-Chip STDP Learning, and Crossbar Programming
- LLM4RTL: Tool-Assisted LLM for RTL Generation
- Towards Delta Aware Training: Efficient DNN Weight Storage for Resource-Constrained FPGAs
- CHERI-D: Secure and efficient inline object ID for CHERI temporal memory safety
- AIA: A 16nm Multicore SoC for Approximate Inference Acceleration Exploiting Non-normalized Knuth-Yao Sampling and Inter-Core Register Sharing