On-chip memory expansion

Overview

The Cache MX IP compresses on-chip L2 and L3 SRAM caches, enabling 2x effective capacity. SRAM caches can take up to 30-50% of an SoC/xPU's silicon real estate and a significant share of its power budget, which grows with physical dimensions. While digital logic scales effectively with process-node shrinks, SRAM essentially stopped scaling from the 5nm to the 3nm node. At the same time, growing compute-core counts demand higher SRAM capacity to scale IPC performance, and increasing SRAM area negatively impacts both die cost and die yield. Cache MX offers a power-, area-, and cost-effective alternative that enables performance scaling with single-digit latency.
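As a back-of-the-envelope illustration of the capacity claim above, the sketch below models effective capacity as physical capacity divided by the average compression ratio. The function name, the metadata-overhead parameter, and the 2% default are illustrative assumptions, not vendor figures.

```python
def effective_capacity(physical_kib: float, compression_ratio: float,
                       metadata_overhead: float = 0.02) -> float:
    """Estimate effective cache capacity after compression.

    compression_ratio: average compressed size / original size
                       (0.5 corresponds to the datasheet's 2x claim)
    metadata_overhead: fraction of SRAM assumed spent on compression
                       metadata (illustrative assumption, not a vendor figure)
    """
    usable = physical_kib * (1.0 - metadata_overhead)
    return usable / compression_ratio

# A 2 MiB physical cache at an average 0.5 ratio yields roughly 2x capacity:
print(round(effective_capacity(2048, 0.5)))  # ~4014 KiB effective
```

With zero metadata overhead the model reduces to a straight 1/ratio multiplier; the overhead term is there only to show where real designs pay a tax for tags and compression state.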

Key Features

  • On-the-fly compression / decompression of cache lines
  • Optional secure training-on-metadata capability
  • Silicon-verified in TSMC N5
  • On-the-fly multi-algorithm switching without recompression
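To make the "on-the-fly compression / decompression of cache lines" feature concrete, here is a minimal sketch of one simple, well-known scheme: zero-word elision over a 64-byte line. This is a generic illustration only; the actual Cache MX algorithms (Z-Trainless, Z-ZID) are proprietary and not described in this document.

```python
# Toy cache-line compressor: a 16-bit bitmap marks which 4-byte words
# are nonzero, followed by the nonzero words themselves. Lines rich in
# zero words shrink; a fully nonzero line costs 2 extra bytes.

LINE_BYTES = 64
WORD = 4  # compress at 4-byte-word granularity

def compress(line: bytes) -> bytes:
    """Return bitmap-of-nonzero-words + packed nonzero words."""
    assert len(line) == LINE_BYTES
    bitmap, payload = 0, b""
    for idx in range(LINE_BYTES // WORD):
        word = line[idx * WORD:(idx + 1) * WORD]
        if word != b"\x00" * WORD:
            bitmap |= 1 << idx
            payload += word
    return bitmap.to_bytes(2, "little") + payload

def decompress(blob: bytes) -> bytes:
    """Rebuild the original 64-byte line from a compressed blob."""
    bitmap = int.from_bytes(blob[:2], "little")
    payload, out, pos = blob[2:], b"", 0
    for idx in range(LINE_BYTES // WORD):
        if bitmap & (1 << idx):
            out += payload[pos:pos + WORD]
            pos += WORD
        else:
            out += b"\x00" * WORD
    return out

line = b"\x00" * 60 + b"\xde\xad\xbe\xef"
packed = compress(line)
assert decompress(packed) == line and len(packed) < LINE_BYTES
```

A hardware pipeline would do the same transform combinationally per line rather than byte-by-byte; the point here is only the shape of the metadata-plus-payload encoding that makes line-granular compression and exact decompression possible.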

Benefits

  • Standards
    • Z-Trainless (proprietary)
    • Z-ZID (proprietary)
  • Architecture
    • Modular architecture enables seamless scalability: multiple independent Cache MX instances can coexist within an SoC without requiring coordination
    • Architectural configuration parameters are accessible to fine-tune performance

Applications

  • Server CPUs, smart devices, and embedded systems all face the same challenge: memory bandwidth limits system scaling while many cores and accelerators compete to have their memory access requests served. Evaluations across a wide range of data sets from these applications confirm that bandwidth acceleration is an efficient and effective way to utilize the full memory potential.

Deliverables

  • Performance evaluation license: C++ compression model for integration into the customer's performance simulation model
  • HDL Source Licenses
    • Synthesizable SystemVerilog RTL (encrypted)
    • Implementation constraints
    • UVM testbench (self-checking)
    • Vectors for testbench and expected results
  • User Documentation
  • FPGA evaluation license
    • Encrypted IP delivery (Xilinx)

Technical Specifications

Maturity: Tape-out
Availability: Immediate