Monolithic 3D FPGAs Utilizing Back-End-of-Line Configuration Memories

Faaiq Waqar ¹, Jiahao Zhang ², Anni Lu ¹, Zifan He ², Jason Cong ², Shimeng Yu ¹
¹Georgia Institute of Technology, Atlanta, GA; ²University of California – Los Angeles, Los Angeles, CA

Abstract

This work presents a novel monolithic 3D (M3D) FPGA architecture that leverages stackable back-end-of-line (BEOL) transistors to implement configuration memory and pass gates, significantly improving area, latency, and power efficiency. By integrating n-type (W-doped In_2O_3) and p-type (SnO) amorphous oxide semiconductor (AOS) transistors in the BEOL, Si SRAM configuration bits are substituted with a less leaky equivalent that can be programmed at logic-compatible voltages. BEOL-compatible AOS transistors are under extensive research and development in the device community, with investment by leading foundries; data reported from that work is used to develop robust physics-based TCAD models that enable circuit design. The use of AOS pass gates reduces the overhead of reconfigurable circuits by mapping FPGA switch block (SB) and connection block (CB) matrices above configurable logic blocks (CLBs), thereby increasing the proximity of logic elements and reducing latency. By interfacing with the latest Verilog-to-Routing (VTR) suite, an AOS-based M3D FPGA design implemented in 7 nm technology is demonstrated with 3.4× lower area-time squared product (AT^2), 27% lower critical path latency, and 26% lower reconfigurable routing block power on benchmarks including hyperdimensional computing and large language models (LLMs).

Keywords — FPGA, Monolithic 3D (M3D), Amorphous Oxide Semiconductors, Reconfigurable Computing
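
The area-time squared product quoted in the abstract multiplies area by the square of the critical path delay, so a delay reduction counts twice. The short Python sketch below shows how the metric combines. The function names are hypothetical, and the implied area ratio it prints assumes that the 3.4× and 27% figures describe the same design point; the abstract does not state this (they may be averages over benchmarks), so the output is illustrative arithmetic, not a reported result.

```python
def at2(area: float, delay: float) -> float:
    """Area-time squared product: area multiplied by the square of the delay."""
    return area * delay ** 2


def implied_area_ratio(at2_improvement: float, delay_reduction: float) -> float:
    """Area ratio (new / baseline) implied by an AT^2 improvement factor and a
    fractional delay reduction, ASSUMING both describe the same design point."""
    delay_ratio = 1.0 - delay_reduction           # e.g. 27% lower delay -> 0.73
    return (1.0 / at2_improvement) / delay_ratio ** 2


# Figures quoted in the abstract: 3.4x lower AT^2, 27% lower critical path latency.
ratio = implied_area_ratio(at2_improvement=3.4, delay_reduction=0.27)
print(f"implied area ratio (M3D / baseline): {ratio:.2f}")
```

Because delay enters squared, the 27% latency reduction alone accounts for roughly a 1.9× AT^2 improvement under this assumption; the remainder would have to come from area.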

I. INTRODUCTION

Over the past three decades, field-programmable gate arrays (FPGAs) have become increasingly popular in domains such as telecommunications (packet processing and network function virtualization), system-on-chip development (prototyping and verification), and hardware acceleration for AI/ML workloads at the edge and in the data center. The advantage of FPGAs lies in their off-the-shelf reconfigurability, which enables the implementation of custom circuit designs using hardware description languages (HDLs) or high-level synthesis (HLS) and thereby circumvents the substantial non-recurring engineering (NRE) costs of application-specific integrated circuit (ASIC) development, such as physical design, layout, and fabrication. This adaptability extends their operational longevity in rapidly evolving markets where fixed-function ASICs quickly become obsolete. FPGAs can execute custom applications at over 10× lower power and with a >3× runtime reduction compared to CPU implementations. However, although modern FPGAs incorporate hardened macros such as RAMs, processor subsystems, and digital signal processing (DSP) units, designs implemented on an FPGA are still 9× larger and 3-6× slower than an equivalent built as an ASIC.

The power, performance, and area (PPA) disparity between FPGAs and ASICs is principally a byproduct of their key advantage: reconfigurability. Configuration memories implemented using SRAM enable or disable logic and signal-propagation functions to emulate the data path of an ASIC. Modern FPGAs require a substantial number (~2,000-5,000) of configuration bits per tile (Section 2A). The low density and high static power consumption of SRAM cells significantly limit the density of FPGA designs; configuration memories can occupy >50% of a tile's area and account for ~12% of the total static power. Additionally, the routing fabric comprises extensive networks of crossbars, multiplexers, buffers, and wires that dominate the dynamic (~75%) and static (~78%) power consumption. We observe that reductions in the routing and reconfiguration overhead have significant implications for the PPA of FPGAs.
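
To make the role of configuration bits concrete, the following minimal Python sketch models two structures that such bits control: a lookup table whose truth table is stored in configuration memory, and a pass-gate routing multiplexer. It is a behavioral illustration only. The function names are hypothetical, and the one-hot select encoding is an assumption chosen for clarity; the source does not specify the encoding used in the proposed architecture.

```python
from typing import Sequence


def lut_eval(config_bits: Sequence[int], inputs: Sequence[int]) -> int:
    """Evaluate a k-input lookup table whose truth table is held in
    2**k configuration bits (static once the device is programmed)."""
    if len(config_bits) != 2 ** len(inputs):
        raise ValueError("a k-input LUT needs 2**k configuration bits")
    index = 0
    for bit in inputs:                  # inputs[0] is the most significant address bit
        index = (index << 1) | (bit & 1)
    return config_bits[index]


def routing_mux(select_bits: Sequence[int], tracks: Sequence[int]) -> int:
    """Pass-gate multiplexer with one configuration bit per pass gate
    (one-hot encoding is an assumption): the enabled gate drives the output."""
    if len(select_bits) != len(tracks) or sum(select_bits) != 1:
        raise ValueError("expected exactly one enabled pass gate per track list")
    return tracks[list(select_bits).index(1)]


# Program a 2-input LUT as XOR and route the third of four tracks to its input.
xor_config = [0, 1, 1, 0]
a = routing_mux([0, 0, 1, 0], tracks=[0, 0, 1, 0])
b = 0
print(lut_eval(xor_config, [a, b]))
```

Even this toy example spends eight configuration bits (four for the LUT, four for the multiplexer) on a single two-input function, which hints at why a full tile can require thousands of bits and why the cell's static power and footprint matter more than its access speed.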

Monolithic 3D (M3D) integrated circuits, enabled by innovations in low-temperature materials processing, permit the use of multiple active tiers on a single substrate by building transistors in the back-end-of-line (BEOL). Among the most promising emerging transistor candidates are amorphous oxide semiconductor (AOS) transistors, owing in part to their commercial adoption in transparent thin-film channels for active display technology. Beyond their BEOL compatibility and stackability, AOS transistors have ultra-low leakage (<10^-15 to 10^-18 μA/μm), a high I_on/I_off ratio (>4×10^9), and moderate electron mobility (~15-20 cm^2/V·s). AOS transistors are at the forefront of research on charge-based memories, hybrid M3D standard cells, and BEOL power delivery. Although a vast literature exists on applications of AOS transistors, their candidacy as a fully BEOL-compatible SRAM substitute has been largely unexplored. The reason is the lack of a strong p-type AOS counterpart with comparable hole mobility, which degrades the dynamic access speed critical for SRAM-based register and cache memory. In FPGAs, however, SRAM configuration bits are stationary at runtime. The figures of merit for a configuration SRAM bit cell are therefore device stability, static power, and footprint, on which BEOL-compatible AOS SRAMs can improve upon their Si counterpart.

This paper presents a design-space analysis of a novel M3D FPGA architecture with AOS transistor-based configuration SRAMs and multiplexed routing structures. Our proposed architecture bypasses the need for high-voltage conversion and delivery for programming (a density/reliability bottleneck in prior work on FPGAs employing emerging devices). To enable the precise quantitative study of an M3D FPGA with AOS device integration and its advantages over CMOS FPGAs in 7 nm technology, we develop an evaluation flow based on robust compact transistor models, a custom M3D-compatible version of COFFE, and Verilog-to-Routing (VTR). We use this flow to appraise PPA improvements on benchmarks targeting hyperdimensional computing and natural language processing (NLP).
