Memory Safety Features Impact on Ibex based processor area
By lowRISC
Abstract
Memory safety is a critical concern for modern embedded systems, particularly in security-sensitive applications. This post explores the area impact of adding memory safety extensions to the Ibex RISC-V core, focusing on Physical Memory Protection (PMP) and Capability Hardware Enhanced RISC Instructions for embedded devices (CHERIoT). We synthesise the extended Ibex® cores using a commercial tool targeting the open FreePDK45 process and provide a detailed area breakdown and discussion of the results.
The PMP configuration we consider is one with 16 PMP regions. We find that the extensions increase the core size by 24 thousand gate equivalents (kGE) for PMP and 33 kGE for CHERIoT. The increase is mainly due to the additional state required to store information about protected memory. While this increase amounts to 42% of Ibex’s area for PMP and 57% for CHERIoT, its effect on the overall system is minimal. In a complete SoC, like the secure microcontroller OpenTitan Earl Grey, where the core represents only a fraction of the total area, the estimated system-wide overhead is 0.6% for PMP and 1% for CHERIoT. Given the security benefits these extensions provide, the area trade-off is well justified, making Ibex a compelling choice for secure embedded applications.
Memory Safety in Ibex-based Processors
Ibex is a compact, efficient, and open-source RISC-V core developed by lowRISC, designed for low-power and embedded applications. It supports the RV32IMCB instruction set and features a configurable two- or three-stage pipeline, making it well-suited for constrained environments such as microcontrollers and security-focused processors. Ibex is best known as the core of OpenTitan, a security SoC platform equipped with a wide range of security and I/O peripherals, and the world’s first commercial-grade open-source silicon root of trust.
In security domains, memory safety vulnerabilities are a major source of software security issues. Reports from Microsoft and Google indicate that around 70% of security-related bug fixes in Windows and Chrome stem from memory safety errors. Memory-safety vulnerabilities are similarly responsible for insecurity in embedded systems. Exploiting these vulnerabilities allows an attacker to compromise the confidentiality, integrity, and authenticity of data stored and processed by the system.
To enhance memory safety, Ibex has been used to implement two different extensions. The Ibex core maintained by lowRISC supports RISC-V’s Physical Memory Protection (PMP) as well as the PMP Enhancements for memory access and execution prevention on Machine mode (Smepmp), also known as enhanced PMP (ePMP). In a parallel, alternative approach, Microsoft extended Ibex with Capability Hardware Enhanced RISC Instructions (CHERI) for embedded devices in their CHERIoT-Ibex implementation. In this post, we examine and compare the area impact of these two extensions on the Ibex core.
Physical Memory Protection (PMP)
PMP enhances memory access control and execution prevention by defining access rights for a configurable number of memory regions. These permissions are enforced according to the current privilege level, which helps to isolate memory and to prevent unauthorised access.
RISC-V’s PMP specification enforces rules on all privilege modes. If at least one PMP entry is configured, all privilege modes except for machine mode are denied access to regions that do not have a corresponding rule. This, however, does not allow PMP regions to apply to machine mode without applying to all less privileged modes as well. For example, PMP cannot restrict machine mode access to memory that should be accessible by user mode. To provide a more flexible way to secure memory regions from machine mode, RISC-V specifies the PMP Enhancements for memory access and execution prevention on Machine mode (Smepmp) extension. It introduces mechanisms that restrict machine mode’s access to memory lacking a corresponding PMP rule and mechanisms that permit access in less privileged modes but deny it in machine mode. Smepmp helps mitigate attack vectors where attackers attempt to trick high-privileged processes into accessing or executing tampered memory from lower-privileged processes.
PMP and Smepmp are relatively simple to implement in hardware and set up in software, making them attractive security features. However, the number of regions is fixed in hardware and small (RISC-V permits at most 64 PMP regions), so the protection they offer is coarse-grained, and misconfigurations can introduce vulnerabilities. While these mechanisms are straightforward to use in simple cases, using PMP regions effectively in software becomes challenging when managing complex access control policies or supporting fine-grained sharing between tasks. Neither mechanism is intended to provide language-level memory safety.
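The matching rules described above can be illustrated with a small behavioural model. This is a simplified sketch, not Ibex's RTL or the full specification: the names `PmpRegion` and `pmp_allows` are hypothetical, regions are given as plain base/size ranges rather than encoded PMP address registers, and of the Smepmp mechanisms only the policy that denies machine mode access to memory without a matching rule (the `mmwp` flag here) is modelled.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PmpRegion:
    base: int             # first byte address covered by the rule
    size: int             # number of bytes covered
    perms: str            # any subset of "rwx"
    locked: bool = False  # lock bit: the rule also binds machine mode


def pmp_allows(regions, addr, access, mode, mmwp=False):
    """Return True if `access` ('r', 'w' or 'x') to `addr` is permitted.

    `mode` is 'M' (machine) or 'U' (user). `mmwp` is a simplified stand-in
    for the Smepmp policy that denies machine mode accesses with no rule.
    """
    for region in regions:  # the first (lowest-numbered) matching rule decides
        if region.base <= addr < region.base + region.size:
            if mode == "M" and not region.locked:
                return True  # baseline PMP: unlocked rules do not bind M-mode
            return access in region.perms
    # No rule matched the address.
    if mode == "M":
        return not mmwp          # M-mode is allowed unless the policy forbids it
    return len(regions) == 0     # less privileged modes are denied once any rule exists
```

For example, with a single unlocked read/write region configured, user mode can read inside it but is denied everywhere else, while machine mode is only denied outside it when `mmwp=True`.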
CHERI and CHERIoT
CHERI, originally developed by the University of Cambridge and SRI International, takes a different approach to memory safety by extending each memory pointer into a capability. Instead of accessing memory directly through addresses, memory operations must go through these capabilities — which not only store a memory address but also enforce strict limits on which parts of memory can be accessed and how. CHERI’s design ensures that these capabilities cannot be modified in unsafe ways and includes features for securely compartmentalising different parts of a program, making software more resistant to attacks. This enables fine-grained control over memory access and significantly improves security.
Although the initial CHERI work focused on memory safety for application-class cores, embedded applications often require tailored solutions for microcontrollers. To address this, Microsoft developed CHERIoT, successfully bringing the benefits of CHERI to embedded devices without significantly increasing complexity or resource demands. On top of that, CHERIoT provides the primitives to create strong compartmentalisation models without the burden of backwards compatibility with application-class software, as well as efficient temporal safety enabled by simpler core designs. This makes CHERIoT a compelling choice for embedded systems that need both efficiency and strong security guarantees.
Compared to PMP, CHERIoT enforces fine-grained restrictions specific to each individual memory access instead of general restrictions that apply to all memory accesses. PMP can provide segmentation and isolation but is limited by the number of regions it has. CHERIoT allows C and C++ code to be fully memory safe in a way that would be infeasible in a PMP setup with limited regions. This memory safety includes deterministically guaranteeing both spatial and temporal safety. This results in a more scalable solution for memory safety and fine-grained compartmentalisation, albeit at the cost of additional metadata in memory.
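To make the contrast with region-based checks concrete, the following sketch models a capability as a bounded, permissioned, tagged reference. It is an illustration under simplifying assumptions, not the CHERIoT encoding: `Capability`, `derive` and `may_access` are hypothetical names, bounds are stored uncompressed, and only spatial safety is shown (temporal safety via revocation is left out).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Capability:
    base: int          # lowest accessible address
    top: int           # first address past the accessible range
    perms: frozenset   # e.g. frozenset({"load", "store"})
    tag: bool = True   # validity tag; an untagged capability grants nothing


def derive(parent, base, top, perms):
    """Derive a capability from `parent`; rights may shrink but never grow.

    An attempt to widen the bounds or add permissions yields an untagged
    (invalid) capability instead of a more powerful one.
    """
    perms = frozenset(perms)
    monotonic = (
        parent.tag
        and parent.base <= base <= top <= parent.top
        and perms <= parent.perms
    )
    return Capability(base, top, perms, tag=monotonic)


def may_access(cap, addr, size, perm):
    """Check a single access of `size` bytes at `addr` against `cap`."""
    return (
        cap.tag
        and perm in cap.perms
        and cap.base <= addr
        and addr + size <= cap.top
    )
```

Because every pointer carries its own bounds, a 16-byte buffer derived from a larger heap capability can be checked on each access without consuming one of a fixed number of regions.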
Area Comparison of Extensions
Evaluation methodology
To evaluate the area overhead of different extensions designed to mitigate memory safety issues, we synthesised Ibex using a commercial tool targeting the open-source FreePDK45 technology. In our synthesis, we included only the processor and its instruction cache logic, excluding data memories and any surrounding SoC infrastructure. We applied reasonable timing constraints to the input and output ports and used timing models from OpenRAM for all SRAM macros. However, we did not include the SRAM macro of the instruction cache in our analysis, as it is highly technology-dependent and independent of the evaluated extensions. Therefore, we report only the core area to provide a detailed view of the changes within Ibex itself. Later, we will provide insights into the area impact of a surrounding system.
We evaluated three different Ibex implementations:
Ibex Baseline (RV32EMCB)
- 16 32-bit registers
- Support for the M (multiply/divide) and B (bit manipulation) extensions
- Writeback (WB) pipeline stage
- Dedicated Branch Target ALU
- Instruction cache ECC
- No PMP, CHERIoT, or dual-core lockstep (DCLS)
Ibex PMP
- Same features as the Baseline
- Additional physical memory protection (PMP) support for 16 regions, including the Smepmp extension (PMP enhancements for memory access and execution prevention in Machine mode)
Ibex CHERIoT
- Same features as the Baseline
- CHERIoT support, which includes a CHERI execute stage, checks for all memory accesses, the background revocation engine (TBRE), capability load filter, and expanded register file. The core does not include the optional stack zeroization engine (STKZ).
Results
The areas, measured in gate equivalents (GE), are reported in the table below. Note that in the FreePDK45 technology, one gate (NAND2_X1) corresponds to 0.798 µm². In its embedded configuration, Ibex itself is relatively small. To enable protection against memory vulnerabilities, both the PMP and CHERIoT extensions introduce an overhead roughly equivalent to half the core size. The CHERIoT-enabled core is 11% larger than the PMP-enabled core in the configurations we investigated.
Configuration | Area (kGE) | Area Overhead |
Ibex | 57 | Baseline |
Ibex+PMP | 81 | Baseline +42% |
Ibex+CHERIoT | 90 | Baseline +57% |
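The percentages in this table, and the conversion from gate equivalents to silicon area, can be reproduced with a few lines of arithmetic. The sketch below uses the one-decimal areas from the breakdown table later in this post (57.3, 81.4 and 90.3 kGE) rather than the rounded values above; the function names are illustrative, and small differences from the published percentages come from rounding.

```python
NAND2_X1_AREA_UM2 = 0.798  # area of one gate equivalent in FreePDK45

# Core areas in kGE, taken from the detailed breakdown table.
AREA_KGE = {"Ibex": 57.3, "Ibex+PMP": 81.4, "Ibex+CHERIoT": 90.3}


def overhead_percent(area_kge, baseline_kge):
    """Relative area increase over a baseline, in percent."""
    return (area_kge / baseline_kge - 1.0) * 100.0


def kge_to_um2(area_kge):
    """Convert an area in kGE to square micrometres."""
    return area_kge * 1000.0 * NAND2_X1_AREA_UM2


baseline = AREA_KGE["Ibex"]
for name, area in AREA_KGE.items():
    print(f"{name}: {area} kGE, {kge_to_um2(area):.0f} um^2, "
          f"+{overhead_percent(area, baseline):.1f}% over baseline")

# CHERIoT relative to PMP (about 11% in the text).
print(f"{overhead_percent(AREA_KGE['Ibex+CHERIoT'], AREA_KGE['Ibex+PMP']):.1f}%")
```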
Area Breakdown
To analyse the overhead sources in detail, we provide an area breakdown in the table and figures below, comparing the Baseline, PMP, and CHERIoT configurations. In the figures, each box is annotated with its area in kGE and the percentage increase relative to the other version in the same figure, while black boxes represent modules specific to the respective extension. The arrangement of the blocks follows the hierarchy in the source code, but does not suggest an actual floor plan in silicon.
Block | Area Baseline (kGE) | Area PMP (kGE) | Overhead PMP | Area CHERIoT (kGE) | Overhead CHERIoT |
Ibex | 57.3 | 81.4 | 42.1% | 90.3 | 57.5% |
|-Core | 36.0 | 60.2 | 67.2% | 62.7 | 74.2% |
| |-CS Register | 7.4 | 13.7 | 84.2% | 14.5 | 95.1% |
| |-IF Stage | 10.5 | 10.7 | 2.1% | 10.8 | 2.8% |
| |-ID Stage | 2.8 | 2.8 | 0.4% | 3.3 | 18.0% |
| |-EX Block | 13.5 | 13.5 | 0.0% | 13.8 | 2.2% |
| |-LSU | 1.1 | 1.1 | 2.1% | 2.1 | 99.1% |
| |-WB Stage | 0.7 | 0.7 | 0.4% | 1.5 | 130.8% |
| |-PMP | – | 17.5 | ∞ | – | – |
| |-CHERI EX Block | – | – | – | 12.3 | ∞ |
| |-CHERI TBRE | – | – | – | 3.2 | ∞ |
| |-CHERI Load Filter | – | – | – | 1.1 | ∞ |
|-ICache Data Control | 9.6 | 9.5 | -0.7% | 9.5 | -1.1% |
|-ICache Tag Control | 5.9 | 5.9 | 0.1% | 5.9 | 0.3% |
|-Register File | 5.7 | 5.7 | -0.2% | 12.2 | 112.5% |
The PMP core’s area increase comes primarily from two modules:
- The PMP block, which implements PMP checking for 16 regions in Ibex’s pipeline stage, is approximately 18 kGE in size—comparable to Ibex’s ALU.
- Control and Status Registers (CSRs) also contribute significantly. To store PMP configurations and protected address ranges, multiple CSRs must be added, depending on the number of regions. Supporting 16 address regions increases the size of the CSR block by a factor of about 1.8 (from 7.4 kGE to 13.7 kGE).
The CHERIoT core exhibits a similar overhead pattern:
- New logic for the CHERI EX block, the TBRE, and the load filter contributes roughly 16.6 kGE (12.3 + 3.2 + 1.1 kGE).
- The CSR block grows further, by a factor of about 1.9, due to additional registers tracking system state.
- The register file more than doubles in size (from 5.7 kGE to 12.2 kGE) to accommodate the additional capability registers.
- The LSU and WB stages grow to handle the extra functionality. While they roughly double in size, their absolute contribution to the total overhead remains small, as these modules are inherently small.
A direct comparison of the PMP and CHERIoT-enabled cores reveals an interesting trend: both require additional execution blocks and CSRs, resulting in similar core sizes. The key difference lies in CHERIoT’s extension of the register file, which ultimately makes the CHERIoT core 11% larger than the PMP core.
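The ranking of overhead sources can be checked directly from the breakdown table. The sketch below copies the leaf-block areas from that table (missing blocks are entered as 0.0) and sorts the per-block increases over the Baseline; the data structure and function names are illustrative only. Because the published figures are rounded and exclude some glue logic, the leaf deltas sum to slightly less than the top-level difference.

```python
# Leaf blocks from the breakdown table: (Baseline, PMP, CHERIoT) in kGE.
BLOCKS = {
    "CS Register": (7.4, 13.7, 14.5),
    "IF Stage": (10.5, 10.7, 10.8),
    "ID Stage": (2.8, 2.8, 3.3),
    "EX Block": (13.5, 13.5, 13.8),
    "LSU": (1.1, 1.1, 2.1),
    "WB Stage": (0.7, 0.7, 1.5),
    "PMP": (0.0, 17.5, 0.0),
    "CHERI EX Block": (0.0, 0.0, 12.3),
    "CHERI TBRE": (0.0, 0.0, 3.2),
    "CHERI Load Filter": (0.0, 0.0, 1.1),
    "ICache Data Control": (9.6, 9.5, 9.5),
    "ICache Tag Control": (5.9, 5.9, 5.9),
    "Register File": (5.7, 5.7, 12.2),
}
PMP, CHERIOT = 1, 2  # column indices into the tuples above


def deltas(column):
    """Area increase of each block over the Baseline, in kGE."""
    return {name: round(areas[column] - areas[0], 1)
            for name, areas in BLOCKS.items()}


def top_contributors(column, count=3):
    """The `count` blocks that add the most area for one configuration."""
    ranked = sorted(deltas(column).items(), key=lambda item: item[1], reverse=True)
    return ranked[:count]


print("PMP:", top_contributors(PMP))
print("CHERIoT:", top_contributors(CHERIOT))
```

For PMP, the PMP block and the CSRs account for almost all of the increase; for CHERIoT, the CHERI EX block, the CSRs, and the register file lead the list.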
System Impact
So far, we have primarily discussed the impact of memory safety extensions on the Ibex core. However, in a complete system, the core typically occupies only a small fraction of the total chip area. Components such as the instruction and data paths, interconnects, peripherals, and accelerators quickly take up significantly more space—especially with respect to small processors like Ibex. As a result, even a doubling of core size may have only a minor impact on the overall chip area.
To assess the overall impact on the system, we synthesised the OpenTitan Earl Grey top level, i.e., the discrete chip implementation of OpenTitan. We use the successfully taped-out v1.0.0 version publicly available on GitHub and follow the same methodology as in previous experiments with the open-source FreePDK45 technology and targeting a main clock speed of 100 MHz. Technology-specific and proprietary memory macros including embedded Flash, OTP fuses, as well as ROM and all SRAMs account for roughly 50% of the total chip area. This gives us the breakdown of Earl Grey’s chip area, shown in the pie chart below.
The Baseline Ibex configuration discussed in this blog post would occupy 1.4% of the total chip area in such an SoC or 2.8% of the total logic area. Enabling the PMP extension increases Ibex’s area by 42%. Ibex then makes up 3.9% of the total logic area, and the total logic area increases by 1.2% compared to the baseline SoC. Enabling the CHERIoT extension instead increases Ibex’s area by 57.5%. Ibex then makes up 4.3% of the total logic area, and the total logic area is 1.6% larger than the baseline SoC’s. With respect to the total chip area, the area increase of these core extensions amounts to roughly 0.6% (PMP) and 0.8% (CHERIoT).
That said, security hardening of an SoC involves additional overhead beyond just the core. As discussed, CHERIoT transforms all pointers in the system into capabilities. This requires an extra validity tag bit per capability, which necessitates an expansion of the data memory to accommodate this additional bit. Specifically for Earl Grey, this means increasing the width of the main and retention SRAMs from 32 to 33 bits, which results in an overall increase in chip area of 0.2%. Together with the area increase of the core extension, the estimated cost of adding support for CHERIoT in a system like OpenTitan Earl Grey is around 1%.
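The system-level figures follow from scaling the core overhead by the core's share of the chip. The sketch below reproduces that arithmetic from the numbers quoted in this section (1.4% of chip area, 2.8% of logic area, 0.2% for the wider SRAMs); the function and constant names are illustrative, and the results match the text only to within rounding.

```python
CORE_SHARE_OF_CHIP = 0.014    # Baseline Ibex as a fraction of total chip area
CORE_SHARE_OF_LOGIC = 0.028   # Baseline Ibex as a fraction of total logic area
SRAM_TAG_BIT_COST = 0.002     # chip-area increase from widening SRAMs to 33 bits

CORE_OVERHEAD = {"PMP": 0.421, "CHERIoT": 0.575}  # relative core growth


def system_overhead(core_share, core_overhead):
    """Relative growth of the whole (chip or logic) area when only the core grows."""
    return core_share * core_overhead


def new_core_share(core_share, core_overhead):
    """Core's share of the enlarged area after applying the extension."""
    return core_share * (1.0 + core_overhead) / (1.0 + core_share * core_overhead)


for name, overhead in CORE_OVERHEAD.items():
    logic = system_overhead(CORE_SHARE_OF_LOGIC, overhead)
    chip = system_overhead(CORE_SHARE_OF_CHIP, overhead)
    share = new_core_share(CORE_SHARE_OF_LOGIC, overhead)
    print(f"{name}: logic +{logic:.1%}, chip +{chip:.1%}, core share of logic {share:.1%}")

# CHERIoT total: core extension plus the extra tag bit in the data memories.
cheriot_total = system_overhead(CORE_SHARE_OF_CHIP, CORE_OVERHEAD["CHERIoT"]) + SRAM_TAG_BIT_COST
print(f"CHERIoT including SRAM tag bits: +{cheriot_total:.1%}")
```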
Summary
In conclusion, memory safety features such as PMP and CHERIoT offer different security benefits for embedded and low-power applications. In this blog post, we have analysed the area overhead of incorporating them into Ibex-based processors. The results show that the CHERIoT extension causes a slightly larger increase in core area than PMP (57.5% vs 42.1%), primarily due to the expansion of the register file. However, in a complete system such as the secure microcontroller OpenTitan Earl Grey, the estimated impact on overall chip area would be only 0.6% (for PMP) and 1% (for CHERIoT), as the core typically occupies a small fraction of the total system size. Both features therefore provide enhanced protection against vulnerabilities without significantly increasing the system’s area. CHERIoT additionally enforces fine-grained spatial and temporal memory safety, as well as scalable software compartmentalisation, at only a modest further area increase. This makes both extensions valuable choices for systems where security is paramount.
About lowRISC®
Founded in 2014 at the University of Cambridge Department of Computer Science and Technology, lowRISC is a not-for-profit company/CIC that provides a neutral home for collaborative engineering to develop and maintain open source silicon designs and tools for the long term. The lowRISC not-for-profit structure combined with full-stack engineering capabilities in-house enables the hosting and management of high-quality projects like OpenTitan and Sunburst via the Silicon Commons® approach.
For more information, visit https://lowrisc.org/
₁ Under the tooling used in this work, specifically targeting the FreePDK45 process and using the OpenRAM memory macro compiler, a 4 KiB SRAM macro (1024 rows, 32-bit words, byte-wise write enable, 1 read/write port) corresponds to roughly 63 kGE, which translates to a density of 1.94 GE/bit. The size of the baseline Ibex would therefore correspond to roughly 3.6 KiB of SRAM memory, with CHERIoT adding another 2 KiB on top. This is intended as a rough rule of thumb to aid comparison. It is important to note that this translation factor is highly dependent on the technology node, the SRAM implementation, and the SRAM macro size/configuration.