An Implementation Study on Fault Tolerant LEON-3 Processor System
IHP GmbH, Frankfurt (Oder), Germany
*Gaisler Research, Göteborg, Sweden
Abstract:
The paper presents a case study on implementation of the fault tolerant LEON-3 processor system on a chip for space applications. The single-event upset (SEU) tolerance is provided by design. The technique applied detects and corrects up to 4 errors in the register file and caches. The implementation details and system-on-chip features are summarized.
1. INTRODUCTION
Surviving the harsh conditions of space requires radiation-tolerant electronics, among which radiation-tolerant (or fault-tolerant) processors play the key role. The heart of a fault-tolerant processor system must be a high-performance, high-speed, modular, low-power RISC microprocessor to meet the ever-growing application demands of today’s and tomorrow’s space missions.
We present the fault-tolerant LEON-3 processor system that has been designed for operation in the harsh space environment, and includes functionality to detect and correct single-event upset (SEU) errors in all on-chip RAM memories. The fault-tolerant LEON-3 processor supports most of the functionality in the standard LEON-3 processor [1], and adds the following features:
- Register file SEU error-correction of up to 4 errors per 32-bit word;
- Cache memory error-correction of up to 4 errors per tag or 32-bit word;
- Autonomous and software transparent error handling;
- No timing impact due to error detection or correction.
2. SYSTEM ARCHITECTURE
The system includes a LEON-3 processor core, caches, a combined PROM/SRAM memory controller, an AMBA bus (AHB and APB) including an AHB controller and an AHB/APB bridge, and a standard set of peripheral cores including timers, UARTs, I/O port, interrupt controller and debug interfaces [1].
2.1 Processor Core
LEON-3 is a 32-bit processor core conforming to the IEEE-1754 (SPARC V8) architecture. It is designed for embedded applications, combining high performance with low complexity and low power consumption. The processor core has the following main features: 7-stage pipeline with Harvard architecture, separate instruction and data caches, hardware multiplier and divider, on-chip debug support and multi-processor extensions.
2.2 Caches
LEON-3 has a highly configurable cache system, consisting of a separate instruction and data cache. Both caches can be configured with 1 - 4 sets, 1 - 256 kbytes/set, and 16 or 32 bytes per line. Sub-blocking is implemented with one valid bit per 32-bit word. The instruction cache uses streaming during line-refill to minimize refill latency. The data cache uses a write-through policy and implements a double-word write-buffer. The data cache can also perform bus-snooping on the AHB bus. Both tag and data arrays are protected with four parity bits, which allows up to four simultaneous errors per cache (tag or data) array word to be detected. Upon a detected error, the corresponding cache line is flushed and the instruction is restarted. For diagnostic purposes, error counters are provided to monitor detected and corrected errors in both the tag and data arrays of the caches.
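The detect-flush-restart scheme described above can be sketched as follows. This is a minimal illustration, not the LEON-3 implementation: the text only states that four parity bits protect each word, so the interleaved layout used here (parity bit g covers word bits g, g+4, g+8, ...) is an assumption, and the names `parity_bits`, `has_error` and `read_with_recovery` are hypothetical. With this layout, four adjacent upset bits fall into four different parity groups and are all detected. Recovery by refill works for the data cache because it is write-through, so main memory always holds a valid copy.

```python
WORD_BITS = 32
GROUPS = 4  # four parity bits per 32-bit word, as stated in the text


def parity_bits(word: int) -> int:
    """Return 4 parity bits; bit g covers word bits g, g+4, g+8, ... (assumed layout)."""
    par = 0
    for bit in range(WORD_BITS):
        if (word >> bit) & 1:
            par ^= 1 << (bit % GROUPS)
    return par


def has_error(word: int, stored_parity: int) -> bool:
    """A mismatch between recomputed and stored parity signals an upset."""
    return parity_bits(word) != stored_parity


def read_with_recovery(cache, parity, memory, index):
    """Model 'flush and restart': on a parity mismatch, refill from backing memory.

    Returns (value, error_was_detected).
    """
    if has_error(cache[index], parity[index]):
        cache[index] = memory[index]
        parity[index] = parity_bits(memory[index])
        return cache[index], True
    return cache[index], False
```

Note the limit of plain parity in this sketch: two upsets that land in the same parity group (for example bits 0 and 4) cancel out and go undetected, which is why the grouping of bits matters.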
2.3 AMBA Bus
Two on-chip buses are provided: AMBA AHB and AMBA APB. The APB is used to access peripherals and on-chip registers, while the AHB is used for high-speed data transfers. The full AHB/APB standard is implemented [2].
AHB is designed for high-performance, high-clock-frequency system modules. It acts as a high-performance system backbone bus. This bus supports the efficient connection of processors, on-chip memories and off-chip external memory interfaces with low-power peripheral functions. LEON-3 uses the AMBA-2.0 AHB to connect the processor cache controllers to the memory controller and other high-speed units. In our configuration, two masters are attached to the bus: the processor and the UART of the debug communication link. Four slaves are provided: memory controller, debug support unit, JTAG, and AHB/APB bridge.
The AHB/APB bridge acts as the only master on the APB. All communication between masters on the AHB and slaves on the APB passes through this bridge. The APB is optimized for minimal power consumption and reduced interface complexity to support peripheral functions. It is configured to connect five slaves: interrupt controller, timer, two UARTs, and parallel I/O port.
2.4 Interrupt Interface
LEON-3 supports the SPARC V8 interrupt model with a total of 15 asynchronous interrupts. The interrupt interface provides functionality to both generate and acknowledge interrupts. Interrupts from AHB and APB units are routed through the bus, combined together, and propagated back to all units.
2.5 Fault Tolerant Memory Controller
The fault tolerant 32-bit PROM/SRAM controller uses a common 32-bit memory bus to interface PROM, SRAM and I/O devices. It also provides an Error Detection And Correction (EDAC) unit that corrects single errors and detects double errors. Configuration of the memory controller functions is performed through the APB bus interface.
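The behaviour "correct one error, detect two" can be illustrated with a generic extended Hamming code over a 32-bit word. The text does not say which code the EDAC unit uses, so this is a stand-in chosen for illustration only; the function names and bit layout (six Hamming check bits at power-of-two positions plus one overall parity bit, 39 bits in total) are assumptions.

```python
PARITY_POS = [1, 2, 4, 8, 16, 32]   # Hamming check-bit positions
CODE_LEN = 38                        # positions 1..38; position 0 holds overall parity
DATA_POS = [p for p in range(1, CODE_LEN + 1) if p not in PARITY_POS]  # 32 positions


def encode(word: int) -> list:
    """Encode a 32-bit word into a 39-bit codeword (list of 0/1 values)."""
    bits = [0] * (CODE_LEN + 1)
    for i, pos in enumerate(DATA_POS):
        bits[pos] = (word >> i) & 1
    for p in PARITY_POS:
        # Check bit p makes the parity of all positions containing bit p even.
        bits[p] = sum(bits[pos] for pos in range(1, CODE_LEN + 1) if pos & p) & 1
    bits[0] = sum(bits[1:]) & 1      # overall parity enables double-error detection
    return bits


def decode(bits: list):
    """Return (word, status) with status 'ok', 'corrected' or 'uncorrectable'."""
    bits = list(bits)
    syndrome = 0
    for pos in range(1, CODE_LEN + 1):
        if bits[pos]:
            syndrome ^= pos          # XOR of set positions is 0 for a valid codeword
    overall = sum(bits) & 1          # 1 means an odd number of bits flipped
    if syndrome == 0 and overall == 0:
        status = "ok"
    elif overall == 1 and syndrome <= CODE_LEN:
        bits[syndrome] ^= 1          # syndrome 0 here means the overall parity bit itself
        status = "corrected"
    else:
        status = "uncorrectable"     # e.g. a double error: non-zero syndrome, even parity
    word = sum(bits[pos] << i for i, pos in enumerate(DATA_POS))
    return word, status
```

In this sketch a single flipped bit anywhere in the codeword is repaired, while two flipped bits are reported but not repaired, mirroring the single-correct/double-detect behaviour described for the EDAC unit.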
2.6 Timer Unit
The modular timer unit implements one prescaler and one to seven decrementing timers. The number of timers is configurable through a VHDL generic. The timer unit acts as a slave on the APB bus. The unit is capable of asserting an interrupt when a timer underflows. The interrupt can be configured to be common for the whole unit or separate for each timer.
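A small behavioural model may help to picture how the prescaler and the decrementing timers interact. It is a sketch under stated assumptions rather than the actual core: the class `TimerUnit`, its reload-on-underflow behaviour, and the per-timer interrupt reporting are hypothetical choices made for illustration.

```python
class TimerUnit:
    """Behavioural sketch: one prescaler feeding one to seven decrementing timers."""

    def __init__(self, prescaler_reload, timer_reloads):
        if not 1 <= len(timer_reloads) <= 7:
            raise ValueError("the unit implements one to seven timers")
        self.prescaler_reload = prescaler_reload
        self.prescaler = prescaler_reload
        self.reloads = list(timer_reloads)
        self.values = list(timer_reloads)

    def clock(self):
        """Advance one clock cycle; return indices of timers that underflowed."""
        irqs = []
        if self.prescaler > 0:
            self.prescaler -= 1
            return irqs
        # Prescaler underflow: reload it and deliver one tick to every timer.
        self.prescaler = self.prescaler_reload
        for i, value in enumerate(self.values):
            if value > 0:
                self.values[i] = value - 1
            else:
                # Timer underflow: reload (assumed) and raise its interrupt.
                self.values[i] = self.reloads[i]
                irqs.append(i)
        return irqs
```

Under these assumptions a prescaler reload value P and a timer reload value T give an interrupt every (P + 1) × (T + 1) clock cycles. A common interrupt for the whole unit would simply be the OR of the per-timer results.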
2.7 I/O Port
The I/O unit implements a scalable I/O port with interrupt support. The port width can be set to 2 - 32 bits through the nbits generic. Each bit in the port can be individually set to input or output, and can optionally generate an interrupt. For interrupt generation, the input can be filtered for polarity and level/edge detection.
2.8 UARTs
AHBUART consists of a UART connected to the AHB bus as a master. A simple communication protocol is supported to transmit access parameters and data. Through the communication link, a read or write transfer can be generated to any address on the AHB bus.
APBUART is provided for serial communications. The UART supports data frames with 8 data bits, one optional parity bit and one stop bit. To generate the bit-rate, each UART has a programmable 12-bit clock divider. Hardware flow-control is supported through the RTSN/CTSN hand-shake signals. Two configurable FIFOs are used for the data transfers between the bus and UART.
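The effect of the 12-bit clock divider on the achievable bit rates can be estimated with a short calculation. The sketch below assumes that the divider counts down from a reload value and produces a tick at eight times the bit rate; that oversampling factor, the "reload + 1" period, and the function names are assumptions for illustration and are not stated in the text.

```python
def uart_scaler(sys_clk_hz, baud, ticks_per_bit=8, width=12):
    """Return the divider reload value for a desired bit rate.

    ticks_per_bit=8 is an assumed oversampling factor; width=12 is the
    divider width given in the text.
    """
    reload = round(sys_clk_hz / (baud * ticks_per_bit)) - 1
    if not 0 <= reload < (1 << width):
        raise ValueError("bit rate not reachable with a %d-bit divider" % width)
    return reload


def actual_baud(sys_clk_hz, reload, ticks_per_bit=8):
    """Bit rate actually produced by a given reload value."""
    return sys_clk_hz / ((reload + 1) * ticks_per_bit)
```

For example, `uart_scaler(125_000_000, 115200)` yields a reload value that fits in 12 bits, whereas very low bit rates at the same clock frequency would exceed the divider range and raise `ValueError` in this sketch.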
2.9 Debug Support Unit
The LEON-3 pipeline includes functionality to allow non-intrusive debugging on target hardware. To aid software debugging, up to four watch-point registers can be enabled. Each register can cause a breakpoint trap on an arbitrary instruction or data address range. When the debug support unit is attached, the watch-points can be used to enter debug mode. Through a debug support interface, full access to all processor registers and caches is provided. The debug interface also allows single stepping, instruction tracing and hardware breakpoint/watch-point control. An internal trace buffer can monitor and store executed instructions, which can later be read out over the debug interface.
2.10 JTAG Debug Link
The JTAG debug link provides access to the on-chip AHB bus through JTAG. The JTAG link implements a simple protocol which translates JTAG instructions to AHB transfers. Through this link, a read or write transfer can be generated to any address on the AHB bus.
2.11 Power-down Mode
The LEON-3 processor core implements a power-down mode, which halts the pipeline and caches until the next interrupt. This is an efficient way to minimize power consumption when the application is idle, and does not require tool-specific support in the form of clock gating.
3. IMPLEMENTATION AND VERIFICATION
The LEON-3 processor system has been configured and implemented to support the single-event upset (SEU) tolerance of instruction and data caches. Each of the caches consists of a tag array and a data array. The tag array is implemented as an embedded SRAM block of 512 bytes. The data array is composed of two embedded SRAM blocks (one of 2 kbytes and another of 512 bytes). The register file has been implemented in flip-flops and protected against SEU errors. Triple modular redundancy (TMR) has been applied to all flip-flops. A block diagram of the configuration is shown in Figure 1. The cache organization is presented in Table 1.
Figure 1: Implemented configuration of LEON-3
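The principle behind the TMR protection of the flip-flops can be shown with a bitwise majority voter: each stored bit is kept in three copies and the value read is the one on which at least two copies agree. The following is an illustrative sketch only; the names `tmr_vote` and `TmrRegister` are hypothetical and do not come from the design.

```python
def tmr_vote(a: int, b: int, c: int) -> int:
    """Bitwise majority of three copies: each output bit follows at least two inputs."""
    return (a & b) | (a & c) | (b & c)


class TmrRegister:
    """A register stored as three copies and read through a majority voter."""

    def __init__(self, value=0):
        self.copies = [value, value, value]

    def write(self, value):
        self.copies = [value, value, value]

    def upset(self, copy, bit):
        """Inject a single-event upset: flip one bit in one of the copies."""
        self.copies[copy] ^= 1 << bit

    def read(self):
        return tmr_vote(*self.copies)
```

An upset in one copy is masked by the other two. If the same bit is upset in two copies before the register is rewritten, the voter returns the wrong value, which is the known limit of this technique.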
For system implementation and verification, we have used the original simulation and synthesis scripts [3] with the necessary modifications. First, the scripts have been modified to incorporate custom SRAM Verilog simulation models into the original VHDL processor model.
Table 1: Cache organization
Cache Array | Size (kbytes) | No. of Words | Data Width (bits) | Address Width (bits) |
I/D Data | 2.5 | 512 | 36 of 40 | 9 |
I/D Tag | 0.5 | 128 | 29 of 32 | 7 |
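The entries of Table 1 are related by simple arithmetic, which the following sketch reproduces. It assumes that the second number in the data-width column ("of 40", "of 32") is the physical width of the SRAM word and that the size column is computed from it; the function name `array_geometry` is hypothetical.

```python
def array_geometry(words: int, physical_width_bits: int):
    """Return (size in kbytes, address width in bits) for an SRAM array."""
    size_kbytes = words * physical_width_bits / 8 / 1024
    address_bits = (words - 1).bit_length()   # bits needed to address 'words' entries
    return size_kbytes, address_bits


# (number of words, assumed physical word width in bits) from Table 1
data_array = array_geometry(512, 40)   # (2.5, 9)
tag_array = array_geometry(128, 32)    # (0.5, 7)

# Instruction plus data cache, each with one data array and one tag array.
total_kbytes = 2 * (data_array[0] + tag_array[0])   # 6.0
```

Under this reading, the two caches together occupy 6 kbytes of embedded SRAM, which appears consistent with the cache memory figure reported in Table 2.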
3.1 Synthesis
The system is fully synthesizable with most synthesis tools. After the configured processor system including the SRAM models had been verified, we modified the synthesis scripts to map the design to the target library. The design with directly instantiated SRAM blocks and pads has been synthesized for a target frequency of 125 MHz using Synopsys Design Compiler [4]. An SDF (Standard Delay Format) file of the synthesized gate-level netlist has also been generated.
3.2 Verification
A generic testbench is provided from which several testbench configurations can be generated: the FUNC testbench performing a quick check of most on-chip functions, the MEM testbench testing all on-chip memory, and the FULL testbench combining memory and functional tests, suitable for generating test vectors for manufacturing testing [3]. Numerous simulations using these testbenches have been carried out after synthesis to verify the correct functionality of the gate-level netlist. All simulations, both with and without the corresponding SDF file, have been run using the ModelSim simulator [5]. The same simulations are used for verification of the netlist of the generated layout.
3.3 Layout
After the functionality of the synthesized netlist had been verified, we created a floorplan using Cadence First Encounter [6]. In the floorplanning phase, the memory blocks have been placed as hard macros. The design layout has been generated using a standard sequence of back-end process steps: power planning, placement, clock tree generation, routing and verification of geometry. The processor system has been fabricated in IHP’s 0.25 µm CMOS technology [7]. The chip photo is shown in Figure 2. Geometrical and electrical features of the chip are summarized in Table 2. The data indicate the high performance and low energy consumption of the implemented system-on-chip: a maximum frequency of 160 MHz at 6.2 mW/MHz.
3.4 Testability
The design is highly testable: in addition to functional testing of the complete system-on-chip, a chain of scannable flip-flops (a scan-chain) is implemented. For the inserted scan-chain (made of more than 15000 scannable flip-flops), we have generated more than 1000 manufacturing test vectors with the Synopsys TetraMAX Automatic Test Pattern Generator [8] in the form of a WGL file. A Verilog DPV testbench has also been prepared for serial simulation of all scan data. All the tests (functional tests and scan test) are executed on the Agilent 93000 chip tester.
Figure 2: Chip photo
Table 2: System-on-chip features
Area (mm²) | 22 |
Signal Ports | 105 |
Power Ports | 20 |
Scan Ports | 1 (3) |
Transistors (×10⁶) | 0.83 |
Cache Memory (kbytes) | 6 |
Scannable Flip-Flops (×10³) | 15 |
Power/Frequency (mW/MHz) | 6.2 |
Maximum Frequency (MHz) | 160 |
3.5 Radiation test
To test the SEU tolerance, the LEON-3 processor has been subjected to heavy-ion error injection using Californium (Cf-252). The tests have been carried out for 3 hours, with a flux of 25 particles/s/cm² at the device surface. The on-chip monitoring logic reported a total of 281 effective SEU errors, of which 99% were corrected. The cross-section for a memory RAM bit was measured as 7.2E-8 cm².
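The test conditions quoted above translate into a total fluence and a device-level error cross-section by straightforward arithmetic, sketched below. The variable names are illustrative. The per-bit cross-section reported in the text is not reproduced here, because it depends on the number of sensitive memory bits counted, which the text does not state.

```python
flux = 25            # particles / s / cm², at the device surface
duration_s = 3 * 3600  # 3 hours of exposure
errors = 281         # effective SEU errors reported by the on-chip monitoring logic

# Total number of particles per unit area over the whole test.
fluence = flux * duration_s                 # particles / cm²

# Device-level cross-section: observed errors per unit fluence.
device_cross_section = errors / fluence     # cm²

print(f"fluence = {fluence} particles/cm²")
print(f"device cross-section = {device_cross_section:.2e} cm²")
```

Dividing the device-level value by the number of exposed memory bits would give a per-bit cross-section in the same units.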
4. CONCLUSIONS
This paper presents our experience in implementing the fault tolerant LEON-3 processor system configured to operate in space conditions. The implemented processor system has been verified and tested. We have demonstrated that the performance and features of this processor system (fabricated in IHP’s 0.25 µm CMOS technology) meet the requirements imposed by the target application.
REFERENCES
1. GRLIB IP Core User’s Manual, Gaisler Research
2. AMBA On-Chip Bus Standard, ARM Inc.
3. GRLIB IP Library User’s Manual, Gaisler Research
4. Design Compiler, Synopsys Inc.
5. ModelSim, Model Technology
6. First Encounter, Cadence Design Systems
7. SiGe:C BiCMOS technologies, IHP GmbH
8. TetraMAX ATPG, Synopsys Inc.