Interstellar: Fully Partitioned and Efficient Security Monitoring Hardware Near a Processor Core for Protecting Systems against Attacks on Privileged Software
By YongHo Song, Byeongsu Woo, Youngkwang Han and Brent ByungHoon Kang
Korea Advanced Institute of Science and Technology
Abstract
Existing approaches to instruction-trace-based security monitoring hardware depend on privileged software, which makes it difficult to defend against attacks on the privileged software itself. To address this challenge, we propose Interstellar, which introduces partitioned hardware near the CPU’s main core and leverages the benefits of hardware-level security monitoring. Interstellar is security monitoring hardware that is fully partitioned, monitors in parallel, and detects attacks simultaneously with execution. Its design makes it hard for malicious software to reverse-engineer how Interstellar detects attacks, and it efficiently protects the system against attacks on privileged software (e.g., a Trusted Execution Environment (TEE)). Moreover, because its detection logic is designed with finite-state machines, Interstellar not only monitors but also blocks various attacks in a timely manner without stalling a CPU core.
We implemented a prototype of Interstellar in the Rocket chip using a hardware description language and evaluated it with a Linux kernel and a custom TEE-equipped Linux kernel for the Rocket chip on two different FPGA boards. The performance overhead of Interstellar is negligible for benchmark applications: the average overhead incurred on a 50MHz Rocket core across three different benchmarks is 0.102%.
CCS Concepts
• Security and privacy → Hardware security implementation; Operating systems security; Information flow control.
Keywords
Hardware security; Partitioned security monitor; Instruction-tracing; Finite-state machine
1 Introduction
Progress in privilege-based systems. Security mechanisms for modern computing systems are predominantly based on granting different privileges to each piece of software in a hierarchical manner, and system software protected by a privilege mechanism is employed to prevent faults and malicious behaviors in some applications from propagating to the entire system. For example, an Operating System (OS) kernel can access privileged resources with its privileged instructions, which can only be executed in the kernel mode of the CPU hardware. The OS kernel can also manage the page mappings of applications by leveraging its privilege.
However, system software such as the OS kernel is likely to contain vulnerabilities because of its large attack surface. The system software can also use its privilege to access any classified data related to the system’s security as well as private data. Thus, if an attacker uncovers vulnerabilities in the large code base of the system software, the attacker can easily leak important data by exploiting those vulnerabilities and taking advantage of the system software’s privilege. To protect important data from system software with a large attack surface, Trusted Execution Environments (TEEs) [26, 48] and custom TEEs [24, 27, 39, 52] have been proposed for privilege-based systems; they introduce an additional, minimal Trusted Computing Base (TCB). Still, the trusted software and security hardware components included in the TCB depend on another privilege-based security mechanism.
Security vulnerabilities in the design of privilege-based systems. Unfortunately, despite these advances in privilege-based systems, existing attacks on the OS kernel, which exploit vulnerabilities in the privileged software’s code and in shared hardware to leak secrets, remain effective against the TCB of a TEE or custom TEE with some variations. For example, a Return-oriented Programming (ROP) attack [23, 40] works effectively on both the OS and the TEE if the attacker finds a buffer overflow vulnerability and the necessary gadgets in the code of the privileged software or of the application executed on the TEE. In addition, hardware side-channel attacks [20, 21, 41, 42], which exploit vulnerabilities in microarchitectural hardware resulting from design flaws or from interference within hardware shared between different privileges, also remain effective not only on the OS kernel but also on TEEs and many custom TEEs.
Interestingly, existing privilege-based systems, including the OS kernel and TEE, share a general design principle for implementing their security mechanisms. They have a TCB consisting of trusted software, hardware security primitives, and dedicated microarchitectural components. They then use the trusted software that has the highest (or a special) privilege to isolate the TCB from other vulnerable software. However, this general design faces the difficulty that the privileged software included in the TCB must be free from any vulnerabilities that software-based attacks can exploit. In addition, the hardware security primitives and the dedicated microarchitectural components included in the TCB must be safe from hardware side-channel attacks across different privileges.
Drawbacks of prior related works. Prior works [30, 31, 33] on instruction-tracing security monitoring hardware obtain the instructions executed by the monitored software and the corresponding microarchitectural information to detect malicious actions or accelerate security features for applications running in privilege-based systems. However, all of this prior security monitoring hardware is directly controlled by privileged software (i.e., the OS kernel), as in the design of the security mechanism for privilege-based systems, in order to support software-programmable security monitoring rules. Also, some prior works [30, 31] do not include cache side-channel attacks in their threat models. Hence, the prior works are themselves vulnerable to existing attacks on privileged software and cannot be used to protect the TEE against OS-level attackers.
Moreover, because their hardware is implemented to support software-programmable security monitoring, some prior works [30, 31] cannot monitor the software running on the main core simultaneously with its execution, so it is difficult for them to detect attacks before the attacks are carried out. Another work [33] is unable to generate comprehensive security monitoring rules that refer to various microarchitectural resources. In sum, these design choices prevent prior instruction-tracing security monitoring hardware from leveraging the benefits of security monitoring at the microarchitecture level, which can safely and efficiently detect attacks across different privileges.
Interstellar. To efficiently protect privilege-based systems against existing attacks on privileged software, this paper presents Interstellar, instruction-tracing security monitoring hardware that is fully partitioned, monitors in parallel, and detects attacks simultaneously with execution. Interstellar addresses the limitations of prior related works [30, 31, 33] and leverages the benefits that security monitoring hardware can achieve at the microarchitecture level.
In particular, Interstellar introduces a design of fully partitioned security monitoring hardware, which is not accessible from any software and is separated from the CPU’s main core and cache, to safely detect and block existing attacks on privileged software. In addition, Interstellar monitors every instruction fetched from software running on the CPU’s main core, applying multiple attack detection rules in parallel. Interstellar uses Finite-state Machines (FSMs) to efficiently implement multiple comprehensive attack detection rules, which refer to the executed instructions and the corresponding microarchitecture-level information. Lastly, Interstellar can detect attacks performed on the main core simultaneously with execution, before the attacks are carried out (i.e., before the instructions for the attacks are committed). Notably, since Interstellar can be optimized at the microarchitecture level for each attack detection case, it can block detected attacks in a timely manner without stalling the main core’s pipeline.
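As a rough illustration of evaluating several FSM-based rules against every fetched instruction, the following Python sketch models the idea in software. It is not the paper's hardware design: the `SequenceRule` class, the opcode strings, and both example patterns are hypothetical placeholders, and the inner loop only approximates what independent hardware FSMs would evaluate concurrently.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class SequenceRule:
    """Toy FSM: state i means the last i opcodes matched pattern[:i]."""
    name: str
    pattern: Tuple[str, ...]
    state: int = 0

    def step(self, op: str) -> bool:
        """Advance on one fetched opcode; return True when the pattern completes."""
        if op == self.pattern[self.state]:
            self.state += 1
        else:
            # Naive fallback (a simplification): restart, or re-enter
            # state 1 if this opcode begins the pattern.
            self.state = 1 if op == self.pattern[0] else 0
        if self.state == len(self.pattern):
            self.state = 0
            return True
        return False


def monitor(trace: List[str], rules: List[SequenceRule]) -> List[Tuple[int, str]]:
    """Feed every fetched opcode to every rule; collect (index, rule name) alarms."""
    alarms = []
    for i, op in enumerate(trace):
        for rule in rules:  # in hardware, each rule would step in the same cycle
            if rule.step(op):
                alarms.append((i, rule.name))
    return alarms


# Hypothetical rules and trace, chosen only to exercise the sketch.
rules = [
    SequenceRule("flush_burst", ("flush", "flush", "flush")),
    SequenceRule("load_then_ret", ("load", "ret")),
]
trace = ["add", "flush", "flush", "load", "ret", "flush", "flush", "flush"]
alarms = monitor(trace, rules)
print(alarms)
```

Each rule keeps only a small integer as state, which is what makes this style of rule cheap to replicate; adding a rule does not change how the other rules step.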
Furthermore, to enable bug-free simultaneous detection in Interstellar when it is coupled with the pipelined core of the Rocket chip [15], we address the following challenges. First, the FSMs of Interstellar must account for instruction squashing, which results from a branch misprediction or a hardware interrupt that occurs before the monitored instructions are committed. We address this problem by introducing an instruction squashing handler that correctly recovers the FSM’s state after such an invalidation. Second, to avoid stalling the progress of the main core, Interstellar must determine attack detection results by referring to microarchitectural information within at most three CPU clock cycles, the interval between the fetch stage and the commit stage of the Rocket chip’s main core. To solve this challenge, we optimize the design of the FSM to directly refer to the required microarchitectural information in each pipeline stage for each attack detection use case. In addition, we design the determination logic in the FSM to complete the detection of an attack within one CPU clock cycle.
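One way to picture the squash-recovery requirement is a software model in which the FSM advances speculatively at fetch, remembers the state it had before each in-flight instruction, and rolls back when instructions are squashed. The Python sketch below is an assumption-laden illustration, not the paper's squashing handler: the `SquashAwareFSM` class, the integer tags (assumed to increase in fetch order), and the opcode pattern are all hypothetical.

```python
from collections import deque


class SquashAwareFSM:
    """Toy pattern FSM that checkpoints its state per in-flight instruction."""

    def __init__(self, pattern):
        self.pattern = tuple(pattern)
        self.state = 0           # speculative state after the youngest fetch
        self.inflight = deque()  # (tag, state_before_fetch, alarm) per uncommitted instr

    def fetch(self, tag, op):
        """Advance speculatively on a fetched opcode and record how to undo it."""
        before = self.state
        if op == self.pattern[self.state]:
            self.state += 1
        else:
            self.state = 1 if op == self.pattern[0] else 0
        alarm = self.state == len(self.pattern)
        if alarm:
            self.state = 0
        self.inflight.append((tag, before, alarm))

    def squash(self, first_tag):
        """Discard instructions with tag >= first_tag and restore the older state."""
        while self.inflight and self.inflight[-1][0] >= first_tag:
            _, before, _ = self.inflight.pop()
            self.state = before

    def commit(self):
        """Retire the oldest in-flight instruction; False means it must be blocked."""
        _, _, alarm = self.inflight.popleft()
        return not alarm


fsm = SquashAwareFSM(["flush", "flush", "flush"])
fsm.fetch(0, "flush")
fsm.fetch(1, "flush")
fsm.fetch(2, "flush")  # wrong-path instruction completes the pattern speculatively
fsm.squash(2)          # misprediction: tag 2 is invalidated before commit
fsm.fetch(3, "add")    # correct-path instruction
results = [fsm.commit() for _ in range(3)]
print(results)
```

In this model the alarm raised by the wrong-path instruction disappears together with that instruction, so a squashed speculative match neither blocks a commit nor leaves the FSM in a stale state.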
To evaluate Interstellar with three attack monitoring use cases implemented in parallel, we implement a prototype of Interstellar in a RISC-V Rocket chip using Chisel [17], a hardware description language. We evaluated the Interstellar-enabled Rocket chip on the AMD Virtex-7 FPGA VC707 board [6] with a TEE-enabled Linux kernel and on AWS EC2 F1 using FireSim [8] with a RISC-V Linux kernel. We also verify the functionality of Interstellar and analyze its safety against existing and possible threats. Our FPGA-based evaluation demonstrates that the performance overhead of Interstellar is negligible on average when executing three benchmarks [9–11]. Meanwhile, the area overhead and relative power consumption of Interstellar with all three attack detection rules, compared to the Rocket core, are 21.72% and 34.10%, respectively.
Our contribution. In summary, these are our contributions:
- To protect privilege-based systems from existing attacks on privileged software, we present Interstellar, safe security monitoring hardware that can be used to protect the OS kernel and TEE, by introducing fully partitioned monitoring hardware.
- To achieve both efficient parallel monitoring with multiple attack detection rules and simultaneous detection that blocks attacks in a timely manner, Interstellar uses FSMs to define and optimize each attack detection rule in hardware logic, taking into account the behavior of the pipelined Rocket core.
- We evaluate a prototype of Interstellar implemented in the Rocket chip with a TEE-enabled Linux kernel and a RISC-V Linux kernel on two different FPGA boards by running three different benchmarks; the performance overhead on a 50MHz Rocket core across the three benchmarks is 0.102% on average.