Scalable I/O Virtualization: A Deep Dive into PCIe’s Next Gen Virtualization
The demands of modern cloud computing—massive scale, constant agility, and tight security—are pushing traditional I/O virtualization to its limits. While SR-IOV (Single Root I/O Virtualization) was a foundational technology, it wasn't built for the high-density, multi-tenant environments common today.
To meet this challenge, the PCIe specification has evolved with Scalable I/O Virtualization (SIOV), a new architecture designed for the hyperscale era.
The SIOV architecture takes a software-centric approach to I/O virtualization. At its core, the host OS manages resource allocation and abstracts hardware complexity away from the virtual interfaces. The hypervisor sits on top, acting as both Virtual Machine Monitor (VMM) and I/O manager: it dynamically maps resources and orchestrates the creation and lifecycle of lightweight Scalable Device Interfaces (SDIs). Guest OSs within virtual machines (VMs) then interact with these virtualized interfaces, so multiple application layers can share physical hardware resources efficiently, with stronger isolation and scalability.
This blog dives into SIOV's architecture, how it works alongside SR-IOV, and critically, how to verify it for building secure and scalable cloud platforms.
SR-IOV's Scaling Challenges in the Modern Cloud
SR-IOV (Single Root I/O Virtualization) was a foundational technology for giving virtual machines direct hardware access, but it scales poorly. The number of VFs a device can expose is limited, and its hardware-centric design requires every VF to have its own dedicated configuration space. As a result, it struggles to meet the demands of today's high-density, multi-tenant systems.
Introducing SIOV: The Software-Centric Model
These challenges gave way to Scalable I/O Virtualization (SIOV), which addresses them by removing per-interface hardware complexity and shifting control to virtualization software. It offers:
- Thousands of lightweight virtual interfaces
- Better fit for containers and microservices
- Seamless integration with trusted computing (TEE Device Interface Security Protocol)
The key element of SIOV is the Scalable Device Interface (SDI), a "lightweight" and composable virtual interface designed for high-density environments. SDIs do not meet the definition of “function,” though they do have a Routing ID and that RID is used for all traffic initiated by the SDI.
An SDI has no configuration space or Bus/Device/Function (BDF) identifier. Instead, it is identified by a unique Routing ID (RID), which strips away rigid per-interface hardware requirements.
Unlike a traditional Virtual Function, an SDI operates under a new set of principles: Simpler hardware, software-defined control, and flexible, scalable resources. A single device can now support thousands of SDIs, perfect for high-density and cloud computing environments.
SIOV Extended Capability
A device advertises its SIOV capabilities through a standard SIOV extended capability structure (ID 0x0038) in its Physical Function (PF). This structure tells the software:
- Total SDIs supported
- Stride or dynamic RID mapping
- First SDI offset and stride
- Mixed IOV support flag
- SIOV-enabled bit
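As a rough illustration of how software might locate this structure, the sketch below walks the standard PCIe extended capability list in a raw copy of the PF's configuration space and looks for the ID quoted above. The function name and the synthetic buffer are hypothetical; the register layout inside the SIOV capability is not described in this article, so the sketch stops at finding the capability's offset and does not decode its fields.

```python
import struct

SIOV_EXT_CAP_ID = 0x0038   # capability ID as quoted in this article
EXT_CAP_START = 0x100      # extended capabilities start at config offset 0x100


def find_ext_capability(config_space: bytes, cap_id: int):
    """Return the byte offset of the extended capability `cap_id`, or None."""
    offset = EXT_CAP_START
    visited = set()  # guards against a malformed, looping list
    while offset >= EXT_CAP_START and offset not in visited:
        if offset + 4 > len(config_space):
            return None
        visited.add(offset)
        (header,) = struct.unpack_from("<I", config_space, offset)
        if header in (0x00000000, 0xFFFFFFFF):  # no capability at this offset
            return None
        if header & 0xFFFF == cap_id:            # bits [15:0]: capability ID
            return offset
        offset = (header >> 20) & 0xFFC          # bits [31:20]: next capability offset
    return None


if __name__ == "__main__":
    # Synthetic 4 KB config space: one unrelated capability, then the SIOV one.
    cfg = bytearray(4096)
    struct.pack_into("<I", cfg, 0x100, 0x0001 | (1 << 16) | (0x200 << 20))
    struct.pack_into("<I", cfg, 0x200, SIOV_EXT_CAP_ID | (1 << 16))
    print(find_ext_capability(bytes(cfg), SIOV_EXT_CAP_ID))
```

On a Linux host, the input bytes would typically come from the PF's `config` file under `/sys/bus/pci/devices/`, which usually requires elevated privileges to read beyond the first 64 bytes.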
Why SDI Reset Matters and How it Works
The SDI reset mechanism is essential when software crashes or when an SDI instance needs to be rebuilt with a new configuration. It ensures SDI instances return to a clean, known state, which supports secure isolation, dynamic partitioning, and reliable fault recovery.
During SDI reset, all operations for that SDI are aborted, and its configuration is restored to a known state. SDI state is either architected or un-architected. Architected state is either restored to its previous configuration or reconfigured, which can mean new memory space allocations, a new SDI with a new RID, and reprogrammed configuration registers. Un-architected state does not have to be reconfigured: it persists across all SDI instantiations without software intervention.
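The reset behavior described above can be captured in a small reference model. This is only a sketch under stated assumptions: the class, its field names, and the idea of a "baseline" snapshot are hypothetical illustrations, not structures defined by the specification.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class SdiModel:
    """Hypothetical software model of one SDI (names are illustrative)."""
    rid: int
    architected: dict = field(default_factory=dict)    # software-visible configuration
    unarchitected: dict = field(default_factory=dict)  # state that survives reset
    inflight_ops: list = field(default_factory=list)   # outstanding operations
    _baseline: dict = field(init=False, repr=False)

    def __post_init__(self):
        # Remember the configuration the SDI was instantiated with.
        self._baseline = dict(self.architected)

    def reset(self, new_rid: Optional[int] = None,
              new_config: Optional[dict] = None) -> list:
        """Abort outstanding work, then restore or reconfigure architected state."""
        aborted, self.inflight_ops = self.inflight_ops, []
        if new_config is not None:
            self._baseline = dict(new_config)       # reconfigure path
        self.architected = dict(self._baseline)     # restore path (or new config)
        if new_rid is not None:
            self.rid = new_rid
        # Un-architected state is deliberately left untouched.
        return aborted
```

A testbench could compare the device's observable state against such a model after each reset: the list returned by `reset()` names the operations that must never complete, and `unarchitected` must be unchanged.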
Mixed IOV Challenges: Devices Supporting SR-IOV and SIOV
Some PCIe devices may advertise support for both SR-IOV and SIOV simultaneously. This capability is indicated by the Mixed_IOV_Supported bit in the SIOV Extended Capability structure.
When set, it permits the concurrent instantiation of Virtual Functions (VFs) and Scalable Device Interfaces (SDIs). In such configurations, the hypervisor or virtualization intermediary must maintain distinct and non-overlapping RID namespaces for VFs (which are BDF-addressable PCIe Functions) and SDIs (which are non-enumerable but RID-identified endpoints).
This dual-mode operation increases complexity, as the virtualization software must manage two different types of virtual interfaces, keep their resources separate, and ensure both legacy and modern guest drivers can use the device correctly.
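One way to picture the non-overlap requirement is a small planning check. The sketch below assumes stride-based mapping for both VFs and SDIs, where each RID is the PF's RID plus a first offset plus a multiple of the stride; the function names and example values are hypothetical, and a device using dynamic RID mapping would supply its SDI RIDs as an explicit list instead.

```python
def stride_rids(base_rid, first_offset, stride, count):
    """RIDs from offset/stride mapping: base + first_offset + i * stride."""
    # RIDs are 16 bits wide, so the arithmetic is truncated to 16 bits.
    return [(base_rid + first_offset + i * stride) & 0xFFFF for i in range(count)]


def check_rid_namespaces(pf_rid, vf_rids, sdi_rids):
    """Return a list of collision messages; an empty list means the plan is clean."""
    problems = []
    seen = {pf_rid: "PF"}
    for kind, rids in (("VF", vf_rids), ("SDI", sdi_rids)):
        for i, rid in enumerate(rids):
            name = f"{kind}{i}"
            if rid in seen:
                problems.append(f"{name} RID {rid:#06x} collides with {seen[rid]}")
            else:
                seen[rid] = name
    return problems


if __name__ == "__main__":
    pf = 0x0100
    vfs = stride_rids(pf, first_offset=1, stride=1, count=4)      # 0x0101..0x0104
    sdis = stride_rids(pf, first_offset=0x10, stride=1, count=8)  # 0x0110..0x0117
    print(check_rid_namespaces(pf, vfs, sdis))                    # clean plan
    print(check_rid_namespaces(pf, vfs, stride_rids(pf, 4, 1, 2)))  # overlaps a VF
```

Running the same check whenever VFs are enabled or SDIs are created keeps the two namespaces disjoint over the device's lifetime, not just at initial configuration.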
RID and BDF Assignment in Mixed SR-IOV/SIOV Devices
| Interface | Has BDF? | Uses RID? | Visible to PCIe Scan? |
| --- | --- | --- | --- |
| PF | Yes | Yes | Yes |
| VF | Yes | Yes | Yes |
| SDI | No | Yes | No |
Virtual BAR Mapping
- SDIs use page-aligned regions of the PF's BARs.
- Software (SR-PCIM or VI) manages the mapping using system page tables.
- Pages can be shared across multiple SDIs, so designers must properly isolate the PF BAR region among SDIs.
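The alignment and isolation requirements above lend themselves to a simple static check. The sketch below is illustrative only: it assumes a 4 KB page size and the stricter policy that each SDI gets exclusive pages, and the function name and region format are hypothetical.

```python
PAGE_SIZE = 4096  # assumed system page size


def check_bar_regions(bar_size, regions, page_size=PAGE_SIZE):
    """Check SDI regions inside a PF BAR.

    regions maps an SDI name to (offset, length) in bytes.
    Returns a list of problem strings; an empty list means no issue was found.
    """
    problems = []
    for name, (offset, length) in regions.items():
        if length == 0 or offset % page_size or length % page_size:
            problems.append(f"{name}: region is not page-aligned")
        if offset + length > bar_size:
            problems.append(f"{name}: region exceeds the BAR")
    # Sort by offset and compare neighbours; any overlap in the set shows up
    # as an overlap between at least one adjacent pair.
    ordered = sorted(regions.items(), key=lambda item: item[1][0])
    for (a, (a_off, a_len)), (b, (b_off, _)) in zip(ordered, ordered[1:]):
        if a_off + a_len > b_off:
            problems.append(f"{a} overlaps {b}")
    return problems
```

For example, `check_bar_regions(0x10000, {"sdi0": (0x0000, 0x1000), "sdi1": (0x1000, 0x2000)})` reports no problems, while giving `sdi1` an offset of `0x0800` would be flagged as both misaligned and overlapping `sdi0`.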
The Central Role of the Routing ID (RID)
SDIs do not have BDFs (Bus/Device/Function), so they do not show up during traditional PCIe enumeration. Instead, the PF and the software stack track exactly which RID is assigned to which SDI and which tenant owns it. Every request Transaction Layer Packet (TLP) carries the RID of its requester, so when an SDI initiates a transaction (such as a DMA write), the PF is responsible for tagging the outgoing TLP with the correct RID.
Common Question: How Are TLPs from SDIs Identified Without Enumeration?
- All PCIe request TLPs carry a Requester ID, which is the RID of the interface that issued them.
- Since SDIs have no BDF, RIDs are assigned by the PF or managed by standard device drivers or virtualization software.
- The PF tags SDI-generated TLPs with the correct RIDs.
- IOMMU, ATS, and interrupts use the RID to:
  - Authorize or block DMA based on the configured address space: The IOMMU checks the RID against its page tables to ensure an SDI can only access memory belonging to its assigned tenant.
  - Validate access: The Root Complex can use the RID to enforce security policies. If an SDI somehow sends a TLP with an invalid or forged RID, it will be blocked, as there will be no corresponding entry in the IOMMU or interrupt tables.
  - Route interrupts correctly: The interrupt controller uses the RID to deliver an SDI's interrupt message to the correct guest OS.
- TDISP: SDIs in a LOCKED state cannot access memory or BAR pages.
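The RID-based authorization described in the list above can be sketched as a toy model. This is a deliberate simplification and an assumption-laden illustration: a real IOMMU uses multi-level translation tables rather than flat address windows, and the class and method names here are hypothetical.

```python
class IommuModel:
    """Toy IOMMU: each RID may access only the address windows mapped for it."""

    def __init__(self):
        self._windows = {}  # rid -> list of (start, end) half-open ranges

    def map(self, rid, start, length):
        """Grant `rid` access to [start, start + length)."""
        self._windows.setdefault(rid, []).append((start, start + length))

    def unmap_all(self, rid):
        """Revoke every mapping for `rid`, e.g. when its SDI is torn down."""
        self._windows.pop(rid, None)

    def allows(self, rid, addr, length):
        """True only if the whole access fits in one window owned by `rid`."""
        return any(start <= addr and addr + length <= end
                   for start, end in self._windows.get(rid, ()))


if __name__ == "__main__":
    iommu = IommuModel()
    iommu.map(0x0110, 0x1000_0000, 0x1000)           # tenant A's buffer
    print(iommu.allows(0x0110, 0x1000_0800, 0x100))  # inside the window
    print(iommu.allows(0x0111, 0x1000_0800, 0x100))  # unknown RID
```

The key property for verification is the default-deny behavior: a RID with no entry, whether forged or simply never provisioned, is blocked without any special-case logic.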
Securing SIOV: Ensuring Robust Cloud Virtualization
The flexibility and power of SIOV come with a critical responsibility: rigorous verification is paramount to building a secure, multi-tenant cloud environment. Given SIOV's reliance on precise interaction between hardware and virtualization software, thorough validation is fundamental.
Here’s a guide to key verification areas for SIOV:
- RID isolation and uniqueness: All SDIs must have unique, valid RIDs that do not overlap with those of the PF or any VF. Protocol checkers must monitor Requester ID fields in DMA, MMIO, ATS, PRI, and MSI TLPs.
- Page-level BAR security: SDIs access device memory via page-aligned regions within the PF's Base Address Registers (BARs), with mappings managed by the hypervisor. Verification must confirm that these memory mappings are correctly page-aligned and securely isolated.
- Interrupt integrity: SDIs are designed to generate only MSI/MSI-X interrupts. Each interrupt Transaction Layer Packet (TLP) must reliably carry the correct source RID, ensuring interrupts are delivered accurately to the intended guest OS.
- SDI reset handling: Reset one SDI while others are active. Ensure that in-flight DMA is aborted and the state is cleared without affecting any other active SDIs. This models tearing down one tenant's virtual device in a multi-tenant cloud without impacting the others. Also check that the architected state is restored or reconfigured while the un-architected state persists across resets.
- PF level reset: On PF FLR, all subordinate SDIs must be invalidated and removed based on the total SDI count advertised by the PF.
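Several of the checks above can be combined in a testbench scoreboard that tracks which RIDs are currently live and flags any TLP whose Requester ID is not among them. The sketch below is a hypothetical minimal version: the class, its event methods, and the choice to treat a reset SDI's RID as invalid until the SDI is re-created are all assumptions made for illustration.

```python
class RidScoreboard:
    """Tracks live RIDs and records TLPs that carry an unexpected Requester ID."""

    def __init__(self, pf_rid):
        self.pf_rid = pf_rid
        self.active = {}       # rid -> tenant that owns the SDI
        self.violations = []   # (description, rid) tuples

    def sdi_created(self, rid, tenant):
        if rid == self.pf_rid or rid in self.active:
            self.violations.append(("duplicate RID at creation", rid))
        else:
            self.active[rid] = tenant

    def sdi_reset(self, rid):
        # After reset, the RID is treated as invalid until the SDI is re-created.
        self.active.pop(rid, None)

    def pf_flr(self):
        # A PF-level reset invalidates every subordinate SDI.
        self.active.clear()

    def observe_tlp(self, kind, requester_id):
        if requester_id != self.pf_rid and requester_id not in self.active:
            self.violations.append((f"{kind} TLP from inactive RID", requester_id))


if __name__ == "__main__":
    sb = RidScoreboard(pf_rid=0x0100)
    sb.sdi_created(0x0110, "tenant-a")
    sb.sdi_created(0x0111, "tenant-b")
    sb.sdi_reset(0x0110)
    sb.observe_tlp("DMA write", 0x0110)  # stale traffic from the reset SDI
    sb.observe_tlp("MSI", 0x0111)        # tenant B is unaffected
    print(sb.violations)
```

In a real environment the `observe_tlp` calls would be driven by a bus monitor, and the same scoreboard would be fed by the DMA, ATS, PRI, and interrupt paths so that a single RID table backs every check.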
Conclusion
SIOV is a software-defined architecture built for cloud-scale virtualization. It overcomes the limitations of SR-IOV, enabling thousands of lightweight, isolated interfaces. SIOV reduces hardware complexity while enabling more flexible and scalable software-driven management. That shift also raises the stakes: flaws in lifecycle management could lead to resource leaks, system instability, or a compromise of the entire server's security. Device vendors and architects must therefore verify rigorously: RID isolation, secure memory access, and clean resets are essential to delivering secure and scalable PCIe solutions for the future.