Hardware vs. Software Implementation of Warp-Level Features in Vortex RISC-V GPU
By Huanzhi Pu 1, Rishabh Ravi 2, Shinnung Jeong 1, Udit Subramanya 1, Euijun Chung 1, Jisheng Zhao 1, Chihyo Ahn 1, Hyesoon Kim 1
1 College of Computing, Georgia Institute of Technology, Atlanta, USA
2 Department of Electrical Engineering, Indian Institute of Technology Bombay, Mumbai, India
Abstract
RISC-V GPUs present a promising path for supporting GPU applications. Traditionally, GPUs achieve high efficiency through the SPMD (Single Program Multiple Data) programming model. However, modern GPU programming increasingly relies on warp-level features, which diverge from the conventional SPMD paradigm. In this paper, we explore how RISC-V GPUs can support these warp-level features both through hardware implementation and via software-only approaches. Our evaluation shows that a hardware implementation achieves up to 4 times geomean IPC speedup in microbenchmarks, while software-based solutions provide a viable alternative for area-constrained scenarios.
Index Terms:
GPU, Warp-level features, Microarchitecture, Code Optimization
I. Introduction
In recent years, GPU programming models have broadened their scope by enabling fine-grained parallelism through warp-level features. This expansion lets GPUs keep operating within the SPMD programming model while departing from the conventional SPMD paradigm through fine-grained thread control. In particular, CUDA, one of the most widely used GPU programming models, introduces warp-level features such as cooperative groups and warp-level functions to facilitate fine-grained thread control and synchronization. These warp-level features provide abstractions that go beyond fixed-size granularity and synchronization, allowing for more complex code and reducing the need for block-level synchronization barriers.
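To make the idea of a warp-level function concrete, the following is a minimal host-side C++ model of a shuffle-down reduction, one common warp-level pattern. It is an illustrative sketch only and not code from the paper or from Vortex: the warp width `kWarpSize`, the `Warp` type, and the function names `shuffle_down` and `warp_reduce_sum` are all assumptions made for this example.

```cpp
#include <array>
#include <cstddef>

// Hypothetical warp width for this model (an assumption, not a Vortex default).
constexpr std::size_t kWarpSize = 8;
using Warp = std::array<int, kWarpSize>;  // one register value per lane

// Model of a shuffle-down: lane i reads the value held by lane i + delta.
// Lanes whose source would fall outside the warp keep their own value.
Warp shuffle_down(const Warp& regs, std::size_t delta) {
    Warp out = regs;
    for (std::size_t lane = 0; lane + delta < kWarpSize; ++lane) {
        out[lane] = regs[lane + delta];
    }
    return out;
}

// Tree reduction over the warp: after log2(kWarpSize) shuffle steps,
// lane 0 holds the sum of all lanes. No block-level barrier is involved.
int warp_reduce_sum(Warp regs) {
    for (std::size_t delta = kWarpSize / 2; delta > 0; delta /= 2) {
        const Warp other = shuffle_down(regs, delta);
        for (std::size_t lane = 0; lane < kWarpSize; ++lane) {
            regs[lane] += other[lane];
        }
    }
    return regs[0];
}
```

In this model, lanes exchange register values directly within the warp, which is why such patterns can avoid a round trip through shared memory and a block-wide barrier.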
Therefore, supporting warp-level features in RISC-V GPUs can enhance their generality and applicability. RISC-V GPUs such as the Vortex RISC-V GPU present a promising path for supporting GPU applications: their software and hardware stacks are publicly available, and they offer high reconfigurability for diverse GPU hardware features. However, they lack support for recent high-level features such as warp-level functionality. As a result, they miss the opportunity to explore higher performance by combining their reconfigurable hardware parameters, such as the number of threads and warps, with warp-level features.
To efficiently support warp-level features, we explore implementation methods on Vortex GPUs by considering both software-only and hardware-supported approaches, each with its own trade-offs. While hardware extensions can offer performance benefits, they also incur additional area costs. Conversely, a software-only approach avoids hardware overhead but incurs additional instruction overhead and requires compiler support. To enable both approaches, we implement these features in Vortex RTL and extend compiler support.
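One way to picture the instruction overhead of a software-only approach is to model a shuffle as a round trip through a scratch buffer, where each lane stores its value and then loads its neighbor's. The sketch below is a host-side C++ illustration under that assumption; it is not the authors' compiler transformation, and `kLanes`, `Lanes`, `SoftShuffleResult`, and `soft_shuffle_down` are hypothetical names introduced here.

```cpp
#include <array>
#include <cstddef>

constexpr std::size_t kLanes = 8;  // hypothetical warp width for this model
using Lanes = std::array<int, kLanes>;

struct SoftShuffleResult {
    Lanes values;            // value each lane ends up with
    std::size_t memory_ops;  // stores + loads issued across all lanes in this model
};

// Software-only shuffle-down modeled as a round trip through a scratch buffer:
// every lane stores its value, then (after an implied warp-wide sync)
// loads the value written by lane + delta.
SoftShuffleResult soft_shuffle_down(const Lanes& regs, std::size_t delta) {
    Lanes scratch{};  // stands in for a per-warp memory buffer (assumption)
    std::size_t ops = 0;

    // Phase 1: each lane publishes its register value.
    for (std::size_t lane = 0; lane < kLanes; ++lane) {
        scratch[lane] = regs[lane];
        ++ops;
    }

    // Phase 2: lanes with an in-range source load it; others keep their own value.
    Lanes out = regs;
    for (std::size_t lane = 0; lane + delta < kLanes; ++lane) {
        out[lane] = scratch[lane + delta];
        ++ops;
    }
    return {out, ops};
}
```

The `memory_ops` counter is only a rough proxy for the extra work in this model; a dedicated hardware instruction would move the same values without the store and load traffic, at the cost of added logic.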
Through a comparative analysis of the two implementation approaches, this paper evaluates how these approaches support warp-level features and adapt to Vortex GPU architectures. The results show that warp-level feature support requires minimal hardware overhead with only a 2% increase in logic area, while the performance difference between software and hardware implementations can be as much as 4× in microbenchmarks.