Hardware vs. Software Implementation of Warp-Level Features in Vortex RISC-V GPU

By Huanzhi Pu¹, Rishabh Ravi², Shinnung Jeong¹, Udit Subramanya¹, Euijun Chung¹, Jisheng Zhao¹, Chihyo Ahn¹, Hyesoon Kim¹
¹ College of Computing, Georgia Institute of Technology, Atlanta, USA
² Department of Electrical Engineering, Indian Institute of Technology Bombay, Mumbai, India

Abstract

RISC-V GPUs present a promising path for supporting GPU applications. Traditionally, GPUs achieve high efficiency through the SPMD (Single Program Multiple Data) programming model. However, modern GPU programming increasingly relies on warp-level features, which diverge from the conventional SPMD paradigm. In this paper, we explore how RISC-V GPUs can support these warp-level features, both through hardware implementation and via software-only approaches. Our evaluation shows that the hardware implementation achieves up to a 4× geomean IPC speedup in microbenchmarks, while software-based solutions provide a viable alternative for area-constrained scenarios.

Index Terms:

GPU, Warp-level features, Microarchitecture, Code Optimization

I. Introduction

In recent years, GPU programming models have expanded the scope of GPU programming by enabling fine-grained parallelism through warp-level features. This expansion allows GPUs to operate within the SPMD programming model while diverging from the conventional SPMD paradigm through fine-grained thread control. In particular, CUDA, one of the most widely used GPU programming models, introduces warp-level features such as cooperative groups and warp-level functions to facilitate fine-grained thread control and synchronization. These warp-level features provide abstractions that go beyond fixed-size granularity and synchronization, allowing for more complex code and reducing the need for block-level synchronization barriers.
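As a concrete illustration of the kind of warp-level code this refers to, the sketch below shows a warp-wide sum reduction written two ways in CUDA: once with the warp shuffle intrinsic and once with a cooperative-groups tile. This is a minimal, generic example of the CUDA features named above, not code from the paper or from Vortex.

```cuda
#include <cooperative_groups.h>
namespace cg = cooperative_groups;

// Warp-level sum reduction using the warp shuffle intrinsic.
// Each thread contributes one value; lane 0 ends up with the warp's total,
// with no shared memory and no block-level __syncthreads() barrier.
__device__ float warp_reduce_sum(float val) {
    // 0xffffffff: all 32 lanes of the warp participate.
    for (int offset = 16; offset > 0; offset >>= 1)
        val += __shfl_down_sync(0xffffffff, val, offset);
    return val;
}

// The same reduction expressed with cooperative groups, which exposes the
// warp as an explicit tile of threads rather than a fixed hardware notion.
__device__ float tile_reduce_sum(float val) {
    cg::thread_block_tile<32> tile =
        cg::tiled_partition<32>(cg::this_thread_block());
    for (int offset = tile.size() / 2; offset > 0; offset >>= 1)
        val += tile.shfl_down(val, offset);
    return val;
}
```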

Therefore, supporting warp-level features in RISC-V GPUs can enhance their generality and applicability. RISC-V GPUs such as the Vortex RISC-V GPU present a promising path for supporting GPU applications, with publicly available software and hardware stacks and high reconfigurability across diverse GPU hardware features. However, they lack support for recent high-level features such as warp-level functionality. As a result, they miss the opportunity to pursue higher performance by combining their reconfigurable hardware parameters, such as the number of threads and warps, with warp-level features.

To efficiently support warp-level features, we explore implementation methods on Vortex GPUs, considering both software-only and hardware-supported approaches, each with its own trade-offs. While hardware extensions can offer performance benefits, they also incur additional area costs. Conversely, a software-only approach avoids hardware overhead but incurs additional instruction overhead and requires compiler support. To enable both approaches, we implement these features in Vortex RTL and extend compiler support.
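To make the software-only trade-off concrete, the sketch below shows how the warp reduction from the earlier example could be emulated without a hardware shuffle path, by routing the lane-to-lane exchange through shared memory and block-level barriers. This is an illustrative CUDA-style sketch, not the paper's actual Vortex implementation; it assumes a single warp per thread block, and the scratch array and lane indexing are hypothetical.

```cuda
// Hypothetical software-only fallback for a warp sum reduction: the
// lane-to-lane data exchange goes through shared memory and is guarded by
// block-level barriers. The extra loads, stores, and __syncthreads() calls
// illustrate the instruction overhead a software-only path pays on hardware
// without native warp-shuffle support.
__device__ float warp_reduce_sum_sw(float val) {
    __shared__ float scratch[32];      // one slot per lane of the warp
    int lane = threadIdx.x % 32;       // assumes one warp per block here

    for (int offset = 16; offset > 0; offset >>= 1) {
        scratch[lane] = val;           // publish this lane's value
        __syncthreads();               // block-level barrier
        if (lane + offset < 32)
            val += scratch[lane + offset];
        __syncthreads();               // keep scratch stable for next step
    }
    return val;                        // lane 0 holds the warp total
}
```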

Through a comparative analysis of the two implementation approaches, this paper evaluates how these approaches support warp-level features and adapt to Vortex GPU architectures. The results show that warp-level feature support requires minimal hardware overhead with only a 2% increase in logic area, while the performance difference between software and hardware implementations can be as much as 4× in microbenchmarks.

