Closer in the Gap: Towards Portable Performance on RISC-V Vector Processors

By Ruimin Shi ¹, Maya Gokhale ², Pei-Hung Lin ², Xavier Teruel ³, and Ivy Peng ¹
¹ KTH Royal Institute of Technology, Sweden
² Lawrence Livermore National Laboratory, USA
³ Barcelona Supercomputing Center, Spain

Abstract

The RISC-V Vector Extension (RVV) is a cornerstone for supporting compute throughput in scientific and machine learning workloads. Yet compiler support and performance monitoring on real RVV 1.0 hardware are still evolving. In this work, we design a suite of assembly microbenchmarks to establish performance ceilings and calibrate performance counters on RVV hardware. Leveraging the assembly benchmarks, we find that predication overhead and strided loads pose performance challenges that current compiler cost models do not yet fully address. Moreover, we present the first evaluation of GCC 15 and LLVM 21 auto-vectorization in HPC and ML proxy applications. GCC 15 outperforms LLVM 21 in four out of six applications. LLVM 21 outperforms GCC 15 only in SGEMM and DGEMM, driven by more aggressive instruction reduction, which we confirm through validated perf counters on the RVV hardware. We further show that the default LMUL selection in compilers performs close to the optimal setting. To study RVV support for production-level applications, we also evaluate the state-vector quantum simulator, Google's Qsim, with both manual RVV intrinsics and compiler auto-vectorization, revealing the immaturity of current RVV compilers for complex memory access patterns.
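To make the two access patterns named in the abstract concrete, the following is a minimal sketch in plain C of the kinds of loops an auto-vectorizer has to handle: a unit-stride update, a strided load, and a conditional update that typically needs masking (predication) when vectorized. The function names and loop bodies are hypothetical illustrations, not code from the paper or its benchmark suite.

```c
#include <stddef.h>

/* Hypothetical illustration: unit-stride access.
 * Consecutive elements of x and y are read, so a vectorizer can use
 * unit-stride vector loads and stores. */
void axpy_unit(float *y, const float *x, float a, size_t n) {
    for (size_t i = 0; i < n; i++) {
        y[i] = a * x[i] + y[i];
    }
}

/* Hypothetical illustration: strided load.
 * x is read every `stride` elements, so a vectorized version needs a
 * strided (or gather-style) load for x while y stays unit-stride. */
void axpy_strided(float *y, const float *x, float a, size_t n, size_t stride) {
    for (size_t i = 0; i < n; i++) {
        y[i] = a * x[i * stride] + y[i];
    }
}

/* Hypothetical illustration: conditional update.
 * Only some elements are written, so a vectorized version typically
 * relies on a mask to select the active elements. */
void clamp_negative(float *x, size_t n) {
    for (size_t i = 0; i < n; i++) {
        if (x[i] < 0.0f) {
            x[i] = 0.0f;
        }
    }
}
```

Compiling such loops with either compiler for an RVV target (for example with `-O3 -march=rv64gcv`, assuming a toolchain built for that target) and inspecting the generated assembly is one way to see how each compiler treats the strided and predicated cases.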
