Memory Prefetching Evaluation of Scientific Applications on a Modern HPC Arm-Based Processor
By Nam Ho 1; Carlos Falquez 1; Antoni Portero 1; Estela Suarez 1,2,3; Dirk Pleiter 4,5
1 Jülich Supercomputing Centre, Institute for Advanced Simulation, Forschungszentrum Jülich GmbH, 52428 Jülich, Germany
2 Computer Science Department, Rhenish Friedrich Wilhelm University of Bonn, 53113 Bonn, Germany
3 SiPearl, 78600 Maisons-Laffitte, France
4 Division of Computational Science and Technology, EECS, KTH Royal Institute of Technology, 114 28 Stockholm, Sweden
5 Department of Computer Science, Bernoulli Institute, FSE, University of Groningen, 9712 CP Groningen, The Netherlands
Abstract
Memory prefetching is a well-known technique for mitigating the negative impact of memory access latencies on memory bandwidth. This problem has become more pressing as improvements in memory bandwidth have not kept pace with increases in computational power. While much existing work has been devoted to finding appropriate prefetching techniques for specific workloads, few provide insight into the behavior of scientific applications to better understand the impact of prefetchers. This paper investigates the impact of hardware prefetchers on the latest Arm-based high-end processor architectures. In this work, we investigate memory access patterns by analyzing locality properties and visualizing delta and repetitive address patterns. A deeper understanding of memory access patterns allows the use of the appropriate prefetcher and reaching a better correlation between access pattern properties and prefetcher performance. This can guide future co-design efforts. We evaluated traditional and innovative prefetchers using a gem5-based model of Arm Neoverse V1 cores. The model features a 16-core architecture, using Amazon’s Graviton 3 processor as a hardware reference, but substituting DDR5 by high bandwidth memory (HBM2). We performed a detailed prefetching evaluation focusing on stencil, sparse matrix-vector multiplication, and Breadth-First Search kernels. These kernels represent a broad range of the applications running on today’s High-Performance Computing (HPC) systems, which are sensitive to memory performance.
To read the full article, click here
Related Semiconductor IP
- CAN XL Verification IP
- Rad-Hard GPIO, ODIO & LVDS in SkyWater 90nm
- 1.22V/1uA Reference voltage and current source
- 1.2V SLVS Transceiver in UMC 110nm
- Neuromorphic Processor IP
Related White Papers
- Paving the way for the next generation of audio codec for True Wireless Stereo (TWS) applications - PART 5 : Cutting time to market in a safe and timely manner
- Interstellar: Fully Partitioned and Efficient Security Monitoring Hardware Near a Processor Core for Protecting Systems against Attacks on Privileged Software
- A Survey on the Design, Detection, and Prevention of Pre-Silicon Hardware Trojans
- Memory Safety Features Impact on Ibex based processor area
Latest White Papers
- OmniSim: Simulating Hardware with C Speed and RTL Accuracy for High-Level Synthesis Designs
- Balancing Power and Performance With Task Dependencies in Multi-Core Systems
- LLM Inference with Codebook-based Q4X Quantization using the Llama.cpp Framework on RISC-V Vector CPUs
- PCIe 5.0: The universal high-speed interconnect for High Bandwidth and Low Latency Applications Design Challenges & Solutions
- Basilisk: A 34 mm2 End-to-End Open-Source 64-bit Linux-Capable RISC-V SoC in 130nm BiCMOS