AutoGNN: End-to-End Hardware-Driven Graph Preprocessing for Enhanced GNN Performance
By Seungkwan Kang 1, Seungjun Lee 1, Donghyun Gouk 2, Miryeong Kwon 2, Hyunkyu Choi 2, Junhyeok Jang 2, Sangwon Lee 2, Huiwon Choi 1, Jie Zhang 3, Wonil Choi 4, Mahmut Taylan Kandemir 5, Myoungsoo Jung 1,2
1 Computer Architecture and Memory Systems Laboratory, KAIST,
2 Panmnesia, Inc.,
3 Peking University,
4 Hanyang University,
5 Pennsylvania State University

Abstract
Graph neural network (GNN) inference faces significant bottlenecks in preprocessing, which often dominate overall inference latency. We introduce AutoGNN, an FPGA-based accelerator designed to address these challenges by leveraging FPGA’s reconfigurability and specialized components. AutoGNN adapts to diverse graph inputs, efficiently performing computationally intensive tasks such as graph conversion and sampling. By utilizing components like adder trees, AutoGNN executes reduction operations in constant time, overcoming the limitations of serialization and synchronization on GPUs.
AutoGNN integrates unified processing elements (UPEs) and single-cycle reducers (SCRs) to streamline GNN preprocessing. UPEs enable scalable parallel processing for edge sorting and unique vertex selection, while SCRs efficiently handle sequential tasks such as pointer array construction and subgraph reindexing. A user-level software framework dynamically profiles graph inputs, determines optimal configurations, and reprograms AutoGNN to handle varying workloads. Implemented on a 7nm enterprise FPGA, AutoGNN achieves up to 9.0× and 2.1× speedup compared to conventional and GPU-accelerated preprocessing systems, respectively, enabling high-performance GNN preprocessing across diverse datasets.
To read the full article, click here
Related Semiconductor IP
- ReRAM NVM in DB HiTek 130nm BCD
- UFS 5.0 Host Controller IP
- PDM Receiver/PDM-to-PCM Converter
- Voltage and Temperature Sensor with integrated ADC - GlobalFoundries® 22FDX®
- 8MHz / 40MHz Pierce Oscillator - X-FAB XT018-0.18µm
Related Articles
- Integrating Ethernet, PCIe, And UCIe For Enhanced Bandwidth And Scalability For AI/HPC Chips
- Boosting RISC-V SoC performance for AI and ML applications
- CANDoSA: A Hardware Performance Counter-Based Intrusion Detection System for DoS Attacks on Automotive CAN bus
- Modeling and Optimizing Performance Bottlenecks for Neuromorphic Accelerators
Latest Articles
- An FPGA-Based SoC Architecture with a RISC-V Controller for Energy-Efficient Temporal-Coding Spiking Neural Networks
- Enabling RISC-V Vector Code Generation in MLIR through Custom xDSL Lowerings
- A Scalable Open-Source QEC System with Sub-Microsecond Decoding-Feedback Latency
- SNAP-V: A RISC-V SoC with Configurable Neuromorphic Acceleration for Small-Scale Spiking Neural Networks
- An FPGA Implementation of Displacement Vector Search for Intra Pattern Copy in JPEG XS