Learning Cache Coherence Traffic for NoC Routing Design
By Guochu Xiong, Xiangzhong Luo and Weichen Liu (Nanyang Technological University, Singapore)
The rapid growth of multi-core systems highlights the need for efficient Network-on-Chip (NoC) design to ensure seamless communication. Cache coherence, essential for data consistency, substantially reduces task computation time by enabling data sharing among caches. As a result, routing serves two roles: facilitating data sharing (influenced by topology) and managing NoC-level communication. However, cache coherence is often overlooked in routing, causing mismatches between design expectations and evaluation outcomes. Two main challenges are the lack of specialized tools to assess cache coherence's impact and the neglect of topology selection in routing. In this work, we propose a cache coherence-aware routing approach with integrated topology selection, guided by our Cache Coherence Traffic Analyzer (CCTA). Our method achieves up to 10.52% lower packet latency, 55.51% faster execution time, and 49.02% total energy savings, underscoring the critical role of cache coherence in NoC design and enabling effective co-design.