Learning Cache Coherence Traffic for NoC Routing Design
By Guochu Xiong, Xiangzhong Luo and Weichen Liu (Nanyang Technological University, Singapore)
The rapid growth of multi-core systems highlights the need for efficient Network-on-Chip (NoC) design to ensure seamless communication. Cache coherence, essential for data consistency, substantially reduces task computation time by enabling data sharing among caches. As a result, routing serves two roles: facilitating data sharing (influenced by topology) and managing NoC-level communication. However, cache coherence is often overlooked in routing, causing mismatches between design expectations and evaluation outcomes. Two main challenges are the lack of specialized tools to assess cache coherence's impact and the neglect of topology selection in routing. In this work, we propose a cache coherence-aware routing approach with integrated topology selection, guided by our Cache Coherence Traffic Analyzer (CCTA). Our method achieves up to 10.52% lower packet latency, 55.51% faster execution time, and 49.02% total energy savings, underscoring the critical role of cache coherence in NoC design and enabling effective co-design.
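The abstract's point that routing outcomes depend on topology can be illustrated with a small sketch. This is not the authors' CCTA or their routing method. It is a minimal illustration that assumes minimal-path routing, a 4x4 grid of tiles numbered row by row, and a made-up coherence traffic matrix. All function names and packet counts below are hypothetical.

```python
# Illustrative sketch only: compares the traffic-weighted average hop count
# of a hypothetical coherence traffic pattern on a 2D mesh and a 2D torus.

def to_xy(tile, width):
    """Map a row-major tile id to (x, y) grid coordinates."""
    return tile % width, tile // width

def mesh_hops(src, dst, width):
    """Hop count under minimal routing on a 2D mesh (Manhattan distance)."""
    sx, sy = to_xy(src, width)
    dx, dy = to_xy(dst, width)
    return abs(sx - dx) + abs(sy - dy)

def torus_hops(src, dst, width, height):
    """Hop count under minimal routing on a 2D torus (wrap-around links)."""
    sx, sy = to_xy(src, width)
    dx, dy = to_xy(dst, width)
    ddx, ddy = abs(sx - dx), abs(sy - dy)
    return min(ddx, width - ddx) + min(ddy, height - ddy)

def weighted_avg_hops(traffic, hop_fn):
    """Average hops per packet, weighted by packet count per (src, dst) pair."""
    total_packets = sum(traffic.values())
    total_hops = sum(n * hop_fn(s, d) for (s, d), n in traffic.items())
    return total_hops / total_packets

# Hypothetical coherence traffic: (source tile, destination tile) -> packets.
traffic = {(0, 3): 40, (0, 15): 20, (5, 6): 40}

mesh_avg = weighted_avg_hops(traffic, lambda s, d: mesh_hops(s, d, 4))
torus_avg = weighted_avg_hops(traffic, lambda s, d: torus_hops(s, d, 4, 4))
print(f"mesh: {mesh_avg:.2f} hops/packet, torus: {torus_avg:.2f} hops/packet")
```

With this invented pattern, most packets travel between edge tiles, so the torus's wrap-around links shorten the average path. A different traffic pattern could favor a different topology, which is why measuring the actual coherence traffic matters before fixing the topology and routing.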