Achieving cache coherence in a MIPS32 multicore design
Embedded.com (08/17/08, 12:00:00 PM EDT)
Historically, memory coherence in multiprocessor systems was often achieved through bus "snooping": each core was connected to a common multitier bus and could snoop on the memory-access traffic of its peers to regulate the coherence status of individual cache lines. To do so, each core maintained the coherence status of its L1 cache lines locally and posted status changes to its peers over the common bus.
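The snooping scheme described above can be illustrated with a small simulation. This is a minimal sketch under stated assumptions, not the mechanism of any particular MIPS32 core: the article does not name a coherence protocol, so the three-state MSI model (Modified, Shared, Invalid) is assumed here, and the names `Bus`, `Core`, `State`, `broadcast`, and `snoop` are hypothetical.

```python
from enum import Enum


class State(Enum):
    """Assumed MSI coherence states for one cache line."""
    INVALID = "I"
    SHARED = "S"
    MODIFIED = "M"


class Bus:
    """Common bus: every attached core sees every posted transaction."""

    def __init__(self):
        self.cores = []
        self.memory = {}  # addr -> value (backing store)

    def attach(self, core):
        self.cores.append(core)

    def broadcast(self, sender, kind, addr):
        # Post a transaction; all peers except the sender snoop it.
        for core in self.cores:
            if core is not sender:
                core.snoop(kind, addr)


class Core:
    """A core that keeps the coherence status of its L1 lines locally."""

    def __init__(self, name, bus):
        self.name = name
        self.bus = bus
        self.lines = {}  # addr -> (State, value)
        bus.attach(self)

    def state(self, addr):
        return self.lines.get(addr, (State.INVALID, None))[0]

    def read(self, addr):
        st, val = self.lines.get(addr, (State.INVALID, None))
        if st is State.INVALID:
            # Miss: announce the read so a peer holding dirty data writes it back.
            self.bus.broadcast(self, "read", addr)
            val = self.bus.memory.get(addr, 0)
            self.lines[addr] = (State.SHARED, val)
        return val

    def write(self, addr, value):
        if self.state(addr) is not State.MODIFIED:
            # Announce the write so peers invalidate their copies.
            self.bus.broadcast(self, "write", addr)
        self.lines[addr] = (State.MODIFIED, value)

    def snoop(self, kind, addr):
        st, val = self.lines.get(addr, (State.INVALID, None))
        if st is State.MODIFIED:
            self.bus.memory[addr] = val  # write back dirty data first
        if kind == "write" and st is not State.INVALID:
            self.lines[addr] = (State.INVALID, None)
        elif kind == "read" and st is State.MODIFIED:
            self.lines[addr] = (State.SHARED, val)
```

In this sketch every transaction reaches every peer, which is the property that makes a shared bus convenient for coherence and also what loads it as the number of cores grows.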
The increasing size and complexity of the system-on-a-chip (SoC) led to a restructuring of the multitier-bus philosophy in favor of localized point-to-point connections with centralized traffic routing. This configuration improved speed and power on the now-localized bus segments because of their reduced load and shorter length. Bus contention also eased, and throughput increased for localized data exchange. In response to this trend in system architecture, the Open Core Protocol (OCP) standard emerged to consolidate this design philosophy. In addition, the emergence of IP-provider business models catalyzed the standardization of IP interconnect and design methods to facilitate design reuse centered on an open standard.