Using scheduled cache modeling to reduce memory latencies in multicore DSP designs
By Ofer Lent, Moshe Anschel, Erez Steinberg, Itay Peled and Amir Kleen (Freescale)
Embedded.com, (10/13/09, 08:28:00 PM EDT)
The most advanced high-end DSP cores on the market today are fully cache-based in concept, while still maintaining low latency when accessing the higher levels of the memory hierarchy (L2/L3). The performance of a cache-based DSP system depends heavily on the cache hit ratio and on the miss penalty.
The hit ratio, the number of accesses that hit in the cache divided by the total number of accesses (hits + misses), depends on the application's temporal and spatial locality. The miss penalty, the number of cycles the core waits for a miss to be served, depends on where the requested data physically resides in the memory system at the time of the miss.
Traditional systems rely on the Direct Memory Access (DMA) model, in which a DMA controller moves data into a memory closer to the core. This method is complicated and requires precise, restrictive scheduling to maintain coherency.
As an alternative, this article describes a new software model, and the hardware mechanisms that support it, used in the Freescale SC3850 StarCore DSP subsystem of the MSC8156 multicore DSP. Called the scheduled cache model, it reduces the DMA programming and synchronization needed to achieve high core utilization.
The scheduled cache model relies on hardware mechanisms (some of which are controlled by software) to increase cache efficiency. Using these mechanisms can yield DMA-like performance while maintaining