Providing memory system and compiler support for MPSoC designs: Customization of memory architectures (Part 2)
Embedded.com (01/06/09, 02:17:00 PM EST)
Following the review and assessment of various memory architectures in Part 1 of this series, we now survey some research efforts that address the exploration space of on-chip memories. A number of distinct memory architectures could be devised to efficiently exploit different application-specific memory access patterns.
Even if we restrict the scope to architectures involving only on-chip memory, the exploration space of possible configurations is too large to simulate exhaustively the performance and energy characteristics of the application for each configuration. Exploration tools are therefore necessary for rapidly evaluating the impact of several candidate architectures. Such tools can be of great utility to a system designer by giving fast initial feedback on a wide range of memory architectures.
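As a rough illustration of what the inner loop of such a tool might look like, the C sketch below enumerates a small grid of candidate cache configurations and ranks them by an analytically estimated cycle count instead of simulating each point. The configuration grid, the cost constants, and the estimate_cycles() model are illustrative assumptions, not the interface of any particular exploration tool.

```c
#include <stdio.h>

/* Placeholder analytical estimator (an assumption for illustration):
 * a real tool would derive the predicted memory-access cycles from
 * reuse analysis of the application's loop nests, not a closed formula. */
static double estimate_cycles(unsigned cache_bytes, unsigned line_bytes)
{
    const double accesses = 1e6, hit_cost = 1.0, miss_cost = 50.0;
    double miss_rate = 64.0 / cache_bytes + 0.0005 * line_bytes;
    return accesses * (hit_cost + miss_rate * miss_cost);
}

int main(void)
{
    const unsigned cache_sizes[] = { 1024, 2048, 4096, 8192, 16384 };
    const unsigned line_sizes[]  = { 16, 32, 64, 128 };   /* bytes */
    unsigned best_cache = 0, best_line = 0;
    double best_cycles = -1.0;

    /* Rank every candidate configuration by estimated cycles; each
     * point costs one analytical evaluation, not a full simulation. */
    for (unsigned i = 0; i < sizeof cache_sizes / sizeof *cache_sizes; i++)
        for (unsigned j = 0; j < sizeof line_sizes / sizeof *line_sizes; j++) {
            double c = estimate_cycles(cache_sizes[i], line_sizes[j]);
            if (best_cycles < 0.0 || c < best_cycles) {
                best_cycles = c;
                best_cache  = cache_sizes[i];
                best_line   = line_sizes[j];
            }
        }

    printf("best: %u-byte cache, %u-byte lines (%.0f est. cycles)\n",
           best_cache, best_line, best_cycles);
    return 0;
}
```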
Cache
Two of the most important aspects of data caches that can be customized for an application are (1) the cache line size and (2) the cache size. In the study referenced above, the cache line size is customized for an application using an estimation technique that predicts memory access performance, that is, the total number of processor cycles required for all the memory accesses in the application.
There is a tradeoff in sizing the cache line. If the memory accesses are very regular and consecutive, i.e., exhibit spatial locality, a longer cache line is desirable, since it minimizes the number of off-chip accesses and exploits the locality by prefetching elements that will be needed in the immediate future.
On the other hand, if the memory accesses are irregular, or have large strides, a shorter cache line is desirable, as this reduces off-chip memory traffic by not bringing unnecessary data into the cache. The maximum size of a cache line is the DRAM page size.
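The following first-order model (an illustrative assumption, not the estimation technique from the study) makes the tradeoff concrete: when the access stride fits within a line, one miss fetches several useful elements, so misses fall as the line grows; once the stride exceeds the line size, every access misses and longer lines only inflate off-chip traffic.

```c
#include <stdio.h>

/* First-order cold-miss model for a strided loop (an illustrative
 * assumption): with stride*elem <= line, spatial locality turns most
 * accesses into hits; beyond that, every access misses and each miss
 * still transfers a full line of off-chip traffic. */
static void stride_model(unsigned n, unsigned elem, unsigned stride,
                         unsigned line)
{
    unsigned step = stride * elem;   /* bytes between successive accesses */
    unsigned long misses = (step < line)
        ? ((unsigned long)n * step + line - 1) / line  /* ~1 miss per line */
        : n;                                           /* every access misses */
    printf("line=%4u B  misses=%8lu  traffic=%9lu B\n",
           line, misses, misses * (unsigned long)line);
}

int main(void)
{
    const unsigned lines[] = { 16, 32, 64, 128 };

    printf("sequential (stride 1, 4-byte elements):\n");
    for (unsigned i = 0; i < 4; i++)
        stride_model(1u << 20, 4, 1, lines[i]);

    printf("strided (stride 32, 4-byte elements):\n");
    for (unsigned i = 0; i < 4; i++)
        stride_model(1u << 20, 4, 32, lines[i]);
    return 0;
}
```

For the sequential loop the model's miss count halves each time the line doubles, while the strided loop misses on every access regardless of line size, so each extra byte of line length is pure wasted bandwidth.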
The estimation technique uses data reuse analysis to predict the total number of cache hits and misses inside loop nests so that spatial locality is incorporated into the estimation. An estimate of the impact of conflict misses is also incorporated. The estimation is carried out for the different candidate line sizes, and the best line size is selected for the cache.
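Below is a minimal sketch of that final selection step, assuming the reuse analysis has already produced hit/miss counts for each candidate line size; the cycle-cost constants, the conflict-miss scaling factor, and the sample numbers are all hypothetical.

```c
#include <stdio.h>

struct est { double hits, misses; };  /* reuse-analysis output per line size */

/* Total memory-access cycles from hit/miss counts (illustrative
 * constants; a real tool would take these from the target's timing
 * parameters and a proper conflict-miss estimate). */
static double mem_cycles(struct est e, double conflict_factor)
{
    const double hit_cycles = 1.0, miss_cycles = 40.0;
    double misses = e.misses * conflict_factor;  /* crude conflict adjustment */
    return e.hits * hit_cycles + misses * miss_cycles;
}

int main(void)
{
    /* Hypothetical reuse-analysis results for one loop nest at four
     * candidate line sizes (16..128 bytes). */
    const unsigned line[] = { 16, 32, 64, 128 };
    const struct est e[] = {
        { 7.5e5, 2.5e5 }, { 8.7e5, 1.3e5 }, { 9.3e5, 0.7e5 }, { 9.0e5, 1.0e5 },
    };
    unsigned best = 0;

    /* Pick the line size minimizing estimated memory-access cycles. */
    for (unsigned i = 1; i < 4; i++)
        if (mem_cycles(e[i], 1.1) < mem_cycles(e[best], 1.1))
            best = i;

    printf("selected line size: %u bytes\n", line[best]);
    return 0;
}
```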