Architecture-oriented C optimization, part 2: Memory and more
Here's how to optimize C to account for memory alignment, cache features, endianness, and application-specific instructions.
By Eran Belaish, CEVA
dspdesignline.com (September 03, 2008)
Memory related guidelines
Alignment considerations
Architectures may allow or disallow unaligned memory access. When unaligned access is allowed, no special guidelines are required; when it is disallowed, the programmer must be careful. Ignoring alignment considerations can cause severe performance degradation and even malfunctions. To avoid malfunctions, every memory access must be executed with the proper alignment. To improve performance, the compiler needs to know the alignment of the pointers and arrays in the program. Optimizing compilers normally track pointer arithmetic to determine alignment at each point in the code, so that they can apply SIMD (Single Instruction Multiple Data) memory accesses while maintaining correctness. In some cases the compiler can tell that a pointer's alignment allows memory access optimization (for example, when a pointer to a 16-bit variable is aligned to 32 bits, so two elements can be fetched in one access), and it then emits SIMD memory operations. In other cases the pointers are not aligned, and the only option is to align the data, either by copying it to aligned buffers or by placing it at aligned addresses with the linker.
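As a rough illustration of the "copy to an aligned buffer" option, the following C11 sketch checks a pointer's alignment and copies possibly unaligned 16-bit samples into a 32-bit-aligned local buffer before processing them. The names `is_aligned`, `sum_samples`, and `N_SAMPLES` are hypothetical and do not come from the article, and the pointer-to-integer mask test assumes the flat address model found on typical targets.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define N_SAMPLES 8  /* hypothetical block size, chosen for illustration */

/* Hypothetical helper: nonzero if p is aligned to `align` bytes.
 * `align` must be a power of two. The integer cast assumes a flat
 * address space, which holds on common targets. */
static int is_aligned(const void *p, size_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return ((uintptr_t)p & (align - 1)) == 0;
}

/* Sum N_SAMPLES 16-bit samples that may start at any byte address.
 * memcpy makes no alignment assumption about `raw`, so the copy is
 * safe even where unaligned loads are disallowed; the loop then
 * reads only from the 32-bit-aligned local buffer. */
static int32_t sum_samples(const unsigned char *raw)
{
    _Alignas(4) int16_t aligned[N_SAMPLES];
    int32_t sum = 0;

    memcpy(aligned, raw, sizeof aligned);
    for (size_t i = 0; i < N_SAMPLES; i++) {
        sum += aligned[i];
    }
    return sum;
}
```

The copy costs extra cycles, so this approach tends to pay off only when the aligned buffer is reused enough for the faster accesses to amortize it.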
In most cases, however, the compiler simply cannot determine the alignment. It therefore assumes the worst-case scenario and avoids memory access optimization. To overcome this lack of information, advanced compilers provide a way for the programmer to specify the alignment of a given pointer. The compiler then uses this information when considering memory access optimization for that pointer. For memory-intensive loops (such as copy loops), this feature can yield a twofold or even fourfold speedup.
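The article does not name a specific interface, so the sketch below uses the GCC/Clang builtin `__builtin_assume_aligned` as a stand-in for such an alignment hint; other toolchains expose their own pragmas or intrinsics. The function name `copy_aligned16` is hypothetical, and whether the compiler actually widens the accesses depends on the target and optimization level.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Copy n 16-bit elements. The caller promises that both pointers are
 * aligned to 4 bytes (32 bits); breaking that promise is undefined
 * behavior once the hint is given. */
void copy_aligned16(int16_t *dst, const int16_t *src, size_t n)
{
    /* Debug-build check of the caller's promise. */
    assert(((uintptr_t)dst & 3u) == 0 && ((uintptr_t)src & 3u) == 0);

    /* GCC/Clang-specific hint (an assumption, not the article's API):
     * tells the optimizer the pointers are 4-byte aligned, so it is
     * free to combine two 16-bit accesses into one 32-bit access. */
    int16_t *d = __builtin_assume_aligned(dst, 4);
    const int16_t *s = __builtin_assume_aligned(src, 4);

    for (size_t i = 0; i < n; i++) {
        d[i] = s[i];
    }
}
```

Without the hint, the compiler would have to assume the pointers are only 2-byte aligned, as required by `int16_t`, and keep the element-by-element accesses.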