Better Benchmarks through Compiler Optimizations: Codasip Jump Threading
By Codasip
September 9, 2019
The architectural efficiency of embedded processor IP is measured by a small set of industry-standard benchmarks which, although they often bear little correlation to real workloads, continue to persist. The most popular benchmarks are Dhrystone and CoreMark.
An interesting observation about these benchmarks is that the scores for a given architecture keep improving even when the architecture itself remains unchanged. The improvement comes from continuous work on compiler optimizations, often aimed specifically at raising the score on a particular benchmark.
The RISC-V community makes extensive use of open-source compiler technologies. The most widely used C/C++ compilers today are GCC from the GNU Project and Clang from the LLVM Project.
Each compiler has its own advantages and disadvantages, and most RISC-V users today employ the GNU toolchain. The Codasip C/C++ compiler, however, is based on LLVM, an umbrella project that hosts a set of related low-level toolchain components (assemblers, compilers, debuggers, etc.). LLVM and its C/C++ frontend, Clang, offer several benefits over GCC: faster compilation and lower memory usage, expressive diagnostics, and a modular, library-based architecture that makes it easy to customize the toolchain and add extensions such as new architectures, instructions, and optimizations.
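One of GCC's strengths, however, is that its jump threading pass is more powerful than the equivalent pass in LLVM, which struggles to thread the jumps that occur in the CoreMark benchmark. Jump threading is an optimization that, when the outcome of a conditional branch is already known along a particular incoming path, redirects that path straight to the branch's known destination and so eliminates the redundant test. To close this gap and improve our own LLVM-based solution, Codasip developed an innovative implementation of jump threading that helped us achieve significantly faster code and better CoreMark results.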
These techniques are described in detail in our latest whitepaper.
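As a rough illustration of the basic transformation, the Python sketch below threads jumps in a toy control-flow graph (CFG). It is not Codasip's implementation, and it is far simpler than the jump threading passes in GCC or LLVM, which operate on their own intermediate representations. The classes `Jump`, `Branch`, `Return`, and `Block` and the functions `thread_jumps`, `run`, and `make_cfg` are hypothetical, made up for this example. The pass handles only the simplest case: a block that jumps unconditionally to a block containing nothing but a conditional branch on a variable the first block has just set to a constant.

```python
# Hypothetical toy model for illustration only; not Codasip's or LLVM's code.
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple, Union


@dataclass
class Jump:          # unconditional jump to another block
    target: str


@dataclass
class Branch:        # conditional branch on a variable's truth value
    var: str
    if_true: str
    if_false: str


@dataclass
class Return:        # return the value of a variable
    var: str


Terminator = Union[Jump, Branch, Return]


@dataclass
class Block:
    assigns: List[Tuple[str, int]]  # straight-line "var = constant" statements
    term: Terminator


CFG = Dict[str, Block]


def known_value(block: Block, var: str) -> Optional[int]:
    """Constant that `block` last assigns to `var`, or None if it does not assign it."""
    for name, value in reversed(block.assigns):
        if name == var:
            return value
    return None


def thread_jumps(cfg: CFG) -> bool:
    """One round of simple jump threading.

    If block P ends in Jump(B), B contains only Branch(v, T, F), and P assigns
    a constant to v, the branch outcome on the edge P->B is known, so P is
    retargeted straight to T or F. Real passes can also duplicate non-empty
    successor blocks; this toy simply skips such cases.
    """
    changed = False
    for pred in cfg.values():
        if not isinstance(pred.term, Jump):
            continue
        succ = cfg[pred.term.target]
        if succ.assigns or not isinstance(succ.term, Branch):
            continue
        value = known_value(pred, succ.term.var)
        if value is None:
            continue
        pred.term = Jump(succ.term.if_true if value else succ.term.if_false)
        changed = True
    return changed


def run(cfg: CFG, entry: str, env: Dict[str, int]) -> Tuple[int, int]:
    """Interpret the CFG; return (result, conditional branches executed)."""
    env = dict(env)
    label, branches = entry, 0
    while True:
        block = cfg[label]
        for var, value in block.assigns:
            env[var] = value
        term = block.term
        if isinstance(term, Return):
            return env[term.var], branches
        if isinstance(term, Jump):
            label = term.target
        else:
            branches += 1
            label = term.if_true if env[term.var] else term.if_false


def make_cfg() -> CFG:
    # Roughly: r, flag = (10, 1) if x else (20, 0); return r if flag else -1
    return {
        "entry": Block([], Branch("x", "a", "b")),
        "a": Block([("flag", 1), ("r", 10)], Jump("test")),
        "b": Block([("flag", 0), ("r", 20)], Jump("test")),
        "test": Block([], Branch("flag", "then", "else")),
        "then": Block([], Return("r")),
        "else": Block([("r", -1)], Return("r")),
    }


if __name__ == "__main__":
    before, after = make_cfg(), make_cfg()
    thread_jumps(after)
    for x in (0, 1):
        print(x, run(before, "entry", {"x": x}), run(after, "entry", {"x": x}))
```

Running it prints `0 (-1, 2) (-1, 1)` and `1 (10, 2) (10, 1)`: both versions return the same results, but after threading, blocks `a` and `b` jump directly to `then` and `else`, so each path executes one conditional branch instead of two and block `test` becomes unreachable.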