Hit performance goals with configurable processors
By Steven Leibson, Tensilica
October 17, 2006 -- dspdesignline.com
With mainstream DSPs, code must be hand-tuned in assembly language to meet performance goals. A more productive approach is to tailor the processor to the algorithm. This article explains why, using FFT, Viterbi, and MPEG-4 examples.
For more than 30 years, the fixed-ISA (instruction set architecture) microprocessor has largely defined electronic system design. Similarly, single-chip DSPs have dominated DSP system design since they were introduced 20 years ago. Fixed-ISA processors have addressed application performance problems in two ways. The first approach has been to increase clock rate, mirroring the same trend in PC processors. If power dissipation is not important, rapidly escalating clock rates can go a long way towards force-fitting a particular processor to an application. The alternative approach for fixed-ISA processors is to provide more computational resources that operate in parallel so that the processor can perform more work per clock.
Designers of fixed-ISA microprocessors and DSPs attempt to develop architectures that execute a wide range of algorithms well but are not tailored to any specific application. Adding general-purpose parallel resources in this way can reduce clock-rate requirements, but it doesn't match the clock-rate or gate-usage efficiencies of a tailored solution. The design of complex portable and battery-powered processor-based systems increasingly calls for a different approach, one that matches the processing resources more closely to the application tasks so that very high clock rates (and correspondingly high power and heat dissipation) are not required. FPGA, ASIC and SOC technologies provide an ideal medium for tailoring processors to specific applications, using newly available configurable and extensible processor technology.
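The link between work per clock and required clock rate can be made concrete with a rough estimate. The Python sketch below is only an illustration, not a model of any particular processor: the function name `required_clock_hz`, the workload (1,024-point FFTs at 10,000 transforms per second), and the cycles-per-butterfly figures are all assumptions chosen for the example. The one fixed fact it relies on is that a radix-2 FFT of N points performs (N/2)·log2(N) butterflies. Memory traffic, loop overhead, and bit reversal are ignored.

```python
def required_clock_hz(n_points, ffts_per_second, cycles_per_butterfly):
    """Estimate the clock rate needed to sustain a stream of radix-2 FFTs.

    Only butterfly computation is counted; loads, stores, loop overhead,
    and bit reversal are ignored (a deliberate simplification).
    """
    if n_points < 2 or n_points & (n_points - 1):
        raise ValueError("n_points must be a power of two >= 2")
    stages = n_points.bit_length() - 1       # log2(N) for a power of two
    butterflies = (n_points // 2) * stages   # (N/2) * log2(N)
    return butterflies * cycles_per_butterfly * ffts_per_second


if __name__ == "__main__":
    # Hypothetical figures: 8 cycles per butterfly for a general-purpose
    # core versus 1 cycle for a core with a tailored butterfly instruction.
    for label, cycles in (("general-purpose", 8), ("tailored", 1)):
        hz = required_clock_hz(1024, 10_000, cycles)
        print(f"{label}: {hz / 1e6:.1f} MHz")
```

Under these assumed figures, cutting the cycles per butterfly cuts the required clock rate by the same factor, which is the sense in which a tailored processor avoids very high clock rates.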
Related Articles
- Tuning Fork - A Tool For Optimizing Parallel Configurable Processors
- Configurable processors or RTL -- evaluating the tradeoffs
- Configurable Processors: Ready for Prime Time
- Configurable Processors for Video Processing SOCs