Towards Activity Based System Level Power Estimation
ChipVision, Germany
Abstract:
This paper discusses power estimation at the system level. It shows why power estimation at the pre-RTL level is important and how it can be meaningfully applied during system level design. Special attention will be given to the role of dynamic data during the estimation process.
1. Introduction
The power consumption of SoCs has become a first-class design objective. It is no longer a soft issue, but a hard constraint: already in 2003, 11% of all design re-spins were due to power issues [1], and the ITRS 2003/04 roadmap lists power reduction as the second most important crosscutting design challenge [2]. With the steep increase in leakage, the power issue has only become more pressing since.
High power consumption of ICs is critical in three respects: supplying the energy, coping with the on-chip currents and dealing with the heat. Supplying the energy is especially difficult for mobile applications, where the available energy is limited and MIPS/Watt becomes a critical sales feature. High on-chip currents lead to reliability issues (“power integrity”) and may even destroy the chip, e.g. through electromigration. Finally, heat removal increases the system production cost and its risk of failure, and may negatively impact customer acceptance.
While power minimization continues to gain in importance, productivity remains the traditional top challenge of the industry. As in the software crisis of the 1970s, the two main strategies applied to survive the design productivity crisis are (IP) reuse and abstraction. Cracking the power problem while maintaining acceptable productivity requires power methodologies that support reuse and abstraction as well. The root problem in such methodologies is the determination of the eventual power cost of different design solutions, i.e. power estimation. In this paper we describe approaches to power estimation in the presence of reuse and abstraction using examples from the multimedia application domain, specifically a “JPEG Decoder” for video decoding.
Figure 1: Deviation of energy consumption due to different input stimuli. Each column represents an RT architecture optimised to minimise the energy consumption when decoding the depicted data. The rows stand for the data used for estimating the power of that respective architecture.
2. Traditional Power Estimation
There are several levels of abstraction at which power can be analyzed and estimated. At each level the term “Intellectual Property” represents different aspects.
Late in the design phase, layout development leads to the delivery of a GDSII tape for actual implementation. The transistor models, combined with interconnect information, form the IP characterized at this level and simulated at the SPICE level. Sufficient data is available to allow accurate simulation and estimation of the energy consumption. Based on the estimates at this level of abstraction, design teams can still perform transistor sizing and layout re-arrangements to optimize placement and interconnect.
The accuracy at this level of abstraction is very high, but the amount of data to be processed is so large and the simulation speed so low that typically only a small number of test vectors can be used for dynamic analysis. The leverage on energy consumption is comparatively low, as major design changes such as adjusting the number of arithmetic resources are no longer possible.
At gate level – after logic synthesis and technology mapping – power models for technology cells are used for estimation. Such models are usually generated based on repeated circuit-level estimation (“characterization”) of the cells. They relate power to digital events at the cell inputs. The gate level power estimation problem then becomes one of effective gate level simulation and application of cell models.
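For illustration, the basic calculation behind such cell-model-based estimation can be sketched as follows; this is a minimal sketch with hypothetical toggle counts and capacitances, and it ignores leakage, short-circuit currents and state-dependent cell energies.

```python
# Minimal sketch of gate-level dynamic power estimation.
# Toggle counts would come from gate-level simulation; the effective switched
# capacitance per net would come from the cell models plus a wire-load estimate.
# All values below are illustrative placeholders.

VDD = 1.2          # supply voltage in volts (assumed)
WINDOW = 1e-6      # simulated time window in seconds (assumed)

toggles = {"n1": 420, "n2": 130, "n3": 980}            # transitions per net
cap_f   = {"n1": 12e-15, "n2": 8e-15, "n3": 15e-15}    # switched capacitance per net in farads

def dynamic_power(toggles, cap_f, vdd, window):
    """Accumulate 0.5 * C * Vdd^2 per transition and divide by the simulated time."""
    energy = sum(0.5 * cap_f[net] * vdd ** 2 * n for net, n in toggles.items())
    return energy / window

print(f"estimated dynamic power: {dynamic_power(toggles, cap_f, VDD, WINDOW) * 1e3:.3f} mW")
```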
Prior to place and route interconnect capacitance can only be roughly estimated. A more accurate estimation is possible afterwards by back annotating the layout capacitance to the gate net list.
At this level design teams can apply optimizations that minimize the capacitances driven by the most active nodes in the design. In addition, energy consumption can be reduced by balancing path delays to avoid spikes and spurious transitions, and by re-timing.
The accuracy of the estimation is still quite high at this level of abstraction, but, as at the layout level, the amount of data to be handled remains a limiting factor. Even if the gate-level simulation is performed without back-annotated timing, users have to accept significant simulation times.
At the RT level – prior to logic synthesis – energy consumption can be estimated using event-based or probabilistic simulation. While probabilistic estimation ([3][4][5]) has the advantage of circumventing time-consuming vector simulation, its accuracy suffers from inherent limitations in handling signal correlations. Consequently, the vast majority of estimation techniques rely on (at least) cycle-accurate simulation and the application of macro models. Such models are more abstract than at gate level: they relate the power consumption of RT blocks (such as adders, multipliers, but also controllers) to parameters derived from the processed data stream (e.g. the Hamming distance) [6].
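One possible form of such a macro model is sketched below, assuming a simple linear dependence of the per-cycle energy on the input Hamming distance; the coefficients are hypothetical placeholders for values obtained by characterization.

```python
# Sketch of an RT-level macro model: the per-cycle energy of a data path block
# is modelled as a linear function of the Hamming distance between consecutive
# input vectors. The coefficients e0/e1 are hypothetical; real values would be
# obtained by characterization against gate-level power analysis.

def hamming(a, b):
    return bin(a ^ b).count("1")

def block_energy(prev_ops, cur_ops, e0=0.8e-12, e1=0.15e-12):
    """Energy estimate in joules: base cost plus a cost per toggling input bit."""
    hd = sum(hamming(p, c) for p, c in zip(prev_ops, cur_ops))
    return e0 + e1 * hd

# apply the model along a cycle-accurate operand trace (hypothetical data)
trace = [(0x00FF, 0x0F0F), (0x00FE, 0x0F0F), (0xFF00, 0xF0F0)]
energy = sum(block_energy(trace[i - 1], trace[i]) for i in range(1, len(trace)))
print(f"estimated energy over trace: {energy * 1e12:.2f} pJ")
```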
At the RT level the number of options to reduce the energy consumption is still quite high, allowing reasonable overall leverage. Specific areas of the design can be dynamically switched to low-power modes, trading performance against energy during execution. Popular methods are reduced clocking or full clock gating of areas of the design. Optimized resource sharing, operand isolation and optimized encoding of controller and bus states can further reduce the switched capacitances.
In principle, architecture changes are still possible at this level. For example, a user may change the number of multipliers to reduce switching by exploiting the correlation between subsequent data in a data stream. However, users often focus completely on functional verification during this phase. In addition, simulation times are still significant, and the sheer effort of designing the RTL often prevents a focus on energy optimization at this level of abstraction.
With the ever increasing complexity of designs these traditional estimation methods are running out of steam as they produce feedback to the design teams relatively late in the design flow. New, more abstract estimation techniques are required to enable early design decisions.
3. ESL Power Estimation
The term ‘electronic system level’ (ESL) has caused some confusion. A major reason is that ESL implies both a scope (the entire system) and a level of abstraction (above RTL). The ironic twist is that abstract descriptions of the entire system rarely exist in electronic form. This is because system level design is rarely a single coherent effort from first idea to implementation. Rather, different design approaches exist, tailored to the specific aspects and problems of the sub-system under consideration. The absence of a complete system model is usually tolerable.
We advocate that the same holds true for system level power estimation. In the following we demonstrate how power estimation can be applied and exploited in different design scenarios at system level. Note that even if the system is designed as separate sub-designs, its metrics must remain comparable across them. Relative estimates are therefore of limited value. This applies to power in the same manner as to area, performance, etc.
Design planning
As a first cut, the system architecture is often planned in a quantitative style: “we need 1 DSP, 2 SRAMs, etc.”. Such planning is usually performed using spreadsheet software or layout planning tools. Without topological or dynamic information, the estimation of power (but also of area, performance, etc.) is based on simple models: power is, for example, given as a function of the clock frequency or of the frequency of operations performed. These models are either taken from datasheets or self-generated based on in-house experience.
Power estimation during design planning serves to bound the system's power requirements roughly. This can be used to define the system's packaging, the required cooling and a first power plan.
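A spreadsheet-style power plan of this kind could be sketched as below; the block list and the mW/MHz coefficients are purely illustrative and not taken from any datasheet.

```python
# Design-planning sketch: each block is modelled by a single coefficient
# (mW per MHz) taken from datasheets or in-house experience.
# All figures below are illustrative placeholders.

blocks = [
    # (name, clock in MHz, mW per MHz)
    ("DSP",        200, 0.25),
    ("SRAM_0",     200, 0.10),
    ("SRAM_1",     200, 0.10),
    ("bus/fabric", 100, 0.05),
]

total_mw = 0.0
for name, f_mhz, mw_per_mhz in blocks:
    p = f_mhz * mw_per_mhz
    total_mw += p
    print(f"{name:10s}: {p:6.1f} mW")
print(f"{'total':10s}: {total_mw:6.1f} mW  (input for packaging, cooling and power plan)")
```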
Algorithm Design
In this scenario the design activity concentrates purely on functionality. The challenge is to find the right algorithms for the system’s computational components and optimize them for hardware implementation. Algorithmic (“behavioural”) description languages (like C, C++, Matlab) are the natural choice for this task.
During algorithm design power estimation can guide trade-offs like algorithm selection, quality of service, quantization or algorithmic transformations. These optimizations can have a drastic impact on the power consumption.
Verification Model Design
“Golden Models” of sub-systems are often designed for verification purposes of both the hardware under development and the software interfacing to or running on the hardware. Although the concrete implementation might differ from the functional description used here, power estimates based on verification models can serve as a first hint on the power behaviour of the sub-system in question. Verification models are often written in C or SystemC, but also in E. The estimation techniques and models are similar to those used during algorithm design.
Behavioural Data Path Design
Behavioural data path design means the identification of an optimal data path implementation for a given algorithm. This design step can be automatic (“behavioural synthesis”) or manual.
Power estimation in this design scenario helps to identify the power optimal schedule, allocation and binding of the algorithm.
4. System level power models
Power models used for IP blocks differ in level of detail. Here are the most common types, often used “ad hoc” for specific design tasks:
Constant power
At the most abstract level – for example in a spreadsheet for design planning – designers use a constant power value per block. These models use values often extracted from datasheets and recognise neither the activation state, nor the working mode, nor the data processed.
Models of this type are only applicable for blocks with very regular activity patterns. With the increasing importance and penetration of power management techniques these models are only useful for very coarse, early design planning.
Figure 2: Power Planning for the C5000 Family.
Energy per activation
By taking into account the number of activations of a block, users can model the effects of power management at a high level. However, these models still do not recognize the working mode or the data processed in a module. They are applicable to simple processors, memories, etc.
A common characterization of processors in their datasheets is a value for the power consumption in mW per MIPS. Users can then estimate the power consumption as a function of the processor load caused by the applications running on it. Figure 2 shows such a spreadsheet for the Texas Instruments C5000 family [9].
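In code, such an activation-based estimate might look like the following sketch; the mW/MIPS figure, the idle power and the workload numbers are hypothetical and not taken from the TI datasheet.

```python
# Sketch of an energy-per-activation model for a processor: the power scales
# with the executed MIPS while active, and an idle power is assumed for the
# remaining time. All numbers are hypothetical placeholders.

MW_PER_MIPS = 0.35   # datasheet-style coefficient (hypothetical)
IDLE_MW     = 2.0    # power while idle or power-managed (hypothetical)

def average_power_mw(active_mips, duty_cycle):
    """MIPS-proportional power while active, idle power otherwise."""
    return duty_cycle * active_mips * MW_PER_MIPS + (1.0 - duty_cycle) * IDLE_MW

# e.g. the processor executes 120 MIPS while active and is active 30% of the time
print(f"average processor power: {average_power_mw(120, 0.3):.1f} mW")
```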
Figure 3: Example of abstract models extracted from datasheets.
Energy per operation mode
It has also become common to characterize in datasheets the energy consumption as a function of the operation mode – e.g. a low-resolution and a high-resolution mode for a video decoder in mobile applications.
Since these values are averages, the resulting estimates are not very accurate, but they can again be used for early design planning.
Energy per state
Similar in principle to energy-per-operation-mode models, but usually finer-grained, are state-based power models. They model the power behaviour as a “power state machine” that contains all relevant power states of a block and their respective transition conditions. States and transitions are annotated with their power cost.
Figure 4: Power State Machine of a hard disk [7].
Power state machines are a widely used system-level model and have become part of standards like ACPI. They are suitable for all system blocks whose power behaviour can be described by a small set of discrete states, but they neglect the influence of the processed data on the power consumption.
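The evaluation of such a model can be sketched as follows; the states, power values and transition energies are hypothetical and only loosely inspired by the hard-disk example of Figure 4.

```python
# Sketch of a power state machine model: states carry a power value,
# transitions carry an energy cost. All numbers are hypothetical.

STATE_POWER_MW = {"active": 2100.0, "idle": 900.0, "sleep": 100.0}
TRANSITION_ENERGY_MJ = {("sleep", "active"): 5.0, ("idle", "active"): 1.0,
                        ("active", "idle"): 0.5, ("idle", "sleep"): 0.2}

def trace_energy_mj(trace):
    """Energy in mJ for a trace of (state, duration_in_seconds) entries,
    including the cost of the transitions between consecutive states."""
    energy = sum(STATE_POWER_MW[state] * dt for state, dt in trace)   # mW * s = mJ
    states = [state for state, _ in trace]
    energy += sum(TRANSITION_ENERGY_MJ.get(hop, 0.0) for hop in zip(states, states[1:]))
    return energy

usage = [("sleep", 2.0), ("active", 0.5), ("idle", 1.0), ("sleep", 3.0)]
print(f"energy for usage trace: {trace_energy_mj(usage):.1f} mJ")
```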
Data aware Energy Models
Switched capacitances, especially in very active data path portions of a design, contribute significantly to the overall energy consumption. In order to capture these dynamic effects, the actual stimulus needs to be taken into account.
Figure 5: Abstraction of gate-level information into IP models at the RT level.
In contrast to the stimulus used for verification – which needs to cover every corner case – the stimulus required for power estimation has to represent use cases that are typical from a power perspective. For a JPEG decoder, a set of typical pictures has to be considered.
Figure 5 illustrates the principle of abstracting data-aware models from lower levels of abstraction for use at higher levels, taking activity information into account. In this case gate-level information is abstracted for use at the RT and pre-RT level. The characterization of the IP models is performed for arithmetic components, registers and memories using logic synthesis, gate-level simulation and gate-level power analysis. The resulting models depend on the input data and the semiconductor technology. They are also scalable in size and have to be characterized only once per technology.
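The characterization step itself can be thought of as a simple model fit; in the sketch below the sample points stand in for hypothetical gate-level power analysis results of one synthesized component and are not measured data.

```python
# Sketch of characterizing a data-aware IP model: fit a per-cycle energy model
# E ~= e0 + e1 * HD from gate-level power analysis runs with controlled input
# patterns. The sample points (Hamming distance, energy in pJ) are hypothetical.

samples = [(0, 0.9), (4, 1.6), (8, 2.3), (12, 3.1), (16, 3.8)]

def fit_linear(points):
    """Ordinary least-squares fit of energy = e0 + e1 * hamming_distance."""
    n = len(points)
    sx = sum(x for x, _ in points)
    sy = sum(y for _, y in points)
    sxx = sum(x * x for x, _ in points)
    sxy = sum(x * y for x, y in points)
    e1 = (n * sxy - sx * sy) / (n * sxx - sx * sx)
    e0 = (sy - e1 * sx) / n
    return e0, e1

e0, e1 = fit_linear(samples)
print(f"characterized model: E[pJ] ~= {e0:.2f} + {e1:.3f} * HD")
```

Such a fit would be repeated per component type, bit width and technology, which is consistent with the models being characterized only once per technology and scaled in size thereafter.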
5. The Importance of Activity Information
As mentioned in the previous sections, several power estimation approaches seek to abstract from the concrete circuit activity in order to avoid the complexity of event simulation. This might seem reasonable especially at higher levels of abstraction, where the activity might be seen as a low-level implementation aspect. In practice, however, the circuit activity proves to be important not only for estimation but also for guiding design choices at system level.
Figure 1 documents this with the help of a JPEG decoder design. The table shows different architectures optimised to minimise the energy consumption of the decoder using different reference images as stimulus (columns). The columns are ordered by decreasing “activity” within the pictures, starting with the most active “white noise”, through a comic strip and two photos, and ending with a grey picture of constant values.
The rows show the deviation when estimating the logic (top) and memory (bottom) energy of these architectures using different input stimuli. An increase of up to 63% can be observed when operating a design (optimised for “constant grey”) with data different from what it was originally optimised for (“white noise”).
Figure 6: Energy consumption as a function of the dynamic stimulus used.
The cells on the main diagonal of Figure 1 show the relative deviation of each architecture with respect to the one optimised for the white noise picture (each evaluated with the stimulus it was optimised for). Their relationship is also shown graphically in Figure 6. We can see that optimising the architecture for certain data characteristics saves up to almost 40% of the energy.
In summary, the data shows that activity must be considered both for accurate power estimation and for meaningful design choices, even and especially at system level.
6. Activity based system level estimation
In view of the high impact of the processed data on the power consumption, it becomes clear that not only accurate power models are required, but also an accurate prediction of the circuit activity. At the pre-implementation level, where the specification still allows a large degree of freedom, simple simulation, e.g. transaction-level simulation in SystemC, is not sufficient. This is because resource sharing can have a high impact on the data streams eventually consumed by the hardware modules. Figure 7 illustrates this effect: by sharing two operations, the total Hamming distance at the operator inputs triples in this case.
Figure 7: The effect of resource sharing on circuit activity. By mapping the two operations on the left onto one hardware resource, the input activity triples from 4 to 12 transitions.
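The effect can be reproduced with a small calculation; the operand values below are hypothetical (not necessarily those shown in the figure) but exhibit the same increase from 4 to 12 transitions.

```python
# Sketch: input activity (Hamming distance between consecutive operand values)
# of two operations, with and without resource sharing. Operand values are
# hypothetical, slowly changing data streams.

def hamming(a, b):
    return bin(a ^ b).count("1")

def stream_activity(values):
    """Total number of input bit transitions along one operand stream."""
    return sum(hamming(values[i - 1], values[i]) for i in range(1, len(values)))

op_a = [0x00, 0x01, 0x03]   # operands seen by the first operation
op_b = [0x0C, 0x0D, 0x0F]   # operands seen by the second operation

unshared = stream_activity(op_a) + stream_activity(op_b)
# a shared operator sees the two streams interleaved cycle by cycle
shared = stream_activity([v for pair in zip(op_a, op_b) for v in pair])

print(f"two dedicated operators: {unshared} transitions")   # -> 4
print(f"one shared operator:     {shared} transitions")     # -> 12
```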
While an easy way to avoid this problem would be to decide on resource sharing before simulation, this would mean having to re-simulate for every candidate architecture during design space exploration. We therefore propose a two-pronged strategy: perform system level simulation prior to the sharing decision and emulate the effect of the sharing afterwards [8].
In the ChipVision methodology the system blocks under investigation are simulated at transaction level. The transaction traces are recorded by instrumentation-based data profiling. For the estimation, an automated design space exploration is then performed, evaluating different resource sharing possibilities.
The cost function applied in this phase includes power values obtained by emulating the effect of scheduling and sharing on the data streams. The result of this process is a power estimate of the blocks under investigation as well as a suggested scheduling and sharing of resources.
7. Summary
Using the example of a JPEG decoder, this paper has analysed IP modelling techniques for low power at different abstraction levels and for different design scenarios. Examples of IP models representing power/energy information and their application to power estimation have been provided. Special attention has been paid to the importance of representing dynamic activity information.
8. References
[1] Aart de Geus, “SNUG Keynote Address”, Boston, 2003 (http://www.deepchip.com/posts/0417.html)
[2] ITRS 2003 Edition (http://public.itrs.net/Files/2003ITRS/Design2003.pdf)
[3] J. Costa, J. Monteiro, L. M. Silveira and S. Devadas, “A Probabilistic Approach for RT-Level power modeling”, The 6th IEEE International Conference on Electronics, Circuits and Systems, September 1999.
[4] D. Marculescu, R. Marculescu, and M. Pedram, “Information Theoretic Measures for Power Analysis”, IEEE Trans. on CAD of Integrated Circuits and Systems, vol. 15, no. 6, June 1996
[5] P. Sathishkumar, “Stimulus-Free RT Level Power Model using Belief Propagation”, Master's thesis, 2004 (http://www.eng.usf.edu/~bhanja/Sanjukta%20Bhanja/pdfs/thesis/Shatishfinal.pdf)
[6] Gerd von Cölln (Jochens), Lars Kruse, Eike Schmidt, Wolfgang Nebel, “A new parameterizable power macro-model for datapath components”, DATE 1999
[7] L. Benini, R. Hodgson, P. Siegel, “System-level power estimation and optimization”, in Proceedings of the 1998 International Symposium on Low Power Electronics and Design (Monterey, California, United States, August 10-12, 1998), ISLPED '98, ACM Press, New York, NY
[8] W. Nebel, “Predictable Design of Low Power Systems by Pre-Implementation Estimation and Optimization”, ASP-DAC 2004, Yokohama, Japan
[9] Texas Instruments, http://www.ti.com/