Accurate and Efficient Power Estimation Flow for Complex SoCs
Gaurav Jain, Arunendra Tomar, Umesh Pratap (Freescale Semiconductor)
Abstract:
Next-generation SoC designs are expected to deliver the highest performance and the lowest power consumption at the same time. Stringent power targets must be met in every operating mode of the device, which calls for very accurate power estimation techniques. This paper discusses methods for early and accurate power estimation in next-generation multi-million-gate designs targeted at ever-shrinking process technologies. It also discusses the challenges in correlating power numbers from the pre-silicon phase (theoretical power estimation, VCD-based estimation) with post-silicon platforms (analog validation, tester).
Introduction:
Energy consumption has emerged as an important factor in system performance, driven by the increasing demand for low-power reconfigurable SoC designs. We estimate the overall power consumption of an SoC platform, which includes the processor, the on-chip bus and the peripherals.
SoC power estimation consists of static/average power analysis and dynamic IR drop analysis. Initial power estimates come from theoretical analysis based on the activity factors of the core, peripherals and memory components. The verification team then runs use cases that exercise as many design components as possible at the same time and delivers the maximum-activity window as a Value Change Dump (VCD) or Toggle Count Format (TCF) file to the power estimation team, which determines the power numbers. Power analysis tools do not handle very large dumps (for example, VCDs larger than 100GB). There is therefore a growing need to use emulation platforms for power analysis, so that long-running use cases that are impractical in software simulation can be run.
SoC technology is characterized by its feature size, such as 90nm, 55nm or 28nm. The feature size determines the minimum dimension of a wire or transistor, so as it shrinks, the transistor density for a given chip size grows immensely. As the feature size decreases, the supply voltage must decrease as well. This might seem to reduce power consumption, but the number of transistors is growing roughly exponentially and clock rates are increasing, so more transistors are switched more frequently, causing a net increase in power use. During the design process, power has to be estimated at every design step, and the constraints of each part of the design as well as of the whole design have to be met.
Why power estimation is required in SoCs:
Power estimation is required to determine whether the device will meet its target power specifications. Today's electronic devices have high transistor density, so more interconnects are required between these elements. The power consumed by these interconnects does not scale down in the same way, because they cannot be made arbitrarily smaller and must run close to each other. As a result, the share of interconnect power in the overall energy dissipation increases.
The main consequences of this trend are the need for additional cooling and reduced battery lifetime. Power estimation therefore also helps to avoid cooling and reliability problems. Market forces demand low power not only for longer battery life but also for reliability, portability, performance, cost and time to market.
Challenges in accurate and efficient power estimation in SoCs:
In today's billion-gate SoC designs, runtime and the size of the generated waveform database are the main challenges for accurate power estimation.
- To generate input sequences that exercise the SoC hardware and software together, we need a system representation that abstracts the architectural design. For power estimation, the input sequence should activate the functions of the design in a way that maximizes the power it consumes.
- Simulation is needed to analyze power. However, for a billion-gate SoC, the simulation runtime becomes too long for a reasonable design cycle.
- The waveform generated by simulation can occupy hundreds of GBs. Such large dumps significantly degrade the performance and memory usage of power analysis tools, making power analysis on them impractical.
- The VCD (Value Change Dump) file can be very large when it covers the total simulation duration. Static (average) power calculation can process the full VCD and derive average activity from it, but running dynamic analysis on the full VCD is unrealistic because of performance and resource limitations. It is therefore recommended to identify the power-hungry cycles, that is, the cycles with high switching activity, up front and to perform dynamic power calculation on those.
- In general, VCD generation and power analysis on that VCD go through several iterations between the verification and design teams.
Why power estimation is required at RTL and gate (GLS) level for SoCs:
Power analysis is done at RTL as well as at gate level. Both are needed to optimize the design and its power.
At RTL, dumping simulation data for use case scenarios takes less simulator or emulator runtime and memory. It also significantly reduces the size of the simulation dump.
This also reduces runtime and memory usage in power analysis. If the RTL dump shows unexpected power (not as per the specification) for any signal or module, it is easy to improve the design and its power at this stage.
The disadvantage of RTL power estimation is a 15-20% deviation from real silicon power, so gate-level power analysis is also needed, where the deviation from real silicon power is about 10%.
The disadvantage of GLS power analysis is that gate-level simulation runtime is long, and dump size and memory usage increase. Large VCDs are difficult for power tools to handle. Also, if the GLS dump shows unexpected power (not as per the specification) for any signal or module, it is harder to improve the design and its power at this stage than at RTL.
Power analysis:
The power dissipation of an SoC design can be described by
$P_{avg} = P_{dynamic} + P_{short\text{-}circuit} + P_{leakage} + P_{static}$
Fig1: Different powers at transistor level.
$P_{avg}$ is the average power dissipation.
$P_{dynamic}$ is the dynamic power dissipation due to the switching of transistors; it is caused by the charging and discharging of load capacitances.
$P_{short\text{-}circuit}$ is the power dissipated by short-circuit current, which flows when there is a direct current path from the power supply to ground.
$P_{leakage}$ is the power dissipation due to leakage currents.
$P_{static}$ is the static power dissipation. Static power is the power dissipated by a gate when it is not switching, that is, when it is inactive or static.
In SoCs, we focus on estimating the dynamic and static power dissipation of the digital circuit, because these are directly related to chip heating and battery lifetime.
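As a rough illustration of how the dynamic and leakage terms are usually approximated, the sketch below uses the standard first-order switching model (activity factor times capacitance times supply voltage squared times frequency). This model and every number in it are assumptions made for illustration; they are not taken from the paper's flow, and the short-circuit term is left out for brevity.

```python
def dynamic_power(alpha, c_switched_farads, vdd_volts, freq_hz):
    """First-order switching power: alpha * C * Vdd^2 * f (in watts)."""
    return alpha * c_switched_farads * vdd_volts ** 2 * freq_hz


def leakage_power(i_leak_amps, vdd_volts):
    """Leakage power as leakage current times supply voltage (in watts)."""
    return i_leak_amps * vdd_volts


# Hypothetical block: 15% activity, 2 nF total switched capacitance,
# 1.0 V supply, 500 MHz clock, 20 mA total leakage current.
p_dyn = dynamic_power(alpha=0.15, c_switched_farads=2e-9,
                      vdd_volts=1.0, freq_hz=500e6)
p_leak = leakage_power(i_leak_amps=0.02, vdd_volts=1.0)
p_total = p_dyn + p_leak

print(f"dynamic = {p_dyn:.3f} W, leakage = {p_leak:.3f} W, "
      f"total = {p_total:.3f} W")
```

The quadratic dependence on supply voltage is why lowering the voltage helps, while higher activity, capacitance and clock frequency push the dynamic term back up.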
Power analysis process:
Fig2: Power estimation flow for RTL and Gate level
The general practice for SoC power estimation is that the verification team runs the use case scenarios to generate the dump/VCD, and the power analysis team uses this dump/VCD to analyze power. This takes several iterations between the two teams (verification and power analysis) to obtain the dump/VCD with maximum toggling.
SoC power estimation consists of static/average power analysis and dynamic IR drop analysis.
- Average power: the power consumed over a window of time, based on the mode the device is operating in (for example, run mode or stop modes).
- Dynamic IR drop: based on the actual instantaneous peak power, not the average power.
Average power analysis:
For billion-gate SoCs, because the VCD would be very large, a toggle count format (TCF) file is generated for the entire simulation run instead of a VCD for average power analysis. The TCF covers the period from the fetch of the first reset vector to the end of the simulation, and the power calculation is then done on that TCF.
With this TCF-based process, average power data is obtained much more efficiently and faster than with the traditional VCD-based power analysis process.
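To make the idea concrete, the sketch below shows how per-net toggle counts over a simulation window can be turned into an average switching power figure. It assumes each transition dissipates half of C times Vdd squared. The dictionaries stand in for data a power tool would read from a TCF and from the netlist parasitics; their names and values are hypothetical, and real tools use more detailed library models.

```python
def average_switching_power(toggle_counts, net_cap_farads, vdd_volts, duration_s):
    """Average switching power over a window, from per-net toggle counts.

    Each toggle (0->1 or 1->0) is assumed to dissipate 0.5 * C * Vdd^2.
    """
    if duration_s <= 0:
        raise ValueError("duration_s must be positive")
    energy_joules = 0.0
    for net, toggles in toggle_counts.items():
        energy_joules += 0.5 * net_cap_farads[net] * vdd_volts ** 2 * toggles
    return energy_joules / duration_s


# Hypothetical toggle counts for a 1 microsecond window.
toggles = {"clk": 2000, "data": 400}
caps = {"clk": 10e-15, "data": 5e-15}  # farads per net, assumed values

p_avg = average_switching_power(toggles, caps, vdd_volts=1.0, duration_s=1e-6)
print(f"average switching power = {p_avg * 1e6:.2f} uW")
```

Only the counts matter here, not when each toggle happened, which is why a compact toggle summary is enough for average power.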
Dynamic IR drop Analysis:
Dynamic power is the power dissipated by transistor switching; it is caused by the charging and discharging that takes place at a given instant of time.
The above flow does not work for dynamic IR drop, because aggregate toggle counts are not sufficient; the actual transitions (to 1 or 0) are required to obtain the actual power at a given instant of time.
For dynamic IR drop, a separate TCF is generated roughly every 2000 cycles over the entire simulation, and the power tool then processes this series of TCFs. All of these TCFs are generated in a single simulation run, which takes less time and memory.
Processing these TCFs identifies the time at which maximum activity occurs. A VCD/FSDB is then generated only for this maximum-activity window, which gives an accurate IR drop.
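The window selection step can be sketched as follows. The code assumes the per-window TCFs have already been reduced to one total toggle count per window, which is a simplification; the function name and the sample counts are hypothetical.

```python
def find_peak_window(window_toggles, cycles_per_window=2000):
    """Return (index, first_cycle, last_cycle) of the busiest window.

    window_toggles[i] is the total toggle count in the i-th window.
    """
    if not window_toggles:
        raise ValueError("no windows to analyze")
    # max() returns the first index on ties, so the earliest peak wins.
    peak_index = max(range(len(window_toggles)),
                     key=window_toggles.__getitem__)
    first_cycle = peak_index * cycles_per_window
    last_cycle = first_cycle + cycles_per_window - 1
    return peak_index, first_cycle, last_cycle


# Hypothetical totals for four consecutive 2000-cycle windows.
index, first, last = find_peak_window([1200, 5400, 9800, 3100])
print(f"dump VCD/FSDB for window {index}: cycles {first}..{last}")
```

Only the returned cycle range then needs to be re-simulated with full waveform dumping.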
Static Power Analysis should be run first and all static problems should be resolved prior to running dynamic analysis.
Strategic approach to handle the challenges during power Analysis for SoCs:
At RTL/gate level: During VCD generation for power analysis, the dump becomes very large (more than ~100GB), which frequently crashes the tools because of high memory usage. Most industry-standard power analysis tools support dumps only up to about 100GB. This makes the power analysis process more time-consuming and forces multiple iterations.
Solutions:
- Break the VCD into smaller slices (VCD segmentation) and provide these to the power team. The power team can process the slices efficiently, with tool crashes becoming very rare.
- For calculating average power, use the TCF method described above.
- To minimize the dump size in gate-level simulation, generate an FSDB dump instead.
- If the SoC verification environment includes analog behavioral models, the VCD may contain real values ($VAR_REAL,$READY,$DRIVE,-b,-,r# etc.) introduced during VCD conversion, which can also crash the power analysis tool. In that case the VCD needs to be post-processed before it is delivered to the power team.
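Two of these steps, slicing a VCD by time and stripping real-valued signals, can be sketched as below. This is a simplified illustration that assumes a plain four-state VCD with one statement per line, `$enddefinitions $end` on a single line, and no `$comment` blocks in the value-change section. The function names are hypothetical, the tool-specific tokens listed above are not handled, and a production flow would normally rely on the simulator's or power tool's own utilities.

```python
def strip_real_values(lines):
    """Drop '$var real' declarations and their 'r<value> <id>' changes."""
    real_ids, kept = set(), []
    for line in lines:
        tokens = line.split()
        if len(tokens) >= 4 and tokens[0] == "$var" and tokens[1] == "real":
            real_ids.add(tokens[3])  # tokens[3] is the identifier code
            continue
        if len(tokens) == 2 and tokens[0][0] in "rR" and tokens[1] in real_ids:
            continue
        kept.append(line)
    return kept


def slice_vcd(lines, start_time, end_time):
    """Return a VCD covering (start_time, end_time], seeded with the
    signal values that were current at start_time."""
    header, state, body = [], {}, []
    in_header, now = True, 0
    for line in lines:
        text = line.strip()
        if in_header:
            header.append(line)
            if text.startswith("$enddefinitions"):
                in_header = False
            continue
        if not text or text.startswith("$"):
            continue  # skip $dumpvars / $end wrappers
        if text.startswith("#"):
            now = int(text[1:])
            if now > end_time:
                break
            if now > start_time:
                body.append(text)
            continue
        # Vector/real changes are '<value> <id>'; scalars are '<v><id>'.
        ident = text.split()[-1] if text[0] in "bBrR" else text[1:]
        if now <= start_time:
            state[ident] = text  # remember latest value before the slice
        else:
            body.append(text)
    return header + [f"#{start_time}"] + list(state.values()) + body


vcd = ["$timescale 1ns $end", "$var wire 1 ! clk $end",
       "$var wire 1 % rst $end", "$enddefinitions $end",
       "#0", "$dumpvars", "0!", "1%", "$end",
       "#5", "1!", "#10", "0!", "0%", "#15", "1!", "#20", "0!"]
print("\n".join(slice_vcd(vcd, start_time=10, end_time=15)))
```

Carrying the last known value of every signal into each slice keeps the slices independently usable, at the cost of a small snapshot at the start of each one.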
Summary:
The flow for accurate SoC power analysis is:
- Generate the VCD from RTL/GLS simulation.
- The power analysis tool processes the provided VCD/FSDB/TCF.
- Obtain the power numbers from the power analysis tool once it has processed the VCD/FSDB/TCF.
- Later, the tester team shares the power numbers measured on silicon.
- Compare the two sets of power numbers (from the GLS flow and from the tester).
- Compare the power against the expected/application power in the specification.
Thus, SoC power is calculated at gate level and measured by the tester team on silicon, and the two are compared to refine the power analysis.
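The comparison steps can be sketched as a small per-mode correlation check. The mode names, power values and the 10% tolerance below are hypothetical placeholders chosen for illustration; they are not measurements from any device.

```python
def deviation_percent(estimated_mw, measured_mw):
    """Signed deviation of the estimate from the silicon measurement, in %."""
    if measured_mw == 0:
        raise ValueError("measured power must be non-zero")
    return (estimated_mw - measured_mw) / measured_mw * 100.0


def correlate(estimates_mw, silicon_mw, tolerance_percent=10.0):
    """Map each mode to (deviation %, True if within the tolerance)."""
    report = {}
    for mode, estimated in estimates_mw.items():
        dev = deviation_percent(estimated, silicon_mw[mode])
        # Small epsilon so a deviation exactly at the limit still passes.
        report[mode] = (dev, abs(dev) <= tolerance_percent + 1e-9)
    return report


# Hypothetical gate-level estimates and tester measurements, in mW.
gls_estimates = {"run": 108.0, "stop": 1.7}
tester_numbers = {"run": 120.0, "stop": 2.0}

for mode, (dev, ok) in correlate(gls_estimates, tester_numbers).items():
    status = "within tolerance" if ok else "needs investigation"
    print(f"{mode}: {dev:+.1f}% ({status})")
```

Reporting the signed deviation per operating mode makes it easier to see whether the estimate is consistently optimistic or pessimistic.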
Conclusions and Future Work:
Power estimation has been carried out for different circuits at levels from RTL to gate level, using different power estimation tools.
These results are encouraging in terms of reduced power model complexity and feasibility across a wide range of input signal distributions. The lower complexity also reduces characterization time and estimation time considerably. The power estimates at different levels of abstraction also show how inaccurate RTL values are compared with the transistor level.
More accurate and efficient power estimation methods at RTL would be the biggest achievement, because improving the design is most feasible at this stage. They would also avoid abrupt surprises in the power numbers once silicon is available.