Optimizing clock tree distribution in SoCs with multiple clock sinks
Alberto Ferrara and Pierpaolo De Laurentiis, STMicroelectronics
Embedded.com (March 10, 2013)
In the design of high-performance high-speed integrated circuits, clock tree organization is fundamental to distribution of e-clock signals to the whole area of an integrated circuit or to a predefined part of it.
In this article we describe a structure and a method for propagating clock signals to a multiplicity of clock sink nets in a system-on-chip (SoC) design. We include an improved buffering and wiring apparatus that allows reduction of the number of clock stages, the overall latency, the clock skew, and uncertainty.
The problem of clock distribution from root (PLL) to sinks (FlipFlops) is addressed, using two phases: (1) top level optimal distribution and (2) local or block based clock distribution. A method for integrating the two phases within an automation system is also described.
The role of clock trees in complex SoCs
The growing complexity of integrated circuit design is leading to several requirements for bringing a layout to completion. Modern technology nodes (32nm and beyond) are challenging because their reduced physical geometries introduce uncertainties due to local variation (random effects) and the impact of parasitics in terms of wire capacitances and resistances.
These variations are typically called on-chip variation (OCV). There are two source classes of variation that must be considered in design: global and local. Global chip-to-chip variations cause performance differences among dies and are modeled as operating corners. Local on-chip variations cause performance differences among transistors within the same die and are modeled as an added derating factor to get skew calculations.
OCV derating is calculated as a certain percentage of the total insertion delay. Consequently, in order to optimize performance of the clock tree, designers need also to take into account structures that are inherently not prone to OCV, while minimizing the overall latency. In this context one of the things that must be considered is the impact of automated methods for the computer-aided creation of a layout of an optimized clock tree circuit. This is of crucial importance because existing software tools tend to arrange the clock tree unfavorably for OCV and latency.
Standard Clock Tree Synthesis engines are driven by timing closure and, hence, are not PVT (process/voltage/temperature) variation aware. They are used to fix setup/hold violations by adjusting the clock skew, adding, removing, and swapping buffers, or exploiting different clock wire lengths and levels and so on. As a result, the skew sensitivity with respect to PVT variations cannot be kept low, since it has several contributors originating from different physical phenomena.
To read the full article, click here
Related Semiconductor IP
- Root of Trust (RoT)
- Fixed Point Doppler Channel IP core
- Multi-protocol wireless plaform integrating Bluetooth Dual Mode, IEEE 802.15.4 (for Thread, Zigbee and Matter)
- Polyphase Video Scaler
- Compact, low-power, 8bit ADC on GF 22nm FDX
Related White Papers
- Multiple clock domain SoCs: Verification techniques
- Achieving Low power with Active Clock Gating for IoT in IPs
- Verifying Dynamic Clock switching in Power-Critical SoCs
- Asynchronous Logic: large CMOS devices without a clock tree
Latest White Papers
- Reimagining AI Infrastructure: The Power of Converged Back-end Networks
- 40G UCIe IP Advantages for AI Applications
- Recent progress in spin-orbit torque magnetic random-access memory
- What is JESD204C? A quick glance at the standard
- Open-Source Design of Heterogeneous SoCs for AI Acceleration: the PULP Platform Experience