Thorough validation: the conundrum of Pulsed latch libraries turned practical as Spinner systems
By Louvat Mathieu, Dolphin Integration
Abstract
Using pulsed latches instead of flip-flops is a solution that has been thoroughly studied for its advantages in speed, density, and power consumption reduction [1] [2]. Even so, this solution has not been widely adopted by standard cell library providers because of the difficulties related to timing verification: pulse width integrity and hold time closure. There is also a lack of EDA tools natively supporting this feature. Dolphin Integration delivers standard cell libraries based on pulsed latches (SESAME uHD libraries) that can be used in standard design flows and are fully compatible with the most common EDA tools.
Introduction
Pulsed latches (Spinner cells) are an alternative to flip-flops for synchronous data storage. Several attempts to deploy them have failed over the years, and they had not been industrialized until now. On the one hand, a flip-flop captures data on an active edge (rising or falling) of a clock signal (Fig 1).
Fig 1: Flip-flop design
On the other hand, a pulsed latch captures data during an active state (high or low) of a clock signal. It is composed of a single latch (the so-called Spinner cell) and a pulse generator (PG) that generates a pulsed clock signal (Fig 2 and Fig 3).
Fig 2: Latch design
Fig 3: Pulse generator design
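The difference between edge capture and state capture can be illustrated with a minimal behavioral sketch. It is not a model of the Dolphin Integration circuits: the function names, the sampled waveforms and the pulse width expressed in samples are assumptions made for illustration only.

```python
def flip_flop(clk, d):
    """Edge-triggered capture: Q takes D only on a rising edge of clk."""
    q, prev, out = 0, 0, []
    for c, x in zip(clk, d):
        if prev == 0 and c == 1:
            q = x
        prev = c
        out.append(q)
    return out


def pulse_generator(clk, width):
    """Emit a pulse lasting `width` samples, starting at each rising edge of clk."""
    pulse, remaining, prev = [], 0, 0
    for c in clk:
        if prev == 0 and c == 1:
            remaining = width
        pulse.append(1 if remaining > 0 else 0)
        if remaining > 0:
            remaining -= 1
        prev = c
    return pulse


def latch(enable, d):
    """Level-sensitive capture: Q follows D while enable is high, else holds."""
    q, out = 0, []
    for e, x in zip(enable, d):
        if e:
            q = x
        out.append(q)
    return out


def pulsed_latch(clk, d, width):
    """Pulse generator followed by a single latch."""
    return latch(pulse_generator(clk, width), d)


# Hypothetical waveforms: D changes shortly after the first clock edge.
CLK = [0, 1, 1, 1, 0, 0, 1, 1, 1, 0]
D = [1, 1, 0, 0, 0, 0, 0, 0, 1, 1]
```

With a pulse of one sample, `pulsed_latch(CLK, D, 1)` returns the same sequence as `flip_flop(CLK, D)`. With a pulse of three samples the latch stays transparent while D changes, so the outputs differ, which is the kind of hold problem that makes pulse width integrity critical.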
The patented Dolphin Integration Spinner system [3] [4] [5] enables the use of pulsed latches in standard SoC integration flows [6] using the tools provided by the major EDA vendors.
This article demonstrates the benefits of Dolphin Integration’s standard cell libraries based on Spinner cells and Pulse Generators, the so-called Spinner system.
The first part is dedicated to demonstrating the performance improvements compared to flip-flops. A complete validation methodology for pulse integrity is presented in the second part. Then the robustness of both the Spinner system and flip-flops is investigated using high-sigma simulations. Finally, a test-chip methodology for timing closure is given along with examples of mass production of Silicon IP using the Spinner system in various technologies.
Performance improvements with the Spinner system
Area reduction
Circuits implemented using standard cell libraries providing the Spinner system are 10 % to 20 % denser (Fig 4) than circuits implemented with flip-flop based libraries [6].
Fig 4: SoC area versus frequency on the Motu-Uta 5.1 benchmark for spinner and flip-flop based libraries at 65 nm
Power saving
As the Spinner cell is half as complex as its flip-flop counterpart, its power consumption is substantially lower, as illustrated in Table 1.
Cell | Rise power CP -> Q (pW/MHz) | Fall power CP -> Q (pW/MHz) | Internal power CP (pW/MHz) | Average leakage (pJ) |
Spinner | 4.957 | 3.801 | 2.325 | 106.354 |
Flip-flop | 6.705 | 5.469 | 3.858 | 157.96 |
Power gain of Spinner versus flip-flop | 26 % | 31 % | 40 % | 32 % |
Table 1: Power consumption (index 1x1) for flip-flop and spinner at 55 nm (TT 1.2V 25°C)
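The gain row of Table 1 can be checked with a short calculation. The sketch below assumes the gain is defined as 1 minus the ratio of the Spinner value to the flip-flop value; the dictionary keys and function names are illustrative, and only the numbers come from the table.

```python
# Values copied from Table 1 (55 nm, TT, 1.2 V, 25 °C).
TABLE_1 = {
    "rise_power_cp_q": {"spinner": 4.957, "flip_flop": 6.705},
    "fall_power_cp_q": {"spinner": 3.801, "flip_flop": 5.469},
    "internal_power_cp": {"spinner": 2.325, "flip_flop": 3.858},
    "average_leakage": {"spinner": 106.354, "flip_flop": 157.96},
}


def power_gain_percent(spinner, flip_flop):
    """Relative saving of the Spinner cell versus the flip-flop, in percent."""
    return 100.0 * (1.0 - spinner / flip_flop)


def table_gains(table=TABLE_1):
    """Compute the gain for every column of the table."""
    return {
        name: power_gain_percent(v["spinner"], v["flip_flop"])
        for name, v in table.items()
    }


if __name__ == "__main__":
    for name, gain in table_gains().items():
        print(f"{name}: {gain:.1f} %")
```

Under this definition the computed gains agree with the published percentages to within one percentage point; small differences may come from rounding in the published figures.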
To compute power savings at SoC level, the comparison needs to take into account changes in the clock tree structure linked to the Pulse Generator (called PG hereafter) insertion [6]. Due to strong pulse integrity requirements, the PG consumes much more power than a standard clock buffer. Its impact on the clock tree power consumption depends on the clock tree structure, the number of spinners and the average PG fan-out. Clock gating options also have a big impact on the clock tree structure and therefore on the power consumption of the Spinner system through the fan-out of the pulse generators.
Pulse Generator insertion is performed using a patented algorithm [3] [4], implemented in Tcl and compatible with the P&R tools provided by the major EDA vendors.
Fig 5 presents the power analysis of the clock tree, the registers and the entire SoC after the P&R step on the Motu-Uta 5.1 public benchmark, with flip-flops and with the Spinner system, in the same conditions and with clock gating options optimized for power consumption. The total power consumption of the registers is reduced by 60 %, but the power consumption of the clock tree is increased by 15 %. Overall, a power reduction of 27 % is reached thanks to the Spinner system.
Fig 5: Power analysis after P&R on the Motu-Uta 5.1 benchmark (clock period of 5ns) at 55 nm (TT @25°C)
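How the register saving and the clock tree penalty combine into an SoC-level figure depends on the share of each contributor in the flip-flop baseline. The sketch below is a simple weighted sum under two assumptions that are not stated in the article: the remaining logic consumes the same power in both implementations, and the shares are known. The shares used in the usage note are hypothetical and are not read from Fig 5.

```python
def soc_power_change(register_share, clock_tree_share,
                     register_change=-0.60, clock_tree_change=0.15):
    """Relative change of total SoC power.

    register_share and clock_tree_share are fractions of the baseline
    (flip-flop) SoC power. The default changes are the -60 % and +15 %
    figures quoted in the text. All other power is assumed unchanged.
    """
    if register_share < 0 or clock_tree_share < 0:
        raise ValueError("shares must be non-negative")
    if register_share + clock_tree_share > 1:
        raise ValueError("shares cannot exceed the total power")
    return (register_share * register_change
            + clock_tree_share * clock_tree_change)
```

For example, with hypothetical baseline shares of 50 % for the registers and 20 % for the clock tree, `soc_power_change(0.5, 0.2)` gives about -0.27, that is, a 27 % reduction. Other share combinations give other totals, which is why the clock tree structure and the PG fan-out matter.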
Spinner system validation methodology by Dolphin Integration
To take advantage of the performance improvements granted by the Spinner system, mastery of the pulse integrity is a requirement since it conditions the proper behavior of the pulsed latch. To this end, Dolphin Integration has developed a validation methodology for the Spinner system involving 2 steps: mismatch validation and 45 corners validation.
Mismatch simulations are used to validate the pulse integrity over local statistical process variations whereas the 45 corners validation checks the pulse integrity over systematic process variations.
The following simulations are performed:
- Capture of a logic “0” with min clock transition
- Capture of a logic “0” with max clock transition
- Capture of a logic “1” with min clock transition
- Capture of a logic “1” with max clock transition
Design Under Validation (DUV)
The schematic used for both validations is presented in Fig 6. Two pulse generators respectively drive their minimum and maximum fan-out, and are connected to a Spinner cell. The output of each Spinner cell is loaded with its maximum capacitance. The input pins (CP and D) are set according to the 4 cases described above.
Fig 6: Validation schematic (DUV)
Mismatch validation
The mismatch validation is composed of 3 steps:
- The determination of the worst case netlist for the PG
- The determination of the worst case netlists for the Spinner cell
- The simulation of the DUV with these netlists
This entire validation is performed with the worst and best corners for parasitic extraction and process. The worst netlist for the PG is the one that generates the smallest pulse under the worst mismatch variations. The worst netlists for the Spinner cell are determined in the same way for 2 distinct cases: the worst for the capture of a logic “1” and the worst for the capture of a logic “0”.
Any failure running the simulations with the worst netlists for each case leads to re-sizing of the Spinner cell.
45 corners validation
45 corners validation is the simulation of the DUV with five process corners (TT, FF, SS, FS, SF), three parasitic extractions (minimum, maximum and typical) and three temperatures: highest, lowest and room (Fig 7).
Fig 7: 45 corners validation
In case of simulation failure, the pulse generator and Spinner cell need to be re-sized.
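The count of 45 follows from 5 process corners × 3 parasitic extractions × 3 temperatures. A minimal sketch of the enumeration is given below; the labels are illustrative, and crossing every corner with the four capture cases listed earlier is an assumption about how the runs are organized, not a description of Dolphin Integration's scripts.

```python
from itertools import product

PROCESS = ("TT", "FF", "SS", "FS", "SF")
PARASITICS = ("min", "typ", "max")
TEMPERATURES = ("lowest", "room", "highest")

# The four capture cases: (captured logic value, clock transition).
CAPTURE_CASES = ((0, "min"), (0, "max"), (1, "min"), (1, "max"))


def corners():
    """Every (process, parasitics, temperature) combination."""
    return list(product(PROCESS, PARASITICS, TEMPERATURES))


def simulation_plan():
    """Every corner paired with every capture case (assumed organization)."""
    return [(corner, case) for corner in corners() for case in CAPTURE_CASES]


if __name__ == "__main__":
    print(len(corners()), "corners")
    print(len(simulation_plan()), "simulations")
```

Under that assumption, the plan contains 45 × 4 = 180 simulations of the DUV.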
Spinner system robustness
The validation methodology is qualified by comparing the robustness of the Spinner system with that of flip-flops using high-sigma simulations and silicon qualification.
High sigma simulations
High-sigma simulations with fast Monte-Carlo simulators have been run on a fully validated Spinner system and flip-flop, and the yields of both solutions have been compared.
The simulated schematic is based on the DUV in Fig 8.
Fig 8: High sigma netlist
The success criterion is defined as a propagation time (CP to Q) with less than 20 % variation over the characterized propagation time.
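The criterion can be written as a simple pass/fail test on each simulated sample. The sketch below assumes the 20 % bound applies symmetrically around the characterized propagation time, which the text does not specify, and it estimates yield as a plain pass fraction, whereas fast high-sigma simulators typically rely on more elaborate sampling. Function names are illustrative.

```python
def within_tolerance(tp, tp_characterized, tolerance=0.20):
    """True if tp deviates from the characterized value by less than tolerance."""
    return abs(tp - tp_characterized) < tolerance * tp_characterized


def estimated_yield(samples, tp_characterized, tolerance=0.20):
    """Fraction of simulated propagation times meeting the criterion."""
    if not samples:
        raise ValueError("at least one sample is required")
    passed = sum(
        within_tolerance(tp, tp_characterized, tolerance) for tp in samples
    )
    return passed / len(samples)
```

For instance, with a characterized time of 100 ps, hypothetical samples of 100, 110, 125 and 90 ps give three passes out of four.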
Fig 9 shows the results for the capture of a logic “0” (worst case) for both the Spinner system and a flip-flop at 55 nm.
Fig 9: High sigma simulation results
The results show that the yield of the Spinner system is identical to the yield of the flip-flop.
Silicon qualification
Many test-chips embedding the Spinner system have been designed and measured to correlate silicon with the characterization models (Liberty files) and specifically the timings.
Correlation of propagation timings
To correlate propagation timings, the frequency (F_ro) of a ring oscillator composed of tiles (Fig 10) looped with a nand gate controlled by a start signal is measured.
Fig 10: Tile of a spinner RO
From F_ro, we deduce the propagation time of a tile (Tp_meas). At the same time, we run static timing analysis (STA) to determine the simulated propagation time (Tp_sim).
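One common way to deduce a per-tile delay from a ring oscillator frequency is sketched below. It assumes that a signal edge crosses the whole loop once per half-period, that rise and fall delays are averaged, and that the NAND gate delay is either negligible or known. These assumptions and the parameter names are illustrative; the exact relation used by the author is not given in the text.

```python
def tile_propagation_time(f_ro_hz, n_tiles, t_nand_s=0.0):
    """Average propagation time of one tile, in seconds.

    f_ro_hz  : measured ring oscillator frequency (F_ro), in hertz
    n_tiles  : number of tiles in the loop
    t_nand_s : delay of the start NAND gate, in seconds (0 if neglected)
    """
    if f_ro_hz <= 0 or n_tiles <= 0:
        raise ValueError("frequency and tile count must be positive")
    half_period = 1.0 / (2.0 * f_ro_hz)  # one full traversal of the loop
    return (half_period - t_nand_s) / n_tiles
```

For example, a hypothetical 100-tile oscillator measured at 50 MHz has a half-period of 10 ns, which gives about 100 ps per tile when the NAND gate delay is neglected.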
To correlate, Tp_meas must satisfy the following equation:
Correlation of constraint timings
Constraint timings of the Spinner system are described using the “nochange” syntax. The methodology used [7] has been adapted to measure the “nochange high high fall” constraint (hold falling).
To correlate, the result (Tc_meas) must be included in the meta-stability window defined in STA (Fig 11).
Fig 11: constraint timing measurement correlation
The Spinner system is in mass production in many customer SoCs for the implementation of ultra-high density logic. It is also embedded in Virtual Components of audio codecs shCODlv-90.15 2G (180 nm), sCODS100-LB-IO-N.12 (40 nm) and sDACa-MT1.03 (28 nm), provided by Dolphin Integration, which are also in mass production in many customer SoCs.
Conclusion
Dolphin Integration has industrialized the technique of pulsed latch design for digital circuit implementation through the introduction of an innovative Spinner system, at a time when its use has become most relevant. This system has demonstrated significant performance improvements suitable for low-power and high-density applications, such as IoT, compared to flip-flop based libraries.
Bibliography
[1] Pulsed-Latch Aware Placement for Timing-Integrity Optimization by Yi-Lin Chuang, Sangmin Kim, Youngsoo Shin and Yao-Wen Chang
[2] Pulsed-Latch Circuits: A New Dimension in ASIC Design by Youngsoo Shin and Seungwhun Pai
[3] FR2963688 (A1) - 2012-02-10 - ARBRE D'HORLOGE POUR BASCULES COMMANDEES PAR IMPULSIONS
[4] US2012032721 (A1) - 2012-02-09 - CLOCK TREE FOR PULSED LATCHES
[5] FR2972087 (A1) - 2012-08-31 - Circuit de bascule commandée par impulsions
[6] Spinner System: optimized design and integration methodology based on pulsed latch for drastic area reduction in logic designs by Lionel Juré
[7] Self-Calibrate Two-Step Digital Setup/Hold Time Measurement by Luo Zhihong, Zhang Yihao, Henry Law
To find the silicon IP related to your need, contact Dolphin Integration: contact@dolphin.fr
About the Author
Louvat Mathieu joined Dolphin Integration in 2011 as a standard cell designer. He is now involved in standard cells and low-power system architectures. Mathieu holds a master’s degree in Electronics and Integrated System Design from Joseph Fourier University.
About Dolphin Integration
Dolphin Integration contributes to "enabling low-power Systems-on-Chip" for worldwide customers - up to the major actors of the semiconductor industry - with high-density Silicon IP components best at low-power consumption.
The "Foundation IP" of this offering involves innovative libraries of standard cells, register files and memory generators. The "Fabric IP" of voltage regulators, Power Island Construction Kits and their control network MAESTRO enable a flexible assembly with their loads. They especially star the "Feature IP": from high-resolution converters for audio and measurement applications to power-optimized 8 or 16 and 32 bit micro-controllers.
Over 30 years of experience in the integration of silicon IP components, providing services for ASIC/SoC design and fabrication with its own EDA solutions, make Dolphin Integration a genuine one-stop shop addressing all customers' needs for specific requests.
It is not just one more supplier of Technology, but the provider of the DOLPHIN INTEGRATION know-how!
The company strives to incessantly innovate for its customers’ success, which has led to two strong differentiators:
- state-of-the-art “panoplies of Semiconductor IP components” for high-performance applications securing the most competitive SoC architectural solutions,
- a team of Integration and Application Engineers supporting each user’s need for optimal application schematics, demonstrated through EDA solutions enabling early performance assessments.
Its social responsibility has been from the start focused on the design of integrated circuits with low-power consumption, placing the company in the best position to now contribute to new applications for general power savings through the emergence of the Internet of Things.