Does asynchronous logic design really have a future?
By Shekhar Borkar, Intel Fellow and Director of Circuit Research, Microprocessor Research Lab, Intel Corp., Hillsboro, Ore., EE Times
June 6, 2003 (4:14 p.m. EST)
URL: http://www.eetimes.com/story/OEG20030606S0037
The synchronous vs. asynchronous design debate erupts every now and then, usually when a vocal minority swears by asynchronous design, often claiming that it delivers higher performance, lower power, or both, mostly with arguments and without any supporting evidence. Since clock distribution will get even harder with future multi-GHz designs, asynchronous-design advocates boldly predict the demise of synchronous logic, to be displaced by asynchronous logic. So we will look carefully into these arguments and see whether the claims and predictions hold any water.

Design techniques such as self-timed logic often get classified as asynchronous. This is not accurate, because the logic paths are bounded by the clocks on either side of the self-timed logic, making it virtually synchronous. Modern synchronous designs contain small domains of self-timed logic where necessary, but these are well contained within the clock boundaries, so the overall system remains fully synchronous. This is not the case for true asynchronous logic. In this discussion we will focus on the true handshake-based asynchronous logic style, in which there is no clock and the logic domains create request and acknowledgment or completion signals as evaluation progresses.

One of the biggest claims to fame for asynchronous logic is that it consumes less power due to the absence of a clock. Clock power is responsible for almost half the power of a chip in a modern design such as a high-performance microprocessor. If you get rid of the clock, then you save almost half the power. This argument might sound reasonable at a glance, but it is flawed. If the same logic were asynchronous, you would have to create handshake signals, such as request and acknowledgment signals, that propagate forward and backward along the logic evaluation flow. These signals now become performance-critical, have higher capacitive load and have the same activity as the logic.
Therefore, the power saving that you get by eliminating the clock signal is offset by the power consumed in the handshake signals and the associated logic. That is why, in practice, it is not evident whether asynchronous logic really consumes less power.

The slowest logic path between the clock boundaries determines the performance of a synchronous design. That is, the clock frequency is determined by the delay of the worst-case logic path on the chip under worst-case conditions, such as high temperature and the lowest supply voltage. Asynchronous logic, on the other hand, is self-throttling. The request/acknowledgment handshake ensures proper operation by adapting to the signal delays, which can depend on temperature, supply voltage and even on the data. Therefore, the performance of an asynchronous design is determined by the average delay, not the worst-case delay.

This may sound very attractive, but it can cause several practical problems. For example, the performance is not deterministic; it depends on environmental conditions, and even on the data input. This behavior may be acceptable in certain embedded applications, but, in general, non-deterministic performance is not desired. Synchronous designs, on the other hand, exploit the extra slack available in the non-critical logic paths to make these paths slower, bringing them closer to the worst-case paths, thereby saving power and making the design more power efficient.

Hurdles to overcome

There are several other practical hurdles that need to be crossed to successfully employ asynchronous logic. Today's design tools are barely adequate for synchronous designs, let alone for asynchronous design, where tools are virtually nonexistent. You have to use synchronous tools with tricky modifications to fool them into thinking that the design is synchronous, making logic design verification, timing rollups, and race condition checks messy and involved.

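The earlier contrast between worst-case and average-case timing can be made concrete with a rough sketch. Everything in it is an assumption for illustration: the path delays, the 100 ps handshake overhead and the function names are invented, not measurements from any real design. It only shows why a synchronous design takes the same time regardless of data, while an asynchronous design's time moves with the data.

```python
# Illustrative only: all delay values below are hypothetical.
WORST_CASE_PS = 1000  # assumed worst-case path delay; it fixes the clock period


def synchronous_time_ps(num_ops, clock_period_ps=WORST_CASE_PS):
    """Every operation takes one full clock period, set by the worst-case path."""
    return num_ops * clock_period_ps


def asynchronous_time_ps(delays_ps, handshake_ps=100):
    """Each operation finishes after its own path delay plus an assumed
    request/acknowledge overhead (handshake_ps is a made-up figure)."""
    return sum(delay + handshake_ps for delay in delays_ps)


# Two hypothetical data sets exercising the same five operations.
fast_data = [400, 450, 500, 550, 1000]   # mostly short paths
slow_data = [900, 950, 1000, 1000, 1000]  # mostly near worst case

print("synchronous:       ", synchronous_time_ps(5), "ps")
print("asynchronous, fast:", asynchronous_time_ps(fast_data), "ps")
print("asynchronous, slow:", asynchronous_time_ps(slow_data), "ps")
```

With these made-up numbers the synchronous total is always 5000 ps, while the asynchronous total is 3400 ps for the favorable data and 5350 ps for the unfavorable data. The spread between those two figures is the non-deterministic behavior discussed above.
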
Debugging an asynchronous design is extremely difficult. In a synchronous design you can lower the clock frequency to see when and where it fails, and then investigate what fails and how it can be fixed. In an asynchronous design there is no such clock, and the logic must be debugged at full speed. To identify a failing logic path, you have to be a very good detective, basing your judgment mostly on circumstantial evidence.

How will you test these asynchronous logic chips? Once again, you will have to be clever to fool the testers into thinking that the logic is synchronous. Since the performance of the logic depends on environmental conditions and data patterns, performance binning of these logic chips will need a new testing methodology and paradigm.

Finally, to use an asynchronous design in a platform, you will have to make the entire platform asynchronous. Interface hardware, memory and the supporting glue logic will all have to be asynchronous; shoe-horning asynchronous logic into a synchronous platform will be inefficient, if not practically impossible.

As is evident from this discussion, you will have to cross several major hurdles in employing asynchronous logic, and all of these are surmountable. But there has to be an incentive to do so; you have to establish the benefits and make them evident. The benefits have to be substantial, not just marginal, to warrant the major paradigm change mentioned before. So far, the benefits, if there are any, are neither clear nor evident.

Multi-GHz clock distribution is getting harder, especially considering interconnect parasitics, clock skew and jitter. But designers have invented skew- and jitter-tolerant circuits, and have devised means to control clock skew and jitter. Still, future interconnect delays on a chip will be on the order of multiple clock periods, raising questions concerning the merits of the synchronous design philosophy, the effectiveness of clock distribution, and whether synchronous logic will then survive.

Asynchronous aficionados are quick to point out that GALS (globally asynchronous, locally synchronous) is the way to go. However, GALS will make the design look asynchronous when interfacing to the platform, a big hurdle as discussed before, which makes this option not so attractive.

Instead, synchronous design will respond by making a gradual transition to mesochronous design on a chip, an evolution of synchronous design in which the clock accompanies the data. Since both clock and data traverse the same interconnect path, the phase relationship is maintained, and the data is realigned to the local clock at the destination. Each logic domain will have a local clock distribution scheme where the phase relationship is critical, as it is today, but the global clock phase could be arbitrary; that is, synchronized mesochronously. This technique is well established in platforms, where the transition was made a while ago because of the difficulties of high-frequency clock distribution in a platform. You will find several mesochronous domains, such as memory subsystems, in most modern platforms.

We have not seen any clear, substantial benefits of asynchronous design. On the contrary, there are several major practical hurdles to cross in employing asynchronous design in the mainstream. Synchronous designs have the momentum and a clear evolutionary path into the future.