The 'why' and 'what' of algorithmic synthesis
Bryan Bowyer
(05/02/2005 9:00 AM EDT)
Algorithmic synthesis helps hardware designers build and verify hardware more efficiently, giving them better control over optimization of their design architecture. The starting point of this flow is a subset of pure C++ that includes a bit-accurate class library. The code is analyzed, architecturally constrained, and scheduled to create synthesizeable HDL. Verification of this RTL is also an important part of the design process.
In a traditional design flow, crafting the hardware architecture and writing VHDL or Verilog for RTL synthesis requires considerable effort. The code must follow a synthesis standard, meet timing, implement the interface specification, and function correctly.
Given enough time, a design team is capable of meeting all these constraints. However, deadlines imposed by time to market pressures often force designers to compromise in area by re-using blocks and IP that are over-designed for their application.
Five years ago Mentor Graphics came up with rough criteria for making high-level synthesis compelling for real-world design. First, in order to justify the switch to a new flow, a designer needs to be able to build verified RTL an order of magnitude faster using high-level synthesis over manual methods.
So if a designer is currently building and verifying 1,000 gates per day, the same designer needs to be able to build and verify 10,000 gates per day to justify a methodology shift. Second, the design has to meet the same performance with no more than a 10% increase in area.
Why behavioral synthesis didn't work
Next we analyzed why behavioral synthesis had failed. Designers expected they could use wait statements to define their interface protocol. Unfortunately, a wait statement does not convey enough information to schedule a design without violating the interface protocol. There is nothing in the source language to prevent the time added by behavioral synthesis from breaking the interface protocol.
In an attempt to address this problem, new constraints were created to define interface timing, and the result was a synthesis language that fully defines interface timing. Unfortunately, interest in behavioral synthesis tools faded as designers realized wait statements could not be the only source of timing constraints. The wait statements in all behavioral languages, including SystemC, suffer from this drawback.
The timing constraints developed for behavioral synthesis, however, can be used for designs without putting timing or parallelism in the source. Removing timing and parallelism from the source language is what separates first-generation (behavioral) high-level synthesis from second-generation (algorithmic) high-level synthesis.
An algorithmic synthesis tool has a concise I/O timing constraint language that is separate from the functionality of the source. This allows the functionality and design timing to be developed and verified independently.
American National Standards Institute (ANSI) C++ is probably the most widely used design language in the world. It incorporates all the elements to model algorithms concisely, clearly and efficiently. A class library can then be used to model bit-accurate behavior. And C++ has many software design and debugging tools that can now be re-used for hardware design.
New system modeling languages such as SystemC or SystemVerilog can also be used, but this means teaching everyone a new language. The hierarchy and parallelism in these languages can be generated by a tool that synthesizes sequential C++ and that allows companies to take advantage of abstract system modeling without teaching every designer a new language. Designers are also able to quickly change the structure of the entire system without re-writing their source code.
As in RTL synthesis, the analysis of the design is just as important as the generated hardware because it is not helpful to know that a design is too slow without knowing how to fix it. Most high-level synthesis tools have a combined datapath and FSM viewer that allows quick analysis of the design. Certain interactive data from the tool, while it is running, is also useful so the designer does not have to wait for a full synthesis run to know if the design will meet all constraints.
Constraints for algorithmic synthesis
Synthesis constraints for the architecture can then be applied based on the design analysis. Separating design intent from functionality avoids the tedious process of rewriting and retesting code to make architectural changes. These constraints can be broken into hierarchy, interface, memory, loop and low-level timing constraints.
Hierarchy constraints allow the sequential design to be separated into a set of hierarchical blocks, and define how those blocks run in parallel. The interface constraints define the transaction-level communication, pin-level timing and flow control in the design.
Memory constraints allow the selection of different memory architectures, both within blocks and in the communication between blocks. Loop constraints are used to add parallelism to each block in the design, including pipelining. Finally, low-level timing constraints are available if needed. Once the design is constrained, it can be scheduled.
At the core of every high-level synthesis tool is a scheduler. Once all the architectural constraints are selected, the scheduler applies the constraints to create a fully timed design. The scheduler is responsible for meeting all the timing constraints, including the clock period.
One of the biggest conceptual changes between RTL synthesis and algorithmic synthesis is that the design is not written to run at a specific clock speed. Rather, the high-level synthesis tool builds a design based on the clock speed constraint. Many tools claim to be high-level synthesis tools, but without a scheduler, they are merely translators and much less powerful than high-level synthesis tools.
Once the scheduler has added time to the design, the RTL output can be generated. The RTL generation involves extracting the datapath, control and FSM for the design. Now the design needs to be verified. An algorithmic synthesis flow is shown in Figure 1.
Figure 1 — C-based design flowchart
An algorithmic synthesis tool knows the full detail about the timing and structure that is added to a design. This means the tool can also allow the original untimed C++ testbench to be re-used for every output from the tool. In other words, the untimed testbench can be re-used for every architecture created from a C++ design. Once it is verified the design can be run through a standard RTL synthesis flow.
At last year's DAC, Mentor launched Catapult C Synthesis. Today, the product is most widely used in the design of wireless communication devices. Video and image processing is the second most common type of application.
Algorithmic synthesis is the first approach to allow practical hardware synthesis from sequential languages to timed RTL. By starting with C++, the same language can now be used for software, hardware and system modeling. By allowing more optimization options, hardware designers consistently achieve better results than hand-coded designs in days rather than months for many datapath intensive designs. Now the same goals must be met for all designs to fulfill the challenges of the next 10-15 years.
Bryan Bowyer is a technical marketing engineer in Mentor Graphics' High-level Synthesis Division. In 1999, he joined Mentor as a developer and has been responsible for advances in interface synthesis technology and design analysis.
Related Semiconductor IP
- Root of Trust (RoT)
- Fixed Point Doppler Channel IP core
- Multi-protocol wireless plaform integrating Bluetooth Dual Mode, IEEE 802.15.4 (for Thread, Zigbee and Matter)
- Polyphase Video Scaler
- Compact, low-power, 8bit ADC on GF 22nm FDX
Related White Papers
- Why VAD and what solution to choose?
- Colibri, the codec for perfect quality and fast distribution of professional AV over IP
- Paving the way for the next generation of audio codec for True Wireless Stereo (TWS) applications - PART 5 : Cutting time to market in a safe and timely manner
- Automotive Architectures: Domain, Zonal and the Rise of Central
Latest White Papers
- Reimagining AI Infrastructure: The Power of Converged Back-end Networks
- 40G UCIe IP Advantages for AI Applications
- Recent progress in spin-orbit torque magnetic random-access memory
- What is JESD204C? A quick glance at the standard
- Open-Source Design of Heterogeneous SoCs for AI Acceleration: the PULP Platform Experience