C link remains frayed in hardware design flows
Ron Wilson
(10/17/2005 10:00 AM EDT)
Since before RTL and synthesis, C has been a part of chip design. In fact, before C, a good deal of basic design work, from algorithm exploration to analog simulation to logic optimization, was done on time-share systems in languages such as Basic. But that's another story.
This story is about C: how it is actually being used today in hardware design, its strengths and weaknesses in these roles, and what lies ahead for the venerable programming language. Rather than ask vendors of C-based tools, who understandably might have some systematic error in their data, we have spoken with designers who are really using the stuff. You might find their stories informative.
In the beginning, C was used much like Basic before it: to solve simple computing problems for which there were no ready tools at hand, be they circuit-parameter calculations, algorithm simulations or shell scripts. But as the nascent EDA industry filled in the gaps in the RTL design flow, C became increasingly specialized as a way of quickly implementing, simulating and exploring algorithms, especially data flow-intensive algorithms. C was and remains cumbersome for dealing with control-oriented algorithms, since its control constructs are so different from those used in hardware.
"Our charter is to take algorithms from other groups, analyze them and implement them, usually in FPGAs," said Dennis McCain, research manager at the Nokia Research Center. "Typically, we will start with a Matlab model of an algorithm. We will then convert that to fixed-point C code, which allows us to explore the algorithm and its hardware implication more efficiently."
This is not an uncommon use of C. "We generally explore signal-processing algorithms in C," said Upendra Patel, CTO of eInfoChips. "For many algorithms, we do architectural work directly from C. We find that engineers who are expert at signal processing can identify the correct partitioning of algorithms into hardware and software from that level. In some cases, this gives us better results than working with more-detailed implementations of the algorithm, especially if the engineer is more familiar with C than with the application under investigation."
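To make the fixed-point step concrete, here is a minimal sketch of what converting a floating-point algorithm to fixed-point C can look like. It is not taken from any of the teams quoted here; the Q15 format, the function names (to_q15, q15_mul, q15_dot) and the rounding and saturation choices are all assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define Q15_ONE 32768.0  /* assumed format: 1.0 maps to 2^15 */

/* Convert a floating-point value to Q15, rounding to nearest and
   saturating at the limits of a 16-bit signed word. */
int16_t to_q15(double x)
{
    double scaled = x * Q15_ONE;
    scaled += (scaled >= 0.0) ? 0.5 : -0.5;
    if (scaled > 32767.0)  return INT16_MAX;
    if (scaled < -32768.0) return INT16_MIN;
    return (int16_t)scaled;
}

/* Multiply two Q15 values. The 32-bit product is in Q30; add half an
   LSB and shift back to Q15. The shift assumes an arithmetic right
   shift for negative values, which is implementation-defined in C. */
int16_t q15_mul(int16_t a, int16_t b)
{
    int32_t p = (int32_t)a * (int32_t)b;
    p = (p + (1 << 14)) >> 15;
    if (p > INT16_MAX) p = INT16_MAX;  /* only -1.0 * -1.0 overflows */
    return (int16_t)p;
}

/* Dot product of n samples and coefficients, the core of an FIR filter.
   The sum is kept at full precision and rounded once at the end. */
int16_t q15_dot(const int16_t *x, const int16_t *h, size_t n)
{
    assert(x != NULL && h != NULL);
    int64_t acc = 0;
    for (size_t i = 0; i < n; ++i)
        acc += (int32_t)x[i] * (int32_t)h[i];
    acc = (acc + (1 << 14)) >> 15;
    if (acc > INT16_MAX) acc = INT16_MAX;
    if (acc < INT16_MIN) acc = INT16_MIN;
    return (int16_t)acc;
}
```

Word widths, rounding and saturation are explicit in code like this, so the hardware cost of a choice such as a wider accumulator shows up in the model before any RTL exists.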
A dialect of C can also be useful in modeling the movement of data through chips, as QLogic has found. "We use SystemC for architectural modeling of data flows through switch chips," said Tom Paulson, principal engineer there. "Working at a more-abstract level than individual cycles, we can see the impact of critical things like steering algorithms and buffer sizes on the overall chip. We see basically the same architectural effects as we would with an RTL model, but with greater execution speed."
Beyond exploration
But investigating algorithms is far from the end of the story. Once an algorithm is represented in C, a number of other valuable possibilities present themselves. Partitioning into hardware and software can be very straightforward. It may be possible, as McCain suggested, to infer directly from the C code the characteristics of the final hardware. And the C code, which is, in effect, a fast-executing behavioral model of the block, may be useful directly, or with some modification, in the verification process.
This last point is important in eInfoChips' design flow. "We use the C model from architectural exploration as the golden behavioral model for verification," Patel said.
In fact, C is so efficient at expressing the behavior of digital blocks (at least from a programmer's point of view) that it plays a key role in some verification flows even when it's not used at the architectural level.
"We just don't do a lot of behavioral modeling on the front end," said David Crohn, director of engineering at Broadcom. "Most of our designs represent incremental changes to existing designs, so most of the IP [intellectual property] is already available in RTL. We can drop it into an emulation system and do high-speed modeling of the block's behavior directly from the RTL."
But C does play a central role in testbench development for these blocks. "Our designers typically develop their testbenches in C and then pass those into the Verilog verification environment," said Russell Vreeland, senior principal engineer at Broadcom. "As the block is reused and evolves, designers make changes to the testbench as well. There is a very disciplined revision-control environment here."
QLogic also writes data generators and checkers for its SystemC models. But its engineers do not use the native constructs in SystemC to do this. The appeal of SystemC in the first place, Paulson suggested, was that design team members could leverage their existing knowledge of C to quickly get reliable models of elapsed time and concurrency in complex operations, without having to learn the whole SystemC language (or find a new implementation of the language for each extension they tried). So it makes perfect sense to write in plain C the constructs that SystemC provides natively but that would otherwise have to be learned in the new language.
System estimation
Once one has a functional model, or even a working embodiment of an algorithm, in C, it is tempting to attempt to estimate from the C code something about the characteristics of the completed chip. It is also tempting to dismiss this thought out of hand. After all, the structure of the C code has no necessary connection to the structure of the final hardware, so on what basis would one make predictions?
Certain predictions, though, can be made almost without the use of tools. A well-crafted C model can already begin to reveal the structural partitioning of the final chip. From this partitioning, a clear picture can emerge of the number of blocks, their inputs, outputs and interconnections, and, less obviously, their spatial relationship to each other.
But is it possible to infer more-detailed information about the blocks themselves? Some designers say yes, at least in some cases. Nokia's McCain, for instance, uses Mentor's Catapult C not so much as a synthesis tool, but as a design estimation tool. "The tool output for us is usually just the area and speed report on the synthesized blocks," McCain said. "We have found that these estimates match very well to the best synthesis results we will eventually get from our RTL designs of the same blocks. That allows us to explore the design space with an eye on silicon implications, with RTL conversion and synthesis removed from the loop. That way we get each individual estimate much faster, so we can explore further."
But getting accurate design estimates has its price. The C code has to be written and structured in a way that makes sense for the design, and to the synthesis or estimation tools. And the approach won't work for all types of blocks.
"We have found this technique is definitely better for data flow-centric blocks," McCain said. "Algorithms like FFTs and matrix operations tend to give good estimates. We consciously choose to transfer our algorithms from Matlab to an algorithmic subset of the language."
Writing in a subset of C turned out to be a good deal easier than adopting a different version of the language-like SystemC-that would require engineers to learn a new dialect, McCain said.
Even within the bounds of normal C, there are things that shouldn't be written. There are, for instance, known problems with memory allocation in the C code. "The process requires intelligent partitioning to give best results," McCain said.
Ideally, one could write out the algorithms and verify them in a consistent subset of C, and then move that code directly into the verification process to serve as a reference behavioral model. Ideally, one could even synthesize the C directly to a netlist and bypass RTL altogether. In fact, both of these steps are possible, sometimes. In using C models in the verification process, the key word is "behavioral." Since the C code is likely to be without any sense of timing or simultaneity, it will be necessary to have some rock-solid definition of the end of a process, to know when it is meaningful to compare the timing-centric simulation or emulation results against the output of the C model that may be at best transaction-based.
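The end-of-process problem can be sketched in a few lines of C. The example below assumes the simulation or emulation run has been captured as one record per clock cycle, each carrying a "valid" flag that marks a completed result; the dut_sample type and the compare_to_golden function are hypothetical names, not part of any tool mentioned here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One record per clock cycle of the timed simulation or emulation run. */
typedef struct {
    int     valid;  /* nonzero when the hardware marks a result complete */
    int32_t data;   /* output value, meaningful only when valid is set */
} dut_sample;

/* Compare a cycle-by-cycle trace against the untimed C model's results.
   Cycles without the valid flag are skipped, so only completed
   transactions are checked. Returns the number of mismatching results,
   or -1 if the two sides produced a different number of results. */
long compare_to_golden(const dut_sample *trace, size_t ncycles,
                       const int32_t *golden, size_t ngolden)
{
    assert(trace != NULL && golden != NULL);
    size_t g = 0;
    long mismatches = 0;
    for (size_t c = 0; c < ncycles; ++c) {
        if (!trace[c].valid)
            continue;            /* not a transaction boundary */
        if (g >= ngolden)
            return -1;           /* hardware produced extra results */
        if (trace[c].data != golden[g])
            ++mismatches;
        ++g;
    }
    return (g == ngolden) ? mismatches : -1;
}
```

The checker never asks on which cycle a result appeared, only in what order results completed, which is all a transaction-based C model can vouch for.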
One way around this problem is to impose some sort of timing information on the C model. This may be done by creating a cycle-tracking mechanism in the C code, organizing functions into groups that will occur simultaneously in the hardware and updating the cycle counter each time the group of functions has been traversed. But this adds considerable complexity to the C programming task, pulling it into the much more specialized realm of hardware modeling. Again, the results are more likely to be useful on data paths than in control structures.
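A cycle-tracking mechanism of this kind might look like the following sketch, which models a hypothetical two-stage multiply-accumulate pipeline. The mac_model structure and its functions are invented for illustration. The two stages form one group: both read the register values from the previous cycle, all registers then update together, and the cycle counter advances once per traversal.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* State of a hypothetical two-stage multiply-accumulate pipeline. */
typedef struct {
    int32_t  prod_reg;  /* register between the multiply and add stages */
    int32_t  acc_reg;   /* accumulator register */
    uint64_t cycle;     /* modeled clock cycles elapsed since reset */
} mac_model;

void mac_reset(mac_model *m)
{
    assert(m != NULL);
    m->prod_reg = 0;
    m->acc_reg = 0;
    m->cycle = 0;
}

/* Advance the model by one clock cycle. Next-state values are computed
   from the current registers first and committed afterward, so the two
   stages behave as if they ran simultaneously. Overflow of the 32-bit
   accumulator is not handled in this sketch. */
void mac_step(mac_model *m, int16_t a, int16_t b)
{
    assert(m != NULL);
    int32_t next_prod = (int32_t)a * (int32_t)b;   /* stage 1: multiply */
    int32_t next_acc  = m->acc_reg + m->prod_reg;  /* stage 2: old product */
    m->prod_reg = next_prod;
    m->acc_reg  = next_acc;
    m->cycle++;
}
```

Even in this tiny case, the product fed in on one cycle reaches the accumulator only on the next, so the model needs an extra step to flush the pipeline. That is the kind of bookkeeping that pulls the C programmer toward hardware modeling.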
If using C models in the verification process is valuable, many design managers see the idea of direct behavioral synthesis as even more attractive. Or, if that is out of reach, simply converting C into synthesizable RTL would be a big help. Most teams today still appear to convert C models to RTL by hand. "We continue to evaluate C-to-RTL conversion tools," said eInfoChips' Patel. "We look at the quality of the results, the amount of hand-coding necessary, things like that. There is a lot of demand for a good conversion tool."
For instance, he said, "in video or imaging applications, you can model the algorithms in C, but you know the final hardware will have to use some hardware acceleration in order to meet its requirements. It would be good to be able to simply partition the C code to separate out the blocks that will require acceleration, and then convert those blocks directly to RTL."
But many engineers believe the existing tools aren't quite ready for prime time. "We've kicked the tires on C-to-RTL conversion, but the quality of the tools is just not there," said Broadcom's Crohn. "Power, area and performance are all critical to us, and we would take a significant hit on those design metrics if we relied on automatic conversion. When we do a design, we are looking for the best possible chip, not for the easiest design."
There are, however, teams that use C synthesis successfully. Usually they tend to be doing primarily data paths, are often targeting FPGAs rather than ASICs or COT designs, and have learned to be very careful about the C code with which they start.
Frank Mayer, group manager for digital IC design at Fraunhofer Institute, said that his group routinely synthesizes C code for use in FPGAs, and occasionally for ASICs. The primary application is to move an algorithm that has been investigated and tuned in C onto an FPGA platform for extensive hardware emulation. Fraunhofer uses this approach primarily for signal-processing blocks. "In principle the tool also works for control logic," Mayer said. "But it tends to produce rather high overhead, and debugging becomes quite difficult. So we synthesize the signal-processing blocks individually and then tie them together with handwritten RTL to provide the control, monitoring and debug links."
Such a flow requires partitioning with forethought. If the C code has been written by a hardware-skilled engineer specifically for synthesis, Mayer said, the process is extremely straightforward, and gives consistently first-time-right results. If the code was written by programmers for exploration purposes, "it can require some modifications to the C," he said.
Right flavor
For instance, the synthesis tool does not care for the dynamic memory-management techniques or pointer arithmetic so beloved of C programmers. In fact, it does not take to any kind of unconstrained pointer. "Normally this just requires a walk-through of the code, changing some standard programming techniques to things that we know are more workable," Mayer said. "Recently we did this for a student-written MP3 decoder. It only took about a day to bring the fixed-point libraries and the code into our environment."
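The kind of walk-through Mayer describes can be illustrated with a small sketch. It is not Fraunhofer's code: the moving_sum type, the WINDOW size and the function names are assumptions, and which constructs a particular synthesis tool accepts varies from tool to tool. The idea is to replace heap allocation and roaming pointers with statically sized storage and bounded index arithmetic.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define WINDOW 4  /* hypothetical window size, fixed at compile time */

/* Exploration-style code, shown for contrast only:
 *     int32_t *buf = malloc(n * sizeof *buf);
 *     for (p = buf; p < buf + n; ++p) sum += *p;
 * The size is known only at run time and the pointer can roam freely. */

/* Synthesis-minded version: all storage has a fixed size, and every
   access is an array index that stays within 0..WINDOW-1. */
typedef struct {
    int32_t taps[WINDOW];  /* circular buffer of the last WINDOW inputs */
    int32_t sum;           /* running sum of the buffer contents */
    uint8_t head;          /* index of the oldest sample */
} moving_sum;

void moving_sum_init(moving_sum *s)
{
    assert(s != NULL);
    for (int i = 0; i < WINDOW; ++i)  /* fixed trip count */
        s->taps[i] = 0;
    s->sum = 0;
    s->head = 0;
}

/* Push one sample and return the sum of the last WINDOW samples. */
int32_t moving_sum_push(moving_sum *s, int32_t x)
{
    assert(s != NULL);
    s->sum += x - s->taps[s->head];   /* add newest, drop oldest */
    s->taps[s->head] = x;
    s->head = (uint8_t)((s->head + 1) % WINDOW);
    return s->sum;
}
```

With the buffer size and loop bounds fixed at compile time, a tool can in principle map taps to registers or a small memory and head to a counter, which it cannot do for a buffer whose size is known only at run time.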
The importance of starting with the right flavor of C code cannot be overemphasized. Indeed, some teams start with RTL and work backward to C, a tactic that nearly guarantees the C code used in system modeling will be constrained to the realities of the RTL. In designs where much of the IP already exists as verified RTL, it may make sense to generate C-level models from the RTL for chip-level behavioral modeling. The time spent on conversion may be more than made up by the time saved in exploring the chip-level system in C rather than Verilog.
At least one vendor, Tenison Design Automation, provides a tool that can automatically generate cycle-accurate C models from RTL. In this case, the tool completely flattens the RTL during the conversion process and inserts cycle-tracking code into the package. This makes automatic synthesis impractical, so any results found during system exploration that require changes to the underlying RTL blocks will necessitate some careful revision management. And there is a dearth of RTL-vs.-C checking tools; Tenison had to create its own.
Clearly, C in hardware design is here to stay. The early stages of architectural and algorithmic design would be unthinkable without it today. Yet moving from C to RTL or to netlist-level data is much more problematic. It can work with reasonable results for signal-processing blocks, and it can make good sense for FPGA implementation, where, as Fraunhofer's Mayer put it, "a difference of 10 percent in size is not relevant." But whether C conversion can eventually go beyond this level is still an unanswered question.
The first question to ask is why C should be such a ubiquitous tool in hardware design in the first place. It is certainly not suited to the task, either by history or by nature. Originally, as any septuagenarian programmer knows, C came from the same Bell Labs culture that produced, more or less concurrently, Unix. The hardware available to engineers at Bell Labs in those days was primarily PDP-11s. And in fact, C was originally conceived as a means of generating PDP-11 assembly code quickly.
This is not to confuse C with high-level languages (HLLs), which were actually quite advanced by that time. The direction of HLL work in those days was to find formalisms in which algorithms could be expressed precisely and concisely, eliminating by grammatical rules and by construction the most common errors in computer programming.
The result of this work ranged from languages that emphasized elegance and concision at the expense of all else, such as APL, to those that emphasized clarity and correctness at the expense of elegance. Among the latter category were Algol, Pascal and PL-1.
In a hurry
C attempted nothing of the sort. It was intended as a compromise: an attempt to sharply reduce coding time while maintaining, as much as possible, the clever tricks that a generation of assembly language PDP-11 programmers had learned. Time proved that approach to be a brilliant compromise. The point that the Bell Labs folks understood, but that mainstream language developers missed, was that programmers didn't really care about clarity, correctness or even, in most cases, elegance. They wanted to be cute, and they were in a hurry.
So what does this have to do with hardware? Two coincidences. First, a coincidence of birth: In an abstract sense, both modern RTL design and PDP-11 assembly language are register-transfer languages. There are huge differences, with one describing how data is transformed as it flows between registers under the control of state machines, and the other describing how data is taken from registers, transformed and replaced under control of a sequential program. But fundamentally the two are describing similar formalisms.
The second coincidence involved the decision of Bell Labs to grant to the academic community very easy license access to Unix. At a time when PDP-11s were inexpensive platforms but DOS-11 and VAX VMS were expensive production operating systems, this caused Unix, and with it C, to spread like an airborne virus throughout the academic community in the United States. The result: Fortran and Basic died, and every engineering student in North America learned C.