Virtual system prototypes speed multiprocessor design

Graham Hellestrand
(05/30/2005 9:00 AM EDT)

By today's standards, early microprocessor-based systems were simple, not least because they typically employed only a single processor (possibly with a few co-processors, such as a floating-point co-processor) with a relatively simple instruction set running at a low clock frequency. This processor communicated with a small number of comparatively simple memory and peripheral devices by means of a single 8-bit or 16-bit data bus with a simple read/write and signaling protocol.

Those days have long gone. There is currently a tremendous growth in the development of systems that involve tens or hundreds of complex processors and hardware accelerators in closely coupled or networked topologies. In addition to tiered memory structures and multi-layer bus structures, these supersystems — which may be executing hundreds of millions to tens of billions of instructions per second — feature extremely complex software components, and this software content is currently increasing almost exponentially.

Aggressive competition makes today's electronics markets extremely sensitive to time-to-market pressures. This is especially true in consumer markets such as cell phones, where the opportunity for a new product to make an impact can sometimes be as little as two to four months. However, a recent report showed that more than 50 percent of embedded system developments run late, while 20 percent either fail to meet their requirements specifications or are cancelled in their entirety.1

The problem is that, in conventional system development environments, hardware design precedes software development. This sequential process simply cannot support the development of today's supersystems. This article first introduces examples of supersystems and outlines the problems presented by increasing system size and complexity.

The concept of architecture-driven design based on the use of virtual system prototypes (VSPs) is then discussed as a potential solution. Finally, a productivity, development time, and risk comparison is made between the back-end engineering resource loading associated with the conventional environment and the front-end loading resulting from the architecture-driven, VSP-based methodology.

Today's supersystems

In some respects, the term "supersystem" may be misleading, because it may cause some readers to imagine a physically large implementation. Actually, a supersystem is often realized on a single system-on-chip (SoC) device.

For example, a modern cell phone may contain an SoC comprising several general-purpose central processing units (CPUs), and one or two digital signal processing (DSP) units, controlling 40 or more peripheral devices providing control functions, multimedia functions, 2D and 3D graphics functions, crypto functions, camera interfaces, and a variety of other interfaces such as WiFi and USB.

The DSPs, together with their associated accelerator devices, provide a variety of baseband processing, filtering, modulation, and decoding functions. Having multiple cores allows a broader range of processing traffic to be handled in real time, which is a critical requirement for many of today's applications.

Moving away from the handheld portion of the wireless network, the base stations controlling wireless communications systems are themselves a hierarchy of closely-coupled multi-processor systems. For example, a typical base station capable of executing billions of instructions per second can comprise five to 20 major subsystems and more than 100 individual processors.

In addition to multiprocessor implementations, today's supersystems employ tiered memory structures. Some of the memory elements will be tightly coupled to individual processing engines by means of dedicated busses, other memory subsystems may be local to a cluster of processing engines, and yet other memory units may be shared between multiple groups of processing engines. Each of these memory subsystems may have different speed requirements, different bus widths, and use different clock domains.

In today's supersystems, different processing engines can have separate buses for control, instructions, and data, and each of these complex buses can feature a wide variety of structures and protocols. In addition to the general-purpose processor buses, there may be a variety of dedicated peripheral buses, tightly-coupled memory buses, external memory buses, and shared memory buses.

Many of these buses will feature pipelined structures with multiple transaction requests and responses scheduled in the pipeline. The bus system also may employ sophisticated cross-bar switches that can attempt multiple read and write operations simultaneously.

Even the average modern car contains from 20 to 80 processors performing a huge range of tasks and executing several hundred million to several billion instructions per second (Figure 1).


Figure 1 — The electronics content of automobiles is growing at an ever-increasing rate2.

These automotive processing engines typically communicate through two to four distinct networks, each of which uses its own protocols with varying packet transit times, bus bandwidths, and failure tolerances.

And that's only the hardware. Meanwhile, the software content of today's supersystems is increasing at an almost exponential rate, to the extent that software development and test now tend to dominate the costs, timelines, and risks associated with these systems.

For example, a GSM phone circa 2005 may contain 2 million lines of code, and software accounts for about 70% of the engineering effort. This represents a dramatic increase from only a few years ago, and the trend will continue. The software content of the typical cell phone is expected to rise tenfold, to 20 million lines of code, by 2007 to 2008.

Problems with conventional development environments

The software associated with today's supersystems requires a significant amount of time and resources to develop, integrate, and debug. In the case of conventional development environments, however, hardware design must be finished before software development can even begin (Figure 2a).

This means that, after the overall architecture of the system has been determined, the hardware is designed, and a hardware prototype is constructed. Only then can the operating system and middleware be installed, integrated, and tested, following which the application software is developed, ported, integrated, and debugged.

There are several problems with the conventional methodology. The interactions between the hardware, the device drivers, the operating system, the middleware, and the applications software in multiprocessor supersystems are now so complex that there is little chance of achieving a first-time working system using a sequential approach.

Instead, critical hardware-software interactions and failures will almost invariably require substantial modifications to both the software and hardware. For example, in the case of embedded systems, 40 percent of chip designs must be re-spun (re-designed, tested, and manufactured).3

Furthermore, using a conventional sequential development environment, it is not until a working system is available — 85% through the development process — that it is possible to fully determine whether the system architecture can meet its required data processing and communications performance and bandwidth goals. This is an unacceptable project risk.

Architecture-driven, VSP-based design

In order to satisfy the requirements for architecting and implementing today's supersystems, the development environment must address several major issues:

  • Because software is on the critical development path for supersystems, it is no longer viable to wait for hardware to become available before commencing software development.
  • Both software and hardware are undergoing phenomenal increases in generational complexity, but software complexity is increasing at a higher rate.
  • Natural language specification documents comprise thousands of pages which take months to write, and this time adds to the design period. Specification documents are inadequate for expressing and communicating the requirements unambiguously for complex supersystem developments with short design cycles.

One solution is to use a virtual system prototype (VSP), which is a functionally accurate and timing-accurate software model of the entire system. Unlike instruction set simulator (ISS)-based prototype solutions, which typically achieve sub-MIPS performance, today's best VSPs are based on simulation engines that offer both high performance and timing accuracy.

For example, a single-processor VSP-based simulation can achieve between 50 and 200 MIPS, while multi-processor systems with tiered memory structures and multi-level buses can be simulated at 10 to 100 MIPS per processor, depending on the configuration.
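To put these simulation speeds in perspective, the short calculation below estimates how much wall-clock time a simulator needs to cover one second of target execution. It is a back-of-the-envelope sketch only: the function name, the 1-billion-instructions-per-second workload, and the 0.5 MIPS figure chosen to stand in for "sub-MIPS" ISS performance are illustrative assumptions, not measurements from any tool.

```python
def slowdown(target_ips: float, simulator_mips: float) -> float:
    """Wall-clock seconds needed to simulate one second of target execution."""
    return target_ips / (simulator_mips * 1e6)


# Assumed workload: a target that executes 1 billion instructions per second.
TARGET_IPS = 1e9

# Assumed simulator speeds; 0.5 MIPS is a stand-in for "sub-MIPS" ISS performance.
SIMULATORS = [
    ("ISS at 0.5 MIPS", 0.5),
    ("VSP at 50 MIPS", 50.0),
    ("VSP at 200 MIPS", 200.0),
]

for label, mips in SIMULATORS:
    seconds = slowdown(TARGET_IPS, mips)
    print(f"{label}: {seconds:,.0f} s of wall-clock time per simulated second")
```

Under these assumptions, one second of target execution costs roughly 2,000 seconds on the sub-MIPS simulator but only 5 to 20 seconds on the VSP, which is the difference between running a real software workload being impractical and being routine.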

These levels of accuracy and performance mean that VSP-based environments can support the concept of architecture-driven design. First, VSPs by nature support the rapid design of experimental systems and then facilitate architectural exploration and evaluation by supporting rapid iteration of the hardware and software that define the system. Accurate measurements of systems that model real-world behavior under real-world data processing and software workloads allow system architects to make accurate decisions as well as hardware/software tradeoffs early in the design process.

Furthermore, the ability of the VSP to run real software workloads yields invaluable information in guiding the system architects to an optimal architecture. This information includes:

  • Bus contention and bandwidth utilization
  • Processor capabilities and utilization
  • Working set match/mismatch to cache size and mapping policies
  • Latency in system response to internal and external events
  • Correlation of external, hardware, and software conditions and events
  • Transaction and algorithm throughput
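As an illustration of the first item in this list, the sketch below derives bus utilization from a transaction trace. The trace format (a list of start and end cycles, with the end cycle exclusive) and the function name are hypothetical; a real VSP environment would expose this information through its own instrumentation, which is not described here.

```python
from typing import List, Tuple

# Hypothetical trace record: (start_cycle, end_cycle), end exclusive.
Transaction = Tuple[int, int]


def bus_utilization(transactions: List[Transaction], total_cycles: int) -> float:
    """Fraction of cycles in which the bus carried at least one transaction.

    Overlapping transactions (as on a pipelined bus) are merged so that
    a busy cycle is counted only once.
    """
    busy = 0
    current_start = current_end = None
    for start, end in sorted(transactions):
        if current_end is None or start > current_end:
            # Gap before this transaction: close the previous busy interval.
            if current_end is not None:
                busy += current_end - current_start
            current_start, current_end = start, end
        else:
            # Overlaps or abuts the current busy interval: extend it.
            current_end = max(current_end, end)
    if current_end is not None:
        busy += current_end - current_start
    return busy / total_cycles


# Assumed example trace: two overlapping transfers, then an isolated one.
trace = [(0, 40), (30, 60), (100, 120)]
print(f"Bus utilization: {bus_utilization(trace, total_cycles=200):.0%}")
```

For the assumed trace, the bus is busy for 80 of 200 cycles, or 40 percent. Running the same measurement under different candidate bus widths or arbitration schemes is the kind of comparison that architectural exploration relies on.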

Once an optimal architecture has been determined, the VSP model assumes the mantle of the executable specification. This means that the hardware design teams can use the VSP as a "golden reference model" against which they can verify the functionality of the hardware portions of the design.

But perhaps the most significant advantage of the VSP is that the software development teams can commence work as soon as the system architecture is established — six to 12 months earlier than with sequential engineering processes — and the hardware and software portions of the design can be developed concurrently (Figure 2b).


Figure 2 — High-level comparison of conventional versus architecture-driven, VSP-based supersystem design processes.

Front-end versus back-end loading

By their very nature, conventional (sequential) design environments are back-end loaded in terms of engineering resource requirements and risk. By comparison, a VSP-based, architecture-driven design methodology is front-end loaded, which serves to reduce the peak resources deployed, reduce the total resources deployed, reduce the project risk factors, and shorten the total development time.

For example, consider the resource requirements, risk factors, and development times associated with conventional versus architecture-driven, VSP-based design processes for developing a multiprocessor for a 2.5G mobile phone project (Figure 3). Each graph shows plots for engineering resource deployment over the course of the project for:

  • Architecture evaluation, development, and specification
  • Hardware design and development
  • Software/firmware design and development
  • System integration (including verification and debug)
  • The overall project (summing all of the above activities)

Each vertical gridline represents a 6-week period. As is reflected by this illustration, the total development time using the conventional (sequential) process was eighteen months. This design flow consumed 800 person weeks, with a peak deployment of 146 engineers in the fifteenth month.

By comparison, the architecture-driven, VSP-based design flow, which allowed the concurrent development of the hardware and software, consumed only 520 person weeks with a peak deployment of 97 engineers around the fifth and sixth months. Furthermore, the total development time using the new flow was only 13 months (a 5-month reduction in time-to-market, which equals a 5-month speed-up in time-to-revenue).
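The relative savings implied by these figures can be checked with a few lines of arithmetic. The sketch below uses only the numbers quoted above; the dictionary keys and the helper name are illustrative.

```python
# Figures quoted in the text for the 2.5G phone project.
conventional = {"person_weeks": 800, "peak_engineers": 146, "months": 18}
vsp_based = {"person_weeks": 520, "peak_engineers": 97, "months": 13}


def reduction_pct(before: float, after: float) -> float:
    """Percentage reduction going from `before` to `after`."""
    return 100.0 * (before - after) / before


for key in conventional:
    pct = reduction_pct(conventional[key], vsp_based[key])
    print(f"{key}: {conventional[key]} -> {vsp_based[key]} ({pct:.1f}% lower)")
```

This works out to roughly 35 percent less total effort, about 34 percent lower peak staffing, and about 28 percent less elapsed time.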


Figure 3 — Detailed comparison of conventional methodology versus architecture-driven, VSP-based design process for a multiprocessor 2.5G phone

The comparison becomes particularly interesting when we consider the risk associated with each design flow. In the conventional (sequential) flow, for example, it is intuitively obvious that deploying peak engineering resources so close to the end of the project increases project risk.

To quantify this, risk factors were calculated for both design flows using the square root of a function of two parameters: the overall standard deviation of the process and the inverse square of the fraction of the project still to be completed at the time of maximum resource deployment. The end result was a risk factor of 40 for the conventional flow compared to a risk factor of only 10 for the architecture-driven flow.
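The exact function is not given here, so the following is only a sketch of how such a risk factor might be computed. It assumes that the two parameters are simply multiplied, that the conventional peak falls at month 15 of 18, and that the VSP-based peak falls at month 5.5 of 13; the function names are hypothetical, and the standard deviation values used in the original calculation are not reported.

```python
import math


def fraction_remaining(peak_month: float, total_months: float) -> float:
    """Fraction of the project still to be completed at peak deployment."""
    return (total_months - peak_month) / total_months


def risk_factor(sigma: float, peak_month: float, total_months: float) -> float:
    """Assumed form: sqrt(sigma * 1 / f**2), with f the fraction remaining.

    The product of the two parameters is an assumption; the text only says
    the risk is the square root of *a function* of them.
    """
    f = fraction_remaining(peak_month, total_months)
    return math.sqrt(sigma * (1.0 / f ** 2))


# Timing term only (inverse square of the fraction remaining).
for label, peak, total in [("conventional", 15.0, 18.0), ("VSP-based", 5.5, 13.0)]:
    f = fraction_remaining(peak, total)
    print(f"{label}: fraction remaining = {f:.2f}, inverse square = {1.0 / f ** 2:.1f}")
```

With these assumed peak positions, the timing term alone is 36 for the conventional flow and about 3 for the VSP-based flow. The published risk factors of 40 and 10 also depend on the unreported standard deviations, so this sketch is not expected to reproduce them; it only shows why a late resource peak dominates the result.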

Thus, in summary, the fundamental outcome of designing today's supersystems using a VSP-based, architecture-driven design flow is better products, created in less time, for less money, and with less risk.

Graham Hellestrand is founder, chairman and chief technology and strategy officer of Vast Systems Technology Corporation. Dr. Hellestrand is an emeritus professor of computer science at the University of New South Wales, Australia.

References

1 Embedded Software Development Issues and Challenges, by J. Krasner, Embedded Market Forecasters, 2003.

2 Design Process Changes Enabling Rapid Development, by Frank Winters, Carsten Mielenz, and Graham Hellestrand, presented at Convergence 2004, copyright by the Convergence Transportation Electronics Association.

3 Source: Collett International
