OpenAccess: first impressions at AMD
Ward Vercruysse, Yuri Apanovich, Weiqing Guo
(09/05/2005 9:00 AM EDT)
Design teams are being squeezed from all sides. On the supply side, relentless process scaling enables bigger and more complex designs with multiple cores, wider and faster caches, deeper pipelines and smarter branch predictors.
But this progress comes at a cost. Nanometer manufacturing technologies are more susceptible to process variations, with less ideal transistor behavior and more complex metal stacks. Furthermore, critical design resources, such as timing, power and area, are now scarce; design margins continually shrink; and trade-offs involve more design variables. Manufacturing technology is refined more frequently, but simply scaling an existing design linearly is increasingly ineffective, so substantial redesign is required more often.
Meanwhile, on the demand side, the market expects a continuation of Moore’s Law with smaller, faster and cheaper products available in more compressed delivery cycles. Market segments proliferate, demanding targeted solutions for each segment, and market demands change more rapidly.
In the processor business, for example, one has to meet the needs of high performance servers as well as thin and light mobile notebooks. In short, AMD’s design teams need to deliver more variants of more complex processors on less predictable silicon, on shorter notice, and in less time.
To illustrate the complexity of our current designs, here are some statistics for the Opteron, shown in Figure 1. It comprises over 200 million transistors. The physical representation of its nine metal layers requires 2 billion polygons. Creating one chip revision takes 10 terabytes of data, and completing verification of a single version of the design takes tens of billions of simulation cycles and hundreds of thousands of useful CPU days per month.
The lure of OpenAccess
Addressing these design challenges requires highly flexible and efficient design environments. For the foreseeable future, our design environment will continue to be a mix of internal and commercial tools. Consequently, we prefer to work with industry standards, and we believe OpenAccess has critical mass and is an emerging standard.
The benefits of OpenAccess as an electronic design automation infrastructure have been well documented [1, 2]. As a database with an application programming interface (API), it eliminates the information loss that can occur with file translation and improves interoperability between tools. Whether or not OpenAccess becomes a formal industry standard, it is attractive as a database and a data model for three reasons.
First, as designs grow larger, we need a database to meet today’s design challenges. Besides offering better versioning and better multi-user and multi-site support, a database has become necessary to uphold one of our key design principles: the cycle time for evaluating a design change should be proportional to the size of the change rather than the size of the design.
With ASCII files, there is no good way to partition all of the design information. That leaves two bad choices: opening and scanning large files to extract or change a tiny bit of design information, or managing and opening a huge number of small files.
Second, because of deep submicron effects, combined with the need for more aggressive design margins, most tools need a broader variety of information. For instance, the IR tool can use logic and temporal relationships between signals to reduce IR drop pessimism, and the timing tool can use the results of IR drop analysis to reduce timing pessimism. As we found out the hard way, without a comprehensive data model it is easy to spend a great deal of time repeatedly reassembling design views from various sources and dealing with name mappings and tool idiosyncrasies.
Third, OpenAccess is attractive as an open source standard. It allows us to leverage technology developed by a wide variety of technical partners without forcing us to relinquish control of our destiny, which matters because we consider design methodology to be a core competence.
First steps
Before we seriously evaluated OpenAccess, we verified that a few necessary conditions were met. The terms of the Si2 license agreement were acceptable, and AMD became a member of Si2. The installation of OpenAccess 2.2 went as smoothly as could be expected. Initial impressions of capacity and performance were positive, and the data model and its extensibility looked promising.
Digging deeper
For our experiment, we wanted to build a real tool that we could deploy and gain immediate benefits from. The initial goal was to measure the capacity, performance, quality and completeness of the data model. Equally important, we wanted to gain better insight into the cost of achieving the benefits described above. In short, we wanted to answer these questions:
- Once you are an OpenAccess expert, does this infrastructure help you to build a tool faster?
- How much performance and memory overhead is there in using more generalized design data structures tuned for persistent storage versus data structures tuned for the specific problem at hand?
- And lastly, how good is OpenAccess at dealing with incomplete and inconsistent design data? This is important because design data is typically incomplete and inconsistent during design iterations, yet tools need information from various local and global sources that usually generate this data at different times during the process. Good CAD tools can deal with imperfect data.
An experiment
We decided to write an OpenAccess-based replacement for our resistance-capacitance (RC) parasitic network stitcher. This tool reads parasitic information extracted separately from abutted blocks and generates minimal subnetworks that enable accurate noise analysis of all nets spanning one or more blocks. It has to handle feedthroughs, and it must cope with imperfect data because the extracted information might be older than the current logic netlist.
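Conceptually, the connectivity side of stitching amounts to grouping block-level net fragments that meet at shared boundary pins. The C++ sketch below illustrates that idea with a union-find (disjoint-set) structure; it is not AMD's implementation. The `block:net` naming and the `FragmentStitcher` class are hypothetical, and the sketch only groups fragments into nets; building the actual RC subnetworks is out of scope. A feedthrough shows up naturally as a fragment connected to fragments in two neighboring blocks.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical sketch: group block-local net fragments (named "block:net")
// into global nets that span abutted blocks, using union-find.
class FragmentStitcher {
public:
    // Register a fragment if it is new and return its integer id.
    int addFragment(const std::string& name) {
        auto it = ids_.find(name);
        if (it != ids_.end()) return it->second;
        int id = static_cast<int>(names_.size());
        ids_.emplace(name, id);
        names_.push_back(name);
        parent_.push_back(id);
        return id;
    }

    // Record that two fragments meet at a shared boundary pin.
    void connect(const std::string& a, const std::string& b) {
        int ra = find(addFragment(a));
        int rb = find(addFragment(b));
        if (ra != rb) parent_[rb] = ra;
    }

    // Return each stitched net as the list of fragments it contains.
    std::vector<std::vector<std::string>> nets() {
        std::map<int, std::vector<std::string>> groups;
        for (int i = 0; i < static_cast<int>(names_.size()); ++i)
            groups[find(i)].push_back(names_[i]);
        std::vector<std::vector<std::string>> result;
        for (auto& group : groups) result.push_back(std::move(group.second));
        return result;
    }

private:
    // Find the representative of x, halving the path along the way.
    int find(int x) {
        assert(x >= 0 && x < static_cast<int>(parent_.size()));
        while (parent_[x] != x) {
            parent_[x] = parent_[parent_[x]];
            x = parent_[x];
        }
        return x;
    }

    std::map<std::string, int> ids_;
    std::vector<std::string> names_;
    std::vector<int> parent_;
};
```

For example, connecting `blkA:n1` to `blkB:n7` and `blkB:n7` to `blkC:n2` yields one net spanning three blocks (with `blkB:n7` acting as a feedthrough), while a fragment that is never connected remains a block-local net.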
The existing flow is shown in Figure 2. The Classic Stitcher reads interconnect parasitics from Standard Parasitic Exchange Format (SPEF) files produced by a third-party parasitic extractor. It also reads the logic netlist and layout information to find matching pins.
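Because the extracted data can be older than the netlist, pin matching should classify mismatches rather than abort on them. The following sketch assumes the pin names have already been collected into two sets; the `reconcilePins` function and `PinMatch` struct are hypothetical, not part of the Classic Stitcher. It separates pins found in both sources from those that appear only in the extraction (possibly stale) or only in the netlist (not yet extracted).

```cpp
#include <algorithm>
#include <cassert>
#include <iterator>
#include <set>
#include <string>
#include <vector>

// Hypothetical result of comparing pin names from two sources.
struct PinMatch {
    std::vector<std::string> matched;        // present in both sources
    std::vector<std::string> extractedOnly;  // in SPEF only: possibly stale
    std::vector<std::string> netlistOnly;    // in netlist only: not yet extracted
};

// Classify pins instead of failing on the first mismatch, so the tool can
// keep working with imperfect data and report what it could not match.
PinMatch reconcilePins(const std::set<std::string>& extracted,
                       const std::set<std::string>& netlist) {
    PinMatch m;
    std::set_intersection(extracted.begin(), extracted.end(),
                          netlist.begin(), netlist.end(),
                          std::back_inserter(m.matched));
    std::set_difference(extracted.begin(), extracted.end(),
                        netlist.begin(), netlist.end(),
                        std::back_inserter(m.extractedOnly));
    std::set_difference(netlist.begin(), netlist.end(),
                        extracted.begin(), extracted.end(),
                        std::back_inserter(m.netlistOnly));
    // Every extracted pin is either matched or reported as extraction-only.
    assert(m.matched.size() + m.extractedOnly.size() == extracted.size());
    return m;
}
```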
The Classic Stitcher then writes the subnetworks in a netlist format that our internally developed noise analysis tool accepts. It was implemented in Perl and carefully designed for capacity and speed: it pre-processes the SPEF files and builds indexes, keeping a minimum amount of data in memory.
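An index of this kind can be sketched as follows. The sketch assumes an uncompressed SPEF file in which each net section starts with a `*D_NET <name> <total_cap>` line and ends with `*END`; it ignores the `*NAME_MAP` indirection that many SPEF files use, and the `SpefIndex` class is hypothetical rather than the Perl tool's actual design. One sequential pass records the byte offset of each section, after which a single net can be read with one seek, so a lookup costs roughly the size of that net rather than the size of the file.

```cpp
#include <cassert>
#include <fstream>
#include <sstream>
#include <string>
#include <unordered_map>

// Hypothetical SPEF offset index: remembers where each *D_NET section starts
// so individual nets can be fetched later without rescanning the file.
class SpefIndex {
public:
    explicit SpefIndex(const std::string& path) : path_(path) {
        std::ifstream in(path_, std::ios::binary);
        assert(in.is_open() && "cannot open SPEF file");
        std::string line;
        std::streampos pos = in.tellg();  // offset of the line about to be read
        while (std::getline(in, line)) {
            // A section header looks like "*D_NET <net_name> <total_cap>".
            if (line.rfind("*D_NET ", 0) == 0) {
                std::istringstream fields(line.substr(7));
                std::string name;
                if (fields >> name) offsets_[name] = pos;
            }
            pos = in.tellg();
        }
    }

    // Return the text of one "*D_NET ... *END" section, or "" if not indexed.
    std::string section(const std::string& net) const {
        auto it = offsets_.find(net);
        if (it == offsets_.end()) return "";
        std::ifstream in(path_, std::ios::binary);
        in.seekg(it->second);
        std::string line, text;
        while (std::getline(in, line)) {
            text += line + "\n";
            if (line.rfind("*END", 0) == 0) break;
        }
        return text;
    }

    std::size_t size() const { return offsets_.size(); }

private:
    std::string path_;
    std::unordered_map<std::string, std::streampos> offsets_;
};
```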
The OpenAccess-based tool is not a direct replacement. Because we wanted the new tool to fit the OpenAccess paradigm, we built a more general RC parasitic toolkit in C++. The new flow is shown in Figure 3.
The OA-based Stitcher accesses logical connectivity, hierarchy, layout geometry and parasitic information from the OpenAccess database through an API. It can traverse any net through the different hierarchical levels of a design, and can query the database to find the driver and receivers as well as all the nets that couple to it.
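The coupled-net query can be pictured with plain containers. In this hypothetical sketch, which does not use the OpenAccess API, coupling capacitors are stored as net pairs with a value in femtofarads, and the illustrative `aggressorsOf` function sums the coupling capacitance from a victim net to each neighboring net.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical coupling capacitor between two nets (value in fF).
struct CouplingCap {
    std::string netA;
    std::string netB;
    double capFf;
};

// For a victim net, sum the coupling capacitance to each aggressor net.
// Plain containers only; this is not the OpenAccess API.
std::map<std::string, double> aggressorsOf(const std::string& victim,
                                           const std::vector<CouplingCap>& caps) {
    std::map<std::string, double> total;
    for (const CouplingCap& c : caps) {
        assert(c.capFf >= 0.0 && "coupling capacitance must be non-negative");
        if (c.netA == victim && c.netB != victim) total[c.netB] += c.capFf;
        else if (c.netB == victim && c.netA != victim) total[c.netA] += c.capFf;
    }
    return total;
}
```

The linear scan keeps the sketch short; a database would instead index couplings by net, so that the query touches only the relevant capacitors.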
The OA-based Stitcher builds the coupled RC parasitic network associated with that net in OpenAccess, but currently does not store it persistently in the database. An export module translates the result into the format our downstream tools expect.
Our goal is to evolve to the flow shown in Figure 4, where the parasitic extractor will populate an OpenAccess database instead of writing out a SPEF file, and the noise analysis tool will read the stitched RC networks from OpenAccess as well.
One of the benefits of OpenAccess is that it maintains concurrent views of a design’s logical structure (the module domain), unfolded hierarchy (the occurrence domain) and physical implementation (the block domain). This enables applications to analyze flat data and tie this easily to the module hierarchy.
In contrast, a SPEF file contains only parasitic information. So, to infer hierarchy, for example, our application needs to parse net names and search for hierarchical delimiters, and then map that back to the hierarchical netlist description.
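The name parsing described above can be sketched as follows. SPEF declares its hierarchy separator in the `*DIVIDER` header and uses `\` as an escape character, so an escaped divider belongs to a name rather than marking a level of hierarchy. The `splitHierName` function is a hypothetical illustration, not AMD's code; mapping the resulting instance path back to the hierarchical netlist remains a separate step.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Split a flat SPEF net name into hierarchy levels at the divider declared by
// *DIVIDER (commonly '/'). A divider preceded by the escape character '\' is
// part of a name. The last element is the net; the rest is the instance path.
std::vector<std::string> splitHierName(const std::string& name, char divider = '/') {
    assert(divider != '\\' && "divider must differ from the escape character");
    std::vector<std::string> parts;
    std::string current;
    for (std::size_t i = 0; i < name.size(); ++i) {
        char ch = name[i];
        if (ch == '\\' && i + 1 < name.size()) {
            current += ch;          // keep the escape so the name round-trips
            current += name[++i];   // the escaped character is literal
        } else if (ch == divider) {
            parts.push_back(current);
            current.clear();
        } else {
            current += ch;
        }
    }
    parts.push_back(current);
    return parts;
}
```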
Results
The benchmark dataset comprised compressed SPEF files representing 33 blocks from the 90 nm Opteron. Together, the files were about 5 GB in size and described the RC parasitics of 1.8 million logical nets.
In brief, the project went well. One point worth noting was the issue of selecting the "right" OpenAccess version, trading off new features in advanced releases versus robustness of more mature production releases. In the end, we needed to switch versions to work around a SPEF parser bug. While we applaud progress in the form of newer and better versions, quality of releases should be consistent.
On the positive side, it turned out that loading the OpenAccess database went faster than expected: DEF information loaded at 13MB/minute, while SPEF information loaded at 17MB/minute.
The following table summarizes the comparison between the two stitchers.
Although lines of code are not a very good measure, the result was largely what we expected: C++ is somewhat verbose, whereas many Perl programmers tend to pack a great deal into a single line. The time to reach beta quality was a bit longer than expected. We had thought that starting from the OpenAccess data model, without having to parse ASCII files, would be a clear advantage, but it wasn’t.
The memory usage, shown in Figure 5, was higher than we had expected, but well within bounds. In the benchmark, all blocks ran within the available 4 GB of physical memory, so the difference does not matter to us as long as it does not affect run time. Moreover, although the OA-based Stitcher has a fixed 2 GB overhead, its incremental memory consumption rises more slowly than that of the Classic Stitcher.
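The trade-off between a fixed overhead and slower incremental growth can be made concrete with a simple model. Assume, purely for illustration, that memory grows linearly with block size n, where F is a tool's fixed overhead (F_OA is roughly 2 GB, per the figures above) and a is its incremental memory per unit of block size. The linear form is our assumption, and the slopes and the Classic Stitcher's fixed cost are not reported here, so the crossover size n* cannot be computed from this data.

```latex
\begin{aligned}
M_{\mathrm{OA}}(n) &= F_{\mathrm{OA}} + a_{\mathrm{OA}}\,n, \qquad
M_{\mathrm{C}}(n) = F_{\mathrm{C}} + a_{\mathrm{C}}\,n,\\
M_{\mathrm{OA}}(n) < M_{\mathrm{C}}(n)
  &\iff n > n^{*} = \frac{F_{\mathrm{OA}} - F_{\mathrm{C}}}{a_{\mathrm{C}} - a_{\mathrm{OA}}}
  \quad (\text{when } a_{\mathrm{OA}} < a_{\mathrm{C}}).
\end{aligned}
```

Under this model, the fixed overhead dominates for small blocks, while the smaller slope favors the OA-based Stitcher once blocks grow past n*.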
As expected, the generalized OpenAccess data structures required more disk space than the Classic Stitcher’s direct-access approach. The difference is within a reasonable range, however, especially since this larger design representation does not seem to degrade performance.
Performance did surprise us. The Classic Stitcher took nine hours to complete all blocks, whereas the OA-based Stitcher took only three, a threefold speedup overall. The on-demand loading of data, a major feature of OpenAccess, must be working very well.
Figure 6 shows that the OA-based Stitcher is always at least 2.5 times faster than the Classic Stitcher. Even more important, the performance gap grows with the size of the block. This is much better than we expected.
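On-demand loading in general can be illustrated with a small cache that reads a block's data only the first time it is requested. This sketch illustrates the general technique under our own assumptions and does not describe how OpenAccess implements it; `LazyBlockCache` and its loader callback are hypothetical.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <unordered_map>
#include <utility>

// Hypothetical on-demand loader: a block's data is produced by the loader the
// first time it is requested and served from the cache afterwards.
template <typename Data>
class LazyBlockCache {
public:
    using Loader = std::function<Data(const std::string&)>;

    explicit LazyBlockCache(Loader loader) : loader_(std::move(loader)) {
        assert(loader_ && "a loader function is required");
    }

    // Return the block's data, invoking the loader only on first access.
    // References stay valid because unordered_map never moves its elements.
    const Data& get(const std::string& block) {
        auto it = cache_.find(block);
        if (it == cache_.end()) {
            ++loads_;
            it = cache_.emplace(block, loader_(block)).first;
        }
        return it->second;
    }

    // Number of times the loader has actually been called.
    int loads() const { return loads_; }

private:
    Loader loader_;
    std::unordered_map<std::string, Data> cache_;
    int loads_ = 0;
};
```

A run that touches nets in only some blocks pays only for those blocks, which is one plausible reason an on-demand design could pull further ahead as block size grows.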
Final thoughts
OpenAccess is here, and it works.
This was only one experiment, and more are needed before we make the leap, but this was an important one. We were able to build a better and faster tool in the same amount of time, and there was only a little memory and disk overhead to pay. Moreover, the overhead was mostly a fixed cost. OpenAccess seems to scale very well.
We plan to expand the RC toolkit and perform more rigorous analysis to understand how OpenAccess behaves. We will stretch the tool’s capabilities and find out how hard it is to accommodate more incomplete and inconsistent data.
So far, so good.
References
1. Scott Makinen, Alva Barney, Rick Ferreri, Jim Wilmore, “How Infineon implemented OpenAccess.”
Ward Vercruysse is an AMD Fellow who manages the CAD team of AMD's microprocessor design unit in Sunnyvale, Calif. Yuri Apanovich and Weiqing Guo are members of technical staff at AMD.