SoC silicon is first-time success following simulation and validation of novel array processor
By Luc Martel and Denny Wong, EE Times
January 17, 2003 (3:02 p.m. EST)
URL: http://www.eetimes.com/story/OEG20030117S0036
Faced with the prospect of developing custom logic for a new, highly parallel processor architecture for a multimedia system on chip (SoC), an R&D team needed a strategy to accurately model, design and develop the chip and to implement efficient algorithms for it. By using an ASIC design flow for the IC development and building an accurate software simulation early on, the team achieved working first-pass silicon with an application running in software.
The SoC development required the custom design of an array processor (AP) optimized for video processing and compression algorithms. It also required the development of optimized algorithms to execute on the array processor. This had the potential to create a chicken-and-egg situation: discovering late in development that the algorithms require a fundamental change in the hardware architecture, or that the implementation of the hardware requires a change in the structure of the algorithms.
To avoid this problem, a simulator and a set of software development tools were developed at the beginning of the project. These were critical to the timely success of the system development involving the parallel design of the SoC and algorithms. The simulator provided a platform for tying together the design of the array processor and algorithms, co-development of the software and SoC, and finally tying together a system simulation of the IC and algorithms that could be tested on an FPGA platform.
The array processor core development involved the design of custom logic for computational units tightly coupled with memory and a custom array controller. The SoC development involved the integration of this custom core with other hard cores such as a RISC processor and peripherals. To give as much flexibility as possible in the selection of physical back-end design tools and service providers, and to minimize uncertainty in the development schedule, the team decided to keep the IC design flow as close as possible to an ASIC flow, characterizing the fully custom logic so it could be incorporated as if it were standard IP. This made it possible to bring the custom logic into the synthesis process and to use static timing analysis to detect any timing issues related to the interaction between custom logic and synthesized RTL. An accurate representation of the custom logic was later placed with other synthesizable RTL on an FPGA for cycle-accurate testing and system testing with software.
An instruction-accurate simulated version of the array processor was implemented in software. This simulator was precise enough to accurately profile the AP algorithms and to model the performance expected from the chip.
The simulator was designed with the necessary flexibility to represent the components of the AP design as parameters. Doing so made it possible to change design parameters such as the number of registers per computation unit, to adjust the pipelining depth and to change the number and the arrangement of the computation units. This made it easier to design the processor to optimally fit the software algorithm requirements and to profile the performance of algorithms.
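As a minimal sketch of what such a parameterized, instruction-accurate model might look like, the Python below treats the number of computation units and registers per unit as configuration values and counts instructions for a simple vector add. All names (`APConfig`, `ArrayProcessorSim`, `instructions_for_vector_add`) and the lock-step execution model are assumptions made for illustration; the article does not describe the team's actual simulator internals.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class APConfig:
    """Hypothetical design parameters for the simulated array processor."""
    num_units: int = 8       # computation units in the array
    regs_per_unit: int = 4   # registers in each computation unit
    pipeline_depth: int = 3  # recorded only; an instruction-accurate model ignores it


class ArrayProcessorSim:
    """Toy instruction-accurate model: every unit executes the same instruction."""

    def __init__(self, config: APConfig):
        self.config = config
        # One register file per computation unit, zero-initialised.
        self.regs = [[0] * config.regs_per_unit for _ in range(config.num_units)]
        self.instructions_executed = 0

    def load(self, reg: int, values: list) -> None:
        """Write one value per unit into register `reg`."""
        if len(values) != self.config.num_units:
            raise ValueError("need exactly one value per computation unit")
        for unit_regs, value in zip(self.regs, values):
            unit_regs[reg] = value
        self.instructions_executed += 1

    def add(self, dst: int, a: int, b: int) -> None:
        """Compute dst = a + b in every unit."""
        for unit_regs in self.regs:
            unit_regs[dst] = unit_regs[a] + unit_regs[b]
        self.instructions_executed += 1

    def read(self, reg: int) -> list:
        """Return register `reg` from every unit."""
        return [unit_regs[reg] for unit_regs in self.regs]


def instructions_for_vector_add(n_elements: int, config: APConfig) -> int:
    """Count the instructions issued to add two n-element vectors."""
    sim = ArrayProcessorSim(config)
    width = config.num_units
    for start in range(0, n_elements, width):
        chunk = min(width, n_elements - start)
        pad = [0] * (width - chunk)  # idle units receive zeros
        sim.load(0, list(range(start, start + chunk)) + pad)
        sim.load(1, [1] * chunk + pad)
        sim.add(2, 0, 1)
    return sim.instructions_executed
```

Calling `instructions_for_vector_add(64, APConfig(num_units=8))` and then repeating with `num_units=16` shows how a single parameter change alters the instruction count for the same workload, which is the kind of comparison the paragraph above describes.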
The software simulator was also the tool used to develop and refine the processor's instruction set. It was designed to make it possible to include new instructions programmed by an algorithm designer. The instruction set was then defined to be flexible enough to allow the programming of any kind of algorithm. It was also designed to optimize critical parts of the software affecting performance and efficiency by implementing specialized instructions to perform very specific tasks. Some critical parts of video encoding algorithms were then programmed at the microcode level to maximize performance. This was done without continuously affecting the hardware design since it was possible for the designer to test the validity of the idea in simulation before proposing hardware changes.
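One way a simulator can let an algorithm designer add instructions is a registry that maps mnemonics to small functions. The sketch below is an assumption about how that could be done, not the team's design; the `absdiff_acc` instruction is a hypothetical example of a specialized operation (accumulating absolute differences is a common step in video motion estimation).

```python
from typing import Callable, Dict, List, Tuple

# A mnemonic maps to a function that updates one unit's register file in place.
Instruction = Callable[[List[int], int, int, int], None]
INSTRUCTION_SET: Dict[str, Instruction] = {}


def instruction(mnemonic: str) -> Callable[[Instruction], Instruction]:
    """Decorator that registers a new instruction under `mnemonic`."""
    def register(fn: Instruction) -> Instruction:
        INSTRUCTION_SET[mnemonic] = fn
        return fn
    return register


@instruction("add")
def _add(regs: List[int], dst: int, a: int, b: int) -> None:
    regs[dst] = regs[a] + regs[b]


@instruction("sub")
def _sub(regs: List[int], dst: int, a: int, b: int) -> None:
    regs[dst] = regs[a] - regs[b]


@instruction("abs")
def _abs(regs: List[int], dst: int, a: int, b: int) -> None:
    regs[dst] = abs(regs[a])  # operand b is unused


@instruction("absdiff_acc")
def _absdiff_acc(regs: List[int], dst: int, a: int, b: int) -> None:
    # Hypothetical specialised instruction: dst += |a - b| in one step.
    regs[dst] += abs(regs[a] - regs[b])


def run(program: List[Tuple[str, int, int, int]], unit_regs: List[List[int]]) -> int:
    """Execute `program` on every unit; return the number of instructions issued."""
    for mnemonic, dst, a, b in program:
        op = INSTRUCTION_SET[mnemonic]
        for regs in unit_regs:
            op(regs, dst, a, b)
    return len(program)


# Register layout assumed here: r0 = a, r1 = b, r2 = scratch, r3 = accumulator.
GENERIC = [("sub", 2, 0, 1), ("abs", 2, 2, 0), ("add", 3, 3, 2)]
SPECIALISED = [("absdiff_acc", 3, 0, 1)]
```

Running `GENERIC` and `SPECIALISED` on the same register contents should leave the same accumulator value while issuing three instructions versus one, so the benefit of a proposed instruction can be measured in simulation before any hardware change is requested.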
The simulation also provided a platform to develop and debug algorithms before the actual hardware was ready. A debugger was developed to work with the simulator to allow developers to step through code and provide debugging capabilities. For validation, test cases that could be executed on an FPGA platform were also generated using the software simulator.
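The two uses described above, stepping through code and generating test cases, can share one mechanism. The following sketch assumes a two-instruction toy machine, 16-bit registers and a plain-text vector format; none of these details come from the article.

```python
from typing import Iterator, List, Tuple

Program = List[Tuple[str, int, int, int]]


def step(program: Program, regs: List[int]) -> Iterator[Tuple[int, str, List[int]]]:
    """Yield (pc, mnemonic, register snapshot) after each instruction executes."""
    for pc, (op, dst, a, b) in enumerate(program):
        if op == "add":
            regs[dst] = regs[a] + regs[b]
        elif op == "sub":
            regs[dst] = regs[a] - regs[b]
        else:
            raise ValueError(f"unknown instruction {op!r} at pc={pc}")
        yield pc, op, list(regs)  # copy, so later steps do not alter the snapshot


def make_test_vectors(program: Program, initial_regs: List[int]) -> List[str]:
    """Record the expected register state after each instruction as text lines.

    Values are masked to 16 bits and printed in hex; both the width and the
    line format are assumptions made for this sketch.
    """
    lines = []
    for pc, op, snapshot in step(program, list(initial_regs)):
        regs_hex = " ".join(f"{value & 0xFFFF:04x}" for value in snapshot)
        lines.append(f"{pc:04d} {op} {regs_hex}")
    return lines
```

A debugger front end could call `next()` on the `step` generator to advance one instruction at a time, while `make_test_vectors` consumes the same generator to produce expected values that a hardware test bench could compare against.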
To speed up the algorithm development process, the R&D team developed its own programming language, a compiler and a linker. Standard assemblers are not rich enough to provide parallel abstraction. So instead of making the algorithm designer responsible for managing these concepts, they were embedded into the compiler, and a set of rules for the designer was implicitly included in the language, abstracting the parallelism away from the algorithm designer. This greatly increased the pace of development at the expense of the relatively small initial effort of designing, implementing and testing the compiler.
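To illustrate the idea of hiding parallelism in the compiler, the sketch below lowers a single element-wise statement into per-chunk array instructions so the algorithm designer never writes the chunking loop. The instruction names, operand layout and the `compile_elementwise` function are hypothetical; the article does not describe the team's language or compiler.

```python
from typing import List, Tuple

Instr = Tuple[object, ...]


def compile_elementwise(op: str, dst: str, a: str, b: str,
                        length: int, num_units: int) -> List[Instr]:
    """Lower `dst = a <op> b` over `length` elements to array instructions.

    Each emitted tuple is (mnemonic, operands...). Memory instructions carry
    the element offset and the number of active units for that chunk.
    """
    if op not in {"add", "sub"}:
        raise ValueError(f"unsupported operation {op!r}")
    if num_units <= 0:
        raise ValueError("num_units must be positive")
    program: List[Instr] = []
    for offset in range(0, length, num_units):
        active = min(num_units, length - offset)  # last chunk may be partial
        program.append(("load", "r0", a, offset, active))
        program.append(("load", "r1", b, offset, active))
        program.append((op, "r2", "r0", "r1"))
        program.append(("store", dst, "r2", offset, active))
    return program
```

For example, `compile_elementwise("add", "c", "a", "b", 10, 4)` emits three chunks of four instructions each, with the final chunk using only two units. The designer states what to compute; the chunking and the handling of the partial last chunk stay inside the tool.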
The RISC API was also implemented in simulation and the RISC code was developed and tested using a standard C/C++ compiler before the hardware was developed. In this model, the array processor was linked as a static library and commands called through API calls. This was done to test the data flow. In the final FPGA validation effort, there were only small synchronization issues between RISC and array processor and the R&D team was able to identify them quickly and resolve them.
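The arrangement described above, host code calling the simulated array processor through an API, can be sketched as follows. The article's version was C/C++ with the simulator linked as a static library; this Python stand-in only illustrates the layering, and the class names, the `scale2` command and the buffer model are all assumptions.

```python
from typing import List


class SimulatedArrayProcessor:
    """Stand-in for the array processor simulator linked into the host build."""

    def __init__(self) -> None:
        self.input_buffer: List[int] = []
        self.output_buffer: List[int] = []

    def execute(self, command: str) -> None:
        # Hypothetical command set; a real AP would run a compiled AP program.
        if command == "scale2":
            self.output_buffer = [2 * value for value in self.input_buffer]
        else:
            raise ValueError(f"unknown AP command {command!r}")


class APDriver:
    """Hypothetical host-side API. Application code calls only these methods,
    so the backend can later be swapped for real hardware access."""

    def __init__(self, backend: SimulatedArrayProcessor) -> None:
        self._backend = backend

    def write(self, data: List[int]) -> None:
        self._backend.input_buffer = list(data)

    def run(self, command: str) -> None:
        self._backend.execute(command)

    def read(self) -> List[int]:
        return list(self._backend.output_buffer)


def host_application(ap: APDriver, frame: List[int]) -> List[int]:
    """Host-side data flow under test: send data, issue a command, read back."""
    ap.write(frame)
    ap.run("scale2")
    return ap.read()
```

Because `host_application` depends only on the driver interface, the same host code can exercise the data flow against the simulator first and against hardware later, which is the property the paragraph above relies on.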
Developing an instruction-accurate simulator up front, rather than cycle-accurate simulations, saved effort and reduced risk early in the co-design of software and hardware. It also saved time later in development, when cycle-accurate testing was performed on an FPGA platform. Finally, the simulator saved significant effort in getting application software to run on the SoC target by identifying and eliminating system issues early on and during development.
Luc Martel and Denny Wong are senior engineers at Atsana Semiconductor Corp. (Ottawa, Canada).