On-chip instrumentation aids OCP debugging
Neal Stollon
(09/26/2005 9:00 AM EDT)
Leading embedded processor and intellectual property (IP) developers have adopted the Open Core Protocol (OCP) socket as a standards-based approach to on-chip bus integration. An emerging capability for OCP design is On-Chip Instrumentation (OCI) for analysis and debugging.
OCP-based buses allow a range of high bandwidth implementations, and define a number of features and capabilities in addition to baseline data transfer. These features include the extensions for special bus command modes, burst operations and multiple data tags, and threads that increase the number of traced signals.
In addition to bus interface signals that address the basic address and data operations, OCP provides a range of optional signals that address specialized processor specific performance enhancement features such as data alignment, bursting, and multi-thread operations. All these options are defined at the core to bus interface “socket,” and are largely independent of specifics of the bus fabric itself.
But OCP does present additional considerations in coordinating the operation of more complex implementations. Analysis considerations include the multi-cycle operations specific to a given OCP interface, and more global issues like on-chip bus subsystem performance of shared interfaces and peripherals. These require analyzing and optimizing bus parameters like transmission efficiency, latency, saturation, resource conflicts, and other operational considerations, all of which can have a direct impact on the performance and operation of the processor components and overall system operation.
Simulation is always an important part of the development flow, but it has limitations, especially for system analysis. In many cases, it is just as important to be able to analyze the hardware itself during prototyping and system verification, and on the final products themselves.
But analyzing information at the embedded bus level in hardware devolves to a visibility problem — it is difficult to fix what you cannot see. Visibility cannot be adequately addressed by traditional on-chip test methods such as JTAG scan for several reasons:
- Since bus operations are multi-cycle, with some signals in a bus cycle becoming active at different times, debug should take the form of sequential trace, rather than as a single-cycle snapshot.
- Bus operation problems involve the operations of at least two communicating blocks (a processor and a memory peripheral, for example). Traditional debug methods such as halting part of a system for test can introduce changes and new variables that interfere with the test scenario and process.
- If problems are intermittent or sparse, trace operations require operation in a triggered mode, so that information for bus cycles of interest is captured in real-time.
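The triggered capture described in the last bullet can be modeled in a few lines of Python. This is a software sketch, not a description of any particular OCI block: the class name `TriggeredTrace`, its parameters, and the sample values are assumptions made for illustration. It keeps a small circular pre-trigger buffer and, once the trigger condition matches, captures a fixed number of following cycles.

```python
from collections import deque


class TriggeredTrace:
    """Circular pre-trigger buffer plus a fixed post-trigger capture window."""

    def __init__(self, pre_depth, post_depth, trigger):
        self.pre = deque(maxlen=pre_depth)  # oldest samples fall off automatically
        self.post = []
        self.post_depth = post_depth
        self.trigger = trigger              # predicate over one sampled cycle
        self.triggered = False

    def sample(self, cycle, signals):
        """Record one cycle; return True once the capture window is full."""
        if not self.triggered:
            if self.trigger(signals):
                self.triggered = True
                self.post.append((cycle, signals))
            else:
                self.pre.append((cycle, signals))
        elif len(self.post) < self.post_depth:
            self.post.append((cycle, signals))
        return self.triggered and len(self.post) >= self.post_depth

    def dump(self):
        """Return the captured trace in time order."""
        return list(self.pre) + self.post


# Hypothetical stimulus: the address advances by 0x10 every cycle,
# and the trigger fires on address 0x40 (cycle 4).
trace = TriggeredTrace(pre_depth=2, post_depth=3,
                       trigger=lambda s: s["addr"] == 0x40)
for cycle in range(10):
    trace.sample(cycle, {"addr": cycle * 0x10})

captured_cycles = [c for c, _ in trace.dump()]
print(captured_cycles)
```

Only the two cycles before the trigger and the three cycles starting at the trigger are retained, which is the point of triggered operation: trace storage is spent on the bus cycles of interest rather than on everything that came before.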
On-Chip Instrumentation (OCI) is one approach to embedded OCP debug that allows real time visibility of actual silicon, either in FPGA prototypes or in ASIC products. OCI, in essence, is an IP-subsystem dedicated to efficiently tracing embedded signals, and application specific OCI blocks can be added to various points in a design to improve visibility into sub-system interfaces and operations.
OCI is widely used for debug and run control of embedded processors and for logic analysis traces that add crucial visibility to embedded designs. Many OCI blocks operate via a JTAG port configured as a dedicated debug port. OCI solutions that target the SoC-specific issues of on-chip bus analysis, and that provide sufficient debug visibility of otherwise non-observable bus interfaces, have only recently started to appear.
Figure 1 — A general OCP based multi-core debug architecture
Figure 1 shows a generic multi-core architecture (dual processors, memory interfaces, and custom IP using an OCP bus). OCI blocks and interfaces provide visibility and control to make this subsystem simpler and more effective to debug. These include:
- Processor debug blocks specifically designed for debug control and support of specific processors or other cores.
- Logic analyzer blocks for general-purpose trace and analysis of user defined signals.
- JTAG interface that controls instrumentation configured in a JTAG chain. JTAG is the most widely used debug interface since it is included on most chips to address boundary scan and test, and can be easily modified for instrumentation interfaces.
The bus analyzer trace instrumentation in Figure 1 is shown in detail, since it is discussed further in the following sections. In all cases discussed, debug information in a particular trace is buffered on chip so that a serially scanned JTAG interface can export it for display and analysis. The size of the RAM allocated for debug instruments is typically the dominant factor in the size of a debug solution, and is traded off against trace depth and granularity (the number of signals included).
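The depth versus granularity tradeoff is simple arithmetic, sketched below. The function names and the 16 Kbit RAM budget are hypothetical values chosen for illustration, not figures from any specific OCI product.

```python
def trace_ram_bits(depth, signal_width, timestamp_width=0):
    """Bits of trace RAM needed for `depth` samples of the given width."""
    return depth * (signal_width + timestamp_width)


def max_depth(ram_bits, signal_width, timestamp_width=0):
    """Number of whole samples that fit in a fixed trace RAM budget."""
    return ram_bits // (signal_width + timestamp_width)


RAM_BITS = 16 * 1024  # assumed 16 Kbit trace RAM

# Tracing fewer signals buys proportionally more depth.
print(max_depth(RAM_BITS, signal_width=64))                      # 256
print(max_depth(RAM_BITS, signal_width=32))                      # 512
# A 16-bit time stamp per sample costs depth at the same RAM size.
print(max_depth(RAM_BITS, signal_width=48, timestamp_width=16))  # 256
```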
Processor debug
Instrumentation blocks for processor cores provide processor specific run control, monitoring of hardware and software breakpoints for triggering, and real-time trace of instruction and data. Processor run control (start and stop based on software and hardware breakpoints) and single-stepping are features that provide debug control, but not much visibility.
To address this, some processors also provide some real-time trace, to allow capture of cycle by cycle instruction information triggered on instruction execution, memory, I/O operations, address range, or op code value. Since instructions have a well defined flow, execution trace can be compressed for storage or transmission efficiency and later expanded for integration with code debugger tools, using techniques such as branch trace messaging, which focuses on instruction flow discontinuities. Most processor debug instrumentation is controlled via JTAG.
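The idea behind compressing execution trace around flow discontinuities can be illustrated with a toy model. The sketch below is not an actual branch trace message format: it assumes fixed 4-byte instructions and a hypothetical message of the form (start address, number of sequential instructions), and it simply shows that sequential execution carries no extra information once the start address is known.

```python
INSTR_SIZE = 4  # assumed fixed-width instructions


def compress(pcs):
    """Collapse a program-counter stream into (start_pc, run_length) messages.

    A new message is emitted only when execution is not sequential,
    that is, at each instruction flow discontinuity.
    """
    messages = []
    for pc in pcs:
        if messages and pc == messages[-1][0] + messages[-1][1] * INSTR_SIZE:
            messages[-1][1] += 1          # still sequential: extend the run
        else:
            messages.append([pc, 1])      # discontinuity: start a new run
    return [tuple(m) for m in messages]


def expand(messages):
    """Rebuild the full program-counter stream from the messages."""
    return [start + i * INSTR_SIZE for start, n in messages for i in range(n)]


# Hypothetical trace: three instructions, a branch to 0x200, then a branch back.
pcs = [0x100, 0x104, 0x108, 0x200, 0x204, 0x100, 0x104]
messages = compress(pcs)
print(messages)  # [(256, 3), (512, 2), (256, 2)]
```

Seven traced instructions reduce to three messages here, and `expand(messages)` recovers the original stream for display alongside source code in a debugger.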
Logic and bus navigator instrumentation
In most designs, processors are only one of several subsystems that should be included in any systems analysis. Customized logic or IP in many designs includes co-processors or accelerators for specific applications, memory controllers, peripherals and a host of other functions.
Logic Analyzer instrumentation blocks are widely used, especially in FPGA design. Logic Analyzer IP monitors and traces user-defined signals selected prior to synthesis. Logic trace is controlled on-chip by combinatorial or sequential triggers. Typically the more complex the triggering resources, the slower the trigger speed supported.
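The difference between combinatorial and sequential triggers can be sketched as follows. This is a software model with hypothetical names and bit patterns: a combinatorial trigger is a mask-and-compare on a single sample, while a sequential trigger is a small state machine that fires only after several such conditions have matched in order.

```python
def combinational_trigger(mask, value):
    """Return a predicate that fires when (signals & mask) == value."""
    return lambda signals: (signals & mask) == value


class SequentialTrigger:
    """Fire only after every stage condition has matched, in order."""

    def __init__(self, stages):
        self.stages = stages
        self.state = 0  # index of the next stage to satisfy

    def step(self, signals):
        """Evaluate one sample; return True once all stages have matched."""
        if self.state < len(self.stages) and self.stages[self.state](signals):
            self.state += 1
        return self.state == len(self.stages)


# Hypothetical two-stage sequence: upper nibble 0x1, then lower nibble 0x2.
trigger = SequentialTrigger([
    combinational_trigger(0xF0, 0x10),
    combinational_trigger(0x0F, 0x02),
])
samples = [0x02, 0x10, 0x33, 0x12]
fired = [trigger.step(s) for s in samples]
print(fired)  # [False, False, False, True]
```

The first sample already satisfies the second condition, but it is ignored because the first stage has not yet matched. The extra state is what makes sequential triggers more selective, and in hardware it is also part of what makes them larger and slower than a single compare.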
Bus level instrumentation is an application specific version of logic analysis instrumentation, with additional bus-specific inline and post processing of the bus and protocol information. Bus analysis typically takes one of two forms — signals of interest are traced at the interface (OCP Socket), or are traced from within the bus fabric (as example, the OCP based Sonics Silicon Backplane fabrics).
OCP bus analyzers must support parameterized features such as data word size and configurable numbers of sideband and optional signals for a given design. Cross trigger interfaces to the other debug blocks or processors are often used for low latency triggering of bus trace start and stop or other debug operations. In this discussion, we consider the simpler case of OCP socket level debug of traced signals from each bus master being routed and multiplexed to the bus navigator IP.
Multiplexed trace (shown in example in Figure 1) can be used to trace information from one socket at a time, or selected signals (all the control signals as an example) from all sockets as supported by the bus trace resources for a given application. Other bus trace and analysis issues that relate to trace display and efficiency involve two types of inline processing — cycle alignment and cycle dropping.
Since bus operations are pipelined, information related to a given operation may be distributed over several cycles, where bus elements exchange commands and information. For OCP buses, this is typically a minimum of command, response, and a handshake cycle.
Some trace approaches synchronize and align these portions of a given bus cycle to display as a single cycle. This “Bus Mode” trace provides a view that is cycle aligned with an operation, which is how software perceives bus operations and which allows more intuitive triggering and trace display for software analysis.
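A minimal sketch of this alignment is shown below, assuming a single-threaded interface with in-order responses. The signal names (`MCmd`, `MAddr`, `SCmdAccept`, `SResp`, `SData`) follow OCP naming, but the string encodings, the helper `cyc`, and the function `align_bus_mode` are assumptions made for illustration. Only reads are shown; writes without a response phase would need separate handling.

```python
from collections import deque


def cyc(MCmd="IDLE", MAddr=None, SCmdAccept=False, SResp="NULL", SData=None):
    """Build one sampled clock cycle of socket signals (illustrative helper)."""
    return {"MCmd": MCmd, "MAddr": MAddr, "SCmdAccept": SCmdAccept,
            "SResp": SResp, "SData": SData}


def align_bus_mode(cycles):
    """Merge request and response phases into one record per transfer.

    Assumes in-order responses on a single thread.
    """
    pending = deque()   # accepted requests still waiting for a response
    transfers = []
    for n, c in enumerate(cycles):
        if c["MCmd"] != "IDLE" and c["SCmdAccept"]:
            pending.append({"cmd": c["MCmd"], "addr": c["MAddr"],
                            "req_cycle": n})
        if c["SResp"] == "DVA" and pending:
            transfer = pending.popleft()
            transfer.update(data=c["SData"], resp_cycle=n)
            transfers.append(transfer)
    return transfers


raw = [
    cyc(MCmd="RD", MAddr=0x10),                   # request held, not accepted
    cyc(MCmd="RD", MAddr=0x10, SCmdAccept=True),  # request accepted
    cyc(),                                        # idle while slave responds
    cyc(SResp="DVA", SData=0xAB),                 # read data returned
]
aligned = align_bus_mode(raw)
print(aligned)
```

Four raw cycles collapse into a single record holding the command, address, and data together, which is the view a software engineer expects when triggering on, for example, a read of a particular address returning a particular value.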
Tracing every bus operation on a cycle by cycle basis is not very efficient due to the latencies involved in bus operations. Many cycles are idle or “not ready,” and use up the trace memory without providing useful information.
One approach to conserve trace RAM resources and keep the trace more readable is to include triggering instrumentation for dropping of idle and “not ready” cycles from bus traces. Often some time stamping is added to the trace to maintain synchronization of the trace.
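Idle cycle dropping with time stamps can be sketched in the same style. The signal names follow OCP naming, while the string encodings and the function `drop_idle` are assumptions made for illustration. A cycle is stored only if a request is accepted or a response is present, and each stored cycle carries its original cycle number so that timing can still be reconstructed.

```python
def drop_idle(cycles):
    """Keep only cycles with bus activity, tagged with their cycle number.

    Idle cycles (no command, no response) and "not ready" cycles
    (command asserted but not yet accepted) are dropped.
    """
    kept = []
    for n, c in enumerate(cycles):
        request_accepted = c["MCmd"] != "IDLE" and c["SCmdAccept"]
        response_present = c["SResp"] != "NULL"
        if request_accepted or response_present:
            kept.append((n, c))  # n acts as the time stamp
    return kept


raw = [
    {"MCmd": "RD", "SCmdAccept": False, "SResp": "NULL"},    # not ready
    {"MCmd": "RD", "SCmdAccept": True, "SResp": "NULL"},     # request accepted
    {"MCmd": "IDLE", "SCmdAccept": False, "SResp": "NULL"},  # idle
    {"MCmd": "IDLE", "SCmdAccept": False, "SResp": "DVA"},   # response
]
kept = drop_idle(raw)
timestamps = [n for n, _ in kept]
print(timestamps)  # [1, 3]
```

Half of the trace RAM is saved in this small example, and the stored time stamps show that the response arrived two cycles after the request was accepted.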
An example showing trace synchronization and idle cycle dropping, along with bus trace mode for a simple command and response case, is illustrated in Figure 2.
Figure 2 — OCP bus mode trace alignment
Other debug considerations
Properly implemented on-chip instrumentation can complement the testability, maintainability and analysis of a chip design throughout its lifecycle. Implementing on-chip debug requires an understanding of how the debug tools will be used in hardware verification, as well as the considerations for integrating instrumentation solutions into a design. Some features that users should consider follow.
Flexible on-chip triggering, trace, performance analysis: When a lot of data is passing through an OCP-based bus, it is important to get access to the signals you need, but only when they are doing what you care about. One lower cost approach is to allow on-chip performance analysis to monitor and send only summary information. As an example, it is overkill to trace data every time a bus is saturated if you only care about the relative fraction of time it was saturated. The former requires detailed trace; the latter can be addressed with a relatively small performance analysis block.
Interoperability and cross triggering for system debug: Few systems work in isolation, so neither should a debug solution. Bus and processor operations and performance are frequently correlated, interdependent or synchronized in systems. A system level debug solution should include instrumentation blocks supporting the entire system of interest.
Integration with other debugger/verification tools: Processor software tool chain support should be able to access processor instruction trace and correlate it to source code for simpler and more intuitive analysis. Similarly, logic and bus analysis is starting to converge with EDA tools on methods of importing trace into verification tools, for comparison of actual versus simulated logic information.
Keeping pace with internal signal speeds and maintaining reasonable gate size: Debug should seldom be a reason for additional timing closure issues or for pushing to the next size die or package. Debug instrumentation can be small (in some cases a few thousand gates or less) or very large, especially if there are many complex triggering operations or a very large trace is defined. Typically interfaces are JTAG based, so the impact on chip I/O is minimal. OCI speed varies with overall features, but OCI is typically designed with low gate delays to run as fast as everything else in your system.
Configurable to system needs: Extensive debug is not needed if only a limited amount of the design requires visibility. Likewise, you don't want too limited a solution if you are debugging big problems. Debug solutions should be able to scale to the need at different parts of a development cycle (larger on prototypes or emulators, smaller on production releases), allowing the same features and interfaces, with less capability or fewer options, to be used across the lifecycle of the product.
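The summary-only performance analysis mentioned under the first feature above can be modeled with two counters. The class `SaturationCounter` and the per-cycle `saturated` flag are hypothetical; how saturation is detected on a real bus depends on the fabric and is outside this sketch.

```python
class SaturationCounter:
    """Count saturated cycles instead of tracing them."""

    def __init__(self):
        self.total = 0      # cycles observed
        self.saturated = 0  # cycles flagged as saturated

    def sample(self, saturated):
        """Account for one bus cycle."""
        self.total += 1
        self.saturated += 1 if saturated else 0

    def fraction(self):
        """Relative fraction of observed cycles that were saturated."""
        return self.saturated / self.total if self.total else 0.0


counter = SaturationCounter()
for flag in [1, 1, 0, 1, 1, 1, 0, 1]:  # hypothetical per-cycle flags
    counter.sample(flag)

print(counter.fraction())  # 0.75
```

Two registers replace a trace buffer here, and only the two counts need to be scanned out through the debug port, which is why such a block can be much smaller than detailed trace.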
Figure 3 — Instrumentation control infrastructure
On-chip instrumentation is not just IP; it is only as useful as the ability to interpret the information it provides. OCI is typically a paired IP and software solution, and requires certain features for effective use.
Most instrumentation includes specific drivers and APIs that format and communicate information between the hardware instrumentation, reached through the JTAG port, and the trace and control GUI and other interfaces. Standards-based drivers and APIs have obvious advantages in terms of flexibility, customization and integration, and in many cases, ease of use. Some industry standards that are relevant to the instrumentation tools flow include:
- Tcl/Tk for vendor neutral interfaces and simpler data transport
- MDI for interfaces with a wide range of debuggers
- XML as a mechanism for IP and triggering information
- Eclipse as a fully featured vendor neutral platform
In conclusion, OCP was developed as a bus architecture for managing the design of complex SoC devices. On-chip instrumentation and debug analysis is undergoing an evolution to support these more complex chips, in which new types of on-chip instruments address issues like bus level debug. The ability to integrate debug of processors and IP with OCP buses is an example of the systems oriented analysis capabilities that will routinely be required for leading edge platforms.
Dr. Neal Stollon (neals@fs2.com) is Director of Technical Marketing and Program Manager of systems level products for First Silicon Solutions. He has over 20 years of digital design, EDA and processor development experience at Texas Instruments, LSI Logic, Alcatel, and other companies.