Making SPI-4.2 Implementations More Efficient: Part 1
Neeraj Parik, Xilinx Inc. and Prakash Bare, GDA Technologies Inc.
Sep 15, 2004 (6:00 AM)
The SPI-4.2 interface has quickly achieved industry-wide recognition and is widely accepted as a standard high-speed interface in the networking chip space. However, creating an efficient SPI-4.2 interface presents many challenges to a system designer, such as buffer overflow and underflow.
The solutions to these concerns conflict with one another, and users need to make the right trade-offs to use SPI-4.2 effectively. In this two-part series, we'll look at the steps designers need to take in order to efficiently develop a SPI-4.2 interface.
In Part 1, we'll provide an overview of the SPI-4.2 interface spec and then look at the data transfer mechanism, latency, and potential buffer issues. In Part 2, we'll look at the issues that must be addressed to improve bandwidth utilization on a SPI-4.2 bus. We'll also look at techniques for effectively scheduling training on a SPI-4.2 link. Let's start, however, by looking at the basic elements defined in the SPI-4.2 interface.
SPI-4.2 Defined
The SPI-4.2 spec defines a 16-bit interface operating at a minimum data rate of 622 Mbit/s per line, which yields a raw data rate of 9.952 Gbit/s. A typical SPI-4.2 interface might run at 1 Gbit/s per line, yielding a raw data rate of 16 Gbit/s.
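The raw rates quoted above follow directly from multiplying the per-line rate by the 16 data lines. A minimal sketch of that arithmetic (the function name is ours, chosen only for illustration):

```python
def raw_data_rate_gbps(line_rate_mbps: float, data_lines: int = 16) -> float:
    """Aggregate raw rate in Gbit/s for a bus with `data_lines` parallel lines."""
    return line_rate_mbps * data_lines / 1000.0


print(raw_data_rate_gbps(622))   # minimum rate: 9.952
print(raw_data_rate_gbps(1000))  # typical 1 Gbit/s per line: 16.0
```

Note that these are raw figures; control words and training sequences share the same lines, so the usable payload rate is lower.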
In SPI-4.2, data is transferred in bursts of a provisioned maximum length, with the exception of transfers that terminate with an end-of-packet (EOP). Information associated with each transfer (port address, start/end-of-packet indication, packet abort and DIP-4 encoding) is sent through in-band 16-bit control words.
In addition to the data lines, SPI-4.2 has a status channel, which can be an LVDS or LVTTL interface, that sends flow control information from the receiver to the transmitter. The status information is sent according to a configured FIFO status calendar. This calendar defines the order for status updates to each of SPI-4.2's 16 ports. These status updates grant credits for the transmitter to send "not more than a number of 16-byte blocks" of data for a given port. A burst ending with EOP consumes an entire 16-byte block of credit for its last block, even if that block is only partially filled.
The number of credits granted is determined by the maxburst2 and maxburst1 values that the interface uses for its hungry and starving indications. The credit available for a port allows the transmitter to perform data bursts, transferring data for the corresponding port's application logic over the SPI-4.2 bus. The transmitter selects a particular port using an implementation-specific arbitration scheme and initiates a transfer on the SPI-4.2 bus. The length of the burst is determined by the credit available for the port and the amount of data available for that port in its transmit FIFO (Figure 1).
Figure 1: Diagram showing a SPI-4.2 system model.
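The credit rules described above can be sketched in a few lines. This is an illustrative model only: the function names are ours, and it assumes that when the transmit FIFO holds less data than the credit allows, the remaining data ends a packet (so a partially filled last block is legal).

```python
BLOCK_BYTES = 16  # credit granularity given in the text


def blocks_consumed(burst_bytes: int) -> int:
    """Credits used by a burst; a partly filled last block still costs a whole block."""
    return -(-burst_bytes // BLOCK_BYTES)  # ceiling division


def next_burst_bytes(credit_blocks: int, fifo_bytes: int) -> int:
    """Burst length is limited by the port's credit and by the data waiting in its FIFO."""
    return min(credit_blocks * BLOCK_BYTES, fifo_bytes)


# Port with 4 blocks of credit and 50 bytes queued: the burst ends at EOP.
burst = next_burst_bytes(credit_blocks=4, fifo_bytes=50)
print(burst, blocks_consumed(burst))  # 50 4

# Same credit, 200 bytes queued: the credit is the limit.
burst = next_burst_bytes(credit_blocks=4, fifo_bytes=200)
print(burst, blocks_consumed(burst))  # 64 4
```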
Data Transfer Mechanism
The requirement for buffering at both ends of a SPI-4.2 link can be explained as follows. Each port's stream across the SPI-4.2 link is maintained by that port's sending buffer in the transmit FIFO and its accepting buffer in the receive FIFO, together with the associated back-pressure mechanism on the status channel. The back-pressure mechanism regulates the data transmitted for each port on the SPI-4.2 bus.
The buffering provided in the receive and transmit FIFOs allows the processing order of user application logic to be independent of the order of data transfer on the bus. The receive FIFO also absorbs the usually observed bursty nature of data on the interface. This FIFO could also decouple the operating frequency of the user application logic from the frequency of the SPI-4.2 bus.
In SPI-4.2, the transmitter performs functions such as deserialization, synchronization to the received status information, status sequence mapping to ports, evaluation of credits for each port, synchronizing credit information to the data path clock, arbitration among the currently ready ports, data fetching, and formatting. One or more of the above functions can be pipelined and/or performed concurrently.
Similarly, the receiver performs the functions of data de-skew, word alignment (using training sequences), packet aligning and data de-formatting, pushing the data to port FIFO, and forming the status sequence along with required training (training is applicable to LVDS only).
Figure 2: Diagram showing the SPI-4.2 data transfer model.
Introduced Latency
Effectively, all transmit and receive functions shown in Figure 2 add to latency in the flow control of SPI-4.2. Figure 3 models these latencies, which we call data scheduling latency, data path latency, status update latency, and status path latency.
Figure 3: Diagram showing a simplified SPI-4.2 flow-control model.
Data scheduling latency is the amount of time the transmitter takes to initiate a SPI-4.2 transmission for a selected port once hungry or starving is sampled on the status channel for that port. The data scheduling latency observed will vary with numerous factors such as data fetch latency, arbitration latency, the amount of credit available for other ports, and the amount of data already scheduled to be sent in the transmit SPI-4.2 data pipe. Since arbitration latency increases directly with the number of ports, data scheduling latency tends to be higher for a large number of ports.
Data path latency is the amount of time data takes to travel from the transmit FIFO to the receive FIFO. The data popped out of the transmit FIFO is formatted into SPI-4.2 bursts, along with control words and training sequences, which are serialized and transmitted over the SPI-4.2 link. The received data is deserialized, de-skewed, filtered, SOP-aligned, and then pushed to the receive FIFO.
Status update latency is the delay between a FIFO pop or push and the resulting change in FIFO status. Data written to the receive FIFO or read by the application logic may cause the port's status to change. This should typically take one clock cycle; however, some implementations may take two to three cycles. The other factor contributing to status update latency is synchronizing the status update to the status clock domain (if the FIFO is implemented in the data path clock domain, which is typically the case).
Status path latency is caused by the serial nature (one port status sent per unit interval) of the SPI-4.2 status channel. As explained above, the status is exchanged between the receiver and transmitter using a configured calendar sequence. The position in the calendar sequence determines the port to which each status value corresponds.
In the worst case, the receiver's status sourcing state machine might detect that a port has fallen below the low watermark just after the calendar slot for that port has passed. The receiver then has to wait until the port's next entry in the status sequence, or until the next status sequence, to send the status update to the transmitter. Status sequences are longer for a large number of ports, since each port must have at least one entry in the status sequence. Status path latency is therefore higher for a large number of ports.
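The worst case described above can be quantified for a given calendar. The following is a rough sketch, not part of the SPI-4.2 spec: the function name is ours, the calendar is modeled as a plain list of port numbers repeated back to back, and any framing overhead between calendar sequences is ignored.

```python
from typing import Sequence


def worst_case_slot_wait(calendar: Sequence[int], port: int) -> int:
    """Largest number of status slots a port can wait for its next calendar entry."""
    slots = [i for i, p in enumerate(calendar) if p == port]
    if not slots:
        raise ValueError(f"port {port} has no calendar entry")
    n = len(calendar)
    # Distance from each entry to the port's next entry, wrapping into the next sequence.
    gaps = [
        (slots[(k + 1) % len(slots)] - s - 1) % n + 1
        for k, s in enumerate(slots)
    ]
    return max(gaps)


print(worst_case_slot_wait([0, 1, 2, 3], port=0))  # 4: one entry, full sequence wait
print(worst_case_slot_wait([0, 1, 0, 2], port=0))  # 2: a second entry halves the wait
```

Giving a latency-sensitive port more than one calendar entry shortens its worst-case wait at the cost of a longer sequence for every other port.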
Latency Problems
Earlier standard interfaces defined for SONET/SDH systems, such as POS-PHY Level 3 (PL3), provided low-latency back-pressure mechanisms to prevent buffer overflows. One provision is that the transmitter has direct access to the receiver FIFO status of the currently active port. Another is the ability of the data receiver to stall the transfer within two clocks.
In SPI-4.2, however, there is no low-latency back-pressure mechanism available to stall an ongoing transfer when a FIFO fills. There is also no fast mechanism to initiate a transfer for a particular port when that port's FIFO is empty.
The flow-control latencies detailed above have an impact on the performance of the SPI-4.2 link and on the design of the transmit and receive buffers. Let us define a new term called total path latency, which is the sum of all the latencies in the flow control model, i.e. data path latency + status update latency + status path latency + data scheduling latency.
When the maximum write rate and the minimum read rate coincide, these latencies can cause data on the fly (i.e. data in the pipe and buffers) to build up and potentially cause data loss due to overruns on the port FIFO. This overrun occurs even though a back-off mechanism is provided by way of the status channel (a satisfied status implies no more credit for the particular port). Such overflow is unacceptable for most networking designs.
Similarly, with maximum read rate and minimum write rate for a particular port at a particular time, the data buffered in the receive FIFO can be quickly drained off by the application logic, leading to port FIFO underflow and hence throttling the application. The underflow of port FIFO may be undesirable to certain networking designs.
Let's look at the underflow and overflow issue in more detail. We'll start with buffer overflow.
Buffer Overflow
Let's assume that a single active port occupies the entire SPI-4.2 bandwidth and has an enormous amount of data to send. The receive FIFO is ready to receive data (as reflected by its latest status update) and the transmitter holds a credit of maxburst1 for it; furthermore, reads from the port's FIFO are disabled. If the port FIFO status now changes to satisfied (FIFO occupancy having reached the high watermark), the port may still receive up to maxburst1 + data path latency equivalent amount of data. Additionally, because the status update takes time to reach the transmitter, up to a status path latency equivalent amount of data can be sent before the satisfied status is acted upon.
The receive FIFO should have enough space to accommodate all the pending (already scheduled or to be scheduled) data transfers for the particular port to avoid potential overflows. Ideally, a receive FIFO design and back-pressure mechanism should ensure that there are no data losses due to overruns.
To solve the overflow problem, SPI-4.2 users must employ a look-ahead indication of the FIFO status. Thus, any port FIFO would indicate a satisfied status when it is data path latency equivalent + status path latency equivalent + maxburst1 short of being full. This implies that the high watermark for a FIFO must be set that amount below the FIFO depth. Effectively, additional mandatory space after the satisfied indication has to be provided in the port FIFO to avoid buffer overflows.
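The overflow scenario can be reproduced with a toy simulation. This is a deliberately simplified sketch under our own assumptions, not a model of any real core: time advances in steps of one 16-byte block on the bus, both latencies are expressed in those steps, reads are disabled, and the transmitter spends its remaining maxburst1 credit after it samples satisfied.

```python
from collections import deque


def simulate_peak_occupancy(hwm: int, data_latency: int,
                            status_latency: int, maxburst1: int) -> int:
    """Peak receive-FIFO occupancy (in blocks) when reads are disabled."""
    in_flight = deque()      # arrival times of blocks still on the link
    occupancy = 0
    stop_seen_at = None      # step at which the transmitter samples "satisfied"
    credit_left = maxburst1  # credit the transmitter may still spend after that
    t = 0
    while True:
        # Deliver every block whose data path latency has elapsed.
        while in_flight and in_flight[0] <= t:
            in_flight.popleft()
            occupancy += 1
        # Receiver crosses the high watermark; status reaches the transmitter later.
        if stop_seen_at is None and occupancy >= hwm:
            stop_seen_at = t + status_latency
        if stop_seen_at is None or t < stop_seen_at:
            in_flight.append(t + data_latency)   # transmitter still sees credit
        elif credit_left > 0:
            in_flight.append(t + data_latency)   # spend the leftover credit
            credit_left -= 1
        elif not in_flight:
            return occupancy                     # link drained, nothing left to send
        t += 1


peak = simulate_peak_occupancy(hwm=100, data_latency=5, status_latency=8, maxburst1=4)
print(peak, peak <= 100 + 5 + 8 + 4)
```

In this model the FIFO keeps filling well past the high watermark, and the peak stays within high watermark + data path latency + status path latency + maxburst1, which is the headroom the text says must be reserved.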
Buffer Underflow
A SPI-4.2 receive port FIFO underflows when its occupancy falls below the low watermark, it receives no data from the other end of the SPI-4.2 interface, and it eventually goes empty even though the transmit FIFO has data to send for that port. This happens because the transmitter has exhausted the previously granted credits before it gets the next credit update (for example, a starving or hungry status) from the receiver. To prevent underflow, the watermark for the status indication must be set high enough that the transmitter responds to the receiver's space-available indication before the application logic drains the port's data from the FIFO.
As discussed earlier, the time that elapses between the FIFO status indicating starving or hungry and data arriving for that particular port is the total path latency, which, as defined earlier, is the sum of status update latency + status path latency + data scheduling latency + data path latency. The first two terms reflect the time required to build up the credit information at the transmitter. The last two terms, on the other hand, reflect the time required to move the data across the interface from the transmit FIFO to the receive FIFO over the SPI-4.2 link.
Whether a buffer underflows depends on the maximum rate at which the application logic reads the port FIFO. To prevent underflow, software should program the low watermark for each port FIFO judiciously (i.e. large enough).
Buffer Sizing
An efficient SPI-4.2 buffer design should ideally avoid underflows and overflows. Putting the discussion above in equations, we get the following:
HWM (high watermark) = FIFO_DEPTH - (STATUS_PATH_LATENCY + DATA_PATH_LATENCY + MAXBURST1).
Assuming that the read rate for any particular port can be 100% for a reasonable length of time, the low watermark can be quantified as:
LWM (low watermark) = TOTAL_PATH_LATENCY.
Using the fact that HWM must be greater than LWM, we get:
FIFO_DEPTH > DATA_PATH_LATENCY + STATUS_PATH_LATENCY + TOTAL_PATH_LATENCY + MAXBURST1.
The minimum FIFO_DEPTH calculated above avoids both underflows and overflows. If the user is interested only in avoiding overflows, the low watermark can be set to zero. This gives the following equation:
FIFO_DEPTH > DATA_PATH_LATENCY + STATUS_PATH_LATENCY + MAXBURST1.
Note that HWM is the satisfied threshold. LWM, on the other hand, is the hungry threshold (or the starving threshold, if both are equal).
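As a worked illustration of the sizing equations, here is a small calculator. It is only a sketch: the class and function names are ours, every quantity is assumed to be expressed in the same unit (for example 16-byte blocks, with each latency converted to the amount of data that can move during it), and the numbers in the example are made up.

```python
from dataclasses import dataclass


@dataclass
class LinkLatencies:
    """Flow-control latencies, each expressed as an equivalent amount of data."""
    data_path: int
    status_path: int
    status_update: int
    data_scheduling: int

    @property
    def total_path(self) -> int:
        return (self.data_path + self.status_update
                + self.status_path + self.data_scheduling)


def watermarks(fifo_depth: int, lat: LinkLatencies, maxburst1: int):
    """Return (HWM, LWM) for a given FIFO depth, per the equations above."""
    hwm = fifo_depth - (lat.status_path + lat.data_path + maxburst1)
    lwm = lat.total_path
    return hwm, lwm


def min_fifo_depth(lat: LinkLatencies, maxburst1: int,
                   avoid_underflow: bool = True) -> int:
    """Smallest integer depth that satisfies the strict inequality."""
    bound = lat.data_path + lat.status_path + maxburst1
    if avoid_underflow:
        bound += lat.total_path
    return bound + 1


lat = LinkLatencies(data_path=5, status_path=8, status_update=2, data_scheduling=3)
depth = min_fifo_depth(lat, maxburst1=4)
print(depth, watermarks(depth, lat, maxburst1=4))        # 36 (19, 18)
print(min_fifo_depth(lat, maxburst1=4, avoid_underflow=False))  # 18
```

With these example numbers, protecting against underflow as well as overflow doubles the required depth, which illustrates why the two goals pull the design in different directions.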
On to Part 2
That wraps up Part 1 in our series on improving efficiency in SPI-4.2 implementations. In Part 2, we'll look at issues needed to improve bandwidth utilization on a SPI-4.2 bus. We'll also look at techniques for effectively scheduling training on a SPI-4.2 link. To view Part 2, click here.
About the Authors
Neeraj Parik is an IP design engineer at Xilinx Inc. He can be reached at neeraj.parik@xilinx.com.
Prakash Bare is the vice president of engineering in GDA Technologies' IP division. He can be reached at prakash@gdatech.com.