How to write DSP device drivers
By Nick Lethaby and David Friedland, Courtesy of Embedded Systems Programming
Dec 15 2003 (17:00 PM)
As digital signal processors pick up peripherals, you'll need to write new device drivers. Here are some time-saving tips for writing them for platforms based on DSPs.
Digital signal processors (DSPs) are now often integrated on-chip with numerous peripheral devices, such as serial ports, UARTs, PCI, or USB ports. As a result, developing device drivers for DSPs requires significantly more time and effort than ever before.
In this article, we'll show you a DSP device-driver architecture that reduces overall driver development time by reusing code across multiple devices. We'll also look in depth at an audio codec driver created using this architecture. The design and code examples are based on drivers developed for Texas Instruments' DSP/BIOS operating system, though the same approach will work in any system.
How DSPs differ
Whereas microprocessors are mainly used for general purpose control, DSPs almost invariably do hard real-time data-path processing, where data samples are input in a continuous stream. DSPs are optimized to move data quickly from a peripheral to the DSP core, leading to several architectural differences from microprocessors.
Some microprocessors execute all code from external memory via an instruction cache (I-cache). I/O peripheral registers are memory-mapped and accessed like any other program data. In contrast, many DSPs don't provide I-cache, but do include high-speed, on-chip memory that supports efficient program execution. Even with the latest DSPs that do provide I-cache, the memory is configurable as either a cache or directly addressable memory or a combination of the two. It's a common practice to dedicate some of this memory to real-time I/O, critical code loops, and data to avoid potential nondeterministic behavior caused by cache misses.
To meet real-time I/O needs, DSPs provide dedicated serial ports that connect to streaming peripherals, such as codecs and other data converters. The interaction between the codec and the serial port is synchronous and handled entirely in hardware, though initial configuration of the frame sync, transfer rate, sample rate, and sample size must be done by the DSP.
Although the serial ports can interact directly with the DSP core, software developers avoid this approach for real-time I/O because of the frequency of interrupts. DSPs generally provide one or more direct memory access (DMA) controllers, the channels of which can be used to buffer multiple samples from the serial port and then transfer the full buffer into the on-chip memory without DSP involvement. The DSP is only interrupted once the buffer is full rather than on every data sample. After initial configuration by the DSP, the DMA and serial port interact without any processor intervention. Because of the efficiency gains, drivers for most DSP peripherals use DMA.
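To put rough numbers on the interrupt savings, here is a small sketch. The 48 kHz sample rate and 256-sample frame used in the note that follows are illustrative assumptions, not figures from the driver discussed in this article, and interrupts_per_second() is a hypothetical helper rather than part of any TI API.

```c
#include <assert.h>

/*
 * Interrupts per second seen by the DSP core for a given sample rate.
 * samples_per_interrupt is 1 when the serial port interrupts the core
 * on every sample, or the frame size when a DMA channel fills a whole
 * buffer before interrupting.
 */
unsigned long interrupts_per_second(unsigned long sample_rate_hz,
                                    unsigned long samples_per_interrupt)
{
    assert(samples_per_interrupt > 0);
    return sample_rate_hz / samples_per_interrupt;
}
```

At an assumed 48 kHz, servicing every sample costs 48,000 interrupts per second, while a 256-sample DMA frame brings that down to roughly 187.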
Writing a codec driver for a DSP actually involves programming three different peripherals—the codec itself, the serial port, and the DMA controller. Figure 1 shows the data flow between the different peripherals, the DSP, and the DSP's internal memory. Later, we'll show a much more detailed implementation of a DSP codec driver, but first let's discuss some driver-architectural issues that enable better code reuse both at the application and driver levels.
Device-driver architecture
A device driver performs two main functions:
- device configuration and initialization
- data movement
Device configuration is, by definition, specific to a particular device. Data movement, on the other hand, is more generic. In the case of a streaming data peripheral like a codec, the application ultimately expects to send or receive a stream of buffers. The application shouldn't have to worry about how the buffers are managed or what type of codec is being used, beyond issues such as data precision (number of bits in the sample).
Class drivers
By providing a clear abstraction between the driver and the application, you can essentially free the application of the specifics of a certain peripheral and port it more easily to new hardware. You can further apply these concepts of abstraction and reuse to the driver itself. A number of driver functions are independent of the underlying device, such as synchronization between the driver and the application. You can provide these services in a driver module that's specific to a particular class of devices but independent of any individual device. Such a module is often referred to as the "upper half" of a driver. For convenience, we'll use the term class driver to describe this module.
As Figure 2 illustrates, any driver can be divided into two parts: a class driver that handles the application interface and OS-specifics and a mini-driver that addresses the hardware specifics of a particular device.
Because of large differences between devices like codecs and UARTs, you'll typically need to implement several class drivers to support all the peripherals used with a DSP. When designing the driver model for DSP/BIOS, two of the class drivers we implemented were:
- An SIO class driver for frame-oriented streaming devices such as codecs and data converters that transmit fixed-sized frames of data.
- A GIO class driver that provides basic read/write operations for devices such as UARTs.
As can be seen from these definitions, the class drivers define the I/O models used by the application. Since the class driver is responsible for synchronization between the application and driver, it will determine whether the I/O is synchronous (where the application thread blocks on an I/O transaction so that another thread can run) or asynchronous (where the application thread continues to run and relies on a notification mechanism that will inform the application when the I/O transaction is complete). Although class drivers are device-independent, they're intimately associated with the operating system since they use operating system services, such as semaphores.
Mini-drivers
The specifics of a peripheral are addressed in the lower half of the driver, for which we'll use the term mini-driver. The mini-driver is responsible for all device-specific initialization and control and for passing a buffer of data to (or receiving a buffer from) the class driver. The mini-driver must present a standard interface to the class driver, because a common interface is what lets one class driver work with multiple mini-drivers, or one mini-driver work with multiple class drivers. For example, in a system with multiple codecs of different types, you can save code space by having just one instance of the class driver code work with all the different codec mini-drivers.
The most device-specific routines are those that initialize device control registers. These operations require the calculation of specific bit patterns to set the appropriate flag values in each control register. Although the device initialization routines will never be portable, implementation and maintenance can be made much simpler by the development of a basic hardware abstraction layer (HAL) for the device registers.
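As a minimal sketch of what such a HAL might look like, the macros below build a control-register value from named fields instead of hand-computed bit patterns. The register layout, the macro names (HAL_FMK, HAL_FEXT), and codec_ctrl_value() are all hypothetical and do not describe any real codec.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical 16-bit codec control register (not a real device):
 *   bits 3..0  input gain
 *   bits 6..4  sample-rate select
 *   bit  7     loopback enable
 */
#define CTRL_GAIN_SHIFT  0u
#define CTRL_GAIN_MASK   (0xFu << CTRL_GAIN_SHIFT)
#define CTRL_RATE_SHIFT  4u
#define CTRL_RATE_MASK   (0x7u << CTRL_RATE_SHIFT)
#define CTRL_LOOPBACK    (1u << 7)

/* Place a field value at its position, discarding out-of-range bits. */
#define HAL_FMK(value, shift, mask) \
    ((uint16_t)((((unsigned)(value)) << (shift)) & (mask)))

/* Read a field back out of a register value. */
#define HAL_FEXT(reg, shift, mask) \
    ((unsigned)((((unsigned)(reg)) & (mask)) >> (shift)))

/* Build the whole control word from named settings. */
uint16_t codec_ctrl_value(unsigned gain, unsigned rate, int loopback)
{
    uint16_t v = 0;

    assert(rate <= 7u);   /* only eight rate selections exist in this layout */
    v |= HAL_FMK(gain, CTRL_GAIN_SHIFT, CTRL_GAIN_MASK);
    v |= HAL_FMK(rate, CTRL_RATE_SHIFT, CTRL_RATE_MASK);
    if (loopback) {
        v |= (uint16_t)CTRL_LOOPBACK;
    }
    return v;
}
```

When the register layout changes, only the shift and mask definitions need to be touched; the initialization routine that calls codec_ctrl_value() stays readable and unchanged.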
Example: Codec driver
To illustrate this DSP device-driver architecture, let's look at an example that reveals the design decisions and implementation of the class driver and the mini-driver. We'll use the codec device driver for Texas Instruments' TMS320C5402 DSP starter kit board. The code is similar to other DSP/codec combinations, so you can adapt it as necessary.
To avoid getting lost in the DMA setup code, we'll demonstrate some of the driver concepts using a simple sample-by-sample audio codec driver that processes one sample per interrupt. This type of driver is somewhat simplistic because you'd almost always use DMA so that the DSP can process frames of data rather than servicing an interrupt for every data point.
The codec's class driver
Our example uses the SIO class driver. Like most software modules, the best way to understand the SIO class driver is to look at some actual code. The example, shown in Listing 1, simply reads sound data from the audio codec device driver, copies the data to another buffer, and then transmits it back out to the codec so that it can be heard through a speaker.
Listing 1: Application startup
void echo(void);

/*
 * Buffers and streams are shared with the echo task,
 * so they are declared at file scope.
 */
void *buf0, *buf1, *buf2, *buf3;
SIO_Handle inStream, outStream;

void main()
{
    /*
     * Allocate buffers for the SIO buffer exchange
     */
    buf0 = (void*) MEM_calloc(0, BUFSIZE, BUFALIGN);
    buf1 = (void*) MEM_calloc(0, BUFSIZE, BUFALIGN);
    buf2 = (void*) MEM_calloc(0, BUFSIZE, BUFALIGN);
    buf3 = (void*) MEM_calloc(0, BUFSIZE, BUFALIGN);
    /*
     * Create the task and open the I/O streams.
     * A NULL attributes pointer selects the defaults.
     */
    TSK_create((Fxn)echo, NULL);
    inStream = SIO_create("/codec", SIO_INPUT, BUFSIZE, NULL);
    outStream = SIO_create("/codec", SIO_OUTPUT, BUFSIZE, NULL);
    /*
     * Start the DSP/BIOS scheduler when main() exits
     */
}
The code starts, as all C programs do, in main(). The application uses the DSP/BIOS memory manager to allocate four buffers from the system's heap space and then two SIO objects are created to stream data to and from the device driver.
The application uses the SIO_create() call to create the channels. The SIO_create() function arguments indicate some of the design decisions to be made when implementing this class driver. For instance, we'll decide here that an SIO stream can be opened for either reading or writing, but not both. If bidirectional communication is required, the application simply opens two channels (as in this example). This unidirectional channel implementation is more efficient, and many data converters operate in only one direction.
In addition, because most codecs operate on fixed-sized frames of data, we'll program the class driver to optimally support this and avoid the overhead incurred if variable-sized buffers are assumed. We'll use the attributes field to specify the stream object's configuration parameters such as the number of buffers used. For the purposes of this example, we'll choose the default attributes, which specify that the application will use two buffers (in other words, double buffering).
Once main() terminates, the DSP/BIOS scheduler will then activate and allow any tasks to start running. The task called echo will start to execute once main() has completed and will continue to run until the application is terminated, as shown in Listing 2.
Listing 2: A simple application task
void
echo()
{
    int i;
    int sizeRead; // Number of buffer units read
    unsigned short *inbuf, *outbuf;
    /*
     * Issue the first & second empty buffers to input stream.
     */
    SIO_issue(inStream, buf0, SIO_bufsize(inStream), NULL);
    SIO_issue(inStream, buf1, SIO_bufsize(inStream), NULL);
    /*
     * Issue the first & second empty buffers to output stream.
     */
    SIO_issue(outStream, buf2, SIO_bufsize(outStream), NULL);
    SIO_issue(outStream, buf3, SIO_bufsize(outStream), NULL);
    /*
     * Echo buffers ad infinitum.
     */
    for (;;)
    {
        /*
         * Reclaim full buffer from input stream
         * and empty from output stream.
         */
        sizeRead = SIO_reclaim(inStream, (void**)&inbuf, NULL);
        SIO_reclaim(outStream, (void**)&outbuf, NULL);
        /*
         * Copy data from input buffer to output buffer.
         */
        for (i = 0; i < sizeRead; i++)
        {
            outbuf[i] = inbuf[i];
        }
        /*
         * Issue full buffer to output stream
         * and empty to input stream.
         */
        SIO_issue(outStream, outbuf, sizeRead, NULL);
        SIO_issue(inStream, inbuf, SIO_bufsize(inStream), NULL);
    }
}
The SIO class driver uses an issue/reclaim model of buffer submission. This means that the buffers that are used to transmit and receive data are all "owned" by the application, and so the creator of the stream is expected to supply all of the necessary buffers. Calls to SIO_issue() are all asynchronous (nonblocking), which enables an application thread to submit multiple buffers to the driver for either reading or writing data while continuing to execute if necessary. By contrast, calls to SIO_reclaim() are synchronous (blocking), so if no buffer is ready to be given back to the application, DSP/BIOS will perform a context switch to the next highest-priority task, and the calling task stays blocked until a buffer becomes available.
Another important concept in the SIO class driver design is buffer exchange. To provide efficient I/O operations with low overhead, you should avoid having data copied from one place to another during certain I/O operations in favor of recycling pointers to buffers passed between the application and the device.
Before we can start interacting with the driver in a steady state, we'll need to prime the driver with an initial set of buffers for both the input and output streams. Once this is done, our application can run in an infinite loop, reclaiming the buffers, copying data to them, and issuing them once again to the driver.
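The buffer-exchange idea can be sketched on a host machine as a FIFO of buffer pointers. This is not the SIO implementation: the Stream type and the stream_issue() and stream_reclaim() functions are hypothetical stand-ins, and where a real class driver would block in reclaim, this sketch simply returns an error.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_BUFS 4   /* illustrative queue depth */

/* Hypothetical stand-in for a stream object: a FIFO of buffer pointers. */
typedef struct {
    void  *bufs[MAX_BUFS];
    size_t head;    /* index of the oldest issued buffer */
    size_t count;   /* buffers currently owned by the driver */
} Stream;

/* Hand a buffer to the driver. Returns 0 on success, -1 if the queue is full. */
int stream_issue(Stream *s, void *buf)
{
    assert(s != NULL);
    if (s->count == MAX_BUFS) {
        return -1;
    }
    s->bufs[(s->head + s->count) % MAX_BUFS] = buf;
    s->count++;
    return 0;
}

/*
 * Take the oldest buffer back. A real class driver would block here until
 * the device had finished with the buffer; this sketch returns -1 when
 * nothing is queued.
 */
int stream_reclaim(Stream *s, void **pbuf)
{
    assert(s != NULL && pbuf != NULL);
    if (s->count == 0) {
        return -1;
    }
    *pbuf = s->bufs[s->head];
    s->head = (s->head + 1) % MAX_BUFS;
    s->count--;
    return 0;
}
```

Only pointers move between the application and the driver. The sample data stays where the device put it, which is what keeps the per-buffer overhead low.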
The codec's mini-driver
As we discussed previously, the mini-driver is the lower half of the device driver and handles the device-specific chores of the driver—namely device initialization and data I/O. Because we need to support a range of DSP peripherals, including codecs, UARTs, and PCI controllers, we'll begin by defining a standard mini-driver API to support all required devices. We'll then use the standard mini-driver as a basis for the codec mini-driver. The mini-driver interface functions are defined as follows:
- mdBindDev() initializes the devices. In a codec driver, it must initialize the serial port and DMA, as well as the codec. Although we'll hard-code some of the device initialization, we'll also define two parameters to provide configuration options. You can use a devid parameter to specify a device when the DSP or system has more than one device of a specific type. For example, the codec mini-driver uses this parameter to select which serial port it will use. The devParams parameter enables the mini-driver to expose additional configuration options, such as which hardware interrupt the DMA will use.
- mdUnbindDev() frees any resources allocated by the mini-driver.
- mdCreateChan() creates a channel instance. The concept of channel create is separated from the rest of the device initialization because some DSP peripherals are multi-channel devices. The chanParams parameter is required to provide device-specific parameters unique to the channel. For example, the codec driver may use these to configure the DMA channel it's accessing.
- mdDeleteChan() deletes the channel instance.
- mdSubmitChan() processes an I/O packet (IOP), which contains a pointer to the buffer of data being passed through the device, for the channel indicated by the chanp parameter. mdSubmitChan() implements functions such as device read, write, or flush. The command field in the IOP determines which operation (for example, read or write) is performed on that IOP.
- mdControlChan() enables the application to perform device-specific control, such as a device reset. The developer predefines the control options and assigns them a command code. The cmd parameter specifies which control command the mini-driver should perform, and the arg parameter enables the class driver to pass a configuration structure that can reconfigure the device's control registers.
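One common way to expose an interface like this is a table of function pointers that each mini-driver fills in. The sketch below assumes simplified signatures: the Packet type, the status codes, and the parameter lists are hypothetical, and the actual DSP/BIOS table may differ in names and arguments.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical packet and status codes, loosely modeled on the text. */
typedef struct { int cmd; void *addr; size_t size; int status; } Packet;
enum { MD_COMPLETED = 0, MD_PENDING = 1, MD_EBADARGS = -1 };

/* Table of entry points every mini-driver exports to the class driver. */
typedef struct {
    int (*mdBindDev)(void **devp, int devid, void *devParams);
    int (*mdUnbindDev)(void *devp);
    int (*mdCreateChan)(void **chanp, void *devp, int mode, void *chanParams);
    int (*mdDeleteChan)(void *chanp);
    int (*mdSubmitChan)(void *chanp, Packet *packet);
    int (*mdControlChan)(void *chanp, unsigned cmd, void *arg);
} MiniDriverFxns;

/* The class driver calls through the table and never names a device. */
int class_submit(const MiniDriverFxns *fxns, void *chanp, Packet *packet)
{
    if (fxns == NULL || fxns->mdSubmitChan == NULL || packet == NULL) {
        return MD_EBADARGS;
    }
    return fxns->mdSubmitChan(chanp, packet);
}

/* A stub mini-driver that completes every packet immediately. */
static int stub_submit(void *chanp, Packet *packet)
{
    (void)chanp;
    assert(packet != NULL);
    packet->status = MD_COMPLETED;
    return MD_COMPLETED;
}

const MiniDriverFxns STUB_FXNS = { NULL, NULL, NULL, NULL, stub_submit, NULL };
```

Because class_submit() sees only the table, swapping one codec for another means handing the class driver a different MiniDriverFxns pointer, with no change to the class driver code.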
We won't go through the details of how to implement each of these functions, but to illustrate the concept, we'll show a simple implementation of a mini-driver's channel object structure and mdSubmitChan() function.
The channel object structure is initialized by the mini-driver's mdCreateChan() function and is shown in Listing 3. The actual design of this structure is completely up to the driver writer, but many implementations look similar to this one. Elements of this structure can include channel state information, such as information about the current I/O packet being processed, a linked list of packets queued for processing, and the callback function that's to be used to notify the class driver that a packet's processing is complete.
Listing 3: Channel object data structure
typedef struct
{
    bool             inuse;        // TRUE => channel has been opened
    int              imode;        // IOM_INPUT or IOM_OUTPUT
    IOM_Packet       *dataPacket;  // current active I/O packet
    QUE_Obj          pendList;     // list of packets for I/O
    unsigned int     *bufptr;      // pointer *within* current buffer
    unsigned int     bufcnt;       // remaining samples to be handled
    IOM_TiomCallback cbFxn;        // used to notify client when complete
    void*            cbArg;        // arg passed with callback function
} ChanObj, *ChanHandle;
Listing 4: mdSubmitChan() function
static int
mdSubmitChan(void* chanp, IOM_Packet *packet)
{
    ChanHandle chan = (ChanHandle) chanp;
    unsigned int imask;
    imask = HWI_disable(); // disable interrupts
    if (chan->dataPacket == NULL)
    {
        /*
         * Start I/O job.
         */
        chan->bufptr = (unsigned int *)packet->addr;
        chan->bufcnt = packet->size;
        // dataPacket must be set last, to synchronize with ISR.
        chan->dataPacket = packet;
    }
    else
    {
        /*
         * There is an I/O job already pending; queue packet.
         */
        QUE_put(&chan->pendList, packet);
    }
    HWI_restore(imask); // restore interrupts
    return (IOM_PENDING);
}
The mdSubmitChan() function, shown in Listing 4, receives an I/O packet from the class driver and either starts working on it right away or, if the channel is still busy with a previously submitted packet, puts it in a queue. All of the driver state information required to accomplish this is contained in the channel-object structure. Notice that interrupts are disabled in this function to maintain the coherency of the channel state; in a well-designed driver, this period is kept as short as possible.
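The other half of this handshake is the interrupt service routine that consumes the channel state. The host-runnable sketch below shows one way a sample-by-sample receive ISR could work against a structure like the one in Listing 3. Every name in it (Packet, Chan, chan_submit, chan_rx_isr, count_done) is a hypothetical stand-in, the pending list is a plain linked list rather than a DSP/BIOS queue, and the interrupt masking a real target needs is omitted.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the types used in Listings 3 and 4. */
typedef struct Packet {
    struct Packet *next;     /* link for the pending list */
    unsigned int  *addr;     /* buffer to fill */
    unsigned int   size;     /* samples in the buffer */
    int            status;   /* 0 while pending, 1 when complete */
} Packet;

typedef void (*Callback)(void *arg, Packet *packet);

typedef struct {
    Packet       *dataPacket;  /* current active I/O packet */
    Packet       *pendHead;    /* oldest queued packet */
    Packet       *pendTail;    /* newest queued packet */
    unsigned int *bufptr;      /* pointer within current buffer */
    unsigned int  bufcnt;      /* remaining samples to be handled */
    Callback      cbFxn;       /* notifies the class driver */
    void         *cbArg;
} Chan;

/* Same decision as Listing 4: start the job now or queue it. */
void chan_submit(Chan *chan, Packet *packet)
{
    assert(packet->size > 0);
    packet->next = NULL;
    packet->status = 0;
    if (chan->dataPacket == NULL) {
        chan->bufptr = packet->addr;
        chan->bufcnt = packet->size;
        chan->dataPacket = packet;
    } else if (chan->pendTail != NULL) {
        chan->pendTail->next = packet;
        chan->pendTail = packet;
    } else {
        chan->pendHead = chan->pendTail = packet;
    }
}

/* Receive ISR body: store one sample, finish the packet when it is full. */
void chan_rx_isr(Chan *chan, unsigned int sample)
{
    Packet *done = chan->dataPacket;
    if (done == NULL) {
        return;                      /* no buffer available: sample dropped */
    }
    *chan->bufptr++ = sample;
    if (--chan->bufcnt > 0) {
        return;
    }
    /* Buffer full: start the next queued packet before notifying. */
    chan->dataPacket = chan->pendHead;
    if (chan->pendHead != NULL) {
        chan->pendHead = chan->pendHead->next;
        if (chan->pendHead == NULL) {
            chan->pendTail = NULL;
        }
        chan->bufptr = chan->dataPacket->addr;
        chan->bufcnt = chan->dataPacket->size;
    }
    done->status = 1;
    chan->cbFxn(chan->cbArg, done);
}

/* Example callback: counts completed packets through its argument. */
void count_done(void *arg, Packet *packet)
{
    (void)packet;
    ++*(int *)arg;
}
```

Because the ISR moves straight on to the next queued packet, incoming samples keep landing in a buffer while the class driver is still being told about the one that just finished.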
A modular mini-driver architecture
Since DSPs usually have multiple serial ports, a DSP may interface to several different data-converter devices. Since companies often use essentially the same application across different hardware platforms, which may have a mix of different peripherals, we can look for opportunities to make the mini-driver code more reusable across devices.
A codec driver requires the programming of three peripherals: the codec itself, a serial port, and the DMA controller. Since the DMA controller and serial port for a given DSP will always be the same, we can partition the DMA and serial-port driver code from the codec code.
As we discussed earlier, a driver's functions can be divided into configuration and data movement. Since the codec and serial port communicate synchronously without software intervention, the driver code need only address moving data from the DMA to the DSP's memory. This enables us to bifurcate the codec mini-driver functions into two discrete modules. Only the mdBindDev() and mdCreateChan() functions need to be rewritten for a new codec because these functions perform initialization. The remaining functions are implemented in a generic serial-port/DMA data mover that you can use across many different mini-driver implementations.
Figure 3 shows how a mini-driver can be split into generic data mover and device-specific portions.
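In code, the split might look like the following sketch, in which two codec mini-drivers share a single data-mover routine and differ only in their initialization. The reduced two-entry table, the function names, and the control-word values are hypothetical, and mover_submit() merely counts submissions where a real data mover would program the DMA.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical entry-point table, reduced to one function from each half. */
typedef struct {
    int (*mdBindDev)(unsigned *ctrl);        /* device-specific */
    int (*mdSubmitChan)(unsigned *queued);   /* generic */
} MiniDriverFxns;

/* Generic half: one implementation per DSP, shared by every codec. */
int mover_submit(unsigned *queued)
{
    assert(queued != NULL);
    ++*queued;              /* stand-in for programming a DMA transfer */
    return 0;
}

/* Device-specific halves: only the initialization differs. */
int codecA_bind(unsigned *ctrl) { *ctrl = 0x0011u; return 0; }
int codecB_bind(unsigned *ctrl) { *ctrl = 0x2200u; return 0; }

const MiniDriverFxns CODEC_A_FXNS = { codecA_bind, mover_submit };
const MiniDriverFxns CODEC_B_FXNS = { codecB_bind, mover_submit };
```

Supporting a third codec in this arrangement means writing one more bind routine and one more table; the data-mover code is reused untouched.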
Device-driver buffer flow
To better understand the flow of data through the driver and to map out the interactions between the application, device driver and device, take a look at Figure 4, which shows a step-by-step breakdown.
- The application will create the buffers that the device driver will use. In this case, we've chosen a double-buffered design, so two buffers are created (buf0 and buf1).
- The application calls the class driver function SIO_issue() with buf0 as an argument. The device driver now "owns" the buffer and builds an IOP by taking an available packet from a pool of available IOPs (implemented simply using a free list queue) and filling it with command and status information as well as a pointer to buf0 itself. The class driver creates the pool of IOPs at startup to avoid the overhead of having to dynamically allocate memory at run-time.
- The class driver calls the mini-driver function mdSubmitChan() with the IOP as an argument. As we showed earlier, this function will disable interrupts, and either start the I/O processing job or queue it for processing at a later time. Queuing the packets in the class driver would be too inefficient since the mini-driver would need to resynchronize with the class driver every time it was ready to process the next packet.
- When the device has completed processing the job, it will fire off an interrupt to the DSP. The ISR will set the status value in the IOP to indicate if the operation was successful.
- The mini-driver will then call the class driver's callback function with the IOP as an argument to signal its completion.
- The application will call the class driver function SIO_reclaim(). If this call is made before the I/O job has completed, it will block, allowing the RTOS to context switch. When the class driver unblocks, it will pass the buffer back to the application so that it can be filled with more data and the cycle can then repeat.
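The pool of IOPs mentioned in the second step can be sketched as a fixed array threaded onto a free list, so that no memory is allocated at run time. The Iop layout, the pool size, and the iop_pool_init(), iop_get(), and iop_put() names are hypothetical; a real class driver would also protect the list against concurrent access.

```c
#include <assert.h>
#include <stddef.h>

#define NUM_IOPS 4   /* illustrative pool size */

/* Hypothetical I/O packet: command, status, and a pointer to the buffer. */
typedef struct Iop {
    struct Iop *next;
    int         cmd;
    int         status;
    void       *addr;
    size_t      size;
} Iop;

typedef struct {
    Iop  storage[NUM_IOPS];   /* allocated once, at startup */
    Iop *freeList;
} IopPool;

/* Thread every packet onto the free list. */
void iop_pool_init(IopPool *pool)
{
    size_t i;

    pool->freeList = NULL;
    for (i = 0; i < NUM_IOPS; i++) {
        pool->storage[i].next = pool->freeList;
        pool->freeList = &pool->storage[i];
    }
}

/* Take a packet and fill it in; returns NULL when the pool is exhausted. */
Iop *iop_get(IopPool *pool, int cmd, void *addr, size_t size)
{
    Iop *iop = pool->freeList;

    if (iop == NULL) {
        return NULL;
    }
    pool->freeList = iop->next;
    iop->next = NULL;
    iop->cmd = cmd;
    iop->status = 0;
    iop->addr = addr;
    iop->size = size;
    return iop;
}

/* Return a packet to the pool once its buffer has been reclaimed. */
void iop_put(IopPool *pool, Iop *iop)
{
    assert(iop != NULL);
    iop->next = pool->freeList;
    pool->freeList = iop;
}
```

In this sketch, an issue call would use iop_get() to wrap the application's buffer, and the reclaim path would call iop_put() after handing the buffer back, so the number of buffers in flight is bounded by the pool size.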
Time to modularize
You can simplify DSP driver design by abstracting driver functionality into different modules that isolate device-specific code from more generic functions. Although using a modular approach to driver development requires more up-front design time and effort, you'll see significant benefit when porting the application to a new hardware configuration.
In our experience, the modular approach reduces the development effort for a new codec driver by 90%. Development time also becomes more predictable, because the driver reuses modules that are already debugged, and debugging a driver is typically much harder than writing the initial code.
Nick Lethaby is the technology product manager for the DSP/BIOS operating system at Texas Instruments. He has over 16 years of experience in embedded and real-time software applications and has a special interest in real-time kernels. Nick has a BS in computer science from the University of London. You can reach him at nlethaby@ti.com.
David Friedland is a senior applications engineer and project manager for Texas Instruments. He has developed device drivers and other embedded systems software and currently manages the development of device drivers for DSP peripherals. He has over 17 years of experience in embedded and DSP software and has a BS in electronic engineering from San Diego State. He can be reached at dfriedland@ti.com.