Tensilica Introduces Xtensa LX2 and Xtensa 7 Configurable Processors

New Cores Extend Tensilica's Configurable Processor Technology Leadership

SANTA CLARA, CA – December 4, 2006 – Tensilica, Inc. today introduced its seventh generation of Xtensa configurable processors, the Xtensa LX2 and Xtensa 7 cores. Both processors feature several architectural enhancements and are the first configurable licensable core families available with built-in, on-the-fly Error Correcting Code (ECC), which is extremely important in storage, networking, automotive and transaction processing applications where data integrity and error resiliency are of paramount concern. Tensilica’s new generation of processors reinforces Tensilica’s processor technology leadership by remaining the lowest power, highest performance licensable cores on the market. Both processors are available and shipping now.

“We’ve made several architectural improvements that enhance our leadership with both our Xtensa 7 and our Xtensa LX2 configurable, extensible processors,” stated Chris Rowen, Tensilica’s president and CEO. “Tensilica offers more configuration options and a much more automated process of generating both the hardware RTL and the matching software tool chain than anyone in the industry.”

Lowest Power, Highest Performance

The base Xtensa instruction set architecture, common to both the Xtensa 7 and Xtensa LX2 processor cores, provides the industry’s lowest power and highest performance when compared to legacy fixed-architecture cores. Because both cores are fully configurable, and designers can add application-specific instructions to the base processor using Tensilica’s patented, automated processor generator, it’s important to use equivalent processor configurations when evaluating them against competing processor core offerings.

For example, a small configuration of an Xtensa 7 core without cache memories and without designer-defined instruction extensions is roughly equivalent to an ARM 7TDMI-S core, yet it has much better performance and lower power:

Processor      Max Frequency           Power - mW per MHz             Dhrystone MIPS   Dhrystone MIPS/MHz
               (0.13u G worst case)    (0.13u G)

ARM 7TDMI-S    146 MHz                 0.10-0.18 (area/speed opt.)    131              1.11
Xtensa 7       233-250 MHz             0.082                          300              1.28
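The table's per-MHz figures can be combined into total power and performance per milliwatt. The following is a minimal sketch of that arithmetic using only the numbers in the table above. The function names are illustrative, and pairing every power figure with the maximum frequency (and taking 250 MHz from the 233-250 MHz range) is a simplifying assumption, not something the table states.

```python
# Figures copied from the comparison table above (0.13u G process).
# Assumption: each power figure is paired with the listed maximum frequency,
# and the upper end (250 MHz) of the Xtensa 7 range is used.
CORES = {
    "ARM 7TDMI-S (low end of power range)": (146, 0.10, 1.11),
    "ARM 7TDMI-S (high end of power range)": (146, 0.18, 1.11),
    "Xtensa 7": (250, 0.082, 1.28),
}


def power_mw(mhz, mw_per_mhz):
    """Total core power in mW at a given clock frequency."""
    return mhz * mw_per_mhz


def dmips_per_mw(dmips_per_mhz, mw_per_mhz):
    """Performance per unit power; the MHz terms cancel out."""
    return dmips_per_mhz / mw_per_mhz


for name, (mhz, mw_per_mhz, dmips_per_mhz) in CORES.items():
    total = power_mw(mhz, mw_per_mhz)
    efficiency = dmips_per_mw(dmips_per_mhz, mw_per_mhz)
    print(f"{name}: {total:.1f} mW at {mhz} MHz, {efficiency:.1f} DMIPS/mW")
```

Because Dhrystone MIPS/MHz and mW/MHz share the same denominator, the efficiency ratio does not depend on the clock frequency chosen.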

A high-performance version of the Xtensa LX2 processor uses less than half the die area and power of the equivalent ARM 1136J-S. (Note: This is not the base Xtensa LX2 processor. Rather, this version of Xtensa LX2 has been configured to be a high-performance, general-purpose CPU.)

Processor                                           Equivalent Frequency     Power - mW per MHz   Dhrystone MIPS/MHz
                                                    (0.13u G worst case)     (0.13u G)

ARM 1136J-S                                         33 MHz (single-issue)    0.60                 1.98
Xtensa LX2, 3-way FLIX performance configuration    600 MHz (three-issue)    0.17                 10.4

Power Reduction Up to 30 Percent

Several enhancements were made to both the Xtensa 7 and Xtensa LX2 processors to reduce total core-plus-memory power by up to 30 percent, including:

  • Enhanced configuration choices that allow independent width selection of main system memory interface, local data memory interface, and instruction memory interface

  • Reduced execution speculation for data memory enables and accesses, leaving data cache and tightly coupled local data memories turned off for longer periods of time

  • An optional wider instruction fetch buffer that reduces instruction memory cycles (and power consumed by those instruction fetch cycles) by up to 75 percent, depending on code set.

Also, Tensilica designed in additional power-down modes, including external power-down of the trace port control and on-chip debug modules, lowering overall system power.
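To illustrate how a wider instruction fetch buffer can cut instruction memory cycles, here is a minimal sketch that counts memory accesses for straight-line code. The 4-byte and 16-byte fetch widths are hypothetical values chosen for illustration, not Tensilica's published buffer sizes, and the model ignores branches, which is one reason real savings depend on the code set.

```python
import math


def fetch_accesses(code_bytes, fetch_width_bytes):
    """Instruction-memory accesses needed to stream straight-line code."""
    return math.ceil(code_bytes / fetch_width_bytes)


def access_reduction(code_bytes, narrow_bytes, wide_bytes):
    """Fraction of instruction-memory accesses saved by the wider fetch."""
    narrow = fetch_accesses(code_bytes, narrow_bytes)
    wide = fetch_accesses(code_bytes, wide_bytes)
    return 1 - wide / narrow


# Hypothetical case: 1 KB of straight-line code, 4-byte vs. 16-byte fetches.
saving = access_reduction(1024, narrow_bytes=4, wide_bytes=16)
print(f"Instruction memory accesses saved: {saving:.0%}")
```

In this idealized case a fetch that is four times wider needs one quarter of the accesses. Code with frequent taken branches would discard part of each wide fetch and save less.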

New ECC Option

Tensilica introduced two options for detecting and/or correcting memory errors, which are an increasing problem as silicon process geometries shrink. Designers of Tensilica’s configurable Xtensa processors can now select either parity or ECC protection on all local (tightly coupled) memories. Parity generates an exception when a single-bit soft error is detected in the cache data array, cache tag array, or local instruction and/or data memory. ECC detects and corrects single-bit errors and detects double-bit errors. Tensilica has the first licensable processor architecture family with built-in, on-the-fly ECC capability. Error correction is extremely important in mission-critical applications such as storage and networking, where reliability and accuracy are of paramount concern. It is also very important in automotive applications to help meet automotive safety standards.
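The behavior described above (correct any single-bit error, detect any double-bit error) is commonly called SECDED. The sketch below shows the idea with a textbook extended Hamming code over 4 data bits. It is an assumption-laden illustration rather than Tensilica's implementation: real memories protect wider words, and the function names and bit layout here are hypothetical.

```python
def encode(nibble):
    """Encode 4 data bits as an 8-bit SECDED codeword (list of 0/1 values).

    Index 0 holds the overall parity bit; indices 1, 2 and 4 hold the
    Hamming parity bits; indices 3, 5, 6 and 7 hold the data bits.
    """
    code = [0] * 8
    code[3], code[5], code[6], code[7] = [(nibble >> i) & 1 for i in range(4)]
    code[1] = code[3] ^ code[5] ^ code[7]
    code[2] = code[3] ^ code[6] ^ code[7]
    code[4] = code[5] ^ code[6] ^ code[7]
    code[0] = sum(code[1:]) % 2  # makes the parity of the whole word even
    return code


def decode(code):
    """Return (status, nibble); nibble is None for an uncorrectable error."""
    code = list(code)
    syndrome = 0
    for position in range(1, 8):
        if code[position]:
            syndrome ^= position  # XOR of the positions holding a 1
    parity_odd = sum(code) % 2 == 1
    if syndrome == 0 and not parity_odd:
        status = "ok"
    elif parity_odd:
        code[syndrome] ^= 1  # single-bit error; syndrome 0 is the parity bit
        status = "corrected"
    else:
        return "double_error", None  # two flipped bits: detect, do not fix
    nibble = code[3] | (code[5] << 1) | (code[6] << 2) | (code[7] << 3)
    return status, nibble


word = encode(0b1011)
word[6] ^= 1  # simulate a single-bit soft error
print(decode(word))
```

A parity-only scheme corresponds to keeping just the overall parity bit: it can signal that a single bit flipped but cannot say which one.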

“As process geometries shrink, soft memory errors increase due to lower cell capacitances and lower supply voltages,” added Rowen. “Therefore, it’s increasingly important that processors be able to detect and fix soft memory errors. That’s why it’s so important that Tensilica is making built-in, on-the-fly ECC available as an option in all of its new generation Xtensa cores.”

What Else is New?

Tensilica added several features that apply to both the Xtensa 7 and Xtensa LX2 processor cores:

  1. More designer options for controlling buffering in the Processor Interface (PIF), allowing smaller buffers, to fine-tune and lower power in non-performance-critical paths of the SOC design.

  2. The option to configure a wide interface to fast local instruction and data memories and, at the same time, a narrow system interface to the system bus. This enables fast, high bandwidth to local memories while reducing the complexity, area and power of system interface and bus design.

  3. Improved infrastructure for the TIE (Tensilica Instruction Extension) language, providing better handling of multiple TIE files for large development teams and companies sharing repositories for pre-built TIE modules.

Tensilica also added some features that apply only to the advanced capabilities in the Xtensa LX2 processor:

  1. New TIE Lookup port feature, which allows the creation of new memory interfaces beyond those already available as local instruction and data memories. Memories connected to these new designer-defined TIE Lookup ports can be read and written directly from the processor data path without using load and store instructions. Video system designers can use a TIE Lookup port to connect a local buffer that stores video frame data that is filled/refilled by external hardware to the processor data path without using power-hungry DMA (Direct Memory Access). Network designers can use TIE Lookup ports to connect large lookup tables that then can be quickly accessed by the processor.

  2. An optional connection box that is a full crossbar, enabling the connection of two single-ported (banked) local data RAMs to Xtensa LX2 processor core configurations that have two load/store ports. In this way, the processor can sustain two load/stores per cycle as long as they are to opposite banks. This greatly simplifies system design when using Xtensa LX2 as an XY-style DSP architecture with two load/store ports.

  3. Memory Management Unit (MMU) support for all configurations, even those using a 7-stage pipeline and Tensilica’s patented FLIX (Flexible Length Instruction eXtensions) technology, allowing multi-instruction-issue high-performance CPUs. The MMU is required for running the Linux operating system, which is available from Tensilica’s partner MontaVista. MMU-enabled Xtensa LX2 processor cores employing FLIX are perfect for high-performance networking applications running complex protocol stacks and high-end applications processors in mobile and handset applications. (Note: MMU is also available as an option on Xtensa 7.)
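The banked-memory behavior described in item 2 above can be modeled with a few lines of code. This is a rough sketch under stated assumptions: the RAM size, the address-range mapping of banks, and the one-extra-cycle penalty when both accesses hit the same bank are all hypothetical, since the text only says that two load/stores per cycle are sustained when they go to opposite banks.

```python
RAM_SIZE = 0x8000  # hypothetical size of each single-ported local data RAM


def bank_of(address):
    """Hypothetical mapping: bank 0 holds [0, RAM_SIZE), bank 1 the next range."""
    return address // RAM_SIZE


def cycles_for(access_pairs):
    """Cycles needed to issue pairs of simultaneous load/store requests.

    Assumption: a pair completes in one cycle when its two addresses are in
    different banks, and is serialized over two cycles otherwise.
    """
    cycles = 0
    for addr_a, addr_b in access_pairs:
        cycles += 1 if bank_of(addr_a) != bank_of(addr_b) else 2
    return cycles


# XY-style layout: X operands in bank 0, Y operands in bank 1.
xy_pairs = [(0x0000 + 4 * i, 0x8000 + 4 * i) for i in range(8)]
# Both operand arrays placed in bank 0.
same_bank_pairs = [(0x0000 + 4 * i, 0x1000 + 4 * i) for i in range(8)]

print(cycles_for(xy_pairs), cycles_for(same_bank_pairs))
```

Under this model, placing the two operand streams in separate RAMs halves the memory cycles for the loop, which is the usual motivation for an XY-style data layout.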

The New Xtensa 7 Processor

This seventh-generation Xtensa configurable processor is optimized for low-power applications and is ideal for both control and DSP (digital signal processing) operations. The Xtensa 32-bit architecture has a 5-stage pipeline, 32-bit ALU (arithmetic logic unit), up to 64 general-purpose physical registers, six special purpose registers and 80 base instructions, including improved 16- and 24-bit RISC instruction encoding (with modeless switching for maximum code density). Clock speed reaches 600 MHz in 90nm GT process, speed-optimized netlist, worst case operating conditions. Power consumption for a minimum configuration (20,000 gates) is 0.038 mW/MHz in 130nm LV process, area-optimized netlist, typical operating conditions and 0.048 mW/MHz in a 90nm GT process, area-optimized netlist, typical operating conditions.

The New Xtensa LX2 Processor

Tensilica’s second-generation Xtensa LX2 processor includes all of the features of Xtensa 7 plus three important features not available on any other processor core:

  1. Much faster data input and output (I/O), including an option for a second load/store unit and Tensilica’s breakthrough capability to add designer-defined GPIO (general purpose input/output) TIE Ports and FIFO (first in, first out) TIE Queues for direct data access into the processor’s execution units. The TIE Ports and Queues completely bypass the bus, eliminating the need for multiple load/store operations to process data.

  2. Tensilica’s innovative FLIX technology, which allows the creation of processor configurations that issue multiple instructions per cycle in a manner similar to VLIW processors. The Xtensa C/C++ Compiler (XCC) automatically extracts instruction-level and loop-level parallelism from C/C++ code and bundles operations into FLIX instructions. These multi-issue FLIX instructions can be either 32 bits or 64 bits wide and are modelessly intermixed with the base 16- and 24-bit instructions. By packing multiple operations into a wide 32- or 64-bit instruction word, designers can accelerate a broader class of “hot spots” in embedded applications.

  3. Xtensa LX2 features the same instruction set as Xtensa 7 with an option for a 7-stage high-performance pipeline. The 7-stage version of Xtensa LX2 can achieve over 650 MHz in 90nm GT process, speed-optimized netlist, worst case operating conditions.

Power consumption for a minimum configuration (20,000 gates) is 0.038 mW/MHz in 130nm LV process, area-optimized netlist, typical operating conditions and 0.048 mW/MHz in a 90nm GT process, area-optimized netlist, typical operating conditions.

Broad Partner Base

Configuring a Tensilica processor core never compromises the underlying base Xtensa instruction set, thereby ensuring availability of a robust ecosystem of third-party application software and development tools. All possible configurations of the Xtensa processor are always compatible with major operating systems, debug probes and ICE solutions, and always come with an automatically generated, complete software development toolchain including an advanced integrated development environment based on the Eclipse framework, a world-class compiler, a cycle-accurate SystemC-compatible instruction set simulator, and the full industry-standard GNU toolchain.

The Configurable, Extensible Xtensa Architecture

Xtensa processors feature more than 300 independent configuration parameters so the designer can select the right mix of features for the application. These click-box options include: multipliers; floating point unit; an audio processor; a basic DSP engine or a 3-way VLIW (very long instruction word) SIMD (single instruction, multiple data) DSP engine; processor bus interfaces; MMU; up to 32 interrupts; optimized EDA scripts; operating system support; and much more.

To increase performance 2-100x or more, designers can add application-specific instructions using the TIE language, or let Tensilica’s XPRES (Xtensa PRocessor Extension Synthesis) Compiler automatically evaluate C/C++ algorithms and automatically develop optimized TIE instructions that will accelerate these algorithms. The TIE language can describe an entire new data path including elements like new registers, register files, multi-cycle execution units, designer-defined GPIO and FIFO interfaces, SIMD execution units, a VLIW data path, and custom data types, such as 24-bit data for audio applications, 56-bit data for security processing, or 256-bit data for packet processing, to save area and power. The TIE Compiler takes the descriptions of this new data path and new instructions and updates the entire compiler tool chain (compiler, debugger, profiler, et cetera), the instruction-set simulator and system models. It also inserts optimized clock-gated execution units, registers, register files, control logic, bypass logic, etc., into the processor hardware. This is done automatically and guaranteed to be correct-by-construction by Tensilica.

Using Xtensa Processors Instead of Logic Blocks

Tensilica’s Xtensa processors are often used instead of dedicated hard-wired RTL (register-transfer level) blocks for several reasons. First, because it is programmable, the Xtensa processor offers flexibility that pure RTL-based finite state machine (FSM) design cannot offer. Second, post-silicon algorithmic bug fixes can be done via firmware updates, dramatically reducing the risk of silicon respins. Third, Xtensa processors reduce total SOC design and verification time considerably over RTL design methods. Fourth, Xtensa processors often consume less power than equivalent RTL implementations because the Xtensa Processor Generator does automatic pipeline activity analysis and clock gating on a cycle-by-cycle basis. The time required to do this manually in RTL design is generally prohibitive. And fifth, because Xtensa processors can bypass the bus and use GPIO TIE Ports and FIFO TIE Queues for data transfer, Xtensa processors can move and manipulate data as quickly and efficiently as RTL blocks.

Pricing and Availability

Both Xtensa 7 and Xtensa LX2 are shipping now. Xtensa 7 pricing starts at $250,000 for a single-project use license.

About Tensilica

Tensilica offers the broadest line of controller, CPU and specialty DSP processors on the market today, both in an off-the-shelf format via the Diamond Standard Series cores and with full designer configurability via the Xtensa processor family. Tensilica’s low-power, benchmark-proven processors have been designed into high-volume products at industry leaders in the digital consumer, networking and telecommunications markets. All Tensilica processor cores are complete with a matching software development tool environment, portfolio of system simulation models, and hardware implementation tool support. For more information on Tensilica's patented approach to the creation of application-specific building blocks for SOC design, visit www.tensilica.com.
