HDTV H.264/AVC Video Encoder with compressed reference frame store

Overview

The OL_H264E-CFS core is a hardware implementation of the H.264 video compression algorithm, designed to process HDTV progressive video up to 1920x1080 at 30 fps. The core accepts an uncompressed video stream as input and outputs the encoded bitstream. Thanks to its compressed reference frame store technology, no external DRAM is required.

The design is simple, fully synchronous and silicon proven, with a low gate count.

Each block of 16x16 pixels (256 pixels) is processed in 1024 cycles, which means that each pixel is processed in 4 cycles. Consequently, given an uncompressed video stream of resolution X by Y and frame rate fps, the minimum clock frequency needed to process that stream is:

F = 4*X*Y*fps

This allows the core to process the video stream at relatively low clock frequencies. For example, HDTV video of 1920x1080 @ 30 fps requires ~250 MHz, whereas VGA video of 640x480 @ 30 fps requires ~37 MHz.
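The figures above can be checked with a few lines of arithmetic. The following is a minimal sketch of the formula F = 4*X*Y*fps. The function name `min_clock_hz` and the list of formats are illustrative and are not part of the core's deliverables.

```python
# Figures taken from the text: one 16x16 block is processed in 1024 cycles.
CYCLES_PER_BLOCK = 1024
PIXELS_PER_BLOCK = 16 * 16
CYCLES_PER_PIXEL = CYCLES_PER_BLOCK // PIXELS_PER_BLOCK  # 1024 / 256 = 4


def min_clock_hz(width, height, fps):
    """Minimum core clock in Hz, F = 4 * X * Y * fps."""
    return CYCLES_PER_PIXEL * width * height * fps


if __name__ == "__main__":
    # Illustrative formats; the resolutions and frame rates are those quoted in the text.
    formats = [
        ("1080p @ 30 fps", 1920, 1080, 30),
        ("720p @ 30 fps", 1280, 720, 30),
        ("VGA @ 30 fps", 640, 480, 30),
        ("QCIF @ 15 fps", 176, 144, 15),
    ]
    for name, width, height, fps in formats:
        mhz = min_clock_hz(width, height, fps) / 1e6
        print(f"{name}: {mhz:.1f} MHz")
```

The results round to the ~250 MHz, ~37 MHz and ~1.5 MHz values quoted in this data sheet.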

Key Features

  • Fully compatible with the ITU-T H.264 specification.
  • Highly compressed (10-15:1) frame store (CFS) with perfect reconstruction (no error or drift) when decoded by third-party standard decoders. Patent-pending technology.
  • Extremely low power: no external DRAM is required, and the CFS greatly reduces memory bandwidth and power.
  • I and P frame support.
  • Proven in FPGA: 720p @ 30 fps on a Virtex5-2 demo board with video streamed over Ethernet.
  • Level 4.1; the bitstream can be decoded by a Baseline, Main or High Profile decoder.
  • Supports up to the highest HDTV video resolution (1920x1080 @ 30 fps progressive).
  • Very low operating frequency: from ~1.5 MHz for QCIF @ 15 fps to ~250 MHz for 1920x1080 @ 30 fps.
  • Single-core HDTV support in FPGA: 720p (1280x720) at 30 fps in a high-end device.
  • No CPU required for encoding.
  • Constant Bit Rate (CBR) and partial Variable Bit Rate (VBR) support.
  • Very low latency (~1.1 ms for VGA @ 30 fps).
  • Motion vector range of -16.00/+15.75 pixels around the predicted motion vector (-24.00/+23.75 around the origin), with quarter-pixel precision.
  • Support for most intra4x4 modes and all intra16x16 modes.
  • Block skipping logic for lower bitrate.
  • Supports YUV 4:2:0 video input.
  • Minimum clock speed: ~4x the raw pixel clock.
  • Low gate count: 280 Kgates + 217 Kbits of RAM for real-time 1080p @ 30 fps.
  • Simple, fully synchronous design.
  • Available as fully functional and synthesizable VHDL or Verilog soft-core.
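As a rough cross-check of the latency figure in the list above, the quoted ~1.1 ms for VGA @ 30 fps is close to the time it takes one row of 16x16 blocks (16 video lines) to arrive at the input. The sketch below shows that arithmetic only. It assumes the frame period is spread evenly over the active lines (blanking ignored), and it does not claim that this is how the core's latency is actually defined. The function name is hypothetical.

```python
def block_row_time_ms(height, fps, lines_per_row=16):
    """Time in ms for one row of 16-line blocks to arrive.

    Assumption (not from the data sheet): the frame period 1/fps is
    divided evenly among the `height` active lines, ignoring blanking.
    """
    frame_period_ms = 1000.0 / fps
    return frame_period_ms * lines_per_row / height


if __name__ == "__main__":
    # VGA (480 lines) at 30 fps, the case quoted in the feature list.
    print(f"VGA @ 30 fps: {block_row_time_ms(480, 30):.2f} ms")
```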

Benefits

  • The core compresses the reference frames by a large factor (8-16:1). This results in very small storage: ~1 Mbit for 720p and ~2.2 Mbit for 1080p, or less.
  • Such small storage can be integrated on the SoC, which eliminates the power-hungry external DRAM.
  • Since the data is highly compressed, the compressed frame store memory is rarely accessed, resulting in even more power reduction.
  • Bandwidth is only ~50 Mbytes/s for 1080p @ 30 fps.
  • No error or drift is present when decoding with third-party H.264 decoders.
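The storage figures above can be related to the size of an uncompressed reference frame. The sketch below computes the raw size of one YUV 4:2:0 frame and the compression ratio implied by the quoted storage. It assumes 8 bits per sample and 1 Mbit = 10^6 bits, neither of which is stated in this data sheet, and the function names are illustrative.

```python
def raw_frame_bits(width, height, bits_per_sample=8):
    """Size in bits of one uncompressed YUV 4:2:0 frame.

    4:2:0 carries one luma sample per pixel plus two chroma planes at
    quarter resolution, i.e. 1.5 samples per pixel.
    Assumption: 8 bits per sample (not stated in the data sheet).
    """
    return width * height * 3 // 2 * bits_per_sample


def implied_ratio(width, height, stored_mbit):
    """Compression ratio implied by a stored size given in Mbit (10^6 bits assumed)."""
    return raw_frame_bits(width, height) / (stored_mbit * 1e6)


if __name__ == "__main__":
    for name, width, height, stored in [("720p", 1280, 720, 1.0),
                                        ("1080p", 1920, 1080, 2.2)]:
        raw_mbit = raw_frame_bits(width, height) / 1e6
        ratio = implied_ratio(width, height, stored)
        print(f"{name}: raw {raw_mbit:.1f} Mbit, stored ~{stored} Mbit, ratio ~{ratio:.1f}:1")
```

Under these assumptions both formats work out to roughly 11:1, which falls inside the 8-16:1 range quoted above.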

Applications

  • Digital video recorders.
  • Video wireless devices.
  • Video surveillance systems.
  • Hand held HDTV video cameras.

Deliverables

  • Synthesizable VHDL or Verilog RTL.
  • Bit-accurate C model.
  • Complete HDL testbench.
  • Complete data sheet.

Technical Specifications

Maturity: Proven in ASIC and FPGA
Availability: Now