Low-power high-speed reconfigurable processor to accelerate AI everywhere.

Overview

Zhufeng-800: A low-power high-speed reconfigurable processor to accelerate AI everywhere.

The processor incorporates sophisticated technologies including network random sparsity, adaptive tensor tiling, layer-fusion data movement, and an associated tool-chain. The multicore Zhufeng-800 architecture, called RiSE, scales from ultra-low-cost single-core applications to ultra-high-performance many-core applications, and is optimized for efficient deep learning, computer vision, and image/video processing.

Efficient architecture for accelerating random sparse deep neural networks
Random pruning schemes can remove up to 90% of a model's redundant weights, potentially reducing the computational complexity. Most AI inference architectures cannot fully exploit this sparsity because the nonzero weights are randomly distributed. For the Zhufeng-800, pruned networks are preprocessed by the associated tool-chain, allowing the processor to skip pruned weights during inference and achieve a speedup that scales linearly with network sparsity.
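
The skip-on-zero idea can be illustrated with a minimal NumPy sketch. This is illustrative only: the function names `preprocess_sparse` and `sparse_dot` are hypothetical and not part of the actual Zhufeng-800 tool-chain, which is not publicly documented.

```python
import numpy as np

def preprocess_sparse(weights):
    # Tool-chain-style preprocessing (simplified): record only the
    # positions and values of the surviving (nonzero) weights.
    idx = np.flatnonzero(weights)
    return idx, weights[idx]

def sparse_dot(idx, vals, activations):
    # Inference-style MAC loop over surviving weights only: work is
    # proportional to the nonzero count, so a 90%-pruned layer needs
    # roughly 10% of the dense multiply-accumulates.
    return float(np.dot(vals, activations[idx]))

w = np.array([0.0, 0.5, 0.0, 0.0, -1.2, 0.0, 0.0, 0.3])  # 5 of 8 weights pruned
x = np.arange(8, dtype=float)
idx, vals = preprocess_sparse(w)
dense = float(np.dot(w, x))        # full dense dot product
sparse = sparse_dot(idx, vals, x)  # skips the zeros, same result
```

Because the preprocessing step happens offline, the inference loop touches only the three surviving weights here, which is the mechanism behind the linear speedup claim.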

Scalable multi-core architecture for diverse application scenarios
The Zhufeng-800 contains four identical cores connected by an internal bus; these cores serve as building blocks for further configurations and can be tuned to different performance requirements. A throughput of 0.6 TOPS is achieved with a single active core, while dozens of TOPS can be reached in the largest many-core configurations. Only one core needs to be active for low-cost tasks, and additional cores can be activated to meet the highest performance requirements. The scalable RiSE architecture thus supports different operating points across product lines, from very low cost to ultra-high performance.

Software programmable with a full-stack deployment framework
The Zhufeng-800 processor is fully software programmable and can therefore keep pace with rapidly evolving deep learning algorithms. Newly added neural networks or applications are accommodated by the associated tool-chain for optimized deployment. A SoC built on the Zhufeng-800 platform can therefore remain competitive as algorithms evolve, and can even address other market segments with a wide variety of customer requests.
Full service, including the associated tool-chain and algorithm deployment, is provided with the Zhufeng-800, covering sparse training, quantization training, model compilation, and model deployment.
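
As an illustration of the sparse-training step in such a flow, here is a minimal magnitude-based pruning sketch. The function name, the use of NumPy, and the magnitude criterion are all assumptions for illustration; the actual tool-chain recipe is not public.

```python
import numpy as np

def magnitude_prune(weights, sparsity=0.9):
    # One simple unstructured-pruning recipe: zero out the
    # smallest-magnitude weights until the target sparsity is reached.
    k = int(np.ceil(weights.size * sparsity))
    if k == 0:
        return weights.copy()
    threshold = np.partition(np.abs(weights).ravel(), k - 1)[k - 1]
    pruned = weights.copy()
    pruned[np.abs(pruned) <= threshold] = 0.0
    return pruned

rng = np.random.default_rng(0)
w = rng.standard_normal((64, 64))  # stand-in for one layer's weights
pw = magnitude_prune(w, sparsity=0.9)
achieved = 1.0 - np.count_nonzero(pw) / pw.size  # ≈ 0.9
```

In a real sparse-training loop this pruning step would alternate with fine-tuning so the surviving weights can compensate for the removed ones.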

Key Features

  • Number of Cores: 4
  • Performance (INT8, 600 MHz): 0.6 TOPS
  • Achievable Clock Speed: 600 MHz (28 nm)
  • Synthesis Logic Gates: 2 MGates
  • Memory Size: 464 KB per core
  • Bus Interface: AXI4
  • Accuracy: 0.28% degradation at 90% sparsity for YOLOv3
  • Framework Support: PyTorch, TensorFlow, Caffe (ONNX)

Benefits

  • Adaptive Tensor Tiling
  • Random Sparse Neural Network Acceleration
  • Lossless Weight Compression
  • Low Latency
  • Channel-Wise Dynamic Fixed-Point Data Type
  • Memory Management Unit
  • Wide Range of Element-Wise Unit Operations
  • Embedded Configurable On-Chip Memory
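
One way to read the "Channel-Wise Dynamic Fixed-Point Data Type" feature above: each channel carries its own scale (exponent) chosen from that channel's dynamic range, so channels with small and large weights are both represented accurately in 8 bits. A hedged sketch follows; the function names and the power-of-two scale choice are assumptions, not the documented hardware format.

```python
import numpy as np

def quantize_per_channel(weights, bits=8):
    # Each output channel gets its own power-of-two scale (exponent)
    # chosen from that channel's dynamic range, then values are
    # rounded to signed integers (assumes bits <= 8 so int8 fits).
    qmax = 2 ** (bits - 1) - 1
    quantized = np.empty(weights.shape, dtype=np.int8)
    exponents = np.empty(weights.shape[0], dtype=np.int32)
    for c, channel in enumerate(weights):
        peak = np.max(np.abs(channel))
        # smallest power-of-two scale that keeps the channel in range
        exponents[c] = int(np.ceil(np.log2(peak / qmax))) if peak > 0 else 0
        quantized[c] = np.clip(np.round(channel / 2.0 ** exponents[c]),
                               -qmax - 1, qmax)
    return quantized, exponents

def dequantize(quantized, exponents):
    # Reconstruct real values: integer value times the channel's scale.
    return quantized.astype(np.float64) * (2.0 ** exponents)[:, None]

w = np.array([[0.5, -1.0, 0.25],   # small-range channel
              [4.0, -8.0, 2.0]])   # large-range channel
q, exps = quantize_per_channel(w)
restored = dequantize(q, exps)
```

A single shared scale would waste precision on the small-range channel; the per-channel exponent is what makes the fixed-point format "dynamic".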

Block Diagram

[Block diagram of the Zhufeng-800 low-power high-speed reconfigurable AI processor]

Applications

  • Autonomous Driving
  • Smart Surveillance
  • Intelligent Manufacturing
  • Smart City

Technical Specifications

  • Foundry, Node: 28HPC, 9T