Zhufeng-800: A low-power high-speed reconfigurable processor to accelerate AI everywhere.
The processor incorporates sophisticated technologies including network random sparsity, adaptive tensor tiling, layer-fusion data movement, and an associated tool chain. The multi-core Zhufeng-800 architecture, called RiSE, scales from ultra-low-cost single-core applications to ultra-high-performance many-core applications, and is optimized for efficient deep learning, computer vision, and image/video processing.
Efficient architecture for accelerating random sparse deep neural networks
Random pruning schemes can remove up to 90% of a model's weights as redundant, which can reduce computational complexity by the same factor. Most AI inference architectures, however, cannot fully exploit this sparsity because the nonzero weights are randomly distributed. For the Zhufeng-800, pruned networks are preprocessed by the associated tool chain, so the processor can skip the pruned weights during inference and achieve a speedup that scales linearly with network sparsity.
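A minimal sketch of the idea, assuming a CSR-like compressed format (the actual on-chip encoding used by the Zhufeng-800 tool chain is not public): the weight matrix is preprocessed into per-row (column, value) pairs so that inference touches only the nonzero entries, and runtime scales with the nonzero count rather than the dense size.

```python
# Illustrative model of sparse-weight skipping; the function names and
# storage format are assumptions, not the processor's real encoding.

def compress_rows(weights):
    """Drop zeros, keeping (column, value) pairs per row (CSR-like)."""
    return [[(j, w) for j, w in enumerate(row) if w != 0.0]
            for row in weights]

def sparse_matvec(compressed, x):
    """Multiply-accumulate over nonzero weights only."""
    return [sum(w * x[j] for j, w in row) for row in compressed]

# A 75%-sparse 2x4 matrix: only 2 of 8 weights are nonzero,
# so only 2 multiply-accumulates are performed.
W = [[0.0, 2.0, 0.0, 0.0],
     [0.0, 0.0, 0.0, -1.0]]
x = [1.0, 3.0, 5.0, 7.0]
print(sparse_matvec(compress_rows(W), x))  # [6.0, -7.0]
```

With 90% sparsity, the same scheme would visit only one weight in ten, which is where the linear speedup comes from.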
Scalable multi-core architecture for a wide range of scenarios
The Zhufeng-800 contains four identical cores connected by an internal bus, which serve as building blocks for further configurations and can be set to different performance points. A single activated core delivers 0.6 TOPS, while many-core configurations reach dozens of TOPS. Only one core is activated for low-cost tasks, and the remaining cores are activated when the highest performance is required. The scalable RiSE architecture thus supports operating points across product lines, from very low cost to ultra-high performance.
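A back-of-the-envelope sketch of the core-gating scheme described above. The 0.6 TOPS/core (INT8, 600 MHz) figure comes from the Key Features below; the helper names are illustrative, not part of any real API.

```python
import math

TOPS_PER_CORE = 0.6  # INT8 peak throughput of one activated core (datasheet)

def peak_tops(active_cores):
    """Peak INT8 throughput with the given number of cores activated."""
    return TOPS_PER_CORE * active_cores

def cores_needed(required_tops):
    """Smallest number of cores to activate for a target throughput."""
    return max(1, math.ceil(required_tops / TOPS_PER_CORE))

print(peak_tops(1))       # 0.6  -- single-core, low-cost mode
print(peak_tops(4))       # 2.4  -- all four cores of one Zhufeng-800
print(cores_needed(2.0))  # 4
```

Many-core RiSE configurations extend the same arithmetic past four cores to reach the dozens-of-TOPS range.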
Software programmable with a full-stack deployment framework
The Zhufeng-800 processor is fully software programmable and can therefore keep pace with rapidly evolving deep learning algorithms. Newly added neural networks or applications are accommodated by the associated tool chain for optimized deployment. An SoC based on the Zhufeng-800 platform can thus stay in the market as algorithms evolve, and can even address other market segments with a wide variety of customer requirements.
A full service is provided with the Zhufeng-800, covering the associated tool chain and algorithm deployment: sparse training, quantization training, model compilation, and model deployment.
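The first two tool-chain stages named above can be sketched as follows, under two common assumptions: sparse training here uses simple magnitude-based pruning, and quantization uses symmetric per-tensor INT8. The real Zhufeng-800 tool chain is not public, so function names and details are illustrative only.

```python
# Hypothetical sketch of two tool-chain stages: magnitude pruning
# (sparse training step) and symmetric INT8 quantization.

def prune_by_magnitude(weights, sparsity):
    """Zero out the smallest-magnitude weights until `sparsity` is reached."""
    k = int(len(weights) * sparsity)          # number of weights to drop
    keep = set(sorted(range(len(weights)),
                      key=lambda i: abs(weights[i]))[k:])
    return [w if i in keep else 0.0 for i, w in enumerate(weights)]

def quantize_int8(weights):
    """Symmetric per-tensor INT8 quantization: w ~= q * scale."""
    scale = max(abs(w) for w in weights) / 127 or 1.0
    return [round(w / scale) for w in weights], scale

w = [0.05, -1.2, 0.4, 0.01, -0.7, 0.9]
sparse = prune_by_magnitude(w, sparsity=0.5)   # drops 0.05, 0.01, 0.4
q, s = quantize_int8(sparse)
print(sparse)  # [0.0, -1.2, 0.0, 0.0, -0.7, 0.9]
print(q)       # [0, -127, 0, 0, -74, 95]
```

The compilation and deployment stages would then pack the sparse INT8 tensors into the processor's compressed weight format.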
Overview
Key Features
- Multi-Core Number: 4
- Performance (INT8, 600 MHz): 0.6 TOPS
- Achievable Clock Speed: 600 MHz (28nm)
- Synthesis Logic Gates: 2 MGates
- Memory Size: 464 KBytes/core
- Bus Interface: AXI4
- Accuracy: 0.28% degradation at 90% sparsity (YOLOv3)
- Framework Support: PyTorch, TensorFlow, Caffe (via ONNX)
Benefits
- Adaptive Tensor Tiling
- Random Sparse Neural Network Acceleration
- Lossless Weight Compression
- Low Latency
- Channel-Wise Dynamic Fixed-Point Data Type
- Memory Management Unit
- Wide Range of Element-Wise Unit Operations
- Embedded Configurable On-Chip Memory
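One benefit above, the channel-wise dynamic fixed-point data type, can be illustrated with a small sketch: each channel picks its own fractional bit count so that small-magnitude channels keep more precision than large ones. This is a generic model of dynamic fixed point, not the processor's actual number format.

```python
# Illustrative channel-wise dynamic fixed-point: per-channel choice of
# fractional bits within a fixed total width (8 bits assumed here).

def frac_bits(channel, total_bits=8):
    """Choose fractional bits so the channel's max magnitude still fits."""
    m = max(abs(v) for v in channel)
    int_bits = 0
    while (1 << int_bits) <= m:        # integer bits needed for |max|
        int_bits += 1
    return total_bits - 1 - int_bits   # 1 sign bit, rest fractional

def to_fixed(channel, fb):
    """Quantize a channel to integers with `fb` fractional bits."""
    return [round(v * (1 << fb)) for v in channel]

small = [0.02, -0.11, 0.07]   # small channel: gets many fractional bits
large = [3.5, -1.25, 2.0]     # large channel: needs integer headroom
fb_s, fb_l = frac_bits(small), frac_bits(large)
print(fb_s, fb_l)             # 7 5
print(to_fixed(small, fb_s))  # [3, -14, 9]
```

Because the exponent is chosen per channel rather than per tensor, channels with very different dynamic ranges can share the same 8-bit datapath without one channel's outliers destroying another's precision.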
Block Diagram
Applications
- Autonomous Driving
- Smart Surveillance
- Intelligent Manufacturing
- Smarter City
Technical Specifications
Foundry, Node
28hpc/9T
Related IPs
- ARC Functional Safety (FS) Processor IP supports ASIL B and ASIL D safety levels to simplify safety-critical automotive SoC development and accelerate ISO 26262 qualification
- 32-bit RISC processor delivering high performance in low-cost microcontroller applications
- AI inference processor IP
- High-performance 32-bit multi-core processor with AI acceleration engine
- Low-power 32-bit RISC-V processor
- MIPI CSI-2 host/device controllers for high-speed serial interface between image processor and camera sensors