On-device AI is a must-have for many new designs. Silicon architects look for solutions that support the latest AI workloads, such as transformers and Stable Diffusion, while balancing high performance against low power consumption and minimal latency.
The Origin™ E2 is a family of power- and area-optimized NPU IP cores designed for devices like smartphones and edge nodes. It supports video (at resolutions up to 4K and beyond), audio, and text-based neural networks, including public, custom, and proprietary networks.
Innovative Architecture
The Origin E2 neural engine uses Expedera’s unique packet-based architecture, which is far more efficient than common layer-based architectures. The architecture enables parallel execution across multiple layers, achieving better resource utilization and deterministic performance (a contrast sketched in the example below). It also eliminates the need for hardware-specific optimizations, allowing customers to run their trained neural networks unchanged without reducing model accuracy. This approach greatly increases performance while lowering power, area, and latency.
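To make the layer-based versus packet-based contrast concrete, the following is a minimal conceptual sketch in plain Python. It models the scheduling idea only: the packet granularity, the stage-by-stage loop, and the `layer_based`/`packet_based` helpers are hypothetical illustrations, not Expedera's hardware or software implementation.

```python
def layer_based(layers, tensor):
    # Layer-based model: each layer must finish the whole tensor before
    # the next layer starts, so only one "stage" is ever busy at a time.
    for layer in layers:
        tensor = layer(tensor)
    return tensor

def packet_based(layers, packets):
    # Packet-based model: the input is split into small work packets that
    # flow through the layers as a pipeline, so (in hardware) multiple
    # layers would be busy at once, each on a different packet.
    stages = [None] * len(layers)   # packet currently held at each stage
    results = []
    stream = iter(packets)
    exhausted = False
    while not exhausted or any(s is not None for s in stages):
        if stages[-1] is not None:              # last stage completed a packet
            results.append(stages[-1])
        for i in range(len(layers) - 1, 0, -1):  # advance the pipeline
            stages[i] = layers[i](stages[i - 1]) if stages[i - 1] is not None else None
        nxt = next(stream, None)                 # feed the next packet, if any
        stages[0] = layers[0](nxt) if nxt is not None else None
        exhausted = exhausted or nxt is None
    return results

# Both schedules produce the same outputs; the packet pipeline just
# overlaps the stages instead of running them strictly one after another.
layers = [lambda x: x + 1, lambda x: x * 2]          # stand-in "layers"
print([layer_based(layers, p) for p in (1, 2, 3)])   # [4, 6, 8]
print(packet_based(layers, (1, 2, 3)))               # [4, 6, 8]
```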
Specifications
| Feature | Description |
| --- | --- |
| Compute Capacity | 0.5K to 10K INT8 MACs |
| Multi-tasking | Run multiple simultaneous jobs |
| Power Efficiency | 18 TOPS/W effective; no pruning, sparsity, or compression required (though supported) |
| Example Networks Supported | ResNet, MobileNet, MobileNet SSD, Inception V3, RNN-T, BERT, EfficientNet, FSRCNN, CPN, CenterNet, U-Net, YOLO V3, YOLO V5, ShuffleNet V2, others |
| Example Performance | MobileNet V1 (224 x 224): 8,750 IPS; 13,482 IPS/W (N7 process, 1 GHz, no sparsity/pruning/compression applied) |
| Layer Support | Standard NN functions, including Conv, Deconv, FC, Activations, Reshape, Concat, Elementwise, Pooling, Softmax, others; programmable general floating-point functions, including Sigmoid, Tanh, Sine, Cosine, Exp, others; custom operators supported |
| Data Types | INT4/INT8/INT10/INT12/INT16 activations/weights; FP16/BFloat16 activations/weights |
| Quantization | Channel-wise quantization (TFLite specification; see the sketch after this table); software toolchain supports Expedera, customer-supplied, or third-party quantization |
| Latency | Deterministic performance guarantees, no back pressure |
| Frameworks | TensorFlow, TFLite, ONNX, others supported |
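Since the table cites TFLite-style channel-wise quantization, here is a minimal NumPy sketch of that scheme. It assumes symmetric per-output-channel INT8 weight quantization as described in the TFLite quantization specification (zero point fixed at 0, values restricted to [-127, 127]); the `quantize_per_channel` helper and the OIHW weight layout are illustrative assumptions and do not reflect Expedera's toolchain API.

```python
import numpy as np

def quantize_per_channel(weights, axis=0):
    """Quantize float weights to INT8 with one scale per channel on `axis`.

    Sketch of TFLite-style symmetric per-channel weight quantization:
    zero_point = 0, and each channel's scale maps its largest magnitude
    to the [-127, 127] range.
    """
    # Move the channel axis to the front so each slice along axis 0
    # is one output channel.
    w = np.moveaxis(np.asarray(weights, dtype=np.float32), axis, 0)
    flat = w.reshape(w.shape[0], -1)
    scales = np.abs(flat).max(axis=1) / 127.0
    scales = np.where(scales == 0.0, 1.0, scales)    # guard all-zero channels
    bshape = (-1,) + (1,) * (w.ndim - 1)             # broadcast scales over w
    q = np.clip(np.round(w / scales.reshape(bshape)), -127, 127).astype(np.int8)
    return np.moveaxis(q, 0, axis), scales

# Round-trip check: dequantized weights stay close to the originals,
# with error bounded by half a quantization step per channel.
w = np.random.randn(16, 3, 3, 8).astype(np.float32)  # OIHW-style conv kernel
q, s = quantize_per_channel(w, axis=0)
w_hat = q.astype(np.float32) * s.reshape(-1, 1, 1, 1)
print(np.max(np.abs(w - w_hat)) <= 0.5 * s.max())    # True
```

Per-channel scales matter because convolution kernels often have very different magnitudes across output channels; a single per-tensor scale would waste INT8 range on the quiet channels, which is one reason channel-wise schemes preserve accuracy without retraining.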