Neural engine IP - The Cutting Edge in On-Device AI
Overview
With support for the latest generative AI models and traditional RNN, CNN, and LSTM models, the Origin™ E6 NPUs scale from 16 to 32 TOPS to deliver the optimum balance of performance, efficiency, and features for demanding edge inference applications.
The Origin E6 is a versatile NPU that is customized to match the needs of next-generation smartphones, automobiles, AR/VR, and consumer devices. With support for video, audio, and text-based AI networks, including standard, custom, and proprietary networks, the E6 is the ideal hardware/software co-designed platform for chip architects and AI developers. It offers broad native support for current and emerging AI models, and achieves ultra-efficient workload scheduling and memory management, with up to 90% processor utilization, avoiding dark silicon waste.
The Origin E6 neural engine uses Expedera’s unique packet-based architecture, which is far more efficient than common layer-based architectures. The architecture enables parallel execution across multiple layers, achieving better resource utilization and deterministic performance. It also eliminates the need for hardware-specific optimizations, allowing customers to run their trained neural networks unchanged without reducing model accuracy. This innovative approach greatly increases performance while lowering power, area, and latency.
Specifications
| Compute Capacity | 8K to 16K INT8 MACs |
| Multi-tasking | Run up to 8 Simultaneous Jobs |
| Power Efficiency | 18 TOPS/W effective; no pruning, sparsity or compression required (though supported) |
| Example Networks Supported | HitNet, Denoise, ResNext, ResNet50 V1.5, ResNet50 V2, Inception V3, RNN-T, MobileNet SSD, MobileNet V1, UNET, BERT, EfficientNet, FSR CNN, CPN, CenterNet, YOLO V3, YOLO v5l, ShuffleNet2, Swin, SSD-ResNet34, DETR, others |
| Example Performance | MobileNet V1 (512 x 512): 3629 IPS, 2696 IPS/W (N7 process, 1GHz, no sparsity/pruning/compression applied) |
| Layer Support | Standard NN functions, including Conv, Deconv, FC, Activations, Reshape, Concat, Elementwise, Pooling, Softmax, and others. Programmable general FP functions, including Sigmoid, Tanh, Sine, Cosine, Exp, and others; custom operators supported. |
| Data types | INT4/INT8/INT10/INT12/INT16 Activations/Weights; FP16/BFloat16 Activations/Weights |
| Quantization | Channel-wise Quantization (TFLite Specification); software toolchain supports Expedera, customer-supplied, or third-party quantization |
| Latency | Deterministic performance guarantees, no back pressure |
| Frameworks | TensorFlow, TFLite, ONNX, others supported |
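The compute and performance rows above can be cross-checked with simple arithmetic. The sketch below is illustrative only and rests on assumptions not stated in the table: that "8K" and "16K" mean 8,192 and 16,384 MACs, that each MAC counts as two operations per cycle (one multiply, one accumulate), and that the 1 GHz clock quoted for the MobileNet V1 figure also applies to the TOPS range. The function names are hypothetical.

```python
# Rough sanity check of the listed figures, under the assumptions above.

OPS_PER_MAC = 2  # assumption: one multiply plus one accumulate per cycle


def peak_tops(num_macs: int, clock_hz: float) -> float:
    """Peak tera-operations per second for a MAC array at a given clock."""
    return num_macs * OPS_PER_MAC * clock_hz / 1e12


def implied_power_watts(ips: float, ips_per_watt: float) -> float:
    """Power implied by a throughput figure and its per-watt counterpart."""
    return ips / ips_per_watt


low = peak_tops(8 * 1024, 1e9)    # 8K INT8 MACs at 1 GHz
high = peak_tops(16 * 1024, 1e9)  # 16K INT8 MACs at 1 GHz
power = implied_power_watts(3629, 2696)  # MobileNet V1 row

print(f"Peak range: {low:.1f} to {high:.1f} TOPS")
print(f"Implied power for the MobileNet V1 figure: {power:.2f} W")
```

Under these assumptions the MAC counts land close to the advertised 16 to 32 TOPS range, and the MobileNet V1 figures imply a power draw of roughly 1.3 W for that workload.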
Key features
- Choose the Features You Need: Customization brings many advantages, including increased performance, lower latency, reduced power consumption, and eliminating dark silicon waste. Expedera works with customers to understand their use case(s), PPA goals, and deployment needs during their design stage. Using this information, we configure Origin IP to create a customized solution that perfectly fits the application.
- Market-Leading 18 TOPS/W: Sustained power efficiency is key to successful AI deployments. Continually cited as one of the most power-efficient architectures in the market, Origin NPU IP achieves a market-leading, sustained 18 TOPS/W.
- Efficient Resource Utilization: Origin IP scales from GOPS to 128 TOPS in a single core. The architecture eliminates the memory sharing, security, and area penalty issues faced by lower-performing, tiled AI accelerator engines. Origin NPUs achieve sustained utilization averaging 80%—compared to the 20-40% industry norm—avoiding dark silicon waste.
- Full TVM-Based Software Stack: Origin uses a TVM-based full software stack. TVM is widely trusted and used by OEMs worldwide. This easy-to-use software allows the importing of trained networks and provides various quantization options, automatic completion, compilation, estimator and profiling tools. It also supports multi-job APIs.
- Successfully Deployed in 10M Devices: Quality is key to any successful product. Origin IP has been successfully deployed in over 10 million consumer devices, with designs in multiple leading-edge nodes.
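To make the utilization claim in the list above concrete, the following sketch compares the throughput that actually does useful work at different utilization levels. It is a simplified model (effective throughput = peak throughput × utilization), not Expedera's own methodology; the 32 TOPS peak is taken from the top of the E6 range, and the function name is hypothetical.

```python
def effective_tops(peak_tops: float, utilization: float) -> float:
    """Throughput doing useful work, given peak TOPS and a 0..1 utilization."""
    if not 0.0 <= utilization <= 1.0:
        raise ValueError("utilization must be between 0 and 1")
    return peak_tops * utilization


PEAK_TOPS = 32.0  # assumption: top of the quoted E6 range

for label, utilization in [
    ("Origin average (80%)", 0.80),
    ("Industry norm, high end (40%)", 0.40),
    ("Industry norm, low end (20%)", 0.20),
]:
    print(f"{label}: {effective_tops(PEAK_TOPS, utilization):.1f} TOPS")
```

In this model, the gap between peak and effective throughput corresponds to compute resources that are provisioned in silicon but sit idle.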
Block Diagram
Benefits
- 16 – 32 TOPS performance
- Support for standard, custom, and proprietary neural networks
- Performance efficiencies up to 18 TOPS/Watt
- Input resolutions up to 4K and beyond
- Runs LLM, CNN, RNN, DNN, LSTM, and other network types
- Full software stack provided, including compiler, estimator, scheduler, and quantizer
- Support for transformers, stable diffusion, large language models (LLMs), others
- Delivered as Soft IP (RTL) or GDS
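The quantizer mentioned above is listed as following channel-wise quantization per the TFLite specification. As a rough illustration of that general scheme (not Expedera's toolchain, whose API is not shown here), the sketch below applies symmetric per-channel INT8 quantization: each output channel gets its own scale, the zero point is fixed at 0, and values are clamped to [-127, 127]. The function names and sample weights are hypothetical.

```python
def quantize_per_channel(weights):
    """Symmetric INT8 quantization with one scale per channel.

    weights: list of channels, each a list of floats.
    Returns (quantized, scales), where quantized[i][j] is an int in
    [-127, 127] and scales[i] maps it back to a float.
    """
    quantized, scales = [], []
    for channel in weights:
        max_abs = max(abs(w) for w in channel)
        # Guard against an all-zero channel; any non-zero scale works there.
        scale = max_abs / 127.0 if max_abs > 0 else 1.0
        q = [max(-127, min(127, round(w / scale))) for w in channel]
        quantized.append(q)
        scales.append(scale)
    return quantized, scales


def dequantize_per_channel(quantized, scales):
    """Map quantized integers back to approximate float weights."""
    return [[q * s for q in channel] for channel, s in zip(quantized, scales)]


# Two channels with very different ranges: a single shared scale would
# waste most of the INT8 range on the second channel.
sample = [[0.5, -1.0, 0.25], [0.02, 0.01, -0.04]]
q, s = quantize_per_channel(sample)
restored = dequantize_per_channel(q, s)
print(q)
print(s)
```

Giving each channel its own scale keeps small-magnitude channels from being crushed by a large-magnitude neighbour, which is the usual motivation for channel-wise over per-tensor quantization.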
Files
Note: some files may require an NDA depending on provider policy.
Frequently asked questions about NPU IP cores
What is Neural engine IP - The Cutting Edge in On-Device AI?
Neural engine IP - The Cutting Edge in On-Device AI is an NPU IP core from Expedera listed on Semi IP Hub.
How should engineers evaluate this NPU?
Engineers should review the overview, key features, supported foundries and nodes, maturity, deliverables, and provider information before shortlisting this NPU IP.
Can this semiconductor IP be compared with similar products?
Yes. Buyers can compare this product with similar semiconductor IP cores or IP families based on category, provider, process options, and structured technical specifications.