Vendor: Expedera Category: NPU

NPU IP Core for Edge

Edge-friendly LLM and CNN AI Inference processing Edge devices are increasingly equipped with AI processing capabilities that enh…

Contact Provider Request Datasheet

Overview

Edge-friendly LLM and CNN AI Inference processing

Edge devices are increasingly equipped with advanced AI processing capabilities that enhance functionality and improve the user experience. While many of these devices previously depended on cloud-based inference, manufacturers are now shifting towards on-device inference. This transition helps to lower latency, reduce overall power consumption, and minimize the need for cloud processing, thereby cutting costs.

Perfect-Fit Solutions

Origin Evolution™ for Edge offers out-of-the-box compatibility with today's most popular LLM and CNN networks. Attention-based processing optimization and advanced memory management ensure optimal AI performance across a variety of networks and representations. Featuring a hardware and software co-designed architecture, Origin Evolution for Edge scales to 32 TFLOPS in a single core to address the most advanced edge inference needs.

Bringing AI to the Edge

Edge device makers are adding more AI to their products, including advanced LLM and CNN capabilities, as they enable a new set of applications including speech and visual contextual awareness, natural language queries, predictive maintenance, and human interaction assistance. For the best user experience, the privacy and latency advantages of edge processing are clear, with edge devices moving inference processing to the device. However, as today's leading LLMs may be 20 to 50X larger than more traditional AI networks employed on past generation devices, there are significant memory and processor hurdles to that device makers must overcome before this can be realized.

Innovative Architecture

Origin Evolution uses Expedera’s unique packet-based architecture to achieve unprecedented NPU efficiency. Packets, which are contiguous fragments of neural networks, are an ideal way to overcome the hurdle of large memory movements and differing network layer sizes, which are exacerbated by LLMs. Packets are routed through discrete processing blocks, including Feed Forward, Attention, and Vector, which accommodate the varying operations, data types, and precisions required when running different LLM and CNN networks. Origin Evolution includes a high-speed external memory streaming interface that is compatible with the latest memory standards.

Ultra-Efficient Neural Network Processing

Accepting standard, custom, and black box networks in a variety of AI representations, Origin Evolution offers a wealth of user features such as mixed precision quantization. Expedera’s unique packet-based processing reduces much larger networks into smaller, contiguous fragments, overcoming the hurdle of large memory movements and offering much higher processor utilization. Packets are routed through discrete processing blocks, including Feed Forward, Attention, and Vector, which accommodate the varying operations, data types, and precisions required when running different types of networks. Internal memory handles intermediate needs, while the memory streaming interface interfaces with off-chip storage.

Specifications

Compute Capacity	up to 16K FP16 MACs
Multi-tasking	Run Simultaneous Jobs
Example Networks Supported	Llama2, Llama3, ChatGLM, DeepSeek, Mistral, Qwen, MiniCPM, Yolo, MobileNet, and many others, including proprietary/black box networks
Example Performance	80 tokens per second, Llama 3.1 1B (INT4 weights, INT16 Act), 1 TOPS engine, 2MB internal memory, 64GB external peak bandwidth. Specified in TSMC 7nm, 1 GHz system clock, no sparsity/compression/pruning applied (though supported)
Layer Support	Standard NN functions, including Transformers, Conv, Deconv, FC, Activations, Reshape, Concat, Elementwise, Pooling, Softmax, others. Support for custom operators.
Data types	FP16/FP32/INT4/INT8/INT10/INT12/INT16 Activations/Weights
Quantization	Software toolchain supports Expedera, customer-supplied, or third-party quantization. Mixed precision supported.
Latency	Deterministic performance guarantees, no back pressure
Frameworks	Hugging Face, Llama.cpp, PyTorch, TVM, ONNX. Tensor Flow and others supported

Key features

32 TFLOPS performance
Support for standard, custom, and proprietary neural networks
Readily customized to specific use cases and deployment needs
Full software stack provided, including compiler, estimator, scheduler, and quantizer
Runs LLM, CNN, RNN, DNN, LSTM, and other network types
Delivered as Soft IP (RTL) or GDS

Block Diagram

Origin Evolution for Edge block diagram

Benefits

Choose the Features You Need: Customization brings many advantages, including increased performance, lower latency, reduced power consumption, and eliminating dark silicon waste. Expedera works with edge device customers to understand their use case(s), PPA goals, and deployment needs during their design stage. Using this information, we configure Origin Evolution to create a customized solution that perfectly fits the application.
Reducing Memory Bandwidth: Origin Evolution's packet-architecture reduces memory requirements of popular LLMs like Llama 3.2 and Qwen1 by as much as 79%, saving system power and offering a much better utilized processor.
Efficient Resource Utilization: Origin Evolution for Edge scales to 32 TFLOPS in a single core, eliminating the memory sharing, security, and area penalty issues faced by lower-performing, tiled AI accelerator engines. Origin Evolution NPUs achieve sustained utilization averaging 80%, compared to the 20-40% industry norm, avoiding dark silicon waste.
Full Software Stack: Origin Evolution employs an easy-to-use software stack that allows the importing of trained networks from popular representations such as Hugging Face, Llama.cpp, PyTorch, TVM, ONNX, TensorFlow, and others, while providing various quantization options, automatic completion, compilation, estimator and profiling tools. It also supports multi-job APIs.
LLM, CNN, and other Network Support: Origin Evolution offers out-of-the-box support for 100+ popular neural networks, including Llama2, Llama3, ChatGLM, DeepSeek, Mistral, Qwen, MiniCPM, Yolo, MobileNet, and many others.

Specifications

Identity

Part Number

Origin Evolution for Edge

Vendor

Expedera

Type

Silicon IP

Files

Product Sheet Download

Note: some files may require an NDA depending on provider policy.

Provider

Expedera

HQ: USA

Expedera provides scalable neural engine semiconductor IP that enables major improvements in performance, power, and latency while reducing cost and complexity in AI inference applications. Third-party silicon validated and shipped in more than 10M customer devices, Expedera’s solutions produce superior performance and are scalable to a wide range of applications from edge nodes and smartphones to automotive. Expedera’s Origin™ Neural Processing Unit IP solutions are easily integrated, readily scalable, and customized to unique use cases and application requirements. The company is headquartered in Santa Clara, California, with engineering and sales offices around the globe.

Contact Expedera

Frequently asked questions about NPU IP cores

What is NPU IP Core for Edge?

NPU IP Core for Edge is a NPU IP core from Expedera listed on Semi IP Hub.

How should engineers evaluate this NPU?

Engineers should review the overview, key features, supported foundries and nodes, maturity, deliverables, and provider information before shortlisting this NPU IP.

Can this semiconductor IP be compared with similar products?

Yes. Buyers can compare this product with similar semiconductor IP cores or IP families based on category, provider, process options, and structured technical specifications.

NPU IP Core for Edge

Overview

Key features

Block Diagram

Benefits

Specifications

Identity

Files

Provider

Learn more about NPU IP core

Can Your NPU Run DOOM? Chimera Can.

Heterogeneous NPU Data Movement Tax: Intel's Own Slides Tell the Story

The Upcoming NPU Shakeout

One Instruction Stream, Infinite Possibilities: The Cervell™ Approach to Reinventing the NPU

Legacy IP Providers Struggle to Solve the NPU Dilemna

Can You Rely Upon your NPU Vendor to be Your Customers' Data Science Team?

Frequently asked questions about NPU IP cores

What is NPU IP Core for Edge?

How should engineers evaluate this NPU?

Can this semiconductor IP be compared with similar products?