Vendor: Expedera
Category: NPU

NPU IP Core for Automotive

Overview

Whether deployed in-cabin for driver distraction monitoring or in the advanced driver assistance system (ADAS) stack for object recognition and point cloud processing, AI is the backbone of safer, smarter cars.

AI Demands Higher Performance

Automakers are adding more AI, including advanced LLM and multimodal capabilities, to enable safety and usability use cases such as autonomous driving/ADAS, driver attention monitoring, passenger detection, and infotainment. Local inference is essential for all safety-critical systems, yet LLMs may be 20 to 50 times larger than more traditional AI networks. Inference capabilities on today's automotive processors are already limited, so automakers are looking for alternative, more efficient architectures for next-generation systems.

Ideal Processing Architecture

Origin Evolution™ for Automotive offers out-of-the-box compatibility with popular LLM and CNN networks. Attention-based processing optimization and advanced memory management deliver strong AI performance across a variety of today’s standard and emerging neural networks. Featuring a hardware and software co-designed architecture, Origin Evolution for Automotive scales to 96 TFLOPS in a single core, with multi-core configurations scaling to petaFLOPS.
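As a rough sanity check on the 96 TFLOPS single-core figure, the relationship between MAC count, clock rate, and peak throughput can be sketched as below. This is a back-of-the-envelope estimate, not a vendor formula: it assumes one MAC counts as two floating-point operations, borrows the 1 GHz system clock quoted in the example performance figure, and assumes ideal linear multi-core scaling. The function names are illustrative only.

```python
import math


def peak_tflops(num_macs: float, clock_hz: float, flops_per_mac: int = 2) -> float:
    """Peak throughput in TFLOPS for a MAC array running at clock_hz."""
    return num_macs * flops_per_mac * clock_hz / 1e12


def macs_for_tflops(tflops: float, clock_hz: float, flops_per_mac: int = 2) -> float:
    """Number of MAC units needed to reach a peak TFLOPS target."""
    return tflops * 1e12 / (flops_per_mac * clock_hz)


CLOCK_HZ = 1e9  # assumption: 1 GHz, taken from the example performance figure

single_core_macs = macs_for_tflops(96, CLOCK_HZ)
print(f"MACs for 96 TFLOPS at 1 GHz: {single_core_macs:,.0f}")  # 48,000

# Assumption: ideal linear scaling across cores, which real systems rarely reach.
cores_for_one_petaflop = math.ceil(1000 / 96)
print(f"96 TFLOPS cores for 1 PFLOPS: {cores_for_one_petaflop}")  # 11
```

Under these assumptions, a 96 TFLOPS core corresponds to roughly 48,000 FP16 MAC units, and about 11 such cores would be needed to reach one petaFLOPS of peak compute.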

Innovative Architecture

Origin Evolution uses Expedera’s unique packet-based architecture to achieve unprecedented NPU efficiency. Packets, which are contiguous fragments of neural networks, are an ideal way to overcome the hurdle of large memory movements and differing network layer sizes, which are exacerbated by LLMs. Packets are routed through discrete processing blocks, including Feed Forward, Attention, and Vector, which accommodate the varying operations, data types, and precisions required when running different LLM and CNN networks. Origin Evolution includes a high-speed external memory streaming interface that is compatible with the latest DRAM and HBM standards.
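The general idea of cutting a network into contiguous fragments that fit a memory budget can be illustrated with a toy sketch. This is not Expedera's packet algorithm, which is not described here; it is a simple greedy grouping of consecutive layers, and the `Layer` class, the `working_set_kb` values, and the 1024 KB budget are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Layer:
    name: str
    block: str           # processing block that would run it: "attention", "feed_forward", "vector"
    working_set_kb: int  # hypothetical on-chip working-set size


def split_into_packets(layers, budget_kb):
    """Greedily group consecutive layers so each group's working set fits budget_kb.

    A single layer larger than the budget still becomes its own (oversized) group.
    """
    packets, current, used = [], [], 0
    for layer in layers:
        if current and used + layer.working_set_kb > budget_kb:
            packets.append(current)
            current, used = [], 0
        current.append(layer)
        used += layer.working_set_kb
    if current:
        packets.append(current)
    return packets


# Hypothetical transformer block with made-up working-set sizes.
transformer_block = [
    Layer("norm", "vector", 64),
    Layer("attention", "attention", 512),
    Layer("add1", "vector", 64),
    Layer("mlp", "feed_forward", 768),
    Layer("add2", "vector", 64),
]

packets = split_into_packets(transformer_block, budget_kb=1024)
for i, packet in enumerate(packets):
    route = [(layer.name, layer.block) for layer in packet]
    print(f"packet {i}: {route}")
```

With these made-up numbers the block splits into two fragments, `norm`/`attention`/`add1` and `mlp`/`add2`, each of which stays within the budget and names the processing block each step would be dispatched to.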

Specifications

Compute Capacity: up to 48K FP16 MACs
Multi-tasking: run simultaneous jobs
Example Networks Supported: Llama2, Llama3, ChatGLM, DeepSeek, Mistral, Qwen, MiniCPM, Yolo, MobileNet, and many others, including proprietary/black-box networks
Example Performance: 261 tokens per second for DeepSeek v3 token generation on a 64 TFLOPS engine with a batch size of 512, 256 GB/s peak external bandwidth, and 4.391 W peak power consumption. Specified in TSMC 7nm at a 1 GHz system clock, with no sparsity, compression, or pruning applied (though all are supported).
Layer Support: standard NN functions, including Transformers, Conv, Deconv, FC, Activations, Reshape, Concat, Elementwise, Pooling, Softmax, and others. Support for custom operators.
Data Types: FP16/FP32/INT4/INT8/INT10/INT12/INT16 activations and weights
Quantization: software toolchain supports Expedera, customer-supplied, or third-party quantization. Mixed precision supported.
Latency: deterministic performance guarantees, no back pressure
Frameworks: Hugging Face, Llama.cpp, PyTorch, TVM, ONNX, TensorFlow, and others
Safety: ASIL-B readiness, ISO 9001:2015
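The example performance figures can be turned into a few derived efficiency numbers. The sketch below is a rough reading of the quoted values only. Because the power and bandwidth numbers are peaks, the results are upper bounds. The listing does not say whether 261 tokens per second is aggregated across the batch of 512, so the figures are per generated token as quoted. The function and key names are illustrative.

```python
def derived_metrics(tokens_per_s: float, peak_power_w: float, peak_bw_gb_s: float) -> dict:
    """Derive per-token efficiency bounds from quoted throughput, peak power, and peak bandwidth."""
    return {
        # Energy per token in millijoules (upper bound, since power is a peak value).
        "energy_per_token_mj": peak_power_w / tokens_per_s * 1e3,
        # Tokens generated per joule (lower bound, for the same reason).
        "tokens_per_joule": tokens_per_s / peak_power_w,
        # External memory traffic per token in GB (upper bound at peak bandwidth).
        "max_gb_per_token": peak_bw_gb_s / tokens_per_s,
    }


# Values quoted in the Example Performance entry.
m = derived_metrics(tokens_per_s=261.0, peak_power_w=4.391, peak_bw_gb_s=256.0)

print(f"Energy per token: <= {m['energy_per_token_mj']:.2f} mJ")     # 16.82 mJ
print(f"Tokens per joule: >= {m['tokens_per_joule']:.2f}")            # 59.44
print(f"External traffic per token: <= {m['max_gb_per_token']:.2f} GB")  # 0.98 GB
```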

Key features

  • 96 TFLOPS performance
  • Support for standard, custom, and proprietary neural networks
  • Readily customized for specific use cases and deployment needs
  • Full software stack provided, including compiler, estimator, scheduler, and quantizer
  • Runs LLM, CNN and other network types
  • Delivered as Soft IP (RTL) or GDS

Block Diagram

Benefits

  • Choose the Features You Need: Customization brings many advantages, including increased performance, lower latency, reduced power consumption, and eliminating dark silicon waste. Expedera works with automotive customers to understand and optimize to their use case(s), PPA goals, and deployment needs during their design stage. Using this information, we configure Origin Evolution to create a customized solution that perfectly fits the application.
  • Reducing Memory Bandwidth: Origin Evolution's packet-based architecture reduces the memory requirements of popular LLMs like Llama 3.2 and Qwen1 by as much as 79%, saving system power and improving processor utilization.
  • Efficient Resource Utilization: Origin Evolution for Automotive scales to 96 TFLOPS in a single core, eliminating the memory sharing, security, and area penalty issues faced by lower-performing, tiled AI accelerator engines. Origin Evolution NPUs achieve sustained utilization averaging 80%, compared to the 20-40% industry norm, avoiding dark silicon waste.
  • Full Software Stack: Origin Evolution employs an easy-to-use software stack that allows the importing of trained networks from popular representations such as Hugging Face, Llama.cpp, PyTorch, TVM, ONNX, TensorFlow, and others, while providing various quantization options, automatic completion, compilation, estimator and profiling tools. It also supports multi-job APIs.
  • LLM, CNN, and other Network Support: Origin Evolution offers out-of-the-box support for 100+ popular neural networks, including Llama2, Llama3, ChatGLM, DeepSeek, Mistral, Qwen, MiniCPM, Yolo, MobileNet, and many others.
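To put the utilization claim above in concrete terms, sustained throughput can be estimated as peak throughput multiplied by average utilization. The sketch below applies that to the quoted 96 TFLOPS peak, the 80% average, and the 20-40% industry range. It is simple arithmetic on the listed figures, not a benchmark. It also applies the same 96 TFLOPS peak to the industry range purely for comparison, which is an assumption.

```python
def sustained_tflops(peak_tflops: float, utilization: float) -> float:
    """Estimated sustained throughput given peak throughput and average utilization (0..1)."""
    return peak_tflops * utilization


PEAK_TFLOPS = 96.0  # single-core peak quoted in the listing

origin = sustained_tflops(PEAK_TFLOPS, 0.80)
norm_low = sustained_tflops(PEAK_TFLOPS, 0.20)
norm_high = sustained_tflops(PEAK_TFLOPS, 0.40)

print(f"At 80% utilization: {origin:.1f} TFLOPS")                            # 76.8
print(f"At 20-40% utilization: {norm_low:.1f} to {norm_high:.1f} TFLOPS")    # 19.2 to 38.4
print(f"Advantage at equal peak: {0.80 / 0.40:.0f}x to {0.80 / 0.20:.0f}x")  # 2x to 4x
```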

Specifications

Identity

Part Number: Origin Evolution for Automotive
Vendor: Expedera

Provider

Expedera
HQ: USA
Expedera provides scalable neural engine semiconductor IP that enables major improvements in performance, power, and latency while reducing cost and complexity in AI inference applications. Third-party silicon validated and shipped in more than 10M customer devices, Expedera’s solutions produce superior performance and are scalable to a wide range of applications from edge nodes and smartphones to automotive. Expedera’s Origin™ Neural Processing Unit IP solutions are easily integrated, readily scalable, and customized to unique use cases and application requirements. The company is headquartered in Santa Clara, California, with engineering and sales offices around the globe.

Frequently asked questions about NPU IP cores

What is NPU IP Core for Automotive?

NPU IP Core for Automotive is an NPU IP core from Expedera listed on Semi IP Hub.

How should engineers evaluate this NPU?

Engineers should review the overview, key features, supported foundries and nodes, maturity, deliverables, and provider information before shortlisting this NPU IP.

Can this semiconductor IP be compared with similar products?

Yes. Buyers can compare this product with similar semiconductor IP cores or IP families based on category, provider, process options, and structured technical specifications.
