Vendor: AImotive
Category: Edge AI Accelerator

Neural Network Acceleration for Automotive AI

State-of-the-art NPU for automotive inference, with many built-in features to maximize performance across a wide range of automotive AI applications.

Overview

The latest aiWare5 offers the flexibility needed to support a wide range of AI workloads, including SSMs, VTs, and LLMs, together with industry-leading tools that transform how AI-enabled ADAS software is developed and validated.

Our products are designed to accelerate the realization of your automated driving goals.

High performance, versatility and scalability

The aiWare hardware IP core delivers high performance at low power, thanks to its class-leading, highly optimized hardware architecture. With up to 256 sparse TOPS per core, aiWare5 is highly scalable for use in applications ranging from sensor edge processors and domain/zone controllers to high-performance centralized processing. While the core focus remains on ADAS perception workloads, the versatile hardware architecture also enables other applications, such as driver and occupant monitoring (DMS/OMS), language models, sensor tuning, and more.

Unique SDK plus optimized aiDrive-based AI

The aiWare Studio SDK has been recognized and acclaimed by OEMs and Tier 1s around the world for its innovative approach to embedded NN optimization, which focuses on iterating the NN itself, not just the NPU code used to execute it. This gives NN designers far more flexibility when implementing their NNs for production hardware platforms than other NPU solutions. The offline performance estimator within aiWare Studio is another highlight, delivering performance estimates accurate to within 5% of final silicon on any desktop PC. And thanks to the comprehensive portfolio of aiDrive modular software for automotive AI, a wide range of software solutions fully optimized for aiWare are also available.

Powering the most demanding L2-L4 automated driving applications

As showcased by aiDrive deployed on aiWare-enabled silicon, aiWare delivers the exact features needed for production automotive deployment rather than just high theoretical benchmark results in the lab. The underlying technology has been engineered from the ground up for automotive production deployment. aiWare customers can therefore be confident that the silicon budget is extremely well utilized, with support for features that are typically not benchmarked but are required by ADAS neural networks, such as multi-input, multi-headed network support, robust functional safety, minimal inference-time jitter, and SIL testing capability. This is the main benefit of sourcing an NPU from an expert ADAS engineering team.

Key features

Performance

256 TOPS per core @ 2GHz

Scalability

From 1 TOPS up to 1000+ TOPS (using multiple cores)

MACs/Cycle

Up to 65,536 MACs/core (BF16 or INT32 internal accuracy)

Support functions

Wide range of Activation, Pooling, Unary, Binary, Tensor Shaping, Attention and Linear operations to ensure 100% NN (DNN, VT, SSM, LLM and more) execution within aiWare NPU with no host CPU intervention

Configurability

•    Number of MACs
•    Size of on-chip local tightly-coupled SRAM & WFRAM
•    Safety features (ASIL-B standard)
•    Generic interfaces for both host CPU and local LPDDR or shared memory

NN depth and graph format

No limit on NN depth or graph format. Excellent multi-headed and multi-input NN support.

Quantization

State-of-the-art, constantly updated quantization algorithms are shipped with the SDK. The SDK also enables the application of proprietary quantization schemes and strategies. The underlying arithmetic ensures very low accuracy loss.

Data types

Native INT8 or FP8
32-bit internal precision and dynamic per-layer scaling
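
As a rough illustration of what 8-bit data combined with a wider internal accumulator and per-layer scaling can look like, the sketch below quantizes one layer's weights and activations to INT8 with a single scale each, accumulates the products in integer arithmetic, and rescales the result. This is a generic symmetric quantization scheme chosen for illustration, not the algorithm shipped with the aiWare SDK; the function names `quantize_int8` and `int8_dot` are hypothetical.

```python
def quantize_int8(values):
    """Symmetric quantization with one scale for the whole tensor (layer).

    Returns (codes, scale), where each code is an integer in [-127, 127]
    and code * scale approximates the original value.
    """
    max_abs = max(abs(v) for v in values)
    scale = max_abs / 127 if max_abs > 0 else 1.0
    codes = [max(-127, min(127, round(v / scale))) for v in values]
    return codes, scale


def int8_dot(weights, activations):
    """Dot product on INT8 codes with a wide integer accumulator.

    Assumption: the product of two INT8 codes needs at most 15 bits plus
    sign, so a 32-bit accumulator can sum many of them without overflow.
    Python integers are unbounded, so the width is only modeled here.
    """
    wq, w_scale = quantize_int8(weights)
    aq, a_scale = quantize_int8(activations)
    acc = sum(w * a for w, a in zip(wq, aq))  # integer accumulation
    return acc * w_scale * a_scale            # rescale back to real units


weights = [0.5, -1.0, 0.25, 0.75]
activations = [2.0, 1.0, -4.0, 0.5]
exact = sum(w * a for w, a in zip(weights, activations))
approx = int8_dot(weights, activations)
print(f"exact={exact:.4f} quantized={approx:.4f}")
```

Rounding error enters only when values are converted to 8-bit codes; the accumulation itself is exact, which is why a wide accumulator helps keep accuracy loss low in schemes of this kind.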

ISO 26262 safety

Compliance

aiWare is the only NPU certified for ISO 26262 compliance as a Safety Element out of Context (SEooC), rather than offering only the process compliance featured by competitors. This delivers unmatched functional safety and significantly reduces integration effort for ADAS-targeted silicon.

Hardware

Configurable safety mechanisms for up to ASIL-D, enabling a balance between silicon overhead and functional safety requirements and objectives

Software

Tools and runtime support developed using ISO 26262-compliant processes

Memory

Core SRAM

Up to 16MBytes per core (configurable)

Wavefront SRAM

1-64MBytes per core (configurable)

External Memory

Dedicated off-chip DRAM or shared SoC memory

Bandwidth reduction

On-chip compression
Wavefront-based scheduling that optimizes on-chip memory usage per cycle and per layer

Main interface

AXI4 to LPDDR
AXI4 to host

Neural network development frameworks

Frameworks supported

Caffe/Caffe2, TensorFlow, PyTorch, ONNX, Khronos NNEF

Inference deployment

Binary compiled offline using aiWare Studio or command-line tools
A single binary contains one or more NNs, their weights and all scheduling information

Software runtime

Minimal host CPU management required during execution. A simple, generic, portable runtime API runs on any RTOS or HLOS; wrappers for popular APIs are available on request

Development Tools

aiWare Studio provides comprehensive tools to import, analyze and optimize any NN with an easy-to-use interactive UI

Evaluation Tools

aiWare Studio features an offline performance estimator accurate to within 5% of final silicon
CPU- and GPU-based emulators
FPGA implementations also available

Application validation tools


Benefits

Scalable performance

The aiWare5 RTL is fully synthesizable to deliver up to 256 TOPS per core @ 2GHz at 5nm, using standard libraries and memory compilers. The RTL is designed for easy integration: it avoids large buses and uses tiled computation modules, which makes implementation at high clock speeds easier.
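
The headline throughput can be related to the MAC count listed under Key features with simple arithmetic. The sketch below assumes that the 65,536 MACs figure is per clock cycle, that each MAC counts as two operations (one multiply and one add), and that "T" means 10^12; none of these accounting rules are stated in the listing, and the function names are hypothetical. Under those assumptions the result is about 262 TOPS per core, in line with the quoted 256 TOPS, and four such cores would cover a 1000 TOPS target.

```python
import math


def peak_tops(macs_per_cycle: int, clock_hz: float, ops_per_mac: int = 2) -> float:
    """Theoretical peak throughput in TOPS (10^12 operations per second).

    Assumption: every MAC unit completes one multiply-accumulate per cycle,
    counted as ops_per_mac operations.
    """
    return macs_per_cycle * ops_per_mac * clock_hz / 1e12


def cores_needed(target_tops: float, tops_per_core: float) -> int:
    """Smallest whole number of cores whose combined peak meets the target."""
    return math.ceil(target_tops / tops_per_core)


per_core = peak_tops(macs_per_cycle=65_536, clock_hz=2e9)
print(f"{per_core:.1f} TOPS per core (theoretical peak)")
print(f"{cores_needed(1000, 256)} cores for a 1000 TOPS target at 256 TOPS each")
```

These are peak figures only; sustained throughput on a real network depends on utilization, memory bandwidth and the chosen configuration.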

De-risking and accelerating the path to production

A GPU-accelerated, bit-accurate emulator enables faster-than-real-time SIL testing without target hardware, removing a major development bottleneck. Instead of sourcing hundreds of hardware units and building costly labs, teams developing for aiWare-based chips can run full stacks on existing cloud or on-prem GPUs, gaining faster test cycles, lower costs, and virtually unlimited scalability.

Offering the right level of versatility

Built for the state-of-the-art multi-input, multi-headed neural networks characteristic of automotive perception. With its versatile engine, aiWare handles the backbone with ease, whether it is a VT (Vision Transformer), an SSM (State Space Model) or a CNN (Convolutional Neural Network) based architecture. Looking beyond perception? With the right configuration, aiWare can handle LLMs (Large Language Models) too!

Applications

  • Automotive inference for automated driving
  • High-performance automotive multi-camera perception
  • Large camera NN processing (no upper limit on input resolution)
  • High data rate heterogeneous multi-sensor fusion

Specifications

Identity

Part Number
aiWare
Vendor
AImotive
Type
Silicon IP

Files

Note: some files may require an NDA depending on provider policy.

Provider

AImotive
HQ: Hungary
aiMotive is an automotive technology powerhouse working on level-agnostic automated driving solutions. The company delivers an integrated portfolio of tools and embedded solutions that enable customers to rapidly develop and deploy production automated driving features, combining in-house expertise with aiMotive modular capabilities while achieving substantial reductions in development costs and timescales. The company’s product portfolio has been validated in mass production programs.


Frequently asked questions about Edge AI Accelerator IP cores

What is Neural Network Acceleration for Automotive AI?

Neural Network Acceleration for Automotive AI is an Edge AI Accelerator IP core from AImotive listed on Semi IP Hub.

How should engineers evaluate this Edge AI Accelerator?

Engineers should review the overview, key features, supported foundries and nodes, maturity, deliverables, and provider information before shortlisting this Edge AI Accelerator IP.

Can this semiconductor IP be compared with similar products?

Yes. Buyers can compare this product with similar semiconductor IP cores or IP families based on category, provider, process options, and structured technical specifications.
