Vendor: Ceva, Inc. Category: NPU

Scalable Edge NPU IP for Generative AI

Overview

Ceva-NeuPro-M is a scalable NPU architecture, ideal for transformers, Vision Transformers (ViT), and generative AI applications, with exceptional power efficiency of up to 3500 tokens per second per watt for Llama 2 and Llama 3.2 models.

The Ceva-NeuPro-M Neural Processing Unit (NPU) IP family delivers exceptional energy efficiency tailored for edge computing while offering scalable performance to handle AI models with over a billion parameters. Its multi-award-winning architecture introduces significant advancements in power efficiency and area optimization, enabling it to support massive machine-learning networks, advanced language and vision models, and multi-modal generative AI. With a processing range of 4 to 200 TOPS per core and leading area efficiency, Ceva-NeuPro-M runs key AI models efficiently. A robust tool suite complements the NPU by streamlining hardware implementation, model optimization, and runtime module composition.
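As a rough back-of-envelope illustration of what the quoted efficiency figure implies (the 2 W power budget below is a hypothetical assumption for illustration, not a Ceva specification):

```python
# Illustrative arithmetic only: converts the quoted efficiency figure into
# a token-throughput estimate for an ASSUMED power budget.
TOKENS_PER_SEC_PER_WATT = 3500  # quoted peak for Llama 2 / Llama 3.2

def tokens_per_second(power_watts: float) -> float:
    """Peak token throughput implied by the tokens/s/W figure."""
    return TOKENS_PER_SEC_PER_WATT * power_watts

# Hypothetical 2 W edge-NPU power budget (assumption, not a spec).
print(tokens_per_second(2.0))  # 7000.0 tokens/s
```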

The Solution

The Ceva-NeuPro-M NPU IP family is a highly scalable, complete hardware and software IP solution for embedding high-performance AI processing in SoCs across a wide range of edge AI applications.

The heart of the NeuPro-M NPU architecture is the computational unit. Scalable from 4 to 20 TOPS, a single computational unit comprises a multiple-MAC parallel neural computing engine, activation and sparsity control units, an independent programmable vector-processing unit, plus local shared L1 memory and a local unit controller. A core may contain up to eight of these computational units, along with a shared Common Subsystem comprising functional-safety, data-compression, shared L2 memory, and system interfaces.

These NPU cores may be grouped into multi-core clusters to reach performance levels in excess of 2000 TOPS.
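The scaling described above can be sketched numerically. The per-unit and per-core figures come from the text; the 13-core cluster size below is an assumption chosen only to show a configuration exceeding 2000 TOPS, not a documented product option:

```python
# Sketch of NeuPro-M scaling, using the figures quoted in the text.
TOPS_PER_UNIT_MAX = 20   # one computational unit scales from 4 to 20 TOPS
UNITS_PER_CORE_MAX = 8   # a core may contain up to eight units

def core_tops(units: int, tops_per_unit: float) -> float:
    """Peak TOPS of one core with the given unit count and unit rating."""
    assert 1 <= units <= UNITS_PER_CORE_MAX
    assert 4 <= tops_per_unit <= TOPS_PER_UNIT_MAX
    return units * tops_per_unit

def cluster_tops(cores: int, per_core_tops: float) -> float:
    """Peak TOPS of a multi-core cluster."""
    return cores * per_core_tops

peak_core = core_tops(8, 20)        # fully populated core: 160 TOPS
print(peak_core)                    # 160
# Hypothetical 13-core cluster (assumption) -- in excess of 2000 TOPS:
print(cluster_tops(13, peak_core))  # 2080
```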

Key features

  • Supports a wide range of activation and weight data types, from 32-bit floating point down to 2-bit binary neural networks (BNN)
  • Mixed-precision neural-engine MAC array microarchitecture that supports this data-type diversity with minimal power consumption
  • Out-of-the-box Winograd transform engine, requiring no retraining, that replaces traditional convolution methods using 4-bit, 8-bit, 12-bit, or 16-bit weights and activations, doubling efficiency with <0.5% precision degradation
  • Unstructured sparsity engine that skips operations on zero-valued weights or activations in every layer during inference, delivering up to 4x performance gains while also reducing memory bandwidth and power consumption
  • Fully programmable Vector Processing Unit (VPU) that runs in parallel with the neural engine and can handle future neural-network architectures
  • Lossless real-time weight and data compression/decompression to reduce external memory bandwidth
  • Scalability through per-use-case memory configurations and a single-core architecture with one to eight engines, covering diverse performance requirements
  • Secure boot and protection of neural-network weights and data against theft
  • Parallel processing with two degrees of freedom
  • Memory hierarchy that minimizes the power consumed by data transfers to and from external SDRAM and optimizes overall bandwidth
  • Decentralized management-controller architecture with a local data controller in each engine, providing optimized data tunneling for low bandwidth, maximal utilization, and an efficient parallel-processing scheme
  • Supports next-generation NN architectures such as fully connected (FC), FC batch, RNN, transformers (self-attention), 3D convolution, and more
  • The NeuPro-M AI processor architecture includes the following processor options:
    • NPM11 – a single NPM engine with processing power of up to 20 TOPS
    • NPM18 – an octa-engine NPM with processing power of up to 160 TOPS
  • Matrix decomposition for up to 10x higher performance during network inference
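The unstructured-sparsity idea in the feature list can be sketched in plain Python. This is a conceptual software model only; the actual engine performs the skipping in hardware:

```python
def sparse_dot(weights, activations):
    """Dot product that skips zero-valued operands, modeling (in software)
    what an unstructured sparsity engine does in hardware."""
    acc = 0.0
    macs = 0  # multiply-accumulates actually performed
    for w, a in zip(weights, activations):
        if w == 0 or a == 0:
            continue  # zero operand: the MAC is skipped entirely
        acc += w * a
        macs += 1
    return acc, macs

# With 75% of the weights zero, only 2 of 8 MACs execute -- the intuition
# behind the "up to 4x" performance claim for highly sparse layers.
w = [0, 0, 0, 2, 0, 0, 0, 1]
a = [1, 1, 1, 3, 1, 1, 1, 4]
print(sparse_dot(w, a))  # (10.0, 2)
```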

Block Diagram

Benefits

  • The Ceva-NeuPro-M NPU family empowers edge AI and cloud inference system developers with the computing power and energy efficiency needed to implement complex AI workloads. It enables seamless deployment of multi-modal models and transformers in edge AI SoCs, delivering scalable performance for real-time, on-device processing with enhanced flexibility and efficiency.
  • The family can scale from smaller systems, for such applications as security cameras, robotics, or IoT devices, with performance starting at 4 TOPS, up to 2,000 TOPS multi-core systems capable of LLM and multi-modal generative AI, all using the same hardware units, core structure, and AI software stack.
  • Equally important, NeuPro-M provides direct hardware support for vital model optimizations, including variable precision for weights and hardware-supported addressing of sparse arrays. An independent vector processor on each computational unit supports novel computations that may be required by future AI algorithms. The family also provides hardware support for ISO 26262 ASIL-B functional safety.
  • With its enormous scalability and the Ceva-NeuPro Studio full AI software stack, the Ceva-NeuPro-M family is the fastest route to a shippable implementation for an edge-AI chip or SoC.
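The variable-precision weight support mentioned above can be illustrated with a generic symmetric-quantization model. This is a textbook sketch, not Ceva's scheme; the function and values are illustrative assumptions:

```python
def quantize_symmetric(values, bits):
    """Symmetric quantization of weights to a given bit width -- a generic
    software model of variable-precision weight storage (not Ceva's scheme)."""
    qmax = 2 ** (bits - 1) - 1          # e.g. 127 for 8-bit
    scale = max(abs(v) for v in values) / qmax
    quantized = [round(v / scale) for v in values]
    dequantized = [q * scale for q in quantized]
    return quantized, dequantized

# Hypothetical weights, quantized to 8 bits:
w = [0.5, -1.0, 0.25]
q8, deq8 = quantize_symmetric(w, 8)
print(q8)  # [64, -127, 32]
```

Narrower bit widths (down to the 2-bit BNN case in the feature list) trade precision for smaller weight storage and cheaper MACs, which is why hardware support for mixed precision matters at the edge.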

Applications

  • Consumer IoT
  • Automotive
  • Infrastructure 
  • Mobile
  • PC 

Files

Note: some files may require an NDA depending on provider policy.

Specifications

Identity

Part Number
Ceva NeuPro-M
Vendor
Ceva, Inc.

Provider

Ceva, Inc.
HQ: USA
The Smart Edge runs on Ceva! Ceva is the leader in innovative silicon and software IP solutions that enable smart edge products to connect, sense, and infer data more reliably and efficiently. At Ceva, we are passionate about the smart edge. Providing the technology and market expertise our customers need to be successful is what we do best, and we’ve been doing it for over 30 years. With the industry’s only portfolio of comprehensive communications and scalable edge AI IP, Ceva powers the connectivity, sensing, and inference in today’s most advanced smart edge products across consumer IoT, mobile, automotive, infrastructure, industrial, and personal computing. More than 17 billion of the world’s most innovative smart edge products from smartphones to drones to cellular base stations and more are powered by Ceva. We create innovative technologies that help our customers turn great ideas into extraordinary products. We license our portfolio of wireless communications and scalable edge AI IP to our customers, breaking down barriers to entry and enabling them to bring new cutting-edge products to market faster, more reliably, efficiently, and economically. Ceva is a trusted partner to over 400 of the leading semiconductor and OEM companies including Actions, Artosyn, ASR, Atmosic, Autotalks, Beken, Bestechnic, Brite, Broadcom, Celeno, Ceragon, Cirrus Logic, Dialog Semiconductor, DSP Group, Espressif, FujiFilm, GCT Semi, iCatch, InPlay, Intel, Itron, Leadcore, LG Electronics, Mediatek, Microchip, Nextchip, Nokia, Novatek, NXP, ON Semiconductor, Optek, Oticon, Panasonic, RDA, Renesas, Rockchip, Rohm, Samsung, Sanechips, Sharp, Siflower, SigmaStar, Socionext, Sony, Sonova, STMicroelectronics, Toshiba, Unisoc, Vatics, Yamaha and ZTE all leverage Ceva’s industry-leading IP.
These companies incorporate our IP into application-specific integrated circuits (“ASICs”) and application-specific standard products (“ASSPs”) that they manufacture, market and sell to consumer electronics companies. Headquartered in Rockville, Maryland, Ceva has over 400 employees worldwide, with design centers in Israel, Ireland, France, United Kingdom, United States, Serbia, and sales and support offices located in Europe, the U.S. and throughout Asia. Ceva is a sustainable and environmentally conscious company, adhering to our Code of Business Conduct and Ethics. As such, we emphasize and focus on environmental preservation, recycling, the welfare of our employees and privacy – which we promote on a corporate level. At Ceva, we are committed to social responsibility, values of preservation and consciousness towards these purposes.


Frequently asked questions about NPU IP cores

What is Scalable Edge NPU IP for Generative AI?

Scalable Edge NPU IP for Generative AI is an NPU IP core from Ceva, Inc. listed on Semi IP Hub.

How should engineers evaluate this NPU?

Engineers should review the overview, key features, supported foundries and nodes, maturity, deliverables, and provider information before shortlisting this NPU IP.

Can this semiconductor IP be compared with similar products?

Yes. Buyers can compare this product with similar semiconductor IP cores or IP families based on category, provider, process options, and structured technical specifications.
