GPNPU Processor IP - 32 to 864TOPs
Overview
Designed from the ground up to address significant machine learning (ML) inference deployment challenges facing system on chip (SoC) developers, the Chimera™ general purpose neural processor (GPNPU) family has a simple yet powerful architecture with demonstrated matrix-computation performance improvements over traditional approaches. Its key differentiator is its ability to execute diverse workloads with great flexibility, all in a single processor.
The Chimera GPNPU family provides a unified processor architecture that handles matrix operations, vector operations, and scalar (control) code in one execution pipeline. In conventional SoC architectures these workloads are handled separately by an NPU, a DSP, and a real-time CPU, requiring code to be split and performance tuned across two or three heterogeneous cores. The Chimera GPNPU is a single software-controlled core, allowing complex parallel workloads to be expressed simply.
The Chimera GPNPU is entirely driven by code, empowering developers to continuously optimize the performance of their models and algorithms throughout the device's lifecycle. This makes it ideal for running classic backbone networks, today's newest Vision Transformers and Large Language Models, and whatever new networks are invented tomorrow.
Modern SoC architectures deploy complex algorithms that mix traditional C++ based code with newly emerging and fast-changing machine learning (ML) inference code. This combination of graph code commingled with C++ code is found in numerous chip subsystems, most prominently in vision and imaging subsystems, radar and lidar processing, communications baseband subsystems, and a variety of other data-rich processing pipelines. Only Quadric’s Chimera GPNPU architecture can deliver high ML inference performance and run complex, data-parallel C++ code on the same fully programmable processor.
Compared to other ML inference architectures that force the software developer to artificially partition an algorithm solution between two or three different kinds of processors, Quadric’s Chimera processors deliver a massive uplift in software developer productivity while also providing current-day graph processing efficiency coupled with long-term future-proof flexibility.
Quadric’s Chimera GPNPUs are licensable processor IP cores delivered in synthesizable source RTL form. Blending the best attributes of both neural processing units (NPUs) and digital signal processors (DSPs), Chimera GPNPUs are aimed at inference applications in a variety of high-volume end applications including mobile devices, digital home applications, automotive and network edge compute systems.
Key features
- Hybrid von Neumann + 2D SIMD matrix architecture
- 64b Instruction word, single instruction issue per clock
- 7-stage, in-order pipeline
- Scalar / vector / matrix instructions modelessly intermixed with granular predication
- Deterministic, non-speculative execution delivers predictable performance levels
- Configurable AXI Interfaces to system memory (independent data and instruction access)
- Configurable Instruction cache (64/128/256K)
- Distributed, tightly coupled local register memories (LRMs) with data-broadcast networks within the matrix array allow overlapped compute and data movement to maximize performance
- Local L2 data memory (multi-bank, configurable 1MB to 16MB) minimizes off-chip DDR access, lowering power dissipation
- Optimized for INT8 machine learning inference (with optional FP16 floating point MAC support, and A8W4 MAC option) plus 32b ALU DSP ops for full C++ compiler support
- Compiler-driven, fine-grained clock gating delivers power savings
Block Diagram
Benefits
System Simplicity
- Quadric’s solution enables hardware developers to instantiate a single core that can handle an entire AI/ML workload plus the typical digital signal processor functions and signal conditioning workloads often intermixed with inference functions. Dealing with a single core drastically simplifies hardware integration and eases performance optimization. System design tasks such as profiling memory usage and estimating system power consumption are greatly simplified.
Programming Simplicity
- Quadric’s Chimera GPNPU architecture dramatically simplifies software development since matrix, vector, and control code can all be handled in a single code stream. Graph code from the common training toolsets (TensorFlow, PyTorch, and ONNX formats) is compiled by the Quadric SDK and can be merged with signal processing code written in C++, all compiled into a single code stream running on a single processor core.
- Quadric’s SDK meets the demands of both hardware and software developers, who no longer need to master multiple toolsets from multiple vendors. The entire subsystem can be debugged in a single debug console. This can dramatically reduce code development time and ease performance optimization.
- This new programming paradigm also benefits the end users of the SoCs, since they gain the ability to program all of the GPNPU's resources.
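To make the single-code-stream idea concrete, here is a sketch in plain C++ of a pipeline mixing a DSP-style stage (an FIR pre-filter) with a graph-style stage (a dense layer with ReLU). On a heterogeneous design these stages would target separate cores with separate toolchains; the point illustrated is that both are ordinary C++ in one program. All names are illustrative, not Quadric SDK APIs:

```cpp
#include <cstddef>
#include <vector>

// Signal-conditioning stage: 3-tap FIR filter.
std::vector<float> fir3(const std::vector<float>& x, const float (&h)[3]) {
    std::vector<float> y(x.size(), 0.0f);
    for (size_t n = 0; n < x.size(); ++n)
        for (size_t k = 0; k < 3 && k <= n; ++k)
            y[n] += h[k] * x[n - k];
    return y;
}

// Inference stage: one dense layer followed by ReLU.
std::vector<float> dense_relu(const std::vector<float>& x,
                              const std::vector<std::vector<float>>& w) {
    std::vector<float> y(w.size(), 0.0f);
    for (size_t i = 0; i < w.size(); ++i) {
        for (size_t j = 0; j < x.size(); ++j) y[i] += w[i][j] * x[j];
        if (y[i] < 0.0f) y[i] = 0.0f;  // ReLU
    }
    return y;
}

// The two stages compose directly: dense_relu(fir3(x, h), w).
// One code stream, one debugger, one profile -- no cross-core handoff.
```

On a split NPU/DSP/CPU subsystem, the handoff between these two stages would require shared-memory buffers, synchronization, and two toolchains; in a unified core it is a function call.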
Future Proof Flexibility
- A Chimera GPNPU can run any AI/ML graph that can be captured in ONNX, and anything written in C++. This is incredibly powerful since SoC developers can quickly write code to implement new neural network operators and libraries long after the SoC has been taped out. This eliminates fear of the unknown and dramatically increases a chip’s useful life.
- This flexibility is extended to the end users of the SoCs. They can continuously add new features to the end products, giving them a competitive edge.
- Replacing a legacy heterogeneous AI subsystem comprised of separate NPU, DSP, and real-time CPU cores with one GPNPU has obvious advantages. By allowing vector, matrix, and control code to be handled in a single code stream, the development and debug process is greatly simplified, and new algorithms can be added efficiently.
- As ML models continue to evolve and inferencing becomes prevalent in even more applications, the payoff from this unified architecture helps future-proof chip design cycles.
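As an example of the post-tapeout programmability described above, consider GELU, an activation function that became ubiquitous with transformers and postdates many shipped fixed-function NPUs. On a fully programmable core, a new operator like this is just C++ to be compiled, not a hardware gap to work around. A minimal sketch using the common tanh approximation (generic C++, not a Quadric-specific implementation):

```cpp
#include <cmath>
#include <vector>

// Tanh approximation of GELU: 0.5 * x * (1 + tanh(sqrt(2/pi) * (x + 0.044715 x^3))).
// An operator like this, unknown when older NPUs were designed, can be
// added to a programmable core after tapeout as ordinary compiled code.
float gelu(float x) {
    const float c = 0.7978845608f;  // sqrt(2/pi)
    return 0.5f * x * (1.0f + std::tanh(c * (x + 0.044715f * x * x * x)));
}

// Element-wise application, the shape a compiler can map onto a SIMD array.
std::vector<float> gelu_vec(const std::vector<float>& x) {
    std::vector<float> y;
    y.reserve(x.size());
    for (float v : x) y.push_back(gelu(v));
    return y;
}
```

On a fixed-function accelerator, an unsupported operator typically falls back to a host CPU, fragmenting the graph; on a programmable core the whole graph stays on one processor.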
Files
Note: some files may require an NDA depending on provider policy.
Frequently asked questions about Edge AI Accelerator IP cores
What is GPNPU Processor IP - 32 to 864TOPs?
GPNPU Processor IP - 32 to 864TOPs is an Edge AI Accelerator IP core from Quadric listed on Semi IP Hub.
How should engineers evaluate this Edge AI Accelerator?
Engineers should review the overview, key features, supported foundries and nodes, maturity, deliverables, and provider information before shortlisting this Edge AI Accelerator IP.
Can this semiconductor IP be compared with similar products?
Yes. Buyers can compare this product with similar semiconductor IP cores or IP families based on category, provider, process options, and structured technical specifications.