AI Processor Accelerator

Overview

Gyrus's ground-breaking AI Processor Accelerator IP, coupled with a native graph-processing software stack, is a complete solution for seamless neural network implementation. We have cracked the code on scalability, programmability, and power consumption with a tightly integrated hardware and software approach.
The secret to our success lies in efficient utilisation of compute elements and intelligent memory reuse through reinforcement-learning-based software, ensuring a seamless flow of data to the compute engines with minimal unused cycles. With Gyrus's compilers and software tools, you can effortlessly port any neural network to our hardware accelerator, unlocking exceptional efficiency even with substantial activations and weights.

Our compilers streamline hardware configuration, reducing SoC complexity and power consumption while enabling AI algorithms to run smoothly on edge devices. The scheduler performs a neural schedule search based on reinforcement learning. With a cycle-accurate C-model, we create a digital twin of the NNA IP, ensuring long-term model deployment efficiency. Elevate your edge device capabilities with Cortisoft from Gyrus!
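The schedule-search idea above can be sketched as a loop that scores candidate schedules against a cost model and keeps the cheapest. Everything below is an illustrative assumption, not Gyrus's implementation: the toy cost model stands in for the cycle-accurate C-model, and exhaustive search stands in for the reinforcement-learning search policy.

```python
# Illustrative sketch only: schedule search scored by a cost model.
# The cost model is a stand-in for a cycle-accurate C-model; exhaustive
# search stands in for a learned (RL) search policy.

def estimate_cycles(tile_size, num_alus, workload_ops):
    """Toy cost model: compute cycles plus a fixed per-tile overhead."""
    tiles = -(-workload_ops // tile_size)          # ceil division
    compute = -(-tile_size // num_alus) * tiles    # cycles spent computing
    overhead = 4 * tiles                           # per-tile setup cost
    return compute + overhead

def search_schedule(workload_ops, num_alus, candidate_tiles):
    """Pick the tile size that minimizes estimated cycles."""
    best = min(candidate_tiles,
               key=lambda t: estimate_cycles(t, num_alus, workload_ops))
    return best, estimate_cycles(best, num_alus, workload_ops)

tile, cycles = search_schedule(workload_ops=4096, num_alus=64,
                               candidate_tiles=[64, 128, 256, 512])
print(tile, cycles)  # larger tiles amortize the per-tile overhead here
```

In a real flow the cost model would be the digital-twin C-model and the search space would cover loop order, tiling, and memory placement, far too large to enumerate, which is why a learned search policy is used.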

Key Features

  • OPTIMIZED COMPUTATION - >80% Utilization
  • LOW MEMORY - 16X Reduced
  • SPEED - 10-30x Lower Clock Cycles
  • HIGH Efficiency - 30 TOPs/W
  • LOW POWER - 10-20x lower
  • SMALL AREA - 10-8x smaller die area
  • Scalable RTL via parameters for performance and power:
      • Number of ALUs
      • Number of clusters
      • Activation memory size per cluster (local memory of 256KB or 512KB typical)
      • DDR or no-DDR external memory
      • Internal system memory
      • External shared memory (optional)
  • No long interconnects or interconnect fabric
  • Designed for high-speed and/or HVt cells
  • Synthesized and P&R complete at 800MHz in 7nm
  • Hardware configuration is an input to the compiler
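Since the hardware configuration is an input to the compiler, the RTL parameters above could be captured in a small configuration record. The field names and defaults below are hypothetical, chosen to mirror the listed parameters; they are not the actual Gyrus compiler interface.

```python
from dataclasses import dataclass

# Hypothetical sketch of a hardware configuration record a compiler could
# consume; names and defaults are assumptions, not the real interface.

@dataclass
class HardwareConfig:
    num_alus: int = 64
    num_clusters: int = 4
    activation_mem_kb: int = 256      # per cluster: 256KB or 512KB typical
    external_ddr: bool = False        # DDR or no-DDR external memory
    internal_system_mem_kb: int = 512
    external_shared_mem_kb: int = 0   # optional shared external memory

    def total_activation_mem_kb(self) -> int:
        """Total on-chip activation memory across all clusters."""
        return self.num_clusters * self.activation_mem_kb

cfg = HardwareConfig(num_clusters=8, activation_mem_kb=512)
print(cfg.total_activation_mem_kb())  # 8 clusters x 512KB = 4096KB
```

Keeping the configuration as explicit data is what lets the same compiler target differently scaled instances of the RTL.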

Benefits

  • Universal Compatibility: Supports any framework, neural network, and backbone.
  • Large Input Frame Handling: Accommodates large input frames without downsizing.
  • Parallel Design: Achieves high performance at low operational frequency.
  • Memory Efficiency: Reduces memory usage with data traversal-based optimization.
  • Versatile Stationary Modes: Efficiently manages both input-stationary and weight-stationary setups.
  • Model & Activation Memory: Stays in sleep mode most of the time.
  • Graph SIMD Compiler: Enables efficient network deployment.
  • Optimal Data Movement and Compute Instructions: Maximizes AI performance.
  • Memory Architecture: Drastically minimizes data movement and conserves memory.
  • Sparse NN Implementation: Efficiently handles sparse neural networks, reducing model size and compute demands by over 3 times.
  • Minimal Host Code Dependency: Requires very low host code support for AI workloads.
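The sparse-NN benefit comes from storing and computing only nonzero weights. The sketch below is a generic compressed-sparse-row (CSR) illustration of that principle, not the vendor's scheme: at 75% sparsity, only a quarter of the weights are stored or multiplied.

```python
# Generic illustration (not the vendor's scheme): CSR storage keeps only
# nonzero weights, shrinking both model size and compute.

def to_csr(dense):
    """Convert a dense matrix (list of rows) to CSR triplets."""
    values, col_idx, row_ptr = [], [], [0]
    for row in dense:
        for j, w in enumerate(row):
            if w != 0:
                values.append(w)
                col_idx.append(j)
        row_ptr.append(len(values))
    return values, col_idx, row_ptr

def csr_matvec(values, col_idx, row_ptr, x):
    """Matrix-vector product touching only the stored nonzeros."""
    out = []
    for r in range(len(row_ptr) - 1):
        acc = 0
        for k in range(row_ptr[r], row_ptr[r + 1]):
            acc += values[k] * x[col_idx[k]]
        out.append(acc)
    return out

dense = [[0, 2, 0, 0],
         [0, 0, 0, 3],
         [1, 0, 0, 0],
         [0, 0, 4, 0]]          # 75% zeros
vals, cols, rows = to_csr(dense)
print(len(vals))                # 4 stored weights vs 16 dense
print(csr_matvec(vals, cols, rows, [1, 1, 1, 1]))  # [2, 3, 1, 4]
```

The same principle scales to real networks: a model pruned to 75% sparsity needs roughly a quarter of the weight storage and multiplies, consistent with the "over 3 times" figure above.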

Block Diagram

AI Processor Accelerator Block Diagram

Applications

  • Automotive
  • Smart Devices
  • Security & Surveillance
  • IoT
  • High Performance Computing
  • Robotics

Technical Specifications

Maturity: In production
Availability: Available