Vendor: Flow technology
Category: Vector Processor

Parallel Processing Unit

Flow's Parallel Processing Unit addresses one of computing's most fundamental challenges: parallel processing.

Overview

The Parallel Processing Unit (PPU) is an IP block that integrates tightly with the CPU on the same silicon. It is designed to be highly configurable to meet the specific requirements of numerous use cases.

Customization options include:

  • Number of cores in PPU (4, 16, 64, 256, etc.)
  • Type and number of functional units (such as ALUs, FPUs, MUs, GUs, NUs)
  • Size of on-chip memory resources (caches, buffers, scratchpads)
  • Instruction set modifications tailored to complement the CPU’s instruction set extensions

CPU modifications are minimal, involving the integration of the PPU interface into the instruction set; the number of CPU cores can also be updated to unlock new levels of performance.

Our parametric design enables extensive customization, including the number of PPU cores, the variety and number of functional units, and the size of on-chip memory resources. Your performance enhancement scales with the number of PPU cores. A 16-core PPU is ideal for small devices like smart watches; a 64-core PPU is well-suited for smartphones and PCs; and a 256-core PPU excels in high-demand environments such as AI, cloud, and edge computing servers.
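
To make the parametric design more concrete, the sketch below shows one way such a configuration could be captured in C. The struct, field names, and example values are illustrative assumptions made for this page; they are not Flow's actual configuration interface or deliverables.

    /* Hypothetical description of a single PPU instance.
     * All names and values are illustrative only. */
    #include <stdint.h>

    typedef struct {
        uint32_t num_cores;         /* 4, 16, 64, 256, ... */
        uint32_t num_alus;          /* ALUs per PPU core */
        uint32_t num_fpus;          /* FPUs per PPU core */
        uint32_t num_mus;           /* memory units per PPU core */
        uint32_t scratchpad_kib;    /* per-core scratchpad size, KiB */
        uint32_t shared_cache_kib;  /* shared on-chip cache size, KiB */
    } ppu_config_t;

    /* Example instance sized for a smartphone- or PC-class SoC. */
    static const ppu_config_t ppu_cfg_64 = {
        .num_cores        = 64,
        .num_alus         = 2,
        .num_fpus         = 1,
        .num_mus          = 1,
        .scratchpad_kib   = 32,
        .shared_cache_kib = 2048,
    };

A 4- or 16-core instance for a wearable, or a 256-core instance for a server, would use the same structure with different values.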

Core benefits of Flow's technology

100x performance boost

  • Flow's innovative Parallel Processing Unit (PPU) enhances CPU performance by up to 100 times, ushering in the era of SuperCPUs. 
  • Designed for full backward compatibility, the PPU boosts the performance of existing software and applications after recompilation. The more parallel the functionality, the greater the boost in performance. 
  • Flow's technology even enhances the entire computing ecosystem. While the CPU gains direct benefits, ancillary components—matrix units, vector units, NPUs, and GPUs—also see improved performance through the boosted CPU capabilities, all thanks to the PPU.

2X faster legacy software and applications

  • Flow’s PPU enhances legacy code without altering the original application, and delivers greater performance gains when paired with recompiled operating system or programming system libraries.
  • The result? Significant speed improvements across a wide range of applications, especially those that exhibit parallelism but are constrained by traditional thread-based processing. Our PPU unlocks the full potential of these applications, delivering significant performance gains where previous architectures fell short.

Parametric design

  • The configurable, parametric design of the PPU allows it to adapt to diverse uses. Everything can be tailored to meet the specific requirements of multiple use cases, and we do mean everything: the number of PPU cores (4, 16, 64, 256, or more), the type and number of functional units (such as ALUs, FPUs, MUs, GUs, and NUs), and even the size of on-chip memory resources (caches, buffers, and scratchpads).
  • The performance scalability is directly linked to the number of PPU cores. A PPU with 4 cores is ideal for small devices like smart watches, a 16-core PPU is perfect for smartphones, and a 64-core PPU delivers excellent performance for PCs. For servers, a PPU with 256 cores is recommended, enabling them to handle the most demanding computing tasks with ease.

Block Diagram

Benefits

  • Flow technology is fully backwards compatible with all existing legacy software and applications. The PPU's compiler automatically recognizes the parallel parts of the code and executes them on the PPU cores (see the sketch after this list).
  • What’s more, Flow is developing an AI tool to help application and software developers identify parallel parts of the code and to propose methods of streamlining those for maximum performance.
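
As a rough illustration of the kind of code such a compiler can target, the C sketch below shows a simple data-parallel loop. It does not use Flow's toolchain, and the function name is purely illustrative; the point is only that every iteration is independent, which is the property that lets a parallelizing compiler spread the work across many PPU cores while the rest of the program keeps running on the CPU.

    #include <stddef.h>

    /* A data-parallel loop: each iteration touches a distinct element,
     * so the iterations can in principle be distributed across parallel
     * cores after recompilation. Illustrative example only. */
    void scale_and_add(float *dst, const float *a, const float *b,
                       float k, size_t n)
    {
        for (size_t i = 0; i < n; i++) {
            dst[i] = k * a[i] + b[i];
        }
    }

Code with loop-carried dependencies or heavy branching would see smaller gains, which matches the note above that the more parallel the functionality, the greater the boost.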

Specifications

Identity

Part Number: PPU
Vendor: Flow technology
Type: Silicon IP

Files

Note: some files may require an NDA depending on provider policy.

Provider

Flow technology
HQ: Finland
Flow Computing is a pioneering startup based in Helsinki, Finland, enabling the next generation of CPU performance for the most demanding applications, such as locally-hosted AI and general-purpose parallel computing. Capable of being integrated into any current or pending design architecture or process geometry, Flow provides a revolutionary acceleration of up to 100-fold that is usable in standard products to enable “CPU 2.0” levels of throughput and eliminate the need for expensive GPU acceleration of CPU instructions. This performance leap is achieved using the company's patented Parallel Processing Unit (PPU) licensable architecture and compiler ecosystem, enabling developers to choose their preferred ratio of raw performance for new apps vs. maintaining legacy code and application compatibility when required. Flow's architecture also turbocharges embedded systems and data centers, for uses such as edge and cloud computing, AI clouds, multimedia codecs across 5G/6G, autonomous vehicle systems, military-grade computing, and more.

Learn more about Vector Processor IP core

MultiVic: A Time-Predictable RISC-V Multi-Core Processor Optimized for Neural Network Inference

Real-time systems, particularly those used in domains like automated driving, are increasingly adopting neural networks. From this trend arises the need for high-performance hardware exhibiting predictable timing behavior. While state-of-the-art real-time hardware often suffers from limited memory and compute resources, modern AI accelerators typically lack the crucial predictability due to memory interference. The authors present a new hardware architecture to bridge this gap between performance and predictability.

Integrating eFPGA for Hybrid Signal Processing Architectures

As system requirements evolve toward multi-standard, reconfigurable platforms, signal processing architectures are under pressure to deliver both ASIC-class performance and software-like flexibility. Semiconductor engineers face a fundamental tradeoff: fixed logic yields unmatched throughput and efficiency, but cannot adapt once taped out. Software-programmable solutions offer flexibility but often miss hard real-time performance constraints and can consume more power.

FeNN-DMA: A RISC-V SoC for SNN acceleration

Spiking Neural Networks (SNNs) are a promising, energy-efficient alternative to standard Artificial Neural Networks (ANNs) and are particularly well-suited to spatio-temporal tasks such as keyword spotting and video classification. However, SNNs have a much lower arithmetic intensity than ANNs and are therefore not well-matched to standard accelerators like GPUs and TPUs. Field Programmable Gate Arrays (FPGAs) are designed for such memory-bound workloads and here we develop a novel, fully-programmable RISC-V-based system-on-chip (FeNN-DMA), tailored to simulating SNNs on modern UltraScale+ FPGAs.

Frequently asked questions about Vector Processor IP cores

What is the Parallel Processing Unit?

The Parallel Processing Unit is a Vector Processor IP core from Flow technology listed on Semi IP Hub.

How should engineers evaluate this Vector Processor?

Engineers should review the overview, key features, supported foundries and nodes, maturity, deliverables, and provider information before shortlisting this Vector Processor IP.

Can this semiconductor IP be compared with similar products?

Yes. Buyers can compare this product with similar semiconductor IP cores or IP families based on category, provider, process options, and structured technical specifications.
