Vendor: ZeroPoint Technologies AB Category: Data Compression

High-Performance Memory Expansion IP for AI Accelerators

AI inference performance is increasingly constrained by memory bandwidth and capacity - not compute.

Overview

AI inference performance is increasingly constrained by memory bandwidth and capacity - not compute. Especially in large language models (LLMs), where the Prefill stage is compute-bound, but the Decode stage - critical for low-latency, real-time applications - is memory-bound. Traditional approaches like quantization, pruning and distillation reduce memory usage but compromise accuracy.

The AI-MX offers a complementary solution: lossless, nanosecond-latency memory compression, enabling up to 1.5x more effective HBM or LPDDR capacity and bandwidth. AI-MX enhances AI accelerators by transparently compressing memory traffic - without changes to DMA logic or SoC architecture.

Standards

Compression: XZI (proprietary)
Interface: AXI4, CHI
Plug in compatible with industry standard memory controllers

Architecture

Modular architecture, enables seamless scalability
Architectural configuration parameters accessible to fine tune performance

Integration

AI-MX is available in two versions:

AI-MX G1 - Asymmetrical Architecture

Models are compressed in software before deployment
Hardware performs in-line decompression at runtime
Suitable for static model data
Up to 1.5x expansion

AI-MX G2 - Symmetrical Architecture

Full in-line hardware compression + decompression
Accelerates not just the model, but also KV-cache, activations, and runtime data
Up to 2x expansion on unstructured data
Ideal for dynamic, memory-bound workloads

Performance / KPI

Feature	Performance
Compression ratio:	1.5x for LLM model data at BF16 1,35x - 2.0x for dynamic data i.e. KV cache and activiations
Throughput:	Matches the throughput of the HBM and LPDDR memory channel
Frequency:	Up to 1.75 GHz (@5nm TSMC)
IP area:	(Throughput dependant - contact for information)
Memory technologies supported:	HBM, GDDR, LPDDR, DDR, SRAM

Key features

SW model compression and in-line hardware accelerated decompression (Gen 1)
Hardware accelerated in-line compression and decompression (Gen 2)
On-the-fly Multi-algorithm switching capability without recompression
Cache line granularity to enable high throughput and low latency

Block Diagram

AI-MX block diagram

Benefits

+50% Effective HBM Capacity - Compress model and runtime data to fit 150 GB of workload into 100 GB of physical memory.
+50% Bandwidth Efficiency - Transfer more data per memory transaction, accelerating model throughput - tokens per second - without increasing power.
+50% More Throughput from the Same Silicon - An AI accelerator with 4 HBM stacks behaves like it has 6 - without changing the memory controller.
Transparent System Integration - Includes a patented, real-time Address Translation Unit for seamless compressed memory access.

Applications

AI Inference Accelerators (Datacenter, Edge)
Generative AI (LLMs like LLaMA3, GPT, Claude)
Smart devices with LPDDR bottlenecks
Automotive & Embedded AI Systems
Any SoC using HBM, LPDDR, GDDR or DDR

What’s Included?

Performance evaluation license C++ compression model for integration in customer performance simulation model
HDL Source Licenses
- Synthesizable System Verilog RTL (encrypted)
- Implementation constraints
- UVM testbench (self-checking)
- Vectors for testbench and expected results
- User Documentation

Specifications

Identity

Part Number

AI-MX

Vendor

ZeroPoint Technologies AB

Type

Silicon IP

Files

Product Sheet Download

Note: some files may require an NDA depending on provider policy.

Provider

ZeroPoint Technologies AB

HQ: Sweden

ZeroPoint Technologies is a fabless silicon IP (intellectual property) provider of real-time main memory compression and encryption technology. ZeroPoint Technologies is offering a patented memory compression technology for high performance System on a Chip (SoC) processor subsystems. The technology effectively doubles the main memory capacity and memory bandwidth. Our IP-block is placed on the main memory access path and is transparent to the operating system and applications. ZeroPoint also offers main memory encryption, a technology that delivers high throughput and high security memory encryption, integrated with compression or stand alone. ZeroPoint Technologies AB was incorporated in 2015 as a spinout from Chalmers University of Technology. The Intellectual Property Portfolio related to computer memory management was then acquired from the founders and transferred into the new startup. The technology and ideas behind ZeroPoint come from Professor Per Stenström’s research group, which is internationally recognized for research excellence within high-performance computing. Since its creation, ZeroPoint has attracted skilled people from academia, industry, finance and the startup community and raised its first round of Venture Capital in 2016. The headquarter is in Göteborg, Sweden.

Contact ZeroPoint Technologies AB

Frequently asked questions about Data Compression IP

What is High-Performance Memory Expansion IP for AI Accelerators?

High-Performance Memory Expansion IP for AI Accelerators is a Data Compression IP core from ZeroPoint Technologies AB listed on Semi IP Hub.

How should engineers evaluate this Data Compression?

Engineers should review the overview, key features, supported foundries and nodes, maturity, deliverables, and provider information before shortlisting this Data Compression IP.

Can this semiconductor IP be compared with similar products?

Yes. Buyers can compare this product with similar semiconductor IP cores or IP families based on category, provider, process options, and structured technical specifications.

High-Performance Memory Expansion IP for AI Accelerators

Overview

Key features

Block Diagram

Benefits

Applications

What’s Included?

Specifications

Identity

Files

Provider

Learn more about Data Compression IP core

Evaluating Lossless Data Compression Algorithms and Cores

Data compression tutorial: Part 3

Firmware Compression for Lower Energy and Faster Boot in IoT Devices

A configurable FPGA-based multi-channel high-definition Video Processing Platform

IP Core for an H.264 Decoder SoC

Digital Associative Memories Based on Hamming Distance and Scalable Multi-Chip Architecture

Frequently asked questions about Data Compression IP

What is High-Performance Memory Expansion IP for AI Accelerators?

How should engineers evaluate this Data Compression?

Can this semiconductor IP be compared with similar products?