NPU / AI accelerator with an emphasis on LLMs
Overview
Skymizer EdgeThought is compatible with the Hugging Face and Nvidia software ecosystems. It supports most cutting-edge LLMs, including the latest Llama 3. Through software-hardware co-design, EdgeThought leverages the high bandwidth of specialty DRAM to run Llama 3 8B at up to 200 tokens per second within just 4 GB of memory.
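As a rough sanity check of that figure (the quantization assumption below is ours, not a vendor specification): at roughly 4-bit weights, an 8B-parameter model occupies about 4 GB, and because single-batch decoding re-reads essentially the full weight set for every generated token, 200 tokens per second implies on the order of 800 GB/s of effective DRAM bandwidth.

```python
# Back-of-envelope decode-bandwidth estimate (assumptions ours, not vendor data).
params = 8e9            # Llama 3 8B parameter count
bytes_per_param = 0.5   # assumes ~4-bit weight quantization, so weights fit in ~4 GB
tokens_per_second = 200 # claimed decode rate

weight_bytes = params * bytes_per_param                    # ~4e9 bytes of weights
required_bandwidth_gbs = weight_bytes * tokens_per_second / 1e9
print(f"~{required_bandwidth_gbs:.0f} GB/s effective DRAM bandwidth")  # ~800 GB/s
```

This is why the design emphasizes memory bandwidth utilization rather than raw MAC count (see Benefits below).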
Key Features
- Programmable and Model-flexible
  - Minimal yet efficient LLM-specific instruction set supporting diverse decoder-only transformers, including LLaMA2, LLaMA3, Mistral, Phi-2, Phi-3, Gemma, etc.
  - Currently focused on 7B-13B models; larger models require more DRAM capacity.
- Ecosystem Ready
  - LLM Frameworks: HuggingFace Transformers, Nvidia Triton Inference Server, OpenAI API, and LangChain API (see the serving sketch after this list).
  - Fine-Tuning and RAG Toolkits: HuggingFace PEFT, QLoRA, LlamaIndex, and LangChain (see the fine-tuning sketch after this list).
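As an illustration of the OpenAI API compatibility listed above, a client could talk to an EdgeThought-hosted model through the standard openai Python package; the endpoint URL and model name below are placeholders, not documented EdgeThought values.

```python
# Minimal sketch: querying an OpenAI-compatible endpoint with the openai client.
# The base_url and model name are hypothetical placeholders for an EdgeThought deployment.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="not-needed-locally")

response = client.chat.completions.create(
    model="meta-llama/Meta-Llama-3-8B-Instruct",
    messages=[{"role": "user", "content": "Summarize the benefits of on-device LLMs."}],
    max_tokens=128,
)
print(response.choices[0].message.content)
```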
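The fine-tuning path listed above follows the standard HuggingFace PEFT/QLoRA recipe. The sketch below shows the usual adapter setup on a host GPU; how the resulting adapters are handed to EdgeThought's compiler and quantizer (the deliverables noted later) is an assumption, not a documented workflow.

```python
# Minimal QLoRA adapter setup with HuggingFace Transformers + PEFT (host-side sketch;
# deploying adapters onto EdgeThought is an assumption, not vendor-documented).
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)
model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Meta-Llama-3-8B", quantization_config=bnb_config
)
lora_config = LoraConfig(
    r=16, lora_alpha=32, target_modules=["q_proj", "v_proj"], task_type="CAUSAL_LM"
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # only the LoRA adapters are trainable
```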
Benefits
- High Performance and Low Cost
  - High memory bandwidth utilization.
  - Short response times with minimal MAC (multiply-accumulate) requirements.
Applications
- Edge server
- AI PC / Mini PC
- Smart speaker
- Video conferencing
- Automotive (in-cabin)
- Robotics (language to actions)
Deliverables
- RTL, plus supporting software (compiler and quantizer)
Technical Specifications
Availability
Now