NPU / AI accelerator with emphasis on LLMs

Overview

Skymizer EdgeThought is compatible with the Hugging Face and Nvidia ecosystems and supports most cutting-edge LLMs, including the latest Llama 3. Drawing on expertise in software-hardware co-design, EdgeThought leverages the high bandwidth of specialty DRAM to run Llama 3 8B at up to 200 tokens per second in just 4 GB of memory.
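
The headline numbers can be sanity-checked with simple arithmetic (an illustrative back-of-the-envelope estimate, not a vendor specification): an 8B-parameter model at roughly 4 bits per weight occupies about 4 GB, and because batch-1 decoding reads essentially the full weight set for every generated token, 200 tokens per second implies on the order of 800 GB/s of effective DRAM bandwidth.

    # Back-of-the-envelope check of the Overview claim.
    # Assumptions (not vendor data): ~4-bit weights, one full weight pass per decoded token.
    params = 8e9                      # Llama 3 8B parameter count
    bits_per_weight = 4               # assumed quantization width
    weight_bytes = params * bits_per_weight / 8
    tokens_per_s = 200                # claimed decode rate
    required_bw = weight_bytes * tokens_per_s
    print(f"weights   ~ {weight_bytes / 2**30:.1f} GiB")   # ~3.7 GiB, fits in 4 GB
    print(f"bandwidth ~ {required_bw / 1e9:.0f} GB/s")     # ~800 GB/s effective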

Key Features

  • Programmable and Model-flexible
    • A minimal yet efficient LLM-specific instruction set supports diverse decoder-only transformers, including LLaMA2, LLaMA3, Mistral, Phi-2, Phi-3, Gemma, etc.
    • Currently focused on 7-13B models; larger models require more DRAM capacity.
  • Ecosystem Ready
    • LLM Frameworks: HuggingFace Transformers, Nvidia Triton Inference Server, OpenAI API, and LangChain API (see the serving sketch after this list).
    • Fine-Tuning and RAG Toolkits: HuggingFace PEFT, QLoRA, LlamaIndex, and LangChain.
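
As an illustration of the OpenAI API compatibility listed above, the sketch below sends a chat request to an EdgeThought-backed inference server through the standard openai Python client. The endpoint URL, API key, and model name are placeholders assumed for the example, not documented EdgeThought values.

    # Minimal sketch: querying an OpenAI-compatible endpoint (placeholder URL and model).
    from openai import OpenAI

    client = OpenAI(base_url="http://localhost:8000/v1",  # assumed local endpoint
                    api_key="EMPTY")                      # placeholder; local servers often ignore it
    resp = client.chat.completions.create(
        model="llama-3-8b-instruct",                      # placeholder model name
        messages=[{"role": "user",
                   "content": "Summarize what an NPU does in one sentence."}],
        max_tokens=64,
    )
    print(resp.choices[0].message.content)

Because LangChain and LlamaIndex can target OpenAI-compatible endpoints, the same server shape also covers the RAG toolkits listed above.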

Benefits

  • High Performance and Low Cost
    • High memory bandwidth utilization.
    • Short response times with minimal MAC requirements: batch-1 decoding is bandwidth-bound rather than compute-bound (see the sketch below).
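
A rough arithmetic sketch of why batch-1 decoding needs comparatively few MACs (illustrative figures, not measured EdgeThought numbers): each generated token uses each weight in roughly one multiply-accumulate while also streaming it from DRAM, so sustaining the claimed 200 tokens/s takes only a couple of TMAC/s of compute against hundreds of GB/s of memory traffic.

    # Why decode is memory-bound rather than compute-bound (illustrative assumptions).
    params = 8e9                       # model size
    tokens_per_s = 200                 # claimed decode rate
    macs_per_token = params            # GEMV: one multiply-accumulate per weight
    bytes_per_token = params * 0.5     # ~4-bit weights
    print(f"compute   ~ {macs_per_token * tokens_per_s / 1e12:.1f} TMAC/s")  # ~1.6 TMAC/s
    print(f"bandwidth ~ {bytes_per_token * tokens_per_s / 1e9:.0f} GB/s")    # ~800 GB/s
    # A few TMAC/s is a small MAC array by NPU standards; in this regime the
    # DRAM interface, not the compute, sets the token rate.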

Applications

  • Edge server
  • AI PC / Mini PC
  • Smart speaker
  • Video conferencing
  • Automotive (in-cabin)
  • Robotics (language to actions)

Deliverables

  • RTL, plus software including a compiler and a quantizer (see the quantization sketch below).
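
Because the software deliverables include a quantizer, the sketch below shows a generic per-channel symmetric int4 weight quantizer of the kind such a toolchain typically contains. It illustrates the technique only, with made-up names; it is not EdgeThought's actual quantization scheme.

    # Generic per-output-channel symmetric int4 weight quantization (illustrative only).
    import numpy as np

    def quantize_int4_per_channel(w: np.ndarray):
        # One scale per output channel (row), mapping max |w| onto the int4 limit 7.
        scales = np.abs(w).max(axis=1, keepdims=True) / 7.0
        scales = np.where(scales == 0.0, 1.0, scales)          # guard all-zero rows
        q = np.clip(np.round(w / scales), -8, 7).astype(np.int8)
        return q, scales

    w = np.random.randn(4096, 4096).astype(np.float32)         # stand-in weight matrix
    q, scales = quantize_int4_per_channel(w)
    w_hat = q.astype(np.float32) * scales                       # dequantize to check error
    print("mean abs quantization error:", float(np.abs(w - w_hat).mean()))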

Technical Specifications

  • Availability: Now