NPU / AI accelerator with an emphasis on LLMs
Overview
Skymizer EdgeThought is compatible with the Hugging Face and Nvidia software ecosystems. It supports most cutting-edge LLMs, including the latest Llama 3. Through software-hardware co-design, EdgeThought leverages the high bandwidth of specialty DRAM to run Llama 3 8B at up to 200 tokens per second within just 4 GB of memory.
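As a rough sanity check of that figure (the quantization assumption below is ours, not a vendor specification): at roughly 4-bit weights, an 8B-parameter model occupies about 4 GB, and because single-batch decoding re-reads essentially the full weight set for every generated token, 200 tokens per second implies on the order of 800 GB/s of effective DRAM bandwidth.

```python
# Back-of-envelope decode-bandwidth estimate (assumptions ours, not vendor data).
params = 8e9            # Llama 3 8B parameter count
bytes_per_param = 0.5   # assumes ~4-bit weight quantization, so weights fit in ~4 GB
tokens_per_second = 200 # claimed decode rate

weight_bytes = params * bytes_per_param                    # ~4e9 bytes of weights
required_bandwidth_gbs = weight_bytes * tokens_per_second / 1e9
print(f"~{required_bandwidth_gbs:.0f} GB/s effective DRAM bandwidth")  # ~800 GB/s
```

This is why the design emphasizes memory bandwidth utilization rather than raw MAC count (see Benefits below).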
Key Features
- Programmable and Model-flexible
  - Minimal yet efficient LLM-specific instruction set supporting diverse decoder-only transformers, including LLaMA2, LLaMA3, Mistral, Phi-2, Phi-3, Gemma, etc.
  - Currently focused on 7B-13B models; larger models require more DRAM capacity.
- Ecosystem Ready
  - LLM Frameworks: HuggingFace Transformers, Nvidia Triton Inference Server, OpenAI API, and LangChain API (see the serving sketch after this list).
  - Fine-Tuning and RAG Toolkits: HuggingFace PEFT, QLoRA, LlamaIndex, and LangChain (see the fine-tuning sketch after this list).
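As an illustration of the OpenAI API compatibility listed above, a client could talk to an EdgeThought-hosted model through the standard openai Python package; the endpoint URL and model name below are placeholders, not documented EdgeThought values.

```python
# Minimal sketch: querying an OpenAI-compatible endpoint with the openai client.
# The base_url and model name are hypothetical placeholders for an EdgeThought deployment.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="not-needed-locally")

response = client.chat.completions.create(
    model="meta-llama/Meta-Llama-3-8B-Instruct",
    messages=[{"role": "user", "content": "Summarize the benefits of on-device LLMs."}],
    max_tokens=128,
)
print(response.choices[0].message.content)
```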
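The fine-tuning path listed above follows the standard HuggingFace PEFT/QLoRA recipe. The sketch below shows the usual adapter setup on a host GPU; how the resulting adapters are handed to EdgeThought's compiler and quantizer (the deliverables noted later) is an assumption, not a documented workflow.

```python
# Minimal QLoRA adapter setup with HuggingFace Transformers + PEFT (host-side sketch;
# deploying adapters onto EdgeThought is an assumption, not vendor-documented).
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)
model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Meta-Llama-3-8B", quantization_config=bnb_config
)
lora_config = LoraConfig(
    r=16, lora_alpha=32, target_modules=["q_proj", "v_proj"], task_type="CAUSAL_LM"
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # only the LoRA adapters are trainable
```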
Benefits
- High Performance and Low Cost
  - High memory bandwidth utilization.
  - Short response times with minimal MAC (multiply-accumulate) requirements.
Applications
- Edge server
- AI PC / Mini PC
- Smart speaker
- Video conferencing
- Automotive (in-cabin)
- Robotics (language to actions)
Deliverables
- RTL, plus supporting software (compiler and quantizer)
Technical Specifications
Availability
Now