NPU / AI accelerator with emphasis in LLM
Overview
Skymizer EdgeThought is compatible with Hugging Face and Nvidia. It supports most cutting-edge LLMs, including the newest Llama3. With expertise in software-hardware co-design, EdgeThought can leverage the high bandwidth of specialty DRAM to run a Llama3 8B at up to 200 tokens per second on just 4GB of memory.
Key Features
- Programmable and Model-flexible
- Minimal yet efficient LLM-specific instruction set supports diverse decoder-only transformers,
- including LLaMA2, LLaMA3, Mistral, Phi-2, Phi-3, Gemma, etc.
- Currently focusing on 7-13B models.
- Larger models require more DRAM capacities.
- Ecosystem Ready
- LLM Frameworks: HuggingFace Transformers, Nvidia Triton Inference Server, OpenAI API, and LangChain API.
- Fine-Tuning and RAG Toolkits: HuggingFace PEFT, QLoRA, LlamaIndex, and LangChain.
Benefits
- High Performance and Low Cost
- High memory bandwidth utilization.
- Shortest response time with minimal MAC requirement.
Applications
- Edge server
- AI PC / Mini PC
- Smart speaker
- Video conferencing
- Automotive (in-cabin)
- Robotics (language to actions)
Deliverables
- RTL (plus software including compiler and quantizer)
Technical Specifications
Availability
Now
Related IPs
- AI accelerator (NPU) IP - 1 to 20 TOPS
- AI accelerator (NPU) IP - 16 to 32 TOPS
- AI accelerator (NPU) IP - 32 to 128 TOPS
- AI Accelerator (NPU) IP - 3.2 GOPS for Audio Applications
- ARC NPX Neural Processing Unit (NPU) IP supports the latest, most complex neural network models and addresses demands for real-time compute with ultra-low power consumption for AI applications
- DDR2/DDR3/DDR3L/LPDDR/LPDDR2/LPDDR3 6 in one combo IO with auto calibration - 40nm LL