HyperThought is a cutting-edge LLM accelerator IP designed to revolutionize AI applications. Built for the demands of multimodal and agentic intelligence, HyperThought delivers unparalleled performance, efficiency, and security.
Small
- Compact Design: HyperThought employs advanced compression technology to minimize language model size.
- Reduced Footprint: Optimized for lower parameter counts and DRAM bandwidth requirements, utilizing standard LPDDR4/5 memory.
- Efficient Compression:
  - Weight (long-term memory) compression outperforms open-source llama.cpp by 9% to 17.8%.
  - KV cache (short-term memory) compression keeps perplexity loss minimal (from under 0.06% to 3.52%).
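As a rough illustration of what tighter weight compression buys, the sketch below estimates the DRAM footprint of a 7B-parameter model. The bits-per-weight figures are assumptions chosen for illustration, not HyperThought measurements:

```python
# Illustrative sketch: DRAM footprint of model weights under compression.
# The 4.5 bits/weight baseline is an assumed llama.cpp-style quantization
# level, not a HyperThought specification.

def weight_footprint_gb(params_billions: float, bits_per_weight: float) -> float:
    """DRAM needed to hold the weights, in GB (1 GB = 1e9 bytes)."""
    return params_billions * 1e9 * bits_per_weight / 8 / 1e9

baseline_gb = weight_footprint_gb(7, 4.5)   # assumed baseline ≈ 3.94 GB
# A 17.8% better compression ratio shrinks the footprint proportionally,
# and with it the per-token DRAM traffic during decode.
improved_gb = baseline_gb * (1 - 0.178)

print(f"baseline: {baseline_gb:.2f} GB, improved: {improved_gb:.2f} GB")
```

Because decode streams the full weight set per token, any footprint reduction translates directly into lower bandwidth demand at the same throughput.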
Efficient
- Balanced Performance: HyperThought balances silicon area against compute efficiency.
- High Throughput: With 100 GB/s of memory bandwidth, just 0.5 TOPS of compute sustains 30 tokens/second.
- Robust Performance: Even on a mature T28nm process, HyperThought delivers exceptional results.
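The 30 tokens/second figure is consistent with a simple memory-bound decode estimate. The model size used below (a 7B model at an assumed 4 bits/weight) is illustrative, not a stated HyperThought parameter:

```python
# Back-of-envelope check: LLM decode is typically memory-bound, so sustained
# tokens/second ≈ DRAM bandwidth / bytes streamed per token (≈ weight size).
# The 4 bits/weight quantization level is an assumption for illustration.

bandwidth_gb_s = 100.0            # bandwidth figure from the text above
weights_gb = 7e9 * 4 / 8 / 1e9    # 7B params at 4 bits/weight = 3.5 GB

tokens_per_s = bandwidth_gb_s / weights_gb
print(f"~{tokens_per_s:.1f} tokens/s")   # ≈ 28.6, in line with the ~30 figure
```

This also shows why only 0.5 TOPS suffices at this rate: throughput is set by how fast weights can be streamed, not by arithmetic capacity.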
Powerful
- Scalable Architecture: Multi-core design for increased processing power.
- High-Speed Processing: Octa-Core HTX301 LPU achieves 240 tokens/second for Llama2 7B prefill.
- Multi-Chip Scalability: Connect multiple chips for extreme performance: up to 1200 tokens/second for Llama2 7B prefill and support for models up to 600B parameters.
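Under an idealized linear-scaling assumption, the multi-chip figure follows directly from the single-chip one; the chip count below is inferred for illustration, not stated in the text:

```python
# Sketch: relating the single-chip and multi-chip prefill figures, assuming
# near-linear scaling across chips (an idealization; real interconnect
# overhead would reduce the aggregate somewhat).

per_chip_tokens_s = 240.0   # octa-core HTX301, Llama2 7B prefill (from the text)
chips = 5                   # assumed chip count for illustration

aggregate = per_chip_tokens_s * chips
print(f"{aggregate:.0f} tokens/s across {chips} chips")
```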
Secure
- LISA v3 Architecture: Incorporates Language Instruction Set Architecture (LISA v3) for enhanced security.
- Secure Design: Protects every interaction with security-focused instruction sets.
- Attack Prevention: KV cache format encryption prevents overflow attacks.
Driving Innovation with LISA v3 and LPU IP
- LISA v3 (Language Instruction Set Architecture): The foundation of HyperThought, enabling efficient processing of diverse data types (text, audio, image).
- LPU IP (LLM Processing Unit IP): The core processing engine of HyperThought, optimized for high-performance LLM acceleration.