Scaling On-Device GPU Inference for Large Generative Models
Advances in generative AI have produced large machine learning models that have transformed domains such as image processing, audio synthesis, and speech recognition. Although server-based deployments still deliver peak performance, privacy and efficiency considerations continue to drive demand for on-device inference. Recognizing GPUs as the on-device ML accelerator with the widest reach, we present ML Drift, an optimized framework that extends the capabilities of state-of-the-art GPU-accelerated inference engines. ML Drift enables on-device execution of generative AI workloads with 10 to 100x more parameters than existing on-device generative AI models. It addresses the intricate engineering challenges of cross-GPU API development and ensures broad compatibility across mobile and desktop/laptop platforms, making it possible to deploy significantly more complex models on resource-constrained devices. Our GPU-accelerated ML/AI inference engine achieves an order-of-magnitude performance improvement over existing open-source GPU inference engines.