Small Language Models: Efficient Arm Computing Enables a Custom AI Future
As AI pivots from the colossal to the compact, small language models (SLMs) offer tailored solutions with reduced costs and increased accessibility
Increasingly in the world of AI, small is big.
Large language models (LLMs) have driven the early innovation in generative AI over the past 18 months, but there is a growing body of evidence that the momentum behind unfettered scaling of LLMs – now pushing toward trillions of parameters – is not sustainable. At the very least, the infrastructure costs of pushing this approach further put it out of reach for all but a handful of organizations. This class of LLM requires vast computational power and energy, which translates into high operational costs. Training GPT-4 reportedly cost at least $100 million, illustrating the financial and resource-heavy nature of these projects.
Not to mention, these LLMs are complex to develop and deploy. A study from the University of Cambridge points out that companies might spend more than 90 days deploying a single machine learning model. Such a long cycle hampers rapid development and iterative experimentation, both of which are crucial in the fast-evolving field of AI.
These and other challenges are why the development focus is shifting toward small language models (SLMs, sometimes called small LLMs), which promise to address many of these problems by being more efficient, requiring fewer resources, and being easier to customize and control. SLMs such as Llama, Mistral, Qwen, Gemma, and Phi-3 are much more efficient at simpler, focused tasks – conversation, translation, summarization, and categorization – than at sophisticated or nuanced content generation, and as such consume a fraction of the energy for training.
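To make the efficiency gap concrete, a quick back-of-envelope calculation shows why an SLM fits on a single device while a trillion-parameter LLM needs a cluster. The figures below are illustrative assumptions (raw weight storage only, ignoring activations, optimizer state, and KV cache), not numbers from the article:

```python
def model_memory_gb(num_params: float, bytes_per_param: int = 2) -> float:
    """Approximate memory (GB) needed just to hold model weights.

    bytes_per_param: 4 for fp32, 2 for fp16/bf16, 1 for int8.
    This is a rough lower bound; real deployments need extra memory
    for activations and (during training) gradients and optimizer state.
    """
    return num_params * bytes_per_param / 1e9

# A ~7B-parameter SLM in fp16: roughly 14 GB of weights,
# small enough for a single accelerator or a high-end edge device.
slm_gb = model_memory_gb(7e9)

# A hypothetical 1-trillion-parameter LLM in fp16: roughly 2,000 GB
# of weights alone, forcing the model to be sharded across many nodes.
llm_gb = model_memory_gb(1e12)

print(f"7B SLM:  ~{slm_gb:.0f} GB")
print(f"1T LLM:  ~{llm_gb:.0f} GB")
```

The same arithmetic explains why int8 or 4-bit quantization is popular for SLM deployment: halving or quartering `bytes_per_param` is often the difference between fitting on-device and not.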