Arm Accelerates AI From Cloud to Edge With New PyTorch and ExecuTorch Integrations to Deliver Immediate Performance Improvements for Developers

September 16, 2024 -- Today Arm is announcing significant new developments in our mission to make the developer experience as frictionless as possible, wherever developers sit in the ML stack. We’re working closely with leading cloud providers and frameworks to create an environment that makes it easy for software developers to bring accelerated AI and ML workloads to life on Arm-based hardware. In essence, we’re doing the hard work so developers don’t have to.

This has taken shape as Arm Kleidi, which brings together the latest developer enablement technologies and critical resources to drive technical collaboration and innovation across the ML stack. Since its launch only four months ago, Kleidi has already been accelerating development and unlocking major AI performance uplifts on Arm CPUs. Arm’s close collaboration with the PyTorch community is a great example of how the technology is significantly reducing the effort developers need to take advantage of efficient AI.

Integration with leading frameworks leads to major cloud benefits

In the cloud, Kleidi builds upon our existing work enhancing PyTorch with the Arm Compute Library (ACL) and establishes a blueprint for optimizing AI on Arm everywhere. We want developers to look to Arm as the platform of choice for running their critical ML workloads, without having to do unnecessary engineering work themselves. As a key step towards that vision, we’ve partnered directly with PyTorch and TensorFlow to build the essential Arm kernels of the Kleidi Libraries directly into these leading frameworks.

Critically, this means that application developers automatically benefit from dramatic performance improvements as soon as new framework versions are released, without having to take any extra steps to build on Arm today. We’ve already seen the positive impact of this investment in partnerships:

  • Arm’s demo chatbot, powered by the Meta Llama 3 large language model and running on AWS Graviton processors, enables real-time chat responses for the first time in mainline PyTorch.
    • We saw a 2.5x faster time-to-first-token after integrating Kleidi technology into the open-source PyTorch codebase, as measured on AWS Graviton4.
  • By optimizing torch.compile to make efficient use of Kleidi technology delivered via ACL, we’ve measured 1.35-2x performance gains on AWS Graviton3 across a range of Hugging Face model inference workloads.

These are just two impressive cloud examples, and they represent the type of performance acceleration possible as we work to democratize ML workloads on Arm. We’re continuing to invest to make sure developers’ AI apps run best on Arm from cloud to edge, including making new capabilities forward-compatible so that developers can take advantage immediately.

Partnering to help developers keep pace with generative AI

Generative AI has spurred a wave of AI innovation, with new versions of language models being released at an unprecedented rate. We’re working closely with all key parts of the ML stack, including cloud service providers like AWS and Google, and the rapidly growing ML ISV community, such as Databricks, to ensure developers can stay ahead. You can see what some of these partners have to say at the end of this blog.

And we won’t stop there. It’s critical that developers can apply the resources we provide to real-world use cases, so we’re creating demonstration software stacks alongside learning paths that show developers exactly how to build AI workloads on Arm CPUs. This is what drives rapid adoption and speeds time to deployment on Arm systems. You can see the first of these stacks – an implementation of a chatbot accelerated by Kleidi technologies – here. We’ll be adding MLOps and Retrieval-Augmented Generation use cases later in the year, with more to come in 2025.

Continuing to drive performance improvements at the edge

Building on our momentum with Kleidi at the edge, today we’re also announcing that KleidiAI will be integrated into ExecuTorch, the new on-device inference runtime from PyTorch. The integration is on track to be completed in October 2024 and promises exciting performance improvements for edge devices across the apps currently being production-tested or rolled out on ExecuTorch. We’ll share more data and detail on the impact on edge-device performance once the integration is complete.

It joins a number of other KleidiAI integrations we’ve already announced, including with Google’s XNNPACK and MediaPipe and Tencent’s Hunyuan large language model, and the impact on real-world workloads speaks for itself – as you can see in our chatbot demo.

As Kleidi continues to be integrated into PyTorch and ExecuTorch releases, alongside all other major AI frameworks, developers will be able to immediately run efficient, performant AI workloads on Arm across a spectrum of devices, from cloud data centers to edge devices. We will continue to actively contribute enhancements to the PyTorch community, and, looking ahead, we are focused on delivering quantization optimizations for various integer formats to unlock further performance gains. This work is enabling the next generation of AI experiences to run seamlessly on Arm CPUs at scale.

More to come to further empower developers

PyTorch is driving a huge amount of innovation in ML development, and as many of you have seen, the organization recently announced that I am joining the PyTorch board. That’s a watershed moment for Arm and our AI journey. While I’m new to Arm, I’m certainly not new to the open-source AI software community, and my mission here is to empower developers worldwide to create cutting-edge AI and application capabilities by unlocking the full potential of end-to-end AI on Arm. Watch this space!

Supporting partner quotes:

“Arm and Google Cloud are both committed to increasing AI access and agility for developers, and Kleidi is a big step forward in co-optimizing hardware and software for AI needs,” said Nirav Mehta, Senior Director of Product Management, Google Cloud Compute. “As our customers embrace Axion, our custom Arm-based CPU, we look forward to enabling them with more seamless integration across the entire ML stack.”

“Organizations leveraging the Databricks Data Intelligence Platform for AI and ML workflows benefit from performance optimizations provided by Arm Kleidi integrations across the ML software stack,” said Lin Yuan, software engineer at Databricks. “With Arm-based AWS Graviton processors supported by the Databricks ML Runtime cluster, companies benefit from increased speedups for a wide range of ML libraries, while lowering their cloud service provider costs.”

Additional resources:

About Kleidi:

Kleidi (Greek for ‘key’) is built on three critical pillars:

  • Open Arm technology integrated directly into key frameworks, enabling large language models to seamlessly tap the performance of Arm CPUs without any extra work for the developer. We’ll ensure that new technologies are always forward-compatible so that developers can take advantage immediately.
  • Developer empowerment through a wide range of resources, including usage guidance, learning paths and demonstrations.
  • A vibrant ecosystem of ML software providers, frameworks and open-source projects with access to all the latest AI features, building solutions on Arm first.

You can find out more about Kleidi here.
