Arm in the agentic era: Scaling the converged AI data center

System-level AI design is expanding the role of the CPU, and Arm is the foundation of what comes next

The only way to scale AI is with comprehensive system design. Accelerators crunch the math that drives AI models, but it is CPUs that underpin the systems that turn that compute into real-world value. As AI infrastructure evolves toward tightly integrated racks and super-clusters, CPUs are playing the central role in orchestrating, coordinating, securing and scaling these systems – with CPU capability scaling at least in step with, and in many cases faster than, accelerator growth. NVIDIA’s announcement of the Vera Rubin platform at CES 2026 is a powerful validation of that shift: a fully co-designed AI system built to operate as a single, coherent supercomputer, with Arm technology at its core.

This reflects a broader industry transition toward purpose-built, system-level AI platforms that tightly integrate compute, acceleration, networking, storage and security from the ground up. Hyperscalers and AI-focused neo-clouds are converging on this model through multiple approaches, including NVIDIA platforms, hyperscaler-designed accelerators such as AWS Trainium and Google TPUs, and highly coordinated hybrid systems that combine merchant and custom silicon. Scaling AI now depends on more than raw accelerator performance. In these environments, the CPU layer that manages data movement, synchronization, reliability and isolation is key, and in the industry's leading integrated AI systems that layer is built on Arm.

Arm’s defining role in enabling “extreme co-design” for the AI era

The Vera Rubin platform brings together six tightly co-designed chips: the Vera CPU, Rubin GPU, NVLink™ 6 Switch, ConnectX®-9 SuperNIC, BlueField®-4 DPU and Spectrum™-6 Ethernet Switch. Together, they form what NVIDIA describes as a single AI supercomputer, optimized end to end for training, inference, reasoning and agentic AI workloads, significantly reducing the cost per token. The Arm-based CPUs in this platform each show generation-on-generation performance gains of up to 6x, along with significant memory and interconnect bandwidth upgrades, all part of the system-wide optimization across silicon and software that drives the overall performance uplift of the NVL72 rack-scale AI system. You can't pursue that kind of uplift with legacy off-the-shelf CPUs; you need to go purpose-built.

The AWS Trainium3 UltraServer follows a similar approach, pairing Trainium3 accelerator chips with AWS Graviton CPUs and AWS Nitro cards, making Arm-based purpose-built silicon central to the platform.

What’s notable is not just the breadth of these platforms, but the architectural philosophy behind them. Each component is designed with the others in mind, minimizing bottlenecks and maximizing efficiency at scale. In his CES 2026 keynote, NVIDIA CEO Jensen Huang used the phrase “extreme co-design” to describe this shift. This level of integration demands not only top-tier performance and efficiency but also scope for significant flexibility and innovation in SoC design without compromising software compatibility. That’s where Arm technology comes in. 

Vera and BlueField: Arm at the heart of AI innovation

The Vera Rubin platform includes two key SoCs built with Arm technology.

Vera is a purpose-built CPU for large-scale AI systems, optimized for data movement, orchestration and agentic reasoning rather than traditional general-purpose workloads. With 88 cores per die, it delivers up to 2x the performance of the previous-generation Grace CPU, along with significant uplifts in memory and chip-to-chip bandwidth.

BlueField-4 represents a step change in what a DPU can be. By integrating the Arm Neoverse V2-based Grace CPU, an SoC previously reserved for high-performance, server-grade systems, NVIDIA has dramatically expanded BlueField's role in large-scale AI infrastructure. Compared to earlier generations, BlueField-4 jumps from 16 to 64 CPU cores, with significantly higher performance per core.

Memory size and bandwidth are increasingly critical constraints for large agentic models. By building BlueField-4 around a server-class SoC and increasing its compute capability by as much as 6x over BlueField-3, NVIDIA directly addresses this bottleneck and effectively elevates the DPU into a server-class system in its own right. This shift is exemplified by NVIDIA's announcement that, in addition to deploying a BlueField-4 in every Vera Rubin blade, it will use the DPU as the controller for a new class of storage server purpose-built for AI inference and designed to complement its rack-scale systems.

Both Vera and BlueField-4 benefit from full compatibility with the broad software ecosystem built around the Arm Neoverse platform, unlocking access to more than 22 million developers from edge to cloud. With every major cloud provider now deploying Arm Neoverse-based CPU instances, and with enterprise adoption growing, these Arm-based NVIDIA systems can leverage a wide range of open-source and commercial software for rapid, seamless application development and deployment.
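
To make that portability concrete, here is a minimal, hypothetical Python sketch (not taken from Arm or NVIDIA material): the same script, using the same prebuilt packages, runs unchanged whether the host CPU is an Arm Neoverse design reporting aarch64 or an x86_64 part.

```python
# Minimal sketch, assuming a standard Python install with NumPy available.
# The point: identical application code runs unchanged on Arm and x86 hosts,
# which is the portability the Neoverse software ecosystem provides.
import platform

import numpy as np  # ships prebuilt aarch64 wheels, so 'pip install' works the same on Arm


def describe_host() -> str:
    """Report the CPU architecture this process is running on."""
    machine = platform.machine()  # 'aarch64' on Arm Neoverse-based Linux servers
    return f"Running on {machine} ({platform.system()})"


if __name__ == "__main__":
    print(describe_host())
    # The numerical code path is identical on either architecture:
    a = np.random.default_rng(0).random((1024, 1024))
    print("matmul checksum:", float((a @ a.T).trace()))
```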

Industry convergence around a shared architecture

Across the industry, system designers are converging on a common architectural model for AI:

  • Purpose-built accelerators for AI compute
  • Powerful, efficient CPUs for orchestration and control
  • Tight system-level integration to scale across clusters

Arm’s Neoverse platform is designed to support this broader shift and is already being used in systems from the world’s largest technology providers, including AWS, Google, Meta, Microsoft and NVIDIA. Across this landscape, Arm-based CPUs increasingly power the control, orchestration and data movement layers that make large-scale AI viable.

Scaling the AI era

As we enter 2026, one thing is clear: AI is moving faster than ever, and it is increasingly a system-level, or even super-system-level, problem. Scaling intelligence now depends as much on efficient data storage and movement as it does on raw accelerator performance. Offloading critical AI-adjacent functions to powerful, programmable CPU fabrics in networking and storage allows accelerators and host CPUs to stay focused on core training and inference workloads, while improving isolation, efficiency and system reliability. In this world of increasingly purpose-built hardware, maintaining a coherent software ecosystem across generations of rapidly improving silicon is critical.

As AI systems grow in scale and complexity, the role of Arm and the Arm ecosystem continues to expand across every plane of the data center. The Vera Rubin announcement is a compelling illustration of that trajectory, and a significant milestone for the Arm ecosystem as we kick off a new year.
