Open-Source Design of Heterogeneous SoCs for AI Acceleration: the PULP Platform Experience
By Francesco Conti, Angelo Garofalo, Davide Rossi, Giuseppe Tagliavini, and Luca Benini -- University of Bologna
The complexity of Artificial Intelligence (AI) algorithms is increasing at an exponential pace that pure technology scaling, especially with the slowing of Moore’s law, cannot keep up with. Epoch AI estimates that the number of parameters in AI models is currently (as of 2024) growing by 2× per year; training floating-point operations (FLOPs) are growing even faster, at 4.2× per year. By contrast, the same institution estimates that compute performance from dedicated hardware scales by only 1.3× per year for 32-bit floating-point data, with similar rates for other data formats. This figure includes gains from both technology node advancements and architectural improvements.
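To make the mismatch concrete, the short sketch below compounds the two annual rates quoted above. It assumes, purely for illustration, that both rates stay constant over the period considered; the function and constant names are hypothetical and are not part of any Epoch AI or PULP tooling.

```python
# Illustrative arithmetic only: compound the annual growth rates quoted in the text.
# Assumption (not from the source): both rates stay constant over the whole period.
TRAINING_FLOPS_GROWTH = 4.2  # x per year, training FLOPs (Epoch AI estimate quoted above)
HARDWARE_PERF_GROWTH = 1.3   # x per year, FP32 hardware performance (same source)


def demand_supply_gap(years: float) -> float:
    """Return how much faster training compute grows than hardware performance over `years`."""
    return (TRAINING_FLOPS_GROWTH / HARDWARE_PERF_GROWTH) ** years


if __name__ == "__main__":
    for years in (1, 3, 5):
        gap = demand_supply_gap(years)
        print(f"after {years} year(s): demand outgrows hardware by about {gap:.1f}x")
```

Under this constant-rate assumption, the ratio between training compute and per-device performance widens by roughly 3.2× every year, and the widening compounds.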
This mismatch creates an extraordinary challenge for designers of heterogeneous AI Systems-on-Chip (SoCs). On the one hand, accelerator designs must scale continuously to match the increasing complexity of AI workloads – and this is true not only for datacenter AI accelerators but also for edge AI devices, whose functionality is expected to become progressively more sophisticated. On the other hand, this scaling must happen quickly, which makes it imperative to design, verify, and tape out new complex heterogeneous SoCs with a much shorter turnaround time than in traditional design cycles – especially for fabless startups.
By virtue of its “automatic” cost-sharing principle, the open-source hardware model offers a promising avenue to streamline and accelerate the development of new SoCs, in terms of both cost and time. The principle is simple: instead of allocating significant resources to integrating vendor IPs for the low-value, non-differentiating baseline parts of an SoC, designers can focus effort and funding primarily on developing differentiating proprietary IPs, and outsource only those technology-dependent IPs that are of critical importance (e.g., DRAM PHYs). Moreover, they can use available high-quality open-source IPs as a “starting point” for their designs, avoiding the need to fund development from scratch.
Since 2013, the academic PULP (Parallel Ultra-Low Power) Platform project has been one of the most active and successful initiatives in designing research IPs and releasing them as open source. Its portfolio now ranges from processor cores to networks-on-chip, peripherals, SoC templates, and full hardware accelerators. In this article, we focus on the PULP experience in designing heterogeneous AI acceleration SoCs – an endeavour encompassing SoC architecture definition; development, verification, and integration of acceleration IPs; front- and back-end VLSI design; testing; and development of AI deployment software.