Overview
Machine vision and deep learning are being embedded in highly integrated SoCs and are expanding into high-volume applications such as automotive ADAS, surveillance, and augmented reality. A major challenge in enabling mass adoption of embedded vision is providing the required processing capability at a power and cost point low enough for embedded use, while maintaining sufficient flexibility to cater to rapidly evolving markets.
The Synopsys ARC EV Processors are fully programmable and configurable IP cores that are optimized for embedded vision applications, combining the flexibility of software solutions with the low cost and low power consumption of hardware. For fast, accurate execution of convolutional neural networks (CNNs) or recurrent neural networks (RNNs), the EV Processors integrate an optional high-performance deep neural network (DNN) accelerator.
The EV Processors are designed to integrate seamlessly into an SoC; they can be used with any host processor and operate in parallel with the host. To speed application software development, the EV Processors are supported by a comprehensive software programming environment, based on existing and emerging embedded vision and neural network standards including OpenCV, OpenVX™, OpenCL™ C, and Caffe, together with Synopsys' ARC MetaWare EV Development Toolkit.