TOPS: The Truth Behind a Deep Learning Lie
By Ludovic Larzul, Mipsology
EETimes (June 25, 2021)
AI companies generally home in on one criterion: more tera operations per second (TOPS). Unfortunately, when silicon manufacturers promote their TOPS metrics, they are not really providing accurate guidance. In most cases, the numbers being hyped aren’t real TOPS, but peak TOPS. In other words, the TOPS number you think you’re getting in a card is actually the best-case scenario: how the chip would perform in an ideal world that real workloads never match.
I will discuss the problems the industry has created by mislabeling performance metrics and explain how users can independently evaluate real-world TOPS.
Faux TOPS vs real TOPS
AI application developers generally start performing due diligence by gauging whether a chip manufacturer’s published TOPS performance data is adequate for powering their project.
Say you’re trying to remaster images in full HD on the U-Net neural network at 10 fps (frames per second). Since U-Net requires 3 tera operations per image, simple math says you’ll need 30 TOPS (3 tera operations per image times 10 images per second) to complete your project at the desired frame rate. So, when shopping for a chip, you would assume that cards claiming to run 50, 40, or even 32 TOPS would be safe for the project. In a perfect world, yes, but you’ll soon find out that the card rarely hits the advertised number. And we’re not talking about drops of just a couple of TOPS; compute efficiency, the fraction of peak TOPS a card actually delivers, can be as low as 10 percent.
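The arithmetic above can be sketched in a few lines of Python. This is only an illustration of the reasoning, not a benchmarking method: the function names are hypothetical, the inputs are the article's own figures (3 tera operations per image, 10 fps, cards advertised at 50, 40, and 32 peak TOPS), and the 10 percent efficiency is the worst case mentioned in the text, not a measurement of any particular card.

```python
def required_tops(tera_ops_per_image: float, fps: float) -> float:
    """Sustained TOPS needed: tera operations per image times images per second."""
    return tera_ops_per_image * fps


def effective_tops(peak_tops: float, efficiency: float) -> float:
    """Real TOPS delivered when only a fraction of peak is achieved."""
    if not 0.0 < efficiency <= 1.0:
        raise ValueError("efficiency must be in the range (0, 1]")
    return peak_tops * efficiency


def min_peak_tops(needed_tops: float, efficiency: float) -> float:
    """Advertised peak TOPS a card would need to sustain needed_tops."""
    if not 0.0 < efficiency <= 1.0:
        raise ValueError("efficiency must be in the range (0, 1]")
    return needed_tops / efficiency


if __name__ == "__main__":
    needed = required_tops(tera_ops_per_image=3.0, fps=10.0)
    print(f"Required: {needed:.1f} TOPS")

    # Assumed worst-case efficiency from the text (10 percent).
    efficiency = 0.10
    for peak in (50.0, 40.0, 32.0):
        real = effective_tops(peak, efficiency)
        verdict = "enough" if real >= needed else "not enough"
        print(f"Peak {peak:.0f} TOPS -> about {real:.1f} real TOPS ({verdict})")

    print(f"Peak needed at this efficiency: {min_peak_tops(needed, efficiency):.0f} TOPS")
```

Under that assumed efficiency, none of the three cards comes close to the 30 TOPS the project needs, which is why the advertised peak alone is not a safe basis for choosing hardware.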