TOPS: The Truth Behind a Deep Learning Lie
By Ludovic Larzul, Mipsology
EETimes (June 25, 2021)
AI companies generally home in on one criterion: more tera operations per second (TOPS). Unfortunately, when silicon manufacturers promote their TOPS metrics, they are not really providing accurate guidance. In most cases, the numbers being hyped aren’t real TOPS, but peak TOPS. In other words, the TOPS number you think you’re getting in a card is actually the best-case scenario of how the chip would perform in a more than perfect world.
I will discuss the problems the industry has created by mislabeling performance metrics and explain how users can independently evaluate real-world TOPS.
Faux TOPS vs real TOPS
AI application developers generally start performing due diligence by gauging whether a chip manufacturer’s published TOPS performance data is adequate for powering their project.
Say you’re trying to remaster images in full HD on the U-Net neural network at 10 frames per second (fps). Since U-Net requires 3 tera operations per image, simple math says you’ll need 30 TOPS (3 tera operations per image times 10 images per second) to complete your project at the desired frame rate. So, when shopping for a chip, you would assume that cards claiming to run 50, 40, or even 32 TOPS would be safe for the project. In a perfect world, yes, but you’ll soon find out that the card rarely hits the advertised number. And we’re not talking about drops of just a couple of TOPS; compute efficiency can be as low as 10 percent, meaning the card delivers only a tenth of its advertised peak.
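The arithmetic above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names are hypothetical, "efficiency" is taken to mean the fraction of peak TOPS a card actually sustains on the workload, and the only figures used are the ones quoted in this article (3 tera operations per image, 10 fps, peak ratings of 50, 40, and 32 TOPS, and a 10 percent worst case).

```python
# Rough sketch of the peak-TOPS vs. real-TOPS arithmetic.
# All names here are illustrative; the numbers come from the example in the text.

def required_tops(tera_ops_per_frame: float, fps: float) -> float:
    """Sustained tera operations per second needed to hit a target frame rate."""
    return tera_ops_per_frame * fps


def effective_tops(peak_tops: float, efficiency: float) -> float:
    """TOPS actually delivered, assuming efficiency is the fraction of peak sustained."""
    return peak_tops * efficiency


def achievable_fps(peak_tops: float, efficiency: float, tera_ops_per_frame: float) -> float:
    """Frame rate a card can sustain at a given compute efficiency."""
    return effective_tops(peak_tops, efficiency) / tera_ops_per_frame


def min_efficiency(peak_tops: float, tera_ops_per_frame: float, fps: float) -> float:
    """Smallest efficiency at which a card still meets the target frame rate."""
    return required_tops(tera_ops_per_frame, fps) / peak_tops


if __name__ == "__main__":
    TERA_OPS_PER_FRAME = 3.0   # U-Net, full HD, per the article
    TARGET_FPS = 10.0

    print(f"Required: {required_tops(TERA_OPS_PER_FRAME, TARGET_FPS):.1f} TOPS")
    for peak in (50.0, 40.0, 32.0):
        need = min_efficiency(peak, TERA_OPS_PER_FRAME, TARGET_FPS)
        worst = achievable_fps(peak, 0.10, TERA_OPS_PER_FRAME)
        print(f"{peak:.0f} peak TOPS: needs {need:.0%} efficiency, "
              f"reaches {worst:.2f} fps at 10% efficiency")
```

Read this way, a peak rating is only useful next to the efficiency the card must sustain: the closer the advertised number sits to the 30 TOPS requirement, the less room there is for any real-world loss.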