From ChatGPT to Computer Vision Processing: How Deep-Learning Transformers Are Shaping Our World

Is that a dog in the middle of the street? Or an empty box? If you’re riding in a self-driving car, you’ll want the object detection and collision avoidance systems to correctly identify what might be on the road ahead and direct the vehicle accordingly. Inside modern vehicles, deep learning models play an integral role in the cars’ computer vision processing applications.

With cameras becoming pervasive in so many systems, cars aren’t the only products taking advantage of AI-driven computer vision technology. Mobile phones, security systems, and camera-based digital personal assistants are just a few examples of devices that already use neural networks to enhance image quality and accuracy.

While the computer vision application landscape has traditionally been dominated by convolutional neural networks (CNNs), a new algorithm type, initially developed for natural language processing tasks such as translation and question answering, is starting to make inroads: transformers. Transformers are deep-learning models that process all input data simultaneously. They likely won’t completely replace CNNs but will be used alongside them to enhance the accuracy of vision processing applications.
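To make "processes all input data simultaneously" concrete, here is a minimal sketch of the self-attention step at the heart of a transformer. It is an illustration under simplifying assumptions, not a production implementation: the function names and the toy `patches` input are hypothetical, and the learned query, key, and value projections of a real transformer are replaced with identity mappings.

```python
import math


def softmax(scores):
    """Turn raw scores into weights that are positive and sum to 1."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(tokens):
    """Scaled dot-product self-attention over a list of equal-length vectors.

    Assumption for illustration: queries, keys, and values are the input
    vectors themselves. A real transformer learns a separate weight matrix
    for each of the three.
    """
    dim = len(tokens[0])
    outputs = []
    for query in tokens:
        # Score this token against every token in the input, itself included.
        scores = [
            sum(q * k for q, k in zip(query, key)) / math.sqrt(dim)
            for key in tokens
        ]
        weights = softmax(scores)
        # The output is a weighted blend of all input vectors.
        blended = [
            sum(w * value[i] for w, value in zip(weights, tokens))
            for i in range(dim)
        ]
        outputs.append(blended)
    return outputs


# Three toy "image patch" embeddings (hypothetical values).
patches = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
mixed = self_attention(patches)
```

Each output vector depends on every input vector, which is what lets a transformer relate distant parts of an image or sentence in a single step. The Python loop here runs one token at a time for readability; practical implementations express the same computation as matrix multiplications so that all tokens are handled in parallel.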

Transformers have been in the news lately thanks to ChatGPT, a transformer-based chatbot launched in November 2022 by OpenAI. While ChatGPT is a server-based transformer with a reported 175 billion parameters, you’ll learn more in this blog post about why transformers are also ideal for embedded computer vision. Read on for insights into how transformers are changing the direction of deep-learning architectures and for techniques to tune the implementation of these models for the best results.

