DeepSeek’s aftermath: Lessons to learn as the dust settles

The Chinese AI company DeepSeek took the technology industry, and Wall Street, by storm with a language model reportedly 10x more efficient than those of the AI industry leaders. You have seen the news and might be getting sick of the endless articles piling onto it, but I would like to offer a different perspective. In its technical report, DeepSeek states that it used a cluster of 2,048 Nvidia H800 GPUs. If this is true, it is using far less computing power than other leading AI players. That news hurt Nvidia’s stock price badly, because the implication was that this reduces the need for compute. But is that really the case? Is it as obvious as it seems? And why haven’t some other tech companies reacted the way the markets did?

As the dust settles on all the media coverage around DeepSeek, there are surely plenty of lessons to be learned. One is that DeepSeek built and trained its models on top of existing open-source models, making use of investments already made by others. The compute already spent training those openly available models can never be retroactively restricted.

The fact that DeepSeek achieved performance similar to that of leading AI players with fewer hardware resources has sparked a debate about compute needs. If you can reach this level of performance with less hardware and open-source models, do you even need more computing power? Well, even with model distillation, its limited access to compute forced DeepSeek to heavily optimize its software (again, as described in its technical report).

According to DeepSeek, it used various techniques to optimize its software for the limited hardware it had access to, which helped it achieve these performance gains with less computing power. Of course, nothing comes for free: tailoring software to specific hardware typically reduces flexibility. As with everything else in life, you need to find the right balance.

