Just how valuable has the GPU been to computing? Paresh Kharya, senior director of product management and marketing at Nvidia, likes to say that the company's chips are already driving a "million-fold leap" forward for the industry. The company offered this big-picture analysis as part of the publicity around its GTC conference, which highlights how Nvidia GPUs can support artificial intelligence applications.
The factor of one million dwarfs the older Moore's law, which promised only that transistor counts on chips would double every two years or so. Many have noted that the doubling rate associated with Moore's prediction has slowed recently, for reasons such as the ballooning cost of building fabrication plants. The doubling has also become less visible to users, because the extra transistors aren't much use for basic tasks like word processing. There's only so much parallelism in a daily workflow.
Nvidia’s GPUs got 1,000 times more powerful
Kharya bases his claim of a factor of one million on the combination of hungry new applications and a chip architecture able to feed them. Many of the applications being explored today depend on artificial intelligence algorithms, and these algorithms are ideal workloads for the massive collections of transistors on an Nvidia GPU. The work of training and evaluating AI models is often inherently parallel.
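The "inherently parallel" point can be made concrete with a toy example. In data-parallel training, each worker computes gradients on its own slice of a batch, and the results are averaged. The one-parameter "model" below is a hypothetical illustration, not anything from Nvidia's stack:

```python
# Sketch of why training parallelizes: the gradient of an average loss
# splits over the data. Toy 1-D "model": loss(w) = mean((w*x - y)^2).
# Each worker can compute gradients on a disjoint slice independently.

def grad_one(w, x, y):
    return 2 * (w * x - y) * x  # d/dw of (w*x - y)^2 for one example

def grad_batch(w, batch):
    return sum(grad_one(w, x, y) for x, y in batch) / len(batch)

data = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0), (4.0, 8.0)]
w = 0.5

# Serial: one worker processes the whole batch.
g_serial = grad_batch(w, data)

# "Parallel": two workers take disjoint halves; gradients are averaged.
g_parallel = (grad_batch(w, data[:2]) + grad_batch(w, data[2:])) / 2

print(g_serial == g_parallel)  # True: the work splits cleanly
```

Because each per-example gradient is independent, thousands of GPUs can each take a slice and only the final averaging requires communication.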
The speedup is compounded by the shift from owning hardware to renting machines in datacenters. In the past, everyone was limited by the power of the computer on their desk. Now, anyone can spin up 1,000 machines or more in a datacenter to tackle a massive problem in a few seconds.
As an example, Kharya pointed out that in 2015, a single Nvidia Kepler GPU took nearly a month to train a popular computer vision model called ResNet-50.
“We train that same model in less than half a minute on Selene, the world’s most powerful industrial supercomputer, which packs thousands of Nvidia Ampere architecture GPUs,” Kharya explained in a blog post.
Some speed gain in this example came because of better, faster, and bigger GPUs. Kharya estimates that over the past 10 years, the raw computational power of Nvidia’s GPUs has grown by a factor of 1,000. The other factors come from enabling multiple GPUs in the datacenter to work together effectively. Kharya cited, as just a few examples, “our Megatron software, Magnum IO for multi-GPU and multi-node processing, and SHARP for in-network computing.”
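A rough back-of-the-envelope sketch shows how these factors multiply. The round numbers below are illustrative, not Nvidia's official breakdown:

```python
# Back-of-the-envelope: independent speedup factors multiply.
# Illustrative round figures, not Nvidia's exact data.

chip_speedup = 1_000        # ~10 years of raw GPU gains, per Kharya
scale_out_speedup = 1_000   # thousands of GPUs cooperating in a datacenter

combined = chip_speedup * scale_out_speedup
print(f"Combined speedup: {combined:,}x")  # Combined speedup: 1,000,000x

# Sanity check against the ResNet-50 anecdote:
# ~a month on one 2015 Kepler GPU vs ~half a minute on Selene.
month_seconds = 30 * 24 * 3600   # 2,592,000
selene_seconds = 30
print(f"Anecdote implies ~{month_seconds // selene_seconds:,}x")  # ~86,400x
```

Note that the wall-clock anecdote alone works out to roughly 86,000x; the remainder of the million-fold claim rests on software and algorithmic improvements layered on top.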
The rest came from the expansion of cloud options. Amazon Web Services has been an Nvidia partner for years, and it continues to make it easier for developers to rent GPUs for machine learning and other applications.
Can GPU growth trajectory match transistor counts?
Kharya also offered another data point, from biophysics, where scientists simulated the 305 million atoms that make up SARS-CoV-2 (the coronavirus). The newest versions of the simulation run 10 million times faster than the original, created 15 years ago. Improvements to the algorithms as well as faster chips contributed to this result.
Other companies are pursuing the same massive increases. Google, for instance, designs custom chips optimized for machine learning. These TPUs, or Tensor Processing Units, take their name from the tensor operations at the heart of frameworks like TensorFlow, and they have been available on Google's Cloud platform since 2019.
For all the buzz-worthy attention a factor of one million generates, there is a caveat: we won't see the same kind of sustained exponential growth that transistor counts delivered. While the raw power of GPUs may continue to grow as chip fabrication follows the path Moore set, the boost that comes from moving to a datacenter comes only once. After that, adding more machines to speed up a job will always be linear in cost.
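The contrast between exponential and linear growth can be made concrete with a small sketch. The figures are illustrative, assuming a textbook two-year doubling period:

```python
# Illustrative contrast: exponential transistor growth vs linear scale-out.

def moore_growth(years, doubling_period=2):
    """Relative capability after `years` of doubling every `doubling_period`."""
    return 2 ** (years / doubling_period)

def scale_out(machines):
    """Best-case linear speedup: N machines cost N times as much."""
    return machines  # speedup and cost both grow linearly

print(moore_growth(10))  # 32.0 -- ~32x over a decade at roughly fixed cost
print(scale_out(1000))   # 1000 -- 1000x speedup, but at 1000x the cost
```

Exponential doubling compounds year after year at roughly the same budget, whereas the datacenter boost is a one-time multiplier whose price tag grows as fast as the speedup itself.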