Volta’s Tensor Cores are to neural networks what traditional GPU cores are to graphics
Traditional GPU cores were built to perform classic graphics operations like shading very quickly. For neural networks, the basic building blocks are matrix multiplication and addition. Each of Nvidia's new Tensor Cores can perform all the operations needed to multiply two 4 x 4 matrices and add a third at the same time. So in addition to the benefit of the 5,120 cores on a V100 running in parallel, each Tensor Core is itself running many operations in parallel. The result, Nvidia says, is a 12x speedup in training over Pascal and a 6x speedup in inferencing.
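The operation each Tensor Core performs is a fused matrix multiply-accumulate, with FP16 inputs and FP32 accumulation. A minimal NumPy sketch of the math (the hardware does all of this in parallel; the sequential code here only illustrates the arithmetic):

```python
import numpy as np

# A Tensor Core computes D = A x B + C for 4 x 4 matrices,
# multiplying FP16 inputs and accumulating into FP32.
A = np.arange(16, dtype=np.float16).reshape(4, 4)  # FP16 input matrix
B = np.eye(4, dtype=np.float16)                    # FP16 input matrix
C = np.ones((4, 4), dtype=np.float32)              # FP32 accumulator

# Fused multiply-accumulate: the 64 multiplies and the additions
# behind this one line are what a single Tensor Core executes at once.
D = A.astype(np.float32) @ B.astype(np.float32) + C
```

With B as the identity matrix, D is simply A with 1 added to every element, which makes the sketch easy to check by hand.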
The Nvidia V100 is one of the most impressive chips ever made
In raw specs, the V100 is seriously impressive. With 21 billion transistors crammed into its 815 square millimeter die, Nvidia CEO Jensen Huang claims it is the largest and most complex chip that can be created with current semiconductor physics. At a cost of $3 billion in R&D, the final chip is fabricated using a 12nm process by TSMC, and uses the highest-speed RAM available from Samsung. After the keynote, Nvidia explained that it used 12nm and such a large die size because it deliberately wanted to create the most sophisticated chip possible.
Volta may help stem the rise of AI-specific processors
Google made some waves recently with a performance comparison of its custom TensorFlow chip against an older Nvidia GPU for inferencing. Volta is clearly part of Nvidia's answer, but it isn't stopping there. Huang also announced TensorRT, a compiler for TensorFlow and Caffe models designed to optimize their runtime performance on GPUs. The compiler not only improves efficiency, it also greatly reduces latency, a key benefit of Google's custom chip: Nvidia claims 30 percent lower latency than Skylake or the P100, and 10x the throughput on image recognition benchmarks. For pure inferencing loads, the new Tesla V100 PCIe can replace over a dozen traditional CPUs, at much lower power consumption. Nvidia also responded more directly to competition from customized inferencing chips by announcing that it is making its DLA (Deep Learning Accelerator) design and code open source.
The Tensor Cores are complemented by a large 20MB register file, 16GB of HBM2 RAM at 900GB/s, and 300GB/s of NVLink bandwidth for I/O. The result is a chip that implements an AI-friendly version of the Volta architecture. Nvidia confirmed later that not all Volta architecture processors will have such an extensive set of AI acceleration features; some may be focused more on pure graphics or general-purpose computing performance. Conversely, Nvidia defended its incorporation of AI features such as inferencing acceleration into its mainstream GPU, rather than creating a separate product line, by explaining that its Tensor Core is well suited to both training and inferencing operations.
The V100 is the heart of an upgraded DGX-1 and new HGX-1
Nvidia also announced an upgraded DGX-1 based on eight V100 chips, available for $149,000 in Q3, and a smaller DGX Station with four V100 chips for $69,000, also planned for Q3. OEM products based on the V100 are expected to start shipping by the end of the year. In partnership with Microsoft Azure, Nvidia has also developed a cloud-friendly box, the HGX-1, with eight V100s that can be flexibly configured for a variety of cloud computing needs. Microsoft plans to use Volta both for its own applications and to make it available to Azure customers.
Nvidia expects Volta to power cars and robots too
In addition to pure software applications, Nvidia expects Volta-based processors and boards to be the heart of physical devices that need learning or inferencing technology. That includes robots, especially ones simulated with Nvidia's newly announced Isaac robot simulation toolkit, as well as autonomous vehicles of various shapes and sizes. One particularly interesting project is an Airbus effort to design a self-piloted small plane that can take off vertically and carry two passengers up to 70 miles.