We are all well aware of NVIDIA and the AI "gold mine" that has recently taken everyone by storm. At the center of it all stands Team Green's H100 AI GPU, simply the most sought-after piece of AI hardware at the moment, with everyone trying to get their hands on one to power their AI needs.
NVIDIA's H100 GPU Is The Best Chip For AI At The Moment & Everyone Wants More of Them
This article isn't exactly news; rather, it walks readers through the current state of the AI industry and how companies are building their "future" around the H100 GPU.
Before we get to the crux of the article, a quick recap is necessary. At the start of 2022, everything was going fine with the usual developments. However, with November's arrival, a revolutionary application named "ChatGPT" emerged and laid the foundations of the AI hype. While we cannot call "ChatGPT" the founder of the AI boom, we can certainly say it acted as a catalyst, pulling companies like Microsoft and Google into an AI race to release generative AI applications of their own.
You might ask, where does NVIDIA come in here? The backbone of generative AI is lengthy LLM (Large Language Model) training runs, and this is where NVIDIA's AI GPUs come in clutch. We won't bury you in tech specs and factual bits since that makes for dull reading. However, if you are into specifics, we are dropping a table below highlighting every AI GPU release from NVIDIA, dating back to the Tesla models, followed by a small code sketch of what such a training step looks like on one of these cards.
NVIDIA HPC / AI GPUs
| NVIDIA Tesla Graphics Card | NVIDIA H100 (SXM5) | NVIDIA H100 (PCIe) | NVIDIA A100 (SXM4) | NVIDIA A100 (PCIe4) | Tesla V100S (PCIe) | Tesla V100 (SXM2) | Tesla P100 (SXM2) | Tesla P100 (PCI-Express) | Tesla M40 (PCI-Express) | Tesla K40 (PCI-Express) |
|---|---|---|---|---|---|---|---|---|---|---|
| GPU | GH100 (Hopper) | GH100 (Hopper) | GA100 (Ampere) | GA100 (Ampere) | GV100 (Volta) | GV100 (Volta) | GP100 (Pascal) | GP100 (Pascal) | GM200 (Maxwell) | GK110 (Kepler) |
| Process Node | 4nm | 4nm | 7nm | 7nm | 12nm | 12nm | 16nm | 16nm | 28nm | 28nm |
| Transistors | 80 Billion | 80 Billion | 54.2 Billion | 54.2 Billion | 21.1 Billion | 21.1 Billion | 15.3 Billion | 15.3 Billion | 8 Billion | 7.1 Billion |
| GPU Die Size | 814mm2 | 814mm2 | 826mm2 | 826mm2 | 815mm2 | 815mm2 | 610mm2 | 610mm2 | 601mm2 | 551mm2 |
| SMs | 132 | 114 | 108 | 108 | 80 | 80 | 56 | 56 | 24 | 15 |
| TPCs | 66 | 57 | 54 | 54 | 40 | 40 | 28 | 28 | 24 | 15 |
| FP32 CUDA Cores Per SM | 128 | 128 | 64 | 64 | 64 | 64 | 64 | 64 | 128 | 192 |
| FP64 CUDA Cores / SM | 128 | 128 | 32 | 32 | 32 | 32 | 32 | 32 | 4 | 64 |
| FP32 CUDA Cores | 16896 | 14592 | 6912 | 6912 | 5120 | 5120 | 3584 | 3584 | 3072 | 2880 |
| FP64 CUDA Cores | 16896 | 14592 | 3456 | 3456 | 2560 | 2560 | 1792 | 1792 | 96 | 960 |
| Tensor Cores | 528 | 456 | 432 | 432 | 640 | 640 | N/A | N/A | N/A | N/A |
| Texture Units | 528 | 456 | 432 | 432 | 320 | 320 | 224 | 224 | 192 | 240 |
| Boost Clock | TBD | TBD | 1410 MHz | 1410 MHz | 1601 MHz | 1530 MHz | 1480 MHz | 1329 MHz | 1114 MHz | 875 MHz |
| TOPs (DNN/AI) | 3958 TOPs | 3200 TOPs | 1248 TOPs (2496 TOPs with Sparsity) | 1248 TOPs (2496 TOPs with Sparsity) | 130 TOPs | 125 TOPs | N/A | N/A | N/A | N/A |
| FP16 Compute | 1979 TFLOPs | 1600 TFLOPs | 312 TFLOPs (624 TFLOPs with Sparsity) | 312 TFLOPs (624 TFLOPs with Sparsity) | 32.8 TFLOPs | 30.4 TFLOPs | 21.2 TFLOPs | 18.7 TFLOPs | N/A | N/A |
| FP32 Compute | 67 TFLOPs | 800 TFLOPs | 156 TFLOPs (19.5 TFLOPs standard) | 156 TFLOPs (19.5 TFLOPs standard) | 16.4 TFLOPs | 15.7 TFLOPs | 10.6 TFLOPs | 10.0 TFLOPs | 6.8 TFLOPs | 5.04 TFLOPs |
| FP64 Compute | 34 TFLOPs | 48 TFLOPs | 19.5 TFLOPs (9.7 TFLOPs standard) | 19.5 TFLOPs (9.7 TFLOPs standard) | 8.2 TFLOPs | 7.80 TFLOPs | 5.30 TFLOPs | 4.7 TFLOPs | 0.2 TFLOPs | 1.68 TFLOPs |
| Memory Interface | 5120-bit HBM3 | 5120-bit HBM2e | 6144-bit HBM2e | 6144-bit HBM2e | 4096-bit HBM2 | 4096-bit HBM2 | 4096-bit HBM2 | 4096-bit HBM2 | 384-bit GDDR5 | 384-bit GDDR5 |
| Memory Size | Up To 80 GB HBM3 @ 3.0 Gbps | Up To 80 GB HBM2e @ 2.0 Gbps | Up To 40 GB HBM2 @ 1.6 TB/s, Up To 80 GB HBM2 @ 1.6 TB/s | Up To 40 GB HBM2 @ 1.6 TB/s, Up To 80 GB HBM2 @ 2.0 TB/s | 16 GB HBM2 @ 1134 GB/s | 16 GB HBM2 @ 900 GB/s | 16 GB HBM2 @ 732 GB/s | 16 GB HBM2 @ 732 GB/s, 12 GB HBM2 @ 549 GB/s | 24 GB GDDR5 @ 288 GB/s | 12 GB GDDR5 @ 288 GB/s |
| L2 Cache Size | 51200 KB | 51200 KB | 40960 KB | 40960 KB | 6144 KB | 6144 KB | 4096 KB | 4096 KB | 3072 KB | 1536 KB |
| TDP | 700W | 350W | 400W | 250W | 250W | 300W | 300W | 250W | 250W | 235W |
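To see what those FP16 figures translate to in practice, here is a minimal, purely illustrative sketch (assuming PyTorch with a CUDA-capable GPU) of the kind of workload these accelerators spend their lives on: a single mixed-precision training step. The tiny model, random batch, and hyperparameters are placeholders for the example; a real LLM run shards a vastly larger model across thousands of such GPUs.

```python
# Minimal, illustrative sketch (assumed setup: PyTorch with a CUDA-capable GPU)
# of one mixed-precision training step. Model and data are placeholders.
import torch
import torch.nn as nn

device = "cuda" if torch.cuda.is_available() else "cpu"

model = nn.Sequential(
    nn.Linear(1024, 4096), nn.GELU(), nn.Linear(4096, 1024)
).to(device)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
scaler = torch.cuda.amp.GradScaler(enabled=(device == "cuda"))

x = torch.randn(32, 1024, device=device)        # dummy input batch
target = torch.randn(32, 1024, device=device)   # dummy targets

# FP16 autocast is what routes the big matrix multiplies onto the tensor cores
with torch.autocast(device_type=device, dtype=torch.float16, enabled=(device == "cuda")):
    loss = nn.functional.mse_loss(model(x), target)

scaler.scale(loss).backward()   # loss scaling guards against FP16 underflow
scaler.step(optimizer)
scaler.update()
optimizer.zero_grad()
```

Multiply steps like this by trillions of tokens and you get the training runs discussed below.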
The question still isn't answered, though: why the H100 specifically? Well, we are getting there. The H100 is NVIDIA's highest-end offering, providing immense compute capability. One might argue that the jump in performance comes with a higher price tag, but companies order in huge volumes, and "performance per watt" is the priority here. Compared to the A100, the Hopper-based H100 delivers around 3.5 times the 16-bit inference and 2.3 times the 16-bit training performance, making it the obvious choice.
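As a quick back-of-envelope illustration of the performance-per-watt point, the snippet below simply divides the peak FP16 throughput from the table above by each card's TDP. Treat it as a rough sketch: both numbers are NVIDIA's sparsity-assisted peaks, and sustained training efficiency also depends on utilization and interconnect.

```python
# Back-of-envelope "performance per watt" using the peak FP16 numbers and
# TDPs from the table above (sparsity-assisted peaks, not sustained rates).
gpus = {
    "H100 SXM5": {"fp16_tflops": 1979, "tdp_w": 700},
    "A100 SXM4": {"fp16_tflops": 624, "tdp_w": 400},
}

for name, spec in gpus.items():
    ppw = spec["fp16_tflops"] / spec["tdp_w"]
    print(f"{name}: {ppw:.2f} peak FP16 TFLOPs per watt")
# Prints roughly 2.83 for the H100 SXM5 versus 1.56 for the A100 SXM4.
```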
We hope the H100's superiority is evident by now. Moving on to the next segment: why is there a shortage? The answer involves several factors, the first being the sheer volume of GPUs needed to train a single model. To put that in perspective, OpenAI's GPT-4 is estimated to have required around 10,000 to 25,000 A100 GPUs (the H100 hadn't been released at the time).
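To get a feel for why a single run eats that many accelerators, here is a rough, purely illustrative estimator. The total compute budget (2e25 FLOPs) and 35% utilization are assumptions we picked for the example, not disclosed GPT-4 figures.

```python
# Rough, purely illustrative estimator of training time versus GPU count.
# The 2e25 FLOP budget and 35% utilization are assumptions for the example,
# not disclosed GPT-4 figures.
def training_days(total_flops: float, num_gpus: int,
                  peak_flops_per_gpu: float, utilization: float) -> float:
    sustained = num_gpus * peak_flops_per_gpu * utilization  # FLOP/s actually achieved
    return total_flops / sustained / 86_400                  # seconds -> days

# A100 peak FP16 (dense) is about 312 TFLOPS, i.e. 312e12 FLOP/s.
for n in (10_000, 25_000):
    print(f"{n:>6} GPUs -> ~{training_days(2e25, n, 312e12, 0.35):.0f} days")
```

With those made-up inputs, 10,000 GPUs would grind away for roughly 200 days and 25,000 for around 85, which is why nobody trains a frontier model on a handful of cards.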
Modern AI startups such as Inflection AI and CoreWeave have acquired huge numbers of H100s, collectively worth billions of dollars. This shows that even a single company needs massive volumes just to train a basic-to-decent AI model, which is why demand has been tremendous.
If you question NVIDIA's approach, you might say, "NVIDIA could simply increase production to cope with demand." That is much easier said than done. Unlike gaming GPUs, NVIDIA's AI GPUs require extensive manufacturing processes, with most of the work handled by the Taiwanese semiconductor behemoth TSMC. TSMC is the exclusive manufacturer of NVIDIA's AI GPUs, handling every stage from wafer production to advanced packaging.
H100 GPUs are built on TSMC's 4N process, a revamped version of the 5nm family. NVIDIA is now the biggest customer for this process, since Apple, which previously utilized it for the A15 Bionic chipset, has moved on with the A16 Bionic. Of all the relevant steps, the production of HBM memory is the most complicated, since it involves sophisticated equipment that only a few manufacturers currently operate.
HBM suppliers include SK Hynix, Micron, and Samsung, though TSMC has limited its supplier pool, and it is unclear which ones it relies on. Beyond HBM, TSMC also struggles to maintain CoWoS (Chip-on-Wafer-on-Substrate) capacity, the 2.5D packaging process that is a crucial stage in building the H100. TSMC cannot keep up with NVIDIA's demand, and order backlogs have reached new heights, with deliveries reportedly slipping to December.
"So when people use the word GPU shortage, they're talking about a shortage of, or a backlog of, some component on the board, not the GPU itself. It's just limited worldwide manufacturing of these things... but we forecast what people want and what the world can build."

- Charlie Boyle, NVIDIA's DGX VP and GM (via Computerbase.de)
We have left out many specifics, since going into too much detail would stray from our primary aim: giving the average user a picture of the situation. For now, we don't expect the shortage to ease; if anything, it is expected to get worse. However, we could see the landscape shift once AMD consolidates its position in the AI market.
DigiTimes reports that "TSMC seems to be particularly optimistic about demand for AMD's upcoming Instinct MI300 series, saying it will be half of Nvidia's total output of CoWoS-packaged chips." That could spread the workload across companies. Still, judging by Team Green's greedy policies in the past, a shift like this would require a seriously compelling offering from AMD.
Summing up, NVIDIA's H100 GPUs are carrying the AI hype to new heights, which is why this frenzy surrounds them. We hope this gives readers a general idea of the whole scenario. Credit to GPU Utils for the idea behind this article; make sure to check out their report too.
from Wccftech https://ift.tt/kN8H4R2