After much speculation, Nvidia today at its March 2022 GTC event announced the Hopper GPU architecture, a line of graphics cards that the company says will accelerate the types of algorithms commonly used in data science. Named for Grace Hopper, the pioneering U.S. computer scientist, the new architecture succeeds Nvidia’s Ampere architecture, which launched roughly two years ago.
The first card in the Hopper lineup is the H100, containing 80 billion transistors and a component called the Transformer Engine that’s designed to speed up specific categories of AI models. Another architectural highlight is Nvidia’s multi-instance GPU (MIG) technology, which allows a single H100 to be partitioned into seven smaller, isolated instances that can handle different types of jobs.
“Data centers are becoming AI factories — processing and refining mountains of data to produce intelligence,” Nvidia founder and CEO Jensen Huang said in a press release. “Nvidia H100 is the engine of the world’s AI infrastructure that enterprises use to accelerate their AI-driven businesses.”
Compute powerhouse
The H100 is the first Nvidia GPU to feature dynamic programming instructions (DPX), “instructions” in this context referring to the low-level operations a processor executes. Developed in the 1950s, dynamic programming is an approach to solving problems that combines two key techniques: recursion and memoization.
Recursion in dynamic programming involves breaking a problem down into sub-problems, ideally saving time and computational effort. In memoization, the answers to these sub-problems are stored so that they don’t need to be recomputed when they come up again later in the main problem.
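To make those two techniques concrete, here’s a minimal Python sketch (illustrative only, not the kind of GPU code DPX instructions would accelerate): a recursive edit-distance function, the sub-problem structure behind sequence alignment, with memoization supplied by functools.lru_cache.

```python
from functools import lru_cache

def edit_distance(a: str, b: str) -> int:
    """Minimum insertions/deletions/substitutions turning a into b."""
    @lru_cache(maxsize=None)              # memoization: cache sub-answers
    def d(i: int, j: int) -> int:         # recursion over string prefixes
        if i == 0:
            return j                      # insert the rest of b
        if j == 0:
            return i                      # delete the rest of a
        sub = 0 if a[i - 1] == b[j - 1] else 1
        return min(d(i - 1, j) + 1,       # delete a character
                   d(i, j - 1) + 1,       # insert a character
                   d(i - 1, j - 1) + sub) # substitute (or match)
    return d(len(a), len(b))

print(edit_distance("GATTACA", "GCATGCU"))  # 4
```

Without the cache, the recursion re-solves the same prefixes exponentially many times; with it, each sub-problem is solved exactly once.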
Dynamic programming is used to find optimal routes for moving machines (e.g., robots), streamline operations on sets of databases, align unique DNA sequences, and more. These algorithms typically run on CPUs or on reconfigurable chips called field-programmable gate arrays (FPGAs). But Nvidia claims that the DPX instructions on the H100 can accelerate dynamic programming by up to seven times compared with Ampere-based GPUs.
Transformer Engine
Beyond DPX, Nvidia is spotlighting the H100’s Transformer Engine, which combines data formats and algorithms to speed up training and inference for Transformer models. Dating back to 2017, the Transformer has become the architecture of choice for natural language models (i.e., AI models that process text), thanks in part to its aptitude for summarizing documents and translating between languages.
Transformers have been widely deployed in the real world. OpenAI’s language-generating GPT-3 and DeepMind’s protein structure-predicting AlphaFold are built atop the Transformer, and research has shown that Transformers can be trained to play games like chess and even generate images.
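For those unfamiliar with the architecture, the Transformer’s core operation is scaled dot-product attention, which mixes information across the tokens in a sequence. Here’s a minimal NumPy sketch (illustrative shapes and names, not Nvidia’s implementation):

```python
import numpy as np

def attention(Q, K, V):
    """Scaled dot-product attention: softmax(Q K^T / sqrt(d)) V."""
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)                   # pairwise token affinities
    scores -= scores.max(axis=-1, keepdims=True)    # for numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)  # row-wise softmax
    return weights @ V                              # weighted mix of values

rng = np.random.default_rng(0)
Q, K, V = (rng.standard_normal((4, 8)) for _ in range(3))  # 4 tokens, dim 8
print(attention(Q, K, V).shape)                     # (4, 8)
```

These attention matrix multiplications dominate Transformer workloads, which is why a dedicated engine for them pays off.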
The H100’s Transformer Engine leverages what’s called 16-bit floating-point precision and a newly added 8-bit floating-point data format. AI training relies on floating-point numbers, which have fractional components (e.g., 3.14). Most AI floating-point math is done using 16-bit half precision (FP16), 32-bit single precision (FP32), and 64-bit double precision (FP64). Cleverly, Transformer Engine uses Nvidia’s fourth-generation tensor cores to apply mixed FP8 and FP16 formats, automatically choosing between FP8 and FP16 calculations based on “custom, [hand]-tuned” heuristics, according to Nvidia.
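The accuracy cost of narrower formats is easy to see. The sketch below (pure Python; quantize is a hypothetical helper that ignores subnormals and special values, so it only approximates real FP16/FP8 encodings) rounds the same value at FP16-like and FP8-E4M3-like precision:

```python
import math

def quantize(x: float, mantissa_bits: int, max_exp: int) -> float:
    """Round x to a reduced-precision float: keep `mantissa_bits` of
    fraction and saturate at the format's largest exponent."""
    if x == 0.0:
        return 0.0
    m, e = math.frexp(x)              # x = m * 2**e with 0.5 <= |m| < 1
    scale = 2 ** mantissa_bits
    m = round(m * scale) / scale      # drop low-order mantissa bits
    e = min(e, max_exp)               # saturate on overflow
    return math.ldexp(m, e)

x = 3.14159
print(quantize(x, 10, 16))  # FP16-like, 10 mantissa bits -> 3.140625
print(quantize(x, 3, 9))    # FP8-E4M3-like, 3 mantissa bits -> 3.0
```

The 8-bit format keeps only a few significant bits, so the payoff in speed and memory comes with a real loss of resolution per value.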
The challenge in training AI models is to maintain accuracy while capitalizing on the performance offered by smaller, faster formats like FP8. Typically, lower precisions like FP8 translate to less accurate models. But Nvidia maintains that the H100 can “intelligently” handle scaling for each model and offer up to triple the floating-point operations per second of the prior generation’s TF32, FP64, FP16, and INT8 precisions.
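Nvidia hasn’t published the Transformer Engine’s scaling heuristics, but the general idea is familiar from today’s FP16 mixed-precision training, where a loss-scaling factor keeps small gradients from underflowing in the narrow format. A minimal PyTorch sketch of that existing technique (FP16 rather than FP8; requires a CUDA GPU, and the one-layer model is just a placeholder):

```python
import torch

model = torch.nn.Linear(512, 512).cuda()
opt = torch.optim.SGD(model.parameters(), lr=1e-3)
scaler = torch.cuda.amp.GradScaler()   # scales the loss so tiny FP16
                                       # gradients don't underflow to zero

for _ in range(10):
    opt.zero_grad()
    x = torch.randn(64, 512, device="cuda")
    with torch.cuda.amp.autocast():    # matmuls in FP16, reductions in FP32
        loss = model(x).square().mean()
    scaler.scale(loss).backward()      # backprop through the scaled loss
    scaler.step(opt)                   # unscales gradients, then steps
    scaler.update()                    # adapts the scale factor over time
```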
Next-generation servers
The H100 — which is among the first GPUs to support the PCIe Gen5 format — features nearly 5 terabytes per second of external connectivity and 3TB per second of internal memory bandwidth. A new fourth-generation version of Nvidia’s NVLink technology, in tandem with the company’s NVLink Switch and HDR Quantum InfiniBand, enables customers to connect up to 256 H100 GPUs together at nine times the bandwidth of the previous generation, Nvidia says.
The H100 also features confidential computing capabilities intended to protect AI models and customer data while they’re being processed. Confidential computing isolates data in an encrypted enclave during processing. The contents of the enclave — including the data being processed — are accessible only to authorized programming code and are invisible to anyone else.
The H100, bound for datacenters, will be available first in Nvidia’s fourth-generation DGX system, the DGX H100. The DGX H100 packs two Nvidia BlueField-3 DPUs, eight ConnectX-7 Quantum-2 InfiniBand networking adapters, and eight H100 GPUs, delivering 400 gigabits per second of throughput and 32 petaflops of AI performance at FP8 precision. Every GPU is connected by fourth-generation NVLink for 900GB per second of connectivity, and an external NVLink Switch can network up to 32 DGX H100 nodes into one of Nvidia’s DGX SuperPod supercomputers.
“AI has fundamentally changed what software can do and how it is produced. Companies revolutionizing their industries with AI realize the importance of their AI infrastructure,” Huang continued. “Our new DGX H100 systems will power enterprise AI factories to refine data into our most valuable resource — intelligence.”
For experimentation purposes, Nvidia intends to build an ultra-powerful DGX SuperPod dubbed Eos, which will feature 576 DGX H100 systems with 4,608 H100 GPUs. (A single DGX SuperPod of 32 DGX H100 systems delivers around an exaflop of FP8 AI performance.) Eos will provide 18.4 exaflops of AI computing performance — four times faster AI processing than Japan’s Fugaku supercomputer, currently the world’s speediest — and 275 petaflops of performance for traditional scientific computing, the company says.
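Those headline figures are consistent with one another; a quick back-of-the-envelope check using only the numbers quoted in this article:

```python
# Sanity-checking the Eos figures from the article's own numbers.
pf_per_dgx_h100 = 32       # FP8 AI petaflops per DGX H100 (stated above)
nodes_per_superpod = 32    # DGX H100 nodes per NVLink-switched SuperPod

print(pf_per_dgx_h100 * nodes_per_superpod)  # 1024 PF ~ 1 exaflop/SuperPod

eos_nodes, gpus_per_node = 576, 8
print(eos_nodes * gpus_per_node)             # 4608 H100 GPUs
print(eos_nodes * pf_per_dgx_h100 / 1000)    # 18.432 ~ 18.4 exaflops FP8
```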
The H100 will be available in Q3 2022. DGX H100 systems, DGX Pods, and DGX SuperPods will also be available from Nvidia’s global partners starting in Q3.