NVIDIA HGX-2


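Nvidia has introduced HGX-2, which it describes as the first unified architecture built for both artificial intelligence and high-performance computing. The platform supports multi-precision computing to serve different workloads. It does not come cheap, however: DGX-2, the first AI supercomputer built on the HGX-2 architecture, goes on sale in the third quarter at a price of nearly US$400,000.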

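HGX-2's predecessor, HGX-1, was adopted by Microsoft's hyperscale cloud hardware design Project Olympus, by Facebook's AI hardware architecture Big Basin, and by the hardware behind AWS's public cloud services. Nvidia also used it in its own AI supercomputer, DGX-1.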

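Speaking at the GPU Technology Conference, Nvidia founder Jensen Huang said that CPU scaling falls short when demand for computing surges. HGX-2 uses Tensor Core GPUs to give enterprises substantial computing power. According to the company, it trains 15,500 images per second on the ResNet-50 training benchmark, and a single HGX-2 can replace up to 300 CPU-only servers.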




DRIVING NEXT-GENERATION AI TO FASTER PERFORMANCE

AI models are exploding in complexity and require large memory, multiple GPUs, and an extremely fast connection between the GPUs to work. With NVSwitch connecting all GPUs and unified memory, HGX-2 provides the power to handle these new models for faster training of advanced AI. A single HGX-2 replaces 300 CPU-powered servers, saving significant cost, space, and energy in the data center.

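One of HGX-2's main features is multi-precision computing, which gives it flexibility across use cases. Scientific computing and simulation demand numerical accuracy, so HGX-2 offers double-precision (FP64) and single-precision (FP32) floating-point arithmetic. For AI model training and inference, it can also use half-precision floating point (FP16) or Int8.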
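To make the precision trade-off concrete, here is a minimal CPU-side sketch using only Python's standard `struct` module. It stores the same value in FP64, FP32, and FP16 and reads it back, then applies a simple symmetric Int8 quantization. It does not run on a GPU or use any Nvidia API; the helper names and the Int8 scale are assumptions chosen for illustration.

```python
import struct


def round_trip(value: float, fmt: str) -> float:
    """Store a value in the given IEEE 754 format and read it back."""
    return struct.unpack(fmt, struct.pack(fmt, value))[0]


def quantize_int8(value: float, scale: float) -> int:
    """Symmetric Int8 quantization: divide by scale, round, clamp to [-127, 127]."""
    return max(-127, min(127, round(value / scale)))


x = 0.1
fp64 = round_trip(x, "<d")  # double precision (Python's native float)
fp32 = round_trip(x, "<f")  # single precision
fp16 = round_trip(x, "<e")  # half precision

scale = 1.0 / 127           # assumed scale for values in [-1, 1]
q = quantize_int8(x, scale)
int8_value = q * scale      # value recovered after dequantization

for name, v in [("FP64", fp64), ("FP32", fp32), ("FP16", fp16), ("Int8", int8_value)]:
    print(f"{name}: {v!r}  abs error {abs(v - x):.3e}")
```

The error grows as the format narrows, which is why simulation workloads tend to stay at FP64 or FP32 while many AI workloads tolerate FP16 or Int8.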

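In addition, Nvidia's NVSwitch interconnect fabric lets the 16 V100 GPUs in HGX-2 operate as if they were a single GPU, delivering 2 petaFLOPS of tensor performance, or 2 quadrillion (2 × 10^15) floating-point operations per second.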
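The unit arithmetic behind the 2 petaFLOPS figure can be checked in a few lines. This is a sketch that assumes decimal SI prefixes and an even split across the 16 GPUs; the per-GPU number is derived from the system total, not quoted from a datasheet.

```python
PETA = 10**15
TERA = 10**12

system_tensor_flops = 2 * PETA  # 2 petaFLOPS for the whole HGX-2
gpu_count = 16                  # V100 GPUs linked by NVSwitch

# Assumption: the system total divides evenly across the GPUs.
per_gpu_tflops = system_tensor_flops / gpu_count / TERA

print(f"System total: {system_tensor_flops:,} FLOPS")
print(f"Per GPU:      {per_gpu_tflops} teraFLOPS")
```

Expressed in trillions, the system total is 2,000 trillion operations per second, that is, 2 quadrillion.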

THE HIGHEST-PERFORMING HPC SUPERNODE

HPC applications require strong server nodes with the computing power to perform a massive number of calculations per second. Increasing the compute density of each node dramatically reduces the number of servers required, resulting in huge savings in cost, power, and space consumed in the data center. For HPC simulations, high-dimension matrix multiplication requires a processor to fetch data from many neighbors to facilitate computation, making GPUs connected by NVSwitch ideal. A single HGX-2 server replaces 60 CPU-only servers.
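As a rough illustration of what these replacement ratios imply, the sketch below estimates how many HGX-2 servers would stand in for a given CPU-only fleet. The ratios (300 for AI training, 60 for HPC) are the vendor's claims quoted in the text; the function name and the assumption that the ratios scale linearly are hypothetical.

```python
import math

# Vendor-quoted replacement ratios from the text (CPU-only servers per HGX-2).
CPU_SERVERS_PER_HGX2 = {"ai_training": 300, "hpc": 60}


def hgx2_servers_needed(cpu_servers: int, workload: str) -> int:
    """Estimate HGX-2 servers needed, assuming the quoted ratio scales linearly."""
    return math.ceil(cpu_servers / CPU_SERVERS_PER_HGX2[workload])


print(hgx2_servers_needed(1000, "ai_training"))
print(hgx2_servers_needed(1000, "hpc"))
```

Real consolidation depends on the workload, so treat the result as an upper-level planning figure, not a sizing guarantee.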



SPECIFICATIONS

|  | HGX-1 | HGX-2 |
| --- | --- | --- |
| Performance | 1 petaFLOPS tensor operations; 125 teraFLOPS single-precision; 62 teraFLOPS double-precision | 2 petaFLOPS tensor operations; 250 teraFLOPS single-precision; 125 teraFLOPS double-precision |
| GPUs | 8x NVIDIA Tesla V100 | 16x NVIDIA Tesla V100 |
| GPU Memory | 256GB total | 512GB total |
| NVIDIA CUDA® Cores | 40,960 | 81,920 |
| NVIDIA Tensor Cores | 5,120 | 10,240 |
| Communication Channel | Hybrid cube mesh powered by NVLink, 300GB/s bisection bandwidth | NVSwitch powered by NVLink, 2.4TB/s bisection bandwidth |
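The totals in the specification table can be broken down per GPU and compared across generations. The sketch below copies the table's figures into a dictionary (the key names are illustrative) and assumes decimal units, so 2.4TB/s is treated as 2,400GB/s.

```python
# Figures copied from the specification table; key names are illustrative.
specs = {
    "HGX-1": {"gpus": 8, "memory_gb": 256, "cuda_cores": 40_960,
              "tensor_cores": 5_120, "bisection_gb_per_s": 300},
    "HGX-2": {"gpus": 16, "memory_gb": 512, "cuda_cores": 81_920,
              "tensor_cores": 10_240, "bisection_gb_per_s": 2_400},
}


def per_gpu(platform: str, key: str) -> float:
    """Divide a platform-wide total by its GPU count."""
    return specs[platform][key] / specs[platform]["gpus"]


for name in specs:
    print(name,
          per_gpu(name, "memory_gb"), "GB/GPU,",
          per_gpu(name, "cuda_cores"), "CUDA cores/GPU,",
          per_gpu(name, "tensor_cores"), "Tensor Cores/GPU")

# GPU count doubles between generations, but bisection bandwidth grows faster.
gpu_ratio = specs["HGX-2"]["gpus"] / specs["HGX-1"]["gpus"]
bandwidth_ratio = (specs["HGX-2"]["bisection_gb_per_s"]
                   / specs["HGX-1"]["bisection_gb_per_s"])
print(f"GPUs: {gpu_ratio}x, bisection bandwidth: {bandwidth_ratio}x")
```

The per-GPU figures are identical on both platforms, since both use the same Tesla V100 parts. What changes is the GPU count and, more sharply, the interconnect bandwidth.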
