Cloud-based AI supercomputers are gaining momentum on the list of the world's most powerful computers

Cloud-based AI supercomputers, including Microsoft Azure's and Cambridge University's new systems, are gaining momentum on the latest list of the world's most powerful computers.

In the latest TOP500 list, released at the ISC High Performance conference, 342 systems use NVIDIA technologies for acceleration, including 70 percent of the new systems and eight of the top 10.

The latest ranking of the world's most powerful systems shows that high-performance computing centers are increasingly turning to AI, and that users continue to embrace the combination of NVIDIA AI, accelerated computing, and networking technologies to run their scientific and commercial workloads.

For example, the number of systems on the list using InfiniBand has grown 20 percent since last year. With rising demand for low latency in AI, HPC, and simulation workloads, InfiniBand has become the network of choice.

In addition, two new systems on the list are what we call superclouds: a new class of shared supercomputer that can simultaneously serve AI, high-performance computing, and cloud workloads.

The arrival of the supercloud

Microsoft Azure took public cloud services to a new level with clusters that hold spots 26 through 29 on the TOP500 list. Each is part of a supercloud: a global AI supercomputer available today to any user on the planet.

On the HPL benchmark (also known as Linpack), each of the four Azure systems achieved 16.59 petaflops. Linpack, a traditional measure of high-performance computing on 64-bit floating-point math, is the basis for the TOP500 rankings.
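For a concrete sense of what that measurement involves, here is a minimal sketch in Python, with an arbitrary problem size and NumPy's solver standing in for HPL's tuned, distributed LU factorization: solve a dense 64-bit system Ax = b, time it, and convert the runtime into a flop rate using HPL's standard operation count of 2/3·n³ + 2·n².

```python
# Minimal sketch of what Linpack measures (illustrative, not the real HPL):
# time a dense float64 solve of Ax = b and convert it into a flop rate.
import time
import numpy as np

n = 4096                                   # toy problem size; HPL uses far larger
rng = np.random.default_rng(0)
A = rng.standard_normal((n, n))            # dense 64-bit matrix
b = rng.standard_normal(n)

start = time.perf_counter()
x = np.linalg.solve(A, b)                  # LU factorization plus triangular solves
elapsed = time.perf_counter() - start

flops = (2 / 3) * n**3 + 2 * n**2          # HPL's standard operation count
print(f"{flops / elapsed / 1e9:.1f} Gflops (double precision)")
print("relative residual:", np.linalg.norm(A @ x - b) / np.linalg.norm(b))
```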

Entering the era of industrial HPC

The Azure systems are examples of what NVIDIA CEO Jensen Huang calls the industrial HPC revolution: the fusion of AI and accelerated, high-performance computing that is advancing every field of science and industry.

Behind the scenes, eight NVIDIA A100 Tensor Core GPUs power each virtual instance of the Azure systems. Each GPU has its own HDR 200G InfiniBand link, enabling fast connections to thousands of GPUs in the Azure cloud.
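In aggregate, that works out to 8 × 200 Gb/s, or roughly 1.6 terabits per second of InfiniBand bandwidth per instance.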

British researchers adopt cloud-native technology

Cambridge University debuted the fastest academic system in the UK, a supercomputer that also ranked third on the Green500 list of the world's most energy-efficient systems. Called Wilkes-3, it is another supercloud and the world's first cloud-native supercomputer: it lets researchers share virtual resources with privacy and security, without giving up performance, thanks to NVIDIA BlueField DPUs that offload security, virtualization, and other data-processing tasks.

The system uses 320 A100 GPUs connected over an HDR 200G InfiniBand network to accelerate simulation, AI, and data analytics for academic researchers and business partners exploring the frontiers of science and medicine.

TOP500 newcomers adopt AI

Many of the new NVIDIA-powered systems on the list highlight the growing importance of AI in high-performance computing, for scientific and commercial users alike.

Perlmutter, at the National Energy Research Scientific Computing Center (NERSC), ranked fifth on the TOP500 with 64.59 Linpack petaflops, thanks in part to its 6,144 A100 GPUs.

The system delivered more than half an exaflop on the latest version of HPL-AI, an emerging benchmark for converged HPC-and-AI workloads. HPL-AI uses mixed-precision math, the foundation of deep learning and much scientific and commercial work, while still providing the full accuracy of double-precision math.
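To see the gap that mixed precision has to bridge, compare the formats directly; this short NumPy check (an illustration, not part of the benchmark) shows that 16-bit floats keep only about 3 decimal digits while 64-bit floats keep about 15:

```python
# The precision gap HPL-AI bridges: 16-bit math is fast but coarse,
# while the benchmark must still report 64-bit-accurate results.
import numpy as np

for dtype in (np.float16, np.float32, np.float64):
    info = np.finfo(dtype)
    print(f"{np.dtype(dtype).name}: ~{info.precision} decimal digits, eps = {info.eps}")
```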

"AI performance is becoming increasingly important because AI is a growth area for the U.S. Department of Energy, where feasibility has been proven and production deployments are being planned," said Wahid Bhimji, deputy head of NERSC's data and analytics services group.

HiPerGator AI ranked 22nd with 17.20 petaflops and placed second on the Green500 list, making it the world's most energy-efficient academic supercomputer. It missed the top Green500 spot by a mere 0.18 gigaflops/watt.

Like a dozen other systems on the latest list, it is built on the modular architecture of the NVIDIA DGX SuperPOD, a configuration that let the University of Florida rapidly deploy one of the world's most powerful academic AI supercomputers. The system also supports the university's ambition to be a leading AI university, with a stated goal of 30,000 AI-related graduates by 2030.

Luxembourg's MeluXina ranked 37th with 10.5 Linpack petaflops. It is the first system to come online in a network of European national supercomputers, and it will apply AI and data analytics to scientific research and commercial applications.

Cambridge-1 ranked 42nd on the TOP500 with 9.68 Linpack petaflops, making it the most powerful system in the UK. Operated by a commercial organization, it provides services to British healthcare researchers.

Berzelius ranked 83rd with 5.25 petaflops, making it the fastest system in Sweden. It connects 60 NVIDIA DGX systems over a 200G InfiniBand network and applies HPC, AI, and data analytics to academic and commercial research. It is one of 15 systems on the list built on NVIDIA DGX.

Ten systems boost HPL-AI adoption

In another sign of the growing importance of AI workloads, 10 systems on the list reported HPL-AI scores, five times as many as last June. Most of them used major code optimizations released in March, the first upgrade since University of Tennessee researchers introduced the benchmark in late 2018.

The new software streamlines communications and sends them across the fast links between GPUs, eliminating time spent waiting for the host CPU. It also casts communications in 16-bit precision instead of the slower 32-bit format Linpack uses by default.
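The payoff of that precision change is easy to illustrate: casting a message buffer from 32-bit to 16-bit halves the bytes on the wire, so each transfer takes roughly half as long at the same link bandwidth. A minimal NumPy sketch, with an arbitrary buffer size:

```python
# Halving communication volume by casting a buffer to 16-bit precision.
import numpy as np

panel = np.random.rand(1024, 1024).astype(np.float32)   # stand-in for a message buffer
print(len(panel.tobytes()), "bytes as float32")                     # 4194304
print(len(panel.astype(np.float16).tobytes()), "bytes as float16")  # 2097152, half the traffic
```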

"We cut inter-chip communication time in half and enabled other work to run in parallel, so the new code delivers on average about a 2.7x improvement over the original," said Azzam Haidar Ahmad, who helped define the benchmark and is now a senior engineer at NVIDIA.

Although the benchmark centers on mixed-precision math, it still delivers the same 64-bit accuracy as Linpack. That's because HPL-AI applies a loop of refinement steps to part of the calculation, quickly recovering full precision.
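A toy analogue of that loop, sketched in NumPy under stated simplifications: float32 stands in for the FP16 used on GPUs, the factorization is redone each pass rather than reused, and a plain correction step stands in for the more sophisticated refinement real implementations use. The pattern is the same: solve cheaply in low precision, then repair the answer with full-precision residuals.

```python
# Toy analogue of HPL-AI's refinement loop: solve in low precision,
# then correct with double-precision residuals until the answer
# reaches (near) double-precision accuracy.
import numpy as np

rng = np.random.default_rng(1)
n = 512
A = rng.standard_normal((n, n)) + n * np.eye(n)    # well-conditioned test matrix
b = rng.standard_normal(n)

# Initial solve in low precision (float32 here; real code would use FP16
# and reuse the low-precision LU factors instead of re-solving each time).
x = np.linalg.solve(A.astype(np.float32), b.astype(np.float32)).astype(np.float64)

for _ in range(10):
    r = b - A @ x                                  # residual in full float64
    if np.linalg.norm(r) / np.linalg.norm(b) < 1e-12:
        break                                      # double-precision accuracy reached
    d = np.linalg.solve(A.astype(np.float32), r.astype(np.float32))
    x += d.astype(np.float64)                      # cheap low-precision correction

print("refined relative residual:", np.linalg.norm(A @ x - b) / np.linalg.norm(b))
```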

Summit's HPL-AI score exceeds one exaflop

Thanks to these optimizations, systems that reported results last year with the earlier version of the code are now posting much higher scores.

For example, the Summit supercomputer at Oak Ridge National Laboratory (ORNL), the first system to run the HPL-AI benchmark, reported 445 petaflops in 2019 using the first version of the code. Running the latest version of HPL-AI this year, Summit scored 1.15 exaflops.
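The jump from 445 petaflops to 1,150 petaflops works out to roughly a 2.6x gain, consistent with the average improvement of about 2.7x that Haidar Ahmad cited.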

Other supercomputers running the benchmark include Japan's Fugaku, the world's fastest system; NVIDIA's Selene, the world's fastest commercial system; and JUWELS, Germany's most powerful supercomputer.

Thomas Lippert, director of the Jülich Supercomputing Centre, said: "We use the HPL-AI benchmark because it measures the mixed-precision work in our growing AI and scientific workloads while still reflecting accurate 64-bit floating-point results."