The Department of Energy’s National Nuclear Security Administration (NNSA) and Lawrence Livermore National Laboratory (LLNL) recently unveiled Sierra, one of the world’s fastest supercomputers. It will serve NNSA’s three nuclear security laboratories—LLNL, Sandia, and Los Alamos National Laboratories—providing high-fidelity simulations in support of NNSA’s core mission of ensuring the safety, security, and effectiveness of the nation’s nuclear stockpile.
Sierra’s arrival follows years of procurement, design, code development, and installation. It required the efforts of hundreds of computer scientists, developers, and operations personnel working in close partnership with IBM, NVIDIA, and Mellanox.
Sierra, ranked as the third-fastest supercomputer in the world on the latest TOP500 list, is NNSA’s first large-scale production heterogeneous computer, meaning each node incorporates both IBM central processing units (CPUs) and NVIDIA graphics processing units (GPUs). It is specifically designed for modeling and simulations essential for NNSA’s Stockpile Stewardship Program, ongoing life extension programs, weapons science, and nuclear deterrence. It is expected to go into use for classified production in early 2019.
Sierra boasts a peak performance of 125 petaFLOPS—125 quadrillion floating-point operations per second. Early indications using existing codes and benchmark tests are promising, demonstrating as predicted that Sierra can perform most required calculations far more efficiently in terms of cost and power consumption than computers consisting of CPUs alone. Depending on the application, Sierra is expected to be six to 10 times more capable than LLNL’s 20-petaFLOP Sequoia, currently the world’s eighth-fastest supercomputer.
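As a rough check on the figures above, the short sketch below converts Sierra's quoted peak into raw operations per second and compares it with Sequoia's quoted 20 petaFLOPS. It uses only the numbers stated in this article, and the variable names are illustrative. The "six to 10 times" estimate refers to application-dependent capability, so the ratio of peak figures computed here is only a point of comparison, not a reproduction of that estimate.

```python
# Figures quoted in the article, in petaFLOPS (1 petaFLOPS = 10**15 FLOP/s).
PETA = 10**15
sierra_peak_pflops = 125
sequoia_peak_pflops = 20

# 125 petaFLOPS expressed as floating-point operations per second.
sierra_flops = sierra_peak_pflops * PETA

# Ratio of the two quoted figures (not a measure of application speedup).
peak_ratio = sierra_peak_pflops / sequoia_peak_pflops

print(f"Sierra peak: {sierra_flops:.3e} FLOP/s")
print(f"Sierra/Sequoia peak ratio: {peak_ratio:.2f}x")
```

The ratio of the quoted figures comes to 6.25, near the low end of the six-to-10-times range given for real applications.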
To prepare for this architecture, LLNL partnered with IBM and NVIDIA to rapidly develop codes and prepare applications to make effective use of the CPU/GPU nodes. IBM and NVIDIA personnel worked closely with LLNL, both on-site and remotely, on code development and restructuring to achieve maximum performance. Meanwhile, LLNL personnel provided feedback on system design and the software stack to the vendors.
LLNL selected the IBM/NVIDIA system for its energy and cost efficiency, as well as its potential to effectively run NNSA applications. Sierra’s IBM POWER9 processors feature CPU-to-GPU connection via NVIDIA NVLink interconnect, enabling greater memory bandwidth between the CPUs and GPUs in each node so Sierra can move data throughout the system for maximum performance and efficiency. Backing Sierra is 154 petabytes of IBM Spectrum Scale, a software-defined parallel file system, deployed across 24 racks of Elastic Storage Servers (ESS). To meet the scaling demands of the heterogeneous systems, ESS delivers 1.54 terabytes per second in both read and write bandwidth and can manage 100 billion files per file system.
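To put the storage figures in perspective, the sketch below works out how long it would take to write the full 154 petabytes at the quoted 1.54 terabytes per second, and how much capacity that implies per rack. It is a back-of-the-envelope calculation under two assumptions not stated in the article: decimal units (1 petabyte = 1,000 terabytes) and bandwidth sustained at the quoted rate for the whole transfer.

```python
# Figures quoted in the article.
capacity_pb = 154            # IBM Spectrum Scale capacity, petabytes
racks = 24                   # racks of Elastic Storage Servers
bandwidth_tb_per_s = 1.54    # read or write bandwidth, terabytes per second

# Assumption: decimal units, 1 PB = 1000 TB.
TB_PER_PB = 1000

capacity_tb = capacity_pb * TB_PER_PB

# Assumption: the quoted bandwidth is sustained for the entire transfer.
seconds_to_fill = capacity_tb / bandwidth_tb_per_s
hours_to_fill = seconds_to_fill / 3600

# Average capacity per rack, assuming capacity is spread evenly.
pb_per_rack = capacity_pb / racks

print(f"Time to write full capacity: {seconds_to_fill:.0f} s ({hours_to_fill:.1f} h)")
print(f"Average capacity per rack: {pb_per_rack:.2f} PB")
```

Under those assumptions, writing the full file system once would take about 100,000 seconds, or roughly 28 hours, and each rack would hold a little over 6 petabytes on average.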
“The next frontier of supercomputing lies in artificial intelligence,” said John Kelly, senior vice president, Cognitive Solutions and IBM Research. “IBM's decades-long partnership with LLNL has allowed us to build Sierra from the ground up with the unique design and architecture needed for applying AI to massive data sets. The tremendous insights researchers are seeing will only accelerate high-performance computing for research and business.”
As the first NNSA production supercomputer backed by GPU-accelerated architecture, Sierra required a fundamental shift in how scientists at the three NNSA laboratories program their codes to take advantage of the GPUs. The system’s NVIDIA GPUs also present scientists with an opportunity to investigate the use of machine learning and deep learning to accelerate the time-to-solution of physics codes. Simulation accelerated by artificial intelligence technology is expected to be increasingly employed over the coming decade.
Alongside Sierra, which will serve critical national security applications, a companion unclassified system called Lassen has been installed in the Livermore Computing Center. This institutionally focused supercomputer will play a role in projects aimed at speeding cancer drug discovery, precision medicine, research on traumatic brain injury, seismology, climate, astrophysics, materials science, and other basic science benefiting society.
Sierra continues the long lineage of world-class LLNL supercomputers and represents the penultimate step on NNSA’s road to exascale computing, which is expected to start by 2023 with an LLNL system called “El Capitan.” Funded by the NNSA’s Advanced Simulation and Computing (ASC) program, El Capitan will be NNSA’s first exascale supercomputer, capable of more than a quintillion calculations per second, about 10 times greater performance than Sierra. Such computing power will be easily absorbed by NNSA for its mission, which requires the most advanced computing capabilities and deep partnerships with American industry.
“In just a few short years, we expect to see exascale systems deployed at Lawrence Livermore, Argonne, and Oak Ridge (national laboratories), ensuring our global superiority in this arena for years and decades to come,” Energy Secretary Rick Perry said. “Starting with Sierra, this new generation of supercomputers will be an absolute game-changer for the world.”