In record time, Vikram Gavini’s lab crossed a big milestone in viewing tiny things.
The three-person team at the University of Michigan crafted a program that uses complex math to peer deep into the world of the atom. It could advance many fields of science, as well as the design of everything from lighter cars to more effective drugs.
The code, available in the group’s open source repository, got a 20x speedup in just 18 months thanks to GPUs.
A Journey to the Summit
In mid-2018 the team was getting ready to release a version of the code running on CPUs when it got an invite to a GPU hackathon at Oak Ridge National Lab, the home of Summit, one of the world’s fastest supercomputers.
“We thought, let’s go see what we can achieve,” said Gavini, a professor of mechanical engineering and materials science.
“We quickly realized our code could exploit the massive parallelism in GPUs,” said Sambit Das, a post-doc from the lab who attended the five-day event.
Before it was over, Das and another lab member, Phani Motamarri, got 5x speedups by moving the code to CUDA and its libraries. They also heard the promise of much more to come.
From 5x to 20x Speedups in Six Months
Over the next few months, the lab continued to tune its program for analyzing 100,000 electrons in 10,000 magnesium atoms. By early 2019, it was ready to run on Summit.
Taking an iterative approach, the lab ran increasing portions of its code on more and more of Summit’s nodes. By April, it was using most of the system’s 27,000 GPUs, getting nearly 46 petaflops of performance, 20x that of prior work.
It was an unheard-of result for a program based on density functional theory (DFT), the complex math that accounts for quantum interactions among subatomic particles.
Distributed Computing for Difficult Calculations
DFT calculations are so complex and fundamental that they currently consume a quarter of the time on all public research computers. They are the subject of 12 of the 100 most-cited scientific papers and are used to analyze everything from astrophysics to DNA strands.
Initially, the lab reported its program used nearly 30 percent of Summit’s peak theoretical capability, an unusually high efficiency rate. By comparison, most other DFT codes don’t even report efficiency because they have difficulty scaling beyond a few processors.
“It was really exciting to get to that point because it was unprecedented,” said Gavini.
Recognition for a Math Milestone
In late 2019, the group was named a finalist for the Gordon Bell Prize. It was the lab’s first submission for the award, which is the equivalent of a Nobel in high performance computing.
“That provided a lot of visibility for our lab and our university, and I think this effort is just the beginning,” Gavini said.
Indeed, since the competition, the lab has pushed the code’s performance to 64 petaflops and 38 percent efficiency on Summit. And it’s already exploring the code’s use on other systems and applications.
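As a rough illustration of what those efficiency figures mean, the sketch below treats efficiency as sustained performance divided by theoretical peak. The function names are hypothetical. The implied peak is only back-of-the-envelope arithmetic from the rounded 64-petaflop and 38-percent figures, assuming both describe the same run; it is not a number the lab reported.

```python
def efficiency(sustained_pflops: float, peak_pflops: float) -> float:
    """Fraction of theoretical peak that a run actually sustains."""
    if peak_pflops <= 0:
        raise ValueError("peak_pflops must be positive")
    return sustained_pflops / peak_pflops


def implied_peak(sustained_pflops: float, eff: float) -> float:
    """Theoretical peak implied by a sustained rate and an efficiency in (0, 1]."""
    if not 0 < eff <= 1:
        raise ValueError("eff must be in (0, 1]")
    return sustained_pflops / eff


if __name__ == "__main__":
    # Assumption: rounded figures quoted in the article, taken at face value.
    peak = implied_peak(64.0, 0.38)
    print(f"implied peak of the nodes used: {peak:.1f} petaflops")
    print(f"round trip efficiency: {efficiency(64.0, peak):.2f}")
```

Read this way, a code that sustains 38 percent of peak is doing useful arithmetic on well over a third of the cycles the hardware could theoretically deliver.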
Seeking More Apps, Performance
The initial work analyzed magnesium, a metal much lighter than the steel and aluminum used in cars and planes today, promising significant fuel savings. Last year, the lab teamed up with another group exploring how electrons move in DNA, work that could help other researchers develop more effective drugs.
The next big step is running the code on Perlmutter, a supercomputer using the latest NVIDIA A100 Tensor Core GPUs. Das reports he’s already getting 4x speedups compared to the Summit GPUs thanks to the A100 GPUs’ support for TensorFloat-32 (TF32), a reduced-precision format used in mixed-precision math that delivers both fast results and high accuracy.
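To give a feel for what TensorFloat-32 gives up and what it keeps, here is a minimal pure-Python emulation. It is not the lab’s code or NVIDIA’s implementation. TF32 keeps the 8-bit exponent of a 32-bit float but only 10 mantissa bits. The helpers `to_tf32` and `dot_tf32` are hypothetical names, the truncation used here is a simplification (real hardware may round instead), and Python accumulates in 64-bit floats where Tensor Cores accumulate in 32-bit.

```python
import struct


def to_tf32(x: float) -> float:
    """Emulate TF32 storage of x: keep float32's sign and 8-bit exponent,
    but only the top 10 of its 23 mantissa bits (simplified: truncation)."""
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    bits &= 0xFFFFE000  # zero the 13 least-significant mantissa bits
    return struct.unpack("<f", struct.pack("<I", bits))[0]


def dot_tf32(a, b):
    """Dot product with TF32-truncated inputs and a wider accumulator.
    Assumption: Python's float64 sum stands in for the FP32 accumulator."""
    return sum(to_tf32(x) * to_tf32(y) for x, y in zip(a, b))


if __name__ == "__main__":
    a = [0.1 * i for i in range(1, 9)]
    b = [1.0 / i for i in range(1, 9)]
    exact = sum(x * y for x, y in zip(a, b))
    approx = dot_tf32(a, b)
    print(f"float64 result: {exact:.8f}")
    print(f"TF32 inputs:    {approx:.8f}")
    print(f"relative error: {abs(approx - exact) / abs(exact):.2e}")
```

Each input loses at most about one part in a thousand, while the range of representable values stays the same as for 32-bit floats. That is the trade that lets codes mix TF32 arithmetic with higher-precision steps.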
The lab’s program already offers 100x speedups compared to other DFT codes, but Gavini’s not stopping there. He’s already thinking about testing it on Fugaku, an Arm-based system that’s currently the world’s fastest supercomputer.
“It’s always exciting to see how far you can get, and there’s always a next milestone. We see this as the beginning of a journey,” he said.