CS Colloquium | October 28, 2020

Next Generation of HPC Batch Scheduling

Tapasya Patki
Lawrence Livermore National Laboratory

Stevenson Hall 1300
12:00 PM - 12:50 PM

Resource management and batch job scheduling are crucial parts of the software stack for operating large-scale supercomputers, enabling multiple users to share the available resources fairly and efficiently. In this talk, I will discuss some of the key challenges facing the next generation of HPC scheduling, including managing resources such as power, supporting diverse and complex high-throughput workflows, and making efficient use of heterogeneous components. I will discuss SLURM and Flux, two well-known resource management frameworks for HPC. Flux is a hierarchical, next-generation resource management framework under active development at LLNL. I will also present my ongoing research on power-aware and variation-aware scheduling with Flux.