Optimizing Dependency Solves in Spack
10-26, 13:20–13:45 (Europe/Berlin), Main stage

Dependency solving is at the core of most package managers, and it is often a major performance bottleneck. Keeping a package solver fast as more and more packages, versions, and features are added is a serious challenge. This talk is a deep dive into how we’ve met that challenge in Spack.

Spack’s solver has two phases: one that sets up a solve based on metadata from package descriptions, and one that runs the solver. We’ll look at performance issues that come from metadata management, including the cost of loading Python files and the difficulty of caching metadata from package descriptions written in a pure Python DSL. We’ll also look at the solver itself. Spack’s solver uses Answer Set Programming (ASP), a Prolog-like paradigm for combinatorial problems. We have developed a profiler to show which parts of our ASP program are exercised by different types of solves, and we’ll discuss a few useful optimizations that this has enabled.
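
For context, here is a hypothetical, stripped-down package description in Spack’s DSL (the package name and checksum are placeholders, not a real entry from Spack’s builtin repository):

```python
from spack.package import *

class Example(Package):
    """Hypothetical package. Class-level directives like these are pure
    metadata, but nothing stops a packager from running arbitrary Python
    here, which is what makes caching this metadata hard."""

    homepage = "https://example.com"
    url = "https://example.com/example-1.0.tar.gz"

    version("1.0", sha256="abc123...")  # placeholder checksum
    variant("shared", default=True, description="Build shared libraries")
    depends_on("zlib")

    def install(self, spec, prefix):
        # Build logic runs at install time; only the metadata above
        # matters for dependency solving.
        pass
```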


The first part of the talk covers the challenges that all solvers face in managing package metadata, and how we handle them in Spack. Specifically, we’ll trace how metadata makes its way from Spack’s package.py files (an embedded Python DSL) to a set of data dictionaries, and finally how those are translated to ASP. Loading data from hundreds of pure Python files can be very slow, and it’s hard to know which files can safely be cached when users can put arbitrary Python code in their package descriptions. We’ll describe a const-ness analyzer we’ve built to address this, which enables caching of most package files in Spack’s builtin repository and greatly speeds up the metadata management portion of the solve. We’ll also talk about caching solves keyed on solver input, which lets us skip running the solver entirely in some scenarios.
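
To make the idea concrete, here is a minimal sketch of what a conservative const-ness check might look like, built on Python’s ast module. This is illustrative only, not Spack’s actual analyzer; the directive names and the notion of “cacheable” are assumptions:

```python
import ast

# Directives whose literal arguments are pure metadata. Illustrative
# only -- not Spack's real list or implementation.
KNOWN_DIRECTIVES = {"version", "depends_on", "variant", "patch", "conflicts"}

def is_literal(node: ast.AST) -> bool:
    """True if the node is a compile-time constant (string, number,
    list/dict of constants, etc.)."""
    try:
        ast.literal_eval(node)
        return True
    except (ValueError, TypeError):
        return False

def is_cacheable(source: str) -> bool:
    """Conservative const-ness check: a package file's metadata can be
    cached only if every class-level statement is a literal assignment,
    a docstring, a method definition, or a known directive called with
    literal arguments. Anything else may compute metadata at load time,
    so we must fall back to executing the file."""
    for cls in ast.parse(source).body:
        if not isinstance(cls, ast.ClassDef):
            continue
        for stmt in cls.body:
            if isinstance(stmt, (ast.FunctionDef, ast.AsyncFunctionDef)):
                continue  # build methods run at install time, not load time
            if isinstance(stmt, ast.Assign) and is_literal(stmt.value):
                continue  # e.g. homepage = "https://..."
            if isinstance(stmt, ast.Expr):
                call = stmt.value
                if is_literal(call):
                    continue  # docstring
                if (isinstance(call, ast.Call)
                        and getattr(call.func, "id", None) in KNOWN_DIRECTIVES
                        and all(is_literal(a) for a in call.args)
                        and all(is_literal(k.value) for k in call.keywords)):
                    continue  # e.g. depends_on("zlib")
            return False  # dynamic statement: cannot cache safely
    return True
```

The real analysis is more involved, but the principle is the same: if you can prove that re-executing a file cannot yield different metadata, you can read its metadata from a cache instead.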

The second part of the talk covers optimizations to the logic solver itself. Most SAT and ASP tools do not give much insight into which decisions cost the most time during a solve, so we’ve developed a profiler for clingo that shows exactly which rules are most expensive. We’ll talk about how this has let us fine-tune our solver even as its logic has grown more and more complicated, which sorts of rules tend to cost the most, and how we work around them.
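
For a flavor of the paradigm, here is a toy dependency program run through clingo’s Python API; this is a minimal sketch and bears no resemblance to Spack’s real encoding:

```python
import clingo

# Toy ASP program: compute the transitive dependency closure of a root
# package. Spack's real encoding also handles versions, variants,
# compilers, and optimization criteria; this is just for flavor.
PROGRAM = """
depends(example, zlib).  depends(zlib, pkgconf).
root(example).
needed(P) :- root(P).
needed(D) :- needed(P), depends(P, D).
#show needed/1.
"""

ctl = clingo.Control(["0"])   # "0" = enumerate all answer sets
ctl.add("base", [], PROGRAM)
ctl.ground([("base", [])])    # instantiate the rules over the facts
# Prints something like: needed(example) needed(zlib) needed(pkgconf)
ctl.solve(on_model=lambda m: print(m))
```

Grounding and solving are separate steps in clingo, which is part of why rule-level profiling pays off: a rule can be cheap to ground but expensive to solve, or the reverse.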

Todd Gamblin is a Distinguished Member of Technical Staff in the Livermore Computing division at Lawrence Livermore National Laboratory. He created Spack, a popular open source HPC package management tool with a rapidly growing community of contributors. He leads the Packaging Technologies Project in the U.S. Exascale Computing Project, LLNL's DevRAMP project on developer productivity, and an LLNL Strategic Initiative on software integration. His research interests include dependency management, software engineering, parallel computing, performance measurement, and performance analysis.
