Probabilistic Package Builds: Guiding Spack's Concretizer with Predicted Build Outcomes
10-26, 15:40–16:05 (Europe/Berlin), Main stage

In recent years, software has grown in complexity, and many software packages now have a large number of dependencies.
Typical software packages may depend on tens to hundreds of other packages.
As this complexity continues to grow, it becomes increasingly difficult to find compatible versions across the dependency graph.
Many package managers rely on logic programming and SAT solvers to resolve version constraints, but as long as those constraints are annotated by hand, version conflicts will continue to cause errors.
Additionally, these constraints may not hold across different architectures, operating systems, and compilers.
In this talk we demonstrate how machine learning models that predict the probability that a dependency graph builds successfully can be integrated into the version selection mechanism of the Spack package manager.
We discuss how to integrate probabilistic build information into Spack's Answer Set Programming (ASP) solver via a probabilistic variant of ASP, Plingo.
Additionally, we present several means of extrapolating to new versions as they are added to the package manager.
Finally, we demonstrate and discuss the effectiveness of using probabilistic information in version selection.


In recent years, the ever-increasing complexity of software packages has posed significant challenges in managing dependencies.
As software projects depend on numerous packages, ensuring compatibility and resolving version constraints become critical tasks for package managers.
Traditional methods rely on greedy selection, logic programming, and SAT solvers, but the hand-annotated version constraints they depend on can lead to errors and conflicts.
Additionally, version constraints need to be constantly updated for new versions and may not hold across platforms.

In this talk, we explore how to address these challenges by integrating machine learning models that predict the probability of successful dependency graph builds.
We demonstrate this integration in the Spack package manager, drawing on its highly parameterized package definitions and extensive build capabilities.
The integration point is Spack's concretizer, which is responsible for selecting dependencies, versions, and build flags.
We discuss how probabilistic build information can be integrated into the Answer Set Programming (ASP) solver that powers Spack's concretizer.
We compare several methods that use the probabilistic ASP variant Plingo with methods that adapt the current Clingo solver.
We demonstrate the effectiveness of each of these approaches and discuss their technical merits as methodologies to integrate probabilistic build information into a package manager.
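
As a rough illustration of how build probabilities could enter an optimizing solver, the sketch below maps each probability to an integer cost of -log(p), so that minimizing the summed cost maximizes the joint build probability. This is not the method presented in the talk: the package data, the conflict, the scaling factor, and the assumption that packages build independently are all hypothetical, and a brute-force search stands in for the ASP solver.

```python
import math
from itertools import product

# Hypothetical predicted build-success probabilities per (package, version).
BUILD_PROB = {
    ("hdf5", "1.12"): 0.85,
    ("hdf5", "1.14"): 0.90,
    ("zlib", "1.2"): 0.70,
    ("zlib", "1.3"): 0.95,
}

# Hypothetical hard constraint: these two choices may not be combined.
CONFLICTS = {(("hdf5", "1.14"), ("zlib", "1.3"))}

SCALE = 1000  # assumed scaling so that costs can be expressed as integers


def cost(p):
    """Map a probability to a non-negative integer cost, -log(p) scaled."""
    return round(-math.log(p) * SCALE)


def allowed(selection):
    """Return True if no conflicting pair is fully contained in the selection."""
    chosen = set(selection.items())
    return not any(a in chosen and b in chosen for a, b in CONFLICTS)


def best_selection(candidates):
    """Brute-force the allowed selection with the lowest total cost.

    Assumes independent builds, so the joint probability is a product
    and the total cost is a sum of per-package costs.
    """
    packages = sorted(candidates)
    best, best_cost = None, None
    for versions in product(*(candidates[p] for p in packages)):
        selection = dict(zip(packages, versions))
        if not allowed(selection):
            continue
        total = sum(cost(BUILD_PROB[item]) for item in selection.items())
        if best_cost is None or total < best_cost:
            best, best_cost = selection, total
    return best, best_cost


if __name__ == "__main__":
    candidates = {"hdf5": ["1.12", "1.14"], "zlib": ["1.2", "1.3"]}
    print(best_selection(candidates))
```

In this toy data the newest version of each package cannot be combined, so the search trades a slightly less reliable hdf5 for a much more reliable zlib. The logarithm is used because it turns a product of probabilities into a sum, which is the form that weighted optimization objectives typically take.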

We further discuss how to handle new packages and versions that are added to the package manager.
Several heuristics and statistical metrics are compared for extrapolating build probabilities to newer versions of packages.
We also show how the probabilistic concretizer handles packages that are unknown to the model.
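
To make the extrapolation problem concrete, here is a minimal sketch of two simple heuristics for a version the model has never seen: reuse the probability of the nearest older version, or take a recency-weighted average over all older versions. Both fall back to a fixed prior when there is no history, which is also one way an unknown package could be treated. The function names, the decay factor, the prior of 0.5, and the purely numeric version format are assumptions made for illustration, not the heuristics evaluated in the talk.

```python
DEFAULT_PRIOR = 0.5  # hypothetical fallback for packages with no usable history


def version_key(version):
    """Turn a dotted numeric version such as '1.12' into a sortable tuple."""
    return tuple(int(part) for part in version.split("."))


def older_versions(history, new_version):
    """Known versions older than new_version, newest first."""
    target = version_key(new_version)
    older = [v for v in history if version_key(v) < target]
    return sorted(older, key=version_key, reverse=True)


def nearest_older(history, new_version, prior=DEFAULT_PRIOR):
    """Reuse the build probability of the closest older known version."""
    older = older_versions(history, new_version)
    return history[older[0]] if older else prior


def recency_weighted(history, new_version, decay=0.5, prior=DEFAULT_PRIOR):
    """Average older versions, down-weighting each step back by `decay`."""
    older = older_versions(history, new_version)
    if not older:
        return prior
    weights = [decay ** i for i in range(len(older))]
    weighted = sum(w * history[v] for w, v in zip(weights, older))
    return weighted / sum(weights)


if __name__ == "__main__":
    # Hypothetical per-version build probabilities for one package.
    history = {"1.0": 0.6, "1.1": 0.8, "1.2": 0.9}
    print(nearest_older(history, "1.3"))
    print(recency_weighted(history, "1.3"))
    print(nearest_older({}, "1.0"))  # unknown package falls back to the prior
```

The nearest-version heuristic reacts quickly to a recent fix or regression, while the weighted average smooths over a single noisy estimate.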

Daniel Nichols is a Computer Science PhD student at the University of Maryland, College Park, studying topics in high-performance computing, applied machine learning, and performance modeling.