Opening remarks and information about PackagingCon 2023.
The software supply chain has become an increasingly attractive target because downstream users of open source software are often unaware that they are using compromised or vulnerable components. Log4Shell and SolarWinds are just two prominent examples of supply chain attacks that caused significant damage to a large population of downstream users.
Package managers already provide critical information through package metadata; however, most software crosses several package manager boundaries (e.g., using Gradle in the back end and NPM in the front end). To truly address supply chain vulnerabilities, an integrated view of software dependencies needs to be provided.
In this talk, we will provide practical advice to package manager software developers on how they can provide critical information in a manner that can be integrated across different package manager ecosystems resulting in greatly improved security across the entire software supply chain. We will also cover how that same information can improve open source license compliance and software product safety.
For many years, The Update Framework (TUF) has been a prime reference for secure package delivery and updates. Despite its popularity, integration with existing package managers remains a challenging task.
Enter RSTUF: this new OpenSSF project has taken on the challenge of providing a generic TUF application that primarily focuses on ease of adoption.
Cloud computing adoption keeps increasing, and organizations have a growing need to secure their access to cloud resources. Traditional access control mechanisms such as access tokens, while still widely used, are insufficient to protect against modern threats. Even if least-privilege principles are followed, these tokens can leak and expose your infrastructure.
Identity tokens, such as OpenID Connect (OIDC), have emerged as a popular alternative for authentication and authorization in cloud environments. Even though major CI/CD platforms - GitHub Actions, GitLab CI, CircleCI, etc. - now support these tokens, they are not yet widely adopted.
In this session, we'll explore the advantages of leveraging OIDC (OpenID Connect) for artifact registries, setting up artifact registries to accept OIDC tokens, and integrating OIDC-based authentication and authorization into popular artifact registry systems. Additionally, we'll showcase practical demonstrations of OIDC-based authentication and authorization in action.
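To make the flow concrete, here is a minimal sketch (not tied to any particular registry product) of how a registry-side service might validate a CI provider's OIDC token before issuing short-lived upload credentials. It assumes the PyJWT library; the audience value is hypothetical, and the issuer/JWKS URLs follow GitHub Actions conventions:

```python
# Hedged sketch: validate a CI provider's OIDC ID token server-side.
import jwt  # pip install PyJWT[crypto]

OIDC_ISSUER = "https://token.actions.githubusercontent.com"  # e.g. GitHub Actions
JWKS_URL = OIDC_ISSUER + "/.well-known/jwks"
AUDIENCE = "my-artifact-registry"  # hypothetical audience for this registry

def verify_ci_token(token: str) -> dict:
    # Fetch the signing key matching the token's key ID from the issuer's JWKS.
    signing_key = jwt.PyJWKClient(JWKS_URL).get_signing_key_from_jwt(token)
    # Verify signature, expiry, issuer, and audience in one call.
    claims = jwt.decode(
        token,
        signing_key.key,
        algorithms=["RS256"],
        audience=AUDIENCE,
        issuer=OIDC_ISSUER,
    )
    return claims  # e.g. repository/workflow claims, used for authorization
```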
Although a compromise of an entire open-source software package repository would be deadly serious, and we have ever more tools to try to address different parts of the story, the problem is that simply adding signatures to tamper-evident logs is not enough: given a software package, how can we tell why we are supposed to trust it in the first place? Who or what was supposed to sign the package? (This problem is reminiscent of the PGP/GPG Web of Trust.) Was the package tested for quality? Was it built on a trusted build platform? Who wrote the source code? Did anyone review the code? These questions and answers may be as varied as the hundreds of thousands of packages on such repositories. How would consumers such as package managers know which rules of the game to apply for which packages?
To solve this problem, we explain how we can use three foundational, open-source supply chain security frameworks called in-toto, The Update Framework (TUF), and Sigstore. If using in-toto is like Pfizer or Moderna vouching for exactly how a vaccine was made and what went into them, then TUF is like the FDA telling you why you should trust Pfizer and Moderna for the Comirnaty and Spikevax vaccines respectively in the first place or continue to do so, while Sigstore is like the Library of Congress permanently recording the history of every single vaccine vial. We will use PyPI as a motivating example, and explain how the same ideas and techniques can be used to secure other package repositories such as Cargo, Homebrew, NPM, and RubyGems.
The current software supply chain has become convoluted. We've migrated from virtual machines to containers - but at the end of the day, we're still shipping systems.
Warpforge is paving the way to greatly improved security of software supply chains through increased auditability, while decoupling the ability to build software from network dependencies and their accompanying latency.
The Python packaging ecosystem caters to a diverse set of user needs. In this session, you will learn about how Bloomberg does Python packaging and the details of how it handles the "packaging gradient" for Python, while supporting the ecosystem it is a part of.
BoFs are sessions presented by community members as an opportunity to gather and discuss special topics of interest. BoFs can be anything from agenda-driven to an open-ended discussion.
An exploration of various techniques modern package managers are using, or could use, to optimize package management and make things GO REAL FAST.
Dependency solving is at the core of most package managers, and it can be one of the major performance bottlenecks. Keeping a package solver performant as more and more packages, versions, and features are added can be very challenging. This talk is a deep dive into how we’ve done this for Spack.
Spack’s solver has two phases: one that sets up a solve based on metadata from package descriptions, and another to run the solver. We’ll look at performance issues that come from metadata management, including the cost of loading python files and the difficulty of caching metadata from package descriptions written in a pure Python DSL. We’ll also look at the solver itself. Spack’s solver uses Answer Set Programming (ASP), a Prolog-like paradigm for combinatorial problems. We have developed a profiler to look at which parts of our ASP program are exercised by different types of solves, and we’ll discuss a few useful optimizations that this has enabled.
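For readers unfamiliar with Spack's pure-Python DSL, here is a hedged sketch of what such a package description looks like. The package name, versions, and constraints are made up, but the directives (version, variant, depends_on) mirror Spack's real recipe API; it is this metadata that must be loaded from Python files before every solve:

```python
# Illustrative Spack-style recipe; hashes elided, names hypothetical.
from spack.package import *

class Libexample(Package):
    """Example package description."""
    homepage = "https://example.org"
    url = "https://example.org/libexample-1.0.tar.gz"

    version("1.2", sha256="...")  # each directive becomes solver metadata
    version("1.1", sha256="...")

    variant("shared", default=True, description="Build shared libraries")

    depends_on("zlib@1.2:")                  # version constraint fed to ASP
    depends_on("cmake@3.18:", type="build")  # build-only dependency
```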
We build tools for the conda ecosystem. We've built a new package manager (pixi), a website that indexes conda packages (https://prefix.dev), a tool to build conda packages from source (rattler-build), and much more. All of these tools are powered by Rattler, a complete reimplementation of the conda ecosystem in Rust. In this talk, we will walk you through some of the technical feats that went into the library, such as a SAT solver written entirely in Rust, the different components of the library, and how the library might also benefit you!
The Brew package manager is single threaded and fetches packages serially. Package managers are fast now. What could Brew do and how would it change performance?
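As a rough illustration of the idea (not Brew's actual code, which is Ruby), here is a minimal Python sketch of fetching several bottles concurrently using only the standard library; the URLs are hypothetical:

```python
# Sketch: parallel fetching instead of serial downloads.
import concurrent.futures
import urllib.request

BOTTLES = [  # hypothetical bottle locations
    "https://example.org/bottles/wget-1.21.tar.gz",
    "https://example.org/bottles/jq-1.7.tar.gz",
    "https://example.org/bottles/ripgrep-14.0.tar.gz",
]

def fetch(url: str) -> str:
    path = url.rsplit("/", 1)[-1]
    urllib.request.urlretrieve(url, path)  # blocking download of one bottle
    return path

# Download all bottles in parallel instead of one after another.
with concurrent.futures.ThreadPoolExecutor(max_workers=8) as pool:
    for path in pool.map(fetch, BOTTLES):
        print("fetched", path)
```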
Over the past few years, the Python community has largely unified around the new backtracking pip resolver, but many widely-used ML frameworks which bundle large amounts of binary code have historically pushed at the boundaries of pip's performance envelope and continue to require further innovation. Starting in 2019, I began to investigate how to reduce the size and bandwidth requirements of ML models deployed by the Twitter Cortex ML team, which produced initial drafts of the work that would later be upstreamed into pip as the `install --report` and `--use-feature=fast-deps` features. In this talk, I walk through the motivating use cases from Twitter, how these ideas were over time collectively translated into coherent standards, and how to take advantage of these improvements when building Python applications.
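As a hedged illustration of one of these features, the sketch below runs a resolve-only install and reads back the JSON report; the schema shown is simplified to the fields needed for pinning:

```python
# Sketch: consume the JSON produced by `pip install --report`.
import json
import subprocess
import sys

# --dry-run resolves dependencies without installing anything.
subprocess.run(
    [sys.executable, "-m", "pip", "install", "--dry-run",
     "--report", "report.json", "requests"],
    check=True,
)

with open("report.json") as f:
    report = json.load(f)

# Print exact pins for everything the resolver selected.
for item in report["install"]:
    meta = item["metadata"]
    print(f'{meta["name"]}=={meta["version"]}')
```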
Modern package managers often use logic solvers (SAT, ASP, SMT, CDCL, etc) for dependency resolution. Logic solvers are highly efficient at solving NP-complete problems, but often give very little information when a solve is impossible. This talk explains the solver methods used in Spack to introduce legible error messages for users, including generating full causality chains for facts involved in determining an unsolvable state. It shows how this method allows users to bypass incompatible software combinations, and the performance issues and mitigations involved in bringing this work to production. We will also compare this solution to solutions like pubgrub and libsolv and discuss how different underlying solvers require fundamentally different solutions in this space.
In recent years, software has grown in complexity, and many software packages now have a large number of dependencies.
Typical software packages may depend on tens to hundreds of other packages.
As this complexity continues to grow it becomes more and more difficult to find compatible versions in the dependency graph.
Many package managers rely on logic programming and SAT solvers to resolve version constraints, yet while these version constraints remain hand-annotated, there will continue to be errors from version conflicts.
Additionally, these constraints may not hold across different architectures, operating systems, and/or compilers.
In this talk we demonstrate how machine learning models that predict the probability of dependency graphs successfully building can be integrated into the package manager Spack's version selection mechanism.
We discuss how to integrate probabilistic build information into Spack's Answer Set Programming (ASP) solver via a probabilistic variant of ASP, Plingo.
Additionally, we present several means of extrapolating to new versions as they are added to the package manager.
Finally, we demonstrate and discuss the effectiveness of using probabilistic information in version selection.
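The following toy sketch (not Spack's actual solver logic, which lives in ASP/Plingo) illustrates the core idea of trading newness against predicted build success; the probability table is a stand-in for a trained model:

```python
# Toy sketch: prefer newer versions, discounted by predicted build success.
def predicted_build_success(package: str, version: str) -> float:
    # Stand-in for a trained model (e.g. consulted via a Plingo program).
    table = {("foo", "2.1"): 0.35, ("foo", "2.0"): 0.90, ("foo", "1.9"): 0.95}
    return table.get((package, version), 0.5)

def pick_version(package: str, candidates: list[str]) -> str:
    # Score = recency preference (earlier in list = newer) * build odds.
    n = len(candidates)
    scored = [
        ((n - i) / n) * predicted_build_success(package, v)
        for i, v in enumerate(candidates)
    ]
    return max(zip(scored, candidates))[1]

print(pick_version("foo", ["2.1", "2.0", "1.9"]))  # picks "2.0", not newest
```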
Language package registries play a pivotal role in the open-source software ecosystem. However, their widespread popularity has drawn the attention of malicious actors. Registry developers have responded to these attacks, as well as the public pressure for action, with identity and artifact validation features. But these efforts will take time, maintainer participation, and new package releases to close the pervasive assurance gaps that remain. To address these shortcomings, we explore an alternative approach to assessing package integrity using reproducible build concepts.
The Python packaging ecosystem has a massive and diverse user community with various needs. A subset of this user base, the data science and scientific computing communities, have historically relied on the conda package and environment management tools for their workflows. conda has robust solutions for packaging and distributing libraries and managing dependencies in environments, but there are still unsolved challenges for reliably reproducing runtime environments. For instance, compute-intensive R&D activities require certain reproducibility guarantees for collaborative development and to ensure production-level tools' stability and integrity. Many teams lack proper documentation and dependable practices for installing and regenerating the same runtime conditions across their software pipelines and systems, leading to product instability and release and production delays.
In this talk, we will:
* Share reproducibility best practices for Python-based data science workflows. For this, we will present real-world examples where reproducibility was not a core requirement or consideration of the project but was introduced as an afterthought.
* Demonstrate a greenfield solution to this problem: conda-store, an open source project that ensures flexible yet reproducible environments with features like version control, role-based access control, and background enforcement of best practices, all behind a user-friendly interface.
You will learn about all the variables that affect runtime conditions (like enumerating project dependencies and technical details about your operating system and hardware). We will also present a checklist of automated tasks that should be part of a reproducible workflow and the different packaging solutions in the PyData ecosystem with a deeper focus on conda-store. We hope to share the perspective of a downstream user of the packaging ecosystem and bring attention to the conversations around runtime-environment reproducibility.
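As a small, hedged illustration of "enumerating the variables that affect runtime conditions", this standard-library sketch captures a few of them; a real tool like conda-store records far more:

```python
# Sketch: snapshot a few of the variables that define a runtime environment.
import json
import platform
import sys
from importlib.metadata import distributions

snapshot = {
    "python": sys.version,
    "implementation": platform.python_implementation(),
    "os": platform.platform(),
    "machine": platform.machine(),
    # Every installed package pins part of the runtime environment too.
    "packages": sorted(
        f"{d.metadata['Name']}=={d.version}" for d in distributions()
    ),
}
print(json.dumps(snapshot, indent=2))
```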
When shipping software and systems at scale, it's desirable to bundle libraries into files that can be shared between applications, in order to reduce the system size. However, this comes with tradeoffs: in order to share things, we have to organize. More concretely: this means for any applications that want to share libraries and data objects, they have to agree on file naming conventions.
This simple need leads to a much bigger problem: the organizational agreement problem doesn't scale. Individual package managers solve this problem in distinct ways. And the question of which libraries and data objects can be shared is itself a point of contention: some systems are based on ideas like "major version number is enough" (which, spoiler, invariably creates problems). The end result? Packages from different ecosystems can't share dependencies; whole Linux distributions become rapidly balkanized from each other because of library versioning and filename collision issues; and in the worst scenarios, it becomes impossible to install different versions of some software and libraries on a single system, due to name collisions.
There has to be a solution. Where else in computing have we seen (and solved) the problem of "there are many small variations of a piece of data, and we need to keep all of them, despite naming each one being an inhuman problem?" Right: version control. And what was ultimately the solution in version control? Content-addressing: hash the thing, and index the storage by that.
So can we use the same solution to make a new golden age where shared objects and shared libraries are both easy and reliable and conflict-free?
In this talk, we'll explore the problem space -- what hurdles are there to sharing? What really needs to count as unique? How can we wire up existing library loading systems to meet our goals (without rewriting the universe?) -- we'll survey some prior art, and we'll wrap up with some questions that we hope can lead to a better future for all!
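To make the content-addressing idea concrete, here is a minimal sketch of a content-addressed store for libraries; the paths and layout are illustrative, not those of any particular system:

```python
# Sketch: name each object by the hash of its bytes, so identical content
# dedupes and different versions never collide on filenames.
import hashlib
import shutil
from pathlib import Path

STORE = Path("/tmp/ca-store")

def add_to_store(library: Path) -> Path:
    digest = hashlib.sha256(library.read_bytes()).hexdigest()
    dest = STORE / digest[:2] / f"{digest}-{library.name}"
    dest.parent.mkdir(parents=True, exist_ok=True)
    shutil.copy2(library, dest)  # same bytes always land at the same path
    return dest  # conventional names can then symlink to this path
```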
The advent of WebAssembly has transformed web application development, empowering developers to harness the potential of low-level languages like C and C++ in the browser environment. Emscripten-Forge, a conda-based distribution closely resembling conda-forge, is designed to cater specifically to WebAssembly code compiled with Emscripten - an LLVM-based toolchain for WebAssembly.
The practice of maintaining a Secure Software Supply Chain (S3C) helps provide actionable insight for developers consuming upstream packages. However, in the industry's efforts to shift security left, the Software Supply Chain often ignores the "final mile" of the manufacturing and delivery of applications to consumers' devices. In this talk, we will cover the history, current status, and future of code signing and how it can be leveraged to ship secure applications at massive scale.
Join us as we delve into secure software releases, focusing on the real-world scenario of implementing the SLSA (Supply-Chain Levels for Software Artifacts) v1.0 standard in popular CI/CD systems such as GitHub and Azure Pipelines. In the face of growing threats to packages, distributions, releases, and dependencies from software supply chain attacks, SLSA offers a crucial standard to secure artifacts.
We will explore SLSA in detail, examine what the v1.0 standard changes compared to its predecessor, and understand how SLSA would have helped mitigate previous software supply chain attacks.
Then we dive into the implementation of SLSA and show how to apply it to secure builds in popular systems such as GitHub and Azure Pipelines, including a live demo of how to generate and use SLSA to secure a containerized software release in an OCI registry.
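For orientation, the sketch below shows the general shape of a SLSA v1.0 provenance statement (the in-toto attestation format) that one might attach to a container image; the digest, build type, and builder ID are placeholders:

```python
# Sketch of a SLSA v1.0 provenance statement; values are placeholders.
import json

statement = {
    "_type": "https://in-toto.io/Statement/v1",
    "subject": [{
        "name": "registry.example.com/app",
        "digest": {"sha256": "..."},  # image digest in the OCI registry
    }],
    "predicateType": "https://slsa.dev/provenance/v1",
    "predicate": {
        "buildDefinition": {
            "buildType": "https://example.com/build-types/container@v1",
            "externalParameters": {"repository": "https://github.com/org/app"},
        },
        "runDetails": {
            "builder": {"id": "https://example.com/builders/ci"},  # placeholder
        },
    },
}
print(json.dumps(statement, indent=2))
```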
Homebrew added license field support in the 2.4.3 release, and since then we have gradually added licensing information to all existing formulae. During this journey, we also worked with the SPDX team to include licensing information for existing, very old formulae. Often, I would also reference the licensing information in Repology or upstream to get direct clarification. In this talk, I will cover the practices we have adopted on the Homebrew side and how we can collaborate.
Welcome to the second day of PackagingCon 2023.
The one feature every packaging system must have is naming: every package needs an identifier. The naïve approach when designing a new packaging system is to set up a flat namespace. After all, it's simple, and even if you just allow six A-Z characters, that's over 300 million possible names. But how does this work in practice?
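The arithmetic behind that claim, plus a birthday-problem estimate of when randomly chosen six-letter names would start colliding:

```python
# The namespace-size claim, checked, plus a collision estimate.
import math

names = 26 ** 6
print(names)  # 308915776, i.e. "over 300 million"

# Roughly sqrt(2N * ln 2) registrations give a 50% chance of a collision
# if names were picked at random (real-world names collide far sooner).
print(round(math.sqrt(2 * names * math.log(2))))  # about 20700
```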
Package ecosystems burst onto the scene, sometimes slowly and sometimes… a little bit loudly. That part of the story is familiar. But what happens when that ecosystem is no longer the new shiny thing, and the larger ecosystem it lives in has moved on? This talk will chronicle the tale of CocoaPods, the unofficial third-party package manager for Apple ecosystem development, and how its maintainers have helped it gently fade away over the past decade.
Chainguard started with a plan to build secure container images. They ended up building a whole new Linux (un)distribution and tooling. Come learn all about Wolfi and why it exists!
The Nix ecosystem has long been praised for its pioneering approach to reproducibility, making it a favourite among developers and sysadmins who value consistent, predictable systems. At the heart of this ecosystem is the Nix package manager, and its evolution has been characterized by continuous innovation. One of the most recent and groundbreaking of these innovations is Nix Flakes.
This talk introduces FlakeHub, a new platform from Determinate Systems that supercharges Nix flakes.
This talk will provide a developer-minded introduction to "trusted publishing," an OpenID Connect-based authentication scheme that PyPI has successfully deployed to reduce the need for (and risk associated with) manually configured API tokens. Thousands of packages (including many of Python's most critical packages) have already enrolled in trusted publishing, improving the overall security posture (and auditability) of the Python ecosystem.
We will cover trusted publishing in two parts: the first part will be a high-level overview of the trusted publishing scheme and how it uses ephemeral OpenID Connect credentials, including motivation for the scheme's security properties and how they improve upon pre-existing package index authentication schemes (e.g. user/password pairs and long-lived API tokens).
The second part will dive into the nitty-gritty details of how trusted publishing was implemented on PyPI, and will serve as both a retrospective on the work and a reference for other package indices considering similar models: it will cover some of the challenges posed by OIDC (including support for multiple identity providers), threat model considerations, as well as "knock-on" benefits (such as future integrations with code-signing schemes like Sigstore).
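At its core, the registry-side check is a comparison of verified OIDC claims against a publisher configuration registered for the project. A hedged sketch, using GitHub Actions-style claim names and a hypothetical configuration model (not PyPI's actual code):

```python
# Sketch: match verified OIDC claims against a registered trusted publisher.
from dataclasses import dataclass

@dataclass
class TrustedPublisher:
    repository: str  # e.g. "org/project"
    workflow: str    # e.g. "release.yml"

def claims_match(claims: dict, publisher: TrustedPublisher) -> bool:
    repo_ok = claims.get("repository") == publisher.repository
    # job_workflow_ref looks like "org/project/.github/workflows/release.yml@refs/..."
    wf = claims.get("job_workflow_ref", "")
    wf_ok = f"/.github/workflows/{publisher.workflow}@" in wf
    return repo_ok and wf_ok

claims = {
    "repository": "org/project",
    "job_workflow_ref": "org/project/.github/workflows/release.yml@refs/tags/v1.0",
}
print(claims_match(claims, TrustedPublisher("org/project", "release.yml")))  # True
```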
Supply chain attacks have increased year over year by more than 700%. High-profile attacks like those against SolarWinds or Codecov have exposed the kinds of supply chain integrity weaknesses that attackers exploit. Supply-chain Levels for Software Artifacts (SLSA) is a set of incrementally adoptable guidelines to prevent tampering, improve integrity, and secure packages and infrastructure. SLSA v1.0 specifications were released in April 2023, and several commercial products are already available.
Writing a SLSA builder from scratch is, however, a tedious multi-month effort. In this talk, we will present the "Build Your Own Builder" (BYOB) framework for GitHub Actions: a set of APIs that empowers anyone to create a SLSA 3 compliant builder on GitHub in a matter of days. In particular, the BYOB framework makes it easy for GitHub Action maintainers to meet the highest SLSA Build L3 requirements. As a builder author, you don't need to worry about keeping signing keys secure, isolating builds, or creating attestations; all of this is handled seamlessly by the framework.
Lessons learned from adding build provenance to the npm registry: linking npm packages back to their originating source code and build instructions using cloud CI/CD, Sigstore and SLSA.
BoFs are sessions presented by community members as an opportunity to gather and discuss special topics of interest. BoFs can be anything from agenda-driven to an open-ended discussion.
BoFs are sessions presented by community members as an opportunity to gather and discuss special topics of interest. BoFs can be anything from agenda-driven to an open-ended discussion.
In this talk, I look at the two common package managers on Windows and explore their commonly used features in a real-world context.
WinGet and Chocolatey are compared a lot. There are a LOT of marketing and fluff articles and blog posts written about WinGet with little real-world practicality. The landscape is being skewed, making it difficult to understand which is best for you or your organization. As a techie, I wanted to make a practical comparison that people can use for real-world decisions.
Have you ever pondered why our software projects need READMEs to explain how to install them?
That's because it can be hard to automate the installation of the dependencies of a project.
In this work, we challenge and explore the actual difficulty behind why we still need READMEs and human instructions to install native dependencies for projects, via a research project called BuildXYZ, which provides an automatic on-demand dependency dispenser based on a FUSE filesystem that lazily provides a dependency whenever it is actually requested on the filesystem.
We will show how such a system performs on tasks such as `pip install numpy`, and relate this to the increasing coupling between application-specific package managers and cross-language dependencies, such as a Python library with Rust code (e.g., cryptography).
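To give a flavor of the mechanism (this is not BuildXYZ's code; it assumes the fusepy library and stubs out all resolution logic), a FUSE filesystem can observe exactly which path a build probes for and could materialize the providing package on demand:

```python
# Very rough sketch of the lazy-dependency-dispenser idea.
import errno
import stat
from fuse import FUSE, FuseOSError, Operations  # pip install fusepy

class LazyDependencyFS(Operations):
    def getattr(self, path, fh=None):
        if path == "/":
            return {"st_mode": stat.S_IFDIR | 0o755, "st_nlink": 2}
        # A build just probed for e.g. "/lib/libz.so". A real implementation
        # would resolve the owning package here and materialize the file.
        print(f"requested: {path} (would install the providing package)")
        raise FuseOSError(errno.ENOENT)  # stub: report it as still missing

if __name__ == "__main__":
    # The mountpoint must already exist, e.g. mkdir -p /tmp/lazy-deps
    FUSE(LazyDependencyFS(), "/tmp/lazy-deps", foreground=True)
```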
A deep dive into GNU Guix and new tooling to help maintain and improve the quality of Guix packages, while at the same time increasing the number available.
The complexity of software has been increasing, with a typical application relying on tens or even hundreds of packages. The task of finding compatible versions and configuring builds for these packages poses a significant challenge. This talk introduces a method in which we leverage cutting-edge AI technology and advanced package management methodologies to address the challenges of managing software ecosystems. We use graph neural networks (GNNs) to analyze a prominent software ecosystem in HPC, the Exascale Computing Project (ECP) software stack E4S. By using the ECP's E4S stack as an example, and leveraging Spack's parameterized package recipes, we demonstrate that GNNs can be effectively trained to understand the build incompatibilities in a large software ecosystem and identify configurations that will not work, without the need to actually build them.
AI models (especially LLMs) are now being released at an unprecedented pace. At the same time, supply chain attacks are increasing year over year by more than 700%. Taken together, these two facts reveal a shocking perspective: it is very possible for bad actors to infect unsuspecting hosts that want to benefit from the AI explosion. Fortunately, by drawing analogies between training AI models and building traditional software artifacts, we can build solutions to package ML models such that the majority of the supply chain security risks are alleviated.
Socket Security employs a unique blend of static analysis and AI reasoning to detect malicious packages within the npm and PyPI registries. Our system has flagged over 6,000 threats in real time, showcasing its efficacy in scaling across 190,000+ repositories and hundreds of millions of unique package versions. We will discuss some of the challenges and tricks we've used to get this system working and give some general thoughts on prompt engineering for data mining applications.
WebAssembly Interfaces (WAI) is a project associated with Wasmer that functions as a bindings generator framework for WebAssembly programs and their embeddings. WAI is poised to play a pivotal role in creating universal packages, as it bridges the gap between various languages and runtimes. Its capacity to generate language-agnostic interface types allows WebAssembly modules to be written and consumed across a diverse array of languages and environments.
We will cover WAI itself, as well as its integration with npm and pip for automatically publishing to other registries.
The European Environment for Scientific Software Installations (EESSI, pronounced as "easy") is a collaboration between different HPC sites and industry partners, with the common goal of setting up a shared repository of (optimized) scientific software installations that can be used on a variety of Linux systems, regardless of which (version of) Linux distribution or processor architecture is used.
Bazel is the open source build system from Google, which works for multiple languages on multiple platforms. While the original internal build system was designed around Google's monorepo, Bazel has to handle the external-dependency challenges of the open source world. In our session, we will reveal the history of managing external dependencies with Bazel and how Bzlmod was developed as the package manager for the Bazel ecosystem.
We'll talk about the tradeoffs of global vs. project-based package managers and introduce Devbox by jetpack.io, a powerful open-source tool that leverages Nix to create portable, reproducible environments.
Analyzing the dependencies as declared by package managers is the first step towards creating SBOMs or to query known vulnerabilities for software projects. This talk gives an overview over the abstractions done in the OSS Review Toolkit to support more than 25 package managers and the challenges in modelling their different behaviors and resolution processes.
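ORT itself is written in Kotlin; purely as a hypothetical illustration of the kind of abstraction involved, a normalized interface over many package managers might look like this (the names below are illustrative, not ORT's actual API):

```python
# Hypothetical sketch of a uniform package manager abstraction.
from abc import ABC, abstractmethod
from dataclasses import dataclass

@dataclass
class Dependency:
    name: str
    version: str
    scope: str  # e.g. "compile" vs "test"
    children: list["Dependency"]

class PackageManager(ABC):
    @abstractmethod
    def detect(self, project_dir: str) -> bool:
        """Is this manager used here (e.g. a pom.xml or package.json exists)?"""

    @abstractmethod
    def resolve(self, project_dir: str) -> list[Dependency]:
        """Run/emulate this manager's resolution and normalize the result."""
```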
Poetry is a popular tool for dependency management and packaging of Python projects. A prominent feature of Poetry is the generation of an environment-independent lockfile: it does not matter whether the lockfile was created on Linux or Windows, with Python 3.8 or Python 3.11, and so on - it will be the same and suitable for every possible environment.
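A hedged sketch of why this works: the lockfile records every resolved package together with environment markers, and each installer evaluates the markers locally. Reading the lockfile (schema simplified; the exact placement of marker fields varies across lock-format versions):

```python
# Sketch: inspect packages and environment markers in poetry.lock.
import tomllib  # Python 3.11+ standard library

with open("poetry.lock", "rb") as f:
    lock = tomllib.load(f)

for pkg in lock["package"]:
    marker = pkg.get("markers", "")
    # e.g. a Windows-only package carries: markers = 'sys_platform == "win32"'
    # and is simply skipped by installers on other platforms.
    print(pkg["name"], pkg["version"], marker)
```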
There are many different package registries in different ecosystems. We rely on them so much now that we take them for granted. But how do they work, and what’s inside? This talk explores what makes package registries tick, and how to mirror them with integrity. We'll focus on RubyGems, but touch on NPM (JavaScript/Node), Hex (Elixir), Homebrew (macOS), Ubuntu (Debian) and Fedora (RPM).
Software Bill Of Materials (SBOMs) are booming (or sBO(O)Ming) today, becoming a backbone of many Software Supply Chain security and compliance efforts. This session will cover the speakers' real-world experiences when they created their own SBOM format and put it into production long before SBOMs became a thing. We will talk about SBOM basics, formats, and industry standards, showcase three stages for SBOM management (collection/producers, distribution/storage, and analysis/consumers), walk you through various rapidly growing tools from each category, and discuss strategies for building your own built-to-your-spec solution.
Wrapping up PackagingCon 2023