Rebuilding Trust: Asserting Integrity in Language Package Ecosystems
10-26, 16:20–16:45 (Europe/Berlin), Main stage

Language package registries play a pivotal role in the open-source software ecosystem. However, their widespread popularity has drawn the attention of malicious actors. Registry developers have responded to these attacks, and to public pressure for action, with identity and artifact validation features. But these efforts will need time, maintainer participation, and new package releases before they close the pervasive assurance gaps that remain. To address these shortcomings, we explore an alternative approach to assessing package integrity using reproducible build concepts.


A common view of the OSS supply chain is that a package consumer trusts a package producer based on the source code that comprises the package. However, between the consumer and the source code there is often a complex set of systems and processes that the consumer implicitly, and often unknowingly, trusts. While some language ecosystems (e.g., Go) have designed these intervening build, CI, and identity components out of the trust base, many others have had to adapt their designs to account for these security gaps.

Effecting change in language package registries is challenging because the community of package maintainers is large and diverse. Rolling out new security policies to existing packaging ecosystems often means maintainers bear the cost of change while consumers reap the security benefits. One way to address registry deficiencies is to create a parallel ecosystem, similar to OS distros or repackagers like Nix and Conda. However, this approach tends to separate ownership from the original ecosystem and limits contributions back to the upstream registry.

An alternative strategy proposed in this talk involves using automation and heuristics to reproduce upstream artifacts and publish the resulting metadata. This allows consumers to verify the source-equivalence of a package, document the modification or rebuilding process for an artifact, and incorporate these properties into the existing language ecosystem registry. The high success rate of automation for popular ecosystems like PyPI, npm, and Maven means most packages will inherit these security properties without human intervention. Where automation falls short, manual alternatives are available: consumers can adjust or provide their own reproductions, which aligns the cost of conformance with the party benefiting from the security improvements. This model aims to bridge integrity gaps for all users, advancing these ecosystems towards a more secure future.
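
To make the idea of comparing a rebuilt artifact against the upstream one concrete, here is a minimal sketch in Python. It is not the tooling described in the talk; the function names (`content_digests`, `compare_artifacts`, `make_zip`) and the report fields are hypothetical, and it assumes a zip-based artifact format such as a Python wheel. The sketch compares per-file content hashes, so two archives that differ only in archive metadata (for example, timestamps) still count as equivalent in content, while any changed, added, or missing file is reported.

```python
import hashlib
import io
import zipfile


def content_digests(archive_bytes: bytes) -> dict[str, str]:
    """Map each file in a zip archive to the SHA-256 of its contents."""
    digests = {}
    with zipfile.ZipFile(io.BytesIO(archive_bytes)) as zf:
        for info in zf.infolist():
            if info.is_dir():
                continue
            digests[info.filename] = hashlib.sha256(zf.read(info)).hexdigest()
    return digests


def compare_artifacts(upstream: bytes, rebuilt: bytes) -> dict:
    """Compare an upstream artifact with a local rebuild (hypothetical report)."""
    up, rb = content_digests(upstream), content_digests(rebuilt)
    changed = sorted(n for n in up.keys() & rb.keys() if up[n] != rb[n])
    only_upstream = sorted(up.keys() - rb.keys())
    only_rebuilt = sorted(rb.keys() - up.keys())
    return {
        # Strictest check: the two archives are byte-for-byte identical.
        "bit_identical": hashlib.sha256(upstream).digest()
        == hashlib.sha256(rebuilt).digest(),
        # Looser check: same file set with the same contents,
        # ignoring archive-level metadata such as timestamps.
        "content_equivalent": not (changed or only_upstream or only_rebuilt),
        "changed": changed,
        "only_upstream": only_upstream,
        "only_rebuilt": only_rebuilt,
    }


def make_zip(files: dict[str, bytes], date_time: tuple) -> bytes:
    """Demo helper: build an in-memory zip with a fixed timestamp per entry."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        for name, data in files.items():
            zf.writestr(zipfile.ZipInfo(name, date_time=date_time), data)
    return buf.getvalue()


if __name__ == "__main__":
    files = {"pkg/__init__.py": b"VERSION = '1.0'\n"}
    upstream = make_zip(files, (2023, 1, 1, 0, 0, 0))
    rebuilt = make_zip(files, (2024, 6, 1, 12, 0, 0))
    print(compare_artifacts(upstream, rebuilt))
```

A real rebuild pipeline would also need to record how the artifact was rebuilt and to normalize ecosystem-specific details; the report above only illustrates the kind of metadata that could be published alongside a package.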

Matthew Suozzo is a Software Engineer at Google working on supply chain security.