Jaime Rodríguez-Guerra

Jaime holds a PhD in Biotechnology and believes that packaging is one of the pillars for reproducible research. He became a conda enthusiast while working on molecular modelling frameworks and machine learning pipelines for drug design.

Sessions

10-26
16:45
25min
Ensuring Runtime Reproducibility in the Python Ecosystem
Jaime Rodríguez-Guerra

The Python packaging ecosystem has a massive and diverse user community with various needs. A subset of this user base, the data science and scientific computing communities, has historically relied on the conda package and environment management tools for its workflows. conda has robust solutions for packaging and distributing libraries and for managing dependencies in environments, but there are still unsolved challenges in reliably reproducing runtime environments. For instance, compute-intensive R&D activities require certain reproducibility guarantees to support collaborative development and to ensure the stability and integrity of production-level tools. Many teams lack proper documentation and dependable practices for installing and regenerating the same runtime conditions across their software pipelines and systems, which leads to product instability and to delays in release and production.
In this talk, we will:
* Share reproducibility best practices for Python-based data science workflows. For this, we will present real-world examples where reproducibility was not a core requirement or consideration of the project but was introduced as an afterthought.
* Demonstrate a greenfield solution to this problem: conda-store, an open source project that provides flexible yet reproducible environments with features like version control, role-based access control, and background enforcement of best practices, all behind a user-friendly interface.

You will learn about the variables that affect runtime conditions, such as your project's dependencies and the technical details of your operating system and hardware. We will also present a checklist of automated tasks that should be part of a reproducible workflow, and survey the different packaging solutions in the PyData ecosystem, with a deeper focus on conda-store. We hope to share the perspective of a downstream user of the packaging ecosystem and to bring attention to the conversations around runtime-environment reproducibility.
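As a rough illustration of the kinds of variables mentioned above, the following minimal sketch records the interpreter, operating system, hardware architecture, and installed Python packages using only the standard library. It is not part of conda-store or of the talk material; the function names `capture_runtime_snapshot` and `snapshot_digest` are hypothetical, and the sketch ignores non-Python dependencies that a conda environment would also track.

```python
import hashlib
import json
import platform
import sys
from importlib import metadata


def capture_runtime_snapshot():
    """Collect interpreter, OS, hardware, and installed-package details.

    Hypothetical helper for illustration only; it sees Python
    distributions visible to this interpreter, nothing else.
    """
    packages = {}
    for dist in metadata.distributions():
        name = dist.metadata.get("Name")
        if name:  # skip distributions with broken or missing metadata
            packages[name.lower()] = dist.version
    return {
        "python": {
            "version": platform.python_version(),
            "implementation": platform.python_implementation(),
            "executable": sys.executable,
        },
        "os": {
            "system": platform.system(),
            "release": platform.release(),
        },
        "hardware": {"machine": platform.machine()},
        "packages": packages,
    }


def snapshot_digest(snapshot):
    """Return a SHA-256 hex digest of the snapshot for quick comparison."""
    # sort_keys makes the serialization independent of dict insertion order.
    serialized = json.dumps(snapshot, sort_keys=True).encode("utf-8")
    return hashlib.sha256(serialized).hexdigest()


if __name__ == "__main__":
    snap = capture_runtime_snapshot()
    print(json.dumps(snap, indent=2, sort_keys=True))
    print("digest:", snapshot_digest(snap))
```

Comparing the digests produced on two machines gives a quick signal that their runtime conditions differ, and diffing the JSON output shows where. A matching digest only covers the fields recorded here, so it should not be read as a full reproducibility guarantee.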

Main stage