Principle 9 Be Reproducible
Work in a way that is reproducible. Within the department, analysis is used to enable evidence-based decision making. Evidence that cannot be reliably reproduced is of little value. There are many reproducibility pitfalls, and it is our responsibility to overcome them.
You Must - Keep track of what you have done and document it unambiguously so that someone else can recreate it.
You Should - Write portable code in a standard project structure, so that it is easy for someone else to run.
You Could - Turn your code into a package / library / module, learn and promote RAP techniques, or use containers to achieve reproducibility.
Related Areas: Demonstrably Correct Documentation
9.1 Unambiguous Documentation for Reproducibility
To be able to reproduce your analysis a colleague may need the following:
- The right copy of the code
- The right versions of any dependencies (i.e. libraries used in the code)
- The platform on which code is run
  - operating system
  - folder structure
  - machine specifications
- The source data, or details of how to get it.
At the most basic level, documenting all of these will go a long way towards making your analysis reproducible. It might not make it easy to reproduce, however.
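As an illustration of documenting dependencies and platform, the following is a minimal Python sketch that records the interpreter version, operating system, machine type and installed package versions using only the standard library. The function name `describe_environment` and the choice of fields are assumptions made for this example, not a departmental standard. It does not capture the code version or source data, which still need to be documented separately.

```python
import json
import platform
import sys
from importlib import metadata


def describe_environment():
    """Collect details a colleague would need to recreate this environment.

    Hypothetical helper: the field names below are illustrative only.
    """
    packages = {}
    for dist in metadata.distributions():
        name = dist.metadata.get("Name")
        if name:  # skip distributions whose metadata has no name
            packages[name] = dist.version
    return {
        "python_version": platform.python_version(),
        "operating_system": platform.platform(),
        "machine": platform.machine(),
        "executable": sys.executable,
        "packages": dict(sorted(packages.items())),
    }


if __name__ == "__main__":
    # Print the record as JSON so it can be saved alongside the analysis.
    print(json.dumps(describe_environment(), indent=2))
```

Saving this output next to your results gives a colleague a record of the environment the analysis actually ran in.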
9.2 Portability
There are some simple things you can do to improve the chance that your code runs on other computers:
- Use relative paths, not absolute paths. (Wikipedia - Absolute and Relative Paths).
- Use a standard and consistent structure for organising your work. See Projects and Environments for more details.
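The first point can be sketched in Python with `pathlib`. This example assumes a hypothetical layout in which scripts live in a `code` folder and source data in a `data` folder, both directly under the project root. The folder names and the helper functions are illustrative only.

```python
from pathlib import Path


def project_root(script_path):
    """Return the project root, assumed to be the parent of the script's folder."""
    return Path(script_path).resolve().parent.parent


def data_path(script_path, filename):
    """Build the path to a data file relative to the project root.

    Assumes a hypothetical <project>/code and <project>/data layout.
    """
    return project_root(script_path) / "data" / filename


# Avoid hard-coding a location that exists only on one machine, such as:
#   Path("C:/Users/someone/Documents/project/data/input.csv")
```

Inside a script you might call `data_path(__file__, "input.csv")`. Because the location is worked out from where the script sits, the same code runs wherever the project folder is copied.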
9.3 Project Structure
Most languages offer tools and templates for a project-based workflow. Typically these include a way of organising the following components:
- Source Data
- Code
- Outputs
- Environment / Dependencies
- Documentation
By following a standard template for these components, you can take advantage of workflow tools provided by your IDE which make it easier to:
- Version control your work
- Organise your code and source data
- Refactor and improve your code
- Produce documentation
- Control your environment and dependencies
All of these things are good for sharing or collaborating with others.
See R at DHSC and Python at DHSC for more information.
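To make the components above concrete, here is a minimal Python sketch that creates an empty project skeleton with one folder per component. The folder names and the `create_skeleton` function are hypothetical, chosen for this example. In practice you would normally use the template provided by your language's own tooling rather than writing your own.

```python
import tempfile
from pathlib import Path

# Hypothetical folder names, one per component listed in this section.
COMPONENTS = ["data", "code", "outputs", "environment", "docs"]


def create_skeleton(root):
    """Create the component folders and a placeholder README under root.

    Returns the sorted names of everything directly inside root.
    """
    root = Path(root)
    for name in COMPONENTS:
        (root / name).mkdir(parents=True, exist_ok=True)
    readme = root / "README.md"
    if not readme.exists():  # never overwrite existing documentation
        readme.write_text(
            "# Project title\n\nDescribe how to run the analysis here.\n",
            encoding="utf-8",
        )
    return sorted(p.name for p in root.iterdir())


if __name__ == "__main__":
    # Demonstrate in a temporary folder so nothing is left behind.
    with tempfile.TemporaryDirectory() as tmp:
        print(create_skeleton(Path(tmp) / "my_analysis"))
```

The function is safe to run more than once: existing folders and an existing README are left untouched.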
9.4 Reproducible Analytical Pipelines
There is a government community dedicated to the production of reproducible analysis. See Reproducible Analytical Pipelines for more.
9.5 Packages and Modules
Most languages have a standard structure which is used to share code and documentation with other people. You will likely have used code in this structure (libraries / packages / modules) when performing your analysis. Typically these structures include documentation, information about dependencies, and tests.
There is no reason you can't use the same approach to sharing your analysis!
See R at DHSC and Python at DHSC for more information.
9.6 Containers / Docker
Containers allow you to manage the whole environment in which a piece of code runs. They are powerful, but more technically involved than packaging your code or using project structures to manage your environment.
Docker is a containerisation platform, which lets you reproduce environments with a wider scope than just the packages present. With Docker you can manage the entire environment from the operating system and network up (including any packages).
You can use tools such as docker-compose and Kubernetes to manage groups of containers that work together.