Principle 7: Be Demonstrably Correct

Have a clear and robust way to demonstrate that your code is correct. We need to be confident in the outputs we provide. Just because something is done with code doesn't make it immune to answering the wrong question, using the wrong inputs, or doing the calculation incorrectly.

You Must - Hold your code to the same standard as any other analysis and record evidence demonstrating it produces the right output.

You Should - Use version control to unambiguously link QA to code and outputs, and construct automated tests to provide confidence that changes don't break things.

You Could - Make a fully automated reproducible analytical pipeline (RAP) which incorporates checks and validation and minimises opportunity for human error.

Related Areas: Version Control, Be Reproducible

7.1 Quality Assurance Applies to Code

Just because you have written code rather than built a spreadsheet doesn't mean your analysis is correct. Code is not exempt from Quality Assurance processes. As with any other analysis, you need to record evidence that your code is:

  • doing the right thing
  • using valid inputs
  • producing a sensible answer.
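As a minimal sketch of recording evidence for the second and third points, the checks below test that inputs are valid and that the answer is sensible, and keep a pass/fail log that could be saved alongside the QA record. The data, the column names (`patients`, `population`), the `QALog` class and the thresholds are all hypothetical examples, not a DHSC standard.

```python
from dataclasses import dataclass, field


@dataclass
class QALog:
    """Records the outcome of each check so it can be saved with the QA record."""
    entries: list = field(default_factory=list)

    def check(self, description: str, passed: bool) -> bool:
        self.entries.append((description, "PASS" if passed else "FAIL"))
        return passed


def validate_inputs(rows, log):
    # Valid inputs: counts are whole numbers, populations are positive,
    # and no region reports more patients than residents.
    log.check("patients are non-negative integers",
              all(isinstance(r["patients"], int) and r["patients"] >= 0 for r in rows))
    log.check("populations are positive",
              all(r["population"] > 0 for r in rows))
    log.check("patients do not exceed population",
              all(r["patients"] <= r["population"] for r in rows))


def rate_per_1000(rows):
    # The analysis itself (hypothetical): patients per 1,000 residents by region.
    return {r["region"]: r["patients"] / r["population"] * 1000 for r in rows}


def check_output(rates, rows, log):
    # Sensible answer: one rate per region and every rate between 0 and 1,000.
    log.check("one rate per input region", len(rates) == len(rows))
    log.check("rates lie between 0 and 1,000",
              all(0 <= v <= 1000 for v in rates.values()))


rows = [
    {"region": "North", "patients": 120, "population": 40_000},
    {"region": "South", "patients": 80, "population": 20_000},
]
log = QALog()
validate_inputs(rows, log)
rates = rate_per_1000(rows)
check_output(rates, rows, log)
for description, result in log.entries:
    print(f"{result}: {description}")
```

Writing `log.entries` to a file next to the outputs gives a dated record of exactly which checks were run and whether they passed.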

You should refer to the Analytical Modelling Oversight Committee (AMOC) Quality Assurance (QA) materials, which contain useful prompts and frameworks for quality assuring analyses of differing size, importance and complexity.

7.2 Testing Frameworks

Your code and analysis will grow and evolve. You won't have time to QA every version by hand, and it can be tricky to keep track of which bits of QA have been made obsolete by new or changed code.

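Unit testing frameworks, such as testthat for R and pytest for Python, help you write and run tests on individual units of your code, such as single functions. Rerunning these tests after every change is a good way to demonstrate that your code still works correctly as you update it.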
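For example, pytest treats any function whose name starts with `test_` as a test and reports a failure when one of its `assert` statements fails. The sketch below assumes a hypothetical `rate_per_1000` function; saved in a file named like `test_rates.py`, running `pytest` in the project folder would discover and run it. pytest also offers `pytest.raises` for checking errors; a plain `try`/`except` is used here so the file needs no imports beyond the standard library.

```python
import math


def rate_per_1000(patients: int, population: int) -> float:
    """Hypothetical function under test: patients per 1,000 residents."""
    if population <= 0:
        raise ValueError("population must be positive")
    return patients / population * 1000


def test_simple_rate():
    # A hand-calculated case: 5 patients in 1,000 residents is 5 per 1,000.
    assert math.isclose(rate_per_1000(5, 1000), 5.0)


def test_zero_patients_gives_zero_rate():
    assert rate_per_1000(0, 2500) == 0.0


def test_rejects_zero_population():
    # Invalid input should fail loudly rather than return a misleading number.
    try:
        rate_per_1000(10, 0)
    except ValueError:
        pass
    else:
        raise AssertionError("expected ValueError for zero population")
```

Because the whole suite reruns in seconds after each change, a failing test points directly at the earlier check that a change has broken, rather than relying on someone remembering which QA is now out of date.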

See R at DHSC and Python at DHSC for more details.

7.3 Version Control Integration

Having unit tests and QA is good. Ideally, however, you should be able to tie a particular result to a particular version of the QA'd and tested code. You could do this manually by keeping a copy of the code used for each set of outputs.

Using git for version control makes this process easy. You can:

  • Make a commit to the repository with a message like "output for XYZ on dd/mm/yy" so you can identify the version used to produce outputs in future.
  • Use tools such as gitpython or git2r to include the git commit hash which identifies the current version of the code in the output. This can then later be retrieved from your git repository.
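A minimal sketch of stamping outputs with the code version follows. The text suggests gitpython (Python) or git2r (R); to stay dependency-free, this sketch instead calls `git rev-parse HEAD` through the standard `subprocess` module. With GitPython installed, the equivalent lookup is `git.Repo(search_parent_directories=True).head.object.hexsha`. The `write_output` function, column names and file layout are hypothetical.

```python
import csv
import subprocess
from datetime import date


def current_commit_hash(repo_dir: str = ".") -> str:
    """Return the full hash of the checked-out commit, or 'unknown' if unavailable."""
    try:
        result = subprocess.run(
            ["git", "rev-parse", "HEAD"],
            cwd=repo_dir,
            capture_output=True,
            text=True,
            check=True,
        )
    except (FileNotFoundError, subprocess.CalledProcessError):
        # git is not installed, or repo_dir is not inside a repository with commits.
        return "unknown"
    return result.stdout.strip()


def write_output(rows, path: str) -> None:
    """Write (region, rate) results with the code version and run date on every row."""
    commit = current_commit_hash()
    run_date = date.today().isoformat()
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["region", "rate_per_1000", "code_version", "run_date"])
        for region, rate in rows:
            writer.writerow([region, rate, commit, run_date])
```

The hash only identifies the code exactly if everything was committed before the run. `git status --porcelain` prints nothing when there are no uncommitted changes or untracked files, so a pipeline could check that first. Later, `git show <hash>` or `git checkout <hash>` recovers the version that produced a given output.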

7.4 Reproducible Analytical Pipelines

Once you have QA'd, version controlled and test-covered code, the biggest remaining source of error will be the manual steps performed by the analyst running it. You can eliminate much of this risk by automating those steps; see Reproducible Analytical Pipelines.
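A minimal sketch of the idea: loading, validation, calculation and output checks all run in a fixed order from a single function call, so there are no manual steps to forget, skip or repeat, and the run stops rather than producing output from bad data. The data, column names and checks are hypothetical; a real RAP would also read from and write to fixed locations and be run from version-controlled code.

```python
import csv
import io


def load(raw_csv: str):
    """Parse raw input; a real pipeline would read from a fixed source location."""
    reader = csv.DictReader(io.StringIO(raw_csv))
    return [
        {"region": r["region"], "patients": int(r["patients"]),
         "population": int(r["population"])}
        for r in reader
    ]


def validate(rows):
    # Stop the run rather than produce output from bad data.
    for r in rows:
        if r["population"] <= 0:
            raise ValueError(f"non-positive population for {r['region']}")
        if not 0 <= r["patients"] <= r["population"]:
            raise ValueError(f"implausible patient count for {r['region']}")
    return rows


def transform(rows):
    # Patients per 1,000 residents, rounded to 2 decimal places.
    return {r["region"]: round(r["patients"] / r["population"] * 1000, 2) for r in rows}


def run_pipeline(raw_csv: str):
    """Run every step in a fixed order from one call."""
    rows = validate(load(raw_csv))
    rates = transform(rows)
    # Final output check: duplicate region names would silently collapse in the dict.
    if len(rates) != len(rows):
        raise ValueError("regions were lost or duplicated")
    return rates


RAW = "region,patients,population\nNorth,120,40000\nSouth,80,20000\n"
print(run_pipeline(RAW))
```

Explicit `raise` is used for the checks rather than `assert`, because Python skips `assert` statements when run with the `-O` flag.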