C R at DHSC

The following are the DHSC sensible defaults for R:

C.1 R Version & IDE

The dominant IDE for R is RStudio, which comes packaged with R. For a new project you should use the latest version of RStudio available from the software portal.

C.2 General

Default to packages from the tidyverse. These have been carefully designed to work together effectively as part of a modern data analysis workflow. More information can be found in R for Data Science by Hadley Wickham.

For example:

  • Prefer tibbles to data.frames
  • Use ggplot2 rather than base graphics
  • Use the pipe %>% rather than nesting function calls (but not always, e.g. see here).
  • Prefer purrr to the apply family of functions. See here.
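
As a rough illustration of the preferences above, the following sketch uses a small made-up dataset. The names admissions, totals and col_means and all the values are hypothetical and only serve the example.

```r
library(tibble)
library(dplyr)
library(ggplot2)
library(purrr)

# A tibble rather than a data.frame (made-up data)
admissions <- tibble(
  region = c("North", "North", "South", "South"),
  year   = c(2018, 2019, 2018, 2019),
  count  = c(120, 135, 98, 110)
)

# The pipe rather than nested calls; the nested equivalent would be
# arrange(summarise(group_by(admissions, region), total = sum(count)), desc(total))
totals <- admissions %>%
  group_by(region) %>%
  summarise(total = sum(count)) %>%
  arrange(desc(total))

# ggplot2 rather than base graphics
p <- ggplot(admissions, aes(x = year, y = count, colour = region)) +
  geom_line()

# purrr rather than sapply(): map_dbl() always returns a double vector
col_means <- map_dbl(admissions[c("year", "count")], mean)
```

The typed map_*() functions fail loudly if a result is not of the expected type, which is one reason to prefer them to sapply().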

C.3 Packages

Recommended Packages:

C.4 Project Workflow

Always work in a project. See the guide to Using Projects.

Projects functionality is broken in DHSC's packaged version of RStudio - see the fix here.

C.5 Packaging Your Code

Packages are the fundamental unit of reproducible R code. Therefore, if possible, build an R package to share and document your code.

Hadley's book on R Packages is an effective guide on how to produce a package.

The usethis package has lots of useful shortcuts for package builders.
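
As a minimal sketch of what those shortcuts look like, the following creates a package skeleton in a temporary directory. The package name mypackage and the file name clean_data are hypothetical, and the open = FALSE arguments assume a reasonably recent version of usethis.

```r
library(usethis)

# Create a package skeleton in a temporary directory
# ("mypackage" is a hypothetical name)
pkg_path <- file.path(tempdir(), "mypackage")
create_package(pkg_path, open = FALSE)

# Make the new package the active project for the helpers below
proj_set(pkg_path)

# Add R/clean_data.R for a (hypothetical) function
use_r("clean_data", open = FALSE)

# Set up the testthat infrastructure and a matching test file
use_testthat()
use_test("clean_data", open = FALSE)
```

In day-to-day work you would normally point create_package() at a permanent location rather than tempdir().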

C.6 Managing Dependencies

There are two key competing ways of managing dependencies for an R Project:

  • packrat - current established way to manage R dependencies
  • renv - rapidly maturing, successor to packrat.
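
Both tools follow a similar initialise / snapshot / restore cycle. The sketch below shows the basic calls side by side; it is meant to be run interactively from inside a project, using one tool or the other rather than both.

```r
# Option 1: packrat
packrat::init()      # create a private package library for the project
packrat::snapshot()  # record the package versions currently in use
packrat::restore()   # reinstall the recorded versions

# Option 2: renv
renv::init()         # create a project library and an renv.lock lockfile
renv::snapshot()     # update renv.lock with the versions currently in use
renv::restore()      # reinstall the versions recorded in renv.lock
```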

See also:

C.6.1 Using old versions of packages

You may come across code which doesn't work because it depends on a different version of a package to the one you have.

Fortunately, Microsoft keep daily snapshots of CRAN and store them on the Microsoft R Application Network.

The checkpoint package from Microsoft lets you use these snapshots to install packages as they were on any day since 2014-09-17.

Simply start your script with:

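library(checkpoint)

# Install and use packages as they existed on CRAN on the snapshot date;
# checkpointLocation sets where the checkpoint library is created
checkpoint(snapshotDate = "2015-01-15",
           checkpointLocation = getwd())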

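This will download all the packages your project uses, as they existed on the given date, and install them to a checkpoint library. By default that library is on your home drive; with the checkpointLocation argument above it is created in the current working directory.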

Notes:

  • If the code depends on BH (as a lot of tidyverse code does) then this will take some time!
  • By default checkpoint puts packages on your P drive - this will be slow.
    • You can use the checkpointLocation argument to tell checkpoint to use the C drive.

C.7 Error Handling

Base R includes the try() and tryCatch() functions for handling errors. You can find an example of basic use of these on R-bloggers.

Effective error handling in R requires understanding the condition system. There is a good chapter on this in Hadley's Advanced R book.
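
As a minimal sketch of both functions, the following wraps a conversion that can either warn or fail. The helper parse_count() and its messages are hypothetical and only illustrate the pattern.

```r
# try(): carry on after an error and inspect the result afterwards
res <- try(stop("bad input"), silent = TRUE)
failed <- inherits(res, "try-error")  # TRUE when the expression errored

# tryCatch(): handle warnings and errors separately
# (parse_count() is a hypothetical helper)
parse_count <- function(x) {
  tryCatch(
    {
      value <- as.integer(x)  # warns if x cannot be converted
      if (is.na(value)) stop("missing value")
      value
    },
    warning = function(w) {
      message("Could not parse input: ", conditionMessage(w))
      NA_integer_
    },
    error = function(e) {
      message("Invalid input: ", conditionMessage(e))
      NA_integer_
    }
  )
}

parse_count("12")   # converts cleanly
parse_count("abc")  # handled by the warning handler
parse_count(NA)     # handled by the error handler
```

Each handler receives a condition object, so conditionMessage() can be used to log what went wrong before returning a fallback value.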

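If you are iterating over many inputs, it is recommended that you use the safely() family of functions from purrr (safely(), possibly() and quietly()). These wrap a function so that, instead of stopping on the first failure, each call returns its result and any error in a list for handling at a later stage.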
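
A minimal sketch of this pattern, assuming a made-up list of inputs where one element cannot be processed:

```r
library(purrr)

# Made-up inputs; log("a") would normally stop the whole loop
inputs <- list(10, "a", 100)

# safely() returns a wrapped function that never throws:
# each call gives list(result = ..., error = ...)
safe_log <- safely(log)
results <- map(inputs, safe_log)

# Separate the successes from the failures afterwards
ok <- map_lgl(results, ~ is.null(.x$error))
values <- map_dbl(results[ok], "result")
errors <- map(results[!ok], "error")

# possibly() is a shortcut when a default value is enough
with_default <- map_dbl(inputs, possibly(log, otherwise = NA_real_))
```

Keeping the error objects in errors means the failures can be reported or retried once the whole batch has run.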

C.8 Unit Testing

Use the testthat package for performing unit tests. For details see the 'tests' chapter of R Packages.
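
As a small illustration, the sketch below tests a hypothetical function, rate_per_1000(), which is not part of any DHSC package. In a real package the function would live under R/ and the tests under tests/testthat/.

```r
library(testthat)

# Hypothetical function under test
rate_per_1000 <- function(count, population) {
  if (any(population <= 0)) stop("population must be positive")
  1000 * count / population
}

test_that("rate_per_1000 scales counts by population", {
  expect_equal(rate_per_1000(5, 1000), 5)
  expect_equal(rate_per_1000(c(1, 2), c(100, 400)), c(10, 5))
})

test_that("rate_per_1000 rejects non-positive populations", {
  expect_error(rate_per_1000(1, 0), "must be positive")
})
```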