S05E07: R4DS (2e) - Ch. 9 - Workflow: scripts and projects
Today we will go over tools for organizing code: scripts and projects, and ways to write code to facilitate getting help.
Introduction
Today we are going to talk about tools that will help you organize and execute your code, and get help when you need it:
- scripts: a .R file where you can write code, execute code, and save code for future use.
- projects: a .Rproj file that allows you to keep all the files associated with an analysis together.
- reprex: a reproducible example that allows someone else to help you troubleshoot your problem.
R Scripts (.R)
If you have the conventional layout in RStudio, your scripts will be loaded in the top left quadrant of your screen.
You could simply type any code you want to execute into the console, but in the long term this is neither a good nor a practical solution. The console gets cramped, and most of the time you want to save the code you write so you can use it again in the future.
The R script file acts like a text file where you can write code, and then send it to the console. You can also save the file so you can use/edit it later.
Running code
There are a few different ways in which you can execute code from your script by sending it to the console.
- By clicking Run in the top right corner of the script (Source) pane.
- Using a keyboard shortcut: place your cursor on the line you want to run and press Cmd + Enter on a Mac, or Ctrl + Enter on Windows or Linux.
Code diagnostics
RStudio keeps getting better at pointing out potential mistakes in your code so that you can fix them. Below are some examples of what this will look like.
If you put non-allowable characters in an object name:
If you are missing a parenthesis:
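The two kinds of mistakes above can also be checked by hand. The sketch below is not how RStudio's diagnostics work internally; it simply asks R's own parser whether a piece of code is valid. The helper name `parses_ok` and the example strings are made up for illustration.

```r
# Hypothetical helper: TRUE if R can parse the code, FALSE if parsing fails
parses_ok <- function(code) {
  tryCatch({
    parse(text = code)
    TRUE
  }, error = function(e) FALSE)
}

# A space is not allowed inside an object name
bad_name  <- parses_ok("my variable <- 10")
good_name <- parses_ok("my_variable <- 10")

# A missing closing parenthesis leaves the expression unfinished
bad_paren  <- parses_ok("mean(c(1, 2, 3)")
good_paren <- parses_ok("mean(c(1, 2, 3))")

c(bad_name, good_name, bad_paren, good_paren)
```

In the script editor you would not need a helper like this: RStudio marks the offending line for you as you type.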
Saving and naming
Talking about conventions for saving and naming might sound picky, but doing this in a systematic and predictable way will help you (and anyone who uses your code) in the future. In general you want your names to be:
- Machine readable: names don’t contain any spaces, symbols, or otherwise unallowable characters. For example, R will not prevent you from using spaces in column names, but from that point forward those names will need to be wrapped in backticks, which you will definitely find annoying.
- Human readable: names indicate to you and others what is contained within that object. For example, if you name all your dataframes Data, Data1, Data2, Data3, Data4… you will have trouble in the future remembering the differences between them.
- Organized: names and folders follow some inherent structure, like numbering scripts in the order they are run or making sub-folders, so it’s easy to understand what comes from where and to find what you’re looking for. For example, I like to make sub-folders for data, output, and figs so that my parent directory stays organized. If you have a few scripts that need to run sequentially, name them so that the order is apparent, like 01_data-import-wrangling.R, 02_stat-analysis.R, 03_plot-figs.R.
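To make the naming advice concrete, here is a small sketch using base R only. The data frames and the column name `plant height` are invented for illustration; the script names are the ones from the list above.

```r
# check.names = FALSE keeps the space in the column name instead of converting it
messy <- data.frame("plant height" = c(10.2, 11.5), check.names = FALSE)
tidy  <- data.frame(plant_height = c(10.2, 11.5))

# A name with a space needs backticks every time you refer to it
messy$`plant height`

# A machine-readable name does not
tidy$plant_height

# Numbered prefixes make scripts sort in the order they should be run
scripts <- c("03_plot-figs.R", "01_data-import-wrangling.R", "02_stat-analysis.R")
sorted_scripts <- sort(scripts)
sorted_scripts
```

The same sorting happens in your file browser, which is why the leading zero in `01_` is worth keeping once you have ten or more scripts.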
R Projects (.Rproj)
Artwork by @allison_horst
A reminder about directories
A couple of weeks ago, Jelmer shared with us some information about getting and setting your working directory using getwd() and setwd().
We have to do this because R needs to know where to look when importing files, and where to write out files from our analyses. R cannot read your mind, so you have to tell it where this location is.
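As a quick refresher, the sketch below shows how the working directory determines where relative file paths point. It uses `tempdir()` purely as a stand-in for an analysis folder, and the file name `wd-demo.txt` is made up.

```r
# Remember where R is currently looking so we can switch back afterwards
original_dir <- getwd()

# tempdir() is only a stand-in here; in practice this would be your analysis folder
setwd(tempdir())

# Relative paths are now resolved against the new working directory
writeLines("hello from Code Club", "wd-demo.txt")
file_found <- file.exists("wd-demo.txt")

# The same file, referred to by its full path
demo_path <- file.path(tempdir(), "wd-demo.txt")

# Return to where we started
setwd(original_dir)
```

When you open an R project, RStudio sets the working directory to the project folder for you, so relative paths like the one above work without calling `setwd()` in your scripts.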
Why saving your environment is a bad idea
By default, R will save your environment, or ask you if you want to save it, when you quit. Your environment contains all of the working objects, data, and functions that you have been using for your analysis.
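We recommend that you turn off this auto-saving of your environment to aid in your own reproducible analysis in the future. If your analysis only works because of objects left over from an earlier session, your script alone cannot reproduce it. You can turn auto-saving off by going to Tools > Global Options, unchecking "Restore .RData into workspace at startup", and setting "Save workspace to .RData on exit" to "Never".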
Making a reprex
It is likely that as you travel on your coding journey, you will come across a problem that you struggle to solve no matter what you try. You might want to post about your problem on Stack Overflow so you can get some help.
To do this, you should create a reprex: a reproducible example.
Artwork by @allison_horst
Within your reprex, you should:
- include everything you need to make your code reproducible. That includes any library() calls and any necessary objects.
- think minimalistically: don’t use more data, or more complicated data, than you need if your example can be simpler. If you can use built-in datasets from R (think mtcars, iris), do so.
- make sure it’s easy for someone to reproduce what you did by copying and pasting your code (no screenshots!)
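A reprex that follows these three points might look something like the sketch below. The question being asked (mean mpg per number of cylinders) is invented for illustration; the point is that the code runs on its own in a fresh R session.

```r
# Only base R and a built-in dataset are used, so no library() calls are needed.
# If your problem involved a package, its library() call would go here.
small <- head(mtcars[, c("mpg", "cyl")], 3)

# The one step being asked about: mean mpg for each value of cyl
mean_mpg <- tapply(small$mpg, small$cyl, mean)
mean_mpg

# If you must share your own data, dput() prints code that recreates the object
dput(small)
```

If you have the reprex package installed, you can copy code like this to your clipboard and run `reprex::reprex()` to render it together with its output, ready to paste into a post.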
Often the process of creating the reprex will help you figure out the answer to your own problem.
This isn’t just good for posting about coding problems. If you want to ask your lab mate, instructor, collaborator, or friend for help with a problem, make it easy for them to help you.
Here is some more information about creating a good reprex, from the reprex package and on Stack Overflow.
Breakout Rooms
We are going to practice what we’ve gone over today with some breakout exercises.
Exercise 1
Set yourself up an R project for your Code Club files. Store it in a permanent and logical space on your computer (i.e., not your downloads folder or desktop), and once you’ve done it, open it up and use it for the rest of Code Club.
Exercise 2
Using your new project, take a file from a past Code Club or from your research, and load it into R using what you’ve learned in the past two sessions (Data Import and Data Import 2). Also try and export what you’ve just imported and save it in a subfolder called “output”.
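If you get stuck on the export step, here is one possible sketch. It uses base R functions and the built-in `mtcars` data rather than the functions and files from the Data Import sessions, and `tempdir()` stands in for your project folder; the folder name `code-club-demo` and the file name `mtcars_copy.csv` are made up.

```r
# Stand-in for your project folder. Inside a real project you would skip this
# and use a relative path such as "output/mtcars_copy.csv"
project_dir <- file.path(tempdir(), "code-club-demo")

# Create the "output" subfolder if it does not exist yet
dir.create(file.path(project_dir, "output"), recursive = TRUE, showWarnings = FALSE)

# Export the data into the subfolder
out_file <- file.path(project_dir, "output", "mtcars_copy.csv")
write.csv(mtcars, out_file, row.names = TRUE)

# Read it back in to check that the export worked
reimported <- read.csv(out_file, row.names = 1)
dim(reimported)
```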
Exercise 3
Go to the RStudio Tips Twitter account, https://twitter.com/rstudiotips and find one tip that looks interesting. Practice using it!
Exercise 4
What other common mistakes will RStudio diagnostics report? Read https://support.posit.co/hc/en-us/articles/205753617-Code-Diagnostics to find out.
Bonus 1
Try and create a reprex for a coding problem you’ve run into.