S04E08: R for Data Science - Chapters 4, 6, and 8
Covering some R Basics, how to work with R scripts, and introducing RStudio Projects
I – Brief Code Club Introduction
Organizers
- Michael Broe – Evolution, Ecology and Organismal Biology (EEOB)
- Jessica Cooperstone – Horticulture & Crop Science (HCS) / Food Science & Technology (FST)
- Stephen Opiyo – Molecular & Cellular Imaging Center (MCIC) - Columbus
- Jelmer Poelstra – Molecular & Cellular Imaging Center (MCIC) - Wooster
- Mike Sovic – Infectious Diseases Institute AMSL - Genomics Lab
Code Club practicalities
- In-person (Columbus & Wooster) and Zoom hybrid
- Mix of instruction/discussion with the entire group, and doing exercises in breakout groups of up to 4-5 people.
- When doing exercises in breakout groups, we encourage you:
  - To briefly introduce yourselves and to do the exercises as a group
  - On Zoom, to turn your cameras on and to have someone share their screen (use the "Ask for help" button in Zoom to get help from an organizer)
  - To let a less experienced person do the screen sharing and coding
- You can ask a question at any time, by speaking or typing in the Zoom chat.
- You can generally come early or stay late for troubleshooting but also for questions related to your research.
Some more notes:
- We recommend that you read the relevant (part of the) chapter before each session, especially if the material in the chapter is new to you.
- We try to make each session as stand-alone as possible. Still, if you missed one or more sessions, you would ideally catch up on reading those parts of the book, especially when we split a chapter across multiple sessions.
- We record the whole-group parts of the Zoom call, and share the recordings only with Code Club participants.
New to Code Club or R?
Take a look at these pages on our website:
- Computer setup for Code Club
- Resources and tips to get started with R
- List of all previous Code Club session topics
II – The R for Data Science book (R4DS)
This excellent book by Hadley Wickham (also author of many of the R packages used in the book!) and Garrett Grolemund has a freely available online version that is regularly updated and contains exercises. It was originally published in 2016.
The book focuses on the so-called "tidyverse" ecosystem in R. The tidyverse can be seen as a modern dialect of R. In previous Code Clubs, we have often (but not always!) been doing things "the tidyverse way" as well.
I think the R4DS exercises for today’s chapters are not so good, so I’ve replaced some and added some of my own.
III – R4DS Chapter 4
The first two R4DS exercises for this chapter make the same point: R does not tolerate typos, so make sure you spell everything exactly right.
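As a small illustration of that point, the sketch below creates an object and then tries to use it with a misspelled name and with different capitalization. The object name `my_variable` and the misspellings are made up for this example; `tryCatch()` is only used so that the script keeps running and prints the error message instead of stopping.

```r
# Hypothetical object, created only for this illustration
my_variable <- 10

# One letter is missing: to R this is a different object that does not exist
result_typo <- tryCatch(
  my_varable,
  error = function(e) conditionMessage(e)
)
print(result_typo)

# R is also case-sensitive, so a capital M gives the same kind of error
result_case <- tryCatch(
  My_variable,
  error = function(e) conditionMessage(e)
)
print(result_case)

# The correctly spelled name works as expected
print(my_variable)
```

In an interactive session you would normally just see the error in the Console; RStudio's tab-completion can help avoid this kind of typo.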
Exercise 3: take a look at the RStudio keyboard shortcuts by clicking Tools > Keyboard Shortcuts Help, or by pressing Alt + Shift + K on a PC.
IV – R4DS Chapter 6
Run the following code:
glimpse(cars)
What went wrong?
Solution
glimpse() is a function from the dplyr package, one of the core tidyverse packages that are loaded as part of the tidyverse.
However, in every R session in which you want to use tidyverse functions, you first need to call library(tidyverse).
Now you can use glimpse():
glimpse(cars)
#> Rows: 50
#> Columns: 2
#> $ speed <dbl> 4, 4, 7, 7, 8, 9, 10, 10, 10, 11, 11, 12, 12, 12, 12, 13, 13, 13…
#> $ dist <dbl> 2, 10, 4, 22, 16, 10, 18, 26, 34, 17, 28, 14, 20, 24, 28, 26, 34…
Note: if you got an error like this when running library(tidyverse):
#> Error in library(tidyverse) : there is no package called ‘tidyverse’
…that means you still need to install it:
install.packages("tidyverse")
library(tidyverse)
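As a side note, here is a minimal sketch of two related base R tools, which are not part of the exercise itself: `requireNamespace()` checks whether a package is installed without loading it, and the `package::function()` notation calls a single function from an installed package without a prior `library()` call. The variable name `has_dplyr` is just chosen for this example.

```r
# Check whether dplyr is installed, without attaching it to the session
has_dplyr <- requireNamespace("dplyr", quietly = TRUE)

if (has_dplyr) {
  # Call glimpse() directly from dplyr; no library() call is needed for this
  dplyr::glimpse(cars)
} else {
  # Nothing is installed automatically; this only prints a hint
  message("dplyr is not installed; try install.packages(\"tidyverse\")")
}
```

For everyday work, calling library(tidyverse) once at the top of your script remains the simpler habit; the `::` notation is mostly handy for one-off calls.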
V – R4DS Chapter 8
Create an RStudio Project for Code Club.
Run the code below in your new Project:
library(tidyverse)
ggplot(diamonds, aes(carat, price)) +
  geom_hex()
ggsave("diamonds.pdf")
write_csv(diamonds, "diamonds.csv")
- What does the code above do?
- Find the files diamonds.pdf and diamonds.csv on your computer, without using a search function. How did you know where to look for them?
- Where is the R working directory on your computer?
Solution
- The code does the following:
  - Loads the tidyverse package
  - Creates a simple plot using the tidyverse diamonds dataset
  - Saves the plot to disk as a PDF file
  - Saves the diamonds dataframe to disk as a CSV file
- The files were saved in the same folder as your newly created RStudio Project. (See also the next point.)
- Whenever you have an active RStudio Project, R’s working directory will be the folder that contains your RStudio Project.
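If you want to verify this from within R, the following sketch uses the base R functions `getwd()` and `file.exists()`. It assumes you ran the exercise code in the same session; the variable names `wd`, `files`, and `found` are only for illustration.

```r
# Show the current working directory;
# with an active RStudio Project this should be the Project folder
wd <- getwd()
print(wd)

# Check whether the two files from the exercise are in that directory
files <- c("diamonds.pdf", "diamonds.csv")
found <- file.exists(file.path(wd, files))
names(found) <- files
print(found)
```

If either value is FALSE, the file was saved somewhere else (or not at all), which usually means the code was run outside the Project.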