S04E08: R for Data Science - Chapters 4, 6, and 8

Covering some R Basics, how to work with R scripts, and introducing RStudio Projects


I – Brief Code Club Introduction

Organizers

  • Michael Broe – Evolution, Ecology and Organismal Biology (EEOB)
  • Jessica Cooperstone – Horticulture & Crop Science (HCS) / Food Science & Technology (FST)
  • Stephen Opiyo – Molecular & Cellular Imaging Center (MCIC) - Columbus
  • Jelmer Poelstra – Molecular & Cellular Imaging Center (MCIC) - Wooster
  • Mike Sovic – Infectious Diseases Institute AMSL - Genomics Lab

Code Club practicalities

  • In-person (Columbus & Wooster) and Zoom hybrid

  • Mix of instruction/discussion with the entire group, and exercises in breakout groups of 4-5 people.

  • When doing exercises in breakout groups, we encourage you:

    • To briefly introduce yourselves and to do the exercises as a group
    • On Zoom, to turn your cameras on and to have someone share their screen (use the Ask for help button in Zoom to get help from an organizer)
    • To let a less experienced person do the screen sharing and coding
  • You can ask a question at any time, by speaking or typing in the Zoom chat.

  • You can generally come early or stay late for troubleshooting, and also for questions related to your research.

Some more notes:

  • We recommend that you read the relevant (part of the) chapter before each session, especially if the material in the chapter is new to you.

  • We try to make each session as stand-alone as possible. Still, if you missed one or more sessions, you would ideally catch up on reading those parts of the book, especially when we split a chapter across multiple sessions.

  • We record the whole-group parts of the Zoom call, and share the recordings only with Code Club participants.

New to Code Club or R?

Take a look at these pages on our website:



II – The R for Data Science book (R4DS)

This excellent book by Hadley Wickham (also author of many of the R packages used in the book!) and Garrett Grolemund has a freely available online version that is regularly updated and contains exercises. It was originally published in 2016.

The book focuses on the so-called "tidyverse" ecosystem in R. The tidyverse can be seen as a modern dialect of R. In previous Code Clubs, we have often (but not always!) been doing things “the tidyverse way” as well.

For today’s chapters, I don’t think the R4DS exercises are all that good, so I’ve replaced some and added some of my own.



III – R4DS Chapter 4

The message of the first two R4DS exercises for this chapter is that R does not tolerate typos: names must match exactly, including upper and lower case, so make sure you spell things correctly.
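As a small illustration of that point, here is a sketch using a made-up object name (`my_variable`, chosen for this example). The two lookups below fail because the names differ from the original by a missing letter and by a capital letter; `tryCatch()` is only used here so that the error messages are captured instead of stopping the script.

```r
# Create an object (the name is just an example)
my_variable <- 10

# A misspelled name refers to an object that does not exist
typo_result <- tryCatch(my_varable, error = function(e) conditionMessage(e))

# R is case-sensitive, so a different capitalization also fails
case_result <- tryCatch(My_variable, error = function(e) conditionMessage(e))

# Each result holds the text of the error message R produced
print(typo_result)
print(case_result)
```

In an interactive session you would normally just type the name and read the error in the console.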

Exercise 3: take a look at the RStudio keyboard shortcuts by clicking Tools > Keyboard Shortcuts Help, or by pressing Alt + Shift + K on a PC (Option + Shift + K on a Mac).



IV – R4DS Chapter 6

Run the following code:

glimpse(cars)

What went wrong?

Solution

glimpse() is a function from the dplyr package, one of the core tidyverse packages that are loaded as part of the tidyverse.

However, in every R session in which you want to use tidyverse functions, you first need to call library(tidyverse). Without that call, R cannot find the glimpse() function and returns an error.

Now you can use glimpse():

glimpse(cars)
#> Rows: 50
#> Columns: 2
#> $ speed <dbl> 4, 4, 7, 7, 8, 9, 10, 10, 10, 11, 11, 12, 12, 12, 12, 13, 13, 13…
#> $ dist  <dbl> 2, 10, 4, 22, 16, 10, 18, 26, 34, 17, 28, 14, 20, 24, 28, 26, 34…
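To see what library() changes, it can help to look at which packages are attached to your session. The sketch below is not from the book. It assumes dplyr is installed and skips the glimpse() call otherwise. It also shows the package::function() form, which lets you call a function from an installed package without attaching that package first.

```r
# search() lists the packages (and other environments) attached to this session
attached <- search()
print(attached)

# The pkg::function() form works without calling library() first,
# as long as the package is installed
if (requireNamespace("dplyr", quietly = TRUE)) {
  dplyr::glimpse(cars)
}
```

After running library(tidyverse), the core tidyverse packages should show up in the output of search().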

Note, if you got an error like this when running library(tidyverse):

#> Error in library(tidyverse) : there is no package called ‘tidyverse’

…that means you still need to install it:

install.packages("tidyverse")
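If you are not sure whether a package is installed, you can check before loading it. This is a minimal sketch: `is_installed()` is a hypothetical helper written for this example (not a tidyverse function), built on base R's system.file(), which returns an empty string when a package cannot be found.

```r
# Hypothetical helper: TRUE if a package with this name is installed
is_installed <- function(pkg) {
  nzchar(system.file(package = pkg))
}

if (is_installed("tidyverse")) {
  # Load the tidyverse and try glimpse() again
  library(tidyverse)
  glimpse(cars)
} else {
  # Installing is a one-time step; loading is needed in every session
  message("tidyverse is not installed yet; run install.packages(\"tidyverse\")")
}
```

Keep in mind the difference: you install a package once per computer, but you load it with library() in every new R session.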


V – R4DS Chapter 8

Create an RStudio Project for Code Club.

Run the code below in your new Project:

library(tidyverse)

ggplot(diamonds, aes(carat, price)) + 
  geom_hex()

ggsave("diamonds.pdf")

write_csv(diamonds, "diamonds.csv")

  • What does the code above do?

  • Find the files diamonds.pdf and diamonds.csv on your computer, without using a search function. How did you know where to look for them?

  • Where is the R working directory on your computer?

Solution

  • The code does the following:

    • Load the tidyverse packages
    • Create a simple plot using the diamonds dataset, which comes with ggplot2
    • Save the plot to disk as a PDF file
    • Save the diamonds dataframe to disk as a CSV file
  • The files were saved in the same folder as your newly created RStudio project. (See also the next point.)

  • Whenever you have an active RStudio Project, R’s working directory is the folder that contains that Project (the folder with the .Rproj file).
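You can also ask R directly where its working directory is. The sketch below uses only base R functions. The file names are the ones from the exercise above, so the last line only returns TRUE if you ran that code in the same session and Project.

```r
# Print R's current working directory
wd <- getwd()
print(wd)

# List any RStudio Project files (.Rproj) in the working directory
list.files(wd, pattern = "\\.Rproj$")

# Check whether the two files from the exercise are in the working directory
file.exists(c("diamonds.pdf", "diamonds.csv"))
```

Because ggsave() and write_csv() were given file names without a folder, they wrote the files relative to this working directory.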




Jessica Cooperstone
Assistant Professor at HCS