S02E12: Plotly

Making our plots interactive


Prep homework

Basic computer setup

  • If you haven’t already done so, please follow the Code Club Computer Setup instructions, which also include pointers if you’re new to R or RStudio.

  • If you’re able to do so, please open RStudio a bit before Code Club starts – and in case you run into issues, please join the Zoom call early and we’ll help you troubleshoot.

New to ggplot?

Check out the PAST Code Club sessions covering ggplot2:

  • S01E04: intro to ggplot2
  • S01E05: intro to ggplot2 round 2
  • S01E10: faceting and animating
  • S02E06: another intro to ggplot2
  • S02E07: a second intro to ggplot2 round 2
  • S02E08: combining plots using faceting
  • S02E09: combining plots using faceting and patchwork
  • S02E10: adding statistics to plots

If you’ve never used ggplot2 before (or even if you have), you may find this cheat sheet useful.


Getting Started

RMarkdown for today’s session

# directory for Code Club session S02E12:
dir.create("S02E12")

# directory for our RMarkdown
# ("recursive" to create two levels at once.)
dir.create("S02E12/Rmd/", recursive = TRUE)

# save the url location for today's script
todays_Rmd <- 
  "https://raw.githubusercontent.com/biodash/biodash.github.io/master/content/codeclub/S02E12_plotly/plotly.Rmd"

# indicate the name of the new Rmd
S02E12_Rmd <- "S02E12/Rmd/S02E12_plotly.Rmd"

# go get that file! 
download.file(url = todays_Rmd,
              destfile = S02E12_Rmd)


1 - What is plotly?

Today we are going to talk about making interactive plots using Plotly. Plotly is available in a variety of programming languages, but today we will only talk about using it in R. All of the plotly documentation can be found here.

If you have never used plotly before, install it with the code below.

install.packages("plotly")

Here are some useful links to find info about using ggplotly.

Before we start, there are two basic ways to plot in R using plotly:

  • Using ggplotly() - this is what we will go over today because it has the same syntax as ggplot() which we have already learned
  • Using plot_ly() - there is slightly more functionality in this function, but the syntax is all new, so I’d suggest if you can do what you want with ggplotly(), do that. The syntax is not particularly hard so don’t be scared to use it if interactive plots are something you’re very interested in.

When you are googling about using plotly, you will find a combination of ggplotly() and plot_ly() approaches, and some parts of the code are interchangeable. The easiest way to see which parts are is to try.
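As a rough side-by-side, here is a minimal sketch of the same scatter plot built both ways. It uses the built-in mtcars data rather than today's pumpkins data, purely for illustration, and the object names are made up for this example.

```r
library(ggplot2)
library(plotly)

# ggplotly(): build a ggplot object first, then convert it
gg <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point()
p_ggplotly <- ggplotly(gg)

# plot_ly(): build the plotly object directly; columns are given as formulas (~)
p_plot_ly <- plot_ly(data = mtcars, x = ~wt, y = ~mpg,
                     type = "scatter", mode = "markers")

# layout() is one example of a function that works on either object
p_ggplotly <- p_ggplotly %>% layout(title = "Built with ggplotly()")
p_plot_ly  <- p_plot_ly %>% layout(title = "Built with plot_ly()")
```

Printing either object (for example, typing p_plot_ly in the console) should show the interactive plot in the RStudio Viewer pane.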

Also note, Google gets a bit confused when googling “ggplotly” and often returns information about just ggplot, so read extra carefully when problem solving.

This is an example of work from my group where we have found plotly to be particularly useful.

Data from Bilbrey et al., New Phytologist 2021



2 - Load libraries, get data

Let’s load the libraries we are using for today.

library(tidyverse)
library(plotly) # for making interactive plots
library(glue) # for easy pasting
library(htmlwidgets) # for saving html files

We are going to continue to use the pumpkins data we downloaded last week when we were learning about Shiny.

pumpkins <- read_csv('https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2021/2021-10-19/pumpkins.csv')
#> Rows: 28065 Columns: 14
#> ── Column specification ────────────────────────────────────────────────────────
#> Delimiter: ","
#> chr (14): id, place, weight_lbs, grower_name, city, state_prov, country, gpc...
#> 
#>  Use `spec()` to retrieve the full column specification for this data.
#>  Specify the column types or set `show_col_types = FALSE` to quiet this message.

We will start with the wrangling that Matt shared with us last week, and then go from there.

pumpkins <- pumpkins %>%
  # separate the year column
  separate(col = id, into = c("year", "vegetable"), sep = "-") %>%
  # find and tag the rows that do not have data
  mutate(delete = str_detect(place, "\\d*\\s*Entries")) %>%
  # filter out the rows that do not have data
  filter(delete==FALSE) %>%
  # remove the tagging column
  select(-delete)

# rename the vegetables to their actual names
pumpkins$vegetable <- pumpkins$vegetable %>%
  str_replace("^F$", "Field Pumpkin") %>%
  str_replace("^P$", "Giant Pumpkin") %>%
  str_replace("^S$", "Giant Squash") %>%
  str_replace("^W$", "Giant Watermelon") %>%
  str_replace("^L$", "Long Gourd") %>%
  str_replace("^T$", "Tomato")

# get rid of commas in the weight_lbs column
pumpkins$weight_lbs <- as.numeric(gsub(",","",pumpkins$weight_lbs))

Let’s look at our data structure.

glimpse(pumpkins)
#> Rows: 28,011
#> Columns: 15
#> $ year              <chr> "2013", "2013", "2013", "2013", "2013", "2013", "201…
#> $ vegetable         <chr> "Field Pumpkin", "Field Pumpkin", "Field Pumpkin", "…
#> $ place             <chr> "1", "2", "3", "4", "5", "5", "7", "8", "9", "10", "…
#> $ weight_lbs        <dbl> 154.5, 146.5, 145.0, 140.8, 139.0, 139.0, 136.5, 136…
#> $ grower_name       <chr> "Ellenbecker, Todd & Sequoia", "Razo, Steve", "Ellen…
#> $ city              <chr> "Gleason", "New Middletown", "Glenson", "Combined Lo…
#> $ state_prov        <chr> "Wisconsin", "Ohio", "Wisconsin", "Wisconsin", "Wisc…
#> $ country           <chr> "United States", "United States", "United States", "…
#> $ gpc_site          <chr> "Nekoosa Giant Pumpkin Fest", "Ohio Valley Giant Pum…
#> $ seed_mother       <chr> "209 Werner", "150.5 Snyder", "209 Werner", "109 Mar…
#> $ pollinator_father <chr> "Self", NA, "103 Mackinnon", "209 Werner '12", "open…
#> $ ott               <chr> "184.0", "194.0", "177.0", "194.0", "0.0", "190.0", …
#> $ est_weight        <chr> "129.00", "151.00", "115.00", "151.00", "0.00", "141…
#> $ pct_chart         <chr> "20.0", "-3.0", "26.0", "-7.0", "0.0", "-1.0", "-4.0…
#> $ variety           <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, …

Note that all of the columns have the class “character” except weight_lbs which is numeric. We could just fix this now, but I’m going to show you an alternative way to handle this in a minute.
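For reference, one possible way to "just fix this now" is sketched below. It is not the approach used in the rest of this session. The toy tibble and its values are made up to mimic a few pumpkins columns, and parse_number() from readr is one of several functions that could do the conversion.

```r
library(tidyverse)

# toy stand-in for a few pumpkins columns (values made up for illustration)
toy <- tibble(
  place      = c("1", "2", "3"),
  est_weight = c("129.00", "151.00", "1,115.00"),
  pct_chart  = c("20.0", "-3.0", "26.0")
)

# convert the chosen character columns to numeric;
# parse_number() also drops the thousands separator
toy_numeric <- toy %>%
  mutate(across(c(est_weight, pct_chart), parse_number))

glimpse(toy_numeric)
```

Columns not listed inside across(), like place here, are left as character.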



3 - Create base ggplot object

Using the pumpkins dataset, let’s work towards creating a plot that shows the distribution of tomato weights by country. I will show you here how you can use dplyr functions before your ggplot2 call by piping the wrangled data straight into ggplot().

pumpkins %>%
  filter(vegetable == "Tomato") %>%
  ggplot(aes(x = country, 
             y = weight_lbs, 
             color = country)) +
  geom_jitter()

We have a plot. It’s not horrible, but it has a number of issues.

  1. The country names are getting cut off because some are too long, and there are enough of them that they overlap.
  2. We have an overplotting problem.
  3. The x-axis countries are ordered alphabetically. We will order our axis based on something more meaningful, like a characteristic of our data (more about this later).
  4. The aesthetics need some adjustment for a more beautiful plot.

We will work on making our plot a bit better, and then we will make it interactive, such that you can hover your mouse over each datapoint, and learn more about that datapoint than what is directly visualized in the plot.



4 - Optimize our base plot

1. Prevent country name overlap

We can do this by using guide_axis() within a scale function, here, scale_x_discrete(). To learn more about ggplot scales, click here.

pumpkins %>%
  filter(vegetable == "Tomato") %>%
  ggplot(aes(x = country, 
             y = weight_lbs, 
             color = country)) +
  geom_jitter() +
  scale_x_discrete(guide = guide_axis(n.dodge = 2)) # dodge every other name

Wow, that was easy. We still have some overlapping, and we also have a big figure legend that is, in this case, not necessary. Let’s remove it.

pumpkins %>%
  filter(vegetable == "Tomato") %>%
  ggplot(aes(x = country, 
             y = weight_lbs, 
             color = country)) +
  geom_jitter() +
  scale_x_discrete(guide = guide_axis(n.dodge = 2)) +
  theme(legend.position = "none")

This is not the only way to fix the plot to avoid label overlap. Instead, you could put the x-axis country labels on an angle by using theme(axis.text.x = element_text(angle = 45)).

2. Reduce overplotting

For the countries that have a lot of tomato entries, it’s hard to see some individual data points because there are just so many of them. We can add some transparency to the data points so that it’s easier to see them. I am also playing around with color, fill, and point shape so you can see what changing those values does to the plot.

pumpkins %>%
  filter(vegetable == "Tomato") %>%
  ggplot(aes(x = country, 
             y = weight_lbs, 
             fill = country)) +
  geom_jitter(alpha = 0.5, color = "black", shape = 21) +
  theme(legend.position = "none",
        axis.text.x = element_text(angle = 45)) 

We still have overplotting but I think this is an improvement.

3. Reorder x-axis to something meaningful

Our x-axis is currently ordered alphabetically. This is really a meaningless ordering. Instead, let’s order our data by some characteristic of the data that we want to communicate to our viewer. For example, we could order by increasing mean tomato weight. This would tell us, just by looking at the order of the x-axis, which country has, on average, the biggest tomatoes. This is something that is hard to see with the data in its current form.

Remember before we saw that each of the columns except for weight_lbs was of the class “character.” To allow reordering, we need to change country to be a factor. We can do this directly in the pumpkins dataframe, or we can do it within the ggplot call using the pipe %>%.

We will use fct_reorder() to do this, where we provide the column we want to reorder (here, country), what we want to reorder based on (here, weight_lbs), and what function to use for the reordering (here, .fun = mean).

pumpkins %>%
  filter(vegetable == "Tomato") %>%
  mutate(country = as.factor(country)) %>%
  ggplot(aes(x = fct_reorder(country, weight_lbs, .fun = mean), 
             y = weight_lbs, 
             fill = country)) +
  geom_jitter(alpha = 0.5, color = "black", shape = 21) +
  theme(legend.position = "none",
        axis.text.x = element_text(angle = 45, hjust = 1)) 

Now we can see easily that Switzerland has the heaviest tomatoes on average entered into this competition.
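To see what fct_reorder() does to the factor levels on its own, here is a small sketch with made-up data (toy_country and toy_weight are hypothetical names, not part of the pumpkins dataset).

```r
library(forcats)

# made-up data: mean weight is 6 for A, 1.5 for B, and 3.5 for C
toy_country <- factor(c("A", "A", "B", "B", "C", "C"))
toy_weight  <- c(5, 7, 1, 2, 3, 4)

# default levels are alphabetical
levels(toy_country)
#> [1] "A" "B" "C"

# reorder the levels by increasing mean weight
reordered <- fct_reorder(toy_country, toy_weight, .fun = mean)
levels(reordered)
#> [1] "B" "C" "A"
```

ggplot2 draws a discrete axis in level order, which is why reordering the levels reorders the x-axis. Adding .desc = TRUE to fct_reorder() would sort from largest to smallest instead.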

4. Pretty it up

Let’s fix up the aesthetics of the plot, and adjust the axis labels, and add a title. Note, in the title, adding \n into your title inserts a line break.

tomato_plot <- pumpkins %>%
  filter(vegetable == "Tomato") %>%
  mutate(country = as.factor(country)) %>%
  ggplot(aes(x = fct_reorder(country, weight_lbs, .fun = mean), 
             y = weight_lbs, 
             fill = country)) +
  geom_jitter(alpha = 0.5, color = "black", shape = 21) +
  theme_classic() +
  theme(legend.position = "none",
        axis.text.x = element_text(angle = 45, hjust = 1)) +
  labs(x = "Country",
       y = "Weight (in lbs)",
       title = "Weights of Tomatoes by Country Entered \nin the Great Pumpkin Commonwealth Competition")

tomato_plot

5 - Make it interactive with ggplotly()

You can learn more about the ggplotly() function, including its arguments here.

ggplotly(tomato_plot)

Wow that was easy! Note that when you hover over a data point you see the information mapped in your aes() statement – this is the default. We will go over ways to change this.
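One simple way to change the default is to pass tooltip the names of aesthetics that are already mapped. The sketch below uses the built-in mtcars data rather than the tomato plot, just to keep it self-contained.

```r
library(ggplot2)
library(plotly)

p <- ggplot(mtcars, aes(x = wt, y = mpg, color = factor(cyl))) +
  geom_point()

# default (tooltip = "all"): every mapped aesthetic appears in the hover text
p_all <- ggplotly(p)

# show only the y value when hovering
p_y_only <- ggplotly(p, tooltip = "y")
```

Hovering over a point in p_y_only should show only the mpg value, while p_all also shows wt and factor(cyl).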



6 - Using tooltip

Using tooltip helps you to indicate what appears when you hover over different parts of your plot. You can learn more about controlling tooltip here.

What if we want to hover over each point and be able to tell who grew that tomato?

To do this, we indicate what we want to show on hover using text = in our aesthetic mappings. Then, we set tooltip = "text" to tell ggplotly() to display that text on hover.

tomato_plot <- pumpkins %>%
  filter(vegetable == "Tomato") %>%
  mutate(country = as.factor(country)) %>%
  ggplot(aes(x = fct_reorder(country, weight_lbs, .fun = mean), 
             y = weight_lbs, 
             fill = country,
             text = grower_name)) +
  geom_jitter(alpha = 0.5, color = "black", shape = 21) +
  theme_classic() +
  theme(legend.position = "none",
        axis.text.x = element_text(angle = 45, hjust = 1)) +
  labs(x = "Country",
       y = "Weight (in lbs)",
       title = "Weights of Tomatoes by Country Entered \nin the Great Pumpkin Commonwealth Competition")
ggplotly(tomato_plot,
         tooltip = "text")

You can play around a lot with tooltip to get it to be exactly how you want, and you can include multiple things in your hover text.

You can add multiple items to text, and use the function glue(), which allows more intuitive pasting, to get your hover text into your preferred format.

tomato_plot <- pumpkins %>%
  filter(vegetable == "Tomato") %>%
  mutate(country = as.factor(country)) %>%
  ggplot(aes(x = fct_reorder(country, weight_lbs, .fun = mean), 
             y = weight_lbs, 
             fill = country,
             text = glue("Grown by {grower_name}
                         From {city}, {state_prov}"))) +
  geom_jitter(alpha = 0.5, color = "black", shape = 21) +
  theme_classic() +
  theme(legend.position = "none",
        axis.text.x = element_text(angle = 45, hjust = 1)) +
  labs(x = "Country",
       y = "Weight (in lbs)",
       title = "Weights of Tomatoes by Country Entered \nin the Great Pumpkin Commonwealth Competition")
ggplotly(tomato_plot,
         tooltip = "text")
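To inspect what glue() builds before it becomes hover text, you can run it outside of a plot. This is a minimal sketch with made-up values standing in for the grower_name, city, and state_prov columns. It writes the line break as \n on a single line instead of the multi-line string used above.

```r
library(glue)

# made-up values standing in for three pumpkins columns
grower_name <- c("Grower A", "Grower B")
city        <- c("Wooster", "Columbus")
state_prov  <- c("Ohio", "Ohio")

# glue() fills in each {name} and is vectorized: one string per row
hover_text <- glue("Grown by {grower_name}\nFrom {city}, {state_prov}")
hover_text

# the same strings built with paste0(), for comparison
with_paste <- paste0("Grown by ", grower_name, "\nFrom ", city, ", ", state_prov)
identical(as.character(hover_text), with_paste)
```

The glue() version keeps the whole label readable as one string, which gets more helpful as you add items to the hover text.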



7 - Hover label aesthetics

If you don’t like the default hover text aesthetics, you can change them! You can do this by piping (%>%) your ggplotly() output into the functions style() and layout().

# setting fonts for the plot
font <- list(
  family = "Calibri",
  size = 15,
  color = "white")

# setting hover label specs
label <- list(
  bgcolor = "#FF0000",
  bordercolor = "transparent",
  font = font) # we can do this bc we already set font

# amending our ggplotly call to include new fonts and hover label specs
ggplotly(tomato_plot, tooltip = "text") %>%
  style(hoverlabel = label) %>%
  layout(font = font)



8 - Saving your plots

Now that you’ve made a beautiful interactive plot, you probably want to save it.

Assign the plot you want to save to an object, and use the function saveWidget() to save it. You can find the documentation here.

# assign ggplotly plot to an object
ggplotly_to_save <- ggplotly(tomato_plot,
                             tooltip = "text") %>%
                      style(hoverlabel = label) %>%
                      layout(font = font)

# save
saveWidget(widget = ggplotly_to_save,
           file = "ggplotlying.html")
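If you want to check where the file ends up, or avoid bundling everything into one html file, the sketch below may help. It saves a small mtcars-based plot to a temporary folder (an arbitrary choice for this example) and uses the selfcontained argument of saveWidget().

```r
library(ggplot2)
library(plotly)
library(htmlwidgets)

p <- ggplotly(ggplot(mtcars, aes(x = wt, y = mpg)) + geom_point())

# write to a temporary folder so nothing is left in your project
out_file <- file.path(tempdir(), "example_plot.html")

# selfcontained = FALSE writes the JavaScript/CSS to a side folder
# instead of bundling everything into the single html file
saveWidget(widget = p, file = out_file, selfcontained = FALSE)

# confirm the html file was written
file.exists(out_file)
```

With the default selfcontained = TRUE, you get a single html file that is easier to share, at the cost of a larger file size.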


Breakout rooms

We are going to use the palmerpenguins dataset called penguins.

library(palmerpenguins)

head(penguins)
#> # A tibble: 6 × 8
#>   species island bill_length_mm bill_depth_mm flipper_length_… body_mass_g sex  
#>   <fct>   <fct>           <dbl>         <dbl>            <int>       <int> <fct>
#> 1 Adelie  Torge…           39.1          18.7              181        3750 male 
#> 2 Adelie  Torge…           39.5          17.4              186        3800 fema…
#> 3 Adelie  Torge…           40.3          18                195        3250 fema…
#> 4 Adelie  Torge…           NA            NA                 NA          NA NA   
#> 5 Adelie  Torge…           36.7          19.3              193        3450 fema…
#> 6 Adelie  Torge…           39.3          20.6              190        3650 male 
#> # … with 1 more variable: year <int>

Exercise 1

Using the penguins dataset, make a base scatter plot with bill length on the y-axis and bill depth on the x-axis. Remove any observations with missing data.

Hints (click here) You can use `drop_na()` to remove NAs. The helper `any_of()` is useful for removing NAs only from certain variables. You can also just remove any NAs, it doesn't really matter here.

Solutions (click here)
bill_depth_length <- penguins %>%
  drop_na(any_of(c("bill_depth_mm", "bill_length_mm"))) %>%
  ggplot(aes(x = bill_depth_mm, y = bill_length_mm)) +
  geom_point()

bill_depth_length



Exercise 2

Add appropriate x and y-axis labels, and a title to your plot.

Hints (click here) You can add labels for x, y, and a title using `labs()`.

Solutions (click here)
bill_depth_length <- penguins %>%
  drop_na(any_of(c("bill_depth_mm", "bill_length_mm"))) %>%
  ggplot(aes(x = bill_depth_mm, y = bill_length_mm)) +
  geom_point() +
  labs(x = "Culmen Depth (mm)",
       y = "Culmen Length (mm)",
       title = "Exploration of penguin bill length and depth relationships")

bill_depth_length



Exercise 3

Make your plot interactive such that when you hover over a point, it tells you what island the penguin is from.

Hints (click here) Specify what you want your "tooltip" to be by using `text` within your `aes()` statement.

Solutions (click here)
bill_depth_length <- penguins %>%
  drop_na(any_of(c("bill_depth_mm", "bill_length_mm"))) %>%
  ggplot(aes(x = bill_depth_mm, y = bill_length_mm, text = island)) +
  geom_point() +
  labs(x = "Culmen Depth (mm)",
       y = "Culmen Length (mm)",
       title = "Exploration of penguin bill length and depth relationships")

ggplotly(bill_depth_length,
         tooltip = "text")



Exercise 4

Add the sex of the penguin to the hover text, change the hover text so that the background color is red, and make all the fonts for the plot something other than the default.

Hints (click here) You can set fonts either within your `ggplot()` call, or setting `font` within [`layout()`](https://rdrr.io/pkg/plotly/man/layout.html). You can customize the hover label with [`style()`](https://rdrr.io/pkg/plotly/man/style.html). Use [`glue()`](https://glue.tidyverse.org/reference/glue.html) to paste in some information that helps your reader know what your hover text is referring to.

Solutions (click here)
# setting fonts for the plot
penguins_font <- list(
  family = "Proxima Nova", # this is the official OSU font
  size = 15,
  color = "white")

# setting hover label specs
penguins_label <- list(
  bgcolor = "red", # red background, as the exercise asks
  bordercolor = "transparent",
  font = penguins_font) # we can do this bc we already set font

bill_depth_length <- penguins %>%
  drop_na(any_of(c("bill_depth_mm", "bill_length_mm"))) %>%
  ggplot(aes(x = bill_depth_mm, y = bill_length_mm, 
             text = glue("Island: {island}
                         Sex: {sex}"))) +
  geom_point() +
  labs(x = "Culmen Depth (mm)",
       y = "Culmen Length (mm)",
       title = "Exploration of penguin bill length and depth relationships")

# amending our ggplotly call to include new fonts and hover label specs
ggplotly(bill_depth_length, tooltip = "text") %>%
  style(hoverlabel = penguins_label) %>%
  layout(font = penguins_font)





Jessica Cooperstone
Assistant Professor at HCS