Session 5: ggplot2, round 2

Getting up close and personal with our data

Jessica Cooperstone

Jan 19, 2021

Artwork by @allison_horst

Prep homework

Basic computer setup

If you didn’t already do this, please follow the Code Club Computer Setup instructions, which also has pointers for if you’re new to R or RStudio.
If you’re able to do so, please open RStudio a bit before Code Club starts – and in case you run into issues, please join the Zoom call early and we’ll troubleshoot.

New to ggplot?

Check out the last Code Club Session 4 on Visualizing Data.

If you’ve never used ggplot2 before (or even if you have), you may find this cheat sheet useful.

Getting Started

Script for today’s session

# directory for Code Club Session 2:
dir.create("S05")

# directory for our script
# ("recursive" to create two levels at once.)
dir.create("S05/scripts/")

# save the url location for today's script
todays_R_script <- 'https://raw.githubusercontent.com/biodash/biodash.github.io/master/content/codeclub/05_ggplot-round-2/Session5_ggplot2.R'

# indicate the name of the new script file
Session5_script <- "S05/scripts/Session5_script.R"

# go get that file! 
download.file(url = todays_R_script,
              destfile = Session5_script)

1 - Why visualize our data?

ggplot2 exploratory cartoon

Artwork by Allison Horst

We make data visualizations for two main reasons:

To explore our data
To share our data with others

Often, we think about figure generation as the last part of the scientic process, something you do as you prepare a manuscript for publication. I hope to convince you that exploring your data, and making exploratory plots is a critical part of the data analysis and interpretation process.

Today we will be using ggplot2 to make a series of plots that help us better understand the underlying structure in our dataset.

When summary statistics don’t cut it

This “Datasaurus Dozen” shows the value of looking at your data beyond means and standard deviations. In the gif above, created by Alberto Cairo, each of these 13 datasets have identical means, standard eviations, and correlations to two decimal places. And one of the datasets is a dinosaur!

What will we go over today

These geoms will help you to get better acquainted with your data.

geom_col() - makes bar plots. I will show you how to do this and then recommend that you don’t.
geom_boxplot() - makes infinitely useful boxplots.
geom_violin() - makes violin plots, a hybrid between a boxplot and a density plot. Very musical.
geom_density_ridges() - a density plot giving you the impression of a side view of a mountain range. Requires the package ggridges
geom_jitter() - adds all datapoints to your plot, and jitters them to handle overplotting.

I will also go over a few tricks along the way, including coord_flip(), adding labels using labs(), and changing the overall look of the plot with theme(), or pre-set themes like theme_classic() which is my go-to.

2 - Accessing our data

Let’s get set up and grab some data so that we can learn more about penguins (and ggplot2)

You can do this locally, or at OSC. You can find instructions if you are having trouble here.

First load your libraries.

library(tidyverse)

Then let’s access the wintry palmerpenguins dataset. We will then look at penguins, the dataset we will be using for the first part of today’s Code Club. This data is collected on penguins from the Palmer Station Antarctica Long-Term Ecological Research study area.

cartoon of the palmer penguins

Artwork by Allison Horst

install.packages("palmerpenguins")

library(palmerpenguins)

Let’s look at the data in penguins.

# look at the first 6 rows, all columns
head(penguins)

#> # A tibble: 6 x 8
#>   species island bill_length_mm bill_depth_mm flipper_length_… body_mass_g sex  
#>   <fct>   <fct>           <dbl>         <dbl>            <int>       <int> <fct>
#> 1 Adelie  Torge…           39.1          18.7              181        3750 male 
#> 2 Adelie  Torge…           39.5          17.4              186        3800 fema…
#> 3 Adelie  Torge…           40.3          18                195        3250 fema…
#> 4 Adelie  Torge…           NA            NA                 NA          NA NA   
#> 5 Adelie  Torge…           36.7          19.3              193        3450 fema…
#> 6 Adelie  Torge…           39.3          20.6              190        3650 male 
#> # … with 1 more variable: year <int>


# check the structure
# this tell us what is contained within our df
glimpse(penguins)

#> Rows: 344
#> Columns: 8
#> $ species           <fct> Adelie, Adelie, Adelie, Adelie, Adelie, Adelie, Ade…
#> $ island            <fct> Torgersen, Torgersen, Torgersen, Torgersen, Torgers…
#> $ bill_length_mm    <dbl> 39.1, 39.5, 40.3, NA, 36.7, 39.3, 38.9, 39.2, 34.1,…
#> $ bill_depth_mm     <dbl> 18.7, 17.4, 18.0, NA, 19.3, 20.6, 17.8, 19.6, 18.1,…
#> $ flipper_length_mm <int> 181, 186, 195, NA, 193, 190, 181, 195, 193, 190, 18…
#> $ body_mass_g       <int> 3750, 3800, 3250, NA, 3450, 3650, 3625, 4675, 3475,…
#> $ sex               <fct> male, female, female, NA, female, male, female, mal…
#> $ year              <int> 2007, 2007, 2007, 2007, 2007, 2007, 2007, 2007, 200…

This dataset contains the following measurements about penguins at Palmer Station in Antarctica:

species
island
bill_length_mm
bill_depth_mm
flipper_length_mm
body_mass_g
sex
year

We are going to be plotting to get an understanding of bill_length_mm which is the length of the bill from the penguins face, protruding outwards (and more easily understood in the image below).

depiction of bill length protruding from the penguins face, and bill depth, the height of the bill parallel to the ground

Artwork by Allison Horst

3 - Removing NAs

Sometimes you will have NAs (or missing data). That might be informative to you, but here we are going to remove missing data using drop_na(), and assign it to a new dataframe called penguins_noNA.

# check dimensions of penguins
dim(penguins)

#> [1] 344   8


# remove NAs
penguins_noNA <- penguins %>%
  drop_na()

dim(penguins_noNA) # we have removed 11 observations

#> [1] 333   8

Note - by removing NAs, we have gotten rid of 11 observations

4 - Bar charts with `geom_col()` and `stat_summary()`

Often, people use bar charts, representing the height or the length of the bar as proportional to the average value that it represents. These charts are sometimes called dynamite plots because they resemble (when they have an error bar with whisker) those cartoon style dynamite sticks. Pow!

However, these bar charts, even if you add a standard deviation/error, really can hide the true distribution of your data, and for this reason, I and others hope you don’t select to make them.

I hope after today, you see that there is always a better chart type to make than a bar chart. But I will show you how to make them anyway.

Before we plot, let’s calculate some summary statistics so we know what we should expect.

# calculating mean bill_length_mm by species
penguins_noNA %>%
  group_by(species) %>%
  summarize(mean_bill_length = mean(bill_length_mm))

#> `summarise()` ungrouping output (override with `.groups` argument)

#> # A tibble: 3 x 2
#>   species   mean_bill_length
#>   <fct>                <dbl>
#> 1 Adelie                38.8
#> 2 Chinstrap             48.8
#> 3 Gentoo                47.6

Just calling geom_col() does not give us what we want. Look at the y-axis scale and how out of line this is with our summary statistics.

# bar plot with geom_col()
# this is wrong!
penguins_noNA %>%
  ggplot(aes(x = species, y = bill_length_mm)) +
  geom_col()

Using geom_col() the right way.

# bar plot, the right way with geom_col()
penguins_noNA %>%
  group_by(species) %>%
  summarize(mean_bill_length = mean(bill_length_mm)) %>%
  ggplot(aes(x = species, y = mean_bill_length)) +
  geom_col()

#> `summarise()` ungrouping output (override with `.groups` argument)

# or you could do this in a less bulky way with stat_summary()
penguins_noNA %>%
  ggplot(aes(x = species, y = bill_length_mm)) +
  stat_summary(fun = "mean", geom = "bar")

5 - Boxplots with `geom_boxplot()`

A boxplot has the benefit of showing you more than the median and the standard deviation, so you can better see the true distribution of your data. In geom_boxplot():

lower whisker = smallest observation greater than or equal to lower hinge - 1.5 * IQR
lower hinge/bottom line of box part of boxplot = 25% quantile
middle = median, 50% quantile
upper hinge/top line of box part of boxplot = 75% quantile
upper whisker = largest observation less than or equal to upper hinge + 1.5 * IQR

# vertical boxplot
penguins_noNA %>%
  ggplot(aes(x = species, y = bill_length_mm)) +
  geom_boxplot()

Adding coord_flip() makes your vertical boxplot horizontal. You could do the same thing by flipping the variables on the x and y mappings.

# horizontal boxplot
penguins_noNA %>%
  ggplot(aes(x = species, y = bill_length_mm)) +
  geom_boxplot() +
  coord_flip()

Look at how much more information we have here than in our bar plots!

5 - Violin plots with `geom_violin()`

A violin plot is boxplot-esque, but shows a mirrored density distribution. This type of plot is useful when you are trying to particularly show data distribution.

Note here I have also mapped species to color, within the aes statement so it will apply globally to this plot.

# violin plot
penguins_noNA %>%
  ggplot(aes(x = species, y = bill_length_mm, color = species)) +
  geom_violin()

Adding geom_point() lets you add another layer of all the actual data points, on top of your violin plot. Remember that this is inherent in the design of ggplot2, that you can layer your plots, of different types, on top of each other.

# violin plot with data points overlaid
penguins_noNA %>%
  ggplot(aes(x = species, y = bill_length_mm, fill = species)) +
  geom_violin() +
  geom_point()

Note, I am now mapping species to fill instead of color. See the difference?

This doesn’t look too good because of overplotting, i.e., the smear of datapoints that doesn’t give you much information about distribution.

We can add geom_jitter() to introduce some small amount of randomness to our points to make us able to see them better. Seeing all your data points also lets the reader easily get a sense of your sample size.

# violin plot with data points jittered
penguins_noNA %>%
  ggplot(aes(x = species, y = bill_length_mm, fill = species)) +
  geom_violin() +
  geom_jitter()

geom_jitter() is a specialized version of geom_point(), but you could replace the geom_jitter() call with geom_point(position = "jitter) and get the same result. You can also use geom_point(position = position_jitterdodge()) if you only want jitter in the x, and don’t want any jitter in the y.

Wow, we now have so much more information about our data!

6 - Dot plots with `geom_dotplot()`

A dot plot plots each individual datapoint, and can stack how you like. These look a lot like the SigmaPlot plots to me.

binaxis can be set to “x” or “y”
stackdir indicates how to stack the dots: “up” (default), “down”, “center”, “centerwhole” (centered, but with dots aligned)
dotsize indicates the size of the dots, with 1 as default

# dotplot
penguins_noNA %>%
  ggplot(aes(x = species, y = bill_length_mm, fill = species)) +
  geom_dotplot(binaxis = "y", stackdir = "center", dotsize = 0.5)

#> `stat_bindot()` using `bins = 30`. Pick better value with `binwidth`.

7 - Density ridge plots with `geom_density_ridges()`

A density ridge plots with geom_density_ridges() requires the packages ggridges, and make multiple density plots in a staggered orientation.

You can adjust scale within geom_density_ridges() to adjust the size of each density plot, though I have left it on the default. Adding alpha sets transparency.

# install.packages("ggridges")
library(ggridges)
penguins_noNA %>%
  ggplot(aes(x = bill_length_mm, y = species, fill = species)) +
  geom_density_ridges(alpha = 0.8)

#> Picking joint bandwidth of 1.08

8 - ggplot is made for layering!

I have shown you a bunch of different plot types, and you can combine many of them together. Here is an example of combining geom_violin() and geom_jitter(), while mapping new variables to aesthetics.

penguins_noNA %>%
  ggplot(aes(x = species, y = bill_length_mm, 
             color = sex, 
             shape = island, 
             group = species)) +
  geom_violin() +
  geom_jitter(position = position_jitterdodge(jitter.width = 2))

9 - Increase clarity and visual appeal

We can quickly make our plot:

prettier by setting a theme
more clear by setting plot labels (eg., axes, titles, legend) with labs

penguins_noNA %>%
  ggplot(aes(x = species, y = bill_length_mm, 
             color = sex, 
             shape = island, 
             group = species)) +
  geom_violin() +
  geom_jitter(position = position_jitterdodge(jitter.width = 2),
              alpha = 0.7) +
  theme_classic() +
  labs(title = "Penguin Bill Length by Species, Sex and Location",
       subtitle = "Collected at Palmer Station, Antarctica",
       x = "Penguin Species", # x axis label
       y = "Bill length (mm)", #  y axis label
       color = "Sex", # legend title
       shape = "Island") # legend title

10 - Breakout rooms!

Main exercises

Get data

We are going to use the NHANES dataset we used in Session 3 on joining. What was that data about again? Let’s refresh our memory.

library(NHANES)
knitr::kable(head(NHANES))

ID	SurveyYr	Gender	Age	AgeDecade	AgeMonths	Race1	Race3	Education	MaritalStatus	HHIncome	HHIncomeMid	Poverty	HomeRooms	HomeOwn	Work	Weight	Length	HeadCirc	Height	BMI	BMICatUnder20yrs	BMI_WHO	Pulse	BPSysAve	BPDiaAve	BPSys1	BPDia1	BPSys2	BPDia2	BPSys3	BPDia3	Testosterone	DirectChol	TotChol	UrineVol1	UrineFlow1	UrineVol2	UrineFlow2	Diabetes	DiabetesAge	HealthGen	DaysPhysHlthBad	DaysMentHlthBad	LittleInterest	Depressed	nPregnancies	nBabies	Age1stBaby	SleepHrsNight	SleepTrouble	PhysActive	PhysActiveDays	TVHrsDay	CompHrsDay	TVHrsDayChild	CompHrsDayChild	Alcohol12PlusYr	AlcoholDay	AlcoholYear	SmokeNow	Smoke100	Smoke100n	SmokeAge	Marijuana	AgeFirstMarij	RegularMarij	AgeRegMarij	HardDrugs	SexEver	SexAge	SexNumPartnLife	SexNumPartYear	SameSex	SexOrientation	PregnantNow
51624	2009_10	male	34	30-39	409	White	NA	High School	Married	25000-34999	30000	1.36	6	Own	NotWorking	87.4	NA	NA	164.7	32.22	NA	30.0_plus	70	113	85	114	88	114	88	112	82	NA	1.29	3.49	352	NA	NA	NA	No	NA	Good	0	15	Most	Several	NA	NA	NA	4	Yes	No	NA	NA	NA	NA	NA	Yes	NA	0	No	Yes	Smoker	18	Yes	17	No	NA	Yes	Yes	16	8	1	No	Heterosexual	NA
51624	2009_10	male	34	30-39	409	White	NA	High School	Married	25000-34999	30000	1.36	6	Own	NotWorking	87.4	NA	NA	164.7	32.22	NA	30.0_plus	70	113	85	114	88	114	88	112	82	NA	1.29	3.49	352	NA	NA	NA	No	NA	Good	0	15	Most	Several	NA	NA	NA	4	Yes	No	NA	NA	NA	NA	NA	Yes	NA	0	No	Yes	Smoker	18	Yes	17	No	NA	Yes	Yes	16	8	1	No	Heterosexual	NA
51624	2009_10	male	34	30-39	409	White	NA	High School	Married	25000-34999	30000	1.36	6	Own	NotWorking	87.4	NA	NA	164.7	32.22	NA	30.0_plus	70	113	85	114	88	114	88	112	82	NA	1.29	3.49	352	NA	NA	NA	No	NA	Good	0	15	Most	Several	NA	NA	NA	4	Yes	No	NA	NA	NA	NA	NA	Yes	NA	0	No	Yes	Smoker	18	Yes	17	No	NA	Yes	Yes	16	8	1	No	Heterosexual	NA
51625	2009_10	male	4	0-9	49	Other	NA	NA	NA	20000-24999	22500	1.07	9	Own	NA	17.0	NA	NA	105.4	15.30	NA	12.0_18.5	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	No	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	4	1	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA
51630	2009_10	female	49	40-49	596	White	NA	Some College	LivePartner	35000-44999	40000	1.91	5	Rent	NotWorking	86.7	NA	NA	168.4	30.57	NA	30.0_plus	86	112	75	118	82	108	74	116	76	NA	1.16	6.70	77	0.094	NA	NA	No	NA	Good	0	10	Several	Several	2	2	27	8	Yes	No	NA	NA	NA	NA	NA	Yes	2	20	Yes	Yes	Smoker	38	Yes	18	No	NA	Yes	Yes	12	10	1	Yes	Heterosexual	NA
51638	2009_10	male	9	0-9	115	White	NA	NA	NA	75000-99999	87500	1.84	6	Rent	NA	29.8	NA	NA	133.1	16.82	NA	12.0_18.5	82	86	47	84	50	84	50	88	44	NA	1.34	4.86	123	1.538	NA	NA	No	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	5	0	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA

# kable just formats as a scrollable table for this website
# you can just use head(NHANES) or glimpse(NHANES)

Exercise 1

Create a new data frame includes the NHANES data only from individuals that are 20 years of age or older, and removes observations where there are NAs for either age subdivided by decade (AgeDecade) or total cholesterol (TotChol).

Hints (click here)

Try using a series of filter() statements. Remember, you can tell filter what you want, or what you don’t want. You can filter for if specific variables have NAs by using is.na() on your variable of interest. Also remember that ! means “not.” You will notice that if you want to use drop_NA() you need to specific which specific variables you want to use, or you will inadvertably drop a lot of observations which have missing data for variables other than those we are plotting..

Solutions (click here)

# here are a few ways to do this
NHANES_over20_noNA <- NHANES %>%
  filter(Age >20) %>%
  drop_na(AgeDecade, TotChol)

dim(NHANES_over20_noNA)

#> [1] 6408   76


NHANES_over20_noNA <- NHANES %>%
  filter(Age >20,
         !is.na(AgeDecade),
         !is.na(TotChol))

dim(NHANES_over20_noNA)

#> [1] 6408   76

Exercise 2

Create a boxplot to show the relationship between total cholesterol (TotChol) and age (AgeDecade).

Hints (click here)

Try geom_boxplot(). Map your variables of interest to the x and y aesthetics. Which you variable you put on x and y will determine if your boxplot is vertical or horizontal.

Solutions (click here)

NHANES_over20_noNA %>%
  ggplot(aes(x = AgeDecade, y = TotChol)) +
  geom_boxplot()

Exercise 3

Take your plot from Exercise 2 and make it a violin plot instead of a boxplot. Then color by age.

Hints (click here)

The geom for a violin plot is geom_violin(). You can change color by mapping to color or to fill.

Solutions (click here)

Note the difference between mapping to color vs. fill.

NHANES_over20_noNA %>%
  ggplot(aes(x = AgeDecade, y = TotChol, color = AgeDecade)) +
  geom_violin()

NHANES_over20_noNA %>%
  ggplot(aes(x = AgeDecade, y = TotChol, fill = AgeDecade)) +
  geom_violin()

Exercise 4

Make add a boxplot to your violin plot from Exercise 3. Adjust the parameters so you the plot looks good to you.

Hints (click here)

In geom_boxplot(), you can adjust the width of the boxplot by setting width = X. A width of 1 is the default.

Solutions (click here)

NHANES_over20_noNA %>%
  ggplot(aes(x = AgeDecade, y = TotChol, color = AgeDecade)) +
  geom_violin() +
  geom_boxplot(width = 0.2)

Exercise 5

Add all of the data points on top of your boxplot from Exercise 2 of total cholesterol by age. Adjust the parameters so you the plot looks good to you. While you are at it, clean up your plot labels and give your plot a title.

Hints (click here)

Remember that ggplot layers your plots, so layers that are further down in your code, will be applied on top of those that come earlier.

Solutions (click here)

geom_boxplot(outlier.shape = NA) removes the outliers from geom_boxplot(), since we are plotting all of the points, we do not want the outliers appearing twice.

NHANES_over20_noNA %>%
  ggplot(aes(x = AgeDecade, y = TotChol, color = AgeDecade)) +
  geom_boxplot(outlier.shape = NA) + 
  geom_jitter(width = 0.3, alpha = 0.1) +
  labs(title = "Total Cholesterol by Age",
       subtitle = "Data from the National Health and Nutrition Examination Survey (NHANES)",
       x = "Age, by Decade",
       y = "Total Cholesterol, mmol/L",
       color = "Age (years)")

Bonus exercises

Bonus 1

Make a density ridge plot for age by total cholesterol.

Hints (click here)

Try geom_density_ridges(), and remember, this is not a part of ggplot2, so be sure to call library(ggridges).

Solutions (click here)

# install.packages("ggridges")
library(ggridges)
NHANES_over20_noNA %>%
  ggplot(aes(x = TotChol, y = AgeDecade, fill = AgeDecade)) +
  geom_density_ridges(alpha = 0.7) 

#> Picking joint bandwidth of 0.224

Bonus 2

Take your density ridge plot from Bonus 1, and try applying a theme from hrbrthemes to it.

Hints (click here)

hrbrthemes is not part of ggplot2 so remember to install the package, and then call library(hrbrthemes). You can google the package to see what all your theme options are. I like theme_ipsum_rc(), try that one if you like!

Solutions (click here)

# install.packages("hrbrthemes")
library(hrbrthemes)

#> NOTE: Either Arial Narrow or Roboto Condensed fonts are required to use these themes.

#>       Please use hrbrthemes::import_roboto_condensed() to install Roboto Condensed and

#>       if Arial Narrow is not on your system, please see https://bit.ly/arialnarrow

NHANES_over20_noNA %>%
  ggplot(aes(x = TotChol, y = AgeDecade, fill = AgeDecade)) +
  geom_density_ridges(alpha = 0.7, scale = 0.9) +
  theme_ipsum_rc() 

#> Picking joint bandwidth of 0.224

Bonus 3

Tidy up your plot from Bonus 2 by giving it a title, axis labels, and try adding the median total cholesterol to each density ridge plot.

Hints (click here)

Using stat_summary() will help you add the median.

Solutions (click here)

theme(axis.title.x = element_text(hjust = 0.5)) makes the x-axis title center justified.
you can change shape within stat_summary() to be anything you like, either an R shape, a specific keyboard key, or even a pasted emoji. The default is a point.
when you set a theme(), anything that comes below will override what code comes previous, so for this reason, if you are going to amend a pre-made theme, first call the pre-made theme, and then make any changes you like below.

NHANES_over20_noNA %>%
  ggplot(aes(x = TotChol, y = AgeDecade, fill = AgeDecade)) +
  geom_density_ridges(alpha = 0.7, scale = 0.9) +
  stat_summary(fun = median) +
  theme_ipsum_rc() +
  theme(axis.title.x = element_text(hjust = 0.5),
        axis.title.y = element_text(hjust = 0.5)) +
  labs(title = "Total Cholesterol by Age",
       subtitle = "Data from the National Health and Nutrition Examination Survey (NHANES)",
       x = "Total Cholesterol, mmol/L",
       y = "Age, by Decade",
       fill = "Age (years)")

#> Picking joint bandwidth of 0.224

#> Warning: Removed 6 rows containing missing values (geom_segment).

Bonus 4

Commonly used cutoffs for cholesterol are: < 5.2 mmol/L is normal, 5.2-6.2 mmol/L is borderline high and > 6.2 mmol is high. Add a vertical cutoff line showing the level below which cholesterol would be considered normal.

Hints (click here)

Using geom_vline() will let you add a vertical line with an xintercept that is appropriate.

Solutions (click here)

NHANES_over20_noNA %>%
  ggplot(aes(x = TotChol, y = AgeDecade, fill = AgeDecade)) +
  geom_density_ridges(alpha = 0.7, scale = 0.9) +
  stat_summary(fun = median) +
  geom_vline(aes(xintercept = 5.2)) +
  theme_ipsum_rc() +
  theme(axis.title.x = element_text(hjust = 0.5),
        axis.title.y = element_text(hjust = 0.5)) +
  labs(title = "Total Cholesterol by Age",
       subtitle = "Data from the National Health and Nutrition Examination Survey (NHANES)",
       caption = "Vertical line indicates upper limit of normal cholesterol",
       x = "Total Cholesterol, mmol/L",
       y = "Age, by Decade",
       fill = "Age (years)")

#> Picking joint bandwidth of 0.224

#> Warning: Removed 6 rows containing missing values (geom_segment).

Jessica Cooperstone

Assistant Professor at HCS

← Session 4: Visualizing Data Session 6: Factors →

Session 5: ggplot2, round 2

Prep homework

Basic computer setup

New to ggplot?

Getting Started

Script for today’s session

1 - Why visualize our data?

2 - Accessing our data

3 - Removing NAs

4 - Bar charts with geom_col() and stat_summary()

5 - Boxplots with geom_boxplot()

5 - Violin plots with geom_violin()

6 - Dot plots with geom_dotplot()

7 - Density ridge plots with geom_density_ridges()

8 - ggplot is made for layering!

9 - Increase clarity and visual appeal

10 - Breakout rooms!

Main exercises

Get data

Exercise 1

Exercise 2

Exercise 3

Exercise 4

Exercise 5

Bonus exercises

Bonus 1

Bonus 2

Bonus 3

Bonus 4

Jessica Cooperstone

Assistant Professor at HCS

4 - Bar charts with `geom_col()` and `stat_summary()`

5 - Boxplots with `geom_boxplot()`

5 - Violin plots with `geom_violin()`

6 - Dot plots with `geom_dotplot()`

7 - Density ridge plots with `geom_density_ridges()`