Building a plot step-by-step with ggplot2



Data visualization and transformation

Data Science with R

ggplot2

ggplot2 \(\in\) tidyverse

  • ggplot2 is tidyverse’s data visualization package

    library(tidyverse)

  • gg in ggplot2 stands for Grammar of Graphics, inspired by the book The Grammar of Graphics by Leland Wilkinson

  • Package website: ggplot2.tidyverse.org

  • Structure of the code for plots can be summarized as

    ggplot(
      data = [dataset], 
      mapping = aes(x = [x-variable], y = [y-variable])
      ) +
      geom_xxx() +
      other options

Mass vs. height

ggplot(starwars, aes(x = height, y = mass)) +
  geom_point() +
  labs(
    title = "Mass vs. height of Star Wars characters",
    x = "Height (cm)", 
    y = "Mass (kg)"
  )
Warning: Removed 28 rows containing missing values or values outside
the scale range (`geom_point()`).
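
The warning above comes from characters whose height or mass is missing. As a rough check, assuming the starwars data frame that ships with dplyr (loaded as part of the tidyverse), the affected rows can be counted before plotting. The name n_incomplete is only illustrative.

```r
library(dplyr)

# Count characters that lack a height or a mass.
# geom_point() cannot place these rows, so ggplot2 drops them with a warning.
n_incomplete <- starwars |>
  filter(is.na(height) | is.na(mass)) |>
  nrow()

n_incomplete
```

The count should match the number reported in the warning, although it may differ across versions of the dataset.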

ggplot2 ❤️ 🐧

Data: Palmer Penguins

Measurements for penguins in the Palmer Archipelago: species, island, size (flipper length, body mass, bill dimensions), and sex.

Data

library(palmerpenguins)
glimpse(penguins)
Rows: 344
Columns: 8
$ species           <fct> Adelie, Adelie, Adelie, Adelie, …
$ island            <fct> Torgersen, Torgersen, Torgersen,…
$ bill_length_mm    <dbl> 39.1, 39.5, 40.3, NA, 36.7, 39.3…
$ bill_depth_mm     <dbl> 18.7, 17.4, 18.0, NA, 19.3, 20.6…
$ flipper_length_mm <int> 181, 186, 195, NA, 193, 190, 181…
$ body_mass_g       <int> 3750, 3800, 3250, NA, 3450, 3650…
$ sex               <fct> male, female, female, NA, female…
$ year              <int> 2007, 2007, 2007, 2007, 2007, 20…
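
Since species is the variable mapped to color later on, it can help to see how many penguins fall into each group. A small sketch using count() from dplyr; species_counts is an illustrative name.

```r
library(dplyr)
library(palmerpenguins)

# One row per species, with the number of penguins in column n
species_counts <- penguins |>
  count(species)

species_counts
```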

Plot

Code

ggplot(
  data = penguins,
  mapping = aes(
    x = bill_depth_mm, y = bill_length_mm,
    color = species
  )
) +
  geom_point() +
  labs(
    title = "Bill depth and length",
    subtitle = "Dimensions for Adelie, Chinstrap, and Gentoo Penguins",
    x = "Bill depth (mm)", 
    y = "Bill length (mm)",
    color = "Species",
    caption = "Source: Palmer Station LTER / palmerpenguins package"
  )
Warning: Removed 2 rows containing missing values or values outside
the scale range (`geom_point()`).
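
If the missing bill measurements are expected, one option is to drop them silently. A minimal sketch using the na.rm argument of geom_point(), which removes incomplete rows without issuing the warning; p_quiet is an illustrative name.

```r
library(ggplot2)
library(palmerpenguins)

p_quiet <- ggplot(
  data = penguins,
  mapping = aes(x = bill_depth_mm, y = bill_length_mm, color = species)
) +
  geom_point(na.rm = TRUE) # drop rows with missing values without a warning

p_quiet
```

Silencing the warning hides the fact that some rows were dropped, so this is probably best reserved for cases where the missing values have already been checked.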

Coding out loud

Step 1

Start with the penguins data frame.

ggplot(
  data = penguins
  )

Step 2

Start with the penguins data frame, map bill depth to the x-axis.

ggplot(
  data = penguins,
  mapping = aes(x = bill_depth_mm)
  )

Step 3

Start with the penguins data frame, map bill depth to the x-axis and map bill length to the y-axis.

ggplot(
  data = penguins,
  mapping = aes(
    x = bill_depth_mm,
    y = bill_length_mm
    )
  )

Step 4

Start with the penguins data frame, map bill depth to the x-axis and map bill length to the y-axis. Represent each observation with a point.

ggplot(
  data = penguins,
  mapping = aes(
    x = bill_depth_mm,
    y = bill_length_mm
    )
  ) +
  geom_point()

Step 5

Start with the penguins data frame, map bill depth to the x-axis and map bill length to the y-axis. Represent each observation with a point and map species to the color of each point.

ggplot(
  data = penguins,
  mapping = aes(
    x = bill_depth_mm,
    y = bill_length_mm,
    color = species
    )
  ) +
  geom_point()

Step 6

Start with the penguins data frame, map bill depth to the x-axis and map bill length to the y-axis. Represent each observation with a point and map species to the color of each point. Title the plot “Bill depth and length”.

ggplot(
  data = penguins,
  mapping = aes(x = bill_depth_mm, y = bill_length_mm, color = species)
  ) +
  geom_point() +
  labs(
    title = "Bill depth and length"
  )

Step 7

Start with the penguins data frame, map bill depth to the x-axis and map bill length to the y-axis. Represent each observation with a point and map species to the color of each point. Title the plot “Bill depth and length”, add the subtitle “Dimensions for Adelie, Chinstrap, and Gentoo Penguins”.

ggplot(
  data = penguins,
  mapping = aes(x = bill_depth_mm, y = bill_length_mm, color = species)
  ) +
  geom_point() +
  labs(
    title = "Bill depth and length",
    subtitle = "Dimensions for Adelie, Chinstrap, and Gentoo Penguins"
  )

Step 8

Start with the penguins data frame, map bill depth to the x-axis and map bill length to the y-axis. Represent each observation with a point and map species to the color of each point. Title the plot “Bill depth and length”, add the subtitle “Dimensions for Adelie, Chinstrap, and Gentoo Penguins”, label the x and y axes as “Bill depth (mm)” and “Bill length (mm)”, respectively.

ggplot(
  data = penguins,
  mapping = aes(x = bill_depth_mm, y = bill_length_mm, color = species)
  ) +
  geom_point() +
  labs(
    title = "Bill depth and length",
    subtitle = "Dimensions for Adelie, Chinstrap, and Gentoo Penguins",
    x = "Bill depth (mm)", 
    y = "Bill length (mm)"
  )

Step 9

Start with the penguins data frame, map bill depth to the x-axis and map bill length to the y-axis. Represent each observation with a point and map species to the color of each point. Title the plot “Bill depth and length”, add the subtitle “Dimensions for Adelie, Chinstrap, and Gentoo Penguins”, label the x and y axes as “Bill depth (mm)” and “Bill length (mm)”, respectively, label the legend “Species”.

ggplot(
  data = penguins,
  mapping = aes(x = bill_depth_mm, y = bill_length_mm, color = species)
  ) +
  geom_point() +
  labs(
    title = "Bill depth and length",
    subtitle = "Dimensions for Adelie, Chinstrap, and Gentoo Penguins",
    x = "Bill depth (mm)", y = "Bill length (mm)",
    color = "Species"
  )

Step 10

Start with the penguins data frame, map bill depth to the x-axis and map bill length to the y-axis. Represent each observation with a point and map species to the color of each point. Title the plot “Bill depth and length”, add the subtitle “Dimensions for Adelie, Chinstrap, and Gentoo Penguins”, label the x and y axes as “Bill depth (mm)” and “Bill length (mm)”, respectively, label the legend “Species”, and add a caption for the data source.

ggplot(
  data = penguins,
  mapping = aes(x = bill_depth_mm, y = bill_length_mm, color = species)
  ) +
  geom_point() +
  labs(
    title = "Bill depth and length",
    subtitle = "Dimensions for Adelie, Chinstrap, and Gentoo Penguins",
    x = "Bill depth (mm)", y = "Bill length (mm)",
    color = "Species",
    caption = "Source: Palmer Station LTER / palmerpenguins package"
  )

Step 11

Start with the penguins data frame, map bill depth to the x-axis and map bill length to the y-axis. Represent each observation with a point and map species to the color of each point. Title the plot “Bill depth and length”, add the subtitle “Dimensions for Adelie, Chinstrap, and Gentoo Penguins”, label the x and y axes as “Bill depth (mm)” and “Bill length (mm)”, respectively, label the legend “Species”, and add a caption for the data source. Finally, use a discrete color scale that is designed to be perceived by viewers with common forms of color blindness.

ggplot(
  data = penguins,
  mapping = aes(x = bill_depth_mm, y = bill_length_mm, color = species)
  ) +
  geom_point() +
  labs(
    title = "Bill depth and length",
    subtitle = "Dimensions for Adelie, Chinstrap, and Gentoo Penguins",
    x = "Bill depth (mm)", y = "Bill length (mm)",
    color = "Species",
    caption = "Source: Palmer Station LTER / palmerpenguins package"
  ) +
  scale_color_viridis_d()
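
The viridis scales include several palettes, selected with the option argument. As a hedged variation on Step 11, the sketch below uses the "plasma" palette and trims its lightest end with end = 0.9; both choices are illustrative and not part of the original plot, and p_plasma is a made-up name.

```r
library(ggplot2)
library(palmerpenguins)

p_plasma <- ggplot(
  data = penguins,
  mapping = aes(x = bill_depth_mm, y = bill_length_mm, color = species)
) +
  geom_point(na.rm = TRUE) +
  # "plasma" is one of the palettes built into the viridis scales;
  # end = 0.9 stops short of the lightest color, which may be faint on a light background
  scale_color_viridis_d(option = "plasma", end = 0.9)

p_plasma
```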

Plot

Warning: Removed 2 rows containing missing values or values outside
the scale range (`geom_point()`).

Code

ggplot(
  data = penguins,
  mapping = aes(x = bill_depth_mm, y = bill_length_mm, color = species)
  ) +
  geom_point() +
  labs(
    title = "Bill depth and length",
    subtitle = "Dimensions for Adelie, Chinstrap, and Gentoo Penguins",
    x = "Bill depth (mm)", y = "Bill length (mm)",
    color = "Species",
    caption = "Source: Palmer Station LTER / palmerpenguins package"
  ) +
  scale_color_viridis_d()

Narrative

Start with the penguins data frame, map bill depth to the x-axis and map bill length to the y-axis.

Represent each observation with a point and map species to the color of each point.

Title the plot “Bill depth and length”, add the subtitle “Dimensions for Adelie, Chinstrap, and Gentoo Penguins”, label the x and y axes as “Bill depth (mm)” and “Bill length (mm)”, respectively, label the legend “Species”, and add a caption for the data source.

Finally, use a discrete color scale that is designed to be perceived by viewers with common forms of color blindness.
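
The narrative can also be followed literally in code by storing the plot in an object and adding one piece at a time. This is a sketch of the same plot, not a different technique; the object name p is arbitrary.

```r
library(ggplot2)
library(palmerpenguins)

# Steps 1-3 and 5: data frame plus the x, y, and color mappings
p <- ggplot(
  data = penguins,
  mapping = aes(x = bill_depth_mm, y = bill_length_mm, color = species)
)

# Step 4: represent each observation with a point
p <- p + geom_point()

# Steps 6-10: title, subtitle, axis labels, legend title, and caption
p <- p + labs(
  title = "Bill depth and length",
  subtitle = "Dimensions for Adelie, Chinstrap, and Gentoo Penguins",
  x = "Bill depth (mm)", y = "Bill length (mm)",
  color = "Species",
  caption = "Source: Palmer Station LTER / palmerpenguins package"
)

# Step 11: discrete viridis color scale
p <- p + scale_color_viridis_d()

p
```

Printing p after any intermediate assignment shows the plot as it stands at that step.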

Argument names

You can omit the names of the first two arguments, data and mapping, when building plots with ggplot().

ggplot(
  data = penguins,
  mapping = aes(
    x = bill_depth_mm,
    y = bill_length_mm
    )
  ) +
  geom_point()

ggplot(
  penguins,
  aes(
    x = bill_depth_mm,
    y = bill_length_mm
    )
  ) +
  geom_point()
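
The same positional matching applies inside aes(), whose first two arguments are x and y. The sketch below goes one step further than the slide by dropping those names as well, then checks that both versions produce the same layer data; p_named and p_positional are illustrative names.

```r
library(ggplot2)
library(palmerpenguins)

# Every argument named
p_named <- ggplot(
  data = penguins,
  mapping = aes(x = bill_depth_mm, y = bill_length_mm)
) +
  geom_point()

# Positional matching: data, then mapping, then x and y inside aes()
p_positional <- ggplot(penguins, aes(bill_depth_mm, bill_length_mm)) +
  geom_point()

# Compare the data each plot passes to its point layer
all.equal(layer_data(p_named), layer_data(p_positional))
```

Keeping the names for less common arguments, such as color, tends to make the code easier to read.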