Diving deeper with Palmer Penguins
Data visualization and transformation
Data Science with R
Introduction
How do sizes of penguins vary across species, islands, and sexes? What about other characteristics?
Packages
We will use the tidyverse and ggbeeswarm packages for data wrangling and visualization and the palmerpenguins package for the data.
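A minimal sketch of loading these packages, assuming all three are already installed from CRAN (the commented `install.packages()` line is only needed once):

```r
# install.packages(c("tidyverse", "ggbeeswarm", "palmerpenguins"))  # run once if needed

library(tidyverse)       # data wrangling and ggplot2
library(ggbeeswarm)      # beeswarm geoms for ggplot2
library(palmerpenguins)  # provides the penguins data frame
```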
Data
The dataset we will visualize is called penguins. Let’s glimpse() at it.
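One way this first look might go, shown as a sketch: `glimpse()` prints one row per column, with its type and first few values.

```r
library(tidyverse)
library(palmerpenguins)

# Transposed overview: each column's name, type, and first values
glimpse(penguins)
```

The plotting sketches in this document assume that "weight" refers to the `body_mass_g` column (body mass in grams).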
# add code here
Visualizing penguin weights
Single variable
Analyzing a single variable is called univariate analysis.
Create visualizations of the distribution of weights of penguins.
Exercise 1
Make a histogram. Set an appropriate binwidth.
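As a starting point, a histogram could look like the sketch below. The bin width of 250 grams is an assumption, not a recommended value; try a few widths and compare the shapes. The object name `p_hist` is only illustrative.

```r
library(tidyverse)
library(palmerpenguins)

# Histogram of body mass; binwidth is in the units of x (grams)
p_hist <- ggplot(penguins, aes(x = body_mass_g)) +
  geom_histogram(binwidth = 250)

p_hist
```

ggplot2 warns that rows with a missing body mass were dropped; that is expected here.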
# add code here
Exercise 2
Make a boxplot.
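A possible sketch for a single box plot, mapping body mass to the x axis so the box is drawn horizontally (the name `p_box` is illustrative):

```r
library(tidyverse)
library(palmerpenguins)

# One box plot summarizing the whole sample
p_box <- ggplot(penguins, aes(x = body_mass_g)) +
  geom_boxplot()

p_box
```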
# add code here
Exercise 3
Based on these, determine if each of the following statements about the shape of the distribution is true or false.
- The distribution of penguin weights in this sample is left skewed.
- The distribution of penguin weights in this sample is unimodal.
Two variables
Analyzing the relationship between two variables is called bivariate analysis.
Create visualizations of the distribution of weights of penguins by species.
Exercise 4
Make a single histogram. Set an appropriate binwidth.
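One way to show species on a single histogram is to map `species` to `fill`, sketched below. The bin width and transparency are assumptions; by default the bars for the three species are stacked.

```r
library(tidyverse)
library(palmerpenguins)

# Single histogram, bars filled by species (stacked by default)
p_hist_species <- ggplot(penguins, aes(x = body_mass_g, fill = species)) +
  geom_histogram(binwidth = 250, alpha = 0.7)

p_hist_species
```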
# add code here
Exercise 5
Use multiple histograms via faceting, one for each species. Set an appropriate binwidth, add color as you see fit, and turn off legends if not needed.
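A sketch of the faceted version, assuming one panel per species stacked in a single column so the x axes line up. The legend is turned off because the panel titles already name the species.

```r
library(tidyverse)
library(palmerpenguins)

# One histogram per species; the legend would duplicate the facet labels
p_facet <- ggplot(penguins, aes(x = body_mass_g, fill = species)) +
  geom_histogram(binwidth = 250, show.legend = FALSE) +
  facet_wrap(~species, ncol = 1)

p_facet
```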
# add code here
Exercise 6
Use side-by-side box plots. Add color as you see fit and turn off legends if not needed.
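Side-by-side box plots might be sketched like this, with species on the x axis and body mass on the y axis:

```r
library(tidyverse)
library(palmerpenguins)

# One box per species; the x axis already identifies the species
p_boxes <- ggplot(penguins, aes(x = species, y = body_mass_g, fill = species)) +
  geom_boxplot(show.legend = FALSE)

p_boxes
```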
# add code here
Exercise 7
Use density plots. Add color as you see fit.
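For overlapping density curves, a sketch along these lines may help. The `alpha` value is an assumption, chosen so overlapping curves stay visible; the legend is kept because nothing else identifies the species.

```r
library(tidyverse)
library(palmerpenguins)

# Overlaid density curves, semi-transparent so overlaps remain readable
p_density <- ggplot(penguins, aes(x = body_mass_g, fill = species)) +
  geom_density(alpha = 0.5)

p_density
```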
# add code here
Exercise 8
Use violin plots. Add color as you see fit and turn off legends if not needed.
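A violin plot sketch, using the same mappings as the side-by-side box plots:

```r
library(tidyverse)
library(palmerpenguins)

# Violins show the density shape of each species' body mass
p_violin <- ggplot(penguins, aes(x = species, y = body_mass_g, fill = species)) +
  geom_violin(show.legend = FALSE)

p_violin
```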
# add code here
Exercise 9
Make a jittered scatter plot. Add color as you see fit and turn off legends if not needed.
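A jittered scatter plot could be sketched as follows. The jitter width is an assumption, and `height = 0` keeps the body mass values themselves unchanged so only the horizontal position is randomized.

```r
library(tidyverse)
library(palmerpenguins)

# Jitter horizontally only, so the y values remain the measured masses
p_jitter <- ggplot(penguins, aes(x = species, y = body_mass_g, color = species)) +
  geom_jitter(width = 0.2, height = 0, show.legend = FALSE)

p_jitter
```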
# add code here
Exercise 10
Use beeswarm plots. Add color as you see fit and turn off legends if not needed.
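A beeswarm sketch using `geom_beeswarm()` from ggbeeswarm, which places points so they do not overlap instead of jittering them at random:

```r
library(tidyverse)
library(ggbeeswarm)
library(palmerpenguins)

# Points are offset systematically to avoid overlap
p_bee <- ggplot(penguins, aes(x = species, y = body_mass_g, color = species)) +
  geom_beeswarm(show.legend = FALSE)

p_bee
```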
# add code here
Exercise 11
Use multiple geoms on a single plot. Be deliberate about the order of plotting. Change the theme and the color scale of the plot. Finally, add informative labels.
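One possible combination, sketched under several assumptions: box plots drawn first, beeswarm points layered on top, a viridis color scale, and `theme_minimal()`. The label text is illustrative. Layers are drawn in the order they are added, so the points sit above the boxes; `outlier.shape = NA` prevents outliers from being drawn twice.

```r
library(tidyverse)
library(ggbeeswarm)
library(palmerpenguins)

p_multi <- ggplot(penguins, aes(x = species, y = body_mass_g, color = species)) +
  # Drawn first: summary boxes, with outliers hidden because the
  # beeswarm layer already shows every point
  geom_boxplot(outlier.shape = NA, show.legend = FALSE) +
  # Drawn second: individual penguins on top of the boxes
  geom_beeswarm(alpha = 0.6, show.legend = FALSE) +
  scale_color_viridis_d(end = 0.8) +
  theme_minimal() +
  labs(
    title = "Body mass of penguins by species",
    x = "Species",
    y = "Body mass (g)"
  )

p_multi
```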
# add code here
Multiple variables
Analyzing the relationship between three or more variables is called multivariate analysis.
Exercise 12
Facet the plot you created in the previous exercise by island. Adjust labels accordingly.
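Adding a third variable can be as small as one extra layer. The sketch below rebuilds a simplified two-geom plot (an assumption, since your plot from the previous exercise may differ) and facets it by island with `facet_wrap()`.

```r
library(tidyverse)
library(ggbeeswarm)
library(palmerpenguins)

p_island <- ggplot(penguins, aes(x = species, y = body_mass_g, color = species)) +
  geom_boxplot(outlier.shape = NA, show.legend = FALSE) +
  geom_beeswarm(alpha = 0.6, show.legend = FALSE) +
  # One panel per island; species absent from an island leave an empty slot
  facet_wrap(~island) +
  theme_minimal() +
  labs(
    title = "Body mass of penguins by species and island",
    x = "Species",
    y = "Body mass (g)"
  )

p_island
```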
# add code here
Before you continue, let’s turn off all warnings the code chunks generate and resize all figures. We’ll do this by editing the YAML.
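As a sketch, assuming the document is rendered with Quarto, the YAML header at the top of the file might gain options like the following. The figure dimensions (in inches) are placeholder values, not required settings; keep whatever `title` and other fields your header already has.

```yaml
execute:
  warning: false   # suppress warnings from all code chunks
fig-width: 7       # default figure width, in inches (assumed value)
fig-height: 4      # default figure height, in inches (assumed value)
```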
Visualizing other variables
Exercise 13
Pick a single categorical variable from the data set and make a bar plot of its distribution.
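A bar plot sketch, using `island` as the categorical variable (an arbitrary choice; `species` or `sex` would work the same way). `geom_bar()` counts the rows in each category for you.

```r
library(tidyverse)
library(palmerpenguins)

# Bar heights are row counts per island
p_bar <- ggplot(penguins, aes(x = island)) +
  geom_bar()

p_bar
```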
# add code here
Exercise 14
Pick two categorical variables and make a visualization of the relationship between them. Along with your code and output, provide an interpretation of the visualization.
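One common option for two categorical variables is a filled bar plot, sketched here with `island` and `species` as an arbitrary pair. With `position = "fill"`, each bar is rescaled to a total of 1, so the plot compares proportions instead of counts.

```r
library(tidyverse)
library(palmerpenguins)

# Each bar sums to 1, showing the species mix within each island
p_two_cat <- ggplot(penguins, aes(x = island, fill = species)) +
  geom_bar(position = "fill") +
  labs(x = "Island", y = "Proportion", fill = "Species")

p_two_cat
```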
Add interpretation here.
# add code here
Exercise 15
Make another plot that uses at least three variables. At least one should be numeric and at least one categorical. In 1-2 sentences, describe what the plot shows about the relationships between the variables you plotted. Don’t forget to label your code chunk.
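A sketch with two numeric variables and one categorical variable. The `#| label:` line assumes Quarto's chunk-option syntax, and the label `penguins-mass-flipper` is a made-up example; in a plain R script the line is just a comment.

```r
#| label: penguins-mass-flipper

library(tidyverse)
library(palmerpenguins)

# Flipper length vs. body mass, with species shown by color
p_three <- ggplot(
  penguins,
  aes(x = flipper_length_mm, y = body_mass_g, color = species)
) +
  geom_point() +
  labs(
    x = "Flipper length (mm)",
    y = "Body mass (g)",
    color = "Species"
  )

p_three
```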
Add interpretation here.
# add code here