Country populations over time
Introduction
Goal
Our ultimate goal in this application exercise is to make the following data visualization.

Packages
We will use the tidyverse and scales packages for data wrangling and visualization.
library(tidyverse)
library(scales)
Data
These data come from The World Bank and reflect population counts for the years 2000 to 2023. The populations given are mid-year estimates.
population <- read_csv("https://data-science-with-r.github.io/data/population.csv")
Let’s take a look at the data.
population
# A tibble: 217 × 28
series_name series_code country_name country_code `2000` `2001` `2002` `2003`
<chr> <chr> <chr> <chr> <dbl> <dbl> <dbl> <dbl>
1 Population… SP.POP.TOTL Afghanistan AFG 1.95e7 1.97e7 2.10e7 2.26e7
2 Population… SP.POP.TOTL Albania ALB 3.09e6 3.06e6 3.05e6 3.04e6
3 Population… SP.POP.TOTL Algeria DZA 3.08e7 3.12e7 3.16e7 3.21e7
4 Population… SP.POP.TOTL American Sa… ASM 5.82e4 5.83e4 5.82e4 5.79e4
5 Population… SP.POP.TOTL Andorra AND 6.61e4 6.78e4 7.08e4 7.39e4
6 Population… SP.POP.TOTL Angola AGO 1.64e7 1.69e7 1.75e7 1.81e7
7 Population… SP.POP.TOTL Antigua and… ATG 7.51e4 7.62e4 7.72e4 7.81e4
8 Population… SP.POP.TOTL Argentina ARG 3.71e7 3.75e7 3.79e7 3.83e7
9 Population… SP.POP.TOTL Armenia ARM 3.17e6 3.13e6 3.11e6 3.08e6
10 Population… SP.POP.TOTL Aruba ABW 8.91e4 9.07e4 9.18e4 9.27e4
# ℹ 207 more rows
# ℹ 20 more variables: `2004` <dbl>, `2005` <dbl>, `2006` <dbl>, `2007` <dbl>,
# `2008` <dbl>, `2009` <dbl>, `2010` <dbl>, `2011` <dbl>, `2012` <dbl>,
# `2013` <dbl>, `2014` <dbl>, `2015` <dbl>, `2016` <dbl>, `2017` <dbl>,
# `2018` <dbl>, `2019` <dbl>, `2020` <dbl>, `2021` <dbl>, `2022` <dbl>,
# `2023` <dbl>
Analysis
Tidying
- What are the aesthetic mappings in the plot shown above, i.e., what pieces of information do we need represented as columns (variables) in our data frame in order to be able to recreate this plot?
Add response here.
- Reshape the population data such that it can be used to recreate the plot above. Note: For now, you can keep all the countries in the dataset.
# add code here
- What is the type of the year variable? Why? What should it be?
Add response here.
- Start over with pivoting, and this time also make sure year is a numerical variable in the resulting data frame. Save the resulting data frame as population_longer.
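As a hint for the two pivoting steps above, here is a minimal sketch on a made-up toy data frame (`toy_wide` and its values are hypothetical, not the World Bank data). It assumes `pivot_longer()` from tidyr is the intended tool and shows one possible way to control the type of the new column, via the `names_transform` argument.

```r
library(tidyverse)

# Hypothetical toy data with the same wide layout: one column per year
toy_wide <- tibble(
  country_name = c("A", "B"),
  `2000` = c(10, 20),
  `2001` = c(11, 21)
)

# Former column names become values of `year`; they started out as
# column names, so they are stored as character strings
toy_long_chr <- toy_wide |>
  pivot_longer(
    cols = -country_name,
    names_to = "year",
    values_to = "population"
  )
class(toy_long_chr$year)

# names_transform converts those names while pivoting
toy_long_num <- toy_wide |>
  pivot_longer(
    cols = -country_name,
    names_to = "year",
    values_to = "population",
    names_transform = list(year = as.numeric)
  )
class(toy_long_num$year)
```

Converting afterwards with `mutate(year = as.numeric(year))` would be an equally reasonable choice; doing it inside the pivot simply keeps the reshaping in one step.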
# add code here
Visualization
- Now we start making our plot, but let’s not get too fancy right away. Create a line plot of populations of the United States, India, and China over the years. Represent the data with points and lines.
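A rough sketch of the draft plot described above, using a hypothetical long-format toy data frame (`toy_long`, with invented countries "A", "B", and "C") rather than the real data. It assumes the long data has `country_name`, `year`, and `population` columns and filters to a subset of countries before plotting.

```r
library(tidyverse)

# Hypothetical long-format data: one row per country-year pair
toy_long <- tibble(
  country_name = rep(c("A", "B", "C"), each = 3),
  year = rep(c(2000, 2001, 2002), times = 3),
  population = c(10, 11, 12, 20, 19, 18, 5, 6, 7)
)

# Keep only the countries of interest, then layer points and lines
p <- toy_long |>
  filter(country_name %in% c("A", "B")) |>
  ggplot(aes(x = year, y = population, color = country_name)) +
  geom_point() +
  geom_line()
p
```

Mapping `color` to the country in the top-level `aes()` means both the points and the lines inherit it, so each country gets one connected line.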
# add code here
- What aspects of the plot need to be updated to go from the draft you created above to the goal plot at the beginning of this application exercise?
Add response here.
- Use different shapes for each country’s points.
# add code here
- Update the x-axis scale such that the years displayed go from 2000 to 2024 in increments of 4 years.
# add code here
- Update the y-axis so it’s scaled to millions and uses the same breaks as the goal plot.
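The two axis updates above could be approached along these lines. This is only a sketch on hypothetical toy data: the y-axis breaks are placeholders because the goal plot's breaks are not reproduced here, and the " mil" suffix is an assumption about how the labels should read.

```r
library(tidyverse)
library(scales)

# Hypothetical toy data spanning the years of interest
toy_long <- tibble(
  year = seq(2000, 2024, by = 4),
  population = seq(280e6, 340e6, by = 10e6)
)

# Label function that divides by one million before formatting
label_millions <- label_number(scale = 1e-6, suffix = " mil")
label_millions(c(280e6, 300e6))

p <- ggplot(toy_long, aes(x = year, y = population)) +
  geom_line() +
  # Breaks every 4 years from 2000 to 2024
  scale_x_continuous(breaks = seq(2000, 2024, by = 4)) +
  # Placeholder breaks; replace with the breaks seen in the goal plot
  scale_y_continuous(
    breaks = seq(280e6, 340e6, by = 20e6),
    labels = label_millions
  )
p
```

Since the real data end in 2023, a 2024 break would likely only appear if the axis is extended, for example with `limits = c(2000, 2024)` in `scale_x_continuous()`.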
# add code here
- Update colors for each country using the following level / color assignments.
- “United States” = “#0A3161”
- “India” = “#FF671F”
- “China” = “#EE1C25”
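One way to apply the level / color assignments listed above is a named vector passed to `scale_color_manual()`. The sketch below uses a hypothetical toy data frame with arbitrary numbers (not World Bank values); only the color codes come from the list above.

```r
library(tidyverse)

# Named vector: names are the levels, values are the colors from the list
country_colors <- c(
  "United States" = "#0A3161",
  "India" = "#FF671F",
  "China" = "#EE1C25"
)

# Hypothetical toy data; the population values are arbitrary placeholders
toy_long <- tibble(
  country_name = rep(names(country_colors), each = 2),
  year = rep(c(2000, 2004), times = 3),
  population = c(1, 2, 3, 4, 5, 6)
)

p <- ggplot(toy_long, aes(x = year, y = population, color = country_name)) +
  geom_line() +
  scale_color_manual(values = country_colors)
p
```

Naming the vector ties each color to its level explicitly, so the assignment does not depend on the alphabetical order of the countries.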
# add code here
- Update the plot labels (title, subtitle, x, y, and caption) and use theme_minimal().
# add code here
- Finally, move the legend to the top of the plot and remove its label.
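The last two steps (labels and theme, then the legend) might look roughly like the sketch below. The data are a hypothetical toy data frame and every label string is a placeholder, since the goal plot's actual text is not reproduced here.

```r
library(tidyverse)

# Hypothetical toy data
toy_long <- tibble(
  country_name = rep(c("A", "B"), each = 2),
  year = rep(c(2000, 2004), times = 2),
  population = c(1, 2, 3, 4)
)

p <- ggplot(
  toy_long,
  aes(x = year, y = population, color = country_name, shape = country_name)
) +
  geom_point() +
  geom_line() +
  labs(
    title = "Placeholder title",
    subtitle = "Placeholder subtitle",
    x = "Year",
    y = "Population",
    caption = "Placeholder caption",
    # Setting both to NULL removes the legend label and keeps a single legend
    color = NULL,
    shape = NULL
  ) +
  theme_minimal() +
  # Placed after theme_minimal() so the complete theme does not overwrite it
  theme(legend.position = "top")
p
```

When color and shape are both mapped to the same variable, giving them the same label (here, none) lets ggplot2 merge them into one legend; the order of `theme_minimal()` and `theme()` matters because a complete theme resets earlier theme tweaks.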
# add code here