Tidy data



Data tidying and importing

Data Science with R

Tidy data

Happy families are all alike; every unhappy family is unhappy in its own way.

Leo Tolstoy

Characteristics of tidy data:

  • Each variable forms a column.
  • Each observation forms a row.
  • Each type of observational unit forms a table.
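
A minimal sketch of these characteristics, assuming the tibble and tidyr packages are installed. The table and its values are made up for illustration: the variable year is stored in the column names, so pivot_longer() is used to give each variable its own column and each observation its own row. The names untidy, tidy, and rate are hypothetical.

```r
library(tibble)
library(tidyr)

# Made-up values: the variable "year" is spread across column names
untidy <- tribble(
  ~country, ~`2010`, ~`2011`,
  "A",          1.2,     1.3,
  "B",          0.4,     0.5
)

# Reshape so each row is one country-year observation
tidy <- untidy |>
  pivot_longer(
    cols = -country,                          # every column except country
    names_to = "year",                        # old column names become a variable
    values_to = "rate",                       # cell values become a variable
    names_transform = list(year = as.integer) # store year as an integer
  )

tidy
```

The reshaped table has three columns (country, year, rate) and four rows, one per country-year pair.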

Characteristics of untidy data:

!@#$%^&*()

What makes this data not tidy?

WW2 Army Air Force combat aircraft

What makes this data not tidy?

Estimated HIV prevalence among 15-49 year olds

What makes this data not tidy?

US Selected Economic Characteristics, ACS 2022

Displaying vs. summarizing data

Display

library(dplyr)  # provides the starwars data and select()

starwars |>
  select(name, height, gender)
# A tibble: 87 × 3
   name               height gender   
   <chr>               <int> <chr>    
 1 Luke Skywalker        172 masculine
 2 C-3PO                 167 masculine
 3 R2-D2                  96 masculine
 4 Darth Vader           202 masculine
 5 Leia Organa           150 feminine 
 6 Owen Lars             178 masculine
 7 Beru Whitesun Lars    165 feminine 
 8 R5-D4                  97 masculine
 9 Biggs Darklighter     183 masculine
10 Obi-Wan Kenobi        182 masculine
# ℹ 77 more rows

Summarize

starwars |>
  group_by(gender) |>
  summarize(
    n = n(),                                 # number of rows in each group
    avg_height = mean(height, na.rm = TRUE)  # mean height, ignoring missing values
  )
# A tibble: 3 × 3
  gender        n avg_height
  <chr>     <int>      <dbl>
1 feminine     17       167.
2 masculine    66       177.
3 <NA>          4       175
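
The same group-then-summarize pattern can be sketched on a small table whose result is easy to check by hand. This assumes the dplyr and tibble packages are installed. The pets data and the name result are made up for illustration.

```r
library(dplyr)
library(tibble)

# Made-up data, small enough to verify by hand
pets <- tibble(
  species = c("cat", "cat", "dog", "dog", "dog"),
  weight  = c(4, 5, 10, NA, 20)
)

result <- pets |>
  group_by(species) |>
  summarize(
    n = n(),                                 # counts every row, including the NA weight
    avg_weight = mean(weight, na.rm = TRUE)  # averages only the non-missing weights
  )

result
```

Here n() counts all three dog rows, while the mean uses only the two dogs with a recorded weight, so the count and the average can be based on different numbers of rows.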