Data tidying and importing
We have data organised in a way that is not ideal for our analysis
We want to reorganise the data so that it suits our analysis
have
# A tibble: 2 × 4
  customer_id item_1 item_2       item_3
        <dbl> <chr>  <chr>        <chr>
1           1 bread  milk         banana
2           2 milk   toilet paper <NA>
want
# A tibble: 6 × 3
  customer_id item_no item
        <dbl> <chr>   <chr>
1           1 item_1  bread
2           1 item_2  milk
3           1 item_3  banana
4           2 item_1  milk
5           2 item_2  toilet paper
6           2 item_3  <NA>
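To follow along, the "have" data can be recreated by hand. This is only a sketch: the slides do not show how `customers` was created, so building it with `tribble()` is an assumption. The object name `customers` is the one used in the pivot_longer() examples.

```r
library(tibble)

# Recreate the "have" data by hand (assumed construction; the slides
# only show the printed tibble, not the code that produced it)
customers <- tribble(
  ~customer_id, ~item_1, ~item_2,        ~item_3,
  1,            "bread", "milk",         "banana",
  2,            "milk",  "toilet paper", NA_character_
)

customers
```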

The goal of tidyr is to help you tidy your data, for example via pivoting between wide and long formats and clarifying how NAs should be treated
Not this…

but this!

wider - more columns
# A tibble: 2 × 4
  customer_id item_1 item_2       item_3
        <dbl> <chr>  <chr>        <chr>
1           1 bread  milk         banana
2           2 milk   toilet paper <NA>
longer - more rows
# A tibble: 6 × 3
  customer_id item_no item
        <dbl> <chr>   <chr>
1           1 item_1  bread
2           1 item_2  milk
3           1 item_3  banana
4           2 item_1  milk
5           2 item_2  toilet paper
6           2 item_3  <NA>
pivot_longer()
pivot_longer(
  data,
  cols,
  names_to = "name",
  values_to = "value"
)
data (as usual)
cols: Columns to pivot into longer format
names_to: Name of the column where column names of pivoted variables go (character string)
values_to: Name of the column where data in pivoted variables go (character string)
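A minimal sketch of what the defaults do, using a small made-up data frame (`scores`, `quiz_a` and `quiz_b` are hypothetical names, not from the slides). When only `data` and `cols` are supplied, the new columns take the default names "name" and "value".

```r
library(tibble)
library(tidyr)

# Hypothetical toy data, not from the slides
scores <- tibble(
  id     = c(1, 2),
  quiz_a = c(10, 20),
  quiz_b = c(30, 40)
)

# names_to and values_to are left at their defaults ("name", "value")
scores_long <- pivot_longer(scores, cols = c(quiz_a, quiz_b))

names(scores_long)
nrow(scores_long)
```

Each of the two pivoted columns contributes one row per `id`, so the long version has twice as many rows as the wide one.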
pivot_longer() in context
customers |>
  pivot_longer(
    cols = item_1:item_3,
    names_to = "item_no",
    values_to = "item"
  )
cols to pivot: item_1 to item_3
pivot_longer() in context: NAs
customers |>
  pivot_longer(
    cols = item_1:item_3,
    names_to = "item_no",
    values_to = "item",
    values_drop_na = TRUE
  )
cols to pivot: item_1 to item_3
names_to: new column called item_no
values_to: new column called item
values_drop_na: drop rows with NAs in the values_to column
# A tibble: 5 × 3
  customer_id item_no item
        <dbl> <chr>   <chr>
1           1 item_1  bread
2           1 item_2  milk
3           1 item_3  banana
4           2 item_1  milk
5           2 item_2  toilet paper
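As a point of comparison, the same five rows can also be reached by pivoting first and removing the missing values afterwards with `tidyr::drop_na()`. This is a sketch of an alternative, not something the slides show, and `customers` is rebuilt here by hand (an assumed construction) so the example runs on its own.

```r
library(tibble)
library(tidyr)

# Assumed construction of the example data
customers <- tribble(
  ~customer_id, ~item_1, ~item_2,        ~item_3,
  1,            "bread", "milk",         "banana",
  2,            "milk",  "toilet paper", NA_character_
)

# Option 1: drop the NA rows while pivoting
dropped_in_pivot <- customers |>
  pivot_longer(
    cols = item_1:item_3,
    names_to = "item_no",
    values_to = "item",
    values_drop_na = TRUE
  )

# Option 2: pivot first, then drop rows where `item` is NA
dropped_afterwards <- customers |>
  pivot_longer(
    cols = item_1:item_3,
    names_to = "item_no",
    values_to = "item"
  ) |>
  drop_na(item)

nrow(dropped_in_pivot)
nrow(dropped_afterwards)
```

Option 1 is more compact. Option 2 keeps the intermediate long table, which can help if you want to inspect the missing rows before discarding them.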
Why pivot? Most likely, because the next step of your analysis needs it
pivot_wider()
pivot_wider(
  data,
  names_from = name,
  values_from = value
)
data (as usual)
names_from: Which column(s) in the long format contain what should become the column names in the wide format
values_from: Which column(s) in the long format contain what should become the values in the new columns in the wide format
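A sketch of pivot_wider() in context, going back from the long purchases table to the wide one. The object names `customers_long` and `customers_wide` are made up for illustration, and the long data is typed in by hand so the example runs on its own.

```r
library(tibble)
library(tidyr)

# Long data with the NA row already dropped (hand-typed for illustration)
customers_long <- tribble(
  ~customer_id, ~item_no, ~item,
  1,            "item_1", "bread",
  1,            "item_2", "milk",
  1,            "item_3", "banana",
  2,            "item_1", "milk",
  2,            "item_2", "toilet paper"
)

# item_no supplies the new column names, item supplies the cell values
customers_wide <- customers_long |>
  pivot_wider(
    names_from = item_no,
    values_from = item
  )

customers_wide
```

Customer 2 has no `item_3` row in the long data, so that cell comes back as NA in the wide result.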

pivot_longer(
  data,
  cols,
  ...,
  cols_vary = "fastest",
  names_to = "name",
  names_prefix = NULL,
  names_sep = NULL,
  names_pattern = NULL,
  names_ptypes = NULL,
  names_transform = NULL,
  names_repair = "check_unique",
  values_to = "value",
  values_drop_na = FALSE,
  values_ptypes = NULL,
  values_transform = NULL
)
pivot_wider(
  data,
  ...,
  id_cols = NULL,
  id_expand = FALSE,
  names_from = name,
  names_prefix = "",
  names_sep = "_",
  names_glue = NULL,
  names_sort = FALSE,
  names_vary = "fastest",
  names_expand = FALSE,
  names_repair = "check_unique",
  values_from = value,
  values_fill = NULL,
  values_fn = NULL,
  unused_fn = NULL
)
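Most of these arguments can be left at their defaults. As one illustration of the optional ones, the sketch below uses `names_prefix` and `names_transform` in pivot_longer() to turn "item_1" into the integer 1, and `names_prefix` in pivot_wider() to put the prefix back. The choice of arguments is mine rather than the slides', and `customers` is again rebuilt by hand (an assumed construction).

```r
library(tibble)
library(tidyr)

# Assumed construction of the example data
customers <- tribble(
  ~customer_id, ~item_1, ~item_2,        ~item_3,
  1,            "bread", "milk",         "banana",
  2,            "milk",  "toilet paper", NA_character_
)

# Strip the "item_" prefix from the old column names, then store the
# remaining "1", "2", "3" as integers in item_no
customers_long <- customers |>
  pivot_longer(
    cols = item_1:item_3,
    names_to = "item_no",
    names_prefix = "item_",
    names_transform = list(item_no = as.integer),
    values_to = "item"
  )

# Going back: add "item_" to the start of every new column name
customers_wide <- customers_long |>
  pivot_wider(
    names_from = item_no,
    values_from = item,
    names_prefix = "item_"
  )

customers_long
customers_wide
```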