Data tidying and importing
have multiple data frames
want to join them together
Information on 10 women in science who changed the world:
| name |
|---|
| Ada Lovelace |
| Marie Curie |
| Janaki Ammal |
| Chien-Shiung Wu |
| Katherine Johnson |
| Rosalind Franklin |
| Vera Rubin |
| Gladys West |
| Flossie Wong-Staal |
| Jennifer Doudna |
professions# A tibble: 10 × 2
name profession
<chr> <chr>
1 Ada Lovelace Mathematician
2 Marie Curie Physicist and Chemist
3 Janaki Ammal Botanist
4 Chien-Shiung Wu Physicist
5 Katherine Johnson Mathematician
6 Rosalind Franklin Chemist
7 Vera Rubin Astronomer
8 Gladys West Mathematician
9 Flossie Wong-Staal Virologist and Molecular Biologist
10 Jennifer Doudna Biochemist
datesworks# A tibble: 9 × 2
name known_for
<chr> <chr>
1 Ada Lovelace first computer algorithm
2 Marie Curie theory of radioactivity, discovery of…
3 Janaki Ammal hybrid species, biodiversity protection
4 Chien-Shiung Wu confim and refine theory of radioactiv…
5 Katherine Johnson calculations of orbital mechanics crit…
6 Vera Rubin existence of dark matter
7 Gladys West mathematical modeling of the shape of …
8 Flossie Wong-Staal first scientist to clone HIV and creat…
9 Jennifer Doudna one of the primary developers of CRISP…
# A tibble: 10 × 5
name profession birth_year death_year known_for
<chr> <chr> <dbl> <dbl> <chr>
1 Ada Lovelace Mathemati… NA NA first co…
2 Marie Curie Physicist… NA NA theory o…
3 Janaki Ammal Botanist 1897 1984 hybrid s…
4 Chien-Shiung … Physicist 1912 1997 confim a…
5 Katherine Joh… Mathemati… 1918 2020 calculat…
6 Rosalind Fran… Chemist 1920 1958 <NA>
7 Vera Rubin Astronomer 1928 2016 existenc…
8 Gladys West Mathemati… 1930 NA mathemat…
9 Flossie Wong-… Virologis… 1947 2020 first sc…
10 Jennifer Doud… Biochemist 1964 NA one of t…
left_join(): all rows from xright_join(): all rows from yfull_join(): all rows from both x and ysemi_join(): all rows from x where there are matching values in y, keeping just columns from xinner_join(): all rows from x where there are matching values in y, return all combination of multiple matches in the case of multiple matchesanti_join(): return all rows from x where there are not matching values in y, never duplicate rows of xFor the next few slides…
left_join()left_join()Joining with `by = join_by(name)`
# A tibble: 10 × 4
name profession birth_year death_year
<chr> <chr> <dbl> <dbl>
1 Ada Lovelace Mathematician NA NA
2 Marie Curie Physicist and C… NA NA
3 Janaki Ammal Botanist 1897 1984
4 Chien-Shiung Wu Physicist 1912 1997
5 Katherine Johnson Mathematician 1918 2020
6 Rosalind Franklin Chemist 1920 1958
7 Vera Rubin Astronomer 1928 2016
8 Gladys West Mathematician 1930 NA
9 Flossie Wong-Staal Virologist and … 1947 2020
10 Jennifer Doudna Biochemist 1964 NA
right_join()left_join()Joining with `by = join_by(name)`
# A tibble: 8 × 4
name profession birth_year death_year
<chr> <chr> <dbl> <dbl>
1 Janaki Ammal Botanist 1897 1984
2 Chien-Shiung Wu Physicist 1912 1997
3 Katherine Johnson Mathematician 1918 2020
4 Rosalind Franklin Chemist 1920 1958
5 Vera Rubin Astronomer 1928 2016
6 Gladys West Mathematician 1930 NA
7 Flossie Wong-Staal Virologist and M… 1947 2020
8 Jennifer Doudna Biochemist 1964 NA
full_join()left_join()Joining with `by = join_by(name)`
# A tibble: 10 × 4
name profession birth_year death_year
<chr> <chr> <dbl> <dbl>
1 Ada Lovelace Mathematician NA NA
2 Marie Curie Physicist and C… NA NA
3 Janaki Ammal Botanist 1897 1984
4 Chien-Shiung Wu Physicist 1912 1997
5 Katherine Johnson Mathematician 1918 2020
6 Rosalind Franklin Chemist 1920 1958
7 Vera Rubin Astronomer 1928 2016
8 Gladys West Mathematician 1930 NA
9 Flossie Wong-Staal Virologist and … 1947 2020
10 Jennifer Doudna Biochemist 1964 NA
inner_join()inner_join()Joining with `by = join_by(name)`
# A tibble: 8 × 4
name profession birth_year death_year
<chr> <chr> <dbl> <dbl>
1 Janaki Ammal Botanist 1897 1984
2 Chien-Shiung Wu Physicist 1912 1997
3 Katherine Johnson Mathematician 1918 2020
4 Rosalind Franklin Chemist 1920 1958
5 Vera Rubin Astronomer 1928 2016
6 Gladys West Mathematician 1930 NA
7 Flossie Wong-Staal Virologist and M… 1947 2020
8 Jennifer Doudna Biochemist 1964 NA
semi_join()semi_join()Joining with `by = join_by(name)`
# A tibble: 8 × 2
name profession
<chr> <chr>
1 Janaki Ammal Botanist
2 Chien-Shiung Wu Physicist
3 Katherine Johnson Mathematician
4 Rosalind Franklin Chemist
5 Vera Rubin Astronomer
6 Gladys West Mathematician
7 Flossie Wong-Staal Virologist and Molecular Biologist
8 Jennifer Doudna Biochemist
anti_join()anti_join()Joining with `by = join_by(name)`
Joining with `by = join_by(name)`
# A tibble: 10 × 5
name profession birth_year death_year known_for
<chr> <chr> <dbl> <dbl> <chr>
1 Ada Lovelace Mathemati… NA NA first co…
2 Marie Curie Physicist… NA NA theory o…
3 Janaki Ammal Botanist 1897 1984 hybrid s…
4 Chien-Shiung … Physicist 1912 1997 confim a…
5 Katherine Joh… Mathemati… 1918 2020 calculat…
6 Rosalind Fran… Chemist 1920 1958 <NA>
7 Vera Rubin Astronomer 1928 2016 existenc…
8 Gladys West Mathemati… 1930 NA mathemat…
9 Flossie Wong-… Virologis… 1947 2020 first sc…
10 Jennifer Doud… Biochemist 1964 NA one of t…