Data tidying and importing
Rows: 1000 Columns: 19
── Column specification ────────────────────────────────────
Delimiter: ","
chr (17): first_name, last_name, born, died, born_countr...
dbl (2): id, year
ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
# A tibble: 1,000 × 19
id first_name last_name born died born_country
<dbl> <chr> <chr> <chr> <chr> <chr>
1 160 Jacobus H. van 't Hoff 1852… 3/1/… the Netherl…
2 569 Sully Prudhomme 1839… 9/7/… France
3 293 Emil von Behring 1854… 3/31… Prussia (no…
4 462 Henry Dunant 1828… 10/3… Switzerland
5 1 Wilhelm Conrad Röntgen 1845… 2/10… Prussia (no…
6 463 Frédéric Passy 1822… 6/12… France
7 464 Élie Ducommun 1833… 12/7… Switzerland
8 465 Albert Gobat 1843… 3/16… Switzerland
9 294 Ronald Ross 1857… 9/16… India
10 161 Emil Fischer 1852… 7/15… Prussia (no…
# ℹ 990 more rows
# ℹ 13 more variables: born_country_code <chr>,
# born_city <chr>, died_country <chr>,
# died_country_code <chr>, died_city <chr>, gender <chr>,
# year <dbl>, category <chr>, overall_motivation <chr>,
# motivation <chr>, organization_name <chr>,
# organization_city <chr>, organization_country <chr>
Write a file:
Read it back in to inspect:
Rows: 3 Columns: 2
── Column specification ────────────────────────────────────
Delimiter: ","
chr (1): y
dbl (1): x
ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
# A tibble: 3 × 2
x y
<dbl> <chr>
1 1 a
2 2 b
3 3 c
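
The write-then-read round trip above can be sketched as follows. This is a minimal sketch, not the original slide code: the data frame `df` is rebuilt from the printed output, and the temporary file path is a hypothetical stand-in for wherever the file was actually written.

```r
library(readr)
library(tibble)

# Small data frame matching the 3 x 2 tibble printed above (assumed contents)
df <- tribble(
  ~x, ~y,
  1,  "a",
  2,  "b",
  3,  "c"
)

# Hypothetical location; a temporary file keeps the sketch self-contained
path <- tempfile(fileext = ".csv")

# Write the data frame to disk as CSV
write_csv(df, file = path)

# Read it back in; show_col_types = FALSE quiets the column specification message
df_back <- read_csv(path, show_col_types = FALSE)
df_back
```

Because a CSV file stores no type information, `read_csv()` has to guess the column types again on the way back in, which is why `x` returns as double and `y` as character.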
[1] "ID" "Price"
[3] "neighbourhood" "accommodates"
[5] "Number of bathrooms" "Number of Bedrooms"
[7] "n beds" "Review Scores Rating"
[9] "Number of reviews" "listing_url"
Which type is x? Why?
NAs
# A tibble: 9 × 3
x y z
<dbl> <chr> <chr>
1 1 a hi
2 NA b hello
3 3 <NA> <NA>
4 4 d ola
5 5 e hola
6 NA f whatup
7 7 g wassup
8 8 h sup
9 9 i <NA>
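
In the output above, `x` is a double and entries such as "Not applicable" and 9999 appear as `NA`, which is consistent with declaring extra missing-value strings through the `na` argument of `read_csv()`. The sketch below assumes the file contents can be reconstructed from the printed output and the warnings; the literal CSV text is an illustrative stand-in for the data file, not the file itself.

```r
library(readr)

# Assumed file contents, reconstructed from the printed tibbles
csv_text <- "x,y,z
1,a,hi
NA,b,hello
3,Not applicable,9999
4,d,ola
5,e,hola
.,f,whatup
7,g,wassup
8,h,sup
9,i,
"

# I() marks the string as literal data rather than a file path.
# Every string listed in `na` is read as a missing value.
df_na <- read_csv(
  I(csv_text),
  na = c("", "NA", ".", "9999", "Not applicable"),
  show_col_types = FALSE
)
df_na
```

With "." declared as missing, every remaining value in `x` is numeric, so readr guesses double for that column instead of character.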

Warning: One or more parsing issues, call `problems()` on your data
frame for details, e.g.:
dat <- vroom(...)
problems(dat)
# A tibble: 9 × 3
x y z
<dbl> <chr> <chr>
1 1 a hi
2 NA b hello
3 3 Not applicable 9999
4 4 d ola
5 5 e hola
6 NA f whatup
7 7 g wassup
8 8 h sup
9 9 i <NA>
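
The parsing warning and the output above are what one would expect when column types are stated explicitly instead of listing missing-value strings. A minimal sketch, again assuming file contents reconstructed from the printed output (the literal CSV text is a stand-in for the data file):

```r
library(readr)

# Assumed file contents, reconstructed from the printed tibbles
csv_text <- "x,y,z
1,a,hi
NA,b,hello
3,Not applicable,9999
4,d,ola
5,e,hola
.,f,whatup
7,g,wassup
8,h,sup
9,i,
"

# Force x to be read as double. The "." in row 6 cannot be parsed as a
# number, so it becomes NA and readr reports a parsing issue.
df_types <- read_csv(
  I(csv_text),
  col_types = cols(
    x = col_double(),
    y = col_character(),
    z = col_character()
  )
)
df_types

# Inspect which cells failed to parse
problems(df_types)
```

Unlike the `na` approach, "Not applicable" and "9999" survive as ordinary character values here, because only the type of each column was specified, not what counts as missing.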

read_csv()

| type function | data type |
|---|---|
| col_character() | character |
| col_date() | date |
| col_datetime() | POSIXct (date-time) |
| col_double() | double (numeric) |
| col_factor() | factor |
| col_guess() | let readr guess (default) |
| col_integer() | integer |
| col_logical() | logical |
| col_number() | numbers mixed with non-number characters |
| col_numeric() | double or integer |
| col_skip() | do not read |
| col_time() | time |
# A tibble: 1,000 × 19
id first_name last_name born died
<dbl> <chr> <chr> <chr> <dttm>
1 160 Jacobus H. van 't Ho… 1852… 1911-03-01 00:00:00
2 569 Sully Prudhomme 1839… 1907-09-07 00:00:00
3 293 Emil von Behri… 1854… 1917-03-31 00:00:00
4 462 Henry Dunant 1828… 1910-10-30 00:00:00
5 1 Wilhelm Conrad Röntgen 1845… 1923-02-10 00:00:00
6 463 Frédéric Passy 1822… 1912-06-12 00:00:00
7 464 Élie Ducommun 1833… 1906-12-07 00:00:00
8 465 Albert Gobat 1843… 1914-03-16 00:00:00
9 294 Ronald Ross 1857… 1932-09-16 00:00:00
10 161 Emil Fischer 1852… 1919-07-15 00:00:00
# ℹ 990 more rows
# ℹ 14 more variables: born_country <chr>,
# born_country_code <chr>, born_city <chr>,
# died_country <chr>, died_country_code <chr>,
# died_city <chr>, gender <chr>, year <dbl>,
# category <chr>, overall_motivation <chr>,
# motivation <chr>, organization_name <chr>, …
edibnb_col_names <- read_excel(
  "data/edibnb-bad-names.xlsx",
  # skip the file's own header row so it is not read in as data
  skip = 1,
  col_names = c(
    "id", "price", "neighbourhood", "accommodates", "bathroom",
    "bedroom", "bed", "review_scores_rating", "n_reviews", "url"
  )
)
names(edibnb_col_names)
[1] "id" "price"
[3] "neighbourhood" "accommodates"
[5] "bathroom" "bedroom"
[7] "bed" "review_scores_rating"
[9] "n_reviews" "url"
NAs
Warning: Expecting numeric in A3 / R3C1: got 'NA'
Warning: Expecting numeric in A7 / R7C1: got '.'
# A tibble: 9 × 3
x y z
<dbl> <chr> <chr>
1 1 a hi
2 NA b hello
3 3 Not applicable 9999
4 4 d ola
5 5 e hola
6 NA f whatup
7 7 g wassup
8 8 h sup
9 9 i <NA>
read_excel()

| type function | data type |
|---|---|
| "skip" | do not read |
| "guess" | let readxl guess (default) |
| "logical" | logical |
| "numeric" | numeric |
| "date" | POSIXct (date-time) |
| "text" | character |
| "list" | a list of length 1 vectors |
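
The type strings in the table above are passed to `read_excel()` through its `col_types` argument, one per column. The sketch below is illustrative only: it uses the example workbook that ships with readxl (`datasets.xlsx`, sheet `iris`) rather than any file from these slides, so that it can be run as is.

```r
library(readxl)

# Example workbook bundled with readxl (not a file from these slides)
path <- readxl_example("datasets.xlsx")

# One type string per column of the sheet:
# keep two measurements as numeric, skip two, read the species as text
iris_xl <- read_excel(
  path,
  sheet = "iris",
  col_types = c("numeric", "numeric", "skip", "skip", "text")
)
iris_xl
```

Columns marked "skip" are dropped on import, so the result has three columns instead of the sheet's five.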