Data visualization and transformation
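library(tidyverse)  # dplyr verbs and glimpse()
library(openintro)  # assumed source of loans_full_schema

# keep only the eight variables used in this section
loans <- loans_full_schema |>
  select(
    loan_amount, interest_rate, term, grade,
    state, annual_income, homeownership, debt_to_income
  )

glimpse(loans)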
Rows: 10,000
Columns: 8
$ loan_amount <int> 28000, 5000, 2000, 21600, 23000, 50…
$ interest_rate <dbl> 14.07, 12.61, 17.09, 6.72, 14.07, 6…
$ term <dbl> 60, 36, 36, 36, 36, 36, 60, 60, 36,…
$ grade <fct> C, C, D, A, C, A, C, B, C, A, C, B,…
$ state <fct> NJ, HI, WI, PA, CA, KY, MI, AZ, NV,…
$ annual_income <dbl> 90000, 40000, 40000, 30000, 35000, …
$ homeownership <fct> MORTGAGE, RENT, RENT, RENT, RENT, O…
$ debt_to_income <dbl> 18.01, 5.04, 21.15, 10.16, 57.96, 6…
The distribution of loan amounts in this sample is unimodal and right-skewed.
The median loan amount in this sample is $14,500.
In this sample, the middle 50% of the loan amounts are between $8,000 and $24,000.
There are no clear outliers in the loan amounts in this sample.
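The median and the middle 50% quoted above can be computed with summarize(). The following is a minimal sketch, not the original slide code: `loans_demo` is a hypothetical five-row stand-in built from the first loan amounts printed by glimpse(), so that the example runs without the full data set. Its results will therefore differ from the figures above; running the same call on `loans` should reproduce them.

```r
library(dplyr)

# Assumption: a five-row stand-in using the first loan amounts shown by
# glimpse(loans); replace loans_demo with loans to summarize the real sample.
loans_demo <- tibble(loan_amount = c(28000, 5000, 2000, 21600, 23000))

loan_summary <- loans_demo |>
  summarize(
    median_loan_amt = median(loan_amount),
    # lower and upper quartiles bound the middle 50% of the values
    q1_loan_amt = quantile(loan_amount, 0.25, names = FALSE),
    q3_loan_amt = quantile(loan_amount, 0.75, names = FALSE)
  )

loan_summary
```

Each argument to summarize() becomes one column of a one-row tibble, so related statistics can be computed in a single call.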
summarize()
summarize() returns a summary statistic for all observations in the data:
# A tibble: 1 × 1
mean_loan_amt
<dbl>
1 16362.
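The call that produced this output is not shown here. A summarize() call of the following shape would return a one-row tibble with a single mean_loan_amt column. This is a sketch under an assumption: `loans_demo` is a hypothetical five-row stand-in taken from the first loan amounts in the glimpse() output, so its mean differs from the 16362 shown above. The same call on the full `loans` data should give that value.

```r
library(dplyr)

# Assumption: five-row stand-in for loans, built from the first values
# of loan_amount printed by glimpse(loans).
loans_demo <- tibble(loan_amount = c(28000, 5000, 2000, 21600, 23000))

# One summary statistic over all rows gives a 1 x 1 tibble
mean_summary <- loans_demo |>
  summarize(mean_loan_amt = mean(loan_amount))

mean_summary
```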
summarize() will work even if you don’t name your summary statistic, or if you give it a non-informative name, but I don’t recommend it!
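To see why naming matters, compare an unnamed and a named summary. This is a small sketch, again using a hypothetical `loans_demo` stand-in (the first five loan amounts from the glimpse() output) rather than the full data.

```r
library(dplyr)

# Assumption: five-row stand-in for loans, for illustration only.
loans_demo <- tibble(loan_amount = c(28000, 5000, 2000, 21600, 23000))

# Unnamed: the column is labelled with the text of the expression
unnamed <- loans_demo |>
  summarize(mean(loan_amount))

# Named: the column gets a short label that is easy to reuse
named <- loans_demo |>
  summarize(mean_loan_amt = mean(loan_amount))

names(unnamed)
names(named)
```

The unnamed column can still be used, but it has to be wrapped in backticks, as in unnamed$`mean(loan_amount)`, which gets awkward in later pipeline steps.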