Modeling loan interest rates

Introduction

Goal

Practice modeling with multiple predictors using data on loan interest rates.

Packages

The data describe loans from the peer-to-peer lender Lending Club and come from the openintro package. We will use tidyverse for data exploration and tidymodels for modeling.

Data prep

Before we use the dataset, we’ll make a few transformations to it.

  • Review the code below with your neighbor and write a summary of the data transformation pipeline.

Add response here.

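# Load the packages named above (skip if they are already attached)
library(tidyverse)
library(tidymodels)
library(openintro)

loans <- loans_full_schema |>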
  mutate(
    credit_util = total_credit_utilized / total_credit_limit,
    bankruptcy = as.factor(if_else(public_record_bankrupt == 0, 0, 1)),
    verified_income = droplevels(verified_income),
    homeownership = str_to_title(homeownership),
    homeownership = fct_relevel(homeownership, "Rent", "Mortgage", "Own")
  ) |>
  rename(credit_checks = inquiries_last_12m) |>
  select(
    interest_rate, loan_amount, verified_income, 
    debt_to_income, credit_util, bankruptcy, term, 
    credit_checks, issue_month, homeownership
  )

Here is a glimpse at the data:

glimpse(loans)
Rows: 10,000
Columns: 10
$ interest_rate   <dbl> 14.07, 12.61, 17.09, 6.72, 14.07, 6.72, 13.59, 11.99, …
$ loan_amount     <int> 28000, 5000, 2000, 21600, 23000, 5000, 24000, 20000, 2…
$ verified_income <fct> Verified, Not Verified, Source Verified, Not Verified,…
$ debt_to_income  <dbl> 18.01, 5.04, 21.15, 10.16, 57.96, 6.46, 23.66, 16.19, …
$ credit_util     <dbl> 0.54759517, 0.15003472, 0.66134832, 0.19673228, 0.7549…
$ bankruptcy      <fct> 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, …
$ term            <dbl> 60, 36, 36, 36, 36, 36, 60, 60, 36, 36, 60, 60, 36, 60…
$ credit_checks   <int> 6, 1, 4, 0, 7, 6, 1, 1, 3, 0, 4, 4, 8, 6, 0, 0, 4, 6, …
$ issue_month     <fct> Mar-2018, Feb-2018, Feb-2018, Jan-2018, Mar-2018, Jan-…
$ homeownership   <fct> Mortgage, Rent, Rent, Rent, Rent, Own, Mortgage, Mortg…

Analysis

Get to know the data

  • What is a typical interest rate in this dataset? What are some attributes of a typical loan and a typical borrower? Give yourself no more than 5 minutes for this exploration and share 1-2 findings.
# add code here
# add code here

Interest rate vs. credit utilization

  • Fit a regression model for predicting interest rate from credit utilization. Display the summary output.
# add code here
  • Visualize the model.
# add code here
  • Interpret the intercept and the slope.

Intercept: Add response here.

Slope: Add response here.
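
As a syntax reminder only, here is a minimal sketch of the fit-then-summarize pattern in tidymodels. It deliberately uses the built-in mtcars data as a stand-in rather than the loans data, and the object names demo_fit and demo_coefs are hypothetical; adapt the formula and data argument for the exercise above.

```r
library(tidymodels)

# Stand-in example on built-in data (mtcars), not the loans data:
# specify a linear regression and fit it with one numerical predictor.
demo_fit <- linear_reg() |>
  fit(mpg ~ wt, data = mtcars)

# tidy() returns one row per coefficient (intercept and slope),
# with the estimate, standard error, test statistic, and p-value.
demo_coefs <- tidy(demo_fit)
demo_coefs
```

The intercept row is the predicted response when the predictor equals 0, and the slope row is the predicted change in the response for a one-unit increase in the predictor.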

Interest rate vs. homeownership

  • Fit a regression model for predicting interest rate from homeownership and display the summary output.
# add code here
  • Interpret each coefficient in the context of the problem.

    • Intercept: Add response here.

    • Slopes:

      • Add response here.

      • Add response here.

Interest rate vs. credit utilization and homeownership

Main effects model

  • Fit a regression model to predict interest rate from credit utilization and homeownership, without an interaction effect between the two predictors. Display the summary output.
# add code here
  • Write the estimated regression equation for loan applications from each of the homeownership groups separately.
    • Rent: \(add~math~text~here\)
    • Mortgage: \(add~math~text~here\)
    • Own: \(add~math~text~here\)
  • How does the model predict the interest rate to vary as credit utilization varies for loan applicants with different homeownership status? Are the rates the same or different?

Add response here.
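
As a scaffold for the group-specific equations, the general form of a main effects model may help. This is a symbolic sketch, not fitted output: the symbols \(b_0, \dots, b_3\) are placeholders for the estimates in your summary table, and it assumes Rent is the baseline level (it is listed first in the fct_relevel() call in the data prep), so Mortgage and Own enter as 0/1 indicator variables.

\[
\widehat{\text{interest\_rate}} = b_0 + b_1 \, \text{credit\_util} + b_2 \, \text{Mortgage} + b_3 \, \text{Own}
\]

Setting the indicators to 0 or 1 gives one line per group:

\[
\begin{aligned}
\text{Rent:} \quad & \widehat{\text{interest\_rate}} = b_0 + b_1 \, \text{credit\_util} \\
\text{Mortgage:} \quad & \widehat{\text{interest\_rate}} = (b_0 + b_2) + b_1 \, \text{credit\_util} \\
\text{Own:} \quad & \widehat{\text{interest\_rate}} = (b_0 + b_3) + b_1 \, \text{credit\_util}
\end{aligned}
\]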

Interaction effects model

  • Fit a regression model to predict interest rate from credit utilization and homeownership, with an interaction effect between the two predictors. Display the summary output.
# add code here
  • Write the estimated regression equation for loan applications from each of the homeownership groups separately.
    • Rent: \(add~math~text~here\)
    • Mortgage: \(add~math~text~here\)
    • Own: \(add~math~text~here\)
  • How does the model predict the interest rate to vary as credit utilization varies for loan applicants with different homeownership status? Are the rates the same or different?

Add response here.
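
The general form of a model with an interaction between a numerical and a categorical predictor can serve as a scaffold here. This is a symbolic sketch, not fitted output: \(b_0, \dots, b_5\) are placeholders for the estimates in your summary table, and it assumes Rent is the baseline level (it is listed first in the fct_relevel() call in the data prep), with Mortgage and Own as 0/1 indicator variables.

\[
\widehat{\text{interest\_rate}} = b_0 + b_1 \, \text{credit\_util} + b_2 \, \text{Mortgage} + b_3 \, \text{Own} + b_4 \, \text{credit\_util} \times \text{Mortgage} + b_5 \, \text{credit\_util} \times \text{Own}
\]

Setting the indicators to 0 or 1 gives one line per group:

\[
\begin{aligned}
\text{Rent:} \quad & \widehat{\text{interest\_rate}} = b_0 + b_1 \, \text{credit\_util} \\
\text{Mortgage:} \quad & \widehat{\text{interest\_rate}} = (b_0 + b_2) + (b_1 + b_4) \, \text{credit\_util} \\
\text{Own:} \quad & \widehat{\text{interest\_rate}} = (b_0 + b_3) + (b_1 + b_5) \, \text{credit\_util}
\end{aligned}
\]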

Choosing a model

Rule of thumb (Occam’s Razor): don’t over-complicate the situation. Among models that perform comparably well, we prefer the simplest one.

  • Display model level summary statistics.
# add code here
  • What is R-squared? What is adjusted R-squared?

Add response here.

  • Based on the adjusted \(R^2\) values of these two models, which one do we prefer?

Add response here.
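
For reference, a common definition of adjusted \(R^2\) is \(1 - (1 - R^2)\frac{n - 1}{n - k - 1}\), where \(n\) is the number of observations and \(k\) is the number of estimated coefficients other than the intercept. The sketch below shows one way to pull model-level statistics with glance() and to check that formula by hand. It uses the built-in mtcars data as a stand-in rather than the loans data, and every object name in it (cars, main_fit, int_fit, and so on) is hypothetical.

```r
library(tidymodels)

# Stand-in data: treat the number of cylinders as a categorical predictor.
cars <- transform(mtcars, cyl = factor(cyl))

# Main effects (+) versus interaction effects (*) on the stand-in data.
main_fit <- linear_reg() |> fit(mpg ~ wt + cyl, data = cars)
int_fit  <- linear_reg() |> fit(mpg ~ wt * cyl, data = cars)

# glance() returns one row of model-level statistics per model.
main_stats <- glance(main_fit)
int_stats  <- glance(int_fit)

# Adjusted R-squared by hand; df.residual equals n - k - 1.
adj_by_hand <- 1 - (1 - main_stats$r.squared) *
  (main_stats$nobs - 1) / main_stats$df.residual

c(by_hand = adj_by_hand, from_glance = main_stats$adj.r.squared)
```

Plain \(R^2\) cannot decrease when terms are added, so the interaction model always scores at least as high on it; adjusted \(R^2\) applies a penalty for the extra coefficients, which is why it is the statistic to compare across the two models.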

Another model to consider

  • Let’s add one more variable to the model: issue month. Should we add this variable to the interaction effects model from earlier?
# add code here

Add response here.