Modeling fish
Introduction
Goal
Practice modeling using the fish dataset, which contains measurements of two common fish species from fish market sales.
Packages
We will use the tidyverse package for data wrangling and visualization and the tidymodels package for modeling.
Data
These data come from Kaggle and are commonly used in machine learning examples.
library(tidyverse)
fish <- read_csv("https://data-science-with-r.github.io/data/fish.csv")
The data dictionary is below:
| variable | description |
|---|---|
| species | Species name of fish |
| weight | Weight, in grams |
| length_vertical | Vertical length, in cm |
| length_diagonal | Diagonal length, in cm |
| length_cross | Cross length, in cm |
| height | Height, in cm |
| width | Diagonal width, in cm |
Let’s take a look at the data.
fish
# A tibble: 55 × 7
species weight length_vertical length_diagonal length_cross height width
<chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 Bream 242 23.2 25.4 30 11.5 4.02
2 Bream 290 24 26.3 31.2 12.5 4.31
3 Bream 340 23.9 26.5 31.1 12.4 4.70
4 Bream 363 26.3 29 33.5 12.7 4.46
5 Bream 430 26.5 29 34 12.4 5.13
6 Bream 450 26.8 29.7 34.7 13.6 4.93
7 Bream 500 26.8 29.7 34.5 14.2 5.28
8 Bream 390 27.6 30 35 12.7 4.69
9 Bream 450 27.6 30 35.1 14.0 4.84
10 Bream 500 28.5 30.7 36.2 14.2 4.96
# ℹ 45 more rows
Analysis
Visualizing the model
We’re going to investigate the relationship between the weights and heights of fish, predicting weight from height.
- Create an appropriate plot to investigate this relationship. Add appropriate labels to the plot.
# add code here
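One possible approach, as a sketch: a scatterplot with height (the predictor) on the x-axis and weight (the outcome) on the y-axis. The object name `p` and the title wording are illustrative choices, not part of the original worksheet.

```r
library(tidyverse)

# Load the data (same URL as in the Data section)
fish <- read_csv("https://data-science-with-r.github.io/data/fish.csv", show_col_types = FALSE)

# Scatterplot: predictor (height) on x, outcome (weight) on y
p <- ggplot(fish, aes(x = height, y = weight)) +
  geom_point() +
  labs(
    x = "Height (cm)",
    y = "Weight (grams)",
    title = "Weights vs. heights of fish"
  )
p
```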
- If you were to draw a straight line to best represent the relationship between the heights and weights of fish, where would it go? Why?
Add response here.
- Now, let R draw the line for you! Hint: Use geom_smooth() with method = "lm" so that the line is straight.
# add code here
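A sketch of one way to do this. Setting method = "lm" matters here: when no method is given, geom_smooth() uses loess for fewer than 1,000 observations, which would draw a curve rather than a straight line. se = FALSE (an optional choice) hides the confidence band.

```r
library(tidyverse)

fish <- read_csv("https://data-science-with-r.github.io/data/fish.csv", show_col_types = FALSE)

# Scatterplot plus a least squares straight line (method = "lm")
p <- ggplot(fish, aes(x = height, y = weight)) +
  geom_point() +
  geom_smooth(method = "lm", se = FALSE) +
  labs(x = "Height (cm)", y = "Weight (grams)")
p
```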
- What types of questions can this plot help answer?
Add response here.
- We can use this line to make predictions. Predict what you think the weight of a fish would be with a height of 10 cm, 15 cm, and 20 cm. Which prediction is considered extrapolation?
Add response here.
- What is a residual?
Add response here.
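For reference, the usual definition in symbols, where $y_i$ is the observed weight of fish $i$ and $\hat{y}_i$ is the weight the line predicts at that fish's height:

```latex
e_i = y_i - \hat{y}_i
```

A positive residual means the fish is heavier than the line predicts; a negative residual means it is lighter.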
Model fitting
- Fit a model to predict fish weights from their heights.
# add code here
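A minimal sketch using tidymodels (parsnip's linear_reg() with the "lm" engine). The name fish_hw_fit is illustrative; any name works.

```r
library(tidyverse)
library(tidymodels)

fish <- read_csv("https://data-science-with-r.github.io/data/fish.csv", show_col_types = FALSE)

# Specify a linear regression with the "lm" engine, then fit weight ~ height
fish_hw_fit <- linear_reg() |>
  set_engine("lm") |>
  fit(weight ~ height, data = fish)

fish_hw_fit
```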
- Predict what the weight of a fish would be with a height of 10 cm, 15 cm, and 20 cm using this model.
# add code here
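One way to get these predictions, sketched under the same assumptions as the model-fitting step (illustrative object names). predict() returns a tibble with a .pred column, which is bound next to the heights for readability. To judge which prediction is extrapolation, compare each height with range(fish$height).

```r
library(tidyverse)
library(tidymodels)

fish <- read_csv("https://data-science-with-r.github.io/data/fish.csv", show_col_types = FALSE)

fish_hw_fit <- linear_reg() |>
  set_engine("lm") |>
  fit(weight ~ height, data = fish)

# New data: one row per height we want a prediction for
new_fish <- tibble(height = c(10, 15, 20))

# Attach predicted weights (.pred) to the heights
fish_preds <- bind_cols(new_fish, predict(fish_hw_fit, new_data = new_fish))
fish_preds

# Observed range of heights, to check which prediction is extrapolation
range(fish$height)
```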
- Calculate predicted weights for all fish in the data and visualize the residuals under this model.
# add code here
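A sketch of one approach: predict a weight for every fish, compute residual = observed weight minus predicted weight, and plot residuals against predicted values. Names such as fish_resid are illustrative. For a well-fitting straight-line model, the points should scatter randomly around the dashed line at 0, without a clear pattern.

```r
library(tidyverse)
library(tidymodels)

fish <- read_csv("https://data-science-with-r.github.io/data/fish.csv", show_col_types = FALSE)

fish_hw_fit <- linear_reg() |>
  set_engine("lm") |>
  fit(weight ~ height, data = fish)

# Predicted weight (.pred) for every fish, then residual = observed - predicted
fish_resid <- fish |>
  bind_cols(predict(fish_hw_fit, new_data = fish)) |>
  mutate(resid = weight - .pred)

# Residuals vs. predicted values, with a reference line at 0
p <- ggplot(fish_resid, aes(x = .pred, y = resid)) +
  geom_point() +
  geom_hline(yintercept = 0, linetype = "dashed") +
  labs(x = "Predicted weight (grams)", y = "Residual (grams)")
p
```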
Model summary
- Display the model summary including estimates for the slope and intercept along with measurements of uncertainty around them. Show how you can extract these values from the model output.
# add code here
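A sketch using tidy(), which returns one row per model term with the estimate, its standard error, test statistic, and p-value; conf.int = TRUE is passed through to broom's lm method to add 95% confidence bounds (conf.low, conf.high). Individual values can then be pulled out with filter() and pull(). Variable names are illustrative.

```r
library(tidyverse)
library(tidymodels)

fish <- read_csv("https://data-science-with-r.github.io/data/fish.csv", show_col_types = FALSE)

fish_hw_fit <- linear_reg() |>
  set_engine("lm") |>
  fit(weight ~ height, data = fish)

# Tidy summary: estimate, std.error, statistic, p.value, conf.low, conf.high
fish_hw_tidy <- tidy(fish_hw_fit, conf.int = TRUE)
fish_hw_tidy

# Extract individual values from the tidy output
intercept <- fish_hw_tidy |> filter(term == "(Intercept)") |> pull(estimate)
slope     <- fish_hw_tidy |> filter(term == "height") |> pull(estimate)
slope_se  <- fish_hw_tidy |> filter(term == "height") |> pull(std.error)
```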
- Write out your model using mathematical notation.
Add response here.
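The general form of the fitted model is sketched below, where $\hat{\beta}_0$ is the estimated intercept and $\hat{\beta}_1$ the estimated slope; substitute the estimates from your model summary.

```latex
\widehat{weight} = \hat{\beta}_0 + \hat{\beta}_1 \times height
```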
Correlation
We can also assess correlation between two quantitative variables.
- What is correlation? What values can correlation take?
Add response here.
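For reference, the Pearson correlation coefficient (the default method of R's cor()) for $n$ pairs $(x_i, y_i)$ is shown below; it is always between $-1$ and $1$.

```latex
r = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}
         {\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2 \, \sum_{i=1}^{n} (y_i - \bar{y})^2}},
\qquad -1 \le r \le 1
```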
Are you good at guessing correlation? Give it a try! https://www.rossmanchance.com/applets/2021/guesscorrelation/GuessCorrelation.html
What is the correlation between heights and weights of fish?
# add code here
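A minimal sketch: summarize() with cor() computes the Pearson correlation between the two columns. The name fish_cor is illustrative.

```r
library(tidyverse)

fish <- read_csv("https://data-science-with-r.github.io/data/fish.csv", show_col_types = FALSE)

# Pearson correlation between height and weight (cor()'s default method)
fish_cor <- fish |>
  summarize(r = cor(height, weight))
fish_cor
```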
Adding a third variable
- Does the relationship between heights and weights of fish change if we take into consideration species? Plot two separate straight lines for the Bream and Roach species.
# add code here
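One possible sketch: mapping color to species splits the data into groups, so geom_smooth(method = "lm") fits a separate straight line for Bream and for Roach.

```r
library(tidyverse)

fish <- read_csv("https://data-science-with-r.github.io/data/fish.csv", show_col_types = FALSE)

# color = species gives each species its own points and its own fitted line
p <- ggplot(fish, aes(x = height, y = weight, color = species)) +
  geom_point() +
  geom_smooth(method = "lm", se = FALSE) +
  labs(x = "Height (cm)", y = "Weight (grams)", color = "Species")
p
```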
Fitting other models
- We can fit more models than just a straight line. Use method = "loess" in geom_smooth(). What is different from the plot created before?
# add code here
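A sketch of the change: the only difference from the straight-line plot is method = "loess", which fits a locally weighted smooth curve instead of a single straight line, so the fitted line can bend to follow the data.

```r
library(tidyverse)

fish <- read_csv("https://data-science-with-r.github.io/data/fish.csv", show_col_types = FALSE)

# Same scatterplot, but with a loess smoother instead of a straight line
p <- ggplot(fish, aes(x = height, y = weight)) +
  geom_point() +
  geom_smooth(method = "loess", se = FALSE) +
  labs(x = "Height (cm)", y = "Weight (grams)")
p
```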