Data visualization and transformation
Data Science with R
Programming exercises are designed to give you an opportunity to put what you learn in the videos and readings into practice. These exercises feature interactive code chunks using a tool called WebR, which allows you to write, edit, and run R code without leaving your browser.
When the WEBR STATUS says “Ready!”, you can interact with the code chunks!
Interactive code chunks look like the following:
The majority of the code chunks in these documents will be interactive, like the one shown above. However, some code chunks will be static and include code that you can’t edit. You should just read and review the output of these. They will look like the following:
1 + 1
[1] 2
For example, we’ll use the tidyverse package in every programming exercise. Therefore, each exercise will start with a static code chunk that loads this package. Note that even though this code chunk runs, it does not produce any visible output.
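Such a chunk usually contains a single call to library(). The following is a minimal sketch, assuming the tidyverse package is already installed. The suppressPackageStartupMessages() wrapper is an assumption added here, not necessarily what the exercises use; it hides the attach messages that the package would otherwise print in a console.

```r
# Load the tidyverse (assumes the package is already installed).
# The wrapper hides the startup messages, so nothing is printed.
suppressPackageStartupMessages(library(tidyverse))
```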
Many of the questions in the programming exercises require you to actively do something – edit existing code, write code from scratch, write a narrative, etc. Code goes in interactive code chunks that have a Run Code button, and narrative goes in text boxes.
However, some questions come with complete code that you don’t need to edit, but you still need to run to view the output. These questions will also often invite you to make modifications to explore alternatives.
Each programming exercise will have accompanying reflection questions that can be found on Coursera. These questions are designed to help you think about the concepts you just practiced and engage with other learners!
Your work does not automatically save.
For most browsers, the most efficient way to save your work is by using the Print feature. This may be useful if you want to come back and reference your work after you’ve completed it.
First, let’s get started by using R as a calculator.
In the interactive code chunk below, perform the following calculations by typing the code and then clicking Run Code. Also note that the ‘1’ on the left-hand side is the line number, not part of the code.
Run code: Run the code below as is.
Modify code: Modify the code below to multiply 3 by 6 instead of 3 by 5.
Write code: Use the code chunk below to calculate 10 divided by 2 (10 / 2):
2 + 2
[1] 4
3 * 6
[1] 18
10 / 2
[1] 5
R is a functional language. A function is a named set of instructions that carries out a specific task. For example, if I wanted to round the value 3.22, I could do so with the function round() in R. The number 3.22 is an argument to the function round(). An argument is an input to a function. A value is produced as a result.
For example:
round(3.22)
[1] 3
Often, functions can take multiple arguments. Previously, we saw round(3.22) produce the value 3. This is because round() defaults to rounding to the nearest whole number. However, we can override the default by supplying a second argument. You are not expected to memorize all arguments of every function you learn during this course. You can type a ? in front of the function name to pull up a help file that defines the arguments of the function for you. These help files can also be found online; the help file for round is one example.
From the documentation, you can see that the second argument of the function round is digits.
Your Turn: Change the 0 to another number, and note how this changes the corresponding output when you click Run Code.
round(3.22, digits = 1)
[1] 3.2
Try accessing the documentation for round with a ?.
?round
Running the above code will open the help file. Help files give information such as a description of the function, its arguments, and examples that you can run yourself!
As you may notice, the first argument and second argument are separated by a comma (,). This is consistent across all functions in R.
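When you only need a reminder of a function's arguments, base R offers a shorter route than the full help file. The sketch below uses args(), which is not part of the original exercise; it prints the argument names and defaults of a function. The exact listing can differ slightly between R versions.

```r
# help("round") opens the same help file as ?round.
# args() prints only the argument names and their defaults.
args(round)
```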
If you provide the arguments in the exact same order as they are defined in the help file, you do not have to include the names of the arguments:
round(3.225, 1)
[1] 3.2
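To see that the two styles are interchangeable, the sketch below writes the same call by position and by name, then compares the results. The identical() check is an addition for illustration and does not appear in the original exercise.

```r
# The same call written two ways: by position, then by name.
round(3.225, 1)
round(3.225, digits = 1)

# identical() checks that both calls return the same value.
identical(round(3.225, 1), round(3.225, digits = 1))
```

Naming arguments takes more typing, but it makes the code easier to read when a function has many arguments.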
mtcars
For the remainder of this activity, we are going to practice using functions on the mtcars data set. These data were extracted from the 1974 Motor Trend US magazine and contain fuel consumption and 10 aspects of automobile design and performance for 32 automobiles (1973–74 models).
Demo: Run the following code to see the first six lines of the data.
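Judging from the answers given further down, the demo chunk is presumably a single call to head() on mtcars. A minimal sketch under that assumption:

```r
# Show the first six rows of the built-in mtcars data set.
head(mtcars)
```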
What is the name of the function used in the above code? What is the input?
The function is named head
The argument is the data set mtcars
We can also get a better sense of the data we are working with by using the glimpse() function. This allows us to see how many rows and columns we have in our data set, the type of each column, and the first few values in each column. For now, we just want to practice writing the function to initially explore these data.
Your Turn: Use the glimpse() function on the mtcars data set below.
How many rows are in the mtcars data set? How many columns?
glimpse(mtcars)
Rows: 32
Columns: 11
$ mpg <dbl> 21.0, 21.0, 22.8, 21.4, 18.7, 18.1, 14.3, 24.4, 22.8, 19.2, 17.8,…
$ cyl <dbl> 6, 6, 4, 6, 8, 6, 8, 4, 4, 6, 6, 8, 8, 8, 8, 8, 8, 4, 4, 4, 4, 8,…
$ disp <dbl> 160.0, 160.0, 108.0, 258.0, 360.0, 225.0, 360.0, 146.7, 140.8, 16…
$ hp <dbl> 110, 110, 93, 110, 175, 105, 245, 62, 95, 123, 123, 180, 180, 180…
$ drat <dbl> 3.90, 3.90, 3.85, 3.08, 3.15, 2.76, 3.21, 3.69, 3.92, 3.92, 3.92,…
$ wt <dbl> 2.620, 2.875, 2.320, 3.215, 3.440, 3.460, 3.570, 3.190, 3.150, 3.…
$ qsec <dbl> 16.46, 17.02, 18.61, 19.44, 17.02, 20.22, 15.84, 20.00, 22.90, 18…
$ vs <dbl> 0, 0, 1, 1, 0, 1, 0, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 0,…
$ am <dbl> 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 0, 0,…
$ gear <dbl> 4, 4, 4, 3, 3, 3, 3, 4, 4, 4, 4, 3, 3, 3, 3, 3, 3, 4, 4, 4, 3, 3,…
$ carb <dbl> 4, 4, 1, 1, 2, 1, 4, 2, 2, 4, 4, 3, 3, 3, 4, 4, 4, 1, 2, 1, 1, 2,…
There are 32 rows in the mtcars data set
There are 11 columns in the mtcars data set
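The same counts can be checked directly. The sketch below uses the base R functions nrow(), ncol(), and dim(), which are not part of the original exercise and are shown only as a cross-check on the glimpse() output.

```r
# Base R alternatives to reading the counts off glimpse():
nrow(mtcars)  # number of rows
ncol(mtcars)  # number of columns
dim(mtcars)   # both at once: rows first, then columns
```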
Your Turn: There are many other functions we can use on data to explore it prior to making data visualizations. You are encouraged to explore the help files of the following:
After doing so, try to answer the following questions!
Use tail() to produce the last six lines of the mtcars data set.
tail(mtcars)
                mpg cyl  disp  hp drat    wt qsec vs am gear carb
Porsche 914-2  26.0   4 120.3  91 4.43 2.140 16.7  0  1    5    2
Lotus Europa   30.4   4  95.1 113 3.77 1.513 16.9  1  1    5    2
Ford Pantera L 15.8   8 351.0 264 4.22 3.170 14.5  0  1    5    4
Ferrari Dino   19.7   6 145.0 175 3.62 2.770 15.5  0  1    5    6
Maserati Bora  15.0   8 301.0 335 3.54 3.570 14.6  0  1    5    8
Volvo 142E     21.4   4 121.0 109 4.11 2.780 18.6  1  1    4    2
Now, use slice() to produce the first three lines of the mtcars data set.
slice(mtcars, 1:3)
               mpg cyl disp  hp drat    wt  qsec vs am gear carb
Mazda RX4     21.0   6  160 110 3.90 2.620 16.46  0  1    4    4
Mazda RX4 Wag 21.0   6  160 110 3.90 2.875 17.02  0  1    4    4
Datsun 710    22.8   4  108  93 3.85 2.320 18.61  1  1    4    1
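As with round(), head() and tail() take a second argument. The sketch below uses their n argument, which is not covered in the original exercise, to control how many rows are returned.

```r
# head() and tail() accept a second argument, n, for the number of rows.
head(mtcars, n = 3)  # first three rows, the same cars as slice(mtcars, 1:3)
tail(mtcars, n = 3)  # last three rows
```

When n is left out, both functions fall back to their default of six rows.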
In the subsequent lessons, we are going to learn how to manipulate, work with, and plot data using a variety of functions in R.