1 Chapter 2 Lab

1.1 Basic Commands

R uses functions to perform operations. To run a function called funcname, we type funcname(input1, input2), where the inputs (or arguments) input1 and input2 tell R how to run the function. A function can have any number of inputs. For example, to create a vector of numbers, we use the function c() (for concatenate). Any numbers inside the parentheses are joined together. The following command instructs R to join together the numbers 1, 3, 2, and 5, and to save them as a vector named x. When we type x, it gives us back the vector.

x <- c(1, 3, 2, 5) 
x
[1] 1 3 2 5

In an interactive session, R prints a > prompt before each command. The > is not part of the command; rather, it indicates that R is ready for another command to be entered. (The listings here omit the prompt.) We can also save things using = rather than <-:

x = c(1, 6, 2)
x
[1] 1 6 2
y = c(1, 4, 3)
y
[1] 1 4 3

Hitting the up arrow multiple times will display the previous commands, which can then be edited. This is useful since one often wishes to repeat a similar command. In addition, typing ?funcname will always cause R to open a new help file window with additional information about the function funcname.

We can tell R to add two sets of numbers together. It will then add the first number from x to the first number from y, and so on. However, x and y should be the same length. We can check their length using the length() function.

length(x)
[1] 3
length(y)
[1] 3
x + y
[1]  2 10  5

The ls() function allows us to look at a list of all of the objects, such as data and functions, that we have saved so far. The rm() function can be used to delete any that we don’t want.

ls()
[1] "x" "y"
rm(x, y)
ls()
character(0)

It’s also possible to remove all objects at once:

rm(list = ls())

The matrix() function can be used to create a matrix of numbers. Before we use the matrix() function, we can learn more about it:

?matrix

The help file reveals that the matrix() function takes a number of inputs, but for now we focus on the first three: the data (the entries in the matrix), the number of rows, and the number of columns. First, we create a simple matrix.

x <- matrix(data = c(1, 2, 3, 4), nrow = 2, ncol = 2)
x
     [,1] [,2]
[1,]    1    3
[2,]    2    4

Note that we could just as well omit typing data =, nrow =, and ncol = in the matrix() command above: that is, we could just type

x <- matrix(c(1, 2, 3, 4), 2, 2)

and this would have the same effect. However, it can sometimes be useful to specify the names of the arguments passed in, since otherwise R will assume that the function arguments are passed into the function in the same order that is given in the function’s help file. As this example illustrates, by default R creates matrices by successively filling in columns. Alternatively, the byrow = TRUE option can be used to populate the matrix in order of the rows.

matrix(c(1, 2, 3, 4), 2, 2, byrow = TRUE)
     [,1] [,2]
[1,]    1    2
[2,]    3    4
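
The two points above can be checked directly. The following is a minimal sketch, using hypothetical object names (m_col, m_row, m_named) that are not part of the lab: it confirms that named arguments may be supplied in any order, and that filling by row gives the transpose of the column-filled matrix with the dimensions swapped.

```r
# Illustrative names only; these objects are not used elsewhere in the lab.
m_col   <- matrix(1:6, nrow = 2, ncol = 3)                # filled column by column
m_row   <- matrix(1:6, nrow = 2, ncol = 3, byrow = TRUE)  # filled row by row
m_named <- matrix(ncol = 3, data = 1:6, nrow = 2)         # named arguments, reordered

identical(m_col, m_named)                              # TRUE: argument order is irrelevant when named
identical(m_row, t(matrix(1:6, nrow = 3, ncol = 2)))   # TRUE: byrow fill is a transposed column fill
```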

Notice that in the above command we did not assign the matrix to a name such as x. In this case the matrix is printed to the screen but is not saved for future calculations. The sqrt() function returns the square root of each element of a vector or matrix. The command x^2 raises each element of x to the power 2; any powers are possible, including fractional or negative powers.

sqrt(x)
         [,1]     [,2]
[1,] 1.000000 1.732051
[2,] 1.414214 2.000000
x^2
     [,1] [,2]
[1,]    1    9
[2,]    4   16

The rnorm() function generates a vector of random normal variables, with first argument n the sample size. Each time we call this function, we will get a different answer. Here we create two correlated sets of numbers, x and y, and use the cor() function to compute the correlation between them.

x = rnorm(50)
y = x + rnorm(50, mean = 50, sd = 0.1)
cor(x, y)
[1] 0.996964

By default, rnorm() creates standard normal random variables with a mean of 0 and a standard deviation of 1. However, the mean and standard deviation can be altered using the mean and sd arguments, as illustrated above. Sometimes we want our code to reproduce the exact same set of random numbers; we can use the set.seed() function to do this. The set.seed() function takes an (arbitrary) integer argument.

set.seed(1303)
rnorm(50)
 [1] -1.1439763145  1.3421293656  2.1853904757  0.5363925179  0.0631929665
 [6]  0.5022344825 -0.0004167247  0.5658198405 -0.5725226890 -1.1102250073
[11] -0.0486871234 -0.6956562176  0.8289174803  0.2066528551 -0.2356745091
[16] -0.5563104914 -0.3647543571  0.8623550343 -0.6307715354  0.3136021252
[21] -0.9314953177  0.8238676185  0.5233707021  0.7069214120  0.4202043256
[26] -0.2690521547 -1.5103172999 -0.6902124766 -0.1434719524 -1.0135274099
[31]  1.5732737361  0.0127465055  0.8726470499  0.4220661905 -0.0188157917
[36]  2.6157489689 -0.6931401748 -0.2663217810 -0.7206364412  1.3677342065
[41]  0.2640073322  0.6321868074 -1.3306509858  0.0268888182  1.0406363208
[46]  1.3120237985 -0.0300020767 -0.2500257125  0.0234144857  1.6598706557

We use set.seed() throughout the labs whenever we perform calculations involving random quantities. In general this should allow the user to reproduce our results. However, it should be noted that as new versions of R become available it is possible that some small discrepancies may form between the book and the output from R.
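
As a small illustration of what set.seed() does, the sketch below draws the same five numbers twice by resetting the seed, and then shows that a further draw without a reset differs. The seed value 42 and the object names are arbitrary choices made for this example.

```r
set.seed(42)              # 42 is an arbitrary seed chosen for this sketch
first_draw <- rnorm(5)

set.seed(42)              # resetting the seed restarts the same random stream
second_draw <- rnorm(5)
third_draw  <- rnorm(5)   # no reset: the stream continues, so the values differ

identical(first_draw, second_draw)   # TRUE
identical(first_draw, third_draw)    # FALSE
```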

The mean() and var() functions can be used to compute the mean and variance of a vector of numbers. Applying sqrt() to the output of var() will give the standard deviation. Or we can simply use the sd() function.

set.seed(3)
y <- rnorm(100)
mean(y)
[1] 0.01103557
var(y)
[1] 0.7328675
sqrt(var(y))
[1] 0.8560768
sd(y)
[1] 0.8560768
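
To see how these functions relate, the following sketch recomputes the variance by hand on a small made-up vector (the name v and its values are assumptions for illustration). Note that var() divides by n - 1 rather than n, so it returns the sample variance.

```r
v <- c(2, 4, 4, 4, 5, 5, 7, 9)   # small illustrative vector
n <- length(v)

# Sample variance: squared deviations from the mean, divided by n - 1
manual_var <- sum((v - mean(v))^2) / (n - 1)

all.equal(manual_var, var(v))        # TRUE
all.equal(sqrt(manual_var), sd(v))   # TRUE
```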

1.2 Graphics

The plot() function is the primary way to plot data in R. For instance, plot(x, y) produces a scatterplot of the numbers in x versus the numbers in y. There are many additional options that can be passed in to the plot() function. For example, passing in the argument xlab will result in a label on the x-axis. To find out more information about the plot() function, type ?plot.

x <- rnorm(100)
y <- rnorm(100)
plot(x, y)

plot(x, y, xlab = "this is the x-axis", ylab = "this is the y-axis", 
     main = "Plot of X vs Y")

We will often want to save the output of an R plot. The command that we use to do this will depend on the file type that we would like to create. For instance, to create a jpeg, we use the jpeg() function, and to create a pdf, we use the pdf() function.

# The ./JPG directory must already exist.
jpeg(filename = "./JPG/YourFileName.jpeg")
plot(x, y, col = "green")
dev.off()
png 
  2 

To display the saved file as shown in Figure 1.1, use the include_graphics() function from knitr.

Figure 1.1: Using knitr::include_graphics()

The function dev.off() indicates to R that we are done creating the plot. Alternatively, we can simply copy the plot window and paste it into an appropriate file type, such as a Word document.
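
The pdf() function follows the same open, plot, close pattern. The sketch below writes to a temporary directory so that it runs anywhere; the file name scatter_example.pdf and the width and height values are assumptions chosen for the example.

```r
# Write the plot to a temporary file so the example does not depend on a local folder
out_file <- file.path(tempdir(), "scatter_example.pdf")

pdf(file = out_file, width = 6, height = 4)   # open the device; sizes are in inches
plot(rnorm(20), rnorm(20), col = "green")     # draw into the open device
invisible(dev.off())                          # close the device, which finishes the file

file.exists(out_file)
```

Wrapping dev.off() in invisible() simply suppresses the device number that it would otherwise print.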

The function seq() can be used to create a sequence of numbers. For instance, seq(a, b) makes a vector of integers between a and b. There are many other options: for instance, seq(0, 1, length = 10) makes a sequence of 10 numbers that are equally spaced between 0 and 1. Typing 3:11 is a shorthand for seq(3, 11) for integer arguments.

x <- seq(1, 10)
x
 [1]  1  2  3  4  5  6  7  8  9 10
x <- 1:10
x
 [1]  1  2  3  4  5  6  7  8  9 10
x = seq(-pi, pi, length = 50)
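
The following sketch compares a few ways of building the same sequence. The object names s_len and s_by are hypothetical; length.out is the full name of the argument that the commands above abbreviate as length.

```r
s_len <- seq(0, 1, length.out = 5)   # five equally spaced values from 0 to 1
s_by  <- seq(0, 1, by = 0.25)        # the same values, specified by step size

all.equal(s_len, s_by)               # TRUE
identical(3:11, seq(3, 11))          # TRUE: the colon form is shorthand for seq()
```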

We will now create some more sophisticated plots. The contour() function produces a contour plot in order to represent three-dimensional data; it is like a topographical map. It takes three arguments:

  1. A vector of the x values (the first dimension),
  2. A vector of the y values (the second dimension), and
  3. A matrix whose elements correspond to the z value (the third dimension) for each pair of (x, y) coordinates.

As with the plot() function, there are many other inputs that can be used to fine-tune the output of the contour() function. To learn more about these, take a look at the help file by typing ?contour.

y <- x
f <- outer(x, y, function(x, y){
  cos(y) / (1 + x^2)
})
contour(x, y, f)
contour(x, y, f, nlevels = 45, add = TRUE)

fa <- (f - t(f)) / 2
contour(x, y, fa, nlevels = 15)
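
It may help to see what outer() computes before plotting it. In the sketch below (a coarse four-point grid with hypothetical names gx, gy, g, and ga), element [i, j] of the result is the function evaluated at the i-th x value and the j-th y value, and the matrix built like fa is antisymmetric, with zeros on its diagonal.

```r
gx <- seq(-1, 1, length.out = 4)   # coarse grid, for illustration only
gy <- gx
g  <- outer(gx, gy, function(a, b) cos(b) / (1 + a^2))

# Element [i, j] is the function evaluated at (gx[i], gy[j])
all.equal(g[2, 3], cos(gy[3]) / (1 + gx[2]^2))   # TRUE

ga <- (g - t(g)) / 2               # same construction as fa above
all.equal(ga, -t(ga))              # TRUE: ga is antisymmetric
all(diag(ga) == 0)                 # TRUE: the diagonal is zero
```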

The image() function works the same way as contour(), except that it produces a color-coded plot whose colors depend on the z value. This is known as a heatmap, and is sometimes used to plot temperature in weather forecasts. Alternatively, persp() can be used to produce a three-dimensional plot. The arguments theta and phi control the angles at which the plot is viewed.

image(x, y, fa)

persp(x, y, fa)

persp(x, y, fa, theta = 30)

persp(x, y, fa, theta = 30, phi = 20)

persp(x, y, fa, theta = 30, phi = 70)

persp(x, y, fa, theta = 30, phi = 40)

1.3 Indexing

We often wish to examine part of a set of data. Suppose that our data is stored in the matrix A.

A <- matrix(1:16, 4, 4)
A
     [,1] [,2] [,3] [,4]
[1,]    1    5    9   13
[2,]    2    6   10   14
[3,]    3    7   11   15
[4,]    4    8   12   16

Then, typing

A[2, 3]
[1] 10

will select the element corresponding to the second row and the third column. The first number after the open-bracket symbol [ always refers to the row, and the second number always refers to the column. We can also select multiple rows and columns at a time, by providing vectors as the indices.

A[c(1, 3), c(2, 4)]
     [,1] [,2]
[1,]    5   13
[2,]    7   15
A[1:3, 2:4]
     [,1] [,2] [,3]
[1,]    5    9   13
[2,]    6   10   14
[3,]    7   11   15
A[1:2, ]
     [,1] [,2] [,3] [,4]
[1,]    1    5    9   13
[2,]    2    6   10   14
A[, 1:2]
     [,1] [,2]
[1,]    1    5
[2,]    2    6
[3,]    3    7
[4,]    4    8

The last two examples include either no index for the columns or no index for the rows. These indicate that R should include all columns or all rows, respectively. R treats a single row or column of a matrix as a vector.

A[1, ]
[1]  1  5  9 13

The use of a negative sign - in the index tells R to keep all rows or columns except those indicated in the index.

A[-c(1, 3), ]
     [,1] [,2] [,3] [,4]
[1,]    2    6   10   14
[2,]    4    8   12   16

The dim() function outputs the number of rows followed by the number of columns of a given matrix.

dim(A)
[1] 4 4
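
Because R turns a single row or column into a vector, code that expects a matrix can be caught out. The sketch below, which uses a hypothetical matrix B so as not to overwrite A, shows the drop = FALSE argument that keeps the matrix structure, and confirms that negative indices are equivalent to listing the rows that remain.

```r
B <- matrix(1:16, 4, 4)              # same layout as A above, under a separate name

row_vec <- B[1, ]                    # a plain vector: the dimensions are dropped
row_mat <- B[1, , drop = FALSE]      # drop = FALSE keeps a 1 x 4 matrix

is.null(dim(row_vec))                # TRUE
dim(row_mat)                         # 1 4

identical(B[-c(1, 3), ], B[c(2, 4), ])   # TRUE: excluding rows 1 and 3 keeps rows 2 and 4
```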

1.4 Loading Data

For most analyses, the first step involves importing a data set into R. The read.table() function is one of the primary ways to do this. The help file contains details about how to use this function. We can use the function write.table() to export data.

Before attempting to load a data set, we must make sure that R knows to search for the data in the proper directory. For example, on a Windows system one could select the directory using the Change dir... option under the File menu. However, the details of how to do this depend on the operating system (e.g. Windows, Mac, Unix) that is being used, and so we do not give further details here.

We begin by loading in the Auto data set. This data is part of the ISLR library (we discuss libraries in Chapter 3), but to illustrate the read.table() function we load it now from a text file. The following command will load the Auto.data file into R and store it as an object called Auto, in a format referred to as a data frame. (The text file can be obtained from this book’s website.)

site <- "https://hasthika.github.io/STT3851/Assignments/Auto.txt"
Auto <- read.table(file = site)
head(Auto) 
  V1  V2        V3           V4         V5     V6           V7   V8     V9
1 NA mpg cylinders displacement horsepower weight acceleration year origin
2  1  18         8          307        130   3504           12   70      1
3  2  15         8          350        165   3693         11.5   70      1
4  3  18         8          318        150   3436           11   70      1
5  4  16         8          304        150   3433           12   70      1
6  5  17         8          302        140   3449         10.5   70      1
                        V10
1                      name
2 chevrolet chevelle malibu
3         buick skylark 320
4        plymouth satellite
5             amc rebel sst
6               ford torino

Note that Auto.data is simply a text file, which you could alternatively open on your computer using a standard text editor. It is often a good idea to view a data set using a text editor or other software such as Excel before loading it into R. This particular data set has not been loaded correctly, because R has assumed that the variable names are part of the data and so has included them in the first row. The data set also includes a number of missing observations, indicated by a question mark ?. Missing values are a common occurrence in real data sets. Using the option header = TRUE in the read.table() function tells R that the first line of the file contains the variable names, and using the option na.strings tells R that any time it sees a particular character or set of characters (such as a question mark), it should be treated as a missing element of the data matrix.

Auto <- read.table(file = site, header = TRUE, sep = "", na.strings = "?")
head(Auto)  
  X mpg cylinders displacement horsepower weight acceleration year origin
1 1  18         8          307        130   3504         12.0   70      1
2 2  15         8          350        165   3693         11.5   70      1
3 3  18         8          318        150   3436         11.0   70      1
4 4  16         8          304        150   3433         12.0   70      1
5 5  17         8          302        140   3449         10.5   70      1
6 6  15         8          429        198   4341         10.0   70      1
                       name
1 chevrolet chevelle malibu
2         buick skylark 320
3        plymouth satellite
4             amc rebel sst
5               ford torino
6          ford galaxie 500
library(DT)
datatable(Auto)
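
The effect of header and na.strings can be reproduced without a network connection. The sketch below writes a tiny made-up file (three invented rows, not the real Auto data) to a temporary location and reads it both ways; all object names are hypothetical.

```r
# A tiny stand-in file; the values are invented for illustration
raw_lines <- c("mpg horsepower name",
               "18 130 chevelle",
               "25 ? pinto",
               "15 165 skylark")
tmp <- tempfile(fileext = ".txt")
writeLines(raw_lines, tmp)

bad  <- read.table(tmp)                                   # header row is read as data
good <- read.table(tmp, header = TRUE, na.strings = "?")  # names recognised, ? becomes NA

nrow(bad)                      # 4: the header line counts as an observation
names(good)                    # "mpg" "horsepower" "name"
is.na(good$horsepower[2])      # TRUE
is.numeric(good$horsepower)    # TRUE: the column is no longer read as text
```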

Excel is a common-format data storage program. An easy way to load such data into R is to save it as a csv (comma separated value) file and then use the read.csv() function to load it in.

site <- "https://hasthika.github.io/STT3851/Assignments/Auto.csv"
Auto1 <- read.csv(file = site, na.strings = "?")
dim(Auto1)
[1] 392  11
datatable(Auto1, rownames = FALSE, class = 'cell-border stripe', colnames = c('cyl' = 'cylinders', 'disp' = 'displacement', 'hp' = 'horsepower', 'accel' = 'acceleration'))

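The dim() function tells us that the data has 392 observations, or rows, and 11 variables, or columns. There are various ways to deal with missing data. In the original Auto data, only five of the rows contain missing observations, and so we choose to use the na.omit() function to simply remove such rows. (In the output below the dimensions are unchanged, which suggests that this particular csv file no longer contains any rows with missing values.)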

Auto2 <- na.omit(Auto1)
dim(Auto2)
[1] 392  11
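
To see na.omit() actually remove rows, consider the following sketch on a small invented data frame (the name toy and its values are assumptions for illustration). complete.cases() reports which rows have no missing values.

```r
toy <- data.frame(mpg        = c(18, NA, 25, 30),
                  horsepower = c(130, 165, NA, 90))

complete.cases(toy)      # TRUE FALSE FALSE TRUE
toy_clean <- na.omit(toy)

dim(toy)                 # 4 2
dim(toy_clean)           # 2 2: the two incomplete rows are gone
```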

Once the data are loaded correctly, we can use names() to check the variable names.

names(Auto2)
 [1] "X.1"          "X"            "mpg"          "cylinders"    "displacement"
 [6] "horsepower"   "weight"       "acceleration" "year"         "origin"      
[11] "name"        

1.5 Additional Graphical and Numerical Summaries

We can use the plot() function to produce scatterplots of the quantitative variables. However, simply typing the variable names will produce an error message, because R does not know to look in the Auto2 data set for those variables. To refer to a variable, we can type the data set and the variable name joined with a $ symbol. Alternatively, we can use a formula together with the data argument, or wrap the call in with(). The three commands below plot the same data.

plot(Auto2$cylinders, Auto2$mpg)

plot(mpg ~ cylinders, data = Auto2)

with(data = Auto2,
     plot(cylinders, mpg)
     )

The cylinders variable is stored as a numeric vector, so R has treated it as quantitative. However, since there are only a small number of possible values for cylinders, one may prefer to treat it as a qualitative variable. The as.factor() function converts quantitative variables into qualitative variables.

Auto2$cylinders <- as.factor(Auto2$cylinders)
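
The following sketch shows what the conversion does on a short made-up vector (the names cyl and cyl_f are hypothetical). The distinct values become the levels of the factor, and the original numbers can be recovered by going through as.character().

```r
cyl   <- c(4, 8, 6, 4, 4, 8)     # invented values for illustration
cyl_f <- as.factor(cyl)

levels(cyl_f)                    # "4" "6" "8"
table(cyl_f)                     # counts per level: 3, 1, 2
is.numeric(cyl_f)                # FALSE: a factor is treated as qualitative

# as.numeric(cyl_f) alone would return the level codes 1, 2, 3, not the values
as.numeric(as.character(cyl_f))  # 4 8 6 4 4 8
```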

If the variable plotted on the x-axis is categorical, then boxplots will automatically be produced by the plot() function. As usual, a number of options can be specified in order to customize the plots.

plot(Auto2$cylinders, Auto2$mpg)

plot(mpg ~ cylinders, data = Auto2)

plot(mpg ~ cylinders, data = Auto2, col = "red")

plot(mpg ~ cylinders, data = Auto2, col = "red", varwidth = TRUE)

plot(mpg ~ cylinders, data = Auto2, col = "red", varwidth = TRUE, 
     horizontal = TRUE)

plot(mpg ~ cylinders, data = Auto2, col = "red", varwidth = TRUE, 
     horizontal = TRUE, xlab = "cylinders", ylab = "MPG")

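The hist() function can be used to plot a histogram. Note that col = 2 selects the second color of the current palette, which is red by default, so it has much the same effect as col = "red" (the exact shade differs in R 4.0.0 and later).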

hist(Auto2$mpg, col = "red", xlab = "MPG", main = "Your Title Here")

hist(Auto2$mpg, col = "red", xlab = "MPG", main = "Your Title Here", breaks = 15)
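
The breaks argument is worth a closer look: when given a single number, hist() treats it as a suggestion and picks nearby "pretty" break points. The sketch below uses simulated values (not the Auto data) and plot = FALSE to inspect the bins that would be drawn; the object names are hypothetical.

```r
set.seed(1)                                   # arbitrary seed for this sketch
vals <- rnorm(200, mean = 23, sd = 8)         # simulated stand-in for a variable like mpg

h <- hist(vals, breaks = 15, plot = FALSE)    # compute the bins without drawing

h$breaks                                      # the break points actually chosen
length(h$counts) == length(h$breaks) - 1      # TRUE: one count per bin
sum(h$counts) == length(vals)                 # TRUE: every value falls in some bin
```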

1.6 Creating boxplots and histograms with ggplot2

See the geom_boxplot documentation and the geom_freqpoly documentation for more details.

library(ggplot2)
p <- ggplot(data = Auto2, aes(x = cylinders, y = mpg))
p +  geom_boxplot()

p +  geom_boxplot() + 
  coord_flip()

p +  geom_boxplot() + 
  coord_flip() + 
  theme_bw()

p +  geom_boxplot(fill = "red") + 
  coord_flip() + 
  theme_bw()

p +  geom_boxplot(fill = "red") + 
  coord_flip() + theme_bw() + 
  labs(x = "Cylinders", y = "MPG")

p +  geom_boxplot(fill = "red", varwidth = TRUE) + 
  coord_flip() + 
  theme_bw() + 
  labs(x = "Cylinders", y = "MPG")

p <- ggplot(data = Auto2, aes(x = mpg))
p + geom_histogram()

p + geom_histogram(binwidth = 5)

p + geom_histogram(binwidth = 5, fill = "blue")

p + geom_histogram(binwidth = 5, fill = "blue", color = "black")

p + geom_histogram(binwidth = 5, fill = "blue", color = "black") + 
  theme_bw()

# after_stat(density) replaces the older ..density.. notation (ggplot2 >= 3.3.0)
p + geom_histogram(binwidth = 5, fill = "blue",
                   color = "black", aes(y = after_stat(density))) +
  theme_bw()

1.7 Creating boxplots and histograms with ggvis

library(ggvis)
Auto2 %>% 
  ggvis(x = ~cylinders, y = ~mpg) %>% 
  layer_boxplots(fill := "red")
Auto2 %>% 
  ggvis(x = ~mpg) %>% 
  layer_histograms(fill := "lightblue", width = 1)
Auto2 %>% 
  ggvis(x = ~mpg) %>% 
  layer_histograms(fill := "pink", width = 5) %>% 
  add_axis("x", title = "Miles Per Gallon")

1.8 Using plotly

library(plotly)
p1 <- ggplot(data = Auto2, aes(x = cylinders, y = mpg)) + 
  geom_boxplot(fill = "red", varwidth = TRUE) + 
  coord_flip() + 
  theme_bw() + 
  labs(x = "Cylinders", y = "MPG")
p2 <- ggplotly(p1)
p2
p3 <- ggplot(data = Auto2, aes(x = mpg)) + 
  geom_histogram(binwidth = 5, fill = "blue", color = "black") + 
  theme_bw()
p4 <- ggplotly(p3)
p4

The summary() function produces a numerical summary of each variable in a particular data set.

summary(Auto2)
      X.1               X               mpg        cylinders  displacement  
 Min.   :  1.00   Min.   :  1.00   Min.   : 9.00   3:  4     Min.   : 68.0  
 1st Qu.: 98.75   1st Qu.: 99.75   1st Qu.:17.00   4:199     1st Qu.:105.0  
 Median :196.50   Median :198.50   Median :22.75   5:  3     Median :151.0  
 Mean   :196.50   Mean   :198.52   Mean   :23.45   6: 83     Mean   :194.4  
 3rd Qu.:294.25   3rd Qu.:296.25   3rd Qu.:29.00   8:103     3rd Qu.:275.8  
 Max.   :392.00   Max.   :397.00   Max.   :46.60             Max.   :455.0  
                                                                            
   horsepower        weight      acceleration        year           origin     
 Min.   : 46.0   Min.   :1613   Min.   : 8.00   Min.   :70.00   Min.   :1.000  
 1st Qu.: 75.0   1st Qu.:2225   1st Qu.:13.78   1st Qu.:73.00   1st Qu.:1.000  
 Median : 93.5   Median :2804   Median :15.50   Median :76.00   Median :1.000  
 Mean   :104.5   Mean   :2978   Mean   :15.54   Mean   :75.98   Mean   :1.577  
 3rd Qu.:126.0   3rd Qu.:3615   3rd Qu.:17.02   3rd Qu.:79.00   3rd Qu.:2.000  
 Max.   :230.0   Max.   :5140   Max.   :24.80   Max.   :82.00   Max.   :3.000  
                                                                               
                 name    
 amc matador       :  5  
 ford pinto        :  5  
 toyota corolla    :  5  
 amc gremlin       :  4  
 amc hornet        :  4  
 chevrolet chevette:  4  
 (Other)           :365  

For qualitative variables such as name, R will list the number of observations that fall in each category. We can also produce a summary of just a single variable.

summary(Auto2$mpg)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
   9.00   17.00   22.75   23.45   29.00   46.60 

1.9 Aggregating with dplyr

Consider producing summary statistics for the variable mpg when it is grouped by cylinders.

library(dplyr)
Auto2 %>%
  group_by(cylinders) %>%
  summarize(median(mpg), IQR(mpg), n())
# A tibble: 5 × 4
  cylinders `median(mpg)` `IQR(mpg)` `n()`
  <fct>             <dbl>      <dbl> <int>
1 3                  20.2       3.3      4
2 4                  28.4       7.95   199
3 5                  25.4       8.05     3
4 6                  19         3       83
5 8                  14         3      103
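
For comparison, the same kind of grouped summary can be sketched in base R. The example below uses the built-in mtcars data set rather than Auto2 (an assumption made so that the code runs without any downloads or packages), and the names grp and summary_tbl are hypothetical.

```r
# Split mpg into one vector per number of cylinders
grp <- split(mtcars$mpg, mtcars$cyl)

# Apply each summary function to every group and collect the results
summary_tbl <- data.frame(
  cyl        = names(grp),
  median_mpg = unname(sapply(grp, median)),
  iqr_mpg    = unname(sapply(grp, IQR)),
  n          = unname(sapply(grp, length))
)
summary_tbl
```

The dplyr pipeline above expresses the same split, apply, combine steps more compactly, which is part of its appeal for larger analyses.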

1.10 Automagic Generation of R Package References

Suppose the following R packages are used for a project: DT, ggplot2, ISLR, knitr, plotly, dplyr, rmarkdown, and bookdown.

  1. Create an object named PackagesUsed.
  2. Write the packages used to a *.bib file.
  3. Load the packages with lapply().
  4. Add a bibliography entry to the YAML.
  5. Cite the package using @R-packagename (look at the *.bib file for the exact name).
  6. Add a References section header (## References) at the very end of the document. The references will appear (provided they are cited) after the header.

PackagesUsed <- c("DT", "ggplot2", "ISLR", "knitr", "plotly", "dplyr", "rmarkdown", "bookdown")
# Write bib information
knitr::write_bib(PackagesUsed, file = "./PackagesUsed.bib")
# Load packages
lapply(PackagesUsed, library, character.only = TRUE)

Example YAML:

---
title: "Lab: Modified Introduction To R"
author: "Leave this field blank"
date: '`r format(Sys.time(), "%b %d, %Y")`'
bibliography: PackagesUsed.bib
output: 
  bookdown::html_document2: 
    highlight: textmate
    theme: yeti
---

This document uses DT by Xie, Cheng, and Tan (2021), ggplot2 by Wickham, Chang, et al. (2021), ISLR by James et al. (2021), plotly by Sievert et al. (2021), rmarkdown by Allaire et al. (2021), dplyr by Wickham, François, et al. (2021), knitr by Xie (2021b), and bookdown by Xie (2021a).

The previous line with citations was created using:

This document uses `DT` by @R-DT, `ggplot2` by @R-ggplot2, `ISLR` by @R-ISLR, `plotly` by @R-plotly, `rmarkdown` by @R-rmarkdown, `dplyr` by @R-dplyr, `knitr` by @R-knitr, and `bookdown` by @R-bookdown.

sessionInfo()
R version 3.6.0 (2019-04-26)
Platform: x86_64-redhat-linux-gnu (64-bit)
Running under: Red Hat Enterprise Linux

Matrix products: default
BLAS/LAPACK: /usr/lib64/R/lib/libRblas.so

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] bookdown_0.24  rmarkdown_2.11 ISLR_1.4       dplyr_1.0.7    plotly_4.10.0 
[6] ggvis_0.4.7    ggplot2_3.3.5  DT_0.20        knitr_1.37    

loaded via a namespace (and not attached):
 [1] tidyselect_1.1.1  xfun_0.29         bslib_0.3.1       purrr_0.3.4      
 [5] colorspace_2.0-2  vctrs_0.3.8       generics_0.1.1    viridisLite_0.4.0
 [9] htmltools_0.5.2   yaml_2.2.2        utf8_1.2.2        rlang_1.0.0      
[13] later_1.3.0       jquerylib_0.1.4   pillar_1.6.5      glue_1.6.1       
[17] withr_2.4.3       DBI_1.1.2         jpeg_0.1-9        lifecycle_1.0.1  
[21] stringr_1.4.0     munsell_0.5.0     gtable_0.3.0      htmlwidgets_1.5.4
[25] evaluate_0.14     labeling_0.4.2    fastmap_1.1.0     httpuv_1.6.5     
[29] crosstalk_1.2.0   fansi_1.0.2       highr_0.9         Rcpp_1.0.8       
[33] xtable_1.8-4      promises_1.2.0.1  scales_1.1.1      jsonlite_1.7.1   
[37] mime_0.12         farver_2.1.0      digest_0.6.29     stringi_1.7.6    
[41] shiny_1.7.1       grid_3.6.0        cli_3.1.1         tools_3.6.0      
[45] magrittr_2.0.2    sass_0.4.0        lazyeval_0.2.2    tibble_3.1.6     
[49] tidyr_1.1.4       crayon_1.4.2      pkgconfig_2.0.3   ellipsis_0.3.2   
[53] data.table_1.14.2 httr_1.4.2        assertthat_0.2.1  rstudioapi_0.13  
[57] R6_2.5.1          compiler_3.6.0   

References

Allaire, JJ, Yihui Xie, Jonathan McPherson, Javier Luraschi, Kevin Ushey, Aron Atkins, Hadley Wickham, Joe Cheng, Winston Chang, and Richard Iannone. 2021. rmarkdown: Dynamic Documents for R. https://CRAN.R-project.org/package=rmarkdown.
James, Gareth, Daniela Witten, Trevor Hastie, and Rob Tibshirani. 2021. ISLR: Data for an Introduction to Statistical Learning with Applications in R. https://www.statlearning.com.
Sievert, Carson, Chris Parmer, Toby Hocking, Scott Chamberlain, Karthik Ram, Marianne Corvellec, and Pedro Despouy. 2021. plotly: Create Interactive Web Graphics via Plotly.js. https://CRAN.R-project.org/package=plotly.
Wickham, Hadley, Winston Chang, Lionel Henry, Thomas Lin Pedersen, Kohske Takahashi, Claus Wilke, Kara Woo, Hiroaki Yutani, and Dewey Dunnington. 2021. ggplot2: Create Elegant Data Visualisations Using the Grammar of Graphics. https://CRAN.R-project.org/package=ggplot2.
Wickham, Hadley, Romain François, Lionel Henry, and Kirill Müller. 2021. dplyr: A Grammar of Data Manipulation. https://CRAN.R-project.org/package=dplyr.
Xie, Yihui. 2021a. bookdown: Authoring Books and Technical Documents with R Markdown. https://CRAN.R-project.org/package=bookdown.
———. 2021b. knitr: A General-Purpose Package for Dynamic Report Generation in R. https://yihui.org/knitr/.
Xie, Yihui, Joe Cheng, and Xianying Tan. 2021. DT: A Wrapper of the JavaScript Library DataTables. https://github.com/rstudio/DT.