# R code Chapter 7

This document contains abridged sections from Discovering Statistics Using R and RStudio by Andy Field so there are some copyright considerations. You can use this material for teaching and non-profit activities but please do not meddle with it or claim it as your own work. See the full license terms at the bottom of the page.

Remember to install the package for the book with library(discovr). After which you can load data into a tibble by executing:

name_of_tib <- discovr::name_of_data


For example, execute these lines to create the tibbles referred to in the chapter:

cats_tib <- discovr::roaming_cats
exam_tib <- discovr::exam_anxiety
liar_tib <- discovr::biggest_liar


If you want to read the file from the CSV and you have set up your project folder as I suggest in Chapter 1, then the general format of the code you would use is:

tibble_name <- here::here("../data/name_of_file.csv") %>%
dplyr::mutate(
...
code to convert character variables to factors
...
)


In which you’d replace tibble_name with the name you want to assign to the tibble and change name_of_file.csv to the name of the file that you’re trying to load. You can use mutate to convert categorical variables to factors. For example, for the exam_anxiety data you would load it by executing:

exam_tib <- here::here("data/exam_anxiety.csv") %>%
dplyr::mutate(
sex = forcats::as_factor(sex)
)


## Bivariate correlations

Plot the data

GGally::ggscatmat(exam_tib, columns = c("exam_grade", "revise", "anxiety")) +
theme_minimal()


## Pearson’s r

A single correlation:

exam_tib %>%
correlation::correlation()

## Parameter1 | Parameter2 |    r |       95% CI |    t |  df |      p |  Method | n_Obs
## -------------------------------------------------------------------------------------
## exam_grade |     revise | 0.40 | [0.22, 0.55] | 4.34 | 101 | < .001 | Pearson |   103


Let’s scale this up to include exam anxiety:

exam_cor <- exam_tib %>%
correlation::correlation()

exam_cor

## Parameter1 | Parameter2 |     r |         95% CI |      t |  df |      p |  Method | n_Obs
## ------------------------------------------------------------------------------------------
## exam_grade |     revise |  0.40 | [ 0.22,  0.55] |   4.34 | 101 | < .001 | Pearson |   103
## exam_grade |    anxiety | -0.44 | [-0.58, -0.27] |  -4.94 | 101 | < .001 | Pearson |   103
## revise     |    anxiety | -0.71 | [-0.79, -0.60] | -10.11 | 101 | < .001 | Pearson |   103


### Change the number of decimal places

exam_tib %>%
correlation::correlation(digits = 3, ci_digits = 3)

## Parameter1 | Parameter2 |      r |           95% CI |       t |  df |      p |  Method | n_Obs
## ----------------------------------------------------------------------------------------------
## exam_grade |     revise |  0.397 | [ 0.220,  0.548] |   4.343 | 101 | < .001 | Pearson |   103
## exam_grade |    anxiety | -0.441 | [-0.585, -0.271] |  -4.938 | 101 | < .001 | Pearson |   103
## revise     |    anxiety | -0.709 | [-0.794, -0.598] | -10.111 | 101 | < .001 | Pearson |   103


### Matrix format

exam_tib %>%
correlation::correlation() %>%
summary()

## Parameter  |  anxiety |  revise
## -------------------------------
## exam_grade | -0.44*** | 0.40***
## revise     | -0.71*** |


### Coefficient of determination

exam_cor <- exam_tib %>%
correlation::correlation()

(exam_cor$r)^2  ## [1] 0.1573873 0.1944752 0.5030345  ## Robust correlations exam_tib %>% dplyr::select(exam_grade, revise, anxiety) %>% WRS2::winall()  ## Call: ## WRS2::winall(x = .) ## ## Robust correlation matrix: ## exam_grade revise anxiety ## exam_grade 1.0000 0.3085 -0.3913 ## revise 0.3085 1.0000 -0.6015 ## anxiety -0.3913 -0.6015 1.0000 ## ## p-values: ## exam_grade revise anxiety ## exam_grade NA 0.00183 7e-05 ## revise 0.00183 NA 0e+00 ## anxiety 0.00007 0.00000 NA  exam_tib %>% dplyr::select(exam_grade, revise, anxiety) %>% correlation::correlation(., method = "percentage")  ## Parameter1 | Parameter2 | r | 95% CI | t | df | p | Method | n_Obs ## ------------------------------------------------------------------------------------------------- ## exam_grade | revise | 0.34 | [ 0.15, 0.50] | 3.60 | 101 | < .001 | Percentage Bend | 103 ## exam_grade | anxiety | -0.40 | [-0.55, -0.23] | -4.41 | 101 | < .001 | Percentage Bend | 103 ## revise | anxiety | -0.61 | [-0.72, -0.47] | -7.66 | 101 | < .001 | Percentage Bend | 103  ## Spearman’s rho Load the Biggest Liar data from the CSV file (assuming you’ve set up a project as described in chapter 1) liar_tib <- readr::read_csv("../data/biggest_liar.csv") %>% dplyr::mutate( novice = forcats::as_factor(novice) )  liar_tib %>% dplyr::select(position, creativity) %>% correlation::correlation(method = "spearman")  ## Parameter1 | Parameter2 | rho | 95% CI | S | p | Method | n_Obs ## -------------------------------------------------------------------------------------- ## position | creativity | -0.38 | [-0.56, -0.15] | 72123.51 | 0.002 | Spearman | 68  ## Kendall’s tau liar_tib %>% dplyr::select(position, creativity) %>% correlation::correlation(method = "kendall")  ## Parameter1 | Parameter2 | 95% CI | tau | z | p | Method | n_Obs ## ---------------------------------------------------------------------------------- ## position | creativity | [-0.50, -0.07] | -0.30 | -3.24 | 0.001 | Kendall | 68  ### Bootstrapped confidence intervals A quick aside, let’s look at the code for writing a function to print the mean: print_mean <- function(variable){ mean <- sum(variable)/nrow(variable) cat("Mean = ", mean) } exam_tib %>% dplyr::select(exam_grade) %>% print_mean()  ## Mean = 56.57282  Or a sillier version: print_mean <- function(harry_the_hungy_hippo){ mean <- sum(harry_the_hungy_hippo)/ nrow(harry_the_hungy_hippo) cat("Harry the hungry hippo say that the mean = ", mean) } exam_tib %>% dplyr::select(exam_grade) %>% print_mean()  ## Harry the hungry hippo say that the mean = 56.57282  Here’s a function to bootstrap Pearson’s r: boot_r <- function(data, i){ cor(data[i, "exam_grade"], data[i, "revise"]) } grade_revise_bs <- boot::boot(exam_tib, boot_r, R = 2000) grade_revise_bs  ## ## ORDINARY NONPARAMETRIC BOOTSTRAP ## ## ## Call: ## boot::boot(data = exam_tib, statistic = boot_r, R = 2000) ## ## ## Bootstrap Statistics : ## original bias std. error ## t1* 0.3967207 -0.001476077 0.06983568  boot::boot.ci(grade_revise_bs)  ## BOOTSTRAP CONFIDENCE INTERVAL CALCULATIONS ## Based on 2000 bootstrap replicates ## ## CALL : ## boot::boot.ci(boot.out = grade_revise_bs) ## ## Intervals : ## Level Normal Basic ## 95% ( 0.2613, 0.5351 ) ( 0.2673, 0.5424 ) ## ## Level Percentile BCa ## 95% ( 0.2510, 0.5261 ) ( 0.2530, 0.5263 ) ## Calculations and Intervals on Original Scale  ### A general bootstrap function boot_r <- function(data, var1, var2, i){ cor(data[i, var1], data[i, var2]) } grade_revise_bs <- boot::boot(exam_tib, boot_r, var1 = "exam_grade", var2 = "revise", R = 2000) grade_revise_bs  ## ## ORDINARY NONPARAMETRIC BOOTSTRAP ## ## ## Call: ## boot::boot(data = exam_tib, statistic = boot_r, R = 2000, var1 = "exam_grade", ## var2 = "revise") ## ## ## Bootstrap Statistics : ## original bias std. error ## t1* 0.3967207 -0.001593843 0.06788372  boot::boot.ci(grade_revise_bs)  ## BOOTSTRAP CONFIDENCE INTERVAL CALCULATIONS ## Based on 2000 bootstrap replicates ## ## CALL : ## boot::boot.ci(boot.out = grade_revise_bs) ## ## Intervals : ## Level Normal Basic ## 95% ( 0.2653, 0.5314 ) ( 0.2732, 0.5405 ) ## ## Level Percentile BCa ## 95% ( 0.2529, 0.5202 ) ( 0.2518, 0.5198 ) ## Calculations and Intervals on Original Scale  grade_anx_bs <- boot::boot(exam_tib, boot_r, var1 = "exam_grade", var2 = "anxiety", R = 2000) grade_anx_bs  ## ## ORDINARY NONPARAMETRIC BOOTSTRAP ## ## ## Call: ## boot::boot(data = exam_tib, statistic = boot_r, R = 2000, var1 = "exam_grade", ## var2 = "anxiety") ## ## ## Bootstrap Statistics : ## original bias std. error ## t1* -0.4409934 0.002270827 0.0644994  boot::boot.ci(grade_anx_bs)  ## BOOTSTRAP CONFIDENCE INTERVAL CALCULATIONS ## Based on 2000 bootstrap replicates ## ## CALL : ## boot::boot.ci(boot.out = grade_anx_bs) ## ## Intervals : ## Level Normal Basic ## 95% (-0.5697, -0.3168 ) (-0.5767, -0.3235 ) ## ## Level Percentile BCa ## 95% (-0.5585, -0.3053 ) (-0.5609, -0.3088 ) ## Calculations and Intervals on Original Scale  revise_anx_bs <- boot::boot(exam_tib, boot_r, var1 = "revise", var2 = "anxiety", R = 2000) revise_anx_bs  ## ## ORDINARY NONPARAMETRIC BOOTSTRAP ## ## ## Call: ## boot::boot(data = exam_tib, statistic = boot_r, R = 2000, var1 = "revise", ## var2 = "anxiety") ## ## ## Bootstrap Statistics : ## original bias std. error ## t1* -0.7092493 0.001022998 0.108373  boot::boot.ci(revise_anx_bs)  ## BOOTSTRAP CONFIDENCE INTERVAL CALCULATIONS ## Based on 2000 bootstrap replicates ## ## CALL : ## boot::boot.ci(boot.out = revise_anx_bs) ## ## Intervals : ## Level Normal Basic ## 95% (-0.9227, -0.4979 ) (-0.9525, -0.5483 ) ## ## Level Percentile BCa ## 95% (-0.8702, -0.4660 ) (-0.8530, -0.3760 ) ## Calculations and Intervals on Original Scale  ### Bootrapping Spearman’s rho and Kendall’s tau boot_r <- function(data, var1, var2, method = "pearson", i){ cor(data[i, var1], data[i, var2], method = method) } liar_bs <- boot::boot(liar_tib, boot_r, var1 = "position", var2 = "creativity", method = "spearman", R = 2000) boot::boot.ci(liar_bs)  ## BOOTSTRAP CONFIDENCE INTERVAL CALCULATIONS ## Based on 2000 bootstrap replicates ## ## CALL : ## boot::boot.ci(boot.out = liar_bs) ## ## Intervals : ## Level Normal Basic ## 95% (-0.6204, -0.1376 ) (-0.6294, -0.1555 ) ## ## Level Percentile BCa ## 95% (-0.5976, -0.1238 ) (-0.5818, -0.1042 ) ## Calculations and Intervals on Original Scale  ## Point-biserial correlation If you need to read the data in from the csv file: cats_tib <- readr::read_csv("../data/roaming_cats.csv") %>% dplyr::mutate( sex = forcats::as_factor(sex) )  cats_tib <- cats_tib %>% dplyr::mutate( sex_bin = ifelse(sex == "Male", 0, 1), sex_bin_recode = ifelse(sex == "Male", 1, 0) )  cats_tib %>% dplyr::select(time, sex_bin) %>% correlation::correlation()  ## Parameter1 | Parameter2 | r | 95% CI | t | df | p | Method | n_Obs ## --------------------------------------------------------------------------------------- ## time | sex_bin | -0.38 | [-0.58, -0.14] | -3.11 | 58 | 0.003 | Pearson | 60  coefficient of determination r_cat <- cats_tib %>% dplyr::select(time, sex_bin) %>% correlation::correlation() (r_cat$r)^2

## [1] 0.1432276

cats_tib %>%
dplyr::select(time, sex_bin_recode) %>%
correlation::correlation()

## Parameter1 |     Parameter2 |    r |       95% CI |    t | df |     p |  Method | n_Obs
## ---------------------------------------------------------------------------------------
## time       | sex_bin_recode | 0.38 | [0.14, 0.58] | 3.11 | 58 | 0.003 | Pearson |    60

cats_tib %>%
dplyr::select(time, sex_bin) %>%
correlation::correlation(method = "biserial")

## Parameter1 | Parameter2 |   rho |         95% CI |     t | df |      p |   Method | n_Obs
## -----------------------------------------------------------------------------------------
## time       |    sex_bin | -0.47 | [-0.65, -0.25] | -4.03 | 58 | < .001 | Biserial |    60


## Partial correlations

exam_tib %>%
correlation::correlation(., partial = TRUE)

## Parameter1 | Parameter2 |     r |         95% CI |     t |  df |      p |  Method | n_Obs
## -----------------------------------------------------------------------------------------
## exam_grade |     revise |  0.13 | [-0.06,  0.32] |  1.35 | 101 | 0.182  | Pearson |   103
## exam_grade |    anxiety | -0.25 | [-0.42, -0.06] | -2.56 | 101 | 0.024  | Pearson |   103
## revise     |    anxiety | -0.65 | [-0.75, -0.52] | -8.56 | 101 | < .001 | Pearson |   103


## Comparing rs

The correlation between exam grade and anxiety for self-identifying males:

exam_tib %>%
dplyr::filter(sex == "Male") %>%
correlation::correlation()

## Parameter1 | Parameter2 |     r |         95% CI |     t | df |      p |  Method | n_Obs
## ----------------------------------------------------------------------------------------
## exam_grade |    anxiety | -0.51 | [-0.68, -0.27] | -4.14 | 50 | < .001 | Pearson |    52


The correlation between exam grade and anxiety for self-identifying females:

exam_tib %>%
dplyr::filter(sex == "Female") %>%
correlation::correlation()

## Parameter1 | Parameter2 |     r |         95% CI |     t | df |     p |  Method | n_Obs
## ---------------------------------------------------------------------------------------
## exam_grade |    anxiety | -0.38 | [-0.59, -0.12] | -2.89 | 49 | 0.006 | Pearson |    51

exam_male <- exam_tib %>%
dplyr::filter(sex == "Male")

exam_female <- exam_tib %>%
dplyr::filter(sex == "Female")

WRS2::twopcor(x1 = exam_male$exam_grade, y1 = exam_male$anxiety, x2 = exam_female$exam_grade, y2 = exam_female$anxiety)

## Call:
## WRS2::twopcor(x1 = exam_male$exam_grade, y1 = exam_male$anxiety,
##     x2 = exam_female$exam_grade, y2 = exam_female$anxiety)
##
## First correlation coefficient: -0.5057
## Second correlation coefficient: -0.3814
## Confidence interval (difference): -0.4574 0.1315

Previous
Next