# Smart Alex solutions Chapter 9 This document contains abridged sections from Discovering Statistics Using R and RStudio by Andy Field so there are some copyright considerations. You can use this material for teaching and non-profit activities but please do not meddle with it or claim it as your own work. See the full license terms at the bottom of the page.

Is arachnophobia (fear of spiders) specific to real spiders or will pictures of spiders evoke similar levels of anxiety? Twelve arachnophobes were asked to play with a big hairy tarantula with big fangs and an evil look in its eight eyes and at a different point in time were shown only photos of the same spider. The participants’ anxiety was measured in each case. Do a t-test to see whether anxiety is higher for real spiders than pictures (big_hairy_spider.csv).

spider_tib <- readr::read_csv("../data/big_hairy_spider.csv") %>%
dplyr::mutate(
spider_type = forcats::as_factor(spider_type)
)


Alternative, load the data directly from the discovr package:

spider_tib <- discovr::big_hairy_spider


### Summary statistics

spider_summary <- spider_tib %>%
dplyr::group_by(spider_type) %>%
dplyr::summarize(
mean = mean(anxiety),
sd = sd(anxiety),
ci_lower = ggplot2::mean_cl_normal(anxiety)$ymin, ci_upper = ggplot2::mean_cl_normal(anxiety)$ymax
)
spider_summary


Table: Table 1: Summary statistics

spider_typemeansdci_lowerci_upper
Picture409.2934.1045.90
Real4711.0339.9954.01

### Fit the model

We need to run a paired samples t-test on these data. We need to make sure we sort the file by id so that the pairing is done correctly.

spider_mod <- spider_tib %>%
dplyr::arrange(id) %>%
t.test(anxiety ~ spider_type, data = ., paired = TRUE)

spider_mod

##
## 	Paired t-test
##
## data:  anxiety by spider_type
## t = -2.4725, df = 11, p-value = 0.03098
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -13.2312185  -0.7687815
## sample estimates:
## mean of the differences
##                      -7


### Effect size

effectsize::cohens_d(anxiety ~ spider_type, data = spider_tib)

## Cohen's d |        95% CI
## -------------------------
##     -0.69 | [-1.50, 0.15]


### Reporting the analysis

We could report the result as:

• On average, participants experienced significantly greater anxiety with real spiders (M = 47.00, SE = 3.18) than with pictures of spiders (M = 40.00, SE = 2.68), t(11) = -2.47, p = 0.031, d = -0.69, 0.95, -1.5, 0.15.

Task 2: Estimate the Bayes factor for the difference between anxiety levels induced by a real spider compared to a photo

First get the data messy:

spider_messy_tib <- spider_tib %>%
tidyr::spread(value = anxiety, key = spider_type)

idPictureReal
Aayaat4055
Abdul Wahaab4550
Bepe5065
Blas2535
Cole3030
Colton3040
Dustin4550
Faraj5039
Fred4060
Jordan5550
Naasif3535
Saajida3555

Now get the Bayes factor

spider_bf <- BayesFactor::ttestBF(spider_messy_tib$Picture, spider_messy_tib$Real, paired = TRUE, rscale = "medium")
spider_bf

## Bayes factor analysis
## --------------
##  Alt., r=0.707 : 2.400782 ±0%
##
## Against denominator:
##   Null, mu = 0
## ---
## Bayes factor type: BFoneSample, JZS

BayesFactor::posterior(spider_bf, iterations = 1000) %>%
summary()

##
## Iterations = 1:1000
## Thinning interval = 1
## Number of chains = 1
## Sample size per chain = 1000
##
## 1. Empirical mean and standard deviation for each variable,
##    plus standard error of the mean:
##
##           Mean      SD Naive SE Time-series SE
## mu     -5.9255  2.9385 0.092925        0.10640
## sig2  114.5136 55.1995 1.745560        1.90699
## delta  -0.5959  0.3091 0.009774        0.01137
## g       4.8136 45.1384 1.427402        1.42740
##
## 2. Quantiles for each variable:
##
##            2.5%     25%      50%      75%      97.5%
## mu    -11.49773 -7.8859  -6.0097  -3.9913  -0.098881
## sig2   48.92183 78.2696 100.4749 136.8231 258.468777
## delta  -1.21297 -0.8052  -0.5941  -0.3792  -0.009222
## g       0.09906  0.2945   0.6395   1.5788  13.926521


The Bayes factor, $\mathrm{BF}_{10}$ = 2.40, suggested that the data were 2.4 times more probable under the alternative hypothesis than under the null.

‘Pop psychology’ books sometimes spout nonsense that is unsubstantiated by science. I took 20 people in relationships and randomly assigned them to one of two groups. One group read the famous popular psychology book Women are from Bras and men are from Penis, and the other read Marie Claire. The outcome variable was their relationship happiness after their assigned reading. Were people happier with their relationship after reading the pop psychology book? (self_help.csv).

self_help_tib <- readr::read_csv("../data/self_help.csv") %>%
dplyr::mutate(
book = forcats::as_factor(book)
)


Alternative, load the data directly from the discovr package:

self_help_tib <- discovr::self_help


### Summary statistics

self_summary <- self_help_tib %>%
dplyr::group_by(book) %>%
dplyr::summarize(
mean = mean(happy),
sd = sd(happy),
ci_lower = ggplot2::mean_cl_normal(happy)$ymin, ci_upper = ggplot2::mean_cl_normal(happy)$ymax
)
self_summary


Table: Table 2: Summary statistics

bookmeansdci_lowerci_upper
Women are from Bras, Men are from Penis20.04.1117.0622.94
Marie Claire24.24.7120.8327.57

### Fit the model

self_help_mod <- t.test(happy ~ book, data = self_help_tib)
self_help_mod

##
## 	Welch Two Sample t-test
##
## data:  happy by book
## t = -2.1249, df = 17.676, p-value = 0.04796
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -8.35799532 -0.04200468
## sample estimates:
## mean in group Women are from Bras, Men are from Penis
##                                                  20.0
##                            mean in group Marie Claire
##                                                  24.2


We can estimate Cohen’s $d$ as follows:

d_self_help <- effectsize::cohens_d(happy ~ book, data = self_help_tib)
d_self_help

## Cohen's d |         95% CI
## --------------------------
##     -0.95 | [-1.87, -0.01]


This means that reading the self-help book reduced relationship happiness by about one standard deviation, which is a fairly big effect.

### Report the findings

We could report this result as:

• On average, the reported relationship happiness after reading Marie Claire (M = 24.20, SE = 1.49), was significantly higher than after reading Women are from bras and men are from penis (M = 20.00, SE = 1.30), * t(17.68) = -2.12, p = 0.048, d = -0.95, 0.95, -1.87, -0.01.

Twaddle and Sons, the publishers of Women are from Bras and men are from Penis, were upset about my claims that their book was as useful as a paper umbrella. They ran their own experiment (N = 500) in which relationship happiness was measured after participants had read their book and after reading the book you are currently reading. (Participants read the books in counterbalanced order with a six-month delay.) Was relationship happiness greater after reading their wonderful contribution to pop psychology than after reading my tedious tome about statistics? (self_help_dsur.csv).

dsur_tib <- readr::read_csv("../data/self_help_dsur.csv") %>%
dplyr::mutate(
book = forcats::as_factor(book)
)


Alternative, load the data directly from the discovr package:

dsur_tib <- discovr::self_help_dsur


### Summary statistics

dsur_summary <- dsur_tib %>%
dplyr::group_by(book) %>%
dplyr::summarize(
mean = mean(happiness),
sd = sd(happiness),
ci_lower = ggplot2::mean_cl_normal(happiness)$ymin, ci_upper = ggplot2::mean_cl_normal(happiness)$ymax
)
dsur_summary


Table: Table 3: Summary statistics

bookmeansdci_lowerci_upper
Self-help book20.029.9819.1420.90
Discovering statistics using R18.498.9917.7019.28

### Fit the model

We need to run a paired samples t-test on these data. We need to make sure we sort the file by id so that the pairing is done correctly.

dsur_mod <- dsur_tib %>%
dplyr::arrange(id) %>%
t.test(happiness ~ book, data = ., paired = TRUE)

dsur_mod

##
## 	Paired t-test
##
## data:  happiness by book
## t = 2.7056, df = 499, p-value = 0.00705
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  0.41843 2.63757
## sample estimates:
## mean of the differences
##                   1.528


### Effect size

d_dsur <- effectsize::cohens_d(happiness ~ book, data = dsur_tib)


### Reporting the analysis

Therefore, although this effect is highly statistically significant, the size of the effect is small. In this example, it would be tempting for Twaddle and Sons to conclude that their book produced significantly greater relationship happiness than our book. In fact, many researchers would write conclusions like this:

• On average, the reported relationship happiness after reading Field and Hole (2003) (M = 18.49, SE = 0.402), was significantly higher than after reading Women are from bras and men are from penis (M = 20.02, SE = 0.446), t(499) = 2.71, p = 0.007. In other words, reading Women are from bras and men are from penis produces significantly greater relationship happiness than that book by smelly old Field.

However, to reach such a conclusion is to confuse statistical significance with the importance of the effect. By calculating the effect size we’ve discovered that although the difference in happiness after reading the two books is statistically different, the size of effect that this represents is small. A more balanced interpretation might, therefore, be:

• On average, the reported relationship happiness after reading Field and Hole (2003) (M = 18.49, SE = 0.402), was significantly higher than after reading Women are from bras and men are from penis (M = 20.02, SE = 0.446), t(499) = 2.71, p = 0.007, d = 0.16, 0.95, 0.04, 0.28. However, the effect size was small, revealing that this finding was not substantial in real terms.

Of course, this latter interpretation would be unpopular with Twaddle and Sons who would like to believe that their book had a huge effect on relationship happiness.

In Chapter 1 (Task 6) we looked at data from people who had been forced to marry goats and dogs and measured their life satisfaction as well as how much they like animals (goat_or_dog.csv). Conduct a t-test to see whether life satisfaction depends upon the type of animal to which a person was married.

goat_tib <- readr::read_csv("../data/goat_or_dog.csv") %>%
dplyr::mutate(
wife = forcats::as_factor(wife)
)


Alternative, load the data directly from the discovr package:

goat_tib <- discovr::animal_bride


### Summary statistics

goat_summary <- goat_tib %>%
dplyr::group_by(wife) %>%
dplyr::summarize(
mean = mean(life_satisfaction),
sd = sd(life_satisfaction),
ci_lower = ggplot2::mean_cl_normal(life_satisfaction)$ymin, ci_upper = ggplot2::mean_cl_normal(life_satisfaction)$ymax
)
goat_summary


Table: Table 4: Summary statistics

wifemeansdci_lowerci_upper
Goat38.1715.5128.3148.02
Dog60.1211.1050.8469.41

### Fit the model

goat_mod <- t.test(life_satisfaction ~ wife, data = goat_tib)
goat_mod

##
## 	Welch Two Sample t-test
##
## data:  life_satisfaction by wife
## t = -3.6879, df = 17.843, p-value = 0.001704
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -34.475353  -9.441314
## sample estimates:
## mean in group Goat  mean in group Dog
##           38.16667           60.12500


We can estimate Cohen’s $d$ as follows:

d_goat <- effectsize::cohens_d(life_satisfaction ~ wife, data = goat_tib)
d_goat

## Cohen's d |         95% CI
## --------------------------
##     -1.57 | [-2.59, -0.53]


### Report the findings

As well as being statistically significant, this effect is very large and so represents a substantive finding. We could report:

• On average, the life satisfaction of men married to dogs (M = 60.13, SE = 3.93) was significantly higher than that of men who were married to goats (M = 38.17, SE = 4.48), t(17.84) = -3.69, p = 0.002, d = -1.57, 0.95, -2.59, -0.53.

Repeat the t-test from Task 5 but turn of Welch’s correction. Then fit a linear model to see whether life satisfaction is significantly predicted from the type of animal that was married. What do you notice about the t-value and significance in these two models?

Fit the t-test without Welch’s correction:

goat_mod <- t.test(life_satisfaction ~ wife, data = goat_tib, var.equal = TRUE)
goat_mod

##
## 	Two Sample t-test
##
## data:  life_satisfaction by wife
## t = -3.4458, df = 18, p-value = 0.002883
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -35.346354  -8.570312
## sample estimates:
## mean in group Goat  mean in group Dog
##           38.16667           60.12500


Now the linear model:

goat_lm <- lm(life_satisfaction ~ wife, data = goat_tib)
summary(goat_lm)

##
## Call:
## lm(formula = life_satisfaction ~ wife, data = goat_tib)
##
## Residuals:
##     Min      1Q  Median      3Q     Max
## -32.167  -5.906   4.354   8.833  17.833
##
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)
## (Intercept)   38.167      4.030   9.470 2.05e-08 ***
## wifeDog       21.958      6.372   3.446  0.00288 **
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 13.96 on 18 degrees of freedom
## Multiple R-squared:  0.3975,	Adjusted R-squared:  0.364
## F-statistic: 11.87 on 1 and 18 DF,  p-value: 0.002883


The values of t and p are the same. (Technically, t is different because for the linear model it is a positive value and for the t-test it is negative However, the sign of t merely reflects which way around you coded the dog and goat groups. The linear model, by default, has coded the groups the opposite way around to the t-test.) The main point I wanted to make here is that whether you run these data through the lm() or t.test() functions, the results are identical.

In Chapter 6 we looked at hygiene scores over three days of a rock music festival (download_festival.csv). Do a paired-samples t-test to see whether hygiene scores on day 1 differed from those on day 3.

download_tib <- readr::read_csv("../data/download_festival.csv") %>%
dplyr::mutate(
gender = forcats::as_factor(gender)
)


Alternative, load the data directly from the discovr package:

download_tib <- discovr::download


### Summary statistics

are missing values for days 2 and 3 so we need to remember to include na.rm = TRUE in the functions for the mean, sd and confidence interval.

It’s also useful if we convert the data to tidy format:

download_tidy_tib <- download_tib %>%
tidyr::pivot_longer(
cols = day_1:day_3,
names_to = "day",
values_to = "hygiene"
)

download_summary <- download_tidy_tib %>%
dplyr::group_by(day) %>%
dplyr::summarize(
mean = mean(hygiene, na.rm = TRUE),
sd = sd(hygiene, na.rm = TRUE),
ci_lower = ggplot2::mean_cl_normal(hygiene)$ymin, ci_upper = ggplot2::mean_cl_normal(hygiene)$ymax
)



Table: Table 5: Summary statistics

daymeansdci_lowerci_upper
day_11.790.941.731.86
day_20.960.720.871.05
day_30.980.710.851.10

### Fit the model

We need to run a paired samples t-test on these data. We need to ignore the day 2 scores because we want to compare days 1 and 3 only, and to add to our woe we have missing values for days 2 and 3, so we need to deal with those (the t.test() function will fail if there are NAs. In the book we looked only at inputting variables into t.test() in a formula:

t.test(outcome ~ predictor, data = tibble)

This method works well for tidy data (hence why I use it in the book). However, there is an alterantive way to use the function where you input the sets of scores for your two groups as separate variables:

t.test(x = first_set_of_scores, y = second_set_of_scores)

We have used this sort of input for other functions in the chapter. This input comes in handy when you have missing values and paired scores because if we have the day_1 and day_3 scores in separate columns (i.e. in messy format), then we can place these variables into the t.test() function and include na.action = "na.exclude" and the function will ignore any participant who doesn’t have scores for both days. So, we could execute:

download_mod <- t.test(download_tib$day_1, download_tib$day_3, paired = TRUE, na.action = "na.exclude")

##
## 	Paired t-test
##
## data:  download_tib$day_1 and download_tib$day_3
## t = 10.587, df = 122, p-value < 2.2e-16
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  0.5487477 0.8011710
## sample estimates:
## mean of the differences
##               0.6749593


### Effect size

To calculate $d$ we first need to filter out the day two data from the original tibble using dplyr::filter(), within which we include day != "day_2" which translates as the variable day is not equal to ‘day_2’.

d_download <- download_tidy_tib %>%
dplyr::filter(day != "day_2") %>%
effectsize::cohens_d(hygiene ~ day, data = .)


## Cohen's d |       95% CI
## ------------------------
##      0.89 | [0.79, 0.99]


### Reporting the analysis

We could report:

• On average, hygiene scores significantly decreased from day 1 (M = 1.65, SE = 0.06), to day 3 (M = 0.98, SE = 0.06) of the Download music festival, t(122) = 10.59, p = 0, d = 0.89, 0.95, 0.79, 0.99.

A psychologist was interested in the cross-species differences between men and dogs. She observed a group of dogs and a group of men in a naturalistic setting (20 of each). She classified several behaviours as being dog-like (urinating against trees and lampposts, attempts to copulate with anything that moved, and attempts to lick their own genitals). For each man and dog she counted the number of dog-like behaviours displayed in a 24-hour period. It was hypothesized that dogs would display more dog-like behaviours than men. Test this hypothesis using a robust test. (men_dogs.csv)

dogs_tib <- readr::read_csv("../data/men_dogs.csv") %>%
dplyr::mutate(
species = forcats::as_factor(species)
)


Alternative, load the data directly from the discovr package:

dogs_tib <- discovr::men_dogs


### Summary statistics

There are missing values for days 2 and 3 so we need to remember to include na.rm = TRUE in the functions for the mean, sd and confidence interval.

dogs_summary <- dogs_tib %>%
dplyr::group_by(species) %>%
dplyr::summarize(
mean = mean(behaviour),
sd = sd(behaviour),
ci_lower = ggplot2::mean_cl_normal(behaviour)$ymin, ci_upper = ggplot2::mean_cl_normal(behaviour)$ymax
)

dogs_summary


Table: Table 6: Summary statistics

speciesmeansdci_lowerci_upper
Dog28.0510.9822.9133.19
Man26.859.9022.2231.48

### Fit the model

Fit the model

dog_mod <- t.test(behaviour ~ species, data = dogs_tib)
dog_mod

##
## 	Welch Two Sample t-test
##
## data:  behaviour by species
## t = 0.36297, df = 37.6, p-value = 0.7187
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -5.495178  7.895178
## sample estimates:
## mean in group Dog mean in group Man
##             28.05             26.85


We can estimate Cohen’s $d$ as follows:

d_dog <- effectsize::cohens_d(behaviour ~ species, data = dogs_tib)
d_dog

## Cohen's d |        95% CI
## -------------------------
##      0.11 | [-0.51, 0.73]


### Report the findings

The effect is not statistically significant, and is a small effect. We could report:

• On average, men (M = 26.85, SE = 2.23) engaged in less dog-like behaviour than dogs (M = 28.05, SE = 2.37). However, this difference, 1.2, 95% CI [-5.50 to 7.90], was not significant, t(37.6) = 0.36, p = 0.719, d = 0.11, 0.95, -0.51, 0.73.

Both Ozzy Osbourne and Judas Priest have been accused of putting backward masked messages on their albums that subliminally influence poor unsuspecting teenagers into doing things like blowing their heads off with shotguns. A psychologist was interested in whether backward masked messages could have an effect. He created a version of Britney Spears’ ‘Baby one more time’ that contained the masked message ‘deliver your soul to the dark lord’ repeated in the chorus. He took this version, and the original, and played one version (randomly) to a group of 32 people. Six months later he played them whatever version they hadn’t heard the time before. So, each person heard both the original and the version with the masked message, but at different points in time. The psychologist measured the number of satanic intrusions the person had in the week after listening to each version. Test the hypothesis that the backward message would lead to more intrusions (dark_lord.csv) using a robust test.

dark_tib <- readr::read_csv("../data/dark_lord.csv") %>%
dplyr::mutate(
message = forcats::as_factor(message)
)


Alternative, load the data directly from the discovr package:

dark_tib <- discovr::dark_lord


### Summary statistics

There are missing values for days 2 and 3 so we need to remember to include na.rm = TRUE in the functions for the mean, sd and confidence interval.

dark_summary <- dark_tib %>%
dplyr::group_by(message) %>%
dplyr::summarize(
mean = mean(intrusions),
sd = sd(intrusions),
ci_lower = ggplot2::mean_cl_normal(intrusions)$ymin, ci_upper = ggplot2::mean_cl_normal(intrusions)$ymax
)

dark_summary

## # A tibble: 2 x 5
##   message     mean    sd ci_lower ci_upper
##   <fct>      <dbl> <dbl>    <dbl>    <dbl>
## 1 Message     9.16  3.55     7.88     10.4
## 2 No message 11.5   4.38     9.92     13.1


Table: Table 7: Summary statistics

messagemeansdci_lowerci_upper
Message9.163.557.8810.44
No message11.504.389.9213.08

### Fit the model

We need to run a paired samples t-test on these data. We need to make sure we sort the file by id so that the pairing is done correctly.

dark_messy_tib <- dark_tib %>%
tidyr::spread(value = intrusions, key = message)

dark_mod <- WRS2::yuend(dark_messy_tib$Message, dark_messy_tib$No message)
dark_mod

## Call:
## WRS2::yuend(x = dark_messy_tib$Message, y = dark_messy_tib$No message)
##
## Test statistic: -1.9638 (df = 19), p-value = 0.06435
##
## Trimmed mean difference:  -1.55
## 95 percent confidence interval:
## -3.202     0.102
##
## Explanatory measure of effect size: 0.36


### Effect size

d_dark <- effectsize::cohens_d(intrusions ~ message, data = dark_tib)


### Reporting the analysis

• Participants had fewer Satanic intrusions after hearing the backward message (M = 9.16, SE = 0.62), than after hearing the normal version of the Britney song (M = 11.50, SE = 0.80). This difference, -1.55, 95% CI [-3.2, 0.1], was not significant, t(19) = -1.96, p = 0.064. However, it represented a substantial effect, d = -0.59, 0.95, -1.09, -0.08.

Test whether the number of offers was significantly different in people listening to Bon Scott than in those listening to Brian Johnson, using a robust independent t-test with bootstrapping. Do your results differ from Oxoby (2008)? (acdc.csv)

oxoby_tib <- readr::read_csv("../data/acdc.csv") %>%
dplyr::mutate(
singer = forcats::as_factor(singer)
)


Alternative, load the data directly from the discovr package:

oxoby_tib <- discovr::acdc


### Summary statistics

oxoby_summary <- oxoby_tib %>%
dplyr::group_by(singer) %>%
dplyr::summarize(
mean = mean(offer),
sd = sd(offer),
ci_lower = ggplot2::mean_cl_normal(offer)$ymin, ci_upper = ggplot2::mean_cl_normal(offer)$ymax
)

oxoby_summary


Table: Table 8: Summary statistics

singermeansdci_lowerci_upper
Bon Scott3.281.182.693.86
Brian Johnson4.000.973.524.48

### Fit the model

Fit the model

acdc_mod <- WRS2::yuenbt(offer ~ singer, data = oxoby_tib, nboot = 1000, side = TRUE)
acdc_mod

## Call:
## WRS2::yuenbt(formula = offer ~ singer, data = oxoby_tib, nboot = 1000,
##     side = TRUE)
##
## Test statistic: -1.7339 (df = NA), p-value = 0.066
##
## Trimmed mean difference:  -0.83333
## 95 percent confidence interval:
## -1.7316     0.0649


The bootstrap confidence interval ranged from -1.73 to 0.06, which just about crosses zero suggesting that (if we assume that it is one of the 95% of confidence intervals that contain the true value) that the effect in the population could be zero.

We can estimate Cohen’s $d$ as follows:

d_acdc <- effectsize::cohens_d(offer ~ singer, data = oxoby_tib)
d_acdc

## Cohen's d |        95% CI
## -------------------------
##     -0.67 | [-1.34, 0.01]


### Report the findings

We could report:

• On average, more offers were made when listening to Brian Johnson (M = 4.00, SE = 0.23) than Bon Scott (M = 3.28, SE = 0.28). This difference, -0.72, BCa 95% CI [-1.73, 0.06], was not significant, t = -1.73, p = 0.066; however, it produced a medium effect, d = -0.67, 0.95, -1.34, 0.01.