Smart Alex

Alex was aptly named because she’s, like, super smart. She likes teaching people, and her hobby is posing people questions so that she can explain the answers to them. Alex appears at the end of each chapter of Discovering Statistics Using R and RStudio (2nd edition) to pose you some questions and give you tasks to help you to practice your data analysis skills. This page contains her answers to those questions.
This document contains abridged sections from Discovering Statistics Using R and RStudio by Andy Field so there are some copyright considerations. You can use this material for teaching and non-profit activities but please do not meddle with it or claim it as your own work. See the full license terms at the bottom of the page.
General tips
When attempting these tasks, remember to use a setup code chunk at the start of your Quarto document (below the YAML header) and load any packages you need using library(). In general, most tasks are likely to use the tidyverse and easystats packages, so as a bare minimum your setup code chunk should load these.
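A minimal sketch of such a setup chunk, assuming you need only these two packages, would be:

```r
# Load the packages used in most tasks
library(easystats)
library(tidyverse)
```

Add a library() line for any other package a specific task requires.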
Many tasks will require you to load data. You can do this in one of two ways. The easiest is to grab the data directly from the discovr package using the general code:
`my_tib <- discovr::name_of_data`

in which you replace my_tib with the name you wish to give the data once loaded and name_of_data with the name of the data you wish to load. In RStudio, if you type discovr:: it will list the datasets available. For example, to load the data called students into a tibble called students_tib we’d execute:
`students_tib <- discovr::students`

Alternatively, download the CSV file from https://www.discovr.rocks/csv/name_of_file replacing name_of_file with the name of the csv file. For example, to download students.csv use this URL: https://www.discovr.rocks/csv/students.csv
Set up an RStudio project (exactly as described in Chapter 4), place the csv file in your data folder and use this general code:

`my_tib <- read_csv(here::here("data/name_of_file.csv"))`

in which you replace my_tib with the name you wish to give the data once loaded and name_of_file.csv with the name of the file you wish to read. This code uses here::here() to locate the file within the project and read_csv() to read it into my_tib. For example, to read students.csv into students_tib you would execute:

`students_tib <- read_csv(here::here("data/students.csv"))`
Chapter 1
Task 1.1
What are (broadly speaking) the five stages of the research process?
- Generating a research question: through an initial observation (hopefully backed up by some data).
- Generating a theory to explain your initial observation.
- Generating hypotheses: breaking your theory down into a set of testable predictions.
- Collecting data to test the theory: deciding what variables you need to measure to test your predictions and how best to measure or manipulate those variables.
- Analysing the data: looking at the data visually and by fitting a statistical model to see if it supports your predictions (and therefore your theory). At this point you should return to your theory and revise it if necessary.
Task 1.2
What is the fundamental difference between experimental and correlational research?
In a word, causality. In experimental research we manipulate a variable (predictor, independent variable) to see what effect it has on another variable (outcome, dependent variable). This manipulation, if done properly, allows us to compare situations where the causal factor is present to situations where it is absent. Therefore, if there are differences between these situations, we can attribute cause to the variable that we manipulated. In correlational research, we measure things that naturally occur and so we cannot attribute cause but instead look at natural covariation between variables.
Task 1.3
What is the level of measurement of the following variables?
- The number of downloads of different bands’ songs on iTunes:
- This is a discrete ratio measure. It is discrete because you can download only whole songs, and it is ratio because it has a true and meaningful zero (no downloads at all).
- The names of the bands downloaded.
- This is a nominal variable. Bands can be identified by their name, but the names have no meaningful order. The fact that Norwegian black metal band 1349 called themselves 1349 does not make them better than British boy-band has-beens 911; the fact that 911 were a bunch of talentless idiots does, though.
- Their positions in the download chart.
- This is an ordinal variable. We know that the band at number 1 sold more than the band at number 2 or 3 (and so on) but we don’t know how many more downloads they had. So, this variable tells us the order of the bands’ download totals, but not how many downloads each band actually had.
- The money earned by the bands from the downloads.
- This variable is continuous and ratio. It is continuous because money (pounds, dollars, euros or whatever) can be broken down into very small amounts (you can earn fractions of euros even though there may not be an actual coin to represent these fractions).
- The weight of drugs bought by the band with their royalties.
- This variable is continuous and ratio. If the drummer buys 100 g of cocaine and the singer buys 1 kg, then the singer has 10 times as much.
- The type of drugs bought by the band with their royalties.
- This variable is categorical and nominal: the name of the drug tells us something meaningful (crack, cannabis, amphetamine, etc.) but has no meaningful order.
- The phone numbers that the bands obtained because of their fame.
- This variable is categorical and nominal too: the phone numbers have no meaningful order; they might as well be letters. A bigger phone number did not mean that it was given by a better person.
- The gender of the people giving the bands their phone numbers.
- This variable is categorical: the people dishing out their phone numbers could fall into one of several categories based on how they self-identify when asked about their gender (their gender identity could be fluid). Taking a very simplistic view of gender, the variable might contain categories of male, female, and non-binary.
- The instruments played by the band members.
- This variable is categorical and nominal too: the instruments have no meaningful order but their names tell us something useful (guitar, bass, drums, etc.).
- The time they had spent learning to play their instruments.
- This is a continuous and ratio variable. The amount of time could be split into infinitely small divisions (nanoseconds even) and there is a meaningful true zero (no time spent learning your instrument means that, like 911, you can’t play at all).
Task 1.4
Say I own 857 CDs. My friend has written a computer program that uses a webcam to scan my shelves in my house where I keep my CDs and measure how many I have. His program says that I have 863 CDs. Define measurement error. What is the measurement error in my friend’s CD counting device?
Measurement error is the difference between the true value of something and the numbers used to represent that value. In this trivial example, the measurement error is 6 CDs. In this example we know the true value of what we’re measuring; usually we don’t have this information, so we have to estimate this error rather than knowing its actual value.
Task 1.5
Sketch the shape of a normal distribution, a positively skewed distribution and a negatively skewed distribution.
Normal

Positive skew

Negative skew

Task 1.6
In 2011 I got married and we went to Disney Florida for our honeymoon. We bought some bride and groom Mickey Mouse hats and wore them around the parks. The staff at Disney are really nice and upon seeing our hats would say ‘congratulations’ to us. We counted how many times people said congratulations over 7 days of the honeymoon: 5, 13, 7, 14, 11, 9, 17. Calculate the mean, median, sum of squares, variance and standard deviation of these data.
First compute the mean:
\[ \begin{aligned} \overline{X} &= \frac{\sum_{i=1}^{n} x_i}{n} \\ &= \frac{5+13+7+14+11+9+17}{7} \\ &= \frac{76}{7} \\ &= 10.86 \end{aligned} \]
To calculate the median, first let’s arrange the scores in ascending order: 5, 7, 9, 11, 13, 14, 17. The median will be the (n + 1)/2th score. There are 7 scores, so this will be the 8/2 = 4th. The 4th score in our ordered list is 11.
To calculate the sum of squares, first take the mean from each score, then square this difference, finally, add up these squared values:
| Score | Error (score - mean) | Error squared |
|---|---|---|
| 5 | -5.86 | 34.34 |
| 13 | 2.14 | 4.58 |
| 7 | -3.86 | 14.90 |
| 14 | 3.14 | 9.86 |
| 11 | 0.14 | 0.02 |
| 9 | -1.86 | 3.46 |
| 17 | 6.14 | 37.70 |
| Total | — | 104.86 |
So, the sum of squared errors is (Table 1):
\[ \begin{aligned} \text{SS} &= 34.34 + 4.58 + 14.90 + 9.86 + 0.02 + 3.46 + 37.70 \\ &= 104.86 \\ \end{aligned} \]
The variance is the sum of squared errors divided by the degrees of freedom:
\[ \begin{aligned} s^2 &= \frac{SS}{N - 1} \\ &= \frac{104.86}{6} \\ &= 17.48 \end{aligned} \]
The standard deviation is the square root of the variance:
\[ \begin{aligned} s &= \sqrt{s^2} \\ &= \sqrt{17.48} \\ &= 4.18 \end{aligned} \]
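If you’d rather let R do the arithmetic, the same values can be obtained with base R functions (the exact values differ very slightly from the hand calculations, which round the mean to 10.86 before squaring):

```r
# Number of times people said 'congratulations' on each day of the honeymoon
congrats <- c(5, 13, 7, 14, 11, 9, 17)

mean(congrats)                       # ≈ 10.86
median(congrats)                     # 11
sum((congrats - mean(congrats))^2)   # sum of squares ≈ 104.86
var(congrats)                        # ≈ 17.48
sd(congrats)                         # ≈ 4.18
```

Note that var() and sd() divide by \(N - 1\), matching the calculation above.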
Task 1.7
In this chapter we used an example of the time taken for 21 heavy smokers to fall off a treadmill at the fastest setting (18, 16, 18, 24, 23, 22, 22, 23, 26, 29, 32, 34, 34, 36, 36, 43, 42, 49, 46, 46, 57). Calculate the sums of squares, variance and standard deviation of these data.
To calculate the sum of squares, take the mean from each value, then square this difference. Finally, add up these squared values (the values in the final column). The sum of squared errors is a massive 2685.24 (Table 2).
| Score | Mean | Difference | Difference squared |
|---|---|---|---|
| 18 | 32.19 | -14.19 | 201.356 |
| 16 | 32.19 | -16.19 | 262.116 |
| 18 | 32.19 | -14.19 | 201.356 |
| 24 | 32.19 | -8.19 | 67.076 |
| 23 | 32.19 | -9.19 | 84.456 |
| 22 | 32.19 | -10.19 | 103.836 |
| 22 | 32.19 | -10.19 | 103.836 |
| 23 | 32.19 | -9.19 | 84.456 |
| 26 | 32.19 | -6.19 | 38.316 |
| 29 | 32.19 | -3.19 | 10.176 |
| 32 | 32.19 | -0.19 | 0.036 |
| 34 | 32.19 | 1.81 | 3.276 |
| 34 | 32.19 | 1.81 | 3.276 |
| 36 | 32.19 | 3.81 | 14.516 |
| 36 | 32.19 | 3.81 | 14.516 |
| 43 | 32.19 | 10.81 | 116.856 |
| 42 | 32.19 | 9.81 | 96.236 |
| 49 | 32.19 | 16.81 | 282.576 |
| 46 | 32.19 | 13.81 | 190.716 |
| 46 | 32.19 | 13.81 | 190.716 |
| 57 | 32.19 | 24.81 | 615.536 |
| Total | — | — | 2685.236 |
The variance is the sum of squared errors divided by the degrees of freedom (\(N-1\)). There were 21 scores and so the degrees of freedom were 20. The variance is, therefore:
\[ \begin{aligned} s^2 &= \frac{SS}{N - 1} \\ &= \frac{2685.24}{20} \\ &= 134.26 \end{aligned} \]
The standard deviation is the square root of the variance:
\[ \begin{aligned} s &= \sqrt{s^2} \\ &= \sqrt{134.26} \\ &= 11.59 \end{aligned} \]
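As a check, the same values can be computed in base R (var() and sd() divide by \(N - 1\), matching the hand calculations):

```r
# Time (seconds) before each of 21 heavy smokers fell off the treadmill
treadmill <- c(18, 16, 18, 24, 23, 22, 22, 23, 26, 29, 32, 34, 34,
               36, 36, 43, 42, 49, 46, 46, 57)

sum((treadmill - mean(treadmill))^2)  # sum of squares ≈ 2685.24
var(treadmill)                        # ≈ 134.26
sd(treadmill)                         # ≈ 11.59
```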
Task 1.8
Sports scientists sometimes talk of a ‘red zone’, which is a period during which players in a team are more likely to pick up injuries because they are fatigued. When a player hits the red zone it is a good idea to rest them for a game or two. At a prominent London football club that I support, they measured how many consecutive games the 11 first team players could manage before hitting the red zone: 10, 16, 8, 9, 6, 8, 9, 11, 12, 19, 5. Calculate the mean, standard deviation, median, range and interquartile range.
First we need to compute the mean:
\[ \begin{aligned} \overline{X} &= \frac{\sum_{i=1}^{n} x_i}{n} \\ &= \frac{10+16+8+9+6+8+9+11+12+19+5}{11} \\ &= \frac{113}{11} \\ &= 10.27 \end{aligned} \]
Then the standard deviation, which we do as follows:
| Score | Error (score - mean) | Error squared |
|---|---|---|
| 10 | -0.27 | 0.07 |
| 16 | 5.73 | 32.83 |
| 8 | -2.27 | 5.15 |
| 9 | -1.27 | 1.61 |
| 6 | -4.27 | 18.23 |
| 8 | -2.27 | 5.15 |
| 9 | -1.27 | 1.61 |
| 11 | 0.73 | 0.53 |
| 12 | 1.73 | 2.99 |
| 19 | 8.73 | 76.21 |
| 5 | -5.27 | 27.77 |
| Total | — | 172.15 |
So, the sum of squared errors is (Table 3):
\[ \begin{aligned} \text{SS} &= 0.07 + 32.83 + 5.15 + 1.61 + 18.23 + 5.15 + 1.61 + 0.53 + 2.99 + 76.21 + 27.77 \\ &= 172.15 \\ \end{aligned} \]
The variance is the sum of squared errors divided by the degrees of freedom:
\[ \begin{aligned} s^2 &= \frac{SS}{N - 1} \\ &= \frac{172.15}{10} \\ &= 17.22 \end{aligned} \]
The standard deviation is the square root of the variance:
\[ \begin{aligned} s &= \sqrt{s^2} \\ &= \sqrt{17.22} \\ &= 4.15 \end{aligned} \]
To calculate the median, range and interquartile range, first let’s arrange the scores in ascending order: 5, 6, 8, 8, 9, 9, 10, 11, 12, 16, 19.

- The median: The median will be the (\(n + 1\))/2th score. There are 11 scores, so this will be the 12/2 = 6th. The 6th score in our ordered list is 9 games. Therefore, the median number of games is 9.
- The lower quartile: This is the median of the lower half of scores. If we split the data at 9 (the 6th score), there are 5 scores below this value. With 5 scores, the median is the (5 + 1)/2 = 3rd score. The 3rd score is 8; the lower quartile is therefore 8 games.
- The upper quartile: This is the median of the upper half of scores. If we split the data at 9 again (not including this score), there are 5 scores above this value. The median is the 3rd score above the median. The 3rd score above the median is 12; the upper quartile is therefore 12 games.
- The range: This is the highest score (19) minus the lowest (5), i.e. 14 games.
- The interquartile range: This is the difference between the upper and lower quartile: 12−8 = 4 games.
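These values can be verified in R. One thing to note is that quantile() uses a slightly different definition of quartiles by default, so below the quartiles are computed as medians of the lower and upper halves, mirroring the hand calculation:

```r
# Consecutive games before each of the 11 players hit the red zone
games <- c(10, 16, 8, 9, 6, 8, 9, 11, 12, 19, 5)

mean(games)               # ≈ 10.27
sd(games)                 # ≈ 4.15
median(games)             # 9
max(games) - min(games)   # range: 14

srt <- sort(games)
lower_q <- median(srt[1:5])   # median of the 5 scores below the median: 8
upper_q <- median(srt[7:11])  # median of the 5 scores above the median: 12
upper_q - lower_q             # interquartile range: 4
```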
Task 1.9
Celebrities always seem to be getting divorced. The (approximate) length of some celebrity marriages in days are: 240 (J-Lo and Cris Judd), 144 (Charlie Sheen and Donna Peele), 143 (Pamela Anderson and Kid Rock), 72 (Kim Kardashian, if you can call her a celebrity), 30 (Drew Barrymore and Jeremy Thomas), 26 (Axl Rose and Erin Everly), 2 (Britney Spears and Jason Alexander), 150 (Drew Barrymore again, but this time with Tom Green), 14 (Eddie Murphy and Tracy Edmonds), 150 (Renee Zellweger and Kenny Chesney), 1657 (Jennifer Aniston and Brad Pitt). Compute the mean, median, standard deviation, range and interquartile range for these lengths of celebrity marriages.
First we need to compute the mean:
\[ \begin{aligned} \overline{X} &= \frac{\sum_{i=1}^{n} x_i}{n} \\ &= \frac{240+144+143+72+30+26+2+150+14+150+1657}{11} \\ &= \frac{2628}{11} \\ &= 238.91 \end{aligned} \]
Then the standard deviation, which we do as follows:
| Score | Error (score - mean) | Error squared |
|---|---|---|
| 240 | 1.09 | 1.19 |
| 144 | -94.91 | 9007.91 |
| 143 | -95.91 | 9198.73 |
| 72 | -166.91 | 27858.95 |
| 30 | -208.91 | 43643.39 |
| 26 | -212.91 | 45330.67 |
| 2 | -236.91 | 56126.35 |
| 150 | -88.91 | 7904.99 |
| 14 | -224.91 | 50584.51 |
| 150 | -88.91 | 7904.99 |
| 1657 | 1418.09 | 2010979.25 |
| Total | — | 2268541 |
So, the sum of squared errors is the sum of the final column (Table 4). The variance is the sum of squared errors divided by the degrees of freedom:
\[ \begin{aligned} s^2 &= \frac{SS}{N - 1} \\ &= \frac{2268541}{10} \\ &= 226854.1 \end{aligned} \]
The standard deviation is the square root of the variance:
\[ \begin{aligned} s &= \sqrt{s^2} \\ &= \sqrt{226854.1} \\ &= 476.29 \end{aligned} \]
To calculate the median, range and interquartile range, first let’s arrange the scores in ascending order: 2, 14, 26, 30, 72, 143, 144, 150, 150, 240, 1657.

- The median: The median will be the (n + 1)/2th score. There are 11 scores, so this will be the 12/2 = 6th. The 6th score in our ordered list is 143. The median length of these celebrity marriages is therefore 143 days.
- The lower quartile: This is the median of the lower half of scores. If we split the data at 143 (the 6th score), there are 5 scores below this value. With 5 scores, the median is the (5 + 1)/2 = 3rd score. The 3rd score is 26; the lower quartile is therefore 26 days.
- The upper quartile: This is the median of the upper half of scores. If we split the data at 143 again (not including this score), there are 5 scores above this value. The median is the 3rd score above the median. The 3rd score above the median is 150; the upper quartile is therefore 150 days.
- The range: This is the highest score (1657) minus the lowest (2), i.e. 1655 days.
- The interquartile range: This is the difference between the upper and lower quartile: 150−26 = 124 days.
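Again, R can do the donkey work (quartiles computed as medians of the lower and upper halves, as in the hand calculation):

```r
# Length (days) of 11 celebrity marriages
marriages <- c(240, 144, 143, 72, 30, 26, 2, 150, 14, 150, 1657)

mean(marriages)                  # ≈ 238.91
sd(marriages)                    # ≈ 476.29
median(marriages)                # 143
max(marriages) - min(marriages)  # range: 1655

srt <- sort(marriages)
median(srt[7:11]) - median(srt[1:5])  # interquartile range: 150 - 26 = 124
```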
Task 1.10
Repeat Task 9 but excluding Jennifer Aniston and Brad Pitt’s marriage. How does this affect the mean, median, range, interquartile range, and standard deviation? What do the differences in values between Tasks 9 and 10 tell us about the influence of unusual scores on these measures?
First let’s compute the new mean:
\[ \begin{aligned} \overline{X} &= \frac{\sum_{i=1}^{n} x_i}{n} \\ &= \frac{240+144+143+72+30+26+2+150+14+150}{10} \\ &= \frac{971}{10} \\ &= 97.1 \end{aligned} \]
The mean length of celebrity marriages is now 97.1 days compared to 238.91 days when Jennifer Aniston and Brad Pitt’s marriage was included. This demonstrates that the mean is greatly influenced by extreme scores.
Let’s now calculate the standard deviation excluding Jennifer Aniston and Brad Pitt’s marriage (Table 5):
| Score | Error (score - mean) | Error squared |
|---|---|---|
| 240 | 142.9 | 20420.41 |
| 144 | 46.9 | 2199.61 |
| 143 | 45.9 | 2106.81 |
| 72 | -25.1 | 630.01 |
| 30 | -67.1 | 4502.41 |
| 26 | -71.1 | 5055.21 |
| 2 | -95.1 | 9044.01 |
| 150 | 52.9 | 2798.41 |
| 14 | -83.1 | 6905.61 |
| 150 | 52.9 | 2798.41 |
| Total | — | 56460.9 |
So, the sum of squared errors is:
\[ \begin{aligned} \text{SS} &= 20420.41 + 2199.61 + 2106.81 + 630.01 + 4502.41 + 5055.21 + 9044.01 + 2798.41 + 6905.61 + 2798.41 \\ &= 56460.90 \\ \end{aligned} \]
The variance is the sum of squared errors divided by the degrees of freedom:
\[ \begin{aligned} s^2 &= \frac{SS}{N - 1} \\ &= \frac{56460.90}{9} \\ &= 6273.43 \end{aligned} \]
The standard deviation is the square root of the variance:
\[ \begin{aligned} s &= \sqrt{s^2} \\ &= \sqrt{6273.43} \\ &= 79.21 \end{aligned} \]
From these calculations we can see that the variance and standard deviation, like the mean, are both greatly influenced by extreme scores. When Jennifer Aniston and Brad Pitt’s marriage was included in the calculations (see Smart Alex Task 9), the variance and standard deviation were much larger, i.e. 226854.09 and 476.29 respectively.
- To calculate the median, range and interquartile range, first, let’s again arrange the scores in ascending order but this time excluding Jennifer Aniston and Brad Pitt’s marriage: 2, 14, 26, 30, 72, 143, 144, 150, 150, 240.
- The median: The median will be the (n + 1)/2th score. There are now 10 scores, so this will be the 11/2 = 5.5th. Therefore, we take the average of the 5th and 6th scores. The 5th score is 72, and the 6th is 143; the median is therefore (72 + 143)/2 = 107.5 days.
- The lower quartile: This is the median of the lower half of scores. If we split the data at 107.5 (a value not itself in the data set), there are 5 scores below this value. With 5 scores, the median is the (5 + 1)/2 = 3rd score. The 3rd score is 26; the lower quartile is therefore 26 days.
- The upper quartile: This is the median of the upper half of scores. If we split the data at 107.5 again, there are 5 scores above this value. The median is the 3rd score above the median. The 3rd score above the median is 150; the upper quartile is therefore 150 days.
- The range: This is the highest score (240) minus the lowest (2), i.e. 238 days. You’ll notice that without the extreme score the range drops dramatically from 1655 to 238 – about a seventh of its former size.
- The interquartile range: This is the difference between the upper and lower quartile: 150 − 26 = 124 days of marriage. This is the same as the value we got when Jennifer Aniston and Brad Pitt’s marriage was included. This demonstrates the advantage of the interquartile range over the range, i.e. it isn’t affected by extreme scores at either end of the distribution.
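The effect of removing the outlier is easy to see in R:

```r
# Celebrity marriages with the 1657-day marriage removed
marriages_10 <- c(240, 144, 143, 72, 30, 26, 2, 150, 14, 150)

mean(marriages_10)                     # 97.1 (was ≈ 238.91)
sd(marriages_10)                       # ≈ 79.2 (was ≈ 476.29)
median(marriages_10)                   # 107.5 (was 143)
max(marriages_10) - min(marriages_10)  # range: 238 (was 1655)
```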
Chapter 2
Task 2.1
Why do we use samples?
We are usually interested in populations, but because we cannot collect data from every human being (or whatever) in the population, we collect data from a small subset of the population (known as a sample) and use these data to infer things about the population as a whole.
Task 2.2
What is the mean and how do we tell if it’s representative of our data?
The mean is a simple statistical model of the centre of a distribution of scores: a hypothetical estimate of the ‘typical’ score. We use the variance, or standard deviation, to tell us whether the mean is representative of our data. The standard deviation is a measure of how much error is associated with the mean: a small standard deviation indicates that the mean is a good representation of our data.
Task 2.3
What’s the difference between the standard deviation and the standard error?
The standard deviation tells us how much observations in our sample differ from the mean value within our sample. The standard error tells us not about how the sample mean represents the sample itself, but how well the sample mean represents the population mean. The standard error is the standard deviation of the sampling distribution of a statistic. For a given statistic (e.g. the mean) it tells us how much variability there is in this statistic across samples from the same population. Large values, therefore, indicate that a statistic from a given sample may not be an accurate reflection of the population from which the sample came.
Task 2.4
In Chapter 1 we used an example of the time in seconds taken for 21 heavy smokers to fall off a treadmill at the fastest setting (18, 16, 18, 24, 23, 22, 22, 23, 26, 29, 32, 34, 34, 36, 36, 43, 42, 49, 46, 46, 57). Calculate standard error and 95% confidence interval for these data.
If you did the tasks in Chapter 1, you’ll know that the mean is 32.19 seconds:
\[ \begin{aligned} \overline{X} &= \frac{\sum_{i=1}^{n} x_i}{n} \\ &= \frac{16+(2\times18)+(2\times22)+(2\times23)+24+26+29+32+(2\times34)+(2\times36)+42+43+(2\times46)+49+57}{21} \\ &= \frac{676}{21} \\ &= 32.19 \end{aligned} \]
We also worked out that the sum of squared errors was 2685.24; the variance was 2685.24/20 = 134.26; the standard deviation is the square root of the variance, so was \(\sqrt{134.26}\) = 11.59. The standard error will be:
\[ SE = \frac{s}{\sqrt{N}} = \frac{11.59}{\sqrt{21}} = 2.53 \]
The sample is small, so to calculate the confidence interval we need to find the appropriate value of t. First we need to calculate the degrees of freedom, \(N − 1\). With 21 data points, the degrees of freedom are 20. For a 95% confidence interval we can look up the value in the column labelled ‘Two-Tailed Test’, ‘0.05’ in the table of critical values of the t-distribution (Appendix). The corresponding value is 2.09. The confidence interval is, therefore, given by:
\[ \begin{aligned} \text{95% CI}_\text{lower boundary} &= \overline{X}-(2.09 \times SE) \\ &= 32.19 - (2.09 \times 2.53) \\ &= 26.90 \\ \text{95% CI}_\text{upper boundary} &= \overline{X}+(2.09 \times SE) \\ &= 32.19 + (2.09 \times 2.53) \\ &= 37.48 \end{aligned} \]
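In R the standard error and confidence interval can be computed without the intermediate rounding, so the limits differ very slightly from the hand calculation (which rounds t to 2.09 and the SE to 2.53):

```r
treadmill <- c(18, 16, 18, 24, 23, 22, 22, 23, 26, 29, 32, 34, 34,
               36, 36, 43, 42, 49, 46, 46, 57)

se <- sd(treadmill) / sqrt(length(treadmill))    # standard error ≈ 2.53
t_crit <- qt(0.975, df = length(treadmill) - 1)  # critical t ≈ 2.09
ci <- mean(treadmill) + c(-1, 1) * t_crit * se
ci  # ≈ 26.92 to 37.46
```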
Task 2.5
What do the sum of squares, variance and standard deviation represent? How do they differ?
All of these measures tell us something about how well the mean fits the observed sample data. Large values (relative to the scale of measurement) suggest the mean is a poor fit of the observed scores, and small values suggest a good fit. They are also, therefore, measures of dispersion, with large values indicating a spread-out distribution of scores and small values showing a more tightly packed distribution. These measures all represent the same thing, but differ in how they express it. The sum of squared errors is a ‘total’ and is, therefore, affected by the number of data points. The variance is the ‘average’ variability but in units squared. The standard deviation is the average variation but converted back to the original units of measurement. As such, the size of the standard deviation can be compared to the mean (because they are in the same units of measurement).
Task 2.6
What is a test statistic and what does it tell us?
A test statistic is a statistic for which we know how frequently different values occur. The observed value of such a statistic is typically used to test hypotheses, or to establish whether a model is a reasonable representation of what’s happening in the population.
Task 2.7
What are Type I and Type II errors?
A Type I error occurs when we believe that there is a genuine effect in our population, when in fact there isn’t. A Type II error occurs when we believe that there is no effect in the population when, in reality, there is.
Task 2.8
What is statistical power?
Power is the ability of a test to detect an effect of a particular size (a value of 0.8 is a good level to aim for).
Task 2.9
Figure 2.21 (in the book) shows two experiments that looked at the effect of singing versus conversation on how much time a woman would spend with a man. In both experiments the means were 10 (singing) and 12 (conversation), the standard deviations in all groups were 3, but the group sizes were 10 per group in the first experiment and 100 per group in the second. Compute the values of the confidence intervals displayed in the Figure.
Experiment 1:
In both groups, because they have a standard deviation of 3 and a sample size of 10, the standard error will be:
\[ SE = \frac{s}{\sqrt{N}} = \frac{3}{\sqrt{10}} = 0.95 \]
The sample is small, so to calculate the confidence interval we need to find the appropriate value of t. First we need to calculate the degrees of freedom, \(N − 1\). With 10 data points, the degrees of freedom are 9. For a 95% confidence interval we can look up the value in the column labelled ‘Two-Tailed Test’, ‘0.05’ in the table of critical values of the t-distribution (Appendix). The corresponding value is 2.26. The confidence interval for the singing group is, therefore, given by:
\[ \begin{aligned} \text{95% CI}_\text{lower boundary} &= \overline{X}-(2.26 \times SE) \\ &= 10 - (2.26 \times 0.95) \\ &= 7.85 \\ \text{95% CI}_\text{upper boundary} &= \overline{X}+(2.26 \times SE) \\ &= 10 + (2.26 \times 0.95) \\ &= 12.15 \end{aligned} \]
For the conversation group:
\[ \begin{aligned} \text{95% CI}_\text{lower boundary} &= \overline{X}-(2.26 \times SE) \\ &= 12 - (2.26 \times 0.95) \\ &= 9.85 \\ \text{95% CI}_\text{upper boundary} &= \overline{X}+(2.26 \times SE) \\ &= 12 + (2.26 \times 0.95) \\ &= 14.15 \end{aligned} \]
Experiment 2
In both groups, because they have a standard deviation of 3 and a sample size of 100, the standard error will be:
\[ SE = \frac{s}{\sqrt{N}} = \frac{3}{\sqrt{100}} = 0.3 \]
The sample is large, so to calculate the confidence interval we need to find the appropriate value of z. For a 95% confidence interval we should look up the value of 0.025 in the column labelled Smaller Portion in the table of the standard normal distribution (Appendix). The corresponding value is 1.96. The confidence interval for the singing group is, therefore, given by:
\[ \begin{aligned} \text{95% CI}_\text{lower boundary} &= \overline{X}-(1.96 \times SE) \\ &= 10 - (1.96 \times 0.3) \\ &= 9.41 \\ \text{95% CI}_\text{upper boundary} &= \overline{X}+(1.96 \times SE) \\ &= 10 + (1.96 \times 0.3) \\ &= 10.59 \end{aligned} \]
For the conversation group:
\[ \begin{aligned} \text{95% CI}_\text{lower boundary} &= \overline{X}-(1.96 \times SE) \\ &= 12 - (1.96 \times 0.3) \\ &= 11.41 \\ \text{95% CI}_\text{upper boundary} &= \overline{X}+(1.96 \times SE) \\ &= 12 + (1.96 \times 0.3) \\ &= 12.59 \end{aligned} \]
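The calculations for both experiments can be wrapped in a small helper function. Note that ci_t() is a name made up for illustration; it always uses the t-distribution, so for experiment 2 it returns limits fractionally wider than the z-based ones above (with n = 100 the critical t is about 1.98 rather than 1.96):

```r
# Hypothetical helper: t-based confidence interval from summary statistics
ci_t <- function(m, s, n, level = 0.95) {
  se <- s / sqrt(n)                            # standard error of the mean
  t_crit <- qt(1 - (1 - level) / 2, df = n - 1)  # two-tailed critical t
  c(lower = m - t_crit * se, upper = m + t_crit * se)
}

ci_t(10, 3, 10)   # singing, experiment 1: ≈ 7.85 to 12.15
ci_t(12, 3, 10)   # conversation, experiment 1: ≈ 9.85 to 14.15
ci_t(10, 3, 100)  # singing, experiment 2: ≈ 9.40 to 10.60
ci_t(12, 3, 100)  # conversation, experiment 2: ≈ 11.40 to 12.60
```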
Task 2.10
Figure 2.22 shows a similar study to above, but the means were 10 (singing) and 10.01 (conversation), the standard deviations in both groups were 3, and each group contained 1 million people. Compute the values of the confidence intervals displayed in the figure.
In both groups, because they have a standard deviation of 3 and a sample size of 1,000,000, the standard error will be:
\[ SE = \frac{s}{\sqrt{N}} = \frac{3}{\sqrt{1000000}} = 0.003 \]
The sample is large, so to calculate the confidence interval we need to find the appropriate value of z. For a 95% confidence interval we should look up the value of 0.025 in the column labelled Smaller Portion in the table of the standard normal distribution (Appendix). The corresponding value is 1.96. The confidence interval for the singing group is, therefore, given by:
\[ \begin{aligned} \text{95% CI}_\text{lower boundary} &= \overline{X}-(1.96 \times SE) \\ &= 10 - (1.96 \times 0.003) \\ &= 9.99412 \\ \text{95% CI}_\text{upper boundary} &= \overline{X}+(1.96 \times SE) \\ &= 10 + (1.96 \times 0.003) \\ &= 10.00588 \end{aligned} \]
For the conversation group:
\[ \begin{aligned} \text{95% CI}_\text{lower boundary} &= \overline{X}-(1.96 \times SE) \\ &= 10.01 - (1.96 \times 0.003) \\ &= 10.00412 \\ \text{95% CI}_\text{upper boundary} &= \overline{X}+(1.96 \times SE) \\ &= 10.01 + (1.96 \times 0.003) \\ &= 10.01588 \end{aligned} \]
Note: these values will look slightly different than the plot because the exact means were 10.00147 and 10.01006, but we rounded off to 10 and 10.01 to make life a bit easier. If you use these exact values you’d get, for the singing group:
\[ \begin{aligned} \text{95% CI}_\text{lower boundary} &= \overline{X}-(1.96 \times SE) \\ &= 10.00147 - (1.96 \times 0.003) \\ &= 9.99559 \\ \text{95% CI}_\text{upper boundary} &= \overline{X}+(1.96 \times SE) \\ &= 10.00147 + (1.96 \times 0.003) \\ &= 10.00735 \end{aligned} \]
For the conversation group:
\[ \begin{aligned} \text{95% CI}_\text{lower boundary} &= \overline{X}-(1.96 \times SE) \\ &= 10.01006 - (1.96 \times 0.003) \\ &= 10.00418 \\ \text{95% CI}_\text{upper boundary} &= \overline{X}+(1.96 \times SE) \\ &= 10.01006 + (1.96 \times 0.003) \\ &= 10.01594 \end{aligned} \]
Task 2.11
In Chapter 1 (Task 8) we looked at an example of how many games it took a sportsperson before they hit the ‘red zone’. Calculate the standard error and confidence interval for those data.
We worked out in Chapter 1 that the mean was 10.27, the standard deviation 4.15, and there were 11 sportspeople in the sample. The standard error will be:
\[ SE = \frac{s}{\sqrt{N}} = \frac{4.15}{\sqrt{11}} = 1.25 \]
The sample is small, so to calculate the confidence interval we need to find the appropriate value of t. First we need to calculate the degrees of freedom, \(N − 1\). With 11 data points, the degrees of freedom are 10. For a 95% confidence interval we can look up the value in the column labelled ‘Two-Tailed Test’, ‘.05’ in the table of critical values of the t-distribution (Appendix). The corresponding value is 2.23. The confidence interval is, therefore, given by:
\[ \begin{aligned} \text{95% CI}_\text{lower boundary} &= \overline{X}-(2.23 \times SE) \\ &= 10.27 - (2.23 \times 1.25) \\ &= 7.48 \\ \text{95% CI}_\text{upper boundary} &= \overline{X}+(2.23 \times SE) \\ &= 10.27 + (2.23 \times 1.25) \\ &= 13.06 \end{aligned} \]
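The equivalent calculation in R (using unrounded intermediate values, so the limits differ from the hand calculation by a hundredth or so):

```r
# Consecutive games before each of the 11 players hit the red zone
red_zone <- c(10, 16, 8, 9, 6, 8, 9, 11, 12, 19, 5)

se <- sd(red_zone) / sqrt(length(red_zone))    # ≈ 1.25
t_crit <- qt(0.975, df = length(red_zone) - 1) # ≈ 2.23
ci <- mean(red_zone) + c(-1, 1) * t_crit * se
ci  # close to the hand-calculated 7.48 to 13.06
```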
Task 2.12
At a rival club to the one I support, they similarly measured the number of consecutive games it took their players before they reached the red zone. The data are: 6, 17, 7, 3, 8, 9, 4, 13, 11, 14, 7. Calculate the mean, standard deviation, and confidence interval for these data.
First we need to compute the mean:
\[ \begin{aligned} \overline{X} &= \frac{\sum_{i=1}^{n} x_i}{n} \\ &= \frac{6+17+7+3+8+9+4+13+11+14+7}{11} \\ &= \frac{99}{11} \\ &= 9.00 \end{aligned} \]
Then the standard deviation, which we do as follows:
| Score | Error (score - mean) | Error squared |
|---|---|---|
| 6 | -3 | 9 |
| 17 | 8 | 64 |
| 7 | -2 | 4 |
| 3 | -6 | 36 |
| 8 | -1 | 1 |
| 9 | 0 | 0 |
| 4 | -5 | 25 |
| 13 | 4 | 16 |
| 11 | 2 | 4 |
| 14 | 5 | 25 |
| 7 | -2 | 4 |
| Total | — | 188 |
The sum of squared errors is (Table 6):
\[ \begin{aligned} \text{SS} &= 9 + 64 + 4 + 36 + 1 + 0 + 25 + 16 + 4 + 25 + 4 \\ &= 188 \\ \end{aligned} \]
The variance is the sum of squared errors divided by the degrees of freedom:
\[ \begin{aligned} s^2 &= \frac{SS}{N - 1} \\ &= \frac{188}{10} \\ &= 18.8 \end{aligned} \]
The standard deviation is the square root of the variance:
\[ \begin{aligned} s &= \sqrt{s^2} \\ &= \sqrt{18.8} \\ &= 4.34 \end{aligned} \]
There were 11 sportspeople in the sample, so the standard error will be: \[ SE = \frac{s}{\sqrt{N}} = \frac{4.34}{\sqrt{11}} = 1.31\]
The sample is small, so to calculate the confidence interval we need to find the appropriate value of t. First we need to calculate the degrees of freedom, \(N − 1\). With 11 data points, the degrees of freedom are 10. For a 95% confidence interval we can look up the value in the column labelled ‘Two-Tailed Test’, ‘.05’ in the table of critical values of the t-distribution (Appendix). The corresponding value is 2.23. The confidence interval is, therefore, given by:
\[ \begin{aligned} \text{95% CI}_\text{lower boundary} &= \overline{X}-(2.23 \times SE) \\ &= 9 - (2.23 \times 1.31) \\ &= 6.08 \\ \text{95% CI}_\text{upper boundary} &= \overline{X}+(2.23 \times SE) \\ &= 9 + (2.23 \times 1.31) \\ &= 11.92 \end{aligned} \]
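If you'd rather not do the arithmetic by hand, the whole of this task can be reproduced from the raw scores in R (a sketch, using base R only):

```r
# Mean, standard deviation, standard error and 95% CI for the rival club
x <- c(6, 17, 7, 3, 8, 9, 4, 13, 11, 14, 7)

m      <- mean(x)                        # 9
s      <- sd(x)                          # sqrt(18.8), ~4.34
se     <- s / sqrt(length(x))            # ~1.31
t_crit <- qt(0.975, df = length(x) - 1)  # ~2.23

c(mean = m, sd = s, lower = m - t_crit * se, upper = m + t_crit * se)
```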
Task 2.13
In Chapter 1 (Task 9) we looked at the length in days of 11 celebrity marriages. Here are the approximate lengths in months of nine marriages, one being mine and the others being those of some of my friends and family. In all but two cases the lengths are calculated up to the day I’m writing this, which is 20 June 2023, but the 3- and 111-month durations are marriages that have ended – neither of these is mine, in case you’re wondering: 3, 144, 267, 182, 159, 152, 693, 50, and 111. Calculate the mean, standard deviation and confidence interval for these data.
First we need to compute the mean:
\[ \begin{aligned} \overline{X} &= \frac{\sum_{i=1}^{n} x_i}{n} \\ &= \frac{3 + 144 + 267 + 182 + 159 + 152 + 693 + 50 + 111}{9} \\ &= \frac{1761}{9} \\ &= 195.67 \end{aligned} \]
Compute the standard deviation as follows:
| Score | Error (score - mean) | Error squared |
|---|---|---|
| 3 | -192.67 | 37121.73 |
| 144 | -51.67 | 2669.79 |
| 267 | 71.33 | 5087.97 |
| 182 | -13.67 | 186.87 |
| 159 | -36.67 | 1344.69 |
| 152 | -43.67 | 1907.07 |
| 693 | 497.33 | 247337.13 |
| 50 | -145.67 | 21219.75 |
| 111 | -84.67 | 7169.01 |
| Total | | 324044 |
The sum of squared errors is (Table 7):
\[ \begin{aligned} \text{SS} &= 37121.73 + 2669.79 + 5087.97 + 186.87 + 1344.69 + 1907.07 + 247337.13 + 21219.75 + 7169.01 \\ &= 324044 \\ \end{aligned} \]
The variance is the sum of squared errors divided by the degrees of freedom:
\[ \begin{aligned} s^2 &= \frac{SS}{N - 1} \\ &= \frac{324044}{8} \\ &= 40505.5 \end{aligned} \]
The standard deviation is the square root of the variance:
\[ \begin{aligned} s &= \sqrt{s^2} \\ &= \sqrt{40505.5} \\ &= 201.2598 \end{aligned} \]
The standard error is:
\[ \begin{aligned} SE &= \frac{s}{\sqrt{N}} \\ &= \frac{201.2598}{\sqrt{9}} \\ &= 67.0866 \end{aligned} \]
The sample is small, so to calculate the confidence interval we need to find the appropriate value of t. First we need to calculate the degrees of freedom, \(N − 1\). With 9 data points, the degrees of freedom are 8. For a 95% confidence interval we can look up the value in the column labelled ‘Two-Tailed Test’, ‘0.05’ in the table of critical values of the t-distribution (Appendix). The corresponding value is 2.31. The confidence interval is, therefore, given by:
\[ \begin{aligned} \text{95% CI}_\text{lower boundary} &= \overline{X}-(2.31 \times SE) \\ &= 195.67 - (2.31 \times 67.0866) \\ &= 40.70 \\ \text{95% CI}_\text{upper boundary} &= \overline{X}+(2.31 \times SE) \\ &= 195.67 + (2.31 \times 67.0866) \\ &= 350.64 \end{aligned} \]
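As a check, the same values can be computed from the raw durations in R (a sketch using base R only; R's unrounded critical *t* will give boundaries a fraction different from the hand calculation, which rounds *t* to 2.31):

```r
# Mean, standard deviation, standard error and 95% CI for the marriage data
x <- c(3, 144, 267, 182, 159, 152, 693, 50, 111)

m      <- mean(x)                        # ~195.67
s      <- sd(x)                          # ~201.26
se     <- s / sqrt(length(x))            # ~67.09
t_crit <- qt(0.975, df = length(x) - 1)  # ~2.31

c(lower = m - t_crit * se, upper = m + t_crit * se)
```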
Chapter 3
Task 3.1
What is an effect size and how is it measured?
An effect size is an objective and standardized measure of the magnitude of an observed effect. Measures include Cohen’s d, the odds ratio and Pearson’s correlation coefficient, r. Cohen’s d, for example, is the difference between two means divided by either the standard deviation of the control group, or by a pooled standard deviation.
Task 3.2
In Chapter 1 (Task 8) we looked at an example of how many games it took a sportsperson before they hit the ‘red zone’, then in Chapter 2 we looked at data from a rival club. Compute and interpret Cohen’s \(\hat{d}\) for the difference in the mean number of games it took players to become fatigued in the two teams mentioned in those tasks.
Cohen’s d is defined as:
\[ \hat{d} = \frac{\bar{X_1}-\bar{X_2}}{s} \]
There isn’t an obvious control group, so let’s use a pooled estimate of the standard deviation:
\[ \begin{aligned} s_p &= \sqrt{\frac{(N_1-1) s_1^2+(N_2-1) s_2^2}{N_1+N_2-2}} \\ &= \sqrt{\frac{(11-1)4.15^2+(11-1)4.34^2}{11+11-2}} \\ &= \sqrt{\frac{360.23}{20}} \\ &= 4.24 \end{aligned} \]
Therefore, Cohen’s \(\hat{d}\) is:
\[ \hat{d} = \frac{10.27-9}{4.24} = 0.30 \]
Therefore, the second team fatigued in fewer matches than the first team by about 1/3 standard deviation. By the benchmarks that we probably shouldn’t use, this is a small to medium effect, but I guess if you’re managing a top-flight sports team, fatiguing 1/3 of a standard deviation faster than one of your opponents could make quite a substantial difference to your performance and team rotation over the season.
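The pooled standard deviation and \(\hat{d}\) can be computed in R like this (a sketch using the summary statistics from the two tasks):

```r
# Pooled SD and Cohen's d for the two teams
n1 <- 11; s1 <- 4.15; m1 <- 10.27  # first team (Chapter 1, Task 8)
n2 <- 11; s2 <- 4.34; m2 <- 9      # rival team (Task 2.12)

s_p <- sqrt(((n1 - 1) * s1^2 + (n2 - 1) * s2^2) / (n1 + n2 - 2))  # ~4.24
(m1 - m2) / s_p                                                   # ~0.30
```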
Task 3.3
Calculate and interpret Cohen’s \(\hat{d}\) for the difference in the mean duration of the celebrity marriages in Chapter 1 (Task 9) and me and my friend’s marriages (Chapter 2, Task 13).
Cohen’s \(\hat{d}\) is defined as:
\[ \hat{d} = \frac{\bar{X_1}-\bar{X_2}}{s} \]
There isn’t an obvious control group, so let’s use a pooled estimate of the standard deviation:
\[ \begin{aligned} s_p &= \sqrt{\frac{(N_1-1) s_1^2+(N_2-1) s_2^2}{N_1+N_2-2}} \\ &= \sqrt{\frac{(11-1)476.29^2+(9-1)8275.91^2}{11+9-2}} \\ &= \sqrt{\frac{550194093}{18}} \\ &= 5528.68 \end{aligned} \]
Therefore, Cohen’s d is: \[\hat{d} = \frac{5057-238.91}{5528.68} = 0.87\] Therefore, my friend’s marriages are 0.87 standard deviations longer than the sample of celebrities. By the benchmarks that we probably shouldn’t use, this is a large effect.
Task 3.4
What are the problems with null hypothesis significance testing?
- We can’t conclude that an effect is important because the p-value from which we determine significance is affected by sample size. Therefore, the word ‘significant’ is meaningless when referring to a p-value.
- The null hypothesis is never true. If the p-value is greater than .05 then we can decide to reject the alternative hypothesis, but this is not the same thing as the null hypothesis being true: a non-significant result tells us that the effect is not big enough to be found but it doesn’t tell us that the effect is zero.
- A significant result does not tell us that the null hypothesis is false (see text for details).
- It encourages all or nothing thinking: if p < 0.05 then an effect is significant, but if p > 0.05 it is not. So, a p = 0.0499 is significant but a p = 0.0501 is not, even though these ps differ by only 0.0002.
Task 3.5
What is the difference between a confidence interval and a credible interval?
A 95% confidence interval is set so that before the data are collected there is a long-run probability of 0.95 (or 95%) that the interval will contain the true value of the parameter. This means that in 100 random samples, the intervals will contain the true value in 95 of them but won’t in 5. Once the data are collected, your sample is either one of the 95% that produces an interval containing the true value, or one of the 5% that does not. In other words, having collected the data, the probability of the interval containing the true value of the parameter is either 0 (it does not contain it) or 1 (it does contain it), but you do not know which. A credible interval is different in that it reflects the plausible probability that the interval contains the true value. For example, a 95% credible interval has a plausible 0.95 probability of containing the true value.
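The long-run property of confidence intervals is easy to demonstrate by simulation. The sketch below (hypothetical data, not from the book) draws many samples from a population with a known mean and counts how often the 95% confidence interval captures that mean:

```r
# Simulate the long-run coverage of 95% confidence intervals
set.seed(123)
true_mean <- 10

covers <- replicate(10000, {
  x      <- rnorm(30, mean = true_mean, sd = 4)  # one random sample
  se     <- sd(x) / sqrt(30)
  t_crit <- qt(0.975, df = 29)
  # does this sample's interval contain the true mean?
  (mean(x) - t_crit * se) <= true_mean && true_mean <= (mean(x) + t_crit * se)
})

mean(covers)  # proportion of intervals containing the true mean; close to 0.95
```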
Task 3.6
What is a meta-analysis?
Meta-analysis is where effect sizes from different studies testing the same hypothesis are combined to get a better estimate of the size of the effect in the population.
Task 3.7
Describe what you understand by the term Bayes factor.
The Bayes factor is the ratio of the probability of the data given the alternative hypothesis to that of the data given the null hypothesis. A Bayes factor less than 1 supports the null hypothesis (it suggests the data are more likely given the null hypothesis than the alternative hypothesis); conversely, a Bayes factor greater than 1 suggests that the observed data are more likely given the alternative hypothesis than the null. Values between 1 and 3 are considered evidence for the alternative hypothesis that is ‘barely worth mentioning’, values between 3 and 10 are considered to indicate evidence for the alternative hypothesis that ‘has substance’, and values greater than 10 are strong evidence for the alternative hypothesis.
Task 3.8
Various studies have shown that students who use laptops in class often do worse on their modules (Payne-Carter et al., 2016). Table 3.3 (reproduced in Table 8) shows some fabricated data that mimics what has been found. What is the odds ratio for passing the exam if the student uses a laptop in class compared to if they don’t?
| | Laptop | No Laptop | Sum |
|---|---|---|---|
| Pass | 24 | 49 | 73 |
| Fail | 16 | 11 | 27 |
| Sum | 40 | 60 | 100 |
First we compute the odds of passing when a laptop is used in class:
\[ \begin{aligned} \text{Odds}_{\text{pass when laptop is used}} &= \frac{\text{Number of laptop users passing exam}}{\text{Number of laptop users failing exam}} \\ &= \frac{24}{16} \\ &= 1.5 \end{aligned} \]
Next we compute the odds of passing when a laptop is not used in class:
\[ \begin{aligned} \text{Odds}_{\text{pass when laptop is not used}} &= \frac{\text{Number of students without laptops passing exam}}{\text{Number of students without laptops failing exam}} \\ &= \frac{49}{11} \\ &= 4.45 \end{aligned} \]
The odds ratio is the ratio of the two odds that we have just computed:
\[ \begin{aligned} \text{Odds Ratio} &= \frac{\text{Odds}_{\text{pass when laptop is used}}}{\text{Odds}_{\text{pass when laptop is not used}}} \\ &= \frac{1.5}{4.45} \\ &= 0.34 \end{aligned} \]
The odds of passing when using a laptop are 0.34 times those when a laptop is not used. If we take the reciprocal of this, we could say that the odds of passing when not using a laptop are 2.97 times those when a laptop is used.
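The same arithmetic in R (a sketch using the cell counts from Table 8):

```r
# Odds and odds ratio of passing with vs. without a laptop
odds_laptop    <- 24 / 16  # 1.5
odds_no_laptop <- 49 / 11  # ~4.45

odds_ratio <- odds_laptop / odds_no_laptop  # ~0.34
1 / odds_ratio                              # ~2.97, the reciprocal
```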
Task 3.9
From the data in Table 3.1 (reproduced in Table 8), what is the conditional probability that someone used a laptop given that they passed the exam, p(laptop|pass)? What is the conditional probability that someone didn’t use a laptop in class given they passed the exam, p(no laptop|pass)?
The conditional probability that someone used a laptop given they passed the exam is 0.33, or a 33% chance:
\[ p(\text{laptop}|\text{pass})=\frac{p(\text{laptop} \cap \text{pass})}{p(\text{pass})}=\frac{{24}/{100}}{{73}/{100}}=\frac{0.24}{0.73}=0.33 \]
The conditional probability that someone didn’t use a laptop in class given they passed the exam is 0.67 or a 67% chance.
\[ p(\text{no laptop}|\text{pass})=\frac{p(\text{no laptop} \cap \text{pass})}{p(\text{pass})}=\frac{{49}/{100}}{{73}/{100}}=\frac{0.49}{0.73}=0.67 \]
Task 3.10
Using the data in Table 3.1 (reproduced in Table 8), what are the posterior odds of someone using a laptop in class (compared to not using one) given that they passed the exam?
The posterior odds are the ratio of the posterior probability for one hypothesis to another. In this example it would be the ratio of the probability that a person used a laptop given that they passed (which we have already calculated above to be 0.33) to the probability that they did not use a laptop in class given that they passed (which we have already calculated above to be 0.67). The value turns out to be 0.49, which means that the probability that someone used a laptop in class given that they passed the exam is about half the probability that someone didn’t use a laptop in class given that they passed the exam.
\[ \text{posterior odds}= \frac{p(\text{hypothesis 1|data})}{p(\text{hypothesis 2|data})} = \frac{p(\text{laptop|pass})}{p(\text{no laptop| pass})} = \frac{0.33}{0.67} = 0.49 \]
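Both conditional probabilities and the posterior odds can be reproduced in R (a sketch using the counts in Table 8):

```r
# Conditional probabilities and posterior odds from the laptop data
p_pass           <- 73 / 100
p_laptop_pass    <- (24 / 100) / p_pass  # p(laptop|pass), ~0.33
p_no_laptop_pass <- (49 / 100) / p_pass  # p(no laptop|pass), ~0.67

p_laptop_pass / p_no_laptop_pass         # posterior odds, ~0.49
```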
Chapter 4
Task 4.1
Smart Alex’s first task for this chapter is to set up an RStudio project (you should have done this already if you’ve been following the chapter) and within the main project folder create folders called ‘data’, and ‘quarto’. Then create a Quarto document and follow the self-tests in the chapter to create a document containing some code and text.
This is a general video on workflow (note that I name the quarto folder as r_docs in this tutorial, but otherwise it follows the book chapter):
You can download a complete version of this file (chapter_4_self_tests.qmd) and its rendered output (chapter_4_self_tests.html) from www.discovr.rocks/repository/chapter_4_self_tests.zip.
Task 4.2
Render the document in Task 1, then locate the rendered file within your project folder.
A general tutorial on rendering is here:
You can download a complete version of this file (chapter_4_self_tests.qmd) and its rendered output (chapter_4_self_tests.html) from www.discovr.rocks/repository/chapter_4_self_tests.zip.
Task 4.3
The data below show the score (out of 20) for 20 different students, some of whom are male and some female, and some of whom were taught using positive reinforcement (being nice) and others who were taught using punishment (electric shock). Enter these data into a tibble. (Hint: the data below are in messy format, you need to enter them in tidy format.)
We have three variables here: sex (was the person male or female), the method of teaching they underwent and their mark on an assignment. Therefore, the tidy format is to arrange the data in three columns. There are several ways to do this, here’s one of them:
The data can be found in discovr::teaching (this version has an extra variable with the participant id).
Task 4.4
Thinking back to Labcoat Leni’s Real Research 4.1, Oxoby also measured the minimum acceptable offer; these MAOs (in dollars) are below (again, they are approximations based on the graphs in the paper). Enter these data into a tibble.
- Bon Scott group: 2, 3, 3, 3, 3, 4, 4, 4, 4, 4, 4, 4, 4, 5, 5, 5, 5, 5
- Brian Johnson group: 0, 1, 2, 2, 3, 3, 3, 3, 3, 4, 4, 4, 4, 4, 4, 4, 4, 1
The full data can be found in discovr::acdc
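One way to enter these scores in tidy format is sketched below. Note that the variable names `group` and `mao` are my own choice and may differ from those in `discovr::acdc`; the code assumes the tidyverse is loaded:

```r
# Enter the MAO data with one row per participant
acdc_tib <- tibble(
  group = c(rep("Bon Scott", 18), rep("Brian Johnson", 18)) |> as_factor(),
  mao   = c(2, 3, 3, 3, 3, 4, 4, 4, 4, 4, 4, 4, 4, 5, 5, 5, 5, 5,
            0, 1, 2, 2, 3, 3, 3, 3, 3, 4, 4, 4, 4, 4, 4, 4, 4, 1)
)
```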
Task 4.5
According to some highly unscientific research done by a UK department store chain and reported in Marie Claire magazine, shopping is good for you. They found that the average woman spends 150 minutes and walks 2.6 miles when she shops, burning off around 385 calories. In contrast, men spend only about 50 minutes shopping, covering 1.5 miles. This was based on strapping a pedometer on a mere 10 participants. Although I don’t have the actual data, some simulated data based on these means are below. Enter these data into a tibble (again, you need to think about making the data tidy format).
We have three variables here: sex (was the person male or female), the distance they walked and the time they spent shopping. Therefore, we arrange the data in three columns:
The data can be found in discovr::shopping.
Task 4.6
I wondered whether a fish or cat made a better pet. I found some people who had either fish or cats as pets and measured their life satisfaction and how much they like animals. Enter these data into a tibble.
We have three variables here: pet (did the person have a fish or a cat as a pet?), their love of animals (animal) and their life_satisfaction score. The tidy format is to arrange the data in three columns. There are several ways to do this, here’s one of them:
The full data can be found in discovr::pets.
Task 4.7
One of my favourite activities, especially when trying to do brain-melting things like writing statistics books, is drinking tea. I am English, after all. Fortunately, tea improves your cognitive function – well, it does in old Chinese people at any rate (Feng et al., 2010). I may not be Chinese and I’m not that old, but I nevertheless enjoy the idea that tea might help me think. Here are some data based on Feng et al.’s study that measured the number of cups of tea drunk per day and cognitive functioning (out of 80) in 15 people. Enter these data into a tibble.
We have two variables here: the number of cups of tea a person drinks per day (tea) and their cognitive functioning out of 80 (cog_fun). The tidy format is to arrange the data in two columns. There are several ways to do this, here’s one of them:
The data (with an additional variable containing the participant id) can be found in discovr::tea_15.
Task 4.8
Statistics and maths anxiety are common and affect people’s performance on maths and stats assignments; women in particular can lack confidence in mathematics (Field, 2010). Zhang et al. (2013) did an intriguing study in which students completed a maths test in which some put their own name on the test booklet, whereas others were given a booklet that already had either a male or female name on. Participants in the latter two conditions were told that they would use this other person’s name for the purpose of the test. Women who completed the test using a different name performed significantly better than those who completed the test using their own name. (There were no such significant effects for men.) The data below are a random subsample of Zhang et al.’s data. Enter them into a tibble.
The design of this study is such that different people were put in one of three different conditions (female fake name, male fake name, own name), but the groups are not equal. In the female fake name group there were 10 females but only 7 males, in the fake male name there were 10 females and 9 males, and in the own name condition 7 females and 9 males. With a bit of adding we can see that there were 27 females in total, and 25 males. We’re going to have to use a lot of rep() statements and keep our wits about us!
We have three variables: whether the participant was male or female (sex), which booklet condition they were in (name_type) and their test score (accuracy). I’ve also included a participant id, just so that your data matches the file zhang_2013_subsample.csv that I provide for you. We can enter the data as follows (there are other ways too …):
zhang_tib <- tibble(.rows = 52) |>
mutate(
id = c(171, 35, 57, 36, 53, 176, 76, 184, 64, 166, 14, 100, 30, 49, 157, 14, 68, 71, 4, 40, 66, 27, 61, 27, 36, 33, 120, 113, 95, 99, 78, 32, 43, 183, 103, 31, 86, 54, 5, 20, 13, 59, 58, 188, 187, 15, 50, 9, 45, 60, 73, 189) |> as_factor(),
sex = c(rep("Female", 27), rep("Male", 25)) |> as_factor(),
name_type = c(rep("Female fake name", 10), rep("Male fake name", 10), rep("Own name", 7), rep("Female fake name", 7), rep("Male fake name", 9), rep("Own name", 9)) |> as_factor(),
accuracy = c(33, 22, 46, 53, 14, 27, 64, 62, 75, 50, 69, 60, 82, 78, 38, 63, 46, 27, 61, 29, 75, 33, 83, 42, 10, 44, 27, 53, 47, 87, 41, 62, 67, 57, 31, 63, 34, 40, 22, 17, 60, 47, 57, 70, 57, 33, 83, 86, 65, 64, 37, 80)
)
The data are in discovr::zhang_sample.
Task 4.9
What is a factor variable?
A variable in which numbers are used to represent group or category membership. An example would be a variable in which a score of 1 represents a person being female, and a 0 represents them being male.
Task 4.10
What is the difference between messy and tidy format data?
Tidy format data are arranged such that scores on an outcome variable appear in a single column and rows represent a combination of the attributes of those scores (for example, the entity from which the scores came, when the score was recorded etc.). In tidy format data, scores from a single entity can appear over multiple rows where each row represents a combination of the attributes of the score (e.g., levels of an independent variable or time point at which the score was recorded etc.) In contrast, messy format data are arranged such that scores from a single entity appear in a single row and levels of independent or predictor variables are arranged over different columns. As such, in designs with multiple measurements of an outcome variable within a case the outcome variable scores will be contained in multiple columns each representing a level of an independent variable, or a timepoint at which the score was observed. Columns can also represent attributes of the score or entity that are fixed over the duration of data collection (e.g., participant sex, employment status etc.).
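To make the distinction concrete, here is a minimal sketch (hypothetical data, not from the book) that converts messy format to tidy format with tidyr::pivot_longer():

```r
# Messy format: one row per person, one column per time point
messy_tib <- tibble(
  id     = c("p1", "p2"),
  time_1 = c(10, 12),
  time_2 = c(14, 11)
)

# Tidy format: one row per score, with columns identifying its attributes
tidy_tib <- messy_tib |>
  tidyr::pivot_longer(
    cols = c(time_1, time_2),
    names_to = "time",
    values_to = "score"
  )
```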
Chapter 5
Task 5.1
The file students.csv contains some fictional data about students and lecturers including their number of friends, income (per annum), alcohol consumption per week (in units) and neuroticism (neurotic). Plot and interpret an error bar chart showing the mean number of friends for students and lecturers.
First, load the data (see Tip 2) using
student_tib <- discovr::students
To find out more about the data execute ?students. The code for the plot (Figure 1) is:
ggplot(student_tib, aes(group, friends)) +
stat_summary(fun.data = "mean_cl_normal", geom = "pointrange") +
coord_cartesian(ylim = c(0, 20)) +
scale_y_continuous(breaks = seq(0, 20, 2)) +
labs(x = "Group", y = "Number of friends") +
theme_minimal()
We can conclude that, on average, students had more friends than lecturers.
Task 5.2
Using the same data, plot and interpret an error bar chart showing the mean alcohol consumption for students and lecturers.
The code for the plot in Figure 2 is:
ggplot(student_tib, aes(group, alcohol)) +
stat_summary(fun.data = "mean_cl_normal", geom = "pointrange") +
coord_cartesian(ylim = c(0, 30)) +
scale_y_continuous(breaks = seq(0, 30, 5)) +
labs(x = "Group", y = "Alcohol consumption per week (units)") +
theme_minimal()
We can conclude that, on average, students and lecturers drank similar amounts, but the error bars tell us that the mean is a better representation of the population for students than for lecturers (there is more variability in lecturers’ drinking habits compared to students’).
Task 5.3
Using the same data, plot and interpret an error line chart showing the mean income for students and lecturers.
The code for the plot in Figure 3 is:
ggplot(student_tib, aes(group, income)) +
stat_summary(fun.data = "mean_cl_normal", geom = "pointrange") +
coord_cartesian(ylim = c(0, 50000)) +
scale_y_continuous(breaks = seq(0, 50000, 5000)) +
labs(x = "Group", y = "Income (p.a.)") +
theme_minimal()
We can conclude that, on average, students earn less than lecturers, but the error bars tell us that the mean is a better representation of the population for students than for lecturers (there is more variability in lecturers’ income compared to students’).
Task 5.4
Using the same data, plot and interpret an error line chart showing the mean neuroticism for students and lecturers.
The code for the plot in Figure 4 is:
ggplot(student_tib, aes(group, neurotic)) +
stat_summary(fun.data = "mean_cl_normal", geom = "pointrange") +
coord_cartesian(ylim = c(0, 20)) +
scale_y_continuous(breaks = seq(0, 20, 1)) +
labs(x = "Group", y = "Neuroticism") +
theme_minimal()
We can conclude that, on average, students are slightly less neurotic than lecturers.
Task 5.5
Using the same data, plot and interpret a scatterplot with regression lines of alcohol consumption and neuroticism grouped by lecturer/student.
The code for the plot in Figure 5 is:
ggplot(student_tib, aes(alcohol, neurotic, colour = group)) +
geom_point() +
geom_smooth(method = "lm", aes(fill = group), alpha = 0.1) +
coord_cartesian(xlim = c(0, 30), ylim = c(0, 30)) +
scale_x_continuous(breaks = seq(0, 30, 5)) +
scale_y_continuous(breaks = seq(0, 30, 5)) +
labs(x = "Alcohol consumption per week (units)", y = "Neuroticism", colour = "Group", fill = "Group") +
theme_minimal()
We can conclude that for lecturers, as neuroticism increases so does alcohol consumption (a positive relationship), but for students the opposite is true, as neuroticism increases alcohol consumption decreases.
Task 5.6
Using the same data, plot and interpret a scatterplot matrix with regression lines of alcohol consumption, neuroticism and number of friends.
The code for the plot in Figure 5 is:
We can conclude that there is no relationship (flat line) between the number of friends and alcohol consumption; there was a negative relationship between how neurotic a person was and their number of friends ; and there was a slight positive relationship between how neurotic a person was and how much alcohol they drank.
Task 5.7
Using the zhang_2013_subsample.csv data from Chapter 4 (see Smart Alex’s Task 8) plot a clustered error bar chart of the mean test accuracy as a function of the type of name participants completed the test under (x-axis) and whether they were male or female (different coloured bars).
First, let’s load the data (see Tip 2)
zhang_tib <- discovr::zhang_sample
To find out more about the data execute ?zhang_sample.
To get the plot in Figure 7 execute:
ggplot(zhang_tib, aes(name_type, accuracy, colour = sex)) +
stat_summary(fun.data = "mean_cl_normal", geom = "pointrange", position = position_dodge(width = 0.5)) +
coord_cartesian(ylim = c(0, 80)) +
scale_y_continuous(breaks = seq(0, 80, 10)) +
labs(x = "Type of name", y = "Accuracy (%)", colour = "Biological sex") +
theme_minimal()
The plot shows that, on average, males did better on the test than females when using their own name (the control) but also when using a fake female name. However, for participants who did the test under a fake male name, the women did better than males.
Task 5.8
Using the teaching.csv data from Chapter 4 (Task 3), plot a clustered error line chart of the mean score when electric shocks were used compared to being nice, and plot males and females as different coloured lines.
First, let’s load the data (see Tip 2):
teaching_tib <- discovr::teaching
To find out more about the data execute ?teaching. To get the plot in Figure 8 execute:
ggplot(teaching_tib, aes(method, mark, colour = sex)) +
stat_summary(fun.data = "mean_cl_normal", geom = "pointrange", position = position_dodge(width = 0.5)) +
coord_cartesian(ylim = c(0, 20)) +
scale_y_continuous(breaks = seq(0, 20, 1)) +
labs(x = "Method of teaching", y = "Mark (out of 20)", colour = "Biological sex") +
theme_minimal()
We can see that when the being nice method of teaching is used, males and females have comparable scores on their coursework, with females scoring slightly higher than males on average, although their scores are also more variable than the males’ scores (as indicated by the longer error bar). However, when an electric shock is used, males score higher than females but there is more variability in the males’ scores than the females’ for this method (as seen by the longer error bar for males than for females). Additionally, the graph shows that females score higher when the being nice method is used compared to when an electric shock is used, but the opposite is true for males. This suggests that there may be an interaction effect of sex.
Task 5.9
Using the shopping_exercise.csv data from Chapter 3, plot two error bar graphs comparing men and women (x-axis): one for the distance walked, and the other of the time spent shopping.
First, let’s load the data (see Tip 2):
shopping_tib <- discovr::shopping
To find out more about the data execute ?shopping. To get the plot in Figure 9 execute:
ggplot(shopping_tib, aes(sex, distance)) +
stat_summary(fun.data = "mean_cl_normal", geom = "pointrange", position = position_dodge(width = 0.5)) +
coord_cartesian(ylim = c(0, 5)) +
scale_y_continuous(breaks = seq(0, 5, 1)) +
labs(x = "Biological sex", y = "Distance walked (Miles)") +
theme_minimal()
Looking at the plot in Figure 9, we can see that, on average, females walk longer distances (probably not significantly so) while shopping than males.
To get the plot in Figure 10 execute:
ggplot(shopping_tib, aes(sex, time)) +
stat_summary(fun.data = "mean_cl_normal", geom = "pointrange", position = position_dodge(width = 0.5)) +
coord_cartesian(ylim = c(0, 260)) +
scale_y_continuous(breaks = seq(0, 260, 20)) +
labs(x = "Biological sex", y = "Time spent shopping (minutes)") +
theme_minimal()
Figure 10 shows that, on average, females spend more time shopping than males (but again, probably not significantly so). The females’ scores are more variable than the males’ scores (longer error bar).
Task 5.10
Using the pets.csv data from Chapter 4 (Task 6), plot two error bar charts comparing scores when having a fish or cat as a pet (x-axis): one for the animal liking variable, and the other for the life satisfaction.
First, let’s load the data (see Tip 2):
pet_tib <- discovr::pets
To find out more about the data execute ?pets. To get the first plot execute:
ggplot(pet_tib, aes(pet, animal)) +
stat_summary(fun.data = "mean_cl_normal", geom = "pointrange", position = position_dodge(width = 0.5)) +
coord_cartesian(ylim = c(0, 60)) +
scale_y_continuous(breaks = seq(0, 60, 5)) +
labs(x = "Species of pet", y = "Liking of animals") +
theme_minimal()
Figure 11 shows that the mean love of animals was the same for people with cats and fish as pets. To get the second plot execute:
ggplot(pet_tib, aes(pet, life_satisfaction)) +
stat_summary(fun.data = "mean_cl_normal", geom = "pointrange", position = position_dodge(width = 0.5)) +
coord_cartesian(ylim = c(0, 80)) +
scale_y_continuous(breaks = seq(0, 80, 10)) +
labs(x = "Species of pet", y = "Life satisfaction") +
theme_minimal()
Figure 12 shows that, on average, life satisfaction was higher for people with a cat as a pet than those with a fish.
Task 5.11
Using the same data as above, plot a scatterplot of animal liking scores against life satisfaction (plot scores for those with fish and cats in different colours).
The code will be something like:
ggplot(pet_tib, aes(animal, life_satisfaction, colour = pet)) +
geom_point() +
geom_smooth(method = "lm", aes(fill = pet), alpha = 0.1) +
coord_cartesian(xlim = c(0, 75), ylim = c(0, 100)) +
scale_x_continuous(breaks = seq(0, 75, 5)) +
scale_y_continuous(breaks = seq(0, 100, 5)) +
labs(x = "Animal liking", y = "Life satisfaction", colour = "Species of pet", fill = "Species of pet") +
theme_minimal()
We can conclude from Figure 13 that, regardless of whether someone had a cat or fish as a pet, as love of animals increases so does life satisfaction (a positive relationship).
Task 5.12
Using the tea_15.csv data from Chapter 4, plot a scatterplot showing the number of cups of tea drunk (tea) on the x-axis against cognitive functioning (cog_fun) on the y-axis.
First, let’s load the data (see Tip 2):
tea15_tib <- discovr::tea_15
To find out more about the data execute ?tea_15. The code will be something like:
ggplot(tea15_tib, aes(tea, cog_fun)) +
geom_point() +
geom_smooth(method = "lm") +
coord_cartesian(xlim = c(0, 6), ylim = c(0, 70)) +
scale_x_continuous(breaks = seq(0, 6, 1)) +
scale_y_continuous(breaks = seq(0, 70, 5)) +
labs(x = "Cups of tea drunk (per day)", y = "Cognitive functioning") +
theme_minimal()
The scatterplot (and near-flat line especially) in Figure 14 tells us that there is a tiny relationship (practically zero) between the number of cups of tea drunk per day and cognitive function.
Chapter 6
Task 6.1
Using the notebook.csv data from Chapter 5, create and interpret a Q-Q plot for the two films (ignore sex).
Load the data directly from the discovr package (see Tip 2).
notebook_tib <- discovr::notebook
Figure 15 shows the plot.
ggplot(data = notebook_tib, aes(sample = arousal)) +
qqplotr::stat_qq_band(fill = "#C4F134FF", alpha = 0.3) +
qqplotr::stat_qq_line(colour = "#C4F134FF") +
qqplotr::stat_qq_point(colour = "#3E9BFEFF", alpha = 0.9, size = 1) +
labs(x = "Theoretical quantiles", y = "Sample quantiles") +
facet_wrap(~film, ncol = 1, scales = "free") +
theme_minimal()
For both films the expected quantile points are close, on the whole, to those that would be expected from a normal distribution (i.e. the dots fall within the confidence interval for the line).
Task 6.2
The file r_exam.csv contains data on students' performance on an SPSS exam. Four variables were measured: exam (first-year SPSS exam scores as a percentage), computer (measure of computer literacy in percent), lectures (percentage of statistics lectures attended) and numeracy (a measure of numerical ability out of 15). There is a variable called uni indicating whether the student attended Sussex University (where I work) or Duncetown University. Compute and interpret summary statistics for exam, computer, lectures and numeracy for the sample as a whole.
Load the data (see Tip 2).
rexam_tib <- discovr::r_exam

Table 9 shows the table of descriptive statistics for the four variables in this example. From this table, we can see that, on average, students attended nearly 60% of lectures, obtained 58% in their exam, scored only 51% on the computer literacy test, and only 5 out of 15 on the numeracy test. In addition, the confidence interval for computer literacy was relatively narrow compared to that of the percentage of lectures attended and exam scores.
| Variable | Mean | 95% CI (Mean) | SD | IQR | Range | Skewness | Kurtosis | n | n_Missing |
|---|---|---|---|---|---|---|---|---|---|
| exam | 58.10 | (53.94, 61.82) | 21.32 | 37.00 | (15.00, 99.00) | -0.11 | -1.11 | 100 | 0 |
| computer | 50.71 | (49.11, 52.09) | 8.26 | 10.75 | (27.00, 73.00) | -0.17 | 0.36 | 100 | 0 |
| lectures | 59.77 | (55.21, 63.84) | 21.68 | 28.75 | (8.00, 100.00) | -0.42 | -0.18 | 100 | 0 |
| numeracy | 4.85 | (4.24, 5.37) | 2.71 | 4.00 | (1.00, 14.00) | 0.96 | 0.95 | 100 | 0 |
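The code that produced these descriptives isn't shown in the extract. One way to generate statistics like these is with describe_distribution() from the datawizard package (part of easystats); the exact columns depend on the arguments used, so treat this as a sketch:

```r
library(datawizard)

rexam_tib <- discovr::r_exam

# Mean, SD, IQR, range, skewness and kurtosis for the four measures
rexam_tib |>
  describe_distribution(select = c("exam", "computer", "lectures", "numeracy"), ci = 0.95)
```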
Task 6.3
Produce histograms for each of the four measures in the previous task and interpret them.
The data are in messy format, so first let’s make the data tidy:
rexam_tidy_tib <- rexam_tib |>
pivot_longer(
cols = exam:numeracy,
values_to = "score",
names_to = "measure"
)

The data now look like Table 10.
DT::datatable(rexam_tidy_tib)

To create the plot in Figure 16 use the code below.
ggplot(rexam_tidy_tib, aes(score)) +
geom_histogram(bins = 20, fill = "#C42503FF", colour = "#C42503FF", alpha = 0.2) +
labs(y = "Frequency", x = "Score") +
facet_wrap(~measure, scales = "free") +
theme_minimal()
The histograms show us several things. The exam scores are very interesting because this distribution is quite clearly not normal; in fact, it looks suspiciously bimodal (there are two peaks, indicative of two modes). It looks as though computer literacy is somewhat normally distributed (a few people are very good with computers and a few are very bad, but the majority of people have a similar degree of knowledge) as is the lecture attendance. Both are a little negatively skewed. Finally, the numeracy test has produced very positively skewed data (the majority of people did very badly on this test and only a few did well).
Descriptive statistics and histograms are a good way of getting an instant picture of the distribution of your data. This snapshot can be very useful: for example, the bimodal distribution of exam scores instantly indicates a trend that students are typically either very good at statistics or struggle with it (there are relatively few who fall in between these extremes). Intuitively, this finding fits with the nature of the subject: statistics is very easy once everything falls into place, but before that enlightenment occurs it all seems hopelessly difficult!
Task 6.4
Produce Q-Q plots for the four measures in Task 2.
Figure 17 was created using the code below.
ggplot(rexam_tidy_tib, aes(sample = score)) +
qqplotr::stat_qq_band(fill = "#C4F134FF", alpha = 0.3) +
qqplotr::stat_qq_line(colour = "#C4F134FF") +
qqplotr::stat_qq_point(colour = "#3E9BFEFF", alpha = 0.5, size = 1) +
labs(x = "Theoretical quantiles", y = "Sample quantiles") +
facet_wrap(~measure, scales = "free") +
theme_minimal()
Task 6.5
Use Tasks 2 and 4 to interpret the skew and kurtosis for each of the four measures.
The skewness values are in the table for Task 2. For all of the measures the value of skewness is fairly close to the expected value of 0. The exception is the numeracy scores, for which skew is about 1; this is still within acceptable limits. Looking at the Q-Q plots, none of them show the characteristic upward or downward bend associated with skew.
For kurtosis, note that the table for Task 2 reports excess kurtosis, for which the expected value in a normal distribution is 0 (equivalent to a raw kurtosis of 3). Computer literacy and lecture attendance have raw kurtosis around the expected value of 3. The Q-Q plots confirm that these measures show the characteristic patterns of normality.
Exam scores have raw kurtosis below 2, suggesting lighter tails than you'd expect in a normal distribution. This is also shown in the Q-Q plot, with observed quantiles at the low end being higher than expected and observed quantiles at the high end being lower than expected.
Numeracy scores have raw kurtosis close to 4, indicating heavier tails than expected in a normal distribution, but the Q-Q plot looks relatively normal (although observed quantiles rise above the normal line at the low end).
Task 6.6
Produce summary statistics for the four measures separately for each university.
Table 11 shows the summary statistics split by university. From this table it is clear that Sussex students scored higher on both their exam and the numeracy test than their Duncetown counterparts. In fact, looking at the means reveals that, on average, Sussex students scored an amazing 36% more on the exam than Duncetown students, and had higher numeracy scores too (what can I say, my students are the best). Skew statistics are all close to the expected value of 0, and most of the kurtosis values are close to the expected value of 0 for excess kurtosis (i.e. a raw kurtosis of 3). The notable exception is the computer literacy scores of the Sussex students, which have an excess kurtosis of 1.38 (a raw kurtosis above 4), suggesting heavy tails.
| uni | Variable | Mean | 95% CI (Mean) | SD | IQR | Range | Skewness | Kurtosis | n | n_Missing |
|---|---|---|---|---|---|---|---|---|---|---|
| Duncetown University | exam | 40.18 | (37.34, 42.82) | 12.59 | 17.50 | (15.00, 66.00) | 0.31 | -0.57 | 50 | 0 |
| Duncetown University | computer | 50.26 | (48.15, 52.70) | 8.07 | 12.25 | (35.00, 67.00) | 0.23 | -0.51 | 50 | 0 |
| Duncetown University | lectures | 56.26 | (48.80, 61.99) | 23.77 | 28.75 | (8.00, 100.00) | -0.31 | -0.38 | 50 | 0 |
| Duncetown University | numeracy | 4.12 | (3.63, 4.66) | 2.07 | 3.25 | (1.00, 9.00) | 0.51 | -0.48 | 50 | 0 |
| Sussex University | exam | 76.02 | (73.62, 78.46) | 10.21 | 12.25 | (56.00, 99.00) | 0.27 | -0.26 | 50 | 0 |
| Sussex University | computer | 51.16 | (49.01, 53.13) | 8.51 | 9.25 | (27.00, 73.00) | -0.54 | 1.38 | 50 | 0 |
| Sussex University | lectures | 63.27 | (57.45, 67.98) | 18.97 | 30.00 | (12.50, 100.00) | -0.36 | -0.22 | 50 | 0 |
| Sussex University | numeracy | 5.58 | (4.82, 6.49) | 3.07 | 5.00 | (1.00, 14.00) | 0.79 | 0.26 | 50 | 0 |
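Again the code isn't shown in the extract. A sketch of how grouped descriptives like these might be produced, assuming datawizard's describe_distribution(), which in recent versions handles grouped tibbles:

```r
library(dplyr)
library(datawizard)

rexam_tib <- discovr::r_exam

# Summary statistics for each measure, split by university
rexam_tib |>
  group_by(uni) |>
  describe_distribution(select = c("exam", "computer", "lectures", "numeracy"), ci = 0.95)
```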
Task 6.7
Produce and interpret histograms for the four measures split by the two universities.
ggplot(rexam_tidy_tib, aes(score)) +
geom_histogram(bins = 20, fill = "#C42503FF", colour = "#C42503FF", alpha = 0.2) +
labs(y = "Frequency", x = "Score") +
facet_grid(uni~measure, scales = "free") +
theme_minimal()
The histograms of these variables split according to the university attended show numerous things. The first interesting thing to note is that for exam marks, the distributions are both fairly normal. This seems odd because the overall distribution was bimodal. However, it starts to make sense when you consider that for Duncetown the distribution is centred around a mark of about 40%, but for Sussex the distribution is centred around a mark of about 76%. This illustrates how important it is to look at distributions within groups. If we were interested in comparing Duncetown to Sussex it wouldn’t matter that overall the distribution of scores was bimodal; all that’s important is that each group comes from a normal population, and in this case it appears to be true. When the two samples are combined, these two normal distributions create a bimodal one (one of the modes being around the centre of the Duncetown distribution, and the other being around the centre of the Sussex data!).
For numeracy scores, the distribution is slightly positively skewed (there is a larger concentration at the lower end of scores) in both the Duncetown and Sussex groups. Therefore, the overall positive skew observed before is due to the mixture of universities.
The remaining distributions look vaguely normal. The only exception is the computer literacy scores for the Sussex students. This is a fairly flat distribution apart from a huge peak between 50 and 60%. It looks potentially slightly heavy-tailed. This suggests positive kurtosis and, in fact kurtosis is above 4 in this group (see previous task).
Chapter 7
Task 7.1
A student was interested in whether there was a positive relationship between the time spent doing an essay and the mark received. He got 45 of his friends and timed how long they spent writing an essay (hours) and the percentage they got in the essay (essay). He also translated these grades into their degree classifications (grade): in the UK, a student can get a first-class mark (the best), an upper-second-class mark, a lower second, a third, a pass or a fail (the worst). Using the data in the file essay_marks.csv, find out what the relationship was between the time spent doing an essay and the eventual mark in terms of percentage and degree class (draw a scatterplot too).
Load the data using (see Tip 2):
essay_tib <- discovr::essay_marks

We're interested in looking at the relationship between hours spent on an essay and the grade obtained. We could create a scatterplot of hours spent on the essay (x-axis) and essay mark (y-axis). In Figure 19 I've chosen to highlight the degree classification grades using different colours.
ggplot(essay_tib, aes(hours, essay, colour = grade)) +
geom_point() +
geom_smooth(method = "lm", alpha = 0.1) +
coord_cartesian(xlim = c(0, 15), ylim = c(20, 80)) +
scale_x_continuous(breaks = seq(0, 15, 1)) +
scale_y_continuous(breaks = seq(20, 80, 5)) +
labs(x = "Time spent on essay (hours)", y = "Essay mark (%)", colour = "Grade", fill = "Grade") +
theme_minimal()
To get the correlation:
essay_r <- correlation(data = essay_tib, select = c("hours", "essay"))

Table 13 indicates that the relationship between time spent writing an essay and grade awarded was not significant, \(r\) = 0.27 (-0.03, 0.52), t(43) = 1.81, p = 0.077.
display(essay_r)

| Parameter1 | Parameter2 | r | 95% CI | t(43) | p |
|---|---|---|---|---|---|
| hours | essay | 0.27 | (-0.03, 0.52) | 1.81 | 0.077 |
p-value adjustment method: Holm (1979) Observations: 45
The second part of the question asks us to do the same analysis but with the percentages recoded into degree classifications. The degree classifications are ordinal data (not interval): they are ordered categories. So we shouldn't use Pearson's correlation coefficient, but Spearman's or Kendall's instead. grade is a factor:
essay_tib$grade

 [1] Upper second class First class        Third class        First class
[5] Lower second class Upper second class Upper second class Upper second class
[9] Upper second class Third class Upper second class First class
[13] First class Lower second class Lower second class Upper second class
[17] Upper second class Upper second class Upper second class Lower second class
[21] Lower second class First class Upper second class Upper second class
[25] First class Upper second class Lower second class Lower second class
[29] Lower second class First class Upper second class Upper second class
[33] Upper second class First class First class Upper second class
[37] Upper second class Upper second class Upper second class Upper second class
[41] Upper second class Upper second class Lower second class Lower second class
[45] First class
Levels: First class Upper second class Lower second class Third class
We need to convert it into numbers using as.numeric:
essay_tib <- essay_tib |>
mutate(
grade_num = as.numeric(grade)
)

Let's check the numeric grade matches the description (i.e. first = 1, upper second = 2 etc.):
| id | essay | hours | grade | grade_num |
|---|---|---|---|---|
| qxbcjg | 61.68 | 10.63 | Upper second class | 2 |
| csykgu | 69.55 | 7.29 | First class | 1 |
| npxgpl | 48.23 | 5.05 | Third class | 4 |
| gomiwl | 70.68 | 2.89 | First class | 1 |
| qdywaa | 59.90 | 9.55 | Lower second class | 3 |
| cmaprb | 61.16 | 11.31 | Upper second class | 2 |
| nrvhix | 67.62 | 7.47 | Upper second class | 2 |
| ybytlf | 64.78 | 8.47 | Upper second class | 2 |
| talwvg | 63.21 | 8.72 | Upper second class | 2 |
| ceylea | 49.69 | 6.20 | Third class | 4 |
| berjky | 63.72 | 4.77 | Upper second class | 2 |
| cjxgpl | 72.24 | 10.98 | First class | 1 |
| gufenh | 70.59 | 5.22 | First class | 1 |
| uaxxdm | 58.64 | 6.86 | Lower second class | 3 |
| unnhpl | 58.72 | 9.80 | Lower second class | 3 |
Now the correlation:
grade_r <- correlation(data = essay_tib, select = c("hours", "grade_num"), method = "spearman")

Table 15 shows no significant relationship between degree grade classification for an essay and the time spent doing it, \(r_\text{s}\) = -0.19 (-0.47, 0.12), p = 0.204. Note that the direction of the relationship has reversed. This has happened because the essay marks were recoded as 1 (first), 2 (upper second), 3 (lower second), and 4 (third), so high grades were represented by low numbers.
display(grade_r)

| Parameter1 | Parameter2 | rho | 95% CI | S | p |
|---|---|---|---|---|---|
| hours | grade_num | -0.19 | (-0.47, 0.12) | 18110.93 | 0.204 |
p-value adjustment method: Holm (1979) Observations: 45
Task 7.2
Using the notebook.csv data from Chapter 3, find out the size of the relationship between the participant's gender identity and arousal.
Load the data using (see Tip 2):
notebook_tib <- discovr::notebook

gender_identity is a categorical variable with two categories (in this dataset); therefore, we need to quantify this relationship using a point-biserial correlation. First, I'll recode gender_identity into a new variable called gi_bin, representing 'Man' with the number 0 and 'Woman' with the number 1.
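The recoding code itself isn't shown in the extract. A minimal sketch using dplyr, assuming gender_identity stores the labels 'Man' and 'Woman':

```r
library(dplyr)

# Recode gender_identity: Man = 0, Woman = 1
notebook_tib <- discovr::notebook |>
  mutate(
    gi_bin = if_else(gender_identity == "Man", 0, 1)
  )
```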
Now the correlation:
notebook_r <- correlation(data = notebook_tib, select = c("arousal", "gi_bin"))

Table 17 shows that there was no significant relationship between gender identity and arousal, \(r_\text{pb}\) = -0.20 (-0.48, 0.12), t(38) = -1.23, p = 0.226.
display(notebook_r)

| Parameter1 | Parameter2 | r | 95% CI | t(38) | p |
|---|---|---|---|---|---|
| arousal | gi_bin | -0.20 | (-0.48, 0.12) | -1.23 | 0.226 |
p-value adjustment method: Holm (1979) Observations: 40
Task 7.3
Using the notebook data again, quantify the relationship between the film watched and arousal.
film is a categorical variable with two categories (in this dataset), so as with the previous task I'll recode it into a new variable (film_bin), representing 'The Notebook' with the number 0 and the documentary about notebooks with the number 1.
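As before, the recoding code isn't shown. A sketch, assuming film holds the label 'The Notebook' for one group (notebook_tib was loaded earlier via discovr::notebook):

```r
library(dplyr)

# Recode film: The Notebook = 0, documentary about notebooks = 1
notebook_tib <- notebook_tib |>
  mutate(
    film_bin = if_else(film == "The Notebook", 0, 1)
  )
```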
Now the correlation:
film_r <- correlation(data = notebook_tib, select = c("arousal", "film_bin"))

Table 18 shows that there was a significant relationship between the film watched and arousal, \(r_\text{pb}\) = -0.86 (-0.93, -0.76), t(38) = -10.61, p < 0.001. Looking at how the groups were coded, you should see that The Notebook had a code of 0, and the documentary about notebooks had a code of 1, therefore the negative coefficient reflects the fact that as film goes up (changes from 0 to 1) arousal goes down. Put another way, as the film changes from The Notebook to a documentary about notebooks, arousal decreases. So The Notebook gave rise to the greater arousal levels.
display(film_r)

| Parameter1 | Parameter2 | r | 95% CI | t(38) | p |
|---|---|---|---|---|---|
| arousal | film_bin | -0.86 | (-0.93, -0.76) | -10.61 | < .001*** |
p-value adjustment method: Holm (1979) Observations: 40
Task 7.4
As a statistics lecturer I am interested in the factors that determine whether a student will do well on a statistics course. Imagine I took 25 students and looked at their grades for my statistics course at the end of their first year at university: first, upper second, lower second and third class (see Task 1). I also asked these students what grade they got in their high school maths exams. In the UK GCSEs are school exams taken at age 16 that are graded A, B, C, D, E or F (an A grade is the best). The data for this study are in the file grades.csv. To what degree does GCSE maths grade correlate with first-year statistics grade?
Load the data using Tip 2. The categories in the data file have a meaningful order, so after importing the data we transform stats and gcse to factors but also order the factor levels in the correct order (this is important!).
grade_tib <- read_csv("../data/grades.csv") |>
mutate(
stats = as_factor(stats) |> fct_relevel("First class", "Upper second class", "Lower second class", "Third class", "Pass", "Fail"),
gcse = as_factor(gcse) |> fct_relevel("A", "B", "C", "D", "E", "F")
)

Alternatively, load the data directly from the discovr package:
grade_tib <- discovr::grades

Let's look at these variables. In the UK, GCSEs are school exams taken at age 16 that are graded A, B, C, D, E or F. These grades are categories that have an order of importance (an A grade is better than all of the lower grades). In the UK, a university student can get a first-class mark, an upper second, a lower second, a third, a pass or a fail. These grades are categories, but they have an order to them (an upper second is better than a lower second). When you have categories like these that can be ordered in a meaningful way, the data are said to be ordinal. The data are not interval, because a first-class degree encompasses a 30% range (70–100%), whereas an upper second only covers a 10% range (60–70%). When data have been measured at only the ordinal level they are said to be non-parametric and Pearson's correlation is not appropriate. Therefore, the Spearman correlation coefficient is used.
In the file, the scores are in two columns: one labelled stats and one labelled gcse (see Table 19).
DT::datatable(grade_tib)

The first thing we need to do is to convert stats and gcse to numbers using as.numeric, which will return the numeric value attached to each factor level:
grade_tib <- grade_tib |>
mutate(
stat_num = as.numeric(stats),
gcse_num = as.numeric(gcse)
)

Let's check the numeric grade matches the description (i.e. first = 1, upper second = 2 etc.) by looking at Table 20.
| id | stats | gcse | stat_num | gcse_num |
|---|---|---|---|---|
| wbmew | First class | A | 1 | 1 |
| ypusy | First class | A | 1 | 1 |
| sbumv | First class | C | 1 | 3 |
| wqicp | Upper second class | C | 2 | 3 |
| xnobi | Upper second class | C | 2 | 3 |
| glxdw | Upper second class | D | 2 | 4 |
| eomcn | Upper second class | E | 2 | 5 |
| bjhcl | Lower second class | A | 3 | 1 |
| wcype | Lower second class | B | 3 | 2 |
| grkvu | Lower second class | C | 3 | 3 |
Now the correlation
statgrade_r <- correlation(data = grade_tib, select = c("stat_num", "gcse_num"), method = "spearman")

Table 21 shows the result. In the question I predicted that better grades in GCSE maths would correlate with better degree grades for my statistics course. This hypothesis is directional and so a one-tailed test could be selected; however, in the chapter I advised against one-tailed tests so I have done two-tailed. We could report these results as follows:
- There was a significant positive relationship between a person's statistics grade and their GCSE maths grade, \(r_\text{s}\) = 0.45 (0.06, 0.73), p = 0.022.
display(statgrade_r)

| Parameter1 | Parameter2 | rho | 95% CI | S | p |
|---|---|---|---|---|---|
| stat_num | gcse_num | 0.45 | (0.06, 0.73) | 1418.03 | 0.022* |
p-value adjustment method: Holm (1979) Observations: 25
Task 7.5
In Figure 3.4 we saw some data relating to people’s ratings of dishonest acts and the likeableness of the perpetrator (for a full description see Jane Superbrain Box 3.1). Compute the Spearman correlation between ratings of dishonesty and likeableness of the perpetrator. The data are in
honesty_lab.csv.
Load the data using (see Tip 2):
honesty_tib <- discovr::honesty_lab

Estimate the correlation:
honest_r <- correlation(data = honesty_tib, select = c("deed", "likeableness"), method = "spearman")

Table 22 shows the result. The relationship between ratings of dishonesty and likeableness of the perpetrator was significant because the p-value is less than 0.05 (p < 0.001). The value of Spearman's correlation coefficient is quite large and positive, indicating a large positive effect: the more likeable the perpetrator was, the more positively their dishonest acts were viewed.
We could report the results as:
- There was a large and significant positive relationship between the likeableness of a perpetrator and how positively their dishonest acts were viewed, \(r_\text{s}\) = 0.84 (0.77, 0.89), p < 0.001.
display(honest_r)

| Parameter1 | Parameter2 | rho | 95% CI | S | p |
|---|---|---|---|---|---|
| deed | likeableness | 0.84 | (0.77, 0.89) | 26054.24 | < .001*** |
p-value adjustment method: Holm (1979) Observations: 100
Task 7.6
In Chapter 4 (Task 6) we looked at data from people who had fish or cats as pets and measured their life satisfaction and, also, how much they like animals (
pets.csv). Is there a significant correlation between life satisfaction and the type of animal the person had as a pet?
Load the data using (see Tip 2):
pet_tib <- discovr::pets

pet is a categorical variable with two categories (cat or fish). Therefore, we need to look at this relationship using a point-biserial correlation. The code below creates the variable pet_bin within pet_tib by recoding 'Fish' to 0 and 'Cat' to 1. The correlation coefficient is estimated using these numeric values.
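The recode and correlation code isn't visible in the extract; a sketch using dplyr and the correlation package, assuming pet holds the labels 'Fish' and 'Cat':

```r
library(dplyr)

# Recode pet: Fish = 0, Cat = 1, then estimate the point-biserial correlation
pet_tib <- discovr::pets |>
  mutate(
    pet_bin = if_else(pet == "Fish", 0, 1)
  )

pet_r <- correlation::correlation(data = pet_tib, select = c("life_satisfaction", "pet_bin"))
pet_r |> insight::display()
```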
| Parameter1 | Parameter2 | r | 95% CI | t(18) | p |
|---|---|---|---|---|---|
| life_satisfaction | pet_bin | 0.63 | (0.26, 0.84) | 3.45 | 0.003** |
p-value adjustment method: Holm (1979) Observations: 20
I used a two-tailed test because one-tailed tests should never really be used (see book chapter for more explanation). As you can see (Table 23) there was a significant relationship between type of pet and life satisfaction because our p-value is less than 0.05, \(r_\text{pb}\) = 0.63 (0.26, 0.84), t(18) = 3.45, p = 0.003. Looking at how the groups were coded, you should see that fish had a code of 0 and cat had a code of 1, therefore this result reflects the fact that as pet ‘goes up’ one unit (changes from 0 to 1) life satisfaction goes up. Put another way, as the type of pet changes from fish to cat, life satisfaction increases. So, having a cat as a pet was associated with greater life satisfaction than having a fish.
Task 7.7
Repeat the analysis above, taking account of animal liking when computing the correlation between life satisfaction and the type of animal the person had as a pet.
We can conduct a partial correlation between life satisfaction and the type of pet while ‘adjusting’ for the effect of liking animals. Remember that we added the variable pet_bin to pet_tib in the previous solution. The results are in Table 24. Notice that the partial correlation between pet and life satisfaction is 0.70, which is greater than the correlation when the effect of animal liking is not adjusted for (r = 0.630 from the previous task). The correlation has become more statistically significant (its p-value has decreased from 0.003 to 0.002).
Running this analysis has shown us that type of pet alone explains a large portion of the variation in life satisfaction. In other words, the relationship between pet and life satisfaction is not due to animal liking.
| Parameter1 | Parameter2 | r | 95% CI | t(18) | p |
|---|---|---|---|---|---|
| life_satisfaction | pet_bin | 0.701 | (0.375, 0.873) | 4.173 | 0.002** |
| life_satisfaction | animal | 0.615 | (0.236, 0.831) | 3.306 | 0.008** |
| pet_bin | animal | -0.399 | (-0.715, 0.053) | -1.846 | 0.081 |
p-value adjustment method: Holm (1979) Observations: 20
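The partial correlation code isn't shown in the extract; a sketch mirroring the code used in Task 7.12, assuming pet_bin was created as in the previous task and that easystats (which provides correlation() and display()) is loaded:

```r
# Partial correlations among life satisfaction, type of pet and animal liking
pet_rp <- correlation::correlation(
  data = pet_tib,
  select = c("life_satisfaction", "pet_bin", "animal"),
  partial = TRUE
)
pet_rp |> insight::display()
```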
Task 7.8
In Chapter 4 (Task 7) we looked at data based on findings that the number of cups of tea drunk was related to cognitive functioning (Feng et al., 2010). The data are in the file
tea_15.csv. What is the correlation between tea drinking and cognitive functioning? Is there a significant effect?
Load the data using (see Tip 2):
tea15_tib <- discovr::tea_15

Because the number of cups of tea and cognitive function are both interval variables, we can compute a Pearson correlation coefficient.
tea15_r <- correlation(data = tea15_tib, select = c("tea", "cog_fun"))Table 25 shows the result. The relationship between number of cups of tea drunk per day and cognitive function was not significant, \(r\) = 0.08 (-0.45, 0.57), t(13) = 0.28, p = 0.783. We can tell this because our p-value is greater than 0.05. Note, the correlation itself is small.
display(tea15_r)

| Parameter1 | Parameter2 | r | 95% CI | t(13) | p |
|---|---|---|---|---|---|
| tea | cog_fun | 0.08 | (-0.45, 0.57) | 0.28 | 0.783 |
p-value adjustment method: Holm (1979) Observations: 15
Task 7.9
The research in the previous task was replicated but in a larger sample (N = 716), which is the same as the sample size in Feng et al.’s research (tea_716.csv). Conduct a correlation between tea drinking and cognitive functioning. Compare the correlation coefficient and significance in this large sample, with the previous task. What statistical point do the results illustrate?
Load the data using (see Tip 2):
tea716_tib <- discovr::tea_716

Because the number of cups of tea and cognitive function are both interval variables, we can compute a Pearson correlation coefficient.
tea716_r <- correlation(data = tea716_tib, select = c("tea", "cog_fun"))

Table 26 shows the result. We can see that although the value of Pearson's r has not changed, the relationship between the number of cups of tea drunk per day and cognitive function is now just significant, \(r\) = 0.08 (0.00, 0.15), t(714) = 2.08, p = 0.038 (p is now less than 0.05). This example illustrates one of the downfalls of significance testing: with a large enough sample you can get a significant result even when the effect is very small. In other words, whether or not you get a significant result is heavily dependent on the sample size.
display(tea716_r)

| Parameter1 | Parameter2 | r | 95% CI | t(714) | p |
|---|---|---|---|---|---|
| tea | cog_fun | 0.08 | (4.41e-03, 0.15) | 2.08 | 0.038* |
p-value adjustment method: Holm (1979) Observations: 716
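The point can be checked directly from the relationship \(t = r\sqrt{n-2}/\sqrt{1-r^2}\): holding r fixed and varying only n changes the p-value. A small sketch (the helper r_to_p is mine, not from the book; values are illustrative because the reported r of 0.08 is rounded):

```r
# Two-tailed p-value for a Pearson correlation r observed in a sample of size n
r_to_p <- function(r, n) {
  t <- r * sqrt(n - 2) / sqrt(1 - r^2)
  2 * stats::pt(abs(t), df = n - 2, lower.tail = FALSE)
}

r_to_p(0.08, 15)   # well above 0.05: not significant in the small sample
r_to_p(0.08, 716)  # below 0.05: same effect size, now significant
```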
Task 7.10
In Chapter 6 we looked at hygiene scores over three days of a rock music festival (
download_festival.csv). Using Spearman’s correlation, were hygiene scores on day 1 of the festival significantly correlated with those on day 3?
Load the data using (see Tip 2):
download_tib <- discovr::download

Estimate the correlation:
download_r <- correlation(data = download_tib, select = c("day_1", "day_3"), method = "spearman")

Table 27 shows the result. The hygiene scores on day 1 of the festival correlated significantly with hygiene scores on day 3, \(r_\text{s}\) = 0.34 (0.17, 0.50), p < 0.001. The value of Spearman's correlation coefficient is a positive value suggesting that the smellier you are on day 1, the smellier you will be on day 3.
display(download_r)
Task 7.11
Using the data in
shopping_exercise.csv(Chapter 4, Task 5), find out if there is a significant relationship between the time spent shopping and the distance covered.
Load the data using (see Tip 2):
shopping_tib <- discovr::shopping

The variables time and distance are both interval. Therefore, we can conduct a Pearson correlation. I chose a two-tailed test because it is never really appropriate to conduct a one-tailed test (see the book chapter).
shop_r <- correlation(data = shopping_tib, select = c("distance", "time"))

Table 28 shows that there was a significant positive relationship between time spent shopping and distance covered, \(r\) = 0.83 (0.42, 0.96), t(8) = 4.21, p = 0.003. We can tell that the relationship was significant because the p-value is smaller than 0.05. Also, our value for Pearson's r is very large.
display(shop_r)

| Parameter1 | Parameter2 | r | 95% CI | t(8) | p |
|---|---|---|---|---|---|
| distance | time | 0.83 | (0.42, 0.96) | 4.21 | 0.003** |
p-value adjustment method: Holm (1979) Observations: 10
Task 7.12
What effect does accounting for the participant’s biological sex have on the relationship between the time spent shopping and the distance covered?
To answer this question, we need to conduct a partial correlation between the time spent shopping (interval variable) and the distance covered (interval variable) while ‘adjusting’ for the effect of sex (dichotomous variable). First, let’s turn sex into a numeric variable:
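The recoding code isn't shown in the extract; a sketch using dplyr, assuming sex holds the labels 'Male' and 'Female' (the coding matches the table below: Male = 0, Female = 1):

```r
library(dplyr)

# Recode biological sex: Male = 0, Female = 1
shopping_tib <- discovr::shopping |>
  mutate(
    sex_bin = if_else(sex == "Male", 0, 1)
  )
```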
The data look like this:
| sex | distance | time | sex_bin |
|---|---|---|---|
| Male | 0.16 | 15 | 0 |
| Male | 0.40 | 30 | 0 |
| Male | 1.36 | 37 | 0 |
| Male | 1.99 | 65 | 0 |
| Male | 3.61 | 103 | 0 |
| Female | 1.40 | 22 | 1 |
| Female | 1.81 | 140 | 1 |
| Female | 1.96 | 160 | 1 |
| Female | 3.02 | 183 | 1 |
| Female | 4.82 | 245 | 1 |
Now let's estimate the partial correlation:
shop_rp <- correlation(data = shopping_tib, select = c("distance", "time", "sex_bin"), partial = TRUE)

Table 30 shows that there was still a significant positive relationship between time spent shopping and distance covered, \(r\) = 0.82 (0.39, 0.96), t(8) = 4.06, p = 0.011. The partial correlation between time and distance is 0.82, which is slightly smaller than the correlation when the effect of sex is not adjusted for (r = 0.83). The correlation has become slightly less statistically significant (its p-value has increased from p = 0.003 to p = 0.011). However, not a lot has changed, so even after adjusting for biological sex, the relationship between distance covered and time spent shopping is very strong.
display(shop_rp)

| Parameter1 | Parameter2 | r | 95% CI | t(8) | p |
|---|---|---|---|---|---|
| distance | time | 0.82 | (0.39, 0.96) | 4.06 | 0.011* |
| distance | sex_bin | -0.35 | (-0.80, 0.36) | -1.06 | 0.320 |
| time | sex_bin | 0.64 | (0.02, 0.91) | 2.38 | 0.089 |
p-value adjustment method: Holm (1979) Observations: 10
Chapter 8
Task 8.1
In Chapter 7 (Task 9) we looked at data based on findings that the number of cups of tea drunk was related to cognitive functioning (Feng et al., 2010). Using a linear model that predicts cognitive functioning from tea drinking, what would cognitive functioning be if someone drank 10 cups of tea? Is there a significant effect? (
tea_716.csv)
Load the data using (see Tip 2):
tea716_tib <- discovr::tea_716

Fit the model and display the parameter estimates:
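The fitting code isn't shown in the extract; a sketch using lm() and model_parameters() from the parameters package (the model name tea_lm is mine, not from the original):

```r
library(parameters)

tea716_tib <- discovr::tea_716

# Predict cognitive functioning from daily cups of tea
tea_lm <- lm(cog_fun ~ tea, data = tea716_tib, na.action = na.exclude)
model_parameters(tea_lm) |> insight::display()
```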
| Parameter | Coefficient | SE | 95% CI | t(714) | p |
|---|---|---|---|---|---|
| (Intercept) | 49.22 | 0.76 | (47.72, 50.72) | 64.38 | < .001 |
| tea | 0.46 | 0.22 | (0.03, 0.89) | 2.08 | 0.038 |
Looking at Table 31, we can see that we have a model that significantly improves our ability to predict cognitive functioning. The positive beta value (0.46) indicates a positive relationship between number of cups of tea drunk per day and level of cognitive functioning, in that the more tea drunk, the higher your level of cognitive functioning. We can then use the model to predict level of cognitive functioning after drinking 10 cups of tea per day. The first stage is to define the model by replacing the b-values in the equation below with the values from Table 31. In addition, we can replace the X and Y with the variable names so that the model becomes:
\[ \begin{aligned} \text{Cognitive functioning}_i &= b_0 + b_1 \text{Tea drinking}_i \\\\ &= 49.22 +(0.46 \times \text{Tea drinking}_i) \end{aligned} \]
We can predict cognitive functioning, by replacing Tea drinking in the equation with the value 10:
\[ \begin{aligned} \text{Cognitive functioning}_i &= 49.22 +(0.460 \times \text{Tea drinking}_i) \\ &= 49.22 +(0.460 \times 10) \\ &= 53.82 \end{aligned} \]
Therefore, if you drank 10 cups of tea per day, your level of cognitive functioning would be 53.82.
Task 8.2
Estimate a linear model for the
pubs.csv data in Jane Superbrain Box 8.1 predicting mortality from the number of pubs. Try repeating the analysis but bootstrapping the confidence intervals.
Load the data using (see Tip 2):
pubs_tib <- discovr::pubs

Fit the model
pub_lm <- lm(mortality ~ pubs, data = pubs_tib, na.action = na.exclude)

Table 33 shows the parameter estimates. Looking at the output, we can see that the number of pubs significantly predicts mortality, \(\hat{b}\) = 14.34 (3.82, 24.86), t(6) = 3.33, p = 0.016. The positive beta value indicates a positive relationship between number of pubs and death rate in that, the more pubs in an area, the higher the rate of mortality (as we would expect). The value of \(R^2\) (Table 32) tells us that number of pubs accounts for 64.9% of the variance in mortality rate – that’s over half!
model_performance(pub_lm) |>
display()

| AIC | AICc | BIC | R2 | R2 (adj.) | RMSE | Sigma |
|---|---|---|---|---|---|---|
| 146.9 | 152.9 | 147.1 | 0.65 | 0.59 | 1614.64 | 1864.43 |
model_parameters(pub_lm) |>
display()

| Parameter | Coefficient | SE | 95% CI | t(6) | p |
|---|---|---|---|---|---|
| (Intercept) | 3351.96 | 781.24 | (1440.34, 5263.57) | 4.29 | 0.005 |
| pubs | 14.34 | 4.30 | (3.82, 24.86) | 3.33 | 0.016 |
The limits of the bootstrapped confidence interval for the effect of pubs (Table 34) are both positive values – the interval does not cross zero – so, assuming this interval is one of the 95% that contain the population value, we can be confident that there is a positive, non-zero relationship between the number of pubs in an area and its mortality rate.
model_parameters(pub_lm, bootstrap = TRUE) |>
display()

| Parameter | Coefficient | 95% CI | p |
|---|---|---|---|
| (Intercept) | 2861.83 | (-1.93e-12, 4794.23) | 0.504 |
| pubs | 14.99 | (10.72, 100.00) | < .001 |
Task 8.3
In Jane Superbrain Box 2.1 we encountered data (
honesty_lab.csv) relating to people’s ratings of dishonest acts and the likeableness of the perpetrator. Run a linear model with bootstrapping to predict ratings of dishonesty from the likeableness of the perpetrator.
Load the data using (see Tip 2):
honesty_tib <- discovr::honesty_lab

Fit the model
honest_lm <- lm(deed ~ likeableness, data = honesty_tib, na.action = na.exclude)

Looking at Table 36 we can see that the likeableness of the perpetrator significantly predicts ratings of dishonest acts, \(\hat{b}\) = 0.94 (0.81, 1.07), t(98) = 14.80, p < 0.001. The positive beta value indicates a positive relationship between likeableness of the perpetrator and ratings of dishonesty, in that, the more likeable the perpetrator, the more positively their dishonest acts were viewed (remember that dishonest acts were measured on a scale from 0 = appalling behaviour to 10 = it’s OK really). The value of \(R^2\) (Table 35) tells us that likeableness of the perpetrator accounts for 69.1% of the variance in the rating of dishonesty, which is over half.
model_performance(honest_lm) |>
display()

| AIC | AICc | BIC | R2 | R2 (adj.) | RMSE | Sigma |
|---|---|---|---|---|---|---|
| 324.8 | 325.1 | 332.6 | 0.69 | 0.69 | 1.19 | 1.20 |
model_parameters(honest_lm) |>
display()

| Parameter | Coefficient | SE | 95% CI | t(98) | p |
|---|---|---|---|---|---|
| (Intercept) | -1.86 | 0.33 | (-2.51, -1.21) | -5.67 | < .001 |
| likeableness | 0.94 | 0.06 | (0.81, 1.07) | 14.80 | < .001 |
The bootstrapped confidence intervals (Table 37) do not cross zero, so, assuming this interval is one of the 95% that contain the population value, we can be confident that there is a non-zero relationship between the likeableness of the perpetrator and ratings of dishonest acts.
model_parameters(honest_lm, bootstrap = TRUE) |>
display()

| Parameter | Coefficient | 95% CI | p |
|---|---|---|---|
| (Intercept) | -1.86 | (-2.45, -1.31) | < .001 |
| likeableness | 0.94 | (0.81, 1.06) | < .001 |
Task 8.4
A fashion student was interested in factors that predicted the salaries of catwalk models. She collected data from 231 models (supermodel.csv). For each model she asked them their salary per day (
salary), their age (age), their length of experience as models (years) and their industry status as a model as reflected in their percentile position rated by a panel of experts (status). Use a linear model to see which variables predict a model’s salary. How valid is the model?
Load the data using (see Tip 2):
model_tib <- discovr::supermodel

Fit the model:
model_lm <- lm(salary ~ age + years + status, data = model_tib, na.action = na.exclude)

To begin with, a sample size of 231 with three predictors seems reasonable because this would easily detect medium to large effects (see the diagram in the chapter). Overall, the model accounts for 18.4% of the variance in salaries and is a significant fit to the data, F(3, 227) = 17.07, p < 0.001. The adjusted \(R^2\) (0.17) shows some shrinkage from the unadjusted value (0.18), indicating that the model may not generalize well (Table 38, Table 39).
Table 40 shows the parameter estimates. It seems as though salaries are significantly predicted by the age of the model. This is a positive relationship (look at the sign of the \(\hat{b}\)), indicating that as age increases, salaries increase too. The number of years spent as a model also seems to significantly predict salaries, but this is a negative relationship indicating that the more years you’ve spent as a model, the lower your salary. This finding seems very counter-intuitive, but we’ll come back to it later. Finally, the status of the model doesn’t seem to predict salaries significantly.
model_performance(model_lm) |>
display()

| AIC | AICc | BIC | R2 | R2 (adj.) | RMSE | Sigma |
|---|---|---|---|---|---|---|
| 1899.3 | 1899.5 | 1916.5 | 0.18 | 0.17 | 14.45 | 14.57 |
test_wald(model_lm) |>
display()

| Name | Model | df | df_diff | F | p |
|---|---|---|---|---|---|
| Null model | lm | 230 | |||
| Full model | lm | 227 | 3 | 17.07 | < .001 |
Models were detected as nested (in terms of fixed parameters) and are compared in sequential order.
model_parameters(model_lm) |>
display()

| Parameter | Coefficient | SE | 95% CI | t(227) | p |
|---|---|---|---|---|---|
| (Intercept) | -60.89 | 16.50 | (-93.40, -28.38) | -3.69 | < .001 |
| age | 6.23 | 1.41 | (3.45, 9.02) | 4.42 | < .001 |
| years | -5.56 | 2.12 | (-9.74, -1.38) | -2.62 | 0.009 |
| status | -0.20 | 0.15 | (-0.50, 0.10) | -1.29 | 0.199 |
The next part of the question asks whether this model is valid. Figure 20 shows the model residuals.
- The scatterplots of fitted values vs residuals do not show a random pattern. There is a distinct funnelling, and the trend line is not flat, indicating heteroscedasticity.
- The normal Q-Q plot shows a highly non-normal distribution of residuals because the dots deviate considerably from the straight line (which indicates what you’d get from normally distributed residuals).
- For the age and years variables in the model, VIF values are above 10, indicating multicollinearity in the data. In fact, the correlation between these two variables is around .9! So, these two variables are measuring very similar things. Of course, this makes perfect sense because the older a model is, the more years she would’ve spent modelling! So, it was fairly daft to measure both of these things! This also explains the weird result that the number of years spent modelling negatively predicted salary (i.e. more experience = less salary!): in fact, if you fit a model with years as the only predictor of salary, you’ll find it has the expected positive relationship. This hopefully demonstrates why multicollinearity can bias a regression model.
All in all, several assumptions have not been met and so this model is probably fairly unreliable.
check_model(model_lm)
Task 8.5
A study was carried out to explore the relationship between aggression and several potential predicting factors in 666 children who had an older sibling. Variables measured were
parenting_style (high score = bad parenting practices), computer_games (high score = more time spent playing computer games), television (high score = more time spent watching television), diet (high score = the child has a good diet low in harmful additives) and sibling_aggression (high score = more aggression seen in their older sibling). Past research indicated that parenting style and sibling aggression were good predictors of the level of aggression in the younger child. All other variables were treated in an exploratory fashion. Analyse them with a linear model (child_aggression.csv).
Load the data using (see Tip 2):
aggress_tib <- discovr::child_aggression

Fit the models. We need to conduct this analysis hierarchically, entering parenting_style and sibling_aggression in the first step and the remaining variables in a second step.
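The fitting code isn't shown above; a minimal sketch consistent with the model names in the output below, assuming the outcome column is called aggression (check with names(aggress_tib)):

```r
# Step 1: predictors suggested by past research
aggress_lm <- lm(aggression ~ parenting_style + sibling_aggression,
                 data = aggress_tib, na.action = na.exclude)

# Step 2: add the exploratory predictors
aggress_full_lm <- update(aggress_lm, .~. + television + computer_games + diet)
```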
The second model is a significantly better fit than the first, F(3, 660) = 7.03, p < 0.001, so we’ll retain that model (Table 41). \(R^2\) is the squared correlation between the observed values of aggression and the values of aggression predicted by the model (Table 42). With all five of these predictors in the model only 8.3% of the variance in aggression can be explained.
test_wald(aggress_lm, aggress_full_lm) |>
display()

| Name | Model | df | df_diff | F | p |
|---|---|---|---|---|---|
| aggress_lm | lm | 663 | |||
| aggress_full_lm | lm | 660 | 3 | 7.03 | < .001 |
Models were detected as nested (in terms of fixed parameters) and are compared in sequential order.
model_performance(aggress_full_lm) |>
display()

| AIC | AICc | BIC | R2 | R2 (adj.) | RMSE | Sigma |
|---|---|---|---|---|---|---|
| 325.4 | 325.6 | 356.9 | 0.08 | 0.08 | 0.31 | 0.31 |
Table 43 shows the parameter estimates. Based on the final model (which is actually all we’re interested in) the following variables predict aggression:
- Parenting style significantly predicted aggression, \(\hat{b}\) = 0.06 (0.03, 0.09), t(660) = 3.89, p < 0.001. The beta value indicates that as parenting increases (i.e. as bad practices increase), aggression increases also.
- Sibling aggression significantly predicted aggression, \(\hat{b}\) = 0.08 (0.01, 0.16), t(660) = 2.11, p = 0.036. The beta value indicates that as sibling aggression increases (became more aggressive), aggression increases also.
- Television did not significantly predict aggression, \(\hat{b}\) = 0.03 (-0.06, 0.12), t(660) = 0.71, p = 0.475.
- Computer games significantly predicted aggression, \(\hat{b}\) = 0.14 (0.07, 0.21), t(660) = 3.85, p < 0.001. The beta value indicates that as the time spent playing computer games increases, aggression increases also.
- Good diet significantly predicted aggression, \(\hat{b}\) = -0.11 (-0.18, -0.03), t(660) = -2.86, p = 0.004. The beta value indicates that as the diet improved, aggression decreased.
model_parameters(aggress_full_lm) |>
display()

| Parameter | Coefficient | SE | 95% CI | t(660) | p |
|---|---|---|---|---|---|
| (Intercept) | -4.99e-03 | 0.01 | (-0.03, 0.02) | -0.42 | 0.677 |
| parenting style | 0.06 | 0.01 | (0.03, 0.09) | 3.89 | < .001 |
| sibling aggression | 0.08 | 0.04 | (5.54e-03, 0.16) | 2.11 | 0.036 |
| television | 0.03 | 0.05 | (-0.06, 0.12) | 0.71 | 0.475 |
| computer games | 0.14 | 0.04 | (0.07, 0.21) | 3.85 | < .001 |
| diet | -0.11 | 0.04 | (-0.18, -0.03) | -2.86 | 0.004 |
Figure 21 shows the residual plots.
- The scatterplot of predicted values vs residuals helps us to assess both homoscedasticity and independence of errors. The plot does show a random scatter, but the trend line is not completely flat. It would be useful to run a robust model. Using robust standard errors, sibling aggression is no longer significant (Table 44).
- The Q-Q plot suggests that errors may well deviate from normally distributed.
- The influence plot looks fine. There are no extreme Cook’s distances and the trend line is fairly flat.
check_model(aggress_full_lm)
model_parameters(aggress_full_lm, vcov = "HC4") |>
display()

| Parameter | Coefficient | SE | 95% CI | t(660) | p |
|---|---|---|---|---|---|
| (Intercept) | -4.99e-03 | 0.01 | (-0.03, 0.02) | -0.42 | 0.675 |
| parenting style | 0.06 | 0.02 | (0.02, 0.09) | 2.93 | 0.003 |
| sibling aggression | 0.08 | 0.05 | (-7.26e-03, 0.17) | 1.80 | 0.072 |
| television | 0.03 | 0.06 | (-0.08, 0.15) | 0.55 | 0.581 |
| computer games | 0.14 | 0.05 | (0.05, 0.24) | 2.96 | 0.003 |
| diet | -0.11 | 0.05 | (-0.21, -0.01) | -2.19 | 0.029 |
Task 8.6
Repeat the analysis in Labcoat Leni’s Real Research 8.1 using bootstrapping for the confidence intervals. What are the confidence intervals for the regression parameters?
To recap, the models are fit as follows (see also the Labcoat Leni answers). Load the data using (see Tip 2):
ong_tib <- discovr::ong_2011

Facebook status update frequency
Fit the models that look at whether narcissism predicts, above and beyond the other variables, the frequency of status updates.
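The fitting code isn't reprinted here; a minimal sketch consistent with the model names and predictors in the output below. The outcome name status_freq is a placeholder for whatever the status update frequency variable is called in ong_tib:

```r
# Hierarchical models: covariates first, then extraversion, then narcissism
# (status_freq is a hypothetical column name)
ong_base_lm <- lm(status_freq ~ sex + age + grade, data = ong_tib, na.action = na.exclude)
ong_ext_lm <- update(ong_base_lm, .~. + extraversion)
ong_ncs_lm <- update(ong_ext_lm, .~. + narcissism)
```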
Table 45 compares the models and shows that adding extraversion and then adding narcissism each significantly improved the fit of the model.
test_wald(ong_base_lm, ong_ext_lm, ong_ncs_lm) |>
display()

| Name | Model | df | df_diff | F | p |
|---|---|---|---|---|---|
| ong_base_lm | lm | 246 | |||
| ong_ext_lm | lm | 245 | 1 | 4.02 | 0.046 |
| ong_ncs_lm | lm | 244 | 1 | 8.98 | 0.003 |
Models were detected as nested (in terms of fixed parameters) and are compared in sequential order.
Table 47 shows the parameter estimates and bootstrap confidence intervals and significance tests. The main benefit of the bootstrap confidence intervals and significance values is that they do not rely on assumptions of normality or homoscedasticity, so they give us an accurate estimate of the true population value of b for each predictor. The bootstrapped confidence intervals in the output do not affect the conclusions reported in Ong et al. (2011). Ong et al.’s prediction was still supported in that, after controlling for age, grade and sex, narcissism significantly predicted the frequency of Facebook status updates over and above extroversion, \(\hat{b}\) = 0.07 (0.02, 0.11), p < 0.001.
model_parameters(model = ong_ncs_lm, bootstrap = TRUE) |>
display()

| Parameter | Coefficient | 95% CI | p |
|---|---|---|---|
| (Intercept) | 0.03 | (-5.77, 5.59) | 0.994 |
| sex (Male) | -0.96 | (-1.60, -0.36) | < .001 |
| age | -0.01 | (-0.34, 0.39) | 0.924 |
| grade (Sec 2) | -0.42 | (-1.30, 0.55) | 0.384 |
| grade (Sec 3) | -1.04 | (-2.16, -2.20e-03) | 0.050 |
| extraversion | 0.01 | (-0.05, 0.07) | 0.650 |
| narcissism | 0.07 | (0.02, 0.11) | < .001 |
Facebook profile picture rating
Fit the models that look at whether narcissism predicts, above and beyond the other variables, the Facebook profile picture ratings.
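As before, the fitting code isn't reprinted; a minimal sketch, with profile_rating standing in for the actual name of the profile picture rating variable in ong_tib:

```r
# Hierarchical models for profile picture ratings
# (profile_rating is a hypothetical column name)
prof_base_lm <- lm(profile_rating ~ sex + age + grade, data = ong_tib, na.action = na.exclude)
prof_ext_lm <- update(prof_base_lm, .~. + extraversion)
prof_ncs_lm <- update(prof_ext_lm, .~. + narcissism)
```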
Table 48 compares the models and shows that adding extraversion and then adding narcissism each significantly improved the fit of the model.
test_wald(prof_base_lm, prof_ext_lm, prof_ncs_lm) |>
display()

| Name | Model | df | df_diff | F | p |
|---|---|---|---|---|---|
| prof_base_lm | lm | 188 | |||
| prof_ext_lm | lm | 187 | 1 | 29.61 | < .001 |
| prof_ncs_lm | lm | 186 | 1 | 21.60 | < .001 |
Models were detected as nested (in terms of fixed parameters) and are compared in sequential order.
Table 50 shows the parameter estimates and bootstrap confidence intervals and significance tests. The bootstrapped confidence intervals for the final model are consistent with the conclusions reported in Ong et al. (2011). That is, after adjusting for age, grade and sex, narcissism significantly predicted the Facebook profile picture ratings over and above extroversion, \(\hat{b}\) = 0.17 (0.10, 0.24), p < 0.001.
model_parameters(model = prof_ncs_lm, bootstrap = TRUE) |>
display()

| Parameter | Coefficient | 95% CI | p |
|---|---|---|---|
| (Intercept) | -3.79 | (-19.80, 8.67) | 0.612 |
| sex (Male) | 0.64 | (-0.53, 1.90) | 0.284 |
| age | 0.35 | (-0.52, 1.45) | 0.480 |
| grade (Sec 2) | -0.56 | (-2.21, 1.00) | 0.472 |
| grade (Sec 3) | -0.59 | (-2.94, 1.50) | 0.572 |
| extraversion | 0.11 | (0.02, 0.21) | 0.014 |
| narcissism | 0.17 | (0.10, 0.24) | < .001 |
Task 8.7
Coldwell et al. (2006) investigated whether household chaos predicted children’s problem behaviour over and above parenting. From 118 families they recorded the age and gender of the youngest child (
child_ageandchild_gender). They measured dimensions of the child’s perceived relationship with their mum: (1) warmth/enjoyment (child_warmth) and (2) anger/hostility (child_anger). Higher scores indicate more warmth/enjoyment and anger/hostility, respectively. They measured the mum’s perceived relationship with her child, resulting in dimensions of positivity (mum_pos) and negativity (mum_neg). Household chaos (chaos) was assessed. The outcome variable was the child’s adjustment (sdq): the higher the score, the more problem behaviour the child was reported to be displaying. Conduct a hierarchical linear model in three steps: (1) enter child age and gender; (2) add the variables measuring parent–child positivity, parent–child negativity, parent–child warmth, parent–child anger; and (3) add chaos. Is household chaos predictive of children’s problem behaviour over and above parenting? (coldwell_2006.csv).
Load the data using (see Tip 2):
chaos_tib <- discovr::coldwell_2006

Fit the models
# Model 1: sdq predicted from child_age and child_gender:
chaos_base_lm <- lm(sdq ~ child_age + child_gender, data = chaos_tib, na.action = na.exclude)
# Model 2: In a new block, add child_anger, child_warmth, mum_pos and mum_neg into the model:
chaos_emo_lm <- update(chaos_base_lm, .~. + child_anger + child_warmth + mum_pos + mum_neg)
# Model 3: In a final block, add chaos to the model:
chaos_chaos_lm <- update(chaos_emo_lm, .~. + chaos)

Table 51 shows the fit statistics. Note that the \(R^2\) increases as new predictors are added (although the variance explained is small).
We can’t compare the models using test_wald() because they’re based on different amounts of data because of missing values.
compare_performance(chaos_base_lm, chaos_emo_lm, chaos_chaos_lm) |>
display()

| Name | Model | R2 | R2 (adj.) | RMSE | Sigma |
|---|---|---|---|---|---|
| chaos_base_lm | lm | 9.04e-04 | -8.61e-03 | 0.17 | 0.17 |
| chaos_emo_lm | lm | 0.06 | 8.31e-03 | 0.17 | 0.17 |
| chaos_chaos_lm | lm | 0.10 | 0.04 | 0.16 | 0.17 |
Table 52 shows the parameter estimates. (Note that in the paper they report standardized betas, so I have used standardize = "refit".) Household chaos significantly predicted the younger sibling’s problem behaviour over and above maternal parenting, child age and gender, \(\hat{\beta}\) = 0.22 (0.01, 0.42), t(89) = 2.08, p = 0.040. The positive standardized beta value indicates that there is a positive relationship between household chaos and the child’s problem behaviour: the higher the level of household chaos, the more problem behaviours the child displayed. The change in \(R^2\) (0.104 − 0.061 = 0.043) between the models with and without chaos as a predictor tells us that household chaos accounts for about 4% of the variance in child problem behaviour.
model_parameters(chaos_chaos_lm, standardize = "refit") |>
display()

| Parameter | Coefficient | SE | 95% CI | t(89) | p |
|---|---|---|---|---|---|
| (Intercept) | 2.42e-16 | 0.10 | (-0.20, 0.20) | 2.42e-15 | > .999 |
| child age | 0.07 | 0.10 | (-0.14, 0.27) | 0.66 | 0.514 |
| child anger | 0.02 | 0.10 | (-0.19, 0.23) | 0.19 | 0.850 |
| child warmth | -0.01 | 0.11 | (-0.23, 0.21) | -0.09 | 0.926 |
| mum pos | 0.05 | 0.10 | (-0.15, 0.26) | 0.51 | 0.613 |
| mum neg | 0.20 | 0.11 | (-0.01, 0.41) | 1.86 | 0.066 |
| chaos | 0.22 | 0.10 | (9.90e-03, 0.42) | 2.08 | 0.040 |
Chapter 9
I had a massive brain fart with the Smart Alex questions in this chapter. Although I adapted the tasks from the IBM SPSS version of this book, for some tasks this adapted text didn’t find its way into the final manuscript. What can I say, it was an intense time. The tasks below reflect what should have been in the book (see also the errata page).
Task 9.1
Is arachnophobia (fear of spiders) specific to real spiders or will pictures of spiders evoke similar levels of anxiety? Twelve arachnophobes were asked to play with a big hairy tarantula with big fangs and an evil look in its eight eyes, and at a different point in time were shown only pictures of the same spider. The participants’ anxiety was measured in each case. Do a t-test to see whether anxiety is higher for real spiders than pictures (
big_hairy_spider.csv).
Load the data using (see Tip 2):
racnoss_tib <- discovr::big_hairy_spider

Let’s get the summary statistics in Table 53.
| spider_type | Variable | Mean | 95% CI (Mean) | SD | IQR | Range | Skewness | Kurtosis | n | n_Missing |
|---|---|---|---|---|---|---|---|---|---|---|
| Picture | anxiety | 40 | (35.00, 45.00) | 9.29 | 17.50 | (25.00, 55.00) | 0.00 | -1.00 | 12 | 0 |
| Real | anxiety | 47 | (41.00, 52.84) | 11.03 | 19.00 | (30.00, 65.00) | -7.32e-03 | -1.12 | 12 | 0 |
We need to run a paired samples t-test on these data, and to do so we need the data in messy format as in Table 54:
# Convert to messy
messy_racnoss_tib <- racnoss_tib |>
pivot_wider(
id_cols = id,
values_from = anxiety,
names_from = spider_type
) |>
rename_with(.data = _, .fn = ~tolower(.x))
# show the data
DT::datatable(messy_racnoss_tib)

Fit the model
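The fitting code isn't shown; given the Pair(real, picture) label in the output below, a sketch using the paired formula interface of t.test() (available in recent versions of R):

```r
# Paired t-test: real spider vs picture
racnoss_mod <- t.test(Pair(real, picture) ~ 1, data = messy_racnoss_tib)
```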
Table 55 shows the parameter estimates.
model_parameters(racnoss_mod) |>
display()

| Parameter | Difference | t(11) | p | 95% CI |
|---|---|---|---|---|
| Pair(real, picture) | 7 | 2.47 | 0.031 | (0.77, 13.23) |
Alternative hypothesis: true mean difference is not equal to 0
Table 56 shows the effect size estimate.
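The effect size table itself isn't reproduced here, but one way to compute an effect size for this paired design is a one-sample Cohen's d on the difference scores (a sketch; the book may use a different standardizer):

```r
# d based on the SD of the difference scores (sometimes labelled d_z)
cohens_d(messy_racnoss_tib$real - messy_racnoss_tib$picture) |>
  display()
```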
Task 9.2
Plot an error bar graph of the data in Task 1
Figure 22 shows an error bar plot.
ggplot(racnoss_tib, aes(spider_type, anxiety)) +
geom_point(colour = "#46F884FF", position = position_jitter(width = 0.1)) +
stat_summary(fun.data = "mean_cl_normal", colour = "#3E9BFEFF") +
coord_cartesian(ylim = c(0, 60)) +
scale_y_continuous(breaks = seq(0, 60, 5)) +
labs(x = "Type of spider", y = "Anxiety") +
theme_minimal()
Task 9.3
‘Pop psychology’ books sometimes spout nonsense that is unsubstantiated by science. As part of my plan to rid the world of pop psychology, I took 20 people in relationships and randomly assigned them to one of two groups. One group read the famous popular psychology book Women are from Bras and Men are from Penis, and the other read Marie Claire. The outcome variable was their relationship happiness after their assigned reading. Were people happier with their relationship after reading the pop psychology book? (
self_help.csv).
Load the data using (see Tip 2):
selfhelp_tib <- discovr::self_help

Let’s get the summary statistics in Table 57.
| book | Variable | Mean | 95% CI (Mean) | SD | IQR | Range | Skewness | Kurtosis | n | n_Missing |
|---|---|---|---|---|---|---|---|---|---|---|
| Women are from Bras, Men are from Penis | happy | 20.00 | (17.60, 22.60) | 4.11 | 4.75 | (13.00, 28.00) | 0.28 | 0.93 | 10 | 0 |
| Marie Claire | happy | 24.20 | (21.20, 26.90) | 4.71 | 8.00 | (15.00, 30.00) | -0.77 | -0.14 | 10 | 0 |
Fit the model
book_mod <- t.test(happy ~ book, data = selfhelp_tib)

Table 58 shows the parameter estimates.
model_parameters(book_mod) |>
display()

| Parameter | Group | book = Women are from Bras, Men are from Penis | book = Marie Claire | Difference | 95% CI | t(17.68) | p |
|---|---|---|---|---|---|---|---|
| happy | book | 20 | 24.20 | -4.20 | (-8.36, -0.04) | -2.12 | 0.048 |
Alternative hypothesis: true difference in means between group Women are from Bras, Men are from Penis and group Marie Claire is not equal to 0
Table 59 shows the effect size estimate.
cohens_d(happy ~ book, data = selfhelp_tib) |>
display()

| Cohen’s d | 95% CI |
|---|---|
| -0.95 | [-1.87, -0.01] |
Estimated using pooled SD.
Task 9.4
Twaddle and Sons, the publishers of Women are from Bras and Men are from Penis, were upset about my claims that their book was as useful as a paper umbrella. They ran their own experiment (N = 500) in which relationship happiness was measured after participants had read their book and after reading this one. (Participants read the books in counterbalanced order with a six-month delay.) Was relationship happiness greater after reading their wonderful contribution to pop psychology than after reading my tedious tome about experiments? (
self_help_dsur.csv).
Load the data using (see Tip 2):
self_help_dsur_tib <- discovr::self_help_dsur

Let’s get the summary statistics in Table 60.
| book | Variable | Mean | 95% CI (Mean) | SD | IQR | Range | Skewness | Kurtosis | n | n_Missing |
|---|---|---|---|---|---|---|---|---|---|---|
| Self-help book | happiness | 20.02 | (19.19, 20.96) | 9.98 | 12 | (-18.00, 45.00) | -0.07 | 0.48 | 500 | 0 |
| Discovering statistics using R | happiness | 18.49 | (17.73, 19.21) | 8.99 | 12 | (-8.00, 42.00) | 0.06 | -0.12 | 500 | 0 |
We need to run a paired samples t-test on these data, and to do so we need the data in messy format as in Table 61:
# Convert to messy
messy_book_tib <- self_help_dsur_tib |>
pivot_wider(
id_cols = id,
values_from = happiness,
names_from = book
) |>
rename(
self_help = `Self-help book`,
dsur = `Discovering statistics using R`
)
# show the data
DT::datatable(messy_book_tib)

Fit the model
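The fitting code isn't shown; a sketch consistent with the Pair(self_help, dsur) label in the output below:

```r
# Paired t-test: self-help book vs statistics book
bookr_mod <- t.test(Pair(self_help, dsur) ~ 1, data = messy_book_tib)
```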
Table 62 shows the parameter estimates.
model_parameters(bookr_mod) |>
display()

| Parameter | Difference | t(499) | p | 95% CI |
|---|---|---|---|---|
| Pair(self_help, dsur) | 1.53 | 2.71 | 0.007 | (0.42, 2.64) |
Alternative hypothesis: true mean difference is not equal to 0
Table 63 shows the effect size estimate.
| d (rm) | 95% CI |
|---|---|
| 0.16 | [0.04, 0.28] |
Adjusted for small sample bias.
Although this effect is highly statistically significant, the size of the effect is small. In this example, it would be tempting for Twaddle and Sons to conclude that their book produced significantly greater relationship happiness than my book. In fact, many researchers would write conclusions like this:
However, to reach such a conclusion is to confuse statistical significance with the importance of the effect. By calculating the effect size we’ve discovered that although the difference in happiness after reading the two books is statistically different, the size of effect that this represents is small. A more balanced interpretation might, therefore, be:
Of course, this latter interpretation would be unpopular with Twaddle and Sons who would like to believe that their book had a huge effect on relationship happiness.
Task 9.5
In Chapter 4 (Task 6) we looked at data from people who had fish or cats as pets and measured their life satisfaction as well as how much they like animals (
pets.csv). Conduct a t-test to see whether life satisfaction depends upon the type of pet a person has.
Load the data using (see Tip 2):
pet_tib <- discovr::pets

Let’s get the summary statistics in Table 64.
| pet | Variable | Mean | 95% CI (Mean) | SD | IQR | Range | Skewness | Kurtosis | n | n_Missing |
|---|---|---|---|---|---|---|---|---|---|---|
| Fish | life_satisfaction | 38.17 | (28.67, 45.84) | 15.51 | 19.25 | (6.00, 56.00) | -1.17 | 0.44 | 12 | 0 |
| Cat | life_satisfaction | 60.12 | (52.62, 66.13) | 11.10 | 13.50 | (37.00, 72.00) | -1.45 | 2.23 | 8 | 0 |
Fit the model
pet_mod <- t.test(life_satisfaction ~ pet, data = pet_tib)

Table 65 shows the parameter estimates.
model_parameters(pet_mod) |>
display()

| Parameter | Group | pet = Fish | pet = Cat | Difference | 95% CI | t(17.84) | p |
|---|---|---|---|---|---|---|---|
| life_satisfaction | pet | 38.17 | 60.12 | -21.96 | (-34.48, -9.44) | -3.69 | 0.002 |
Alternative hypothesis: true difference in means between group Fish and group Cat is not equal to 0
Table 66 shows the effect size estimate.
cohens_d(life_satisfaction ~ pet, data = pet_tib) |>
display()

| Cohen’s d | 95% CI |
|---|---|
| -1.57 | [-2.59, -0.53] |
Estimated using pooled SD.
As well as being statistically significant, this effect is very large and so represents a substantive finding. We could report:
Task 9.6
Repeat the t-test from Task 5 but turn off Welch’s correction. Then fit a linear model to the data in Task 5 to see whether life satisfaction is significantly predicted from the type of animal. What do you notice about the t-value and significance in this model compared to Task 5?
First let’s fit the model without applying the default Welch correction by including var.equal = TRUE. The results are in Table 67.
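The code isn't reprinted; a sketch (the object name is a placeholder):

```r
# Independent t-test with pooled variance (Welch's correction turned off)
pet_pooled_mod <- t.test(life_satisfaction ~ pet, data = pet_tib, var.equal = TRUE)
model_parameters(pet_pooled_mod) |>
  display()
```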
| Parameter | Group | pet = Fish | pet = Cat | Difference | 95% CI | t(18) | p |
|---|---|---|---|---|---|---|---|
| life_satisfaction | pet | 38.17 | 60.12 | -21.96 | (-35.35, -8.57) | -3.45 | 0.003 |
Alternative hypothesis: true difference in means between group Fish and group Cat is not equal to 0
Now let’s fit a linear model. The results are in Table 68.
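Again a sketch, with a placeholder object name:

```r
# The same comparison expressed as a linear model
pet_lm <- lm(life_satisfaction ~ pet, data = pet_tib, na.action = na.exclude)
model_parameters(pet_lm) |>
  display()
```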
| Parameter | Coefficient | SE | 95% CI | t(18) | p |
|---|---|---|---|---|---|
| (Intercept) | 38.17 | 4.03 | (29.70, 46.63) | 9.47 | < .001 |
| pet (Cat) | 21.96 | 6.37 | (8.57, 35.35) | 3.45 | 0.003 |
The values of t and p are the same. (Technically, t is different because for the linear model it is a positive value and for the t-test it is negative. However, the sign of t merely reflects which way around you coded the cat and fish groups: the linear model, by default, has coded the groups the opposite way around to the t-test.) The main point I wanted to make here is that whether you run these data through the lm() or t.test() functions, the results are identical.
Task 9.7
In Chapter 6 we looked at hygiene scores over three days of a rock music festival (
download.csv). Do a paired-samples t-test to see whether hygiene scores on day 1 differed from those on day 3.
Load the data using (see Tip 2):
download_tib <- discovr::download

Let’s get the summary statistics in Table 69.
| Variable | Mean | 95% CI (Mean) | SD | IQR | Range | Skewness | Kurtosis | n | n_Missing |
|---|---|---|---|---|---|---|---|---|---|
| day_1 | 1.79 | (1.73, 1.86) | 0.94 | 0.93 | (0.02, 20.02) | 8.87 | 170.45 | 810 | 0 |
| day_3 | 0.98 | (0.86, 1.10) | 0.71 | 1.11 | (0.02, 3.41) | 1.03 | 0.73 | 123 | 687 |
We need to run a paired samples t-test on these data, and so the data need to be in messy format, which they are.
Fit the model:
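The fitting code isn't shown; a sketch consistent with the Pair(day_1, day_3) label in the output below (rows with missing day 3 scores are dropped by default, leaving the 123 complete pairs):

```r
# Paired t-test: hygiene on day 1 vs day 3 of the festival
download_mod <- t.test(Pair(day_1, day_3) ~ 1, data = download_tib)
```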
Table 70 shows the parameter estimates.
model_parameters(download_mod) |>
display()

| Parameter | Difference | t(122) | p | 95% CI |
|---|---|---|---|---|
| Pair(day_1, day_3) | 0.67 | 10.59 | < .001 | (0.55, 0.80) |
Alternative hypothesis: true mean difference is not equal to 0
Table 71 shows the effect size estimate.
Task 9.8
A psychologist was interested in the cross-species differences between men and dogs. She observed a group of dogs and a group of men in a naturalistic setting (20 of each). She classified several behaviours as being dog-like (urinating against trees and lampposts, attempts to copulate with anything that moved, and attempts to lick their own genitals). For each man and dog she counted the number of dog-like behaviours displayed in a 24-hour period. It was hypothesized that dogs would display more dog-like behaviours than men. Test this hypothesis using a robust test. (
men_dogs.csv)
Load the data using (see Tip 2):
mandog_tib <- discovr::men_dogs

Let’s get the summary statistics in Table 72.
| species | Variable | Mean | 95% CI (Mean) | SD | IQR | Range | Skewness | Kurtosis | n | n_Missing |
|---|---|---|---|---|---|---|---|---|---|---|
| Dog | behaviour | 28.05 | (23.55, 33.00) | 10.98 | 17.25 | (15.00, 51.00) | 0.74 | -0.57 | 20 | 0 |
| Man | behaviour | 26.85 | (22.95, 31.65) | 9.90 | 11.25 | (12.00, 54.00) | 0.94 | 1.66 | 20 | 0 |
Fit the model
mandog_mod <- t.test(behaviour ~ species, data = mandog_tib)
Table 73 shows the parameter estimates.
model_parameters(mandog_mod) |>
display()
| Parameter | Group | species = Dog | species = Man | Difference | 95% CI | t(37.60) | p |
|---|---|---|---|---|---|---|---|
| behaviour | species | 28.05 | 26.85 | 1.20 | (-5.50, 7.90) | 0.36 | 0.719 |
Alternative hypothesis: true difference in means between group Dog and group Man is not equal to 0
Table 74 shows the effect size estimate.
cohens_d(behaviour ~ species, data = mandog_tib) |>
display()
| Cohen’s d | 95% CI |
|---|---|
| 0.11 | [-0.51, 0.73] |
Estimated using pooled SD.
The effect is not statistically significant, and the effect size is small. We could report:
Task 9.9
Both Ozzy Osbourne and Judas Priest have been accused of putting backward masked messages on their albums that subliminally influence poor unsuspecting teenagers into doing things like blowing their heads off with shotguns. A psychologist was interested in whether backward masked messages could have an effect. He created a version of Taylor Swift’s ‘Shake It Off’ that contained the masked message ‘deliver your soul to the dark lord’ repeated in the chorus. He took this version, and the original, and played one version (randomly) to a group of 32 people. Six months later he played them whatever version they hadn’t heard the time before. So, each person heard both the original and the version with the masked message, but at different points in time. The psychologist measured the number of satanic intrusions the person had in the week after listening to each version. Test the hypothesis that the backward message would lead to more intrusions (dark_lord.csv) using a robust test.
Load the data using (see Tip 2):
darklord_tib <- discovr::dark_lord
Let’s get the summary statistics in Table 75.
| message | Variable | Mean | 95% CI (Mean) | SD | IQR | Range | Skewness | Kurtosis | n | n_Missing |
|---|---|---|---|---|---|---|---|---|---|---|
| Message | intrusions | 9.16 | (7.91, 10.38) | 3.55 | 4 | (2.00, 17.00) | 0.08 | 0.24 | 32 | 0 |
| No message | intrusions | 11.50 | (10.09, 13.03) | 4.38 | 3 | (4.00, 21.00) | 0.79 | 0.23 | 32 | 0 |
We need to run a robust paired samples t-test on these data. Therefore, we need to make the data messy (as in Table 76):
# Convert to messy
dark_mess_tib <- darklord_tib |>
pivot_wider(
id_cols = id,
values_from = intrusions,
names_from = message
) |>
rename_with(.data = _, ~ tolower(str_replace(.x, pattern = " ", replacement = "_")))
# show the data
DT::datatable(dark_mess_tib)
Fit the robust model:
darklord_rob <- WRS2::yuend(x = dark_mess_tib$message, y = dark_mess_tib$no_message)
Table 77 shows the parameter estimates.
model_parameters(darklord_rob) |>
display()
| t(19) | p | Difference | Difference 95% CI | Estimate | Effectsize |
|---|---|---|---|---|---|
| -1.96 | 0.064 | -1.55 | (-3.20, 0.10) | 0.36 | Explanatory measure of effect size |
Task 9.10
Thinking back to Labcoat Leni’s Real Research 4.1, test whether the number of offers was significantly different in people listening to Bon Scott than in those listening to Brian Johnson (
acdc.csv) using a robust independent t-test and bootstrapping. Do your results differ from Oxoby (2008)?
Load the data using (see Tip 2):
acdc_tib <- discovr::acdcLet’s get the summary statistics in Table 78.
| singer | Variable | Mean | 95% CI (Mean) | SD | IQR | Range | Skewness | Kurtosis | n | n_Missing |
|---|---|---|---|---|---|---|---|---|---|---|
| Bon Scott | offer | 3.28 | (2.72, 3.78) | 1.18 | 2 | (1.00, 5.00) | -0.12 | -0.77 | 18 | 0 |
| Brian Johnson | offer | 4.00 | (3.56, 4.50) | 0.97 | 2 | (2.00, 5.00) | -0.43 | -0.96 | 18 | 0 |
Fit the model
acdc_rob <- WRS2::yuenbt(offer ~ singer, data = acdc_tib, nboot = 1000, side = TRUE)
Table 80 shows the parameter estimates.
model_parameters(acdc_rob) |>
display()
| t | p | Difference | Difference 95% CI |
|---|---|---|---|
| -1.73 | 0.086 | -0.83 | (-1.79, 0.12) |
Table 81 shows the effect size estimate.
cohens_d(offer ~ singer, data = acdc_tib) |>
display()
| Cohen’s d | 95% CI |
|---|---|
| -0.67 | [-1.34, 0.01] |
Estimated using pooled SD.
The bootstrap confidence interval ranged from -1.79 to 0.12, which just about crosses zero, suggesting (if we assume that it is one of the 95% of confidence intervals that contain the true value) that the effect in the population could be zero.
Chapter 10
Task 10.1
McNulty et al. (2008) found a relationship between a person’s attractiveness and how much support they give their partner among newlywed heterosexual couples. The data are in mcnulty_2008.csv. Is this relationship moderated by spouse (i.e., whether the data were from the husband or wife)?
Load the data using (see Tip 2):
mcnulty_tib <- discovr::mcnulty_2008
Centre the predictors:
mcnulty_ctib <- center(x = mcnulty_tib, select = c(attractiveness, support, satisfaction))
Fit the model:
support_lm <- lm(support ~ attractiveness*spouse, data = mcnulty_ctib)
Table 82 shows the fit statistics.
model_performance(support_lm) |>
display()
| AIC | AICc | BIC | R2 | R2 (adj.) | RMSE | Sigma |
|---|---|---|---|---|---|---|
| -52.2 | -51.9 | -36.7 | 0.09 | 0.07 | 0.20 | 0.20 |
Evaluate the model assumptions in Figure 23. There’s potential heteroscedasticity.
check_model(support_lm)
Table 83 shows the robust parameter estimates
model_parameters(support_lm, vcov = "HC4") |>
display()
| Parameter | Coefficient | SE | 95% CI | t(160) | p |
|---|---|---|---|---|---|
| (Intercept) | -0.01 | 0.02 | (-0.06, 0.03) | -0.50 | 0.619 |
| attractiveness | -0.06 | 0.02 | (-0.10, -0.02) | -2.93 | 0.004 |
| spouse (Wife) | 0.02 | 0.03 | (-0.04, 0.09) | 0.75 | 0.455 |
| attractiveness × spouse (Wife) | 0.11 | 0.03 | (0.05, 0.16) | 3.54 | < .001 |
Task 10.2
Produce the simple slopes analysis for Task 1.
Table 84 tells us that the relationship between attractiveness of a person and amount of support given to their spouse is different for husbands and wives. Specifically, for wives, as attractiveness increases the level of support that they give to their husbands increases, whereas for husbands, as attractiveness increases the amount of support they give to their wives decreases.
support_ss <- estimate_slopes(model = support_lm,
trend = "attractiveness",
by = "spouse")
display(support_ss)
| spouse | Slope | SE | 95% CI | t(160) | p |
|---|---|---|---|---|---|
| Husband | -0.06 | 0.02 | (-0.10, -0.02) | -2.97 | 0.003 |
| Wife | 0.05 | 0.02 | ( 0.01, 0.08) | 2.35 | 0.020 |
Marginal effects estimated for attractiveness; Type of slope was dY/dX
Task 10.3
McNulty et al. (2008) also found a relationship between a person’s attractiveness and their relationship satisfaction among newlyweds. Using the same data as in Tasks 1 and 2, find out if this relationship is moderated by spouse.
The data were loaded and variables centred in Task 1. To fit this model use:
satisfaction_lm <- lm(satisfaction ~ attractiveness*spouse, data = mcnulty_ctib)
Table 85 shows the fit statistics.
model_performance(satisfaction_lm) |>
display()
| AIC | AICc | BIC | R2 | R2 (adj.) | RMSE | Sigma |
|---|---|---|---|---|---|---|
| 960.6 | 961.0 | 976.1 | 0.03 | 9.97e-03 | 4.39 | 4.44 |
Evaluate the model assumptions in Figure 24. There’s potential heteroscedasticity.
check_model(satisfaction_lm)
Table 86 shows the robust parameter estimates
model_parameters(satisfaction_lm, vcov = "HC4") |>
display()
| Parameter | Coefficient | SE | 95% CI | t(160) | p |
|---|---|---|---|---|---|
| (Intercept) | 0.01 | 0.41 | (-0.79, 0.82) | 0.04 | 0.972 |
| attractiveness | -0.88 | 0.38 | (-1.64, -0.13) | -2.32 | 0.021 |
| spouse (Wife) | -0.02 | 0.70 | (-1.40, 1.35) | -0.03 | 0.973 |
| attractiveness × spouse (Wife) | 0.55 | 0.58 | (-0.59, 1.68) | 0.95 | 0.343 |
Task 10.4
In this chapter we tested a mediation model of infidelity for Lambert et al.’s data (Lambert et al., 2012). Repeat this analysis but using hook_ups as the measure of infidelity.
Load the data using (see Tip 2):
infidelity_tib <- discovr::lambert_2012
Specify the model by changing the outcome in the code within the book chapter:
hookup_mod <- 'hook_ups ~ c*ln_porn + b*commit
commit ~ a*ln_porn
indirect_effect := a*b
total_effect := c + (a*b)
'
Fit the model:
hookup_fit <- lavaan::sem(hookup_mod,
data = infidelity_tib,
missing = "FIML",
estimator = "MLR")
Table 87 shows the parameter estimates.
model_parameters(hookup_fit) |>
display()
| Link | Coefficient | SE | 95% CI | z | p |
|---|---|---|---|---|---|
| hook_ups ~ ln_porn (c) | 1.29 | 0.52 | (0.28, 2.30) | 2.50 | 0.013 |
| hook_ups ~ commit (b) | -0.62 | 0.13 | (-0.88, -0.36) | -4.72 | < .001 |
| commit ~ ln_porn (a) | -0.47 | 0.23 | (-0.92, -0.02) | -2.06 | 0.040 |
| To | Coefficient | SE | 95% CI | z | p |
|---|---|---|---|---|---|
| (indirect_effect) | 0.29 | 0.15 | (-7.47e-03, 0.59) | 1.91 | 0.056 |
| (total_effect) | 1.58 | 0.55 | (0.51, 2.65) | 2.89 | 0.004 |
Is there evidence for mediation? The indirect effect is not statistically significant, p = 0.056, so there is not significant mediation (although there nearly is …).
Task 10.5
Tablets like the iPad are very popular. A company owner was interested in how to make his brand of tablets more desirable. He collected data on how cool people perceived a product’s advertising to be (advert_cool), how cool they thought the product was (product_cool) and how desirable they found the product (desirability). Test his theory that the relationship between cool advertising and product desirability is mediated by how cool people think the product is (tablets.csv). Am I showing my age by using the word ‘cool’?
Load the data using (see Tip 2):
tablet_tib <- discovr::tablets
Specify the model:
tablet_mod <- 'desirability ~ c*advert_cool + b*product_cool
product_cool ~ a*advert_cool
indirect_effect := a*b
total_effect := c + (a*b)
'
Fit the model:
tablet_fit <- lavaan::sem(tablet_mod,
data = tablet_tib,
estimator = "MLR")
Table 88 shows the parameter estimates.
model_parameters(tablet_fit) |>
display()
| Link | Coefficient | SE | 95% CI | z | p |
|---|---|---|---|---|---|
| desirability ~ advert_cool (c) | 0.20 | 0.06 | (0.07, 0.33) | 3.11 | 0.002 |
| desirability ~ product_cool (b) | 0.23 | 0.06 | (0.12, 0.35) | 3.96 | < .001 |
| product_cool ~ advert_cool (a) | 0.15 | 0.06 | (0.03, 0.27) | 2.43 | 0.015 |
| To | Coefficient | SE | 95% CI | z | p |
|---|---|---|---|---|---|
| (indirect_effect) | 0.04 | 0.02 | (1.23e-03, 0.07) | 2.03 | 0.042 |
| (total_effect) | 0.24 | 0.06 | (0.11, 0.36) | 3.68 | < .001 |
- Advert coolness significantly predicted desirability, \(\hat{b}\) = 0.20 (0.07, 0.33), z = 3.11, p = 0.002. As advert coolness increased, desirability increased also.
- Advert coolness significantly predicted perceptions of a product, \(\hat{b}\) = 0.15 (0.03, 0.27), z = 2.43, p = 0.015. As advert coolness increased, product coolness increased also.
- Product coolness significantly predicted desirability, \(\hat{b}\) = 0.23 (0.12, 0.35), z = 3.96, p < 0.001. Products perceived to be more cool were more desirable.
- The indirect effect of advert coolness on desirability through product coolness was significant, \(\hat{b}\) = 0.04 (0.00, 0.07), z = 2.03, p = 0.042. As such, there was evidence of significant mediation.
You could report this as:
Chapter 11
Task 11.1
To test how different teaching methods affected students’ knowledge I took three statistics modules (group) where I taught the same material. For one module I wandered around with a large cane and beat anyone who asked daft questions or got questions wrong (punish). In the second I encouraged students to discuss things that they found difficult and gave anyone working hard a nice sweet (reward). In the final course I neither punished nor rewarded students’ efforts (indifferent). I measured the students’ exam marks (percentage). The data are in the file teach_method.csv. Fit a model with planned contrasts to test the hypotheses that: (1) reward results in better exam results than either punishment or indifference; and (2) indifference will lead to significantly better exam results than punishment.
Load the data using (see Tip 2):
teach_tib <- discovr::teach_method
Let’s do the plot first. We can get an error bar plot in Figure 25 as follows:
ggplot(teach_tib, aes(group, exam)) +
stat_summary(fun.data = "mean_cl_boot", colour = "#3E9BFEFF") +
coord_cartesian(ylim = c(0, 70)) +
scale_y_continuous(breaks = seq(0, 70, 10)) +
labs(x = "Method of teaching", y = "Percentage on exam") +
theme_minimal()
The plot shows that marks are highest after reward and lowest after punishment.
To fit the model we need to think about the hypotheses:
- Reward results in better exam results than either punishment or indifference
- Indifference will lead to significantly better exam results than punishment
To test our hypotheses we need to first enter the codes for the contrasts in Table 89.
| Group | Reward vs. other | Indifference vs. punish |
|---|---|---|
| Punish | 1/3 | 1/2 |
| Indifferent | 1/3 | -1/2 |
| Reward | -2/3 | 0 |
Contrast 1 tests hypothesis 1: reward results in better exam results than either punishment or indifference. To test this we compare the reward condition against the other two. So we compare chunk 1 (punish, indifferent) to chunk 2 (reward). In the table, the numbers assigned to the groups are the number of groups in the opposite chunk divided by the number of groups that have non-zero codes, and we randomly assigned one chunk to be a negative value. (The codes -1/3, -1/3, 2/3 would work fine as well.)
Contrast 2 tests hypothesis 2: indifference will lead to significantly better exam results than punishment. First we get rid of the reward condition by assigning a code of 0; next we compare chunk 1 (indifferent) to chunk 2 (punish). The numbers assigned to the groups are the number of groups in the opposite chunk divided by the number of groups that have non-zero codes, and then we randomly assigned one chunk to be a negative value. (The codes -1/2, 1/2, 0 would work fine as well.)
We set these contrasts for the variable group as follows:
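A sketch of the contrast-setting code, with the weights taken from Table 89 and the level order matching the printed weights below:

```r
# Assign the planned contrast weights to the group factor
# (levels ordered Punish, Indifferent, Reward)
contrasts(teach_tib$group) <- cbind(
  reward_vs_other = c(1/3, 1/3, -2/3),
  indifferent_vs_punish = c(1/2, -1/2, 0)
)
contrasts(teach_tib$group)  # print the weights to check them
```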
reward_vs_other indifferent_vs_punish
Punish 0.3333333 0.5
Indifferent 0.3333333 -0.5
Reward -0.6666667 0.0
Next we fit the model using this code:
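A sketch of the model-fitting code that would produce a table like Table 90 (the omega-squared argument name is an assumption; it varies across versions of the parameters package):

```r
# Fit the linear model and summarise the ANOVA with omega-squared
teach_lm <- lm(exam ~ group, data = teach_tib)
anova(teach_lm) |>
  model_parameters(omega_squared = "raw") |>
  display()
```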
| Parameter | Sum_Squares | df | Mean_Square | F | p | Omega2 | Omega2 95% CI |
|---|---|---|---|---|---|---|---|
| group | 1205.07 | 2 | 602.53 | 21.01 | < .001 | 0.57 | (0.34, 1.00) |
| Residuals | 774.40 | 27 | 28.68 |
Anova Table (Type 1 tests)
Table 90 tells us that there was a significant effect of the teaching method on exam scores, F(2, 27) = 21.01, p < 0.001, \(\hat{\omega}^2\) = 0.57 (0.34, 1.00). View the parameters using:
model_parameters(teach_lm) |>
display()
| Parameter | Coefficient | SE | 95% CI | t(27) | p |
|---|---|---|---|---|---|
| (Intercept) | 57.13 | 0.98 | (55.13, 59.14) | 58.43 | < .001 |
| group (reward_vs_other) | -12.40 | 2.07 | (-16.66, -8.14) | -5.98 | < .001 |
| group (indifferent_vs_punish) | -6.00 | 2.40 | (-10.91, -1.09) | -2.51 | 0.019 |
Table 91 shows that hypothesis 1 is supported (groupreward_vs_other): teaching with rewards led to significantly higher exam scores than the other methods, \(\hat{b}\) = -12.40 (-16.66, -8.14), t(27) = -5.98, p < 0.001. Hypothesis 2 is also supported (groupindifferent_vs_punish): teaching with indifference was associated with higher exam scores than teaching with punishment, \(\hat{b}\) = -6.00 (-10.91, -1.09), t(27) = -2.51, p = 0.019.
We can get some basic diagnostic plots (Figure 26). There are no large Cook’s distances, the Q-Q plot suggests normal residuals but the residual vs. fitted plot and the scale-location plot suggest heterogeneity of variance (the columns of dots are different lengths and the red line is not flat especially in the scale location plot), which brings us to Task 2!
check_model(teach_lm)
Task 11.2
Fit a robust model for Task 1.
Table 92 shows a robust F for the teaching data. The Welch F is highly significant like the non-robust F.
oneway.test(exam ~ group, data = teach_tib) |>
model_parameters() |>
display()
| F | df | df (error) | p |
|---|---|---|---|
| 32.24 | 2 | 17.34 | < .001 |
Table 93 shows the robust parameter estimates. The first contrast is still highly significant and the second contrast is also still significant. As such, our conclusions are unchanged when fitting a model that is robust to heteroscedasticity.
model_parameters(teach_lm, vcov = "HC4") |>
display()
| Parameter | Coefficient | SE | 95% CI | t(27) | p |
|---|---|---|---|---|---|
| (Intercept) | 57.13 | 0.98 | (55.13, 59.14) | 58.43 | < .001 |
| group (reward_vs_other) | -12.40 | 1.88 | (-16.26, -8.54) | -6.59 | < .001 |
| group (indifferent_vs_punish) | -6.00 | 2.60 | (-11.33, -0.67) | -2.31 | 0.029 |
Task 11.3
Children wearing superhero costumes are more likely to harm themselves because of the unrealistic impression of invincibility that these costumes could create. For example, children have reported to hospital with severe injuries because of trying ‘to initiate flight without having planned for landing strategies’ (Davies, Surridge, Hole, & Munro-Davies, 2007). I can relate to the imagined power that a costume bestows upon you; indeed, I have been known to dress up as Fisher by donning a beard and glasses and trailing a goat around on a lead in the hope that it might make me more knowledgeable about statistics. Imagine we had data (superhero.csv) about the severity of injury (on a scale from 0, no injury, to 100, death) for children reporting to the accident and emergency department at hospitals, and information on which superhero costume they were wearing (hero): Spiderman, Superman, the Hulk or a teenage mutant ninja turtle. Fit a model with planned contrasts to test the hypothesis that those wearing costumes of flying superheroes (Superman and Spiderman) have more severe injuries.
To get you into the mood for Hulk-related data analysis, Figure 1 shows my wife and me on the Hulk rollercoaster at Islands of Adventure in Orlando, on our honeymoon.

Not in the mood yet? OK, let’s ramp up the ‘weird’ with a photo (Figure 2) of some complete strangers reading one of my textbooks on the same Hulk roller-coaster that my wife and I rode (not at the same time as the book readers I should add). Nothing says ‘I love your textbook’ like taking it on a stomach-churning high speed ride. I dearly wish that reading my books on roller coasters would become a ‘thing’.

Load the data using (see Tip 2):
hero_tib <- discovr::superhero
Let’s start with a violin plot of the data with some error bars (Figure 27):
ggplot(hero_tib, aes(hero, injury)) +
geom_violin(colour = "#3E9BFEFF", fill = "#3E9BFEFF", alpha = 0.1) +
stat_summary(fun.data = "mean_cl_boot", colour = "#3E9BFEFF") +
coord_cartesian(ylim = c(0, 90)) +
scale_y_continuous(breaks = seq(0, 90, 10)) +
labs(x = "Costume worn", y = "Injury severity (0-100)") +
theme_minimal()
To test our main hypotheses we need to first work out some contrasts. In the book, we learned rules to generate contrasts. Figure 3 shows how we would apply these to the Superhero example to plan the contrasts that we want. We’re told that we want to compare flying superheroes (i.e. Superman and Spiderman) against non-flying ones (the Hulk and Ninja Turtles) in the first instance. That will be contrast 1. However, because each of these chunks is made up of two groups (e.g., the flying superheroes chunk comprises both children wearing Spiderman and those wearing Superman costumes), we need a second and third contrast that breaks each of these chunks down into their constituent parts.

Contrast 1 compares flying (Superman, Spiderman) to non-flying (Hulk, Turtle) superheroes. Each chunk contains two groups, so the weights for the opposite chunks are both 2 divided by the number of groups with non-zero weights. This gives us \(\frac{2}{4} = \frac{1}{2}\). We assign one chunk positive weights and the other negative weights (in Table 94 I’ve chosen the flying superheroes to have positive weights, but you could do it the other way around).
Contrast two then compares the two flying superheroes to each other. First we assign both non-flying superheroes a 0 weight to remove them from the contrast. We’re left with two chunks: one containing the Superman group and the other containing the Spiderman group. Each chunk contains one group, so the weights for the opposite chunks are both 1 divided by the number of groups with non-zero weights (i.e. 1/2). We assign one chunk positive weights and the other negative weights (in Table 94 I’ve given Superman the positive weight, but you could do it the other way around).
Finally, Contrast three compares the two non-flying superheroes to each other. First we assign both flying superheroes a 0 weight to remove them from the contrast. We’re left with two chunks: one containing the Hulk group and the other containing the Turtle group. Each chunk contains one group, so the weights for the opposite chunks are both 1/2 (for the same reason as contrast 2). We assign one chunk positive weights and the other negative weights (in Table 94 I’ve given the Hulk the positive weight, but you could do it the other way around).
| Group | Flying vs. non | Superman vs. Spiderman | Hulk vs. Ninja turtle |
|---|---|---|---|
| Superman | 1/2 | 1/2 | 0 |
| Spiderman | 1/2 | -1/2 | 0 |
| Hulk | -1/2 | 0 | 1/2 |
| Ninja Turtle | -1/2 | 0 | -1/2 |
We set these contrasts for the variable hero as follows:
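A sketch of the contrast-setting code, with the weights taken from Table 94 and the level order matching the printed weights below:

```r
# Assign the planned contrast weights to the hero factor
# (levels ordered Superman, Spiderman, Hulk, Ninja Turtle)
contrasts(hero_tib$hero) <- cbind(
  flying_vs_not = c(1/2, 1/2, -1/2, -1/2),
  super_vs_spider = c(1/2, -1/2, 0, 0),
  hulk_vs_ninja = c(0, 0, 1/2, -1/2)
)
```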
flying_vs_not super_vs_spider hulk_vs_ninja
Superman 0.5 0.5 0.0
Spiderman 0.5 -0.5 0.0
Hulk -0.5 0.0 0.5
Ninja Turtle -0.5 0.0 -0.5
Next we fit the model using this code:
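A sketch of the model-fitting code that would produce a table like Table 95 (again, the omega-squared argument name is an assumption that varies across parameters versions):

```r
# Fit the linear model and summarise the ANOVA with omega-squared
hero_lm <- lm(injury ~ hero, data = hero_tib)
anova(hero_lm) |>
  model_parameters(omega_squared = "raw") |>
  display()
```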
| Parameter | Sum_Squares | df | Mean_Square | F | p | Omega2 | Omega2 95% CI |
|---|---|---|---|---|---|---|---|
| hero | 4180.62 | 3 | 1393.54 | 8.32 | < .001 | 0.42 | (0.14, 1.00) |
| Residuals | 4356.58 | 26 | 167.56 |
Anova Table (Type 1 tests)
Table 95 tells us that there was a significant effect of the costume worn on injury severity, F(3, 26) = 8.32, p < 0.001, \(\hat{\omega}^2\) = 0.42 (0.14, 1.00).
We can evaluate the model by looking at Figure 28. There are no large Cook’s distances, the Q-Q plot suggests non-normal residuals, and the residual vs. fitted plot and the scale-location plot suggest heterogeneity of variance (the columns of dots are different lengths and the line is not flat, especially in the scale-location plot).
check_model(hero_lm)
Let’s fit a robust model as a sensitivity check:
oneway.test(injury ~ hero, data = hero_tib) |>
model_parameters() |>
display()
| F | df | df (error) | p |
|---|---|---|---|
| 7.10 | 3 | 13.02 | 0.005 |
The Welch F is highly significant still. Now view the (robust) parameter estimates table using:
model_parameters(hero_lm, vcov = "HC4") |>
display()
| Parameter | Coefficient | SE | 95% CI | t(26) | p |
|---|---|---|---|---|---|
| (Intercept) | 40.90 | 2.56 | (35.64, 46.15) | 16.00 | < .001 |
| hero (flying_vs_not) | 20.17 | 5.11 | (9.66, 30.67) | 3.95 | < .001 |
| hero (super_vs_spider) | 18.71 | 8.61 | (1.02, 36.40) | 2.17 | 0.039 |
| hero (hulk_vs_ninja) | 9.12 | 5.52 | (-2.22, 20.47) | 1.65 | 0.110 |
Table 96 shows that hypothesis 1 is supported: flying costumes were associated with more severe injuries than non-flying ones, \(\hat{b}\) = 20.17 (9.66, 30.67), t(26) = 3.95, p < 0.001. The other contrasts tell us that Superman costumes were associated with more severe injuries than Spiderman costumes, \(\hat{b}\) = 18.71 (1.02, 36.40), t(26) = 2.17, p = 0.039, but that injury severity was not significantly different between Hulk and Ninja turtle costumes, \(\hat{b}\) = 9.12 (-2.22, 20.47), t(26) = 1.65, p = 0.110.
Task 11.4
I read a story in a newspaper (yes, back when they existed) claiming that the chemical genistein, which is naturally occurring in soya, was linked to lowered sperm counts in Western males. When you read the actual study, it had been conducted on rats, it found no link to lowered sperm counts, but there was evidence of abnormal sexual development in male rats (probably because genistein acts like oestrogen). As journalists tend to do, a study showing no link between soya and sperm counts was used as the scientific basis for an article about soya being the cause of declining sperm counts in Western males. Imagine the rat study was enough for us to want to test this idea in humans. We recruit 80 males and split them into four groups that vary in the number of soya ‘meals’ (a dinner containing 75g of soya) they ate per week over a year: no soya meals (i.e., none in the whole year), one per week (52 over the year), four per week (208 over the year), and seven per week (364 over the year). At the end of the year, participants produced some sperm that I could count (when I say ‘I’, I mean someone else in a laboratory as far away from me as humanly possible). The data are in soya.csv.
Load the data using (see Tip 2):
soya_tib <- discovr::soya
Let’s start with a boxplot of the data (Figure 29):
ggplot(soya_tib, aes(soya, sperm)) +
geom_boxplot(colour = "#F05B12FF", fill = "#F05B12FF", alpha = 0.3) +
coord_cartesian(ylim = c(0, 130)) +
scale_y_continuous(breaks = seq(0, 130, 10)) +
labs(x = "Number of soya meals per week", y = "Sperm count (Millions)") +
theme_minimal()
A boxplot of the data suggests that (1) scores within conditions are skewed; and (2) variability in scores is different across groups. Let’s verify this using some residual plots (Figure 30):
sperm_lm <- lm(sperm ~ soya, data = soya_tib)
check_model(sperm_lm)
Yep, just as I suspected, the residuals are properly fucked. They’re hideously non-normal and the non-flat green line in the middle-left plot suggests heteroscedasticity. No problem, we’ll fit a robust model. By default, oneway.test() does not assume homogeneity of variance, so it gives us a Welch F. Table 97 shows that the Welch F is highly significant, suggesting that sperm counts are significantly affected by the number of soya meals eaten per week, F(3, 34.66) = 6.28, p = 0.002.
sperm_rob <- oneway.test(sperm ~ soya, data = soya_tib)
model_parameters(sperm_rob) |>
display()
| F | df | df (error) | p |
|---|---|---|---|
| 6.28 | 3 | 34.66 | 0.002 |
Now the robust parameters:
model_parameters(model = sperm_lm, vcov = "HC4") |>
display()
| Parameter | Coefficient | SE | 95% CI | t(76) | p |
|---|---|---|---|---|---|
| (Intercept) | 49.87 | 11.37 | (27.22, 72.51) | 4.39 | < .001 |
| soya (1 soya meal per week) | -3.82 | 15.44 | (-34.57, 26.94) | -0.25 | 0.806 |
| soya (4 soya meals per week) | -8.77 | 15.05 | (-38.74, 21.21) | -0.58 | 0.562 |
| soya (7 soya meals per week) | -33.34 | 11.64 | (-56.51, -10.16) | -2.86 | 0.005 |
Task 11.5
Mobile phones emit microwaves, and so holding one next to your brain for large parts of the day is a bit like sticking your brain in a microwave oven and pushing the ‘cook until well done’ button. If we wanted to test this experimentally, we could get six groups of people and strap a mobile phone on their heads, then by remote control turn the phones on for a certain amount of time each day. After six months, we measure the size of any tumour (in mm3) close to the site of the phone antenna (just behind the ear). The six groups experienced 0, 1, 2, 3, 4 or 5 hours per day of phone microwaves for six months. Do tumours significantly increase with greater daily exposure? The data are in
tumour.csv.
Load the data using (see Tip 2):
tumour_tib <- discovr::tumour
Plot the data
Let’s start with an error bar plot of the data (Figure 31):
ggplot(tumour_tib, aes(usage, tumour)) +
stat_summary(fun.data = "mean_cl_boot", colour = "#3E9BFEFF") +
coord_cartesian(ylim = c(0, 7)) +
scale_y_continuous(breaks = 0:7) +
labs(x = "Hours on phone per day", y = bquote(Size~of~tumour~(mm^3))) +
theme_minimal()
Figure 31 shows the mean size of brain tumour in each condition, and the funny ‘I’ shapes show the confidence interval of these means. Note that in the control group (0 hours), the mean size of the tumour is virtually zero (we wouldn’t expect them to have a tumour) and the error bar shows that there was very little variance across samples - this almost certainly means we cannot assume equal variances. Let’s verify this by fitting a model and viewing the residual plots (Figure 32).
phone_lm <- lm(tumour ~ usage, data = tumour_tib)
check_model(phone_lm)
Holy shit, the residuals were bad in the last task, but these ones are even worse. These residuals are hideously non-normal and the non-flat green lines on the relevant plots suggest heteroscedasticity. However, we can have the last laugh by fitting a robust model. By default, oneway.test() does not assume homogeneity of variance, so it gives us a Welch F. Table 99 shows that the Welch F is highly significant, suggesting tumour size is significantly affected by the number of hours the phone was active per day, F(5, 44.39) = 414.93, p < 0.001.
phone_rob <- oneway.test(tumour ~ usage, data = tumour_tib)
model_parameters(phone_rob) |>
display()
| F | df | df (error) | p |
|---|---|---|---|
| 414.93 | 5 | 44.39 | < .001 |
Table 100 shows the parameter estimates and robust confidence intervals and tests.
model_parameters(model = phone_lm, vcov = "HC4") |>
display()
| Parameter | Coefficient | SE | 95% CI | t(114) | p |
|---|---|---|---|---|---|
| (Intercept) | 0.02 | 2.71e-03 | (0.01, 0.02) | 6.47 | < .001 |
| usage (1 hour) | 0.50 | 0.06 | (0.37, 0.62) | 7.82 | < .001 |
| usage (2 hours) | 1.24 | 0.11 | (1.03, 1.46) | 11.30 | < .001 |
| usage (3 hours) | 3.00 | 0.17 | (2.66, 3.34) | 17.55 | < .001 |
| usage (4 hours) | 4.87 | 0.16 | (4.56, 5.18) | 31.28 | < .001 |
| usage (5 hours) | 4.71 | 0.17 | (4.37, 5.06) | 26.96 | < .001 |
Task 11.6
Using the Glastonbury data from Chapter 8 (
glastonbury.csv), fit a model to see if the change in hygiene (change) is significant across people with different musical tastes (music). Use a simple contrast to compare each group against the no affiliation group.
Load the data using (see Tip 2):
glasters_tib <- discovr::glastonbury
Let’s start with a boxplot of the data (Figure 33):
glasters_tib |>
filter(!is.na(change)) |> # filter out cases that have NA (missing values) on the outcome variable
ggplot(data = _, aes(subculture, change)) +
geom_boxplot(colour = "#F05B12FF", fill = "#F05B12FF", alpha = 0.3) +
coord_cartesian(ylim = c(-3, 1.5)) +
scale_y_continuous(breaks = seq(-3, 1.5, 0.5)) +
labs(x = "Subculture", y = "Change in hygiene scores across the festival") +
theme_minimal()
Remember that negative scores mean that hygiene declines across the festival. Figure 33 suggests that there was the least decline in hygiene for people with no subculture and metalheads, and greater hygiene decline for ravers, who had a similar change to hipsters. There looks to be a bit of skew and different spreads of scores, which might suggest the knock-on effects of non-normal residuals and heteroscedasticity, but let’s verify this by fitting a model and using it to view residual plots (Figure 34).
stink_lm <- lm(change ~ subculture, data = glasters_tib)
check_model(stink_lm)
The Q-Q plot suggests normal residuals but the linearity and homoscedasticity plots look weird (Figure 34). This is because check_model() sometimes scales the y-axis weirdly. Figure 35 and Figure 36 use R’s native plot() method for the fitted lm object instead:
plot(stink_lm, which = 1)
plot(stink_lm, which = 3)
Let’s fit the model. By default
Next we fit the model using this code:
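A sketch that reuses the stink_lm model fitted above for the diagnostics and would produce a table like Table 101 (the omega-squared argument name is an assumption that varies across parameters versions):

```r
anova(stink_lm) |>
  model_parameters(omega_squared = "raw") |>
  display()
```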
| Parameter | Sum_Squares | df | Mean_Square | F | p | Omega2 | Omega2 95% CI |
|---|---|---|---|---|---|---|---|
| subculture | 4.65 | 3 | 1.55 | 3.27 | 0.024 | 0.05 | (0.00, 1.00) |
| Residuals | 56.36 | 119 | 0.47 |
Anova Table (Type 1 tests)
Table 101 tells us that there was a significant effect of the subculture membership on the change in hygiene scores, F(3, 119) = 3.27, p = 0.024, \(\hat{\omega}^2\) = 0.05 (0.00, 1.00). View the parameters using:
model_parameters(stink_lm) |>
display()
| Parameter | Coefficient | SE | 95% CI | t(119) | p |
|---|---|---|---|---|---|
| (Intercept) | -0.55 | 0.09 | (-0.73, -0.38) | -6.13 | < .001 |
| subculture (Raver) | -0.41 | 0.17 | (-0.74, -0.08) | -2.46 | 0.015 |
| subculture (Metalhead) | 0.03 | 0.16 | (-0.29, 0.35) | 0.18 | 0.860 |
| subculture (Hipster) | -0.41 | 0.20 | (-0.82, -4.21e-03) | -2.00 | 0.048 |
Table 102 shows that, relative to those with no affiliation, the decline in hygiene was significantly greater for ravers, \(\hat{b}\) = -0.41 (-0.74, -0.08), t(119) = -2.46, p = 0.015, and for hipsters, \(\hat{b}\) = -0.41 (-0.82, 0.00), t(119) = -2.00, p = 0.048, but not significantly different for metalheads, \(\hat{b}\) = 0.03 (-0.29, 0.35), t(119) = 0.18, p = 0.860.
Task 11.7
Repeat the analysis in Labcoat Leni’s Real Research 11.2 but using copulatory efficiency as the outcome.
Load the data using (see Tip 2):
eggs_tib <- discovr::cetinkaya_2006

Let’s first look at some boxplots (Figure 37):
ggplot(eggs_tib, aes(groups, efficiency)) +
geom_boxplot(colour = "#3E9BFEFF", fill = "#3E9BFEFF", alpha = 0.4) +
coord_cartesian(ylim = c(0, 35)) +
scale_y_continuous(breaks = seq(0, 35, 5)) +
labs(x = "Fetish group", y = "Copulatory efficiency (out of 33)") +
theme_minimal()
There are outliers and skew in the non-fetishistic group and skew in the control group also. This is confirmed by diagnostic plots from a standard linear model (Figure 38).1
egg_lm <- lm(efficiency ~ groups, data = eggs_tib)
check_model(egg_lm)
plot(egg_lm, which = 1)
plot(egg_lm, which = 3)
The authors were wise to use a nonparametric test. We’ll use a 20% trimmed mean test with post hoc tests.
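A sketch of the robust tests using the WRS2 package (the same package whose output appears in Task 11.8): t1way() runs the trimmed-mean test and lincon() the post hoc comparisons.

```r
library(WRS2)

# One-way test on 20% trimmed means
t1way(efficiency ~ groups, data = eggs_tib, tr = 0.2)

# Post hoc comparisons of trimmed means (the psi-hat values in Table 104)
lincon(efficiency ~ groups, data = eggs_tib, tr = 0.2)
```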
Table 103 tells us that there was a significant effect, F(2, 19.64) = 20.47, p < 0.001. Although we’ve applied a robust test rather than a nonparametric one the results of the study are confirmed.
| F | df | df (error) | p | Estimate | 95% CI | Effectsize |
|---|---|---|---|---|---|---|
| 20.47 | 2 | 19.64 | < .001 | 0.67 | (0.48, 0.88) | Explanatory measure of effect size |
Let’s look at the post hoc tests in Table 104. There was no significant difference between the control group and the fetishistic group, \(\hat{\Psi}\) = -0.03 (-3.33, 3.27), p = 0.983, but significant differences were found between the control group and the nonfetishistic group, \(\hat{\Psi}\) = 8.04 (4.49, 11.59), p < 0.001, and between the fetishistic group and the nonfetishistic group, \(\hat{\Psi}\) = -8.07 (-11.86, -4.28), p < 0.001. We know by looking at the boxplot (the medians in particular) that the nonfetishistic males yielded significantly higher efficiency scores than both the fetishistic male quail and the control male quail. These results confirm the findings reported from the nonparametric tests in the paper (page 430):
A one-way ANOVA yielded a significant main effect of groups, \(F(2, 56) = 6.04\), \(p < .05\), \(\eta^2 = 0.18\). Paired comparisons (with the Bonferroni correction) indicated that the nonfetishistic male quail copulated with the live female quail (US) more efficiently than both the fetishistic male quail (mean difference = 6.61; 95% CI = 1.41, 11.82; p < .05) and the control male quail (mean difference = 5.83; 95% CI = 1.11, 10.56; p < .05). The difference between the efficiency scores of the fetishistic and the control male quail was not significant (mean difference = 0.78; 95% CI = -5.33, 3.77; p > .05).
| Group1 | Group2 | Psihat | 95% CI | p |
|---|---|---|---|---|
| Fetishistics | NonFetishistics | -8.07 | [-11.86, -4.28] | < .001 |
| Fetishistics | Control | -0.03 | [ -3.33, 3.27] | 0.983 |
| NonFetishistics | Control | 8.04 | [ 4.49, 11.59] | < .001 |
Task 11.8
A sociologist wanted to compare murder rates (murder) recorded in each month in a year at three high-profile locations in London (street): Ruskin Avenue, Acacia Avenue and Rue Morgue. Fit a robust model with bootstrapping to see in which streets the most murders happened. The data are in murder.csv.
Load the data using (see Tip 2):
murder_tib <- discovr::murder

Let’s first look at some boxplots (Figure 41):
ggplot(murder_tib, aes(street, murder)) +
geom_boxplot(colour = "#3E9BFEFF", fill = "#3E9BFEFF", alpha = 0.4) +
coord_cartesian(ylim = c(0, 7)) +
scale_y_continuous(breaks = 0:7) +
labs(x = "Street name", y = "Number of murders per month") +
theme_minimal()
It looks like most murders were in the Rue Morgue, but there is skew in all of the groups. This is confirmed by diagnostic plots from a standard linear model (Figure 42):
murder_lm <- lm(murder ~ street, data = murder_tib)
check_model(murder_lm)
The residuals are both non-normal and heteroscedastic. We’ll use a bootstrap test of 20% trimmed means, with post hoc tests.
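The output below was produced by WRS2::t1waybt(); a sketch of the call (tr = 0.2, the default trimming, is made explicit here):

```r
# Bootstrap test of 20% trimmed means across streets
WRS2::t1waybt(murder ~ street, data = murder_tib, tr = 0.2, nboot = 1000)
```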
Call:
WRS2::t1waybt(formula = murder ~ street, data = murder_tib, nboot = 1000)
Effective number of bootstrap samples was 901.
Test statistic: 2.6011
p-value: 0.16315
Variance explained: 0.384
Effect size: 0.62
The output tells us that there was not a significant effect, F = 2.60, p = 0.163, although the effect size is large (38% of variance explained). We don’t need to look at the post hoc tests because the main effect is not significant.
Chapter 12
Task 12.1
A few years back I was stalked. You’d think they could have found someone a bit more interesting to stalk, but apparently times were hard. It wasn’t particularly pleasant, but could have been a lot worse. I imagined a world in which a psychologist tried two different therapies on different groups of stalkers (25 stalkers in each group – this variable is called therapy). To the first group he gave cruel-to-be-kind therapy (every time the stalkers followed him around, or sent him a letter, the psychologist attacked them with a cattle prod). The second therapy was psychodyshamic therapy, in which stalkers were hypnotized and regressed into their childhood to discuss their penis (or lack of penis), their dog’s penis, the seventh penis of a seventh penis and any other penis that sprang to mind. The psychologist measured the number of hours stalking in one week both before (stalk_pre) and after (stalk_post) treatment (stalker.csv). Analyse the effect of therapy on stalking behaviour after therapy, adjusting for the amount of stalking behaviour before therapy.
Load the data using (see Tip 2):
stalk_tib <- discovr::stalker

The design of this study warrants an ANCOVA approach to the model. First visualise the data (Figure 43 and Figure 44).
ggplot(stalk_tib, aes(x = therapy, y = stalk_post, colour = therapy)) +
geom_point(position = position_jitter(width = 0.1), alpha = 0.6) +
geom_violin(alpha = 0.2) +
stat_summary(fun.data = "mean_cl_normal", geom = "pointrange", position = position_dodge(width = 0.9)) +
coord_cartesian(ylim = c(0, 100)) +
scale_y_continuous(breaks = seq(0, 100, 10)) +
scale_colour_viridis_d(begin = 0.3, end = 0.8) +
labs(x = "Therapy group", y = "Post-therapy stalking (hours)", colour = "Therapy group") +
theme_minimal() +
theme(legend.position = "none")
ggplot(stalk_tib, aes(x = stalk_pre, y = stalk_post)) +
geom_smooth(method = "lm", colour = "#CC6677", fill = "#CC6677", alpha = 0.2) +
geom_point(colour = "#882255") +
coord_cartesian(ylim = c(0, 100)) +
scale_x_continuous(breaks = seq(0, 100, 10)) +
scale_y_continuous(breaks = seq(0, 100, 10)) +
labs(x = "Pre-treatment stalking (hrs)", y = "Post-treatment stalking (hrs)") +
facet_wrap(~therapy) +
theme_minimal()
With only two therapy groups we don’t need to set contrasts. First let’s test the assumption of parallel slopes/homogeneity of regression slopes. Table 106 shows a non-significant interaction between the therapy type and pre-therapy stalking, F(1, 46) = 0.88, p = 0.353, suggesting that it’s reasonable to fit the parallel slopes model in stalk_lm.
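A sketch of fitting the parallel slopes model and the model containing the interaction, then comparing them (the comparison table in the text appears to come from easystats, but base R’s anova() yields the same F-test):

```r
# Parallel slopes (ANCOVA) model
stalk_lm <- lm(stalk_post ~ stalk_pre + therapy, data = stalk_tib)

# Model that additionally allows the slopes to differ across therapy groups
full_lm <- lm(stalk_post ~ stalk_pre * therapy, data = stalk_tib)

# F-test of the interaction term (the parallel slopes assumption)
anova(stalk_lm, full_lm)
```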
| Name | Model | df | df_diff | F | p |
|---|---|---|---|---|---|
| stalk_lm | lm | 47 | | | |
| full_lm | lm | 46 | 1 | 0.88 | 0.353 |
Models were detected as nested (in terms of fixed parameters) and are compared in sequential order.
Table 107 summarizes the main model. The covariate significantly predicted the outcome variable, F(1, 47) = 50.46, p < 0.001, so the hours spent stalking after therapy depend on the extent of the initial problem (i.e. the hours spent stalking before therapy). More interesting is that after adjusting for the effect of initial stalking behaviour, the effect of therapy is significant, F(1, 47) = 5.49, p = 0.023.
| Parameter | Sum_Squares | df | Mean_Square | F | p | ω² (partial) | ω² 95% CI |
|---|---|---|---|---|---|---|---|
| stalk_pre | 4414.60 | 1 | 4414.60 | 50.46 | < .001 | 0.50 | (0.33, 1.00) |
| therapy | 480.26 | 1 | 480.26 | 5.49 | 0.023 | 0.08 | (9.16e-04, 1.00) |
| Residuals | 4111.72 | 47 | 87.48 | | | | |
Anova Table (Type 3 tests)
To interpret the results of the main effect of therapy we look at the adjusted means in Table 108. The adjusted means tell us that stalking behaviour was lower after the therapy involving the cattle prod than after Psychodyshamic therapy (after adjusting for baseline stalking).
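The footer of Table 108 suggests the adjusted means were computed with the modelbased package; a hedged sketch (in older versions of modelbased the argument is at rather than by):

```r
# Means of stalk_post per therapy group, adjusted for stalk_pre
modelbased::estimate_means(stalk_lm, by = "therapy")
```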
| therapy | Mean | SE | 95% CI | t(47) |
|---|---|---|---|---|
| Cruel to be kind | 55.30 | 1.87 | (51.53, 59.06) | 29.55 |
| Psychodyshamic | 61.50 | 1.87 | (57.74, 65.27) | 32.87 |
Variable predicted: stalk_post; Predictors modulated: therapy; Predictors averaged: stalk_pre (65)
To interpret the covariate look at the parameter estimates (Table 109). The parameter estimate shows that there is a positive relationship between stalking pre-therapy and post-therapy, \(\hat{b}\) = 0.89 (0.64, 1.14), t(47) = 7.10, p < 0.001. That is, higher stalking levels pre-therapy corresponded with higher levels post therapy.
model_parameters(stalk_lm) |>
display()

| Parameter | Coefficient | SE | 95% CI | t(47) | p |
|---|---|---|---|---|---|
| (Intercept) | -2.84 | 8.35 | (-19.64, 13.96) | -0.34 | 0.735 |
| stalk pre | 0.89 | 0.13 | (0.64, 1.14) | 7.10 | < .001 |
| therapy (Psychodyshamic) | 6.20 | 2.65 | (0.88, 11.53) | 2.34 | 0.023 |
Evaluate the model assumptions using Figure 45. The plots suggest non-normal residuals and heterogeneity of variance. Which brings us to Task 2.
check_model(stalk_lm)
Task 12.2
Fit a robust model for Task 1
We should fit a robust model. We typically deal with the problem that matters most, which is heteroscedasticity. However, the Q-Q plot showed some fairly extreme deviations (especially in the lower tail) so we’ll re-fit the model using lmRob() from the robust package, which will down-weight these cases. The robust model, contrary to the non-robust model, suggests that after adjusting for pre-therapy levels of stalking there is no significant difference between the types of treatments. Note also that the parameter estimate for the type of therapy has fallen from 6.2 (non-robust) to 1.6 (robust), suggesting a much smaller effect when robust estimation is used.
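A sketch of the robust fit (the Call line in the output below confirms the function used):

```r
# Re-fit the ANCOVA model with robust (M-) estimation
stalk_rob <- robust::lmRob(stalk_post ~ stalk_pre + therapy, data = stalk_tib)
summary(stalk_rob)
```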
Call:
robust::lmRob(formula = stalk_post ~ stalk_pre + therapy, data = stalk_tib)
Residuals:
Min 1Q Median 3Q Max
-34.783005 -2.426286 0.003723 1.894398 21.643705
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 9.60233 3.20285 2.998 0.00433 **
stalk_pre 0.76178 0.04596 16.575 < 2e-16 ***
therapyPsychodyshamic 1.61503 0.92560 1.745 0.08755 .
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 3.207 on 47 degrees of freedom
Multiple R-Squared: 0.5159
Test for Bias:
statistic p-value
M-estimate 6.588 0.08627
LS-estimate 9.291 0.02567
Task 12.3
A marketing manager tested the benefit of soft drinks for curing hangovers. He took 15 people and got them drunk. The next morning as they awoke, dehydrated and feeling as though they’d licked a camel’s sandy feet clean with their tongue, he gave five of them water to drink, five of them Lucozade (a very nice glucose-based UK drink) and the remaining five a leading brand of cola (this variable is called drink). He measured how well they felt (on a scale from 0 = I feel like death to 10 = I feel really full of beans and healthy) two hours later (this variable is called well). He measured how drunk the person got the night before on a scale of 0 = straight edge to 10 = flapping about like a haddock out of water (hangover.csv). Fit a model to see whether people felt better after different drinks when adjusting for how drunk they were the night before.
Load the data using (see Tip 2):
cure_tib <- discovr::hangover

The design of this study warrants an ANCOVA approach to the model. First visualise the data (Figure 46 and Figure 47). Notice that the violin plots look like different drinking receptacles (cup, bottle, bowl).
ggplot(cure_tib, aes(x = drink, y = well, colour = drink)) +
geom_point(position = position_jitter(width = 0.1), alpha = 0.6) +
geom_violin(alpha = 0.2) +
stat_summary(fun.data = "mean_cl_normal", geom = "pointrange", position = position_dodge(width = 0.9)) +
coord_cartesian(ylim = c(0, 10)) +
scale_y_continuous(breaks = seq(0, 10, 1)) +
scale_colour_viridis_d(begin = 0.3, end = 0.8) +
labs(x = "Drink", y = "Wellness (0-10)", colour = "Drink") +
theme_minimal() +
theme(legend.position = "none")
ggplot(cure_tib, aes(x = drunk, y = well)) +
geom_smooth(method = "lm", colour = "#CC6677", fill = "#CC6677", alpha = 0.2) +
geom_point(colour = "#882255", position = position_jitter()) +
coord_cartesian(ylim = c(0, 10)) +
scale_x_continuous(breaks = seq(0, 10, 1)) +
scale_y_continuous(breaks = seq(0, 10, 1)) +
labs(x = "Drunkenness (0-10)", y = "Wellness (0-10)") +
facet_wrap(~drink) +
theme_minimal()
We have three levels of drink (water, Lucozade, Cola). Two of these drinks contain sugar/glucose, so water is a control. Therefore, we could reasonably set a contrast that compares ‘sugary drinks’ to water, and then a second contrast that compares the two sugary drinks. The order of the levels is Water, Lucozade, Cola, which we can confirm by executing:
levels(cure_tib$drink)

[1] "Water"    "Lucozade" "Cola"
Therefore, we’d set these contrasts as:
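A sketch of how these contrast weights could be assigned (the values match the matrix printed below):

```r
sugar_vs_none <- c(-2/3, 1/3, 1/3)  # water vs. the two sugary drinks
lucoz_vs_cola <- c(0, -1/2, 1/2)    # Lucozade vs. cola
contrasts(cure_tib$drink) <- cbind(sugar_vs_none, lucoz_vs_cola)
```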
sugar_vs_none lucoz_vs_cola
Water -0.6666667 0.0
Lucozade 0.3333333 -0.5
Cola 0.3333333 0.5
First let’s test the assumption of parallel slopes/homogeneity of regression slopes. Table 112 shows a non-significant interaction between the drink type and how drunk someone was the night before, F(2, 9) = 2.73, p = 0.119, suggesting that it’s reasonable to fit the parallel slopes model in cure_lm.
| Name | Model | df | df_diff | F | p |
|---|---|---|---|---|---|
| cure_lm | lm | 11 | | | |
| full_lm | lm | 9 | 2 | 2.73 | 0.119 |
Models were detected as nested (in terms of fixed parameters) and are compared in sequential order.
Table 113 summarizes the main model. The covariate significantly predicted the outcome variable, F(1, 11) = 27.89, p < 0.001, so the drunkenness of the person influenced how well they felt the next day. More interesting is that at average levels of drunkenness the effect of the type of drink used as a hangover cure was significant, F(2, 11) = 4.32, p = 0.041.
| Parameter | Sum_Squares | df | Mean_Square | F | p |
|---|---|---|---|---|---|
| drunk | 11.19 | 1 | 11.19 | 27.89 | < .001 |
| drink | 3.46 | 2 | 1.73 | 4.32 | 0.041 |
| Residuals | 4.41 | 11 | 0.40 | | |
Anova Table (Type 3 tests)
Let’s look at the parameter estimates (Table 114). The parameter estimate shows that there was a negative relationship between how drunk a person got and their wellness the following day, \(\hat{b}\) = -0.55 (-0.78, -0.32), t(11) = -5.28, p < 0.001. That is, the more drunk someone got the less well they felt the following day. We can also see that at average levels of drunkenness, the mean wellness the following day was statistically comparable between the sugary drinks and water, \(\hat{b}\) = 0.64 (-0.13, 1.40), t(11) = 1.82, p = 0.095; however, the Cola and Lucozade groups had significantly different mean wellness, \(\hat{b}\) = -0.99 (-1.96, -0.01), t(11) = -2.23, p = 0.047.
model_parameters(cure_lm) |>
display()

| Parameter | Coefficient | SE | 95% CI | t(11) | p |
|---|---|---|---|---|---|
| (Intercept) | 7.40 | 0.39 | (6.54, 8.25) | 19.01 | < .001 |
| drunk | -0.55 | 0.10 | (-0.78, -0.32) | -5.28 | < .001 |
| drink (sugar_vs_none) | 0.64 | 0.35 | (-0.13, 1.40) | 1.82 | 0.095 |
| drink (lucoz_vs_cola) | -0.99 | 0.44 | (-1.96, -0.01) | -2.23 | 0.047 |
The adjusted means (Table 115) tell us that people felt most well in the Lucozade group. Wellness was similar in the Cola and Water groups. This finding explains the pattern of significance for the contrasts: because the Cola group has a very similar mean to the Water group, the combined mean of the Lucozade and Cola groups is also quite similar to the mean for water. The Cola mean ‘drags down’ the mean for ‘sugary drinks’. This is evident from the second contrast that shows that Lucozade drinkers had a significantly higher mean than Cola drinkers.
| drink | Mean | SE | 95% CI | t(11) |
|---|---|---|---|---|
| Water | 5.11 | 0.28 | (4.48, 5.73) | 17.99 |
| Lucozade | 6.24 | 0.30 | (5.59, 6.89) | 21.13 |
| Cola | 5.25 | 0.30 | (4.59, 5.92) | 17.41 |
Variable predicted: well; Predictors modulated: drink; Predictors averaged: drunk (3.4)
Evaluate the model assumptions using Figure 48. The plots suggest heterogeneity of variance, so a wise researcher would abandon this analysis and fit a robust model. Because there was a problem with heteroscedasticity we could use a model that applies heteroscedasticity-consistent standard errors, as in Table 116. Contrary to the non-robust model, these robust tests suggest that after adjusting for how much was drunk there are no significant differences at all in wellness between the groups who had different drinks as hangover cures.
check_model(cure_lm)
model_parameters(cure_lm, vcov = "HC4") |>
display()

| Parameter | Coefficient | SE | 95% CI | t(11) | p |
|---|---|---|---|---|---|
| (Intercept) | 7.40 | 0.44 | (6.44, 8.36) | 16.93 | < .001 |
| drunk | -0.55 | 0.11 | (-0.79, -0.31) | -4.98 | < .001 |
| drink (sugar_vs_none) | 0.64 | 0.31 | (-0.06, 1.33) | 2.02 | 0.069 |
| drink (lucoz_vs_cola) | -0.99 | 0.52 | (-2.13, 0.16) | -1.90 | 0.084 |
Task 12.4
Compute effect sizes for Task 3 and report the results.
The effect sizes for the main effect of drink were obtained in the previous task by including es_type = "omega", ci = 0.95 in model_parameters(), as in Table 117.
| Parameter | Sum_Squares | df | Mean_Square | F | p | ω² (partial) | ω² 95% CI |
|---|---|---|---|---|---|---|---|
| drunk | 11.19 | 1 | 11.19 | 27.89 | < .001 | 0.64 | (0.29, 1.00) |
| drink | 3.46 | 2 | 1.73 | 4.32 | 0.041 | 0.31 | (0.00, 1.00) |
| Residuals | 4.41 | 11 | 0.40 | | | | |
Anova Table (Type 3 tests)
Task 12.5
The highlight of the elephant calendar is the annual elephant football event in Nepal (Google it). A heated argument burns between the African and Asian elephants. In 2010, the president of the Asian Elephant Football Association, an elephant named Boji, claimed that Asian elephants were more talented than their African counterparts. The head of the African Elephant Football Association, an elephant called Tunc, issued a press statement that read, ‘I make it a matter of personal pride never to take seriously any remark made by something that looks like an enormous scrotum.’ I was called in to settle things. I collected data from the two types of elephant (elephant) over a season and recorded how many goals each elephant scored (goals) and how many years of experience the elephant had (experience). Analyse the effect of the type of elephant on goal scoring, covarying for the amount of football experience the elephant had (elephooty.csv).
Load the data using (see Tip 2):
ele_tib <- discovr::elephooty

The design of this study warrants an ANCOVA approach to the model. First visualise the data (Figure 49 and Figure 50).
ggplot(ele_tib, aes(x = elephant, y = goals, colour = elephant)) +
geom_point(position = position_jitter(width = 0.1), alpha = 0.6) +
geom_violin(alpha = 0.2) +
stat_summary(fun.data = "mean_cl_normal", geom = "pointrange", position = position_dodge(width = 0.9)) +
coord_cartesian(ylim = c(0, 10)) +
scale_y_continuous(breaks = seq(0, 10, 1)) +
scale_colour_viridis_d(begin = 0.3, end = 0.8) +
labs(x = "Elephant", y = "Goals", colour = "Elephant") +
theme_minimal() +
theme(legend.position = "none")
ggplot(ele_tib, aes(x = experience, y = goals)) +
geom_smooth(method = "lm", colour = "#CC6677", fill = "#CC6677", alpha = 0.2) +
geom_point(colour = "#882255", position = position_jitter()) +
coord_cartesian(ylim = c(0, 9)) +
scale_x_continuous(breaks = seq(0, 9, 1)) +
scale_y_continuous(breaks = seq(0, 10, 1)) +
labs(x = "Experience (Years)", y = "Goals scored") +
facet_wrap(~elephant) +
theme_minimal()
The categorical predictor (elephant) only has two levels/categories so we don’t need to set contrasts. First let’s test the assumption of parallel slopes/homogeneity of regression slopes. Table 119 shows a non-significant interaction between the type of elephant and their previous experience, F(1, 116) = 0.05, p = 0.828, suggesting that it’s reasonable to fit the parallel slopes model in elephant_lm.
| Name | Model | df | df_diff | F | p |
|---|---|---|---|---|---|
| elephant_lm | lm | 117 | | | |
| full_lm | lm | 116 | 1 | 0.05 | 0.828 |
Models were detected as nested (in terms of fixed parameters) and are compared in sequential order.
Table 120 shows that the experience of the elephant significantly predicted how many goals they scored, F(1, 117) = 9.93, p = 0.002. After adjusting for the effect of experience, the effect of elephant is also significant. In other words, African and Asian elephants differed significantly in the number of goals they scored, F(1, 117) = 8.59, p = 0.004.
| Parameter | Sum_Squares | df | Mean_Square | F | p |
|---|---|---|---|---|---|
| experience | 32.32 | 1 | 32.32 | 9.93 | 0.002 |
| elephant | 27.95 | 1 | 27.95 | 8.59 | 0.004 |
| Residuals | 380.80 | 117 | 3.25 | | |
Anova Table (Type 3 tests)
The adjusted means (Table 121) tell us that, specifically, African elephants scored significantly more goals than Asian elephants after adjusting for prior experience.
| elephant | Mean | SE | 95% CI | t(117) |
|---|---|---|---|---|
| Asian | 3.59 | 0.23 | (3.13, 4.05) | 15.37 |
| African | 4.56 | 0.23 | (4.10, 5.02) | 19.52 |
Variable predicted: goals; Predictors modulated: elephant; Predictors averaged: experience (4.2)
Let’s look at the parameter estimates (Table 122). The parameter estimate shows that there is a positive relationship between the two variables: the more prior football experience the elephant had, the more goals they scored in the season, \(\hat{b}\) = 0.31 (0.11, 0.50), t(117) = 3.15, p = 0.002.
model_parameters(elephant_lm) |>
display()

| Parameter | Coefficient | SE | 95% CI | t(117) | p |
|---|---|---|---|---|---|
| (Intercept) | 2.31 | 0.45 | (1.41, 3.21) | 5.09 | < .001 |
| experience | 0.31 | 0.10 | (0.11, 0.50) | 3.15 | 0.002 |
| elephant (African) | 0.97 | 0.33 | (0.31, 1.63) | 2.93 | 0.004 |
Evaluate the model assumptions using Figure 51. The Q-Q plot suggests normal residuals but the scale-location plot suggests heterogeneity (possibly). We could do a sensitivity analysis by applying heteroscedasticity-consistent standard errors to the tests of parameters (Table 123). Both effects are still highly significant using robust standard errors. Happy days.
check_model(elephant_lm)
model_parameters(elephant_lm, vcov = "HC4") |>
display()

| Parameter | Coefficient | SE | 95% CI | t(117) | p |
|---|---|---|---|---|---|
| (Intercept) | 2.31 | 0.45 | (1.41, 3.21) | 5.07 | < .001 |
| experience | 0.31 | 0.09 | (0.13, 0.48) | 3.46 | < .001 |
| elephant (African) | 0.97 | 0.33 | (0.33, 1.62) | 2.98 | 0.004 |
Task 12.6
In Chapter 4 (Task 6) we looked at data from people who had fish or cats as pets and measured their life satisfaction and also how much they like animals (pets.csv). Fit a model predicting life satisfaction from the type of pet a person had and their animal liking score (covariate).
Load the data using (see Tip 2):
pet_tib <- discovr::pets

The design of this study warrants an ANCOVA approach to the model. First visualise the data (Figure 52 and Figure 53).
ggplot(pet_tib, aes(x = pet, y = life_satisfaction, colour = pet)) +
geom_point(position = position_jitter(width = 0.1), alpha = 0.6) +
geom_violin(alpha = 0.2) +
stat_summary(fun.data = "mean_cl_normal", geom = "pointrange", position = position_dodge(width = 0.9)) +
coord_cartesian(ylim = c(0, 80)) +
scale_y_continuous(breaks = seq(0, 80, 10)) +
scale_colour_viridis_d(begin = 0.3, end = 0.8) +
labs(x = "Pet", y = "Life satisfaction", colour = "Pet") +
theme_minimal() +
theme(legend.position = "none")
ggplot(pet_tib, aes(x = animal, y = life_satisfaction)) +
geom_smooth(method = "lm", colour = "#CC6677", fill = "#CC6677", alpha = 0.2) +
geom_point(colour = "#882255", position = position_jitter()) +
coord_cartesian(ylim = c(0, 80), xlim = c(0, 80)) +
scale_x_continuous(breaks = seq(0, 80, 10)) +
scale_y_continuous(breaks = seq(0, 80, 10)) +
labs(x = "Love of animals", y = "Life satisfaction") +
facet_wrap(~pet) +
theme_minimal()
The categorical predictor (pet) only has two levels/categories so we don’t need to set contrasts. First let’s test the assumption of parallel slopes/homogeneity of regression slopes. Table 125 shows a non-significant interaction between the type of pet and their love of animals, F(1, 16) = 0.52, p = 0.481, suggesting that it’s reasonable to fit the parallel slopes model in pet_lm.
| Name | Model | df | df_diff | F | p |
|---|---|---|---|---|---|
| pet_lm | lm | 17 | | | |
| full_lm | lm | 16 | 1 | 0.52 | 0.481 |
Models were detected as nested (in terms of fixed parameters) and are compared in sequential order.
Table 126 shows that the love of animals significantly predicted life satisfaction, F(1, 17) = 10.32, p = 0.005. After adjusting for the love of animals, the type of pet had a significant effect on life satisfaction. In other words, at average levels of liking animals whether someone owned a cat or fish significantly affected their life satisfaction, F(1, 17) = 16.45, p < 0.001.
| Parameter | Sum_Squares | df | Mean_Square | F | p |
|---|---|---|---|---|---|
| animal | 1325.40 | 1 | 1325.40 | 10.32 | 0.005 |
| pet | 2112.10 | 1 | 2112.10 | 16.45 | < .001 |
| Residuals | 2183.14 | 17 | 128.42 | | |
Anova Table (Type 3 tests)
The adjusted means (Table 127) tell us specifically that life satisfaction was higher in cat owners than fish owners at average levels of liking animals.
| pet | Mean | SE | 95% CI | t(17) |
|---|---|---|---|---|
| Fish | 38.55 | 3.27 | (31.64, 45.45) | 11.78 |
| Cat | 59.56 | 4.01 | (51.10, 68.02) | 14.85 |
Variable predicted: life_satisfaction; Predictors modulated: pet; Predictors averaged: animal (36)
Let’s look at the parameter estimates (Table 128). The parameter estimate shows that there is a positive relationship between liking animals and life satisfaction, \(\hat{b}\) = 0.54 (0.19, 0.90), t(17) = 3.21, p = 0.005: the more you like animals the higher your life satisfaction.
model_parameters(pet_lm) |>
display()

| Parameter | Coefficient | SE | 95% CI | t(17) | p |
|---|---|---|---|---|---|
| (Intercept) | 18.94 | 6.82 | (4.56, 33.33) | 2.78 | 0.013 |
| animal | 0.54 | 0.17 | (0.19, 0.90) | 3.21 | 0.005 |
| pet (Cat) | 21.01 | 5.18 | (10.08, 31.94) | 4.06 | < .001 |
Evaluate the model assumptions using Figure 54. The Q-Q plot suggests non-normal residuals and the homogeneity of variance plot suggests heteroscedasticity. We should deal with the most problematic issue by applying heteroscedasticity-consistent standard errors to the tests of parameters (Table 129). Both effects are still highly significant using robust standard errors so our conclusions don’t change.
check_model(pet_lm)
model_parameters(pet_lm, vcov = "HC4") |>
display()

| Parameter | Coefficient | SE | 95% CI | t(17) | p |
|---|---|---|---|---|---|
| (Intercept) | 18.94 | 7.99 | (2.08, 35.81) | 2.37 | 0.030 |
| animal | 0.54 | 0.19 | (0.15, 0.93) | 2.92 | 0.010 |
| pet (Cat) | 21.01 | 5.00 | (10.46, 31.56) | 4.20 | < .001 |
Task 12.7
Fit a linear model predicting life satisfaction from the type of pet and the effect of love of animals using what you learnt in Chapter 9. Compare this model to your results for Task 6. What differences do you notice and why?
The conclusions are the same as from the linear model (Table 130); all that has changed is that we haven’t assessed overall model fit using the F-statistic. Basically, this task is all about reiterating the false distinction between ‘ANCOVA’ and ‘regression/linear model’: in both cases you are using the general linear model.
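A sketch of fitting this as a plain linear model, as in Chapter 9 (Table 130 matches the HC4 robust parameter table from Task 6, so that variant is shown here):

```r
# Life satisfaction predicted from love of animals and type of pet
pet_full_lm <- lm(life_satisfaction ~ animal + pet, data = pet_tib)

# Parameter estimates with heteroscedasticity-consistent standard errors
model_parameters(pet_full_lm, vcov = "HC4") |>
  display()
```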
| Parameter | Coefficient | SE | 95% CI | t(17) | p |
|---|---|---|---|---|---|
| (Intercept) | 18.94 | 7.99 | (2.08, 35.81) | 2.37 | 0.030 |
| animal | 0.54 | 0.19 | (0.15, 0.93) | 2.92 | 0.010 |
| pet (Cat) | 21.01 | 5.00 | (10.46, 31.56) | 4.20 | < .001 |
Task 12.8
In Chapter 9 we compared the number of mischievous acts in people who had invisibility cloaks to those without (cloak). Imagine we replicated that study, but changed the design so that we recorded the number of mischievous acts in these participants before the study began (mischief_pre) as well as during the study (mischief). Fit a model to see whether people with invisibility cloaks get up to more mischief than those without when factoring in their baseline level of mischief (invisibility_base.csv).
Load the data using (see Tip 2):
cloak_tib <- discovr::invisibility_base

The design of this study warrants an ANCOVA approach to the model. First visualise the data (Figure 55 and Figure 56).
ggplot(cloak_tib, aes(x = cloak, y = mischief_post, colour = cloak)) +
geom_point(position = position_jitter(width = 0.1), alpha = 0.6) +
geom_violin(alpha = 0.2) +
stat_summary(fun.data = "mean_cl_normal", geom = "pointrange", position = position_dodge(width = 0.9)) +
coord_cartesian(ylim = c(0, 15)) +
scale_y_continuous(breaks = seq(0, 15, 1)) +
scale_colour_viridis_d(begin = 0.3, end = 0.8) +
labs(x = "Group", y = "Number of mischievous acts", colour = "Group") +
theme_minimal() +
theme(legend.position = "none")
ggplot(cloak_tib, aes(x = mischief_pre, y = mischief_post)) +
geom_smooth(method = "lm", colour = "#CC6677", fill = "#CC6677", alpha = 0.2) +
geom_point(colour = "#882255", position = position_jitter()) +
coord_cartesian(ylim = c(0, 15), xlim = c(0, 10)) +
scale_x_continuous(breaks = seq(0, 10, 1)) +
scale_y_continuous(breaks = seq(0, 15, 1)) +
labs(x = "Number of mischievous acts (baseline)", y = "Number of mischievous acts (post)") +
facet_wrap(~cloak) +
theme_minimal()
The categorical predictor (cloak) only has two levels/categories so we don’t need to set contrasts. First let’s test the assumption of parallel slopes/homogeneity of regression slopes. Table 132 shows a non-significant interaction between the group to which people were assigned and their pre-experiment mischief, F(1, 76) = 0.34, p = 0.562, suggesting that it’s reasonable to fit the parallel slopes model in cloak_lm.
| Name | Model | df | df_diff | F | p |
|---|---|---|---|---|---|
| cloak_lm | lm | 77 | | | |
| full_lm | lm | 76 | 1 | 0.34 | 0.562 |
Models were detected as nested (in terms of fixed parameters) and are compared in sequential order.
Table 133 shows that baseline mischief significantly predicted mischief during the study, F(1, 77) = 7.40, p = 0.008. After adjusting for baseline mischief, whether or not people were given an invisibility cloak significantly affected mischief during the study. In other words, at average levels of baseline mischief, giving someone an invisibility cloak (or not) significantly affected the number of mischievous acts committed, F(1, 77) = 11.33, p = 0.001.
| Parameter | Sum_Squares | df | Mean_Square | F | p |
|---|---|---|---|---|---|
| mischief_pre | 22.97 | 1 | 22.97 | 7.40 | 0.008 |
| cloak | 35.17 | 1 | 35.17 | 11.33 | 0.001 |
| Residuals | 239.08 | 77 | 3.10 | | |
Anova Table (Type 3 tests)
The adjusted means (Table 134) tell us specifically that those with an invisibility cloak committed more mischievous acts.
| cloak | Mean | SE | 95% CI | t(77) |
|---|---|---|---|---|
| No cloak | 8.79 | 0.30 | (8.19, 9.39) | 29.07 |
| Cloak | 10.13 | 0.26 | (9.62, 10.65) | 38.99 |
Variable predicted: mischief_post; Predictors modulated: cloak; Predictors averaged: mischief_pre (4.5)
Let’s look at the parameter estimates (Table 135). The parameter estimate shows that there is a positive relationship between pre-experiment mischief and subsequent mischief: the more mischievous you are, the more mischief you get up to regardless of whether you are given an invisibility cloak, \(\hat{b}\) = 0.25 (0.07, 0.43), t(77) = 2.72, p = 0.008.
model_parameters(cloak_lm) |>
display()| Parameter | Coefficient | SE | 95% CI | t(77) | p |
|---|---|---|---|---|---|
| (Intercept) | 7.68 | 0.50 | (6.69, 8.68) | 15.39 | < .001 |
| mischief pre | 0.25 | 0.09 | (0.07, 0.43) | 2.72 | 0.008 |
| cloak (Cloak) | 1.34 | 0.40 | (0.55, 2.14) | 3.37 | 0.001 |
We can evaluate the model assumptions using Figure 57. The Q-Q plot suggests non-normal residuals and the homogeneity of variance plot suggests heteroscedasticity. We deal with the most problematic issue by applying heteroscedasticity-consistent standard errors to the tests of parameters (Table 136). Both effects are still highly significant using robust standard errors, so our conclusions don’t change.
check_model(cloak_lm)
model_parameters(cloak_lm, vcov = "HC4") |>
display()
| Parameter | Coefficient | SE | 95% CI | t(77) | p |
|---|---|---|---|---|---|
| (Intercept) | 7.68 | 0.46 | (6.77, 8.60) | 16.74 | < .001 |
| mischief pre | 0.25 | 0.08 | (0.08, 0.42) | 2.91 | 0.005 |
| cloak (Cloak) | 1.34 | 0.39 | (0.56, 2.13) | 3.41 | 0.001 |
Chapter 13
Task 13.1
People have claimed that listening to heavy metal, because of its aggressive sonic palette and often violent or emotionally negative lyrics, leads to angry and aggressive behaviour (Selfhout et al., 2008). As a very non-violent metal fan this accusation bugs me. Imagine I designed a study to test this possibility. I took groups of self-classifying metalheads and non-metalheads (
fan) and assigned them randomly to listen to 15 minutes of either the sound of an angle grinder scraping a sheet of metal (control noise), metal music, or pop music (soundtrack). Each person rated their anger on a scale ranging from 0 (“All you need is love, da, da, da-da-da”) to 100 (“All you wanna do is drag me down, all I wanna do is stamp you out”). Fit a model to test my idea (metal.csv).
Load the data using (see Tip 2):
metal_tib <- discovr::metal
ggplot(metal_tib, aes(x = soundtrack, y = anger, colour = fan)) +
geom_violin(alpha = 0.5) +
stat_summary(fun.data = "mean_cl_normal", position = position_dodge(width = 0.9)) +
scale_y_continuous(breaks = seq(0, 100, 10)) +
scale_colour_viridis_d(begin = 0.3, end = 0.8) +
labs(y = "Anger (out of 100)", x = "Soundtrack", colour = "Fan group") +
theme_minimal()
Table 137 shows the model fitted using afex::aov_4(). The soundtrack × fan interaction was significant, F(2, 84) = 433.28, p < 0.001, indicating that the soundtrack combined with the type of fan significantly affected anger. This effect supersedes the main effects, so we’ll ignore them. Figure 58 shows that after listening to an angle grinder both groups are angry. After listening to metal, the metalheads score low on anger but the pop fans score high. The reverse is true after listening to pop music. We can test this formally with simple effects analysis. Table 138 shows (in combination with the means) that:
- Anger levels did not differ in pop and metal fans when listening to an angle grinder, F(1, 84) = 0.30, p = 1.000.
- When listening to metal music, anger was significantly higher in pop fans than metal fans, F(1, 84) = 431.55, p < 0.001.
- When listening to pop music, anger was significantly higher in metal fans than pop fans, F(1, 84) = 434.80, p < 0.001.
| Parameter | Sum_Squares | df | Mean_Square | F | p |
|---|---|---|---|---|---|
| soundtrack | 20469.76 | 2 | 10234.88 | 116.82 | < .001 |
| fan | 6.40 | 1 | 6.40 | 0.07 | 0.788 |
| soundtrack:fan | 75919.27 | 2 | 37959.63 | 433.28 | < .001 |
| Residuals | 7359.20 | 84 | 87.61 |
Anova Table (Type 3 tests)
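The code that fitted the model isn’t shown above; a sketch, assuming the participant identifier in metal_tib is called id (an assumption):

```r
# 3 (soundtrack) × 2 (fan) between-participants design
metal_afx <- afex::aov_4(anger ~ soundtrack*fan + (1|id), data = metal_tib)

# Type III ANOVA table as in Table 137
model_parameters(metal_afx) |>
  display()
```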
metal_se <- estimate_contrasts(model = metal_afx,
contrast = "fan",
by = "soundtrack",
comparison = "joint",
p_adjust = "bonferroni")
model_parameters(metal_se) |>
display()
| Contrast | soundtrack | df1 | df2 | F | p |
|---|---|---|---|---|---|
| fan | Angle grinder | 1 | 84 | 0.30 | > .999 |
| fan | Metal | 1 | 84 | 431.55 | < .001 |
| fan | Pop | 1 | 84 | 434.80 | < .001 |
p-value adjustment method: Bonferroni
Task 13.2
Compute omega squared for the effects in Task 1 and report the results of the analysis.
The simplest way to add effect sizes is by including es_type = "omega", ci = 0.95 in model_parameters() as in
model_parameters(metal_afx, es_type = "omega", ci = 0.95) |>
display()
| Parameter | Sum_Squares | df | Mean_Square | F | p | Omega2 (partial) | Omega2 95% CI |
|---|---|---|---|---|---|---|---|
| soundtrack | 20469.76 | 2 | 10234.88 | 116.82 | < .001 | 0.72 | (0.64, 1.00) |
| fan | 6.40 | 1 | 6.40 | 0.07 | 0.788 | 0.00 | (0.00, 1.00) |
| soundtrack:fan | 75919.27 | 2 | 37959.63 | 433.28 | < .001 | 0.91 | (0.88, 1.00) |
| Residuals | 7359.20 | 84 | 87.61 |
Anova Table (Type 3 tests)
We could write up these results in the text (remember that if you’re using APA format you should drop the leading zeros before p-values and ω², for example report p = .035 rather than p = 0.035).
Task 13.3
In Chapter 5 we used some data that related to male and female arousal levels when watching The Notebook or a documentary about notebooks (
notebook.csv). Fit a model to test whether men and women differ in their reactions to different types of films.
Load the data using (see Tip 2):
notebook_tib <- discovr::notebook
Table 140 shows the model fitted using afex::aov_4(). The main effect of gender identity is significant, F(1, 36) = 7.29, p = 0.011, as is the main effect of film, F(1, 36) = 141.87, p < 0.001, and the interaction, F(1, 36) = 4.64, p = 0.038. The interaction supersedes the main effects, so we’ll ignore them. Figure 59 shows that psychological arousal is very similar for self-identifying men and women during the documentary about notebooks (it is low for both). However, for The Notebook, self-identifying men experienced greater psychological arousal than self-identifying women. The interaction is likely to reflect that there is a difference between men and women for one type of film (The Notebook) but not the other (the documentary about notebooks). This is confirmed by the simple effects analysis (Table 141), which shows that:
- Arousal levels were significantly higher in self-identifying men when watching the film ‘The Notebook’, F(1, 36) = 11.78, p = 0.003.
- Arousal levels did not significantly differ between men and women when watching the documentary about notebooks, F(1, 36) = 0.15, p = 1.000.
| Parameter | Sum_Squares | df | Mean_Square | F | p |
|---|---|---|---|---|---|
| film | 5784.02 | 1 | 5784.02 | 141.87 | < .001 |
| gender_identity | 297.02 | 1 | 297.02 | 7.29 | 0.011 |
| film:gender_identity | 189.22 | 1 | 189.22 | 4.64 | 0.038 |
| Residuals | 1467.70 | 36 | 40.77 |
Anova Table (Type 3 tests)
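Again the fitting code isn’t shown; a sketch, assuming a participant identifier called id:

```r
# 2 (film) × 2 (gender_identity) between-participants design
notebook_afx <- afex::aov_4(arousal ~ film*gender_identity + (1|id), data = notebook_tib)

# Type III ANOVA table as in Table 140
model_parameters(notebook_afx) |>
  display()
```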
ggplot(notebook_tib, aes(x = film, y = arousal, colour = gender_identity, shape = gender_identity)) +
geom_point(position = position_jitterdodge(dodge.width = 0.2, jitter.width = 0.2), alpha = 0.5) +
stat_summary(fun.data = "mean_cl_normal", geom = "pointrange", position = position_dodge(width = 0.2)) +
coord_cartesian(ylim = c(0, 50)) +
scale_y_continuous(breaks = seq(0, 50, 10)) +
scale_colour_viridis_d(begin = 0.3, end = 0.85) +
labs(x = "Film", y = "Arousal", colour = "Gender identity", shape = "Gender identity") +
theme_minimal()
notebook_se <- estimate_contrasts(model = notebook_afx,
contrast = "gender_identity",
by = "film",
comparison = "joint",
p_adjust = "bonferroni")
model_parameters(notebook_se) |>
display()
| Contrast | film | df1 | df2 | F | p |
|---|---|---|---|---|---|
| gender_identity | The Notebook | 1 | 36 | 11.78 | 0.003 |
| gender_identity | A documentary about notebooks | 1 | 36 | 0.15 | > .999 |
p-value adjustment method: Bonferroni
Task 13.4
Compute omega squared for the effects in Task 3 and report the results of the analysis.
The simplest way to add effect sizes is by including es_type = "omega", ci = 0.95 in model_parameters() as in
model_parameters(notebook_afx, es_type = "omega", ci = 0.95) |>
display()
| Parameter | Sum_Squares | df | Mean_Square | F | p | Omega2 (partial) | Omega2 95% CI |
|---|---|---|---|---|---|---|---|
| film | 5784.02 | 1 | 5784.02 | 141.87 | < .001 | 0.78 | (0.67, 1.00) |
| gender_identity | 297.02 | 1 | 297.02 | 7.29 | 0.011 | 0.14 | (0.01, 1.00) |
| film:gender_identity | 189.22 | 1 | 189.22 | 4.64 | 0.038 | 0.08 | (0.00, 1.00) |
| Residuals | 1467.70 | 36 | 40.77 |
Anova Table (Type 3 tests)
We could write up these results in the text (remember that if you’re using APA format you should drop the leading zeros before p-values and ω², for example report p = .035 rather than p = 0.035).
Task 13.5
In Chapter 5 we used some data that related to learning in men and women when either reinforcement or punishment was used in teaching (
teach_method.csv). Analyse these data to see whether men and women’s learning differs according to the teaching method used.
Load the data using (see Tip 2):^[I rename method to teaching_method because, at the time of writing, a variable name called method seems to really mess up estimate_contrasts().]
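The loading code is missing here. Based on the footnote and the variable names in the output tables, it might look like this (the participant identifier id is an assumption):

```r
# Load the data and rename method to teaching_method (see the footnote)
teach_tib <- discovr::teach_method |>
  dplyr::rename(teaching_method = method)

# 2 (teaching_method) × 2 (sex) between-participants design
teach_afx <- afex::aov_4(mark ~ teaching_method*sex + (1|id), data = teach_tib)
```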
Table 143 shows the model fitted using afex::aov_4(). The main effect of sex was significant, F(1, 16) = 12.25, p = 0.003, but the main effect of the teaching method was not, F(1, 16) = 2.25, p = 0.153. The interaction was significant, F(1, 16) = 30.25, p < 0.001. The interaction supersedes the main effects, so we’ll ignore them. Figure 60 and the simple effects analysis (Table 144) show that:
- For women, marks were significantly higher when the teacher was nice than when they used punishment, F(1, 16) = 8.00, p = 0.024.
- For men the reverse was true: marks were significantly higher when the teacher used punishment than when they were nice, F(1, 16) = 24.50, p < 0.001.
| Parameter | Sum_Squares | df | Mean_Square | F | p |
|---|---|---|---|---|---|
| teaching_method | 11.25 | 1 | 11.25 | 2.25 | 0.153 |
| sex | 61.25 | 1 | 61.25 | 12.25 | 0.003 |
| teaching_method:sex | 151.25 | 1 | 151.25 | 30.25 | < .001 |
| Residuals | 80.00 | 16 | 5.00 |
Anova Table (Type 3 tests)
ggplot(teach_tib, aes(x = sex, y = mark, colour = teaching_method, shape = teaching_method)) +
geom_point(position = position_jitterdodge(dodge.width = 0.2, jitter.width = 0.2), alpha = 0.5) +
stat_summary(fun.data = "mean_cl_normal", geom = "pointrange", position = position_dodge(width = 0.2)) +
coord_cartesian(ylim = c(0, 22)) +
scale_y_continuous(breaks = seq(0, 22, 2)) +
scale_colour_viridis_d(begin = 0.3, end = 0.85) +
labs(x = "Sex", y = "Mark (%)", colour = "Teaching method", shape = "Teaching method") +
theme_minimal()
teach_se <- estimate_contrasts(model = teach_afx,
contrast = "sex",
by = "teaching_method",
comparison = "joint",
p_adjust = "bonferroni")
model_parameters(teach_se) |>
display()
| Contrast | teaching_method | df1 | df2 | F | p |
|---|---|---|---|---|---|
| sex | Electric shock | 1 | 16 | 40.50 | < .001 |
| sex | Being nice | 1 | 16 | 2.00 | 0.353 |
p-value adjustment method: Bonferroni
Task 13.6
At the start of this Chapter I described a way of empirically researching whether I wrote better songs than my old bandmate Malcolm, and whether this depended on the type of song (a symphony or song about flies). The outcome variable was the number of screams elicited by audience members during the songs. Plot the data and fit a model to test my hypothesis that the type of song moderates which songwriter is preferred (
escape.csv).
Load the data using (see Tip 2):
escape_tib <- discovr::escape
Table 145 shows the model fitted using afex::aov_4(). The main effect of the type of song was significant, F(1, 64) = 20.87, p < 0.001, as was the main effect of the songwriter, F(1, 64) = 9.94, p = 0.002. The interaction was also significant, F(1, 64) = 5.07, p = 0.028. The interaction supersedes the main effects, so we’ll ignore them. Figure 61 and the simple effects analysis (Table 146) show that for songs about a fly the number of screams was comparable between songwriters, F(1, 64) = 0.41, p = 1.000, but for symphonies there were significantly more screams when Andy wrote the song than when Malcolm did, F(1, 64) = 14.61, p < 0.001.
| Parameter | Sum_Squares | df | Mean_Square | F | p |
|---|---|---|---|---|---|
| song_type | 74.13 | 1 | 74.13 | 20.87 | < .001 |
| songwriter | 35.31 | 1 | 35.31 | 9.94 | 0.002 |
| song_type:songwriter | 18.01 | 1 | 18.01 | 5.07 | 0.028 |
| Residuals | 227.29 | 64 | 3.55 |
Anova Table (Type 3 tests)
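The fitting code isn’t shown; a sketch, assuming a participant identifier called id:

```r
# 2 (song_type) × 2 (songwriter) between-participants design
escape_afx <- afex::aov_4(screams ~ song_type*songwriter + (1|id), data = escape_tib)

# Type III ANOVA table as in Table 145
model_parameters(escape_afx) |>
  display()
```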
ggplot(escape_tib, aes(x = song_type, y = screams, colour = songwriter, shape = songwriter)) +
geom_point(position = position_jitterdodge(dodge.width = 0.2, jitter.width = 0.2), alpha = 0.5) +
stat_summary(fun.data = "mean_cl_normal", geom = "pointrange", position = position_dodge(width = 0.2)) +
coord_cartesian(ylim = c(0, 15)) +
scale_y_continuous(breaks = seq(0, 15, 1)) +
scale_colour_viridis_d(begin = 0.3, end = 0.85) +
labs(x = "Type of song", y = "Number of screams", colour = "Songwriter", shape = "Songwriter") +
theme_minimal()
escape_se <- estimate_contrasts(model = escape_afx,
contrast = "songwriter",
by = "song_type",
comparison = "joint",
p_adjust = "bonferroni")
model_parameters(escape_se) |>
display()
| Contrast | song_type | df1 | df2 | F | p |
|---|---|---|---|---|---|
| songwriter | Symphony | 1 | 64 | 14.61 | < .001 |
| songwriter | Fly song | 1 | 64 | 0.41 | > .999 |
p-value adjustment method: Bonferroni
Task 13.7
Compute omega squared for the effects in Task 6 and report the results of the analysis.
The simplest way to add effect sizes is by including es_type = "omega", ci = 0.95 in model_parameters() as in
model_parameters(escape_afx, es_type = "omega", ci = 0.95) |>
display()
| Parameter | Sum_Squares | df | Mean_Square | F | p | Omega2 (partial) | Omega2 95% CI |
|---|---|---|---|---|---|---|---|
| song_type | 74.13 | 1 | 74.13 | 20.87 | < .001 | 0.23 | (0.09, 1.00) |
| songwriter | 35.31 | 1 | 35.31 | 9.94 | 0.002 | 0.12 | (0.02, 1.00) |
| song_type:songwriter | 18.01 | 1 | 18.01 | 5.07 | 0.028 | 0.06 | (0.00, 1.00) |
| Residuals | 227.29 | 64 | 3.55 |
Anova Table (Type 3 tests)
We could write up these results in the text (remember that if you’re using APA format you should drop the leading zeros before p-values and ω², for example report p = .035 rather than p = 0.035).
Task 13.8
There are reports of increases in injuries related to playing games consoles. These injuries were attributed mainly to muscle and tendon strains. A researcher hypothesized that a stretching warm-up before playing would help lower injuries, and that athletes would be less susceptible to injuries because their regular activity makes them more flexible. She took 60 athletes and 60 non-athletes (
athlete); half of them played on a Nintendo Switch and half watched others playing as a control (switch), and within these groups half did a 5-minute stretch routine before playing/watching whereas the other half did not (stretch). The outcome was a pain score out of 10 (where 0 is no pain, and 10 is severe pain) after playing for 4 hours (injury). Fit a model to test whether athletes are less prone to injury, and whether the prevention programme worked (switch.csv).
Load the data using (see Tip 2):
switch_tib <- discovr::switch
Table 148 shows the model fitted using afex::aov_4().
| Parameter | Sum_Squares | df | Mean_Square | F | p |
|---|---|---|---|---|---|
| athlete | 99.01 | 1 | 99.01 | 64.82 | < .001 |
| switch | 85.01 | 1 | 85.01 | 55.66 | < .001 |
| stretch | 16.88 | 1 | 16.88 | 11.05 | 0.001 |
| athlete:switch | 69.01 | 1 | 69.01 | 45.18 | < .001 |
| athlete:stretch | 1.88 | 1 | 1.88 | 1.23 | 0.270 |
| switch:stretch | 21.68 | 1 | 21.68 | 14.19 | < .001 |
| athlete:switch:stretch | 9.08 | 1 | 9.08 | 5.94 | 0.016 |
| Residuals | 171.07 | 112 | 1.53 |
Anova Table (Type 3 tests)
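The three-way model behind Table 148 isn’t shown; a sketch, assuming a participant identifier called id:

```r
# 2 (athlete) × 2 (switch) × 2 (stretch) between-participants design
switch_afx <- afex::aov_4(injury ~ athlete*switch*stretch + (1|id), data = switch_tib)

# Type III ANOVA table as in Table 148
model_parameters(switch_afx) |>
  display()
```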
ggplot(switch_tib, aes(switch, injury, colour = stretch)) +
geom_point(position = position_jitterdodge(dodge.width = 0.75, jitter.width = 0.2), size = 1, alpha = 0.2) +
stat_summary(fun.data = "mean_cl_normal", position = position_dodge(width = 0.75)) +
facet_wrap(~athlete) +
labs(x = "Playing or watching", y = "Injury score (out of 10)", colour = "Stretching before play") +
discovr::scale_color_prayer() +
coord_cartesian(ylim = c(0, 10)) +
scale_y_continuous(breaks = seq(0, 10, 2)) +
theme_minimal() +
theme(axis.text.x = element_text(size = rel(0.8)))
Task 13.9
A researcher was interested in what factors contributed to injuries resulting from game console use. She tested 40 participants who were randomly assigned to either an active or static game played on either a Nintendo Switch or Xbox One Kinect. At the end of the session their physical condition was evaluated on an injury severity scale. The data are in the file
xbox.csv, which contains the variables game (0 = static, 1 = active), console (0 = Switch, 1 = Xbox), and injury (a score ranging from 0 (no injury) to 20 (severe injury)). Fit a model to see whether injury severity is significantly predicted from the type of game, the type of console and their interaction.
Load the data using (see Tip 2):
xbox_tib <- discovr::xbox
Table 149 shows the model fitted using afex::aov_4(). The main effect of the type of game was significant, F(1, 36) = 25.86, p < 0.001, but the main effect of the console was not, F(1, 36) = 3.58, p = 0.067. The interaction was significant, F(1, 36) = 5.05, p = 0.031. The interaction supersedes the main effects, so we’ll ignore them. Figure 63 and the simple effects analysis (Table 150) show that for active games there were significantly more injuries on the Switch than the Xbox, F(1, 36) = 8.57, p = 0.012, but for static games the number of injuries was comparable across the two consoles, F(1, 36) = 0.06, p = 1.000.
| Parameter | Sum_Squares | df | Mean_Square | F | p |
|---|---|---|---|---|---|
| game | 184.90 | 1 | 184.90 | 25.86 | < .001 |
| console | 25.60 | 1 | 25.60 | 3.58 | 0.067 |
| game:console | 36.10 | 1 | 36.10 | 5.05 | 0.031 |
| Residuals | 257.40 | 36 | 7.15 |
Anova Table (Type 3 tests)
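The fitting code isn’t shown; a sketch, assuming a participant identifier called id:

```r
# 2 (game) × 2 (console) between-participants design
xbox_afx <- afex::aov_4(injury ~ game*console + (1|id), data = xbox_tib)

# Type III ANOVA table as in Table 149
model_parameters(xbox_afx) |>
  display()
```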
ggplot(xbox_tib, aes(x = game, y = injury, colour = console, shape = console)) +
geom_point(position = position_jitterdodge(dodge.width = 0.2, jitter.width = 0.2), alpha = 0.5) +
stat_summary(fun.data = "mean_cl_normal", geom = "pointrange", position = position_dodge(width = 0.2)) +
coord_cartesian(ylim = c(0, 20)) +
scale_y_continuous(breaks = seq(0, 20, 1)) +
scale_colour_viridis_d(begin = 0.3, end = 0.85) +
labs(x = "Type of game", y = "Number of injuries", colour = "Console", shape = "Console") +
theme_minimal()
xbox_se <- estimate_contrasts(model = xbox_afx,
contrast = "console",
by = "game",
comparison = "joint",
p_adjust = "bonferroni")
model_parameters(xbox_se) |>
display()
| Contrast | game | df1 | df2 | F | p |
|---|---|---|---|---|---|
| console | Active | 1 | 36 | 8.57 | 0.012 |
| console | Static | 1 | 36 | 0.06 | > .999 |
p-value adjustment method: Bonferroni
Chapter 14
Task 14.1
Using the cosmetic surgery example, run the analysis described in Section 14.6.6 but also including BDI as a fixed effect predictor. What differences does including this predictor make?
Load the data using (see Tip 2):
cosmetic_tib <- discovr::cosmetic
To fit the model, follow the instructions in Section 14.6 of the book except that the model itself should include months, base_qol, reason and bdi in the list of predictors. If you haven’t done so already, create the variable months as described in Section 14.6.5.
cosmetic_tib <- cosmetic_tib |>
mutate(
months = days*12/365
)
Fit the model by adapting the code in Section 14.6.7. In that section we built the model up sequentially and compared the models at each step. Therefore we can take the final model (cosmetic_mlm) as our starting point (because we’ve already seen the steps leading up to this model) and then update this model to also include bdi as a predictor. We can then use test_lrt() to compare the models. Table 151 shows that adding bdi significantly improves the fit of the model, \(\chi^2\)(1) = 55.49, p < 0.001. Table 152 also shows that including bdi explains an additional 1.02% of the variance.
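A sketch of the update-and-compare step just described (assuming cosmetic_mlm is the final model from Section 14.6.7):

```r
# Add bdi to the fixed effects of the existing multilevel model
bdi_mlm <- update(cosmetic_mlm, . ~ . + bdi)

# Likelihood ratio test comparing the nested models (Table 151)
test_lrt(cosmetic_mlm, bdi_mlm)
```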
Looking at the fixed effects (Table 153), bdi significantly predicted quality of life after surgery, \(\hat{b}\) = 0.18 (0.13, 0.23), z = 7.52, p < 0.001. Including bdi has not affected the interpretation of the other variables in the model. Obviously the parameter estimates themselves have changed, but how we’d interpret them is unchanged, except that we’re now interpreting each one at average levels of depression. Crucially the months by reason interaction is still significant. We can follow this up using the same code as in the chapter to obtain the simple slopes in Table 154. For those who had surgery to change their appearance, their quality of life increased over time but not significantly so, \(\hat{b}\) = 0.47 (-0.29, 1.23), z = 1.21, p = 0.227. In contrast, for those who had surgery to help with a physical problem, quality of life significantly increased over time, \(\hat{b}\) = 0.91 (0.15, 1.68), z = 2.33, p = 0.020. In essence, the inclusion of BDI has made very little difference to the conclusions or parameter estimates for the substantive interaction.
| Name | Model | df | df_diff | Chi2 | p |
|---|---|---|---|---|---|
| cosmetic_mlm | glmmTMB | 9 | |||
| bdi_mlm | glmmTMB | 10 | 1 | 55.49 | < .001 |
compare_performance(cosmetic_mlm, bdi_mlm) |>
display()
| Name | Model | AIC (weights) | AICc (weights) | BIC (weights) | R2 (cond.) | R2 (marg.) | ICC | RMSE | Sigma |
|---|---|---|---|---|---|---|---|---|---|
| cosmetic_mlm | glmmTMB | 11667.1 (<.001) | 11667.2 (<.001) | 11715.4 (<.001) | 0.71 | 0.09 | 0.68 | 9.13 | 9.25 |
| bdi_mlm | glmmTMB | 11613.6 (>.999) | 11613.8 (>.999) | 11667.3 (>.999) | 0.72 | 0.10 | 0.69 | 8.97 | 9.08 |
model_parameters(bdi_mlm) |>
display()
| Parameter | Coefficient | SE | 95% CI | z | p |
|---|---|---|---|---|---|
| (Intercept) | 22.01 | 3.94 | (14.28, 29.73) | 5.58 | < .001 |
| months | 0.47 | 0.39 | (-0.29, 1.23) | 1.21 | 0.227 |
| reason [Physical reason] | -1.87 | 0.96 | (-3.75, 0.01) | -1.95 | 0.051 |
| base qol | 0.47 | 0.02 | (0.43, 0.52) | 19.99 | < .001 |
| bdi | 0.18 | 0.02 | (0.13, 0.23) | 7.52 | < .001 |
| months × reason [Physical reason] | 0.45 | 0.13 | (0.20, 0.70) | 3.49 | < .001 |
| Parameter | Coefficient |
|---|---|
| SD (Intercept: clinic) | 16.63 |
| SD (months: clinic) | 1.69 |
| Cor (Intercept~months: clinic) | -0.71 |
| SD (Residual) | 9.08 |
estimate_slopes(bdi_mlm,
trend = "months",
by = "reason",
ci = 0.95) |>
display()
| reason | Slope | SE | 95% CI | z | p |
|---|---|---|---|---|---|
| Change appearance | 0.47 | 0.39 | (-0.29, 1.23) | 1.21 | 0.227 |
| Physical reason | 0.91 | 0.39 | ( 0.15, 1.68) | 2.33 | 0.020 |
Marginal effects estimated for months; Type of slope was dY/dX
Task 14.2
Miller et al. (2007) tested the ‘hidden-estrus’ theory, which suggests that unlike other female mammals, humans do not experience an ‘estrus’ phase during which they are more sexually receptive, proceptive, selective and attractive. If this theory is wrong then human heterosexual men should find women most attractive during the fertile phase of their menstrual cycle compared to the pre-fertile (menstrual) and post-fertile (luteal) phase. Miller used the
tips obtained by dancers (id) at a lap dancing club as a proxy for their sexual attractiveness and also recorded the phase of the dancer’s menstrual cycle during a given shift (cyclephase), and whether they were using hormonal contraceptives (contraceptive). Dancers provided data from between 9 and 29 of their shifts. Fit a multilevel model using these data (miller_2007.csv) to see whether tips can be predicted from cyclephase, contraceptive and their interaction, allowing the overall level of tips to vary across dancers. Is the ‘hidden-estrus’ hypothesis supported?
Load the data using (see Tip 2):
miller_tib <- discovr::miller_2007
In the model that Miller et al. (2007) fitted, they did not assume that there would be random slopes (i.e., the relationship between each predictor and tips was not assumed to vary within lap dancers). This decision is appropriate for contraceptive because this variable didn’t vary at level 2 (the lap dancer was either taking contraceptives or not, so this could not be set up as a random effect because it doesn’t vary over our level 2 variable of participant). Also, because cyclephase is a categorical variable with three unordered categories we could not expect a linear relationship with tips: we expect tips to vary over categories but the categories themselves have no meaningful order. However, we might expect tips to vary over participants (some lap dancers will naturally get more money than others) and we can factor this variability in by allowing the intercept to be random. As such, we’re fitting a random intercept model to the data.
The substantive hypothesis is around an interaction between cyclephase and contraceptive and to get at this we need the main effects in the model, so let’s fit the full model (rather than building up sequentially) and look at the F tests based on Type III sums of squares, as shown in Table 155. A few things to note:
- The authors report in the paper that they used restricted maximum-likelihood estimation (REML), so I have included REML = TRUE to override the default of maximum-likelihood estimation.
- If using Type III sums of squares we need categorical variables to have orthogonal contrasts. We can achieve this by manually setting a contrast for cyclephase and contraceptive using contrasts() (which is covered in the book). However, we can also do this directly within glmmTMB() using the contrasts argument. Note that I have included contrasts = list(contraceptive = "contr.sum", cyclephase = "contr.sum"). This sets a contrast for each variable included within list(); we can use any built-in contrast, but I have chosen "contr.sum" for both variables, which is an orthogonal contrast.
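Putting those two points together, the model might be fitted like this (the random intercept over id reflects the description above):

```r
# Random-intercept model: shifts nested within dancers (id)
miller_mlm <- glmmTMB::glmmTMB(
  tips ~ contraceptive*cyclephase + (1 | id),
  data = miller_tib,
  REML = TRUE,                                   # match the authors' estimator
  contrasts = list(contraceptive = "contr.sum",  # orthogonal contrasts for
                   cyclephase = "contr.sum")     # Type III sums of squares
)
```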
Table 156 shows that about 51% of the variance in tips is explained by the model. Table 155 shows that the phase of the dancer’s cycle significantly predicted tip income, \(\chi^2\)(2) = 54.90, p < 0.001, and this interacted with whether the dancer was having natural cycles or was on the contraceptive pill, \(\chi^2\)(2) = 10.60, p = 0.005. Table 157 also tells us that the standard deviation in tips across dancers was 59.76, 95% CI [39.51, 90.39]. In other words, the average tip per dancer varied substantially, confirming the decision to treat the intercept as a random variable.
To tease the interaction apart we can use the contrasts in Table 158. A few things to note about the code:
- The built-in contrast "trt.vs.ctrlk" compares each condition to the kth condition (e.g., the last). So cyclephase = "trt.vs.ctrlk" asks for each category of cyclephase to be compared to the last category. This variable is coded such that Fertile is the last category, which means this contrast compares every category to Fertile. I have done this because the authors’ predictions stated that the fertile phase should differ from the other phases: “human heterosexual men should find women most attractive during the fertile phase of their menstrual cycle compared to the pre-fertile (menstrual) and post-fertile (luteal) phase”.
- The built-in contrast "trt.vs.ctrl1" compares each condition to the 1st condition. It’s the same as setting ref = 1. So contraceptive = "trt.vs.ctrl1" asks for each category of contraceptive to be compared to the first category. In fact there are only 2 categories, so this is as good a comparison as any.
Table 158 shows these contrasts. The first contrast tells us the following: if we worked out the relative difference in tips between the fertile phase and the luteal phase, how much more do those on the contraceptive pill earn compared to those in their natural cycle? The answer is about $86 (the \(\hat{b}\)). In other words, there is a combined effect of being in a natural cycle and being in the fertile phase relative to the luteal phase, and this is significant, \(\bar{X}_\text{Diff}\) = 86.09 (26.81, 145.36), t(247) = 2.86, p = 0.005. The second contrast tells us the following: if we worked out the relative difference in tips between the fertile phase and the menstrual phase, how much more do those in their natural cycle earn compared to those on contraceptive pills? The answer is about $90. In other words, there is a combined effect of being on the pill (relative to being in a natural cycle) and being in the fertile phase (relative to the menstrual phase), and this is significant, \(\bar{X}_\text{Diff}\) = 89.94 (22.63, 157.25), t(247) = 2.63, p = 0.009.
To conclude, then, this study showed that the ‘hidden-estrus’ hypothesis is wrong: men did find women more attractive (as indexed by how many lap dances a woman performed and therefore how much she earned) during the fertile phase of their cycle compared to the other phases.
| Parameter | χ² | df | p |
|---|---|---|---|
| contraceptive | 6.75 | 1 | 0.009 |
| cyclephase | 54.90 | 2 | < .001 |
| contraceptive:cyclephase | 10.60 | 2 | 0.005 |
Anova Table (Type 3 tests)
model_performance(miller_mlm) |>
display()
| AIC | AICc | BIC | R2 (cond.) | R2 (marg.) | ICC | RMSE | Sigma |
|---|---|---|---|---|---|---|---|
| 3039.8 | 3040.4 | 3068.2 | 0.51 | 0.30 | 0.29 | 89.28 | 92.90 |
model_parameters(miller_mlm) |>
display()
| Parameter | Coefficient | SE | 95% CI | z | p |
|---|---|---|---|---|---|
| (Intercept) | 224.76 | 15.96 | (193.48, 256.03) | 14.08 | < .001 |
| contraceptive1 | 41.47 | 15.96 | (10.19, 72.76) | 2.60 | 0.009 |
| cyclephase1 | -64.80 | 10.29 | (-84.98, -44.63) | -6.30 | < .001 |
| cyclephase2 | 3.72 | 9.17 | (-14.25, 21.69) | 0.41 | 0.685 |
| contraceptive1 × cyclephase1 | -15.63 | 10.29 | (-35.81, 4.55) | -1.52 | 0.129 |
| contraceptive1 × cyclephase2 | -13.71 | 9.17 | (-31.68, 4.27) | -1.49 | 0.135 |
| Parameter | Coefficient | 95% CI |
|---|---|---|
| SD (Intercept: id) | 59.76 | (39.51, 90.39) |
| SD (Residual) | 92.90 | (84.84, 101.74) |
| cyclephase_trt.vs.ctrlk | contraceptive_trt.vs.ctrl1 | Difference | 95% CI | SE | t(247) | p |
|---|---|---|---|---|---|---|
| Luteal - Fertile | Using contraceptives - Not using contraceptives | 86.09 | (26.81, 145.36) | 30.09 | 2.86 | 0.005 |
| Menstrual - Fertile | Using contraceptives - Not using contraceptives | 89.94 | (22.63, 157.25) | 34.17 | 2.63 | 0.009 |
Variable predicted: tips; Predictors contrasted: cyclephase, contraceptive; p-values are uncorrected.
Task 14.3
Hill et al. (2007) examined whether providing children with a leaflet based on the ‘theory of planned behaviour’ increased their exercise. There were four different interventions (
intervention): a control group, a leaflet, a leaflet and quiz, and a leaflet and a plan. A total of 503 children from 22 different classrooms were sampled (classroom). The 22 classrooms were randomly assigned to the four different conditions. Children were asked ‘On average over the last three weeks, I have exercised energetically for at least 30 minutes ______ times per week’ after the intervention (post_exercise). Run a multilevel model analysis on these data (hill_2007.csv) to see whether the intervention affected the children’s exercise levels (the hierarchy is children within classrooms within interventions).
Load the data using (see Tip 2):
hill_tib <- discovr::hill_2007
Table 160 shows that about 8% of the variance in post-intervention exercise is explained by the model. However, Table 159 shows that post-intervention exercise was not significantly different across intervention groups, \(\chi^2\)(3) = 5.50, p = 0.139. In terms of whether these effects varied by classroom, the ICC (Table 160) was 5.6%, which is fairly small, and the standard deviation of post-intervention exercise across classrooms was 0.13, 95% CI [0.08, 0.22].
From these data we might conclude that the intervention failed to affect exercise. However, there is a lot of individual variability in the amount of exercise people get. A better approach would be to take into account the amount of self-reported exercise prior to the study as a covariate, which leads us to the next task.
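The model itself isn’t shown; a sketch of the null and full models that Table 159 compares:

```r
# Random intercepts for classrooms; children are the level 1 units
hill_null <- glmmTMB::glmmTMB(post_exercise ~ 1 + (1 | classroom),
                              data = hill_tib)
hill_mlm  <- glmmTMB::glmmTMB(post_exercise ~ intervention + (1 | classroom),
                              data = hill_tib)

# Likelihood ratio test of the intervention effect (Table 159)
test_lrt(hill_null, hill_mlm)
```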
| Name | Model | df | df_diff | Chi2 | p |
|---|---|---|---|---|---|
| Null model | glmmTMB | 3 | |||
| Full model | glmmTMB | 6 | 3 | 5.50 | 0.139 |
model_performance(hill_mlm) |>
display()
| AIC | AICc | BIC | R2 (cond.) | R2 (marg.) | ICC | RMSE | Sigma |
|---|---|---|---|---|---|---|---|
| 836.9 | 837.0 | 862.2 | 0.08 | 0.03 | 0.06 | 0.53 | 0.54 |
model_parameters(hill_mlm) |>
  display()

| Parameter | Coefficient | SE | 95% CI | z | p |
|---|---|---|---|---|---|
| (Intercept) | 1.65 | 0.07 | (1.51, 1.79) | 22.99 | < .001 |
| intervention [Leaflet] | 0.19 | 0.10 | (-0.02, 0.39) | 1.80 | 0.071 |
| intervention [Leaflet + Quiz] | 0.14 | 0.10 | (-0.06, 0.33) | 1.34 | 0.181 |
| intervention [Leaflet + Plan] | 0.25 | 0.11 | (0.04, 0.46) | 2.37 | 0.018 |
| Parameter | Coefficient | 95% CI |
|---|---|---|
| SD (Intercept: classroom) | 0.13 | (0.08, 0.22) |
| SD (Residual) | 0.54 | (0.51, 0.57) |
Task 14.4
Repeat the analysis in Task 14.3 but include the pre-intervention exercise scores (pre_exercise) as a covariate. What difference does this make to the results?
Given we have already fitted a model for intervention that we now want to extend to include pre_exercise as a predictor, we can use update() to create the new model. Table 162 shows that adding pre_exercise to the model significantly improves the fit, \(\chi^2\)(1) = 449.20, p < 0.001. Table 163 shows that about 61% of the variance in post-intervention exercise is explained by the model, which is 53.39% more than the previous model, a huge increase!
In terms of whether these effects varied by classroom, the ICC (Table 163) was 1.4%, which is fairly small, and the standard deviation of post-intervention exercise across classrooms was 0.04, 95% CI [0.01, 0.14].
Looking at the parameter estimates (Table 164) shows that the effect of pre-intervention exercise level is a significant predictor of post-intervention exercise level, \(\hat{b}\) = 0.72 (0.66, 0.77), z = 27.12, p < 0.001, and, most interestingly, the effects of intervention are all now significant. Specifically, at average levels of pre-intervention exercise compared to no intervention, the post-intervention exercise was significantly greater after a leaflet, \(\hat{b}\) = 0.16 (0.06, 0.26), z = 3.13, p = 0.002, after a leaflet and quiz, \(\hat{b}\) = 0.21 (0.11, 0.30), z = 4.16, p < 0.001, and after a leaflet and plan, \(\hat{b}\) = 0.22 (0.12, 0.32), z = 4.25, p < 0.001. These results show that when we adjust for the amount of self-reported exercise prior to the study, the intervention group becomes a significant predictor of post-intervention exercise levels.
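The update() step mentioned above isn’t shown in this extract; a sketch of it (assuming the Task 14.3 model object is named hill_mlm, as in the comparison table, and was fitted with glmmTMB):

```r
# Extend the earlier model by adding the pre-intervention scores as a predictor
hill_pre_mlm <- update(hill_mlm, . ~ . + pre_exercise)

# Does adding pre_exercise significantly improve the fit?
performance::test_likelihoodratio(hill_mlm, hill_pre_mlm)
```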
| Name | Model | df | df_diff | Chi2 | p |
|---|---|---|---|---|---|
| hill_mlm | glmmTMB | 6 | | | |
| hill_pre_mlm | glmmTMB | 7 | 1 | 449.20 | < .001 |
model_performance(hill_pre_mlm) |>
  display()

| AIC | AICc | BIC | R2 (cond.) | R2 (marg.) | ICC | RMSE | Sigma |
|---|---|---|---|---|---|---|---|
| 836.9 | 837.0 | 862.2 | 0.08 | 0.03 | 0.06 | 0.53 | 0.54 |
model_parameters(hill_pre_mlm) |>
  display()

| Parameter | Coefficient | SE | 95% CI | z | p |
|---|---|---|---|---|---|
| (Intercept) | 1.65 | 0.07 | (1.51, 1.79) | 22.99 | < .001 |
| intervention [Leaflet] | 0.19 | 0.10 | (-0.02, 0.39) | 1.80 | 0.071 |
| intervention [Leaflet + Quiz] | 0.14 | 0.10 | (-0.06, 0.33) | 1.34 | 0.181 |
| intervention [Leaflet + Plan] | 0.25 | 0.11 | (0.04, 0.46) | 2.37 | 0.018 |
| Parameter | Coefficient | 95% CI |
|---|---|---|
| SD (Intercept: classroom) | 0.13 | (0.08, 0.22) |
| SD (Residual) | 0.54 | (0.51, 0.57) |
Chapter 15
Task 15.1
It is common that lecturers obtain reputations for being ‘hard’ or ‘light’ markers, but there is often little to substantiate these reputations. A group of students investigated the consistency of marking by submitting the same essays to four different lecturers. The outcome was the percentage mark given by each lecturer and the predictor was the lecturer who marked the report (tutor_marks.csv). Compute the F-statistic for the effect of marker by hand.
Load the data using (see Tip 2):
tutor_tib <- discovr::tutor_marks

There were eight essays, each marked by four different lecturers. The data are in Table 165, which also shows the mean mark each essay received and the variance of marks for each essay. Now, the total variance within essay marks will be due in part to different lecturers marking (some are more critical and some more lenient), and in part to the fact that the essays themselves differ in quality (individual differences). Our job is to tease apart these sources.
| Essay id | Tutor 1 | Tutor 2 | Tutor 3 | Tutor 4 | \(\overline{X}_\text{essay}\) | \(s_\text{essay}^2\) |
|---|---|---|---|---|---|---|
| qegggt | 62 | 58 | 63 | 64 | 61.75 | 6.92 |
| nghnol | 63 | 60 | 68 | 65 | 64.00 | 11.33 |
| gcomqu | 65 | 61 | 72 | 65 | 65.75 | 20.92 |
| hphjhp | 68 | 64 | 58 | 61 | 62.75 | 18.25 |
| jrcbpi | 69 | 65 | 54 | 59 | 61.75 | 43.58 |
| acnlxu | 71 | 67 | 65 | 50 | 63.25 | 84.25 |
| tenwth | 78 | 66 | 67 | 50 | 65.25 | 132.92 |
| unwyka | 75 | 73 | 75 | 45 | 67.00 | 216.00 |
The total sum of squares
The \(\text{SS}_\text{T}\) is calculated as:
\[ \text{SS}_\text{T} = \sum_{i=1}^{N} (x_i-\overline{X})^2 \]
To use this equation we need the overall mean of all marks (regardless of the essay marked or who marked it). Table 166 shows descriptive statistics for all marks. The grand mean (the mean of all scores) is 63.94.
| Mean | SD | IQR | Min | Max | n |
|---|---|---|---|---|---|
| 63.94 | 7.42 | 7.75 | 45 | 78 | 32 |
To get the total sum of squares, we take each mark, subtract from it the mean of all scores (63.94) and square this difference (that’s the \((x_i-\overline{X})^2\) in the equation) to get the squared errors. Table 167 shows this process. We then add these squared differences to get the sum of squared error:
\[ \begin{aligned} \text{SS}_\text{T} &= 3.76 + 35.28 + 0.88 + 0.00 + 0.88 + 15.52 + 16.48 + 1.12 + \\ &\quad 1.12 + 8.64 + 64.96 + 1.12 + 16.48 + 0.00 + 35.28 + 8.64 +\\ &\quad 25.60 + 1.12 + 98.80 + 24.40 + 49.84 + 9.36 + 1.12 + 194.32 +\\ &\quad 197.68 + 4.24 + 9.36 + 194.32 + 122.32 + 82.08 + 122.32 + 358.72 \\ &= 1705.76 \end{aligned} \]
The degrees of freedom for this sum of squares are \(N-1\), or 31.
| Essay id | Tutor | Mark | Mean | Error (score - mean) | Error squared |
|---|---|---|---|---|---|
| qegggt | Tutor 1 | 62 | 63.94 | -1.94 | 3.76 |
| qegggt | Tutor 2 | 58 | 63.94 | -5.94 | 35.28 |
| qegggt | Tutor 3 | 63 | 63.94 | -0.94 | 0.88 |
| qegggt | Tutor 4 | 64 | 63.94 | 0.06 | 0.00 |
| nghnol | Tutor 1 | 63 | 63.94 | -0.94 | 0.88 |
| nghnol | Tutor 2 | 60 | 63.94 | -3.94 | 15.52 |
| nghnol | Tutor 3 | 68 | 63.94 | 4.06 | 16.48 |
| nghnol | Tutor 4 | 65 | 63.94 | 1.06 | 1.12 |
| gcomqu | Tutor 1 | 65 | 63.94 | 1.06 | 1.12 |
| gcomqu | Tutor 2 | 61 | 63.94 | -2.94 | 8.64 |
| gcomqu | Tutor 3 | 72 | 63.94 | 8.06 | 64.96 |
| gcomqu | Tutor 4 | 65 | 63.94 | 1.06 | 1.12 |
| hphjhp | Tutor 1 | 68 | 63.94 | 4.06 | 16.48 |
| hphjhp | Tutor 2 | 64 | 63.94 | 0.06 | 0.00 |
| hphjhp | Tutor 3 | 58 | 63.94 | -5.94 | 35.28 |
| hphjhp | Tutor 4 | 61 | 63.94 | -2.94 | 8.64 |
| jrcbpi | Tutor 1 | 69 | 63.94 | 5.06 | 25.60 |
| jrcbpi | Tutor 2 | 65 | 63.94 | 1.06 | 1.12 |
| jrcbpi | Tutor 3 | 54 | 63.94 | -9.94 | 98.80 |
| jrcbpi | Tutor 4 | 59 | 63.94 | -4.94 | 24.40 |
| acnlxu | Tutor 1 | 71 | 63.94 | 7.06 | 49.84 |
| acnlxu | Tutor 2 | 67 | 63.94 | 3.06 | 9.36 |
| acnlxu | Tutor 3 | 65 | 63.94 | 1.06 | 1.12 |
| acnlxu | Tutor 4 | 50 | 63.94 | -13.94 | 194.32 |
| tenwth | Tutor 1 | 78 | 63.94 | 14.06 | 197.68 |
| tenwth | Tutor 2 | 66 | 63.94 | 2.06 | 4.24 |
| tenwth | Tutor 3 | 67 | 63.94 | 3.06 | 9.36 |
| tenwth | Tutor 4 | 50 | 63.94 | -13.94 | 194.32 |
| unwyka | Tutor 1 | 75 | 63.94 | 11.06 | 122.32 |
| unwyka | Tutor 2 | 73 | 63.94 | 9.06 | 82.08 |
| unwyka | Tutor 3 | 75 | 63.94 | 11.06 | 122.32 |
| unwyka | Tutor 4 | 45 | 63.94 | -18.94 | 358.72 |
| Total | — | — | — | — | 1705.76 |
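If you want to double-check the running total without doing 32 subtractions by hand, the whole calculation takes a few lines of base R (the marks are typed in from Table 165, so this doesn’t need the discovr data):

```r
# Marks from Table 165: one row per essay, one column per tutor
marks <- matrix(c(62, 58, 63, 64,
                  63, 60, 68, 65,
                  65, 61, 72, 65,
                  68, 64, 58, 61,
                  69, 65, 54, 59,
                  71, 67, 65, 50,
                  78, 66, 67, 50,
                  75, 73, 75, 45),
                nrow = 8, byrow = TRUE)

grand_mean <- mean(marks)             # 63.94 (2 dp)
ss_t <- sum((marks - grand_mean)^2)   # total sum of squares: 1705.88 (2 dp)
```

The tiny discrepancy with the hand calculation (1705.76) arises only because each squared error in Table 167 was rounded to two decimal places before summing.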
The within-participant sum of squares
The within-participant sum of squares, \(\text{SS}_\text{W}\), is calculated using the variance in marks for each essay, which are shown in Table 165. The ns are the number of scores on which the variances are based (i.e. in this case the number of marks each essay received, which was 4).
\[ \text{SS}_\text{W} = s_\text{essay 1}^2(n_1-1)+s_\text{essay 2}^2(n_2-1) + s_\text{essay 3}^2(n_3-1) +\ldots+ s_\text{essay 8}^2(n_8-1) \]
Using the values in Table 165 we get
\[ \begin{aligned} \text{SS}_\text{W} &= s_\text{essay 1}^2(n_1-1)+s_\text{essay 2}^2(n_2-1) + s_\text{essay 3}^2(n_3-1) +\ldots+ s_\text{essay 8}^2(n_8-1) \\ &= 6.92(4-1) + 11.33(4-1) + 20.92(4-1) + 18.25(4-1) + \\ &\quad 43.58(4-1) + 84.25(4-1) + 132.92(4-1) + 216.00(4-1)\\ &= 1602.51. \end{aligned} \]
The degrees of freedom for each essay are \(n-1\) (i.e. the number of marks per essay minus 1). To get the total degrees of freedom we add the df for each essay:
\[ \begin{aligned} \text{df}_\text{W} &= df_\text{essay 1}+df_\text{essay 2} + df_\text{essay 3} +\ldots+ df_\text{essay 8} \\ &= (4-1) + (4-1) + (4-1) + (4-1) + (4-1) + (4-1) + (4-1) + (4-1)\\ &= 24 \end{aligned} \]
A shortcut would be to multiply the degrees of freedom per essay (3) by the number of essays (8): \(3 \times 8 = 24\)
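We can verify both the sum of squares and its degrees of freedom in base R (again typing the marks in from Table 165, so no packages or external data are needed):

```r
# Marks from Table 165: one row per essay, one column per tutor
marks <- matrix(c(62, 58, 63, 64,
                  63, 60, 68, 65,
                  65, 61, 72, 65,
                  68, 64, 58, 61,
                  69, 65, 54, 59,
                  71, 67, 65, 50,
                  78, 66, 67, 50,
                  75, 73, 75, 45),
                nrow = 8, byrow = TRUE)

# For each essay (row), the variance of its four marks times (n - 1)
ss_w <- sum(apply(marks, 1, var) * (ncol(marks) - 1))   # 1602.5
df_w <- nrow(marks) * (ncol(marks) - 1)                 # 8 essays x 3 df = 24
```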
The model sum of squares
We calculate the model sum of squares \(\text{SS}_\text{M}\) as:
\[ \sum_{g = 1}^{k}n_g(\overline{x}_g-\overline{x}_\text{grand})^2 \]
Therefore, we need to subtract the mean of all marks (in Table 166) from the mean mark awarded by each tutor (in Table 168), then square these differences, multiply each by the number of essays marked (8) and sum the results.
| tutor | Mean | Variance |
|---|---|---|
| Tutor 1 | 68.88 | 31.84 |
| Tutor 2 | 64.25 | 22.21 |
| Tutor 3 | 65.25 | 47.93 |
| Tutor 4 | 57.38 | 62.55 |
Using the values in Table 168, \(\text{SS}_\text{M}\) is
\[ \begin{aligned} \text{SS}_\text{M} &= 8(68.88 - 63.94)^2 + 8(64.25 - 63.94)^2 + 8(65.25 - 63.94)^2 + 8(57.38 - 63.94)^2\\ &= 554 \end{aligned} \] The degrees of freedom are the number of conditions (in this case the number of markers) minus 1, \(df_M = k-1 = 3\).
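The same arithmetic in base R, using the tutor means from Table 168 and the grand mean from Table 166:

```r
# Tutor means (Table 168), grand mean (Table 166) and essays per tutor
tutor_means <- c(68.88, 64.25, 65.25, 57.38)
grand_mean  <- 63.94
n_essays    <- 8

# Model sum of squares: weighted squared deviations of tutor means
ss_m <- sum(n_essays * (tutor_means - grand_mean)^2)   # 554.00 (2 dp)
```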
The residual sum of squares
We now know that there are 1706 units of variation to be explained in our data, and that the variation within essays accounts for 1602 units. Of these 1602 units, our experimental manipulation can explain 554 units. The final sum of squares is the residual sum of squares (\(\text{SS}_\text{R}\)), which tells us how much of the variation cannot be explained by the model. Knowing \(\text{SS}_\text{W}\) and \(\text{SS}_\text{M}\) already, the simplest way to calculate \(\text{SS}_\text{R}\) is through subtraction
\[ \begin{aligned} \text{SS}_\text{R} &= \text{SS}_\text{W}-\text{SS}_\text{M}\\ &=1602.51-554\\ &=1048.51. \end{aligned} \]
The degrees of freedom are calculated in a similar way
\[ \begin{aligned} df_\text{R} &= df_\text{W}-df_\text{M}\\ &= 24-3\\ &= 21. \end{aligned} \]
The mean squares
Next, convert the sums of squares to mean squares by dividing by their degrees of freedom
\[ \begin{aligned} \text{MS}_\text{M} &= \frac{\text{SS}_\text{M}}{df_\text{M}} = \frac{554}{3} = 184.67 \\ \text{MS}_\text{R} &= \frac{\text{SS}_\text{R}}{df_\text{R}} = \frac{1048.51}{21} = 49.93. \\ \end{aligned} \]
The F-statistic
The F-statistic is calculated by dividing the model mean squares by the residual mean squares:
\[ F = \frac{\text{MS}_\text{M}}{\text{MS}_\text{R}} = \frac{184.67}{49.93} = 3.70. \]
This value of F can be compared against a critical value based on its degrees of freedom (which are 3 and 21 in this case).
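The final steps are easy to check in base R, using the sums of squares derived above:

```r
# Mean squares and F from the sums of squares computed by hand
ss_m <- 554;     df_m <- 3
ss_r <- 1048.51; df_r <- 21

ms_m   <- ss_m / df_m    # 184.67 (2 dp)
ms_r   <- ss_r / df_r    # 49.93 (2 dp)
f_stat <- ms_m / ms_r    # 3.70 (2 dp)
```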
Task 15.2
Repeat the analysis for Task 15.1 using R and interpret the results.
We already loaded the data in Task 15.1.
Figure 64 shows the mean marks for each tutor as well as the distribution. Tutors 1 to 3 have a lot of overlap in their distributions and have similar average marks, whereas Tutor 4 seems a little lower on average (although the distribution still overlaps a lot with the other tutors).
ggplot(tutor_tib, aes(x = tutor, y = mark)) +
geom_violin(colour = "#28BBECFF", fill = "#28BBECFF", alpha = 0.1) +
stat_summary(fun.data = "mean_cl_normal", colour = "#28BBECFF") +
coord_cartesian(ylim = c(0, 80)) +
scale_y_continuous(breaks = seq(0, 80, 10)) +
labs(y = "Mark (%)", x = "Marker") +
theme_minimal()
Table 169 shows the main summary table from the model. We would conclude that tutors did not significantly differ in the marks they awarded, F(1.67, 11.71) = 3.70, p = 0.063. In light of this non-significance, no follow-up tests are necessary.
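The table below comes from a model object called tutor_afx, which we reuse in the next task. The code that fits it isn’t shown in this extract; a sketch might look like this (an assumption: that the essay identifier column in tutor_tib is called id, so check your tibble’s column names before running it):

```r
# A sketch of the repeated-measures model behind Table 169 (assumes the afex
# package is installed and that the essay identifier column is named id)
tutor_afx <- afex::aov_4(mark ~ 1 + (tutor | id), data = tutor_tib)

model_parameters(tutor_afx) |>
  display()
```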
| Parameter | Sum_Squares | Sum_Squares_Error | df | df (error) | Mean_Square | F | p |
|---|---|---|---|---|---|---|---|
| tutor | 554.13 | 1048.38 | 1.67 | 11.71 | 89.53 | 3.70 | 0.063 |
Anova Table (Type 3 tests)
Task 15.3
Calculate the effect sizes for the analysis in Task 15.1.
The easiest way to do this is to add es_type = "omega" into model_parameters() so that the effect sizes are added to the summary table as in Table 170.
model_parameters(tutor_afx, es_type = "omega") |>
  display(use_symbols = TRUE)

| Parameter | Sum_Squares | Sum_Squares_Error | df | df (error) | Mean_Square | F | p | ω² (partial) |
|---|---|---|---|---|---|---|---|---|
| tutor | 554.13 | 1048.38 | 1.67 | 11.71 | 89.53 | 3.70 | 0.063 | 0.24 |
Anova Table (Type 3 tests)
Alternatively, we could pass the model (tutor_afx) into omega_squared() using the code below
omega_squared(tutor_afx) |>
  display()

| Parameter | Omega2 (partial) | 95% CI |
|---|---|---|
| tutor | 0.24 | [0.00, 1.00] |
One-sided CIs: upper bound fixed at [1.00].
Either way we get the same results, which show a fairly substantial effect size, \(\hat{\omega}^2_p\) = 0.24 [0.00, 1.00]. In light of this, we might amend how we report this model.
Task 15.4
In the previous chapter we came across the beer-goggles effect. In that chapter, we saw that the beer-goggles effect was stronger for unattractive faces. We took a follow-up sample of 26 people and gave them doses of alcohol (0 pints, 2 pints, 4 pints and 6 pints of lager) over four different weeks. We asked them to rate a bunch of photos of unattractive faces in either dim or bright lighting. The outcome measure was the mean attractiveness rating (out of 100) of the faces and the predictors were the dose of alcohol and the lighting conditions (goggles_lighting.csv). Do alcohol dose and lighting interact to magnify the beer goggles effect?
Load the data using (see Tip 2):
goggle_tib <- discovr::goggles_lighting

Table 171 shows the main summary table from the model and Figure 65 shows the interaction plot. The plot shows a similar profile of ratings after no alcohol and after 2 pints in the dim and bright environments. However, at 4 and 6 pints the attractiveness ratings decline faster in dim lighting conditions than in bright ones.
From Table 171, the main effect of lighting shows that the attractiveness ratings of photos were significantly lower when the lighting was dim compared to when it was bright, F(1, 25) = 23.42, p < 0.001, \(\hat{\omega}^2_p\) = 0.25. The main effect of alcohol shows that the attractiveness ratings of photos of faces were significantly affected by how much alcohol was consumed, F(2.62, 65.47) = 104.39, p < 0.001, \(\hat{\omega}^2_p\) = 0.75. However, both of these effects are superseded by the interaction, which shows that the effect that alcohol had on ratings of attractiveness was significantly moderated by the brightness of the lighting, F(2.81, 70.23) = 22.22, p < 0.001, \(\hat{\omega}^2_p\) = 0.35. To interpret this effect let’s move on to the next task.
| Parameter | Sum_Squares | Sum_Squares_Error | df | df (error) | Mean_Square | F | p | ω² (partial) |
|---|---|---|---|---|---|---|---|---|
| lighting | 1993.92 | 2128.33 | 1.00 | 25.00 | 85.13 | 23.42 | < .001 | 0.25 |
| alcohol | 38591.65 | 9242.60 | 2.62 | 65.47 | 141.18 | 104.39 | < .001 | 0.75 |
| lighting:alcohol | 5765.42 | 6487.33 | 2.81 | 70.23 | 92.37 | 22.22 | < .001 | 0.35 |
Anova Table (Type 3 tests)
afex::afex_plot(object = goggles_afx,
x = "alcohol",
trace = "lighting",
legend_title = "Lighting condition",
error = "within",
mapping = c("shape", "color")) +
labs(x = "Dose of alcohol", y = "Attractiveness rating (0-100)") +
scale_color_viridis_d(begin = 0.2, end = 0.85, option = "turbo") +
theme_minimal()
Task 15.5
Interpret the simple effect of alcohol at different levels of lighting.
The interaction effect is shown in Figure 65. Table 172 shows the simple effect of alcohol at different levels of lighting. These analyses are not particularly helpful because they show that alcohol significantly affected attractiveness ratings both when lights were dim, F(3, 25) = 169.88, p < 0.001, and when they were bright, F(3, 25) = 39.12, p < 0.001. This is an example where it might be worth looking at the alternative simple effects, that is, the simple effect of lighting within each dose of alcohol. These effects are shown in Table 173 and they are somewhat more useful because they show that lighting did not have a significant effect on attractiveness ratings after no alcohol, F(1, 25) = 1.49, p = 0.937, or after 2 pints of lager, F(1, 25) = 4.56, p = 0.171, but did have a significant effect after 4, F(1, 25) = 27.17, p < 0.001, and 6, F(1, 25) = 55.59, p < 0.001, pints. Basically the effect of lighting got stronger as the alcohol dose increased: you can see this on Figure 65 by the circle and triangle getting further apart as we move up the doses of alcohol.
estimate_contrasts(model = goggles_afx,
contrast = "alcohol",
by = "lighting",
comparison = "joint",
p_adjust = "bonferroni") |>
display()

| Contrast | lighting | df1 | df2 | F | p |
|---|---|---|---|---|---|
| alcohol | Dim | 3 | 25 | 169.88 | < .001 |
| alcohol | Bright | 3 | 25 | 39.12 | < .001 |
Predictors averaged: id; p-value adjustment method: Bonferroni
estimate_contrasts(model = goggles_afx,
contrast = "lighting",
by = "alcohol",
comparison = "joint",
p_adjust = "bonferroni") |>
display()

| Contrast | alcohol | df1 | df2 | F | p |
|---|---|---|---|---|---|
| lighting | None | 1 | 25 | 1.49 | 0.937 |
| lighting | X2.pints | 1 | 25 | 4.56 | 0.171 |
| lighting | X4.pints | 1 | 25 | 27.17 | < .001 |
| lighting | X6.pints | 1 | 25 | 55.59 | < .001 |
Predictors averaged: id; p-value adjustment method: Bonferroni
Task 15.6
Early in my career I looked at the effect of giving children information about entities. In one study (Field, 2006), I used three novel entities (the quoll, quokka and cuscus) and children were told negative things about one of the entities, positive things about another, and given no information about the third (our control). After the information I asked the children to place their hands in three wooden boxes each of which they believed contained one of the aforementioned entities (field_2006.csv). Draw and interpret error bar and Q-Q plots of the data.
Load the data using (see Tip 2):
field_tib <- discovr::field_2006

Figure 66 shows the mean latency to approach each box. Children were slower to approach the box that they believed contained an animal about which they’d heard threat information. The Q-Q plots (Figure 67) show that latencies are very non-normal for all three boxes. This will be, in part, because if a child hadn’t approached the box within 15 seconds we (for ethical reasons) assumed that they did not want to complete the task, assigned a score of 15 and asked them to approach the next box. These days, given these data, I’d use a robust test (see Task 15.8!), but back when I conducted this research such tests were not so readily available, so I log-transformed the scores to reduce the skew (a practice I’m not keen on these days). This brings us onto Task 15.7!
ggplot(field_tib, aes(x = info_type, y = latency)) +
stat_summary(fun.data = "mean_cl_normal", colour = "#28BBECFF") +
coord_cartesian(ylim = c(0, 10)) +
scale_y_continuous(breaks = seq(0, 10, 1)) +
labs(y = "Latency to approach (s)", x = "Information type") +
theme_minimal()
ggplot(field_tib, aes(sample = latency)) +
qqplotr::stat_qq_band(fill = "#C4F134FF", alpha = 0.3) +
qqplotr::stat_qq_line(colour = "#C4F134FF") +
qqplotr::stat_qq_point(colour = "#3E9BFEFF", alpha = 0.5, size = 1) +
labs(x = "Theoretical quantiles", y = "Sample quantiles") +
facet_wrap(~info_type, scales = "free") +
theme_minimal()
Task 15.7
Log-transform the scores from Task 15.6, make a Q-Q plot of the transformed scores and interpret it.
First we use mutate() to transform the latency scores.
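That chunk isn’t shown in this extract; a minimal sketch of the transformation step (an assumption: a natural log of the raw latencies, though the chapter may, for example, add a constant before logging):

```r
# Create a log-transformed latency variable (a sketch; the exact transform
# used in the original analysis may differ, e.g. log(latency + 1))
field_tib <- field_tib |>
  dplyr::mutate(ln_latency = log(latency))
```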
We obtain the Q-Q plots using the same code, except that we change the variable to be plotted from latency to ln_latency. The Q-Q plots (Figure 68) show that the log latencies for all boxes are a lot more normally distributed than the raw latencies, but the positive and no information latencies still show a lack of normality. This brings us onto Task 15.8!
ggplot(field_tib, aes(sample = ln_latency)) +
qqplotr::stat_qq_band(fill = "#C4F134FF", alpha = 0.3) +
qqplotr::stat_qq_line(colour = "#C4F134FF") +
qqplotr::stat_qq_point(colour = "#3E9BFEFF", alpha = 0.5, size = 1) +
labs(x = "Theoretical quantiles", y = "Sample quantiles") +
facet_wrap(~info_type, scales = "free") +
theme_minimal()
Task 15.8
Analyse the data in Task 15.7 with a robust model. Do children take longer to put their hands in a box that they believe contains an entity about which they have been told nasty things?
The results are shown below and they mirror the analysis that I conducted on the log-transformed values in the paper itself (in case you want to check). The main effect of the type of information was significant, F(1.24, 94.32) = 78.15, p < 0.001. The post hoc tests show a significantly longer time to approach the box containing the negative information animal compared to the positive information animal, \(\hat{\psi}\) = 2.42, pobserved < 0.001, pcrit = 0.017, and compared to the no information box, \(\hat{\psi}\) = 2.07, pobserved < 0.001, pcrit = 0.025. Children also approached the box containing the positive information animal significantly faster than the no information animal, \(\hat{\psi}\) = -0.21, pobserved = 0.014, pcrit = 0.050.
WRS2::rmanova(y = field_tib$latency, groups = field_tib$info_type, blocks = field_tib$id)

Call:
WRS2::rmanova(y = field_tib$latency, groups = field_tib$info_type, 
    blocks = field_tib$id)
Test statistic: F = 78.1521
Degrees of freedom 1: 1.24
Degrees of freedom 2: 94.32
p-value: 0
WRS2::rmmcp(y = field_tib$latency, groups = field_tib$info_type, blocks = field_tib$id)

Call:
WRS2::rmmcp(y = field_tib$latency, groups = field_tib$info_type, 
    blocks = field_tib$id)
psihat ci.lower ci.upper p.value p.crit sig
Threat vs. Positive 2.41558 1.71695 3.11421 0.00000 0.0169 TRUE
Threat vs. No information 2.07013 1.35313 2.78713 0.00000 0.0250 TRUE
Positive vs. No information -0.20597 -0.40537 -0.00658 0.01351 0.0500 TRUE
Chapter 16
Coming soon
Chapter 17
Coming soon
Chapter 18
Coming soon
Chapter 19
Coming soon