Smart Alex solutions Chapter 18

This document contains abridged sections from Discovering Statistics Using R and RStudio by Andy Field so there are some copyright considerations. You can use this material for teaching and non-profit activities but please do not meddle with it or claim it as your own work. See the full license terms at the bottom of the page.

Task 18.1

Rerun the analysis in this chapter using principal component analysis and compare the results to those in the chapter.

Load the data

To load the data from the CSV file (assuming you have set up a project folder as suggested in the book):

raq_tib <- here::here("data/raq.csv") %>%
  readr::read_csv()

Alternatively, load the data directly from the discovr package:

raq_tib <- discovr::raq

Fit the model

All of the descriptives, correlation matrices, KMO tests and so on are unaffected by our choice of principal components as the method of dimension reduction. Also, in the book chapter we did parallel analysis based on components and this suggested 4 components (as did the parallel analysis based on factors). So, follow everything in the book (code and interpretation) up to the point at which we fit the main model.

As a reminder, we set up the correlation matrix to be based on polychoric correlations:

# create tibble that contains only the questionnaire items
raq_items_tib <- raq_tib %>% 
  dplyr::select(-id)
# get the polychoric correlation object
raq_poly <- psych::polychoric(raq_items_tib)
# store the polychoric correlation matrix
raq_cor <- raq_poly$rho

Things start to get different at the point of fitting the model. We can use the same code as the book chapter except that we use the pca() (or principal() if you prefer) function instead of fa(), and we need to remove scores = "tenBerge" because for PCA there is only a single method for computing component scores (and this is used by default). We also need to add rotate = "oblimin" because for PCA the default is to use an orthogonal rotation (varimax). I’ve also changed the name of the object to raq_pca to reflect the fact we’ve done PCA and not factor analysis.

From the raw data:

raq_pca <- psych::pca(raq_items_tib,
                    nfactors = 4,
                    cor = "poly",
                    rotate = "oblimin"
                    )

From the correlation matrix:

raq_pca <- psych::pca(raq_cor,
                    n.obs = 2571,
                    nfactors = 4,
                    rotate = "oblimin"
                    )

To see the output:

raq_pca

Note that the components are labelled TC1 to TC4 (unlike for the factor analysis in the book where the labels were MR1 etc.). We are given some information about how much variance each component accounts for.

##                        TC1  TC2  TC3  TC4
## SS loadings           3.67 2.77 2.60 2.32
## Proportion Var        0.16 0.12 0.11 0.10
## Cumulative Var        0.16 0.28 0.39 0.49
## Proportion Explained  0.32 0.24 0.23 0.20
## Cumulative Proportion 0.32 0.57 0.80 1.00

We see, for example, from Proportion Var that TC1 accounts for 0.16 of the overall variance (16%) and TC2 accounts for 0.12 of the variance (12%) and so on. The Cumulative Var is the proportion of variance explained cumulatively by the components. So, cumulatively, TC1 accounts for 0.16 of the overall variance (16%) and TC1 and TC2 together account for 0.16 + 0.12 = 0.28 of the variance (28%). Importantly, we can see that all four components in combination explain 0.49 of the overall variance (49%).

The Proportion Explained is the proportion of the explained variance that is attributable to each component. So, of the 49% of variance accounted for, 0.32 (32%) is attributable to TC1, 0.24 (24%) to TC2, 0.23 (23%) to TC3 and 0.20 (20%) to TC4.
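As a rough check on how these rows relate to one another, the sketch below recomputes them from the SS loadings row. It assumes 23 items (the number of RAQ items in the loadings table) and uses the rounded values printed in the output, so small rounding differences from the real output are possible. The object names are made up for illustration.

```r
# SS loadings copied (rounded) from the output above
ss_loadings <- c(TC1 = 3.67, TC2 = 2.77, TC3 = 2.60, TC4 = 2.32)
n_items <- 23  # assumption: the 23 RAQ items

# Proportion Var: each component's share of the total variance
prop_var <- ss_loadings / n_items
# Cumulative Var: running total of those shares
cum_var <- cumsum(prop_var)
# Proportion Explained: each component's share of the extracted variance
prop_explained <- ss_loadings / sum(ss_loadings)

round(rbind(prop_var, cum_var, prop_explained), 2)
```

The three rows should reproduce the Proportion Var, Cumulative Var and Proportion Explained lines of the output.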

## 
## Factor analysis with Call: principal(r = r, nfactors = nfactors, residuals = residuals, 
##     rotate = rotate, n.obs = n.obs, covar = covar, scores = scores, 
##     missing = missing, impute = impute, oblique.scores = oblique.scores, 
##     method = method, use = use, cor = cor, correct = 0.5, weight = NULL)
## 
## Test of the hypothesis that 4 factors are sufficient.
## The degrees of freedom for the model is 167  and the objective function was  0.63 
## The number of observations was  2571  with Chi Square =  1614.3  with prob <  4.9e-235 
## 
## The root mean square of the residuals (RMSA) is  0.05 
## 
##  With component correlations of 
##      TC1  TC2  TC3  TC4
## TC1 1.00 0.34 0.38 0.41
## TC2 0.34 1.00 0.21 0.25
## TC3 0.38 0.21 1.00 0.41
## TC4 0.41 0.25 0.41 1.00

The correlations between components are also displayed. These are all non-zero indicating that components are correlated (and oblique rotation was appropriate). It also tells us the degree to which components are correlated. All of the components positively, and fairly strongly, correlate with each other. In other words, the latent constructs represented by the components are related.

In terms of fit

  • The chi-square statistic is $\chi^2(167) = 1614.3$, p < 0.001. This is consistent with when we ran the analysis as factor analysis in the chapter.
  • The RMSR is 0.05.
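The degrees of freedom in the output (167) follow from the number of variables and the number of dimensions extracted. For a model that extracts m factors or components from p variables, the usual formula is df = ((p − m)² − (p + m)) / 2. A minimal sketch, assuming the 23 RAQ items and the 4 components used above (the function name is hypothetical):

```r
# Degrees of freedom for a model extracting m dimensions from p variables
# (standard formula for the factor-model chi-square test)
efa_df <- function(p, m) {
  ((p - m)^2 - (p + m)) / 2
}

efa_df(p = 23, m = 4)  # 167, matching the output above
```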

Let’s look at the loadings (I’ve suppressed values below 0.2 and sorted).

parameters::model_parameters(raq_pca, threshold = 0.2, sort = TRUE) %>% 
  kableExtra::kable(digits = 2)
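| Variable | TC1 | TC2 | TC3 | TC4 | Complexity | Uniqueness |
|:---------|----:|----:|----:|----:|-----------:|-----------:|
| raq_06 | 0.80 | | | | 1.02 | 0.29 |
| raq_18 | 0.69 | | | | 1.03 | 0.49 |
| raq_13 | 0.68 | | | | 1.02 | 0.56 |
| raq_07 | 0.65 | | | | 1.01 | 0.56 |
| raq_10 | 0.63 | | | | 1.09 | 0.63 |
| raq_15 | 0.58 | | | | 1.02 | 0.63 |
| raq_14 | 0.53 | | | | 1.03 | 0.70 |
| raq_05 | 0.50 | | 0.36 | | 1.86 | 0.42 |
| raq_23 | | 0.86 | | | 1.03 | 0.31 |
| raq_09 | | 0.86 | | | 1.03 | 0.30 |
| raq_19 | 0.24 | 0.65 | | | 1.27 | 0.41 |
| raq_22 | | 0.62 | | | 1.16 | 0.48 |
| raq_02 | 0.24 | 0.59 | | | 1.36 | 0.52 |
| raq_16 | | | 0.63 | | 1.02 | 0.63 |
| raq_04 | | | 0.62 | | 1.03 | 0.58 |
| raq_21 | | | 0.59 | | 1.13 | 0.53 |
| raq_12 | | | 0.58 | -0.21 | 1.26 | 0.72 |
| raq_20 | | | 0.56 | | 1.09 | 0.60 |
| raq_03 | | | -0.54 | | 1.01 | 0.70 |
| raq_01 | | | 0.51 | | 1.06 | 0.72 |
| raq_08 | | | | 0.84 | 1.01 | 0.24 |
| raq_11 | | | | 0.83 | 1.00 | 0.30 |
| raq_17 | | | | 0.83 | 1.00 | 0.33 |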

The clusters of items match the book chapter where we used factor analysis instead of PCA. The questions that load highly on TC1 seem to be items that relate to Fear of computers:

  • raq_05: I don’t understand statistics
  • raq_06: I have little experience of computers
  • raq_07: All computers hate me
  • raq_10: Computers are useful only for playing games
  • raq_13: I worry that I will cause irreparable damage because of my incompetence with computers
  • raq_14: Computers have minds of their own and deliberately go wrong whenever I use them
  • raq_15: Computers are out to get me
  • raq_18: R always crashes when I try to use it

Note that item 5 also loads highly onto TC3.

The questions that load highly on TC2 seem to be items that relate to Fear of peer/social evaluation:

  • raq_02: My friends will think I’m stupid for not being able to cope with R
  • raq_09: My friends are better at statistics than me
  • raq_19: Everybody looks at me when I use R
  • raq_22: My friends are better at R than I am
  • raq_23: If I am good at statistics people will think I am a nerd

The questions that load highly on TC3 seem to be items that relate to Fear of statistics:

  • raq_01: Statistics make me cry
  • raq_03: Standard deviations excite me
  • raq_04: I dream that Pearson is attacking me with correlation coefficients
  • raq_05: I don’t understand statistics
  • raq_12: People try to tell you that R makes statistics easier to understand but it doesn’t
  • raq_16: I weep openly at the mention of central tendency
  • raq_20: I can’t sleep for thoughts of eigenvectors
  • raq_21: I wake up under my duvet thinking that I am trapped under a normal distribution

The questions that load highly on TC4 seem to be items that relate to Fear of mathematics:

  • raq_08: I have never been good at mathematics
  • raq_11: I did badly at mathematics at school
  • raq_17: I slip into a coma whenever I see an equation

Basically using PCA hasn’t changed the interpretation.

Task 18.2

The University of Sussex constantly seeks to employ the best people possible as lecturers. They wanted to revise the ‘Teaching of Statistics for Scientific Experiments’ (TOSSE) questionnaire, which is based on Bland’s theory that says that good research methods lecturers should have: (1) a profound love of statistics; (2) an enthusiasm for experimental design; (3) a love of teaching; and (4) a complete absence of normal interpersonal skills. These characteristics should be related (i.e., correlated). The University revised this questionnaire to become the ‘Teaching of Statistics for Scientific Experiments – Revised’ (TOSSE-R). They gave this questionnaire to 661 research methods lecturers to see if it supported Bland’s theory. Conduct a factor analysis using maximum likelihood (with appropriate rotation) and interpret the factor structure.

Load the data

To load the data from the CSV file (assuming you have set up a project folder as suggested in the book):

tosr_tib <- here::here("data/tosser.csv") %>%
  readr::read_csv()

Alternatively, load the data directly from the discovr package:

tosr_tib <- discovr::tosser

Create correlation matrix

The data file has a variable in it containing participants’ ids. Let’s store a version of the data that only has the item scores.

tosr_items_tib <- tosr_tib %>% 
  dplyr::select(-id)

We can create the correlations between variables by executing (again, items were rated on Likert response scales, so we’ll use polychoric correlations).

tosr_poly <- psych::polychoric(tosr_items_tib)
tosr_cor <- tosr_poly$rho

To get a plot of the correlations we can execute:

psych::cor.plot(tosr_cor, upper = FALSE)

The Bartlett test

psych::cortest.bartlett(tosr_cor, n = 661)
## $chisq
## [1] 6392.17
## 
## $p.value
## [1] 0
## 
## $df
## [1] 378

This (basically useless) test confirms that the correlation matrix is significantly different from an identity matrix (i.e. correlations are non-zero).

Determinant of the correlation matrix:

det(tosr_cor)
## [1] 5.345715e-05

The determinant of the correlation matrix was 0.00005345715, which is greater than 0.00001 and, therefore, indicates that multicollinearity is unlikely to be a problem in these data.
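The determinant is also the main ingredient of Bartlett's test. As a sketch (using the textbook form of the statistic, which is assumed here to be what psych::cortest.bartlett() computes), the chi-square and degrees of freedom reported earlier can be approximated from the determinant, the sample size and the number of items. The object names are made up for illustration.

```r
det_r <- 5.345715e-05  # determinant reported above
n <- 661               # sample size
p <- 28                # number of TOSSE-R items

# Bartlett's test of sphericity: chi-square = -(n - 1 - (2p + 5)/6) * log|R|
bartlett_chisq <- -(n - 1 - (2 * p + 5) / 6) * log(det_r)
# Degrees of freedom: number of unique off-diagonal correlations
bartlett_df <- p * (p - 1) / 2

c(chisq = bartlett_chisq, df = bartlett_df)
```

The result should be close to the 6392.17 on 378 degrees of freedom shown in the Bartlett output, with any small difference due to the rounded determinant.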

The KMO test

psych::KMO(tosr_cor)
## Kaiser-Meyer-Olkin factor adequacy
## Call: psych::KMO(r = tosr_cor)
## Overall MSA =  0.91
## MSA for each item = 
## tosr_01 tosr_02 tosr_03 tosr_04 tosr_05 tosr_06 tosr_07 tosr_08 tosr_09 tosr_10 
##    0.85    0.93    0.76    0.89    0.86    0.93    0.96    0.87    0.87    0.92 
## tosr_11 tosr_12 tosr_13 tosr_14 tosr_15 tosr_16 tosr_17 tosr_18 tosr_19 tosr_20 
##    0.96    0.93    0.82    0.85    0.87    0.83    0.83    0.96    0.94    0.95 
## tosr_21 tosr_22 tosr_23 tosr_24 tosr_25 tosr_26 tosr_27 tosr_28 
##    0.81    0.83    0.91    0.88    0.96    0.94    0.93    0.93

The KMO measure of sampling adequacy is 0.91, which is above Kaiser’s (1974) recommendation of 0.5. This value is also ‘marvellous.’ KMO values for individual items ranged from 0.76 to 0.96. As such, the evidence suggests that the sample size is adequate to yield distinct and reliable factors.
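Kaiser's descriptive labels can be attached to KMO values mechanically. The helper below is a hypothetical convenience function (it is not part of psych), and the cut-points are the ones usually quoted from Kaiser (1974): below 0.5 is unacceptable, followed by miserable, mediocre, middling, meritorious and marvellous in steps of 0.1.

```r
# Kaiser's (1974) descriptive labels for KMO values
kmo_label <- function(kmo) {
  cut(
    kmo,
    breaks = c(-Inf, 0.5, 0.6, 0.7, 0.8, 0.9, Inf),
    labels = c("unacceptable", "miserable", "mediocre",
               "middling", "meritorious", "marvellous"),
    right = FALSE  # intervals are [lower, upper), so 0.9 counts as marvellous
  )
}

# Overall MSA, then the lowest and highest item values from the output above
as.character(kmo_label(c(0.91, 0.76, 0.96)))
```

For these three values the labels come out as marvellous, middling and marvellous.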

Distributions for items

tosr_tidy_tib <- tosr_items_tib %>% 
  tidyr::pivot_longer(
    cols = tosr_01:tosr_28,
    names_to = "Item",
    values_to = "Response"
  ) %>% 
  dplyr::mutate(
    Item = gsub("tosr_", "TOSSER ", Item)
  )

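# one histogram per item (ggplot2:: prefixes mean the package need not be attached)
ggplot2::ggplot(tosr_tidy_tib, ggplot2::aes(Response)) +
  ggplot2::geom_histogram(binwidth = 1, fill = "#136CB9", colour = "#136CB9", alpha = 0.5) +
  ggplot2::labs(y = "Frequency") +
  ggplot2::facet_wrap(~ Item, ncol = 6) +
  ggplot2::theme_minimal()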

Parallel analysis

psych::fa.parallel(tosr_cor, fm = "ml", fa = "fa", n.obs = 661)

## Parallel analysis suggests that the number of factors =  4  and the number of components =  NA

Based on parallel analysis four factors should be extracted.

Factor analysis

Create the factor analysis object. The question asks us to use maximum likelihood so I have included fm = "ml". We choose an oblique rotation (the default) because the question says that the constructs we’re measuring are related.

tosr_fa <- psych::fa(tosr_cor,
                    n.obs = 661,
                    fm = "ml",
                    nfactors = 4,
                    scores = "tenBerge"
                    )
summary(tosr_fa)
## 
## Factor analysis with Call: psych::fa(r = tosr_cor, nfactors = 4, n.obs = 661, scores = "tenBerge", 
##     fm = "ml")
## 
## Test of the hypothesis that 4 factors are sufficient.
## The degrees of freedom for the model is 272  and the objective function was  0.7 
## The number of observations was  661  with Chi Square =  450.06  with prob <  6.1e-11 
## 
## The root mean square of the residuals (RMSA) is  0.02 
## The df corrected root mean square of the residuals is  0.03 
## 
## Tucker Lewis Index of factoring reliability =  0.959
## RMSEA index =  0.031  and the 10 % confidence intervals are  0.026 0.037
## BIC =  -1316.24
##  With factor correlations of 
##       ML1  ML3  ML2   ML4
## ML1  1.00 0.08 0.11 -0.39
## ML3  0.08 1.00 0.32  0.23
## ML2  0.11 0.32 1.00  0.31
## ML4 -0.39 0.23 0.31  1.00

In terms of fit

  • The chi-square statistic is $\chi^2(272) = 450.06$, p < 0.001.
  • The Tucker Lewis Index of factoring reliability (TLI) is given as 0.959, which rounds to 0.96 and so sits at the 0.96 threshold rather than clearly above it.
  • The RMSEA = 0.031 90% CI [0.026, 0.037], which is below 0.05.
  • The RMSR is 0.02, which is smaller than both 0.09 and 0.06.

Remember that we’re looking for a combination of TLI > 0.96 and SRMR (RMSR in the output) < 0.06, and a combination of RMSEA < 0.05 and SRMR < 0.09. With the caveat that universal cut-offs need to be taken with a pinch of salt, it’s reasonable to conclude that the model has good fit.
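The two combinations of cut-offs can be written out as explicit checks. This is only a sketch using the values printed in the summary output; the object names are made up, and psych's RMSR is treated as the SRMR, as in the text.

```r
# Values copied from the summary() output above
tli <- 0.959
rmsea <- 0.031
rmsr <- 0.02  # psych's RMSR, used here as the SRMR

# The two combinations of cut-offs described above
combo_tli_srmr <- tli > 0.96 && rmsr < 0.06
combo_rmsea_srmr <- rmsea < 0.05 && rmsr < 0.09

c(tli_and_srmr = combo_tli_srmr, rmsea_and_srmr = combo_rmsea_srmr)
```

With the unrounded TLI of 0.959 the first combination is missed by a hair while the second is met comfortably, which is one reason to treat the cut-offs as guides rather than strict rules.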

Inspect the factor loadings:

parameters::model_parameters(tosr_fa, threshold = 0.2, sort = TRUE) %>% 
  knitr::kable(digits = 2)
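| Variable | Loadings above 0.2 (in column order ML1, ML3, ML2, ML4) | Complexity | Uniqueness |
|:---------|:--------------------------------------------------------|-----------:|-----------:|
| tosr_02 | -0.79 | 1.07 | 0.42 |
| tosr_19 | 0.76 | 1.12 | 0.30 |
| tosr_20 | 0.72 | 1.15 | 0.44 |
| tosr_10 | 0.69, 0.32 | 1.49 | 0.41 |
| tosr_26 | 0.69 | 1.12 | 0.43 |
| tosr_25 | 0.68 | 1.07 | 0.49 |
| tosr_06 | -0.66 | 1.07 | 0.55 |
| tosr_07 | 0.62 | 1.08 | 0.56 |
| tosr_27 | -0.59 | 1.07 | 0.68 |
| tosr_11 | 0.54 | 1.35 | 0.58 |
| tosr_18 | 0.48 | 1.29 | 0.69 |
| tosr_14 | 0.65, 0.27 | 1.48 | 0.35 |
| tosr_17 | 0.63 | 1.18 | 0.58 |
| tosr_16 | 0.55 | 1.20 | 0.68 |
| tosr_22 | 0.50 | 1.06 | 0.78 |
| tosr_08 | 0.49, -0.26 | 1.73 | 0.70 |
| tosr_13 | 0.48 | 1.14 | 0.79 |
| tosr_09 | 0.34, 0.34 | 2.37 | 0.68 |
| tosr_21 | 0.62 | 1.07 | 0.60 |
| tosr_04 | 0.26, 0.56 | 1.62 | 0.44 |
| tosr_01 | 0.52 | 1.13 | 0.70 |
| tosr_03 | 0.46 | 1.60 | 0.81 |
| tosr_24 | 0.40 | 1.29 | 0.77 |
| tosr_15 | 0.36, 0.33 | 2.10 | 0.65 |
| tosr_05 | 0.56 | 1.07 | 0.67 |
| tosr_28 | -0.35, 0.48 | 1.98 | 0.46 |
| tosr_23 | -0.28, 0.44 | 1.72 | 0.62 |
| tosr_12 | -0.22, 0.36 | 1.96 | 0.76 |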

Factor 1

This factor seems to relate to teaching.

  • Q2: I wish students would stop bugging me with their shit.
  • Q19: I like to help students
  • Q20: Passing on knowledge is the greatest gift you can bestow an individual
  • Q10: I could spend all day explaining statistics to people
  • Q26: I spend lots of time helping students
  • Q25: I love teaching
  • Q6: Teaching others makes me want to swallow a large bottle of bleach because the pain of my burning oesophagus would be light relief in comparison
  • Q7: Helping others to understand sums of squares is a great feeling
  • Q27: I love teaching because students have to pretend to like me or they’ll get bad marks
  • Q11: I like it when people tell me I’ve helped them to understand factor rotation
  • Q18: Standing in front of 300 people in no way makes me lose control of my bowels

Factor 2

This factor seems to relate to research methods.

  • Q14: I’d rather think about appropriate outcome variables than go for a drink with my friends
  • Q17: I enjoy sitting in the park contemplating whether to use participant observation in my next experiment
  • Q16: Thinking about whether to use repeated or independent measures thrills me
  • Q22: I quiver with excitement when thinking about designing my next experiment
  • Q8: I like control conditions
  • Q13: Designing experiments is fun
  • Q9: I calculate 3 ANOVAs in my head before getting out of bed [equally loaded on factor 3]

Factor 3

This factor seems to relate to statistics.

  • Q9: I calculate 3 ANOVAs in my head before getting out of bed [equally loaded on factor 2]
  • Q21: Thinking about Bonferroni corrections gives me a tingly feeling in my groin
  • Q4: I worship at the shrine of Pearson
  • Q1: I once woke up in the middle of a vegetable patch hugging a turnip that I’d mistakenly dug up thinking it was Roy’s largest root
  • Q3: I memorize probability values for the F-distribution
  • Q24: I tried to build myself a time machine so that I could go back to the 1930s and follow Mahalanobis on my hands and knees licking the ground on which he’d just trodden
  • Q15: I soil my pants with excitement at the mere mention of factor analysis [equally loaded on factor 4]

Factor 4

This factor seems to relate to social functioning. Not sure where the soiling pants comes in but probably if you’re the sort of person who soils their pants at the mention of factor analysis then things are going to get socially awkward for you sooner rather than later.

  • Q5: I still live with my mother and have little personal hygiene
  • Q28: My cat is my only friend
  • Q23: I often spend my spare time talking to the pigeons … and even they die of boredom
  • Q12: People fall asleep as soon as I open my mouth to speak
  • Q15: I soil my pants with excitement at the mere mention of factor analysis [equally loaded on factor 4]

Task 18.3

Dr Sian Williams (University of Brighton) devised a questionnaire to measure organizational ability. She predicted five components to do with organizational ability: (1) preference for organization; (2) goal achievement; (3) planning approach; (4) acceptance of delays; and (5) preference for routine. Williams’s questionnaire contains 28 items using a seven-point Likert scale (1 = strongly disagree, 4 = neither, 7 = strongly agree). She gave it to 239 people. Run a factor analysis (following the settings in this chapter) on the data in williams.csv.

Load the data

To load the data from the CSV file (assuming you have set up a project folder as suggested in the book):

org_tib <- here::here("data/williams.csv") %>%
  readr::read_csv()

Alternatively, load the data directly from the discovr package:

org_tib <- discovr::williams

Fit the model

The questionnaire items are as follows:

  1. I like to have a plan to work to in everyday life
  2. I feel frustrated when things don’t go to plan
  3. I get most things done in a day that I want to
  4. I stick to a plan once I have made it
  5. I enjoy spontaneity and uncertainty
  6. I feel frustrated if I can’t find something I need
  7. I find it difficult to follow a plan through
  8. I am an organized person
  9. I like to know what I have to do in a day
  10. Disorganized people annoy me
  11. I leave things to the last minute
  12. I have many different plans relating to the same goal
  13. I like to have my documents filed and in order
  14. I find it easy to work in a disorganized environment
  15. I make ‘to do’ lists and achieve most of the things on it
  16. My workspace is messy and disorganized
  17. I like to be organized
  18. Interruptions to my daily routine annoy me
  19. I feel that I am wasting my time
  20. I forget the plans I have made
  21. I prioritize the things I have to do
  22. I like to work in an organized environment
  23. I feel relaxed when I don’t have a routine
  24. I set deadlines for myself and achieve them
  25. I change rather aimlessly from one activity to another during the day
  26. I have trouble organizing the things I have to do
  27. I put tasks off to another day
  28. I feel restricted by schedules and plans

Create correlation matrix

The data file has a variable in it containing participants’ ids. Let’s store a version of the data that only has the item scores.

org_items_tib <- org_tib %>% 
  dplyr::select(-id)

We can create the correlations between variables by executing (again, items were rated on Likert response scales, so we’ll use polychoric correlations).

org_poly <- psych::polychoric(org_items_tib)
org_cor <- org_poly$rho

To get a plot of the correlations we can execute:

psych::cor.plot(org_cor, upper = FALSE)

The Bartlett test

psych::cortest.bartlett(org_cor, n = 239)
## $chisq
## [1] 3679.19
## 
## $p.value
## [1] 0
## 
## $df
## [1] 378

This (basically useless) test confirms that the correlation matrix is significantly different from an identity matrix (i.e. correlations are non-zero).

Determinant of the correlation matrix:

det(org_cor)
## [1] 9.699541e-08

The determinant of the correlation matrix was 0.00000009699541, which is smaller than 0.00001 and, therefore, indicates that multicollinearity could be a problem in these data.
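To see why a tiny determinant signals redundancy among variables, here is a toy comparison. Both matrices and the helper function are made up for illustration; the cut-off is the 0.00001 rule of thumb used in the text.

```r
# Two made-up correlation matrices: unrelated variables vs. near-duplicates
r_independent <- diag(3)
r_redundant <- matrix(c(1, 0.999999,
                        0.999999, 1), nrow = 2)

# Flag a determinant at or below the 0.00001 rule of thumb
det_too_small <- function(r, cutoff = 1e-5) det(r) <= cutoff

det_too_small(r_independent)  # FALSE: determinant is 1
det_too_small(r_redundant)    # TRUE: determinant is about 0.000002
```

The closer any set of variables comes to being perfectly predictable from the others, the closer the determinant gets to zero.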

The KMO test

psych::KMO(org_cor)
## Kaiser-Meyer-Olkin factor adequacy
## Call: psych::KMO(r = org_cor)
## Overall MSA =  0.88
## MSA for each item = 
## org_01 org_02 org_03 org_04 org_05 org_06 org_07 org_08 org_09 org_10 org_11 
##   0.93   0.78   0.84   0.82   0.85   0.76   0.83   0.94   0.94   0.90   0.91 
## org_12 org_13 org_14 org_15 org_16 org_17 org_18 org_19 org_20 org_21 org_22 
##   0.57   0.93   0.88   0.88   0.94   0.91   0.85   0.76   0.79   0.85   0.88 
## org_23 org_24 org_25 org_26 org_27 org_28 
##   0.82   0.89   0.87   0.90   0.86   0.82

The KMO measure of sampling adequacy is 0.88, which is above Kaiser’s (1974) recommendation of 0.5. This value is also ‘meritorious’ (and almost ‘marvellous’). KMO values for individual items ranged from 0.57 to 0.94. As such, the evidence suggests that the sample size is adequate to yield distinct and reliable factors.

Distributions for items

org_tidy_tib <- org_items_tib %>% 
  tidyr::pivot_longer(
    cols = org_01:org_28,
    names_to = "Item",
    values_to = "Response"
  ) %>% 
  dplyr::mutate(
    Item = gsub("org_", "ORG ", Item)
  )

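# one histogram per item (ggplot2:: prefixes mean the package need not be attached)
ggplot2::ggplot(org_tidy_tib, ggplot2::aes(Response)) +
  ggplot2::geom_histogram(binwidth = 1, fill = "#136CB9", colour = "#136CB9", alpha = 0.5) +
  ggplot2::labs(y = "Frequency") +
  ggplot2::facet_wrap(~ Item, ncol = 6) +
  ggplot2::theme_minimal()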

Parallel analysis

psych::fa.parallel(org_cor, fm = "ml", fa = "fa", n.obs = 239)

## Parallel analysis suggests that the number of factors =  5  and the number of components =  NA

Based on parallel analysis five factors should be extracted.

Factor analysis

Create the factor analysis object. The question asks us to follow the settings in the chapter, so I have used minimum residual extraction (fm = "minres", which is also the default). We choose an oblique rotation (the default), as we did in the chapter.

org_fa <- psych::fa(org_cor,
                    n.obs = 239,
                    fm = "minres",
                    nfactors = 5
                    )
summary(org_fa)
## 
## Factor analysis with Call: psych::fa(r = org_cor, nfactors = 5, n.obs = 239, fm = "minres")
## 
## Test of the hypothesis that 5 factors are sufficient.
## The degrees of freedom for the model is 248  and the objective function was  2.51 
## The number of observations was  239  with Chi Square =  563.4  with prob <  1.4e-26 
## 
## The root mean square of the residuals (RMSA) is  0.04 
## The df corrected root mean square of the residuals is  0.04 
## 
## Tucker Lewis Index of factoring reliability =  0.852
## RMSEA index =  0.073  and the 10 % confidence intervals are  0.065 0.081
## BIC =  -794.77
##  With factor correlations of 
##      MR1   MR2  MR4  MR3   MR5
## MR1 1.00  0.35 0.45 0.38  0.30
## MR2 0.35  1.00 0.35 0.21 -0.08
## MR4 0.45  0.35 1.00 0.22  0.29
## MR3 0.38  0.21 0.22 1.00  0.25
## MR5 0.30 -0.08 0.29 0.25  1.00

In terms of fit

  • The chi-square statistic is $\chi^2(248) = 563.4$, p < 0.001.
  • The Tucker Lewis Index of factoring reliability (TLI) is given as 0.85, which is well below 0.96.
  • The RMSEA = 0.073 90% CI [0.065, 0.081], which is greater than 0.05.
  • The RMSR is 0.04, which is smaller than both 0.09 and 0.06.

Remember that we’re looking for a combination of TLI > 0.96 and SRMR (RMSR in the output) < 0.06, and a combination of RMSEA < 0.05 and SRMR < 0.09. With the caveat that universal cut-offs need to be taken with a pinch of salt, it’s reasonable to conclude that the model has poor fit.
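To see roughly where the RMSEA comes from, the sketch below uses a common textbook formula, RMSEA = sqrt(max((χ² − df) / (df (n − 1)), 0)). The psych package may compute it slightly differently, so agreement with the output should be treated as approximate. The object names are made up for illustration.

```r
chisq <- 563.4  # chi-square from the output above
df <- 248       # model degrees of freedom
n <- 239        # sample size

# RMSEA: misfit per degree of freedom, adjusted for sample size
rmsea <- sqrt(max((chisq - df) / (df * (n - 1)), 0))
round(rmsea, 3)
```

With these inputs the value comes out at about 0.073, in line with the RMSEA index in the output.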

Inspect the factor loadings:

parameters::model_parameters(org_fa, threshold = 0.2, sort = TRUE) %>% 
  knitr::kable(digits = 2)
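| Variable | Loadings above 0.2 (in column order MR1, MR2, MR4, MR3, MR5) | Complexity | Uniqueness |
|:---------|:-------------------------------------------------------------|-----------:|-----------:|
| org_14 | 0.78, -0.27 | 1.38 | 0.34 |
| org_16 | 0.78 | 1.14 | 0.35 |
| org_22 | 0.77, 0.20 | 1.22 | 0.19 |
| org_17 | 0.72 | 1.25 | 0.25 |
| org_08 | 0.50, 0.30, 0.22 | 2.31 | 0.28 |
| org_13 | 0.49, 0.24 | 1.62 | 0.51 |
| org_10 | 0.40, 0.27 | 1.94 | 0.65 |
| org_19 | 0.63 | 1.23 | 0.60 |
| org_27 | 0.61, 0.34 | 1.58 | 0.39 |
| org_25 | 0.60 | 1.12 | 0.52 |
| org_20 | 0.58 | 1.09 | 0.63 |
| org_26 | 0.38, 0.47, -0.25 | 2.56 | 0.42 |
| org_07 | 0.46, 0.32 | 1.90 | 0.57 |
| org_11 | 0.26, 0.36, 0.21, 0.20 | 3.30 | 0.47 |
| org_24 | 0.69 | 1.22 | 0.39 |
| org_03 | 0.25, 0.55 | 1.55 | 0.51 |
| org_04 | 0.26, 0.49, 0.24 | 2.21 | 0.49 |
| org_21 | 0.43, 0.45 | 2.05 | 0.46 |
| org_15 | 0.36, -0.22, 0.44 | 2.72 | 0.56 |
| org_01 | 0.31, 0.43, 0.22 | 3.11 | 0.38 |
| org_23 | 0.66 | 1.25 | 0.49 |
| org_05 | 0.65 | 1.10 | 0.51 |
| org_28 | 0.63 | 1.18 | 0.53 |
| org_12 | 0.33 | 1.70 | 0.87 |
| org_02 | 0.71 | 1.04 | 0.45 |
| org_06 | 0.70 | 1.06 | 0.53 |
| org_18 | 0.28, 0.47 | 2.04 | 0.55 |
| org_09 | 0.32, 0.32, 0.34 | 3.46 | 0.34 |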

Factor 1

This factor seems to relate to preference for organization.

  • Q14: I find it easy to work in a disorganized environment
  • Q16: My workspace is messy and disorganized
  • Q22: I like to work in an organized environment
  • Q17: I like to be organized
  • Q8: I am an organized person
  • Q13: I like to have my documents filed and in order
  • Q10: Disorganized people annoy me

Factor 2

This factor seems to relate to goal achievement (it probably depends how you define goal achievement but does seem to relate to your ability to follow a plan through!).

  • Q19: I feel that I am wasting my time
  • Q27: I put tasks off to another day
  • Q25: I change rather aimlessly from one activity to another during the day
  • Q20: I forget the plans I have made
  • Q26: I have trouble organizing the things I have to do
  • Q7: I find it difficult to follow a plan through
  • Q11: I leave things to the last minute

Factor 3

This factor seems to relate to planning approach.

  • Q24: I set deadlines for myself and achieve them
  • Q3: I get most things done in a day that I want to
  • Q4: I stick to a plan once I have made it
  • Q21: I prioritize the things I have to do
  • Q15: I make ‘to do’ lists and achieve most of the things on it
  • Q1: I like to have a plan to work to in everyday life

Factor 4

This factor seems to relate to preference for routine.

  • Q23: I feel relaxed when I don’t have a routine
  • Q5: I enjoy spontaneity and uncertainty
  • Q28: I feel restricted by schedules and plans
  • Q12: I have many different plans relating to the same goal

Factor 5

This factor seems to relate to acceptance of delays.

  • Q2: I feel frustrated when things don’t go to plan
  • Q6: I feel frustrated if I can’t find something I need
  • Q18: Interruptions to my daily routine annoy me
  • Q9: I like to know what I have to do in a day

It seems as though there is some factorial validity to the hypothesized structure. (But remember that this model has poor fit.)

Task 18.4

Zibarras et al. (2008) looked at the relationship between personality and creativity. They used the Hogan Development Survey (HDS), which measures 11 dysfunctional dispositions of employed adults: being volatile, mistrustful, cautious, detached, passive_aggressive, arrogant, manipulative, dramatic, eccentric, perfectionist, and dependent. Zibarras et al. wanted to reduce these 11 traits down and, based on parallel analysis, found that they could be reduced to three components. They ran a principal component analysis with varimax rotation. Repeat this analysis (zibarras_2008.csv) to see which personality dimensions clustered together (see page 210 of the original paper).

Load the data

To load the data from the CSV file (assuming you have set up a project folder as suggested in the book):

zibarras_tib <- here::here("data/zibarras_2008.csv") %>%
  readr::read_csv()

Alternatively, load the data directly from the discovr package:

zibarras_tib <- discovr::zibarras_2008

Like the authors, I ran the analysis with principal components and varimax rotation.

Create correlation matrix

The data file has a variable in it containing participants’ ids. Let’s store a version of the data that only has the item scores.

zibarras_tib <- zibarras_tib %>% 
  dplyr::select(-id)

We can create the correlations between variables by executing.

zibarras_cor <- cor(zibarras_tib)

To get a plot of the correlations we can execute:

psych::cor.plot(zibarras_cor, upper = FALSE)

The Bartlett test

psych::cortest.bartlett(zibarras_cor, n = 207)
## $chisq
## [1] 527.8976
## 
## $p.value
## [1] 1.841689e-78
## 
## $df
## [1] 55

This (basically useless) test confirms that the correlation matrix is significantly different from an identity matrix (i.e. correlations are non-zero).

The KMO test

psych::KMO(zibarras_cor)
## Kaiser-Meyer-Olkin factor adequacy
## Call: psych::KMO(r = zibarras_cor)
## Overall MSA =  0.68
## MSA for each item = 
##           volatile        mistrustful           cautious           detached 
##               0.52               0.62               0.71               0.56 
## passive_aggressive           arrogant       manipulative           dramatic 
##               0.50               0.76               0.81               0.72 
##          eccentric         perfectist          dependent 
##               0.82               0.54               0.58

The KMO measure of sampling adequacy is 0.68, which is above Kaiser’s (1974) recommendation of 0.5. KMO values for individual items ranged from 0.5 to 0.82. The overall value is in the ‘mediocre’ range, and the item values run from ‘miserable’ (0.50) to ‘meritorious’ (0.82). The sample size is probably adequate to yield distinct and reliable factors.

Distributions for items

zib_tidy_tib <- zibarras_tib %>% 
  tidyr::pivot_longer(
    cols = volatile:dependent,
    names_to = "Item",
    values_to = "Response"
  ) %>% 
  dplyr::mutate(
    Item = stringr::str_to_sentence(Item)
  )

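# one histogram per trait (ggplot2:: prefixes mean the package need not be attached)
ggplot2::ggplot(zib_tidy_tib, ggplot2::aes(Response)) +
  ggplot2::geom_histogram(binwidth = 1, fill = "#136CB9", colour = "#136CB9", alpha = 0.5) +
  ggplot2::labs(y = "Frequency") +
  ggplot2::facet_wrap(~ Item, ncol = 3) +
  ggplot2::theme_minimal()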

Parallel analysis

psych::fa.parallel(zibarras_cor, fa = "pc", n.obs = 207)

## Parallel analysis suggests that the number of factors =  NA  and the number of components =  3

Based on parallel analysis three components should be extracted (as the authors did in the paper).
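To make the logic of parallel analysis concrete, the sketch below compares eigenvalues from data with a known structure against eigenvalues from random data of the same size. It is a simplified, component-based version run on simulated data, not a reimplementation of psych::fa.parallel() (which offers more options and may use different benchmarks). The simulated structure, seed and object names are all assumptions for illustration.

```r
set.seed(2008)  # arbitrary seed so the simulation is reproducible

# Simulated data: 6 variables driven by 2 uncorrelated latent dimensions
n <- 500
f1 <- rnorm(n)
f2 <- rnorm(n)
sim <- cbind(
  sapply(1:3, function(i) 0.8 * f1 + 0.6 * rnorm(n)),
  sapply(1:3, function(i) 0.8 * f2 + 0.6 * rnorm(n))
)

# Eigenvalues of the observed correlation matrix
obs_eigen <- eigen(cor(sim), symmetric = TRUE, only.values = TRUE)$values

# Eigenvalues from random data of the same size, averaged over replications
rand_eigen <- replicate(100, {
  noise <- matrix(rnorm(n * ncol(sim)), nrow = n)
  eigen(cor(noise), symmetric = TRUE, only.values = TRUE)$values
})
mean_rand <- rowMeans(rand_eigen)

# Retain the leading components whose eigenvalue beats the random benchmark
beats <- obs_eigen > mean_rand
n_keep <- if (all(beats)) length(beats) else which(!beats)[1] - 1
n_keep
```

Because the simulated variables were built from two strong dimensions, the count retained should typically be two; with real data the answer depends on how far the observed eigenvalues rise above the random ones.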

PCA

Create the PCA object. We choose an orthogonal rotation (varimax) because that’s what the authors did - this is the default for PCA so we don’t need to specify it explicitly.

zib_pca <- psych::pca(zibarras_cor,
                    n.obs = 207,
                    nfactors = 3
                    )
summary(zib_pca)
## 
## Factor analysis with Call: principal(r = r, nfactors = nfactors, residuals = residuals, 
##     rotate = rotate, n.obs = n.obs, covar = covar, scores = scores, 
##     missing = missing, impute = impute, oblique.scores = oblique.scores, 
##     method = method, use = use, cor = cor, correct = 0.5, weight = NULL)
## 
## Test of the hypothesis that 3 factors are sufficient.
## The degrees of freedom for the model is 25  and the objective function was  0.92 
## The number of observations was  207  with Chi Square =  182.93  with prob <  5.7e-26 
## 
## The root mean square of the residuals (RMSA) is  0.1

Inspect the factor loadings:

parameters::model_parameters(zib_pca, threshold = 0.2, sort = TRUE) %>% 
  knitr::kable(digits = 2)
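| Variable | Loadings above 0.2 (in column order RC1, RC2, RC3) | Complexity | Uniqueness |
|:---------|:---------------------------------------------------|-----------:|-----------:|
| dramatic | 0.83 | 1.11 | 0.27 |
| manipulative | 0.79 | 1.03 | 0.37 |
| arrogant | 0.68 | 1.06 | 0.53 |
| cautious | -0.66, 0.50 | 1.87 | 0.31 |
| eccentric | 0.55, 0.30 | 1.73 | 0.59 |
| perfectist | -0.32 | 1.09 | 0.89 |
| volatile | 0.79 | 1.03 | 0.37 |
| mistrustful | 0.27, 0.68, 0.24 | 1.58 | 0.40 |
| detached | -0.28, 0.76 | 1.39 | 0.30 |
| dependent | -0.38, 0.33, -0.64 | 2.21 | 0.33 |
| passive_aggressive | 0.62 | 1.17 | 0.59 |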

The output shows the rotated component matrix, from which we see this pattern:

  • Component 1:
    • Dramatic
    • Manipulative
    • Arrogant
    • Cautious (negative weight)
    • Eccentric
    • Perfectionist (negative weight)
  • Component 2:
    • Volatile
    • Mistrustful
  • Component 3:
    • Detached
    • Dependent (negative weight)
    • Passive-aggressive
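The Uniqueness column in the loadings table is the share of each variable's variance that the retained components do not capture, that is, 1 minus the communality. A minimal sketch of that relationship, using a made-up 4 × 4 correlation matrix and unrotated principal components computed with base R rather than psych:

```r
# Toy correlation matrix for four variables (made-up values)
toy_cor <- matrix(
  c(1.0, 0.6, 0.2, 0.1,
    0.6, 1.0, 0.1, 0.2,
    0.2, 0.1, 1.0, 0.5,
    0.1, 0.2, 0.5, 1.0),
  nrow = 4, byrow = TRUE,
  dimnames = list(paste0("v", 1:4), paste0("v", 1:4))
)

# Unrotated component loadings: eigenvectors scaled by sqrt(eigenvalue)
eig <- eigen(toy_cor, symmetric = TRUE)
m <- 2  # number of components retained
loadings <- eig$vectors[, 1:m] %*% diag(sqrt(eig$values[1:m]))
rownames(loadings) <- rownames(toy_cor)

# Communality: variance in each variable captured by the retained components
communality <- rowSums(loadings^2)
# Uniqueness: what is left over
uniqueness <- 1 - communality

round(cbind(communality, uniqueness), 2)
```

An orthogonal rotation such as varimax redistributes variance among the retained components but leaves these row sums, and therefore the uniquenesses, unchanged.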

Compare these results to those of Zibarras et al. (Table 4 from the original paper reproduced below), and note that they are the same.

Table 4 from Zibarras et al. (2008)