# Smart Alex solutions Chapter 18

This document contains abridged sections from Discovering Statistics Using R and RStudio by Andy Field so there are some copyright considerations. You can use this material for teaching and non-profit activities but please do not meddle with it or claim it as your own work. See the full license terms at the bottom of the page.

## Task 18.1

Rerun the analysis in this chapter using principal component analysis and compare the results to those in the chapter.

To load the data from the CSV file (assuming you have set up a project folder as suggested in the book):

raq_tib <- here::here("data/raq.csv") %>%
  readr::read_csv()


Alternatively, load the data directly from the discovr package:

raq_tib <- discovr::raq


### Fit the model

All of the descriptives, correlation matrices, KMO tests and so on are unaffected by our choice of principal components as the method of dimension reduction. Also, in the book chapter we did parallel analysis based on factors, and this suggested four factors (as did the parallel analysis based on components). So, follow everything in the book (code and interpretation) up to the point at which we fit the main model.

As a reminder, we set up the correlation matrix to be based on polychoric correlations:

# create tibble that contains only the questionnaire items
raq_items_tib <- raq_tib %>%
  dplyr::select(-id)
# get the polychoric correlation object
raq_poly <- psych::polychoric(raq_items_tib)
# store the polychoric correlation matrix
raq_cor <- raq_poly$rho

Things start to get different at the point of fitting the model. We can use the same code as the book chapter except that we use the pca() (or principal() if you prefer) function instead of fa(), and we need to remove scores = "tenBerge" because for PCA there is only a single method for computing component scores (and this is used by default). We also need to add rotate = "oblimin" because for PCA the default is an orthogonal rotation (varimax). I’ve also changed the name of the object in which we store the model to raq_pca to reflect the fact that we’ve done PCA and not factor analysis.

From the raw data:

raq_pca <- psych::pca(raq_items_tib,
  nfactors = 4,
  cor = "poly",
  rotate = "oblimin"
)

From the correlation matrix:

raq_pca <- psych::pca(raq_cor,
  n.obs = 2571,
  nfactors = 4,
  rotate = "oblimin"
)

To see the output:

raq_pca

Note that the components are labelled TC1 to TC4 (unlike the factor analysis in the book, where the labels were MR1 etc.). We are given some information about how much variance each component accounts for.

##                        TC1  TC2  TC3  TC4
## SS loadings           3.67 2.77 2.60 2.32
## Proportion Var        0.16 0.12 0.11 0.10
## Cumulative Var        0.16 0.28 0.39 0.49
## Proportion Explained  0.32 0.24 0.23 0.20
## Cumulative Proportion 0.32 0.57 0.80 1.00

We see, for example, from Proportion Var that TC1 accounts for 0.16 of the overall variance (16%), TC2 accounts for 0.12 of the variance (12%) and so on. Cumulative Var is the proportion of variance explained cumulatively by the components. So, cumulatively, TC1 accounts for 0.16 of the overall variance (16%), and TC1 and TC2 together account for 0.16 + 0.12 = 0.28 of the variance (28%). Importantly, we can see that all four components in combination explain 0.49 of the overall variance (49%). Proportion Explained is the proportion of the explained variance that is attributable to each component.
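The variance rows in this output are simple functions of the SS loadings. As a sketch (base R only; the SS loadings are typed in from the output rather than extracted from raq_pca, and 23 is the number of RAQ items), dividing each SS loading by the number of items gives Proportion Var, and dividing by the total of the SS loadings gives Proportion Explained:

```r
# SS loadings of the four rotated components, copied from the output above
ss_loadings <- c(TC1 = 3.67, TC2 = 2.77, TC3 = 2.60, TC4 = 2.32)
n_items <- 23  # raq_01 to raq_23

prop_var <- ss_loadings / n_items                 # Proportion Var
cum_var <- cumsum(prop_var)                       # Cumulative Var
prop_explained <- ss_loadings / sum(ss_loadings)  # Proportion Explained
cum_prop <- cumsum(prop_explained)                # Cumulative Proportion

round(rbind(prop_var, cum_var, prop_explained, cum_prop), 2)
```

Rounded to two decimals, these reproduce the Proportion Var, Cumulative Var, Proportion Explained and Cumulative Proportion rows.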
So, of the 49% of variance accounted for, 0.32 (32%) is attributable to TC1, 0.24 (24%) to TC2, 0.23 (23%) to TC3 and 0.20 (20%) to TC4.

##
## Factor analysis with Call: principal(r = r, nfactors = nfactors, residuals = residuals,
##     rotate = rotate, n.obs = n.obs, covar = covar, scores = scores,
##     missing = missing, impute = impute, oblique.scores = oblique.scores,
##     method = method, use = use, cor = cor, correct = 0.5, weight = NULL)
##
## Test of the hypothesis that 4 factors are sufficient.
## The degrees of freedom for the model is 167  and the objective function was  0.63
## The number of observations was  2571  with Chi Square =  1614.3  with prob <  4.9e-235
##
## The root mean square of the residuals (RMSA) is  0.05
##
## With component correlations of
##      TC1  TC2  TC3  TC4
## TC1 1.00 0.34 0.38 0.41
## TC2 0.34 1.00 0.21 0.25
## TC3 0.38 0.21 1.00 0.41
## TC4 0.41 0.25 0.41 1.00

The correlations between components are also displayed. These are all non-zero, indicating that the components are correlated (and that oblique rotation was appropriate). The matrix also tells us the degree to which components are correlated: all of the components correlate positively with each other, with correlations ranging from 0.21 to 0.41. In other words, the latent constructs represented by the components are related.

In terms of fit:

• The chi-square statistic is $\chi^2$(167) = 1614.3, p < 0.001. This is consistent with when we ran the analysis as factor analysis in the chapter.
• The RMSR is 0.05.

Let’s look at the loadings (I’ve suppressed values below 0.2 and sorted them).
parameters::model_parameters(raq_pca, threshold = 0.2, sort = TRUE) %>%
  kableExtra::kable(digits = 2)

| Variable | TC1 | TC2 | TC3 | TC4 | Complexity | Uniqueness |
|:---|---:|---:|---:|---:|---:|---:|
| raq_06 | 0.80 | | | | 1.02 | 0.29 |
| raq_18 | 0.69 | | | | 1.03 | 0.49 |
| raq_13 | 0.68 | | | | 1.02 | 0.56 |
| raq_07 | 0.65 | | | | 1.01 | 0.56 |
| raq_10 | 0.63 | | | | 1.09 | 0.63 |
| raq_15 | 0.58 | | | | 1.02 | 0.63 |
| raq_14 | 0.53 | | | | 1.03 | 0.70 |
| raq_05 | 0.50 | | 0.36 | | 1.86 | 0.42 |
| raq_23 | | 0.86 | | | 1.03 | 0.31 |
| raq_09 | | 0.86 | | | 1.03 | 0.30 |
| raq_19 | 0.24 | 0.65 | | | 1.27 | 0.41 |
| raq_22 | | 0.62 | | | 1.16 | 0.48 |
| raq_02 | 0.24 | 0.59 | | | 1.36 | 0.52 |
| raq_16 | | | 0.63 | | 1.02 | 0.63 |
| raq_04 | | | 0.62 | | 1.03 | 0.58 |
| raq_21 | | | 0.59 | | 1.13 | 0.53 |
| raq_12 | | | 0.58 | -0.21 | 1.26 | 0.72 |
| raq_20 | | | 0.56 | | 1.09 | 0.60 |
| raq_03 | | | -0.54 | | 1.01 | 0.70 |
| raq_01 | | | 0.51 | | 1.06 | 0.72 |
| raq_08 | | | | 0.84 | 1.01 | 0.24 |
| raq_11 | | | | 0.83 | 1.00 | 0.30 |
| raq_17 | | | | 0.83 | 1.00 | 0.33 |

The clusters of items match those in the book chapter, where we used factor analysis instead of PCA. The questions that load highly on TC1 seem to be items that relate to Fear of computers:

• raq_05: I don’t understand statistics
• raq_06: I have little experience of computers
• raq_07: All computers hate me
• raq_10: Computers are useful only for playing games
• raq_13: I worry that I will cause irreparable damage because of my incompetence with computers
• raq_14: Computers have minds of their own and deliberately go wrong whenever I use them
• raq_15: Computers are out to get me
• raq_18: R always crashes when I try to use it

Note that item 5 also loads highly onto TC3.
The questions that load highly on TC2 seem to be items that relate to Fear of peer/social evaluation:

• raq_02: My friends will think I’m stupid for not being able to cope with R
• raq_09: My friends are better at statistics than me
• raq_19: Everybody looks at me when I use R
• raq_22: My friends are better at R than I am
• raq_23: If I am good at statistics people will think I am a nerd

The questions that load highly on TC3 seem to be items that relate to Fear of statistics:

• raq_01: Statistics make me cry
• raq_03: Standard deviations excite me
• raq_04: I dream that Pearson is attacking me with correlation coefficients
• raq_05: I don’t understand statistics
• raq_12: People try to tell you that R makes statistics easier to understand but it doesn’t
• raq_16: I weep openly at the mention of central tendency
• raq_20: I can’t sleep for thoughts of eigenvectors
• raq_21: I wake up under my duvet thinking that I am trapped under a normal distribution

The questions that load highly on TC4 seem to be items that relate to Fear of mathematics:

• raq_08: I have never been good at mathematics
• raq_11: I did badly at mathematics at school
• raq_17: I slip into a coma whenever I see an equation

Basically, using PCA hasn’t changed the interpretation.

## Task 18.2

The University of Sussex constantly seeks to employ the best people possible as lecturers. They wanted to revise the ‘Teaching of Statistics for Scientific Experiments’ (TOSSE) questionnaire, which is based on Bland’s theory that says that good research methods lecturers should have: (1) a profound love of statistics; (2) an enthusiasm for experimental design; (3) a love of teaching; and (4) a complete absence of normal interpersonal skills. These characteristics should be related (i.e., correlated).
The University revised this questionnaire to become the ‘Teaching of Statistics for Scientific Experiments – Revised’ (TOSSE-R). They gave this questionnaire to 661 research methods lecturers to see if it supported Bland’s theory. Conduct a factor analysis using maximum likelihood (with appropriate rotation) and interpret the factor structure.

### Load the data

To load the data from the CSV file (assuming you have set up a project folder as suggested in the book):

tosr_tib <- here::here("data/tosser.csv") %>%
  readr::read_csv()

Alternatively, load the data directly from the discovr package:

tosr_tib <- discovr::tosser

### Create correlation matrix

The data file has a variable in it containing participants’ ids. Let’s store a version of the data that only has the item scores.

tosr_items_tib <- tosr_tib %>%
  dplyr::select(-id)

We can create the correlations between variables by executing the following (again, items were rated on Likert response scales, so we’ll use polychoric correlations):

tosr_poly <- psych::polychoric(tosr_items_tib)
tosr_cor <- tosr_poly$rho


To get a plot of the correlations we can execute:

psych::cor.plot(tosr_cor, upper = FALSE)


### The Bartlett test

psych::cortest.bartlett(tosr_cor, n = 661)

## $chisq
## [1] 6392.17
##
## $p.value
## [1] 0
##
## $df
## [1] 378

This (basically useless) test confirms that the correlation matrix is significantly different from an identity matrix (i.e., correlations are non-zero).

Determinant of the correlation matrix:

det(tosr_cor)

## [1] 5.345715e-05

The determinant of the correlation matrix was 0.00005345715, which is greater than 0.00001 and, therefore, indicates that multicollinearity is unlikely to be a problem in these data.

### The KMO test

psych::KMO(tosr_cor)

## Kaiser-Meyer-Olkin factor adequacy
## Call: psych::KMO(r = tosr_cor)
## Overall MSA =  0.91
## MSA for each item =
## tosr_01 tosr_02 tosr_03 tosr_04 tosr_05 tosr_06 tosr_07 tosr_08 tosr_09 tosr_10
##    0.85    0.93    0.76    0.89    0.86    0.93    0.96    0.87    0.87    0.92
## tosr_11 tosr_12 tosr_13 tosr_14 tosr_15 tosr_16 tosr_17 tosr_18 tosr_19 tosr_20
##    0.96    0.93    0.82    0.85    0.87    0.83    0.83    0.96    0.94    0.95
## tosr_21 tosr_22 tosr_23 tosr_24 tosr_25 tosr_26 tosr_27 tosr_28
##    0.81    0.83    0.91    0.88    0.96    0.94    0.93    0.93

The KMO measure of sampling adequacy is 0.91, which is above Kaiser’s (1974) recommendation of 0.5. This value is also ‘marvellous’. Individual items’ KMO values ranged from 0.76 to 0.96. As such, the evidence suggests that the sample size is adequate to yield distinct and reliable factors.

### Distributions for items

tosr_tidy_tib <- tosr_items_tib %>%
  tidyr::pivot_longer(
    cols = tosr_01:tosr_28,
    names_to = "Item",
    values_to = "Response"
  ) %>%
  dplyr::mutate(
    Item = gsub("tosr_", "TOSSER ", Item)
  )

ggplot2::ggplot(tosr_tidy_tib, aes(Response)) +
  geom_histogram(binwidth = 1, fill = "#136CB9", colour = "#136CB9", alpha = 0.5) +
  labs(y = "Frequency") +
  facet_wrap(~ Item, ncol = 6) +
  theme_minimal()

### Parallel analysis

psych::fa.parallel(tosr_cor, fm = "ml", fa = "fa", n.obs = 661)

## Parallel analysis suggests that the number of factors =  4  and the number of components =  NA

Based on parallel analysis, four factors should be extracted.

### Factor analysis

Create the factor analysis object.
The question asks us to use maximum likelihood, so I have included fm = "ml". We choose an oblique rotation (the default) because the question says that the constructs we’re measuring are related.

tosr_fa <- psych::fa(tosr_cor,
  n.obs = 661,
  fm = "ml",
  nfactors = 4,
  scores = "tenBerge"
)

summary(tosr_fa)

##
## Factor analysis with Call: psych::fa(r = tosr_cor, nfactors = 4, n.obs = 661, scores = "tenBerge",
##     fm = "ml")
##
## Test of the hypothesis that 4 factors are sufficient.
## The degrees of freedom for the model is 272  and the objective function was  0.7
## The number of observations was  661  with Chi Square =  450.06  with prob <  6.1e-11
##
## The root mean square of the residuals (RMSA) is  0.02
## The df corrected root mean square of the residuals is  0.03
##
## Tucker Lewis Index of factoring reliability =  0.959
## RMSEA index =  0.031  and the 10 % confidence intervals are  0.026 0.037
## BIC =  -1316.24
## With factor correlations of
##       ML1  ML3  ML2   ML4
## ML1  1.00 0.08 0.11 -0.39
## ML3  0.08 1.00 0.32  0.23
## ML2  0.11 0.32 1.00  0.31
## ML4 -0.39 0.23 0.31  1.00

In terms of fit:

• The chi-square statistic is $\chi^2$(272) = 450.06, p < 0.001.
• The Tucker Lewis Index of factoring reliability (TLI) is 0.959, which rounds to 0.96 and so sits right at the threshold.
• The RMSEA = 0.031, 90% CI [0.026, 0.037], which is below 0.05.
• The RMSR is 0.02, which is smaller than both 0.09 and 0.06.

Remember that we’re looking for a combination of TLI > 0.96 and SRMR (RMSR in the output) < 0.06, and a combination of RMSEA < 0.05 and SRMR < 0.09. With the caveat that universal cut-offs need to be taken with a pinch of salt, it’s reasonable to conclude that the model has good fit.
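The two combined rules can be written as a small helper so that the judgement is explicit. This is a sketch: check_fit() is a hypothetical name, and the thresholds are simply the rules of thumb quoted in the text, not fixed standards.

```r
# Hypothetical helper applying the two combined cut-offs described in the text
check_fit <- function(tli, rmsr, rmsea) {
  c(
    tli_and_srmr   = (tli > 0.96) && (rmsr < 0.06),
    rmsea_and_srmr = (rmsea < 0.05) && (rmsr < 0.09)
  )
}

# Values taken from summary(tosr_fa)
tosr_fit <- check_fit(tli = 0.959, rmsr = 0.02, rmsea = 0.031)
tosr_fit
```

With the unrounded TLI of 0.959, the TLI/SRMR rule is missed by a whisker while the RMSEA/SRMR rule is met. For the Williams model in Task 18.3 (TLI = 0.852, RMSR = 0.04, RMSEA = 0.073), neither rule is met.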
Inspect the factor loadings:

parameters::model_parameters(tosr_fa, threshold = 0.2, sort = TRUE) %>%
  knitr::kable(digits = 2)

| Variable | Main factor | Main loading | Other loadings (absolute value ≥ 0.2) | Complexity | Uniqueness |
|:---|:---|---:|:---|---:|---:|
| tosr_02 | ML1 | -0.79 | | 1.07 | 0.42 |
| tosr_19 | ML1 | 0.76 | | 1.12 | 0.30 |
| tosr_20 | ML1 | 0.72 | | 1.15 | 0.44 |
| tosr_10 | ML1 | 0.69 | 0.32 | 1.49 | 0.41 |
| tosr_26 | ML1 | 0.69 | | 1.12 | 0.43 |
| tosr_25 | ML1 | 0.68 | | 1.07 | 0.49 |
| tosr_06 | ML1 | -0.66 | | 1.07 | 0.55 |
| tosr_07 | ML1 | 0.62 | | 1.08 | 0.56 |
| tosr_27 | ML1 | -0.59 | | 1.07 | 0.68 |
| tosr_11 | ML1 | 0.54 | | 1.35 | 0.58 |
| tosr_18 | ML1 | 0.48 | | 1.29 | 0.69 |
| tosr_14 | ML3 | 0.65 | 0.27 | 1.48 | 0.35 |
| tosr_17 | ML3 | 0.63 | | 1.18 | 0.58 |
| tosr_16 | ML3 | 0.55 | | 1.20 | 0.68 |
| tosr_22 | ML3 | 0.50 | | 1.06 | 0.78 |
| tosr_08 | ML3 | 0.49 | -0.26 | 1.73 | 0.70 |
| tosr_13 | ML3 | 0.48 | | 1.14 | 0.79 |
| tosr_09 | ML3 | 0.34 | 0.34 | 2.37 | 0.68 |
| tosr_21 | ML2 | 0.62 | | 1.07 | 0.60 |
| tosr_04 | ML2 | 0.56 | 0.26 | 1.62 | 0.44 |
| tosr_01 | ML2 | 0.52 | | 1.13 | 0.70 |
| tosr_03 | ML2 | 0.46 | | 1.60 | 0.81 |
| tosr_24 | ML2 | 0.40 | | 1.29 | 0.77 |
| tosr_15 | ML2 | 0.36 | 0.33 | 2.10 | 0.65 |
| tosr_05 | ML4 | 0.56 | | 1.07 | 0.67 |
| tosr_28 | ML4 | 0.48 | -0.35 | 1.98 | 0.46 |
| tosr_23 | ML4 | 0.44 | -0.28 | 1.72 | 0.62 |
| tosr_12 | ML4 | 0.36 | -0.22 | 1.96 | 0.76 |

#### Factor 1

This factor seems to relate to teaching.

• Q2: I wish students would stop bugging me with their shit.
• Q19: I like to help students
• Q20: Passing on knowledge is the greatest gift you can bestow an individual
• Q10: I could spend all day explaining statistics to people
• Q26: I spend lots of time helping students
• Q25: I love teaching
• Q6: Teaching others makes me want to swallow a large bottle of bleach because the pain of my burning oesophagus would be light relief in comparison
• Q7: Helping others to understand sums of squares is a great feeling
• Q27: I love teaching because students have to pretend to like me or they’ll get bad marks
• Q11: I like it when people tell me I’ve helped them to understand factor rotation
• Q18: Standing in front of 300 people in no way makes me lose control of my bowels

#### Factor 2

This factor seems to relate to research methods.
• Q14: I’d rather think about appropriate outcome variables than go for a drink with my friends
• Q17: I enjoy sitting in the park contemplating whether to use participant observation in my next experiment
• Q16: Thinking about whether to use repeated or independent measures thrills me
• Q22: I quiver with excitement when thinking about designing my next experiment
• Q8: I like control conditions
• Q13: Designing experiments is fun
• Q9: I calculate 3 ANOVAs in my head before getting out of bed [equally loaded on factor 3]

#### Factor 3

This factor seems to relate to statistics.

• Q9: I calculate 3 ANOVAs in my head before getting out of bed [equally loaded on factor 2]
• Q21: Thinking about Bonferroni corrections gives me a tingly feeling in my groin
• Q4: I worship at the shrine of Pearson
• Q1: I once woke up in the middle of a vegetable patch hugging a turnip that I’d mistakenly dug up thinking it was Roy’s largest root
• Q3: I memorize probability values for the F-distribution
• Q24: I tried to build myself a time machine so that I could go back to the 1930s and follow Mahalanobis on my hands and knees licking the ground on which he’d just trodden
• Q15: I soil my pants with excitement at the mere mention of factor analysis [equally loaded on factor 4]

#### Factor 4

This factor seems to relate to social functioning. Not sure where the soiling pants comes in, but probably if you’re the sort of person who soils their pants at the mention of factor analysis then things are going to get socially awkward for you sooner rather than later.
• Q5: I still live with my mother and have little personal hygiene
• Q28: My cat is my only friend
• Q23: I often spend my spare time talking to the pigeons … and even they die of boredom
• Q12: People fall asleep as soon as I open my mouth to speak
• Q15: I soil my pants with excitement at the mere mention of factor analysis [equally loaded on factor 3]

## Task 18.3

Dr Sian Williams (University of Brighton) devised a questionnaire to measure organizational ability. She predicted five components to do with organizational ability: (1) preference for organization; (2) goal achievement; (3) planning approach; (4) acceptance of delays; and (5) preference for routine. Williams’s questionnaire contains 28 items using a seven-point Likert scale (1 = strongly disagree, 4 = neither, 7 = strongly agree). She gave it to 239 people. Run a factor analysis (following the settings in this chapter) on the data in williams.csv.

### Load the data

To load the data from the CSV file (assuming you have set up a project folder as suggested in the book):

org_tib <- here::here("data/williams.csv") %>%
  readr::read_csv()

Alternatively, load the data directly from the discovr package:

org_tib <- discovr::williams

### Fit the model

The questionnaire items are as follows:

1. I like to have a plan to work to in everyday life
2. I feel frustrated when things don’t go to plan
3. I get most things done in a day that I want to
4. I stick to a plan once I have made it
5. I enjoy spontaneity and uncertainty
6. I feel frustrated if I can’t find something I need
7. I find it difficult to follow a plan through
8. I am an organized person
9. I like to know what I have to do in a day
10. Disorganized people annoy me
11. I leave things to the last minute
12. I have many different plans relating to the same goal
13. I like to have my documents filed and in order
14. I find it easy to work in a disorganized environment
15. I make ‘to do’ lists and achieve most of the things on it
16. My workspace is messy and disorganized
17. I like to be organized
18. Interruptions to my daily routine annoy me
19. I feel that I am wasting my time
20. I forget the plans I have made
21. I prioritize the things I have to do
22. I like to work in an organized environment
23. I feel relaxed when I don’t have a routine
24. I set deadlines for myself and achieve them
25. I change rather aimlessly from one activity to another during the day
26. I have trouble organizing the things I have to do
27. I put tasks off to another day
28. I feel restricted by schedules and plans

### Create correlation matrix

The data file has a variable in it containing participants’ ids. Let’s store a version of the data that only has the item scores.

org_items_tib <- org_tib %>%
  dplyr::select(-id)

We can create the correlations between variables by executing the following (again, items were rated on Likert response scales, so we’ll use polychoric correlations):

org_poly <- psych::polychoric(org_items_tib)
org_cor <- org_poly$rho


To get a plot of the correlations we can execute:

psych::cor.plot(org_cor, upper = FALSE)


### The Bartlett test

psych::cortest.bartlett(org_cor, n = 239)

## $chisq
## [1] 3679.19
##
## $p.value
## [1] 0
##
## $df
## [1] 378

This (basically useless) test confirms that the correlation matrix is significantly different from an identity matrix (i.e., correlations are non-zero).

Determinant of the correlation matrix:

det(org_cor)

## [1] 9.699541e-08

The determinant of the correlation matrix was 0.00000009699541, which is smaller than 0.00001 and, therefore, indicates that multicollinearity could be a problem in these data.

### The KMO test

psych::KMO(org_cor)

## Kaiser-Meyer-Olkin factor adequacy
## Call: psych::KMO(r = org_cor)
## Overall MSA =  0.88
## MSA for each item =
## org_01 org_02 org_03 org_04 org_05 org_06 org_07 org_08 org_09 org_10 org_11
##   0.93   0.78   0.84   0.82   0.85   0.76   0.83   0.94   0.94   0.90   0.91
## org_12 org_13 org_14 org_15 org_16 org_17 org_18 org_19 org_20 org_21 org_22
##   0.57   0.93   0.88   0.88   0.94   0.91   0.85   0.76   0.79   0.85   0.88
## org_23 org_24 org_25 org_26 org_27 org_28
##   0.82   0.89   0.87   0.90   0.86   0.82

The KMO measure of sampling adequacy is 0.88, which is above Kaiser’s (1974) recommendation of 0.5. This value is also ‘meritorious’ (and almost ‘marvellous’). Individual items’ KMO values ranged from 0.57 to 0.94. As such, the evidence suggests that the sample size is adequate to yield distinct and reliable factors.

### Distributions for items

org_tidy_tib <- org_items_tib %>%
  tidyr::pivot_longer(
    cols = org_01:org_28,
    names_to = "Item",
    values_to = "Response"
  ) %>%
  dplyr::mutate(
    Item = gsub("org_", "ORG ", Item)
  )

ggplot2::ggplot(org_tidy_tib, aes(Response)) +
  geom_histogram(binwidth = 1, fill = "#136CB9", colour = "#136CB9", alpha = 0.5) +
  labs(y = "Frequency") +
  facet_wrap(~ Item, ncol = 6) +
  theme_minimal()

### Parallel analysis

psych::fa.parallel(org_cor, fm = "ml", fa = "fa", n.obs = 239)

## Parallel analysis suggests that the number of factors =  5  and the number of components =  NA

Based on parallel analysis, five factors should be extracted.

### Factor analysis

Create the factor analysis object.
The question asks us to follow the settings in this chapter, so I have used minimum residual extraction (fm = "minres", the default) and kept the default oblique rotation (oblimin), as in the chapter.

org_fa <- psych::fa(org_cor,
  n.obs = 239,
  fm = "minres",
  nfactors = 5
)

summary(org_fa)

##
## Factor analysis with Call: psych::fa(r = org_cor, nfactors = 5, n.obs = 239, fm = "minres")
##
## Test of the hypothesis that 5 factors are sufficient.
## The degrees of freedom for the model is 248  and the objective function was  2.51
## The number of observations was  239  with Chi Square =  563.4  with prob <  1.4e-26
##
## The root mean square of the residuals (RMSA) is  0.04
## The df corrected root mean square of the residuals is  0.04
##
## Tucker Lewis Index of factoring reliability =  0.852
## RMSEA index =  0.073  and the 10 % confidence intervals are  0.065 0.081
## BIC =  -794.77
## With factor correlations of
##      MR1   MR2  MR4  MR3   MR5
## MR1 1.00  0.35 0.45 0.38  0.30
## MR2 0.35  1.00 0.35 0.21 -0.08
## MR4 0.45  0.35 1.00 0.22  0.29
## MR3 0.38  0.21 0.22 1.00  0.25
## MR5 0.30 -0.08 0.29 0.25  1.00

In terms of fit:

• The chi-square statistic is $\chi^2$(248) = 563.4, p < 0.001.
• The Tucker Lewis Index of factoring reliability (TLI) is 0.85, which is well below 0.96.
• The RMSEA = 0.073, 90% CI [0.065, 0.081], which is greater than 0.05.
• The RMSR is 0.04, which is smaller than both 0.09 and 0.06.

Remember that we’re looking for a combination of TLI > 0.96 and SRMR (RMSR in the output) < 0.06, and a combination of RMSEA < 0.05 and SRMR < 0.09. With the caveat that universal cut-offs need to be taken with a pinch of salt, it’s reasonable to conclude that the model has poor fit.
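The RMSEA point estimate can be checked by hand from the chi-square statistic, its degrees of freedom and the sample size using the usual formula $\text{RMSEA} = \sqrt{\max(\chi^2 - df, 0) / [df(N - 1)]}$. The sketch below (rmsea_point() is a hypothetical name) does only that; it does not compute the confidence interval, and psych’s internal computation may differ in detail.

```r
# Point estimate of RMSEA from chi-square, its df and the sample size n
rmsea_point <- function(chisq, df, n) {
  sqrt(max(chisq - df, 0) / (df * (n - 1)))
}

rmsea_point(chisq = 563.4, df = 248, n = 239)   # Williams model: about 0.073
rmsea_point(chisq = 450.06, df = 272, n = 661)  # TOSSE-R model: about 0.031
```

Both values agree with the RMSEA printed by summary() once rounded to three decimals. The max() guard returns 0 when the chi-square is smaller than its degrees of freedom.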
Inspect the factor loadings:

parameters::model_parameters(org_fa, threshold = 0.2, sort = TRUE) %>%
  knitr::kable(digits = 2)

| Variable | Main factor | Main loading | Other loadings (absolute value ≥ 0.2) | Complexity | Uniqueness |
|:---|:---|---:|:---|---:|---:|
| org_14 | MR1 | 0.78 | -0.27 | 1.38 | 0.34 |
| org_16 | MR1 | 0.78 | | 1.14 | 0.35 |
| org_22 | MR1 | 0.77 | 0.20 | 1.22 | 0.19 |
| org_17 | MR1 | 0.72 | | 1.25 | 0.25 |
| org_08 | MR1 | 0.50 | 0.30, 0.22 | 2.31 | 0.28 |
| org_13 | MR1 | 0.49 | 0.24 | 1.62 | 0.51 |
| org_10 | MR1 | 0.40 | 0.27 | 1.94 | 0.65 |
| org_19 | MR2 | 0.63 | | 1.23 | 0.60 |
| org_27 | MR2 | 0.61 | 0.34 | 1.58 | 0.39 |
| org_25 | MR2 | 0.60 | | 1.12 | 0.52 |
| org_20 | MR2 | 0.58 | | 1.09 | 0.63 |
| org_26 | MR2 | 0.47 | 0.38, -0.25 | 2.56 | 0.42 |
| org_07 | MR2 | 0.46 | 0.32 | 1.90 | 0.57 |
| org_11 | MR2 | 0.36 | 0.26, 0.21, 0.20 | 3.30 | 0.47 |
| org_24 | MR4 | 0.69 | | 1.22 | 0.39 |
| org_03 | MR4 | 0.55 | 0.25 | 1.55 | 0.51 |
| org_04 | MR4 | 0.49 | 0.26, 0.24 | 2.21 | 0.49 |
| org_21 | MR4 | 0.45 | 0.43 | 2.05 | 0.46 |
| org_15 | MR4 | 0.44 | 0.36, -0.22 | 2.72 | 0.56 |
| org_01 | MR4 | 0.43 | 0.31, 0.22 | 3.11 | 0.38 |
| org_23 | MR3 | 0.66 | | 1.25 | 0.49 |
| org_05 | MR3 | 0.65 | | 1.10 | 0.51 |
| org_28 | MR3 | 0.63 | | 1.18 | 0.53 |
| org_12 | MR3 | 0.33 | | 1.70 | 0.87 |
| org_02 | MR5 | 0.71 | | 1.04 | 0.45 |
| org_06 | MR5 | 0.70 | | 1.06 | 0.53 |
| org_18 | MR5 | 0.47 | 0.28 | 2.04 | 0.55 |
| org_09 | MR5 | 0.34 | 0.32, 0.32 | 3.46 | 0.34 |

#### Factor 1

This factor seems to relate to preference for organization.

• Q14: I find it easy to work in a disorganized environment
• Q16: My workspace is messy and disorganized
• Q22: I like to work in an organized environment
• Q17: I like to be organized
• Q8: I am an organized person
• Q13: I like to have my documents filed and in order
• Q10: Disorganized people annoy me

#### Factor 2

This factor seems to relate to goal achievement (it probably depends how you define goal achievement, but it does seem to relate to your ability to follow a plan through!).

• Q19: I feel that I am wasting my time
• Q27: I put tasks off to another day
• Q25: I change rather aimlessly from one activity to another during the day
• Q20: I forget the plans I have made
• Q26: I have trouble organizing the things I have to do
• Q7: I find it difficult to follow a plan through
• Q11: I leave things to the last minute

#### Factor 3

This factor seems to relate to planning approach.
• Q24: I set deadlines for myself and achieve them
• Q3: I get most things done in a day that I want to
• Q4: I stick to a plan once I have made it
• Q21: I prioritize the things I have to do
• Q15: I make ‘to do’ lists and achieve most of the things on it
• Q1: I like to have a plan to work to in everyday life

#### Factor 4

This factor seems to relate to preference for routine.

• Q23: I feel relaxed when I don’t have a routine
• Q5: I enjoy spontaneity and uncertainty
• Q28: I feel restricted by schedules and plans
• Q12: I have many different plans relating to the same goal

#### Factor 5

This factor seems to relate to acceptance of delays.

• Q2: I feel frustrated when things don’t go to plan
• Q6: I feel frustrated if I can’t find something I need
• Q18: Interruptions to my daily routine annoy me
• Q9: I like to know what I have to do in a day

It seems as though there is some factorial validity to the hypothesized structure. (But remember that this model has poor fit.)

## Task 18.4

Zibarras et al. (2008) looked at the relationship between personality and creativity. They used the Hogan Development Survey (HDS), which measures 11 dysfunctional dispositions of employed adults: being volatile, mistrustful, cautious, detached, passive-aggressive, arrogant, manipulative, dramatic, eccentric, perfectionist, and dependent. Zibarras et al. wanted to reduce these 11 traits and, based on parallel analysis, found that they could be reduced to three components. They ran a principal component analysis with varimax rotation. Repeat this analysis (zibarras_2008.csv) to see which personality dimensions clustered together (see page 210 of the original paper).
### Load the data

To load the data from the CSV file (assuming you have set up a project folder as suggested in the book):

zibarras_tib <- here::here("data/zibarras_2008.csv") %>%
  readr::read_csv()

Alternatively, load the data directly from the discovr package:

zibarras_tib <- discovr::zibarras_2008

Like the authors, I ran the analysis with principal components and varimax rotation.

### Create correlation matrix

The data file has a variable in it containing participants’ ids. Let’s store a version of the data that only has the item scores.

zibarras_tib <- zibarras_tib %>%
  dplyr::select(-id)

We can create the correlations between variables by executing:

zibarras_cor <- cor(zibarras_tib)

To get a plot of the correlations we can execute:

psych::cor.plot(zibarras_cor, upper = FALSE)

### The Bartlett test

psych::cortest.bartlett(zibarras_cor, n = 207)

## $chisq
## [1] 527.8976
##
## $p.value
## [1] 1.841689e-78
##
## $df
## [1] 55


This (basically useless) test confirms that the correlation matrix is significantly different from an identity matrix (i.e., correlations are non-zero).
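Bartlett’s test depends on the correlation matrix only through its determinant (the same quantity we compared against 0.00001 when checking for multicollinearity). A minimal base-R sketch of the textbook formula, $\chi^2 = -\left(n - 1 - \frac{2p + 5}{6}\right)\ln|\mathbf{R}|$ with $p(p - 1)/2$ degrees of freedom, is shown here; bartlett_sketch() is a hypothetical name, and psych::cortest.bartlett() remains the function to use in practice. As a check, it is fed the TOSSE-R and Williams determinants reported earlier.

```r
# Bartlett's test of sphericity from det(R), the sample size n and the number of variables p
bartlett_sketch <- function(det_r, n, p) {
  chisq <- -(n - 1 - (2 * p + 5) / 6) * log(det_r)
  df <- p * (p - 1) / 2
  list(chisq = chisq, df = df,
       p.value = pchisq(chisq, df, lower.tail = FALSE))
}

# Determinants reported for the TOSSE-R (Task 18.2) and Williams (Task 18.3) matrices
tosr_bart <- bartlett_sketch(det_r = 5.345715e-05, n = 661, p = 28)
org_bart  <- bartlett_sketch(det_r = 9.699541e-08, n = 239, p = 28)
tosr_bart$chisq  # about 6392.17
org_bart$chisq   # about 3679.19
```

For these data you would call it as bartlett_sketch(det(zibarras_cor), n = 207, p = ncol(zibarras_cor)). An identity matrix has determinant 1, giving a chi-square of 0, which is why the test only tells us the correlations are not all zero.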

### The KMO test

psych::KMO(zibarras_cor)

## Kaiser-Meyer-Olkin factor adequacy
## Call: psych::KMO(r = zibarras_cor)
## Overall MSA =  0.68
## MSA for each item =
##           volatile        mistrustful           cautious           detached
##               0.52               0.62               0.71               0.56
## passive_aggressive           arrogant       manipulative           dramatic
##               0.50               0.76               0.81               0.72
##          eccentric         perfectist          dependent
##               0.82               0.54               0.58


The KMO measure of sampling adequacy is 0.68, which is above Kaiser’s (1974) recommendation of 0.5. Individual items’ KMO values ranged from 0.50 to 0.82. The overall value falls in the ‘mediocre’ range (just short of ‘middling’), so the sample size is probably adequate to yield distinct and reliable factors.
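Kaiser’s measure compares the size of the ordinary correlations with the size of the partial correlations (each pair controlling for all other variables): values near 1 mean the partial correlations are small relative to the correlations. The sketch below implements that definition in base R, taking partial correlations from the inverse of the correlation matrix; kmo_sketch() is a hypothetical name, and psych::KMO() remains the function to use in practice.

```r
# KMO (overall and per item) from a correlation matrix r
kmo_sketch <- function(r) {
  inv_r <- solve(r)
  # partial correlations from the inverse of R
  partial <- -inv_r / sqrt(outer(diag(inv_r), diag(inv_r)))
  # only off-diagonal elements enter the sums
  diag(r) <- 0
  diag(partial) <- 0
  r2 <- r^2
  q2 <- partial^2
  list(
    overall = sum(r2) / (sum(r2) + sum(q2)),
    item    = colSums(r2) / (colSums(r2) + colSums(q2))
  )
}

# With only two variables the partial correlation equals the correlation, so KMO = 0.5
kmo_sketch(matrix(c(1, 0.5, 0.5, 1), nrow = 2))$overall
```

Applied to zibarras_cor, this definition should give values close to the psych::KMO() output above.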

### Distributions for items

zib_tidy_tib <- zibarras_tib %>%
  tidyr::pivot_longer(
    cols = volatile:dependent,
    names_to = "Item",
    values_to = "Response"
  ) %>%
  dplyr::mutate(
    Item = stringr::str_to_sentence(Item)
  )

ggplot2::ggplot(zib_tidy_tib, aes(Response)) +
  geom_histogram(binwidth = 1, fill = "#136CB9", colour = "#136CB9", alpha = 0.5) +
  labs(y = "Frequency") +
  facet_wrap(~ Item, ncol = 3) +
  theme_minimal()


### Parallel analysis

psych::fa.parallel(zibarras_cor, fa = "pc", n.obs = 207)


## Parallel analysis suggests that the number of factors =  NA  and the number of components =  3


Based on parallel analysis three components should be extracted (as the authors did in the paper).
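psych::fa.parallel() handles this (with more options than shown here). As a bare-bones sketch of Horn’s parallel analysis for components: compare each observed eigenvalue of the correlation matrix with the average eigenvalue at the same rank from correlation matrices of random normal data of the same size, and keep components while the observed value is larger. The function name and the use of the mean (rather than a percentile) of the simulated eigenvalues are choices made for this sketch.

```r
# Horn's parallel analysis for principal components (sketch)
# r: observed correlation matrix; n_obs: sample size; n_iter: number of random data sets
parallel_pc_sketch <- function(r, n_obs, n_iter = 100, seed = 1) {
  set.seed(seed)  # note: this resets the global random number stream
  p <- ncol(r)
  observed <- eigen(r, symmetric = TRUE, only.values = TRUE)$values
  # eigenvalues from uncorrelated normal data of the same size (p x n_iter matrix)
  simulated <- replicate(n_iter, {
    x <- matrix(rnorm(n_obs * p), nrow = n_obs)
    eigen(cor(x), symmetric = TRUE, only.values = TRUE)$values
  })
  reference <- rowMeans(simulated)
  # keep components up to the first observed eigenvalue that does not beat chance
  first_fail <- which(observed <= reference)[1]
  n_components <- if (is.na(first_fail)) p else first_fail - 1
  list(observed = observed, reference = reference, n_components = n_components)
}
```

Calling parallel_pc_sketch(zibarras_cor, n_obs = 207)$n_components gives a number that can be compared with the fa.parallel() result; small differences are possible because the sketch uses its own simulation settings.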

### PCA

Create the PCA object. We choose an orthogonal rotation (varimax) because that’s what the authors did; this is the default for PCA, so we don’t need to specify it explicitly.

zib_pca <- psych::pca(zibarras_cor,
  n.obs = 207,
  nfactors = 3
)

summary(zib_pca)

##
## Factor analysis with Call: principal(r = r, nfactors = nfactors, residuals = residuals,
##     rotate = rotate, n.obs = n.obs, covar = covar, scores = scores,
##     missing = missing, impute = impute, oblique.scores = oblique.scores,
##     method = method, use = use, cor = cor, correct = 0.5, weight = NULL)
##
## Test of the hypothesis that 3 factors are sufficient.
## The degrees of freedom for the model is 25  and the objective function was  0.92
## The number of observations was  207  with Chi Square =  182.93  with prob <  5.7e-26
##
## The root mean square of the residuals (RMSA) is  0.1
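The RMSR summarises the residuals: the observed correlations minus the correlations reproduced from the loadings. For an orthogonal (e.g., varimax) solution the reproduced matrix is the loading matrix times its transpose. A sketch of one common definition, using only the lower triangle of the residual matrix, is shown here; rmsr_sketch() is a hypothetical name and the toy matrix is invented for illustration.

```r
# RMSR for an orthogonal solution: residual = R - L %*% t(L), lower triangle only
rmsr_sketch <- function(r, loadings) {
  residual <- r - loadings %*% t(loadings)
  sqrt(mean(residual[lower.tri(residual)]^2))
}

# Toy check: a correlation matrix generated exactly by one column of loadings
lambda <- c(0.8, 0.7, 0.6)
r_toy <- outer(lambda, lambda)
diag(r_toy) <- 1
rmsr_sketch(r_toy, matrix(lambda, ncol = 1))  # 0: the loadings reproduce R exactly
```

For the Zibarras solution, rmsr_sketch(zibarras_cor, unclass(zib_pca$loadings)) should land close to the 0.1 in the output.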


parameters::model_parameters(zib_pca, threshold = 0.2, sort = TRUE) %>%
  knitr::kable(digits = 2)

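| Variable | Main component | Main loading | Other loadings (absolute value ≥ 0.2) | Complexity | Uniqueness |
|:---|:---|---:|:---|---:|---:|
| dramatic | RC1 | 0.83 | | 1.11 | 0.27 |
| manipulative | RC1 | 0.79 | | 1.03 | 0.37 |
| arrogant | RC1 | 0.68 | | 1.06 | 0.53 |
| cautious | RC1 | -0.66 | 0.50 | 1.87 | 0.31 |
| eccentric | RC1 | 0.55 | 0.30 | 1.73 | 0.59 |
| perfectist | RC1 | -0.32 | | 1.09 | 0.89 |
| volatile | RC2 | 0.79 | | 1.03 | 0.37 |
| mistrustful | RC2 | 0.68 | 0.27, 0.24 | 1.58 | 0.40 |
| detached | RC3 | 0.76 | -0.28 | 1.39 | 0.30 |
| dependent | RC3 | -0.64 | -0.38, 0.33 | 2.21 | 0.33 |
| passive_aggressive | RC3 | 0.62 | | 1.17 | 0.59 |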

The output shows the rotated component matrix, from which we see this pattern:

• Component 1:
• Dramatic
• Manipulative
• Arrogant
• Cautious (negative weight)
• Eccentric
• Perfectionist (negative weight)
• Component 2:
• Volatile
• Mistrustful
• Component 3:
• Detached
• Dependent (negative weight)
• Passive-aggressive
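The Complexity column in the loadings table is Hofmann’s index of complexity: roughly, the number of components an item loads on, equal to 1 when an item loads on a single component and to k when it loads equally on k components. A minimal sketch for one item (item_complexity() is a hypothetical name), fed the rounded loadings from the table:

```r
# Hofmann's index of complexity for a single item:
# (sum of squared loadings)^2 / (sum of fourth powers of the loadings)
item_complexity <- function(loadings) {
  sum(loadings^2)^2 / sum(loadings^4)
}

item_complexity(c(0.27, 0.68, 0.24))  # mistrustful: about 1.58
item_complexity(c(0.5, 0.5))          # equal loadings on two components: 2
```

Because the table’s loadings are rounded and values below 0.2 are suppressed, hand calculations from it will not always match the Complexity column exactly.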

Compare these results to those of Zibarras et al. (Table 4 of the original paper), and note that they are the same.
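Because varimax is an orthogonal rotation, it moves variance between components without changing any variable’s communality, so the Uniqueness column (1 minus the communality) is the same as for the unrotated solution. A small base-R check using stats::varimax() on simulated data (not the Zibarras data) illustrates this; pc_loadings() is a hypothetical helper that builds unrotated component loadings from the eigen decomposition.

```r
# Unrotated principal component loadings: eigenvectors scaled by sqrt(eigenvalues)
pc_loadings <- function(r, k) {
  e <- eigen(r, symmetric = TRUE)
  e$vectors[, 1:k, drop = FALSE] %*% diag(sqrt(e$values[1:k]), nrow = k)
}

# Simulated data: three pairs of correlated variables
set.seed(2008)
x <- matrix(rnorm(207 * 6), ncol = 6)
x[, 2] <- x[, 1] + rnorm(207)
x[, 4] <- x[, 3] + rnorm(207)
x[, 6] <- x[, 5] + rnorm(207)
r <- cor(x)

unrotated <- pc_loadings(r, k = 3)
rotated <- unclass(stats::varimax(unrotated)$loadings)

# Communalities (row sums of squared loadings) before and after rotation
cbind(unrotated = rowSums(unrotated^2), rotated = rowSums(rotated^2))
```

The two columns agree; what changes is how each variable’s explained variance is shared among the components, which is what makes the rotated pattern easier to interpret.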
