Smart Alex solutions Chapter 8

Smart Alex charatcer from Discovering Statistics using R and RStudio

This document contains abridged sections from Discovering Statistics Using R and RStudio by Andy Field so there are some copyright considerations. You can use this material for teaching and non-profit activities but please do not meddle with it or claim it as your own work. See the full license terms at the bottom of the page.

Task 8.1

We looked at data based on findings that the number of cups of tea drunk was related to cognitive functioning (Feng, Gwee, Kua, & Ng, 2010). Using a linear model that predicts cognitive functioning from tea drinking, what would cognitive functioning be if someone drank 10 cups of tea? Is there a significant effect? (Tea Makes You Brainy 716.csv)

Load the data and fit the model

tea716_tib <- discovr::tea_716

tea_lm <- lm(cog_fun ~ tea, data = tea716_tib, na.action = na.exclude)
tea_lm %>% 
  broom::tidy() %>% 
  knitr::kable(digits = 3)

term	estimate	std.error	statistic	p.value
(Intercept)	49.218	0.764	64.382	0.000
tea	0.460	0.221	2.081	0.038

Looking at the output below, we can see that we have a model that significantly improves our ability to predict cognitive functioning. The positive beta value (0.46) indicates a positive relationship between number of cups of tea drunk per day and level of cognitive functioning, in that the more tea drunk, the higher your level of cognitive functioning. We can then use the model to predict level of cognitive functioning after drinking 10 cups of tea per day. The first stage is to define the model by replacing the b-values in the equation below with the values from the Coefficients output. In addition, we can replace the X and Y with the variable names so that the model becomes:

$$ \begin{aligned} \text{Cognitive functioning}_i &= b_0 + b_1 \text{Tea drinking}_i \\
&= 49.22 +(0.460 \times \text{Tea drinking}_i) \end{aligned} $$

We can predict cognitive functioning, by replacing Tea drinking in the equation with the value 10:

$$ \begin{aligned} \text{Cognitive functioning}_i &= 49.22 +(0.460 \times \text{Tea drinking}_i) \\
&= 49.22 +(0.460 \times 10) \\
&= 53.82 \end{aligned} $$

Therefore, if you drank 10 cups of tea per day, your level of cognitive functioning would be 53.82.

Task 8.2

Estimate a linear model for the pubs.csv data predicting mortality from the number of pubs. Try repeating the analysis but bootstrapping the confidence intervals.

Load the data and fit the model

pubs_tib <- discovr::pubs

pub_lm <- lm(mortality ~ pubs, data = pubs_tib, na.action = na.exclude)

pub_lm %>%
  broom::glance() %>% 
  knitr::kable(digits = 3)

r.squared	adj.r.squared	sigma	statistic	p.value	df	logLik	AIC	BIC	deviance	df.residual	nobs
0.649	0.591	1864.431	11.117	0.016	1	-70.446	146.893	147.131	20856611	6	8

pub_lm %>%
  broom::tidy() %>% 
  knitr::kable(digits = 3)

term	estimate	std.error	statistic	p.value
(Intercept)	3351.955	781.236	4.291	0.005
pubs	14.339	4.301	3.334	0.016

Looking at the output, we can see that the number of pubs significantly predicts mortality, t(6) = 3.33, p = 0.016. The positive beta value (14.34) indicates a positive relationship between number of pubs and death rate in that, the more pubs in an area, the higher the rate of mortality (as we would expect). The value of $ R^2 $ tells us that number of pubs accounts for 64.9% of the variance in mortality rate – that’s over half!

pub_lm %>% 
  parameters::model_parameters(bootstrap = TRUE) %>% 
  knitr::kable(digits = 3)

Parameter	Coefficient	CI_low	CI_high	p
(Intercept)	2777.610	0.000	4710.824	0.523
pubs	15.175	11.161	100.000	0.000

The bootstrapped confidence intervals are both positive values – they do not cross zero (10.92, 100.00) – then assuming this interval is one of the 95% that contain the population value we can gain confidence that there is a positive and non-zero relationship between number of pubs in an area and its mortality rate.

Task 8.3

We encountered data (honesty_lab.csv) relating to people’s ratings of dishonest acts and the likeableness of the perpetrator. Run a linear model with bootstrapping to predict ratings of dishonesty from the likeableness of the perpetrator.

Load the data and fit the model

honesty_tib <- discovr::honesty_lab

honest_lm <- lm(deed ~ likeableness, data = honesty_tib, na.action = na.exclude)

honest_lm %>%
  broom::glance() %>% 
  knitr::kable(digits = 3)

r.squared	adj.r.squared	sigma	statistic	p.value	df	logLik	AIC	BIC	deviance	df.residual	nobs
0.691	0.688	1.203	219.097	0	1	-159.406	324.812	332.627	141.941	98	100

honest_lm %>%
  broom::tidy() %>% 
  knitr::kable(digits = 3)

term	estimate	std.error	statistic	p.value
(Intercept)	-1.861	0.328	-5.673	0
likeableness	0.939	0.063	14.802	0

Looking at the output we can see that the likeableness of the perpetrator significantly predicts ratings of dishonest acts, t(98) = 14.80, p < 0.001. The positive beta value (0.94) indicates a positive relationship between likeableness of the perpetrator and ratings of dishonesty, in that, the more likeable the perpetrator, the more positively their dishonest acts were viewed (remember that dishonest acts were measured on a scale from 0 = appalling behaviour to 10 = it’s OK really). The value of $R^2$ tells us that likeableness of the perpetrator accounts for 69.1% of the variance in the rating of dishonesty, which is over half.

honest_lm %>% 
  parameters::model_parameters(bootstrap = TRUE) %>% 
  knitr::kable(digits = 3)

Parameter	Coefficient	CI_low	CI_high	p
(Intercept)	-1.854	-2.521	-1.353	0
likeableness	0.937	0.813	1.069	0

The bootstrapped confidence intervals do not cross zero (0.82, 1.08), then assuming this interval is one of the 95% that contain the population value we can gain confidence that there is a non-zero relationship between the likeableness of the perpetrator and ratings of dishonest acts.

Task 8.4

A fashion student was interested in factors that predicted the salaries of male and female catwalk models. She collected data from 231 models (supermodel.csv). For each model she asked them their salary per day (salary), their age (age), their length of experience as models (years), and their industry status as a model as their percentile position rated by a panel of experts (status). Use a linear model to see which variables predict a model’s salary. How valid is the model?

The model

Load the data and fit the model

model_tib <- discovr::supermodel

model_lm <- lm(salary ~ age + years + status, data = model_tib, na.action = na.exclude)

model_lm %>%
  broom::glance() %>% 
  knitr::kable(digits = 3)

r.squared	adj.r.squared	sigma	statistic	p.value	df	logLik	AIC	BIC	deviance	df.residual	nobs
0.184	0.173	14.572	17.066	0	3	-944.632	1899.264	1916.476	48202.79	227	231

model_lm %>%
  broom::tidy() %>% 
  knitr::kable(digits = 3)

term	estimate	std.error	statistic	p.value
(Intercept)	-60.890	16.497	-3.691	0.000
age	6.234	1.411	4.418	0.000
years	-5.561	2.122	-2.621	0.009
status	-0.196	0.152	-1.289	0.199

To begin with, a sample size of 231 with three predictors seems reasonable because this would easily detect medium to large effects (see the diagram in the chapter). Overall, the model accounts for 18.4% of the variance in salaries and is a significant fit to the data, F(3, 227) = 17.07, p < .001. The adjusted $ R^2 $ (0.17) shows some shrinkage from the unadjusted value (0.184), indicating that the model may not generalize well.

It seems as though salaries are significantly predicted by the age of the model. This is a positive relationship (look at the sign of the beta), indicating that as age increases, salaries increase too. The number of years spent as a model also seems to significantly predict salaries, but this is a negative relationship indicating that the more years you’ve spent as a model, the lower your salary. This finding seems very counter-intuitive, but we’ll come back to it later. Finally, the status of the model doesn’t seem to predict salaries significantly. If we wanted to write the regression model, we could write it as:

The next part of the question asks whether this model is valid.

Residuals

model_rsd <- model_lm %>% 
  broom::augment() %>% 
  tibble::rowid_to_column(var = "case_no")

The table shows about 5% of standardized residuals have absolute values above 1.96, which is what we would expect. About 3% of standardized residuals have absolute values above 2.58, which is more than we would expect. About 2% of standardized residuals have absolute values above 3.29, which is potentially problematic.

get_cum_percent <- function(var,  cut_off = 1.96){
  ecdf_var <- abs(var) %>% ecdf()
  100*(1 - ecdf_var(cut_off))
}


model_rsd %>%
  dplyr::summarize(
    `z >= 1.96` = get_cum_percent(.std.resid),
    `z >= 2.58` = get_cum_percent(.std.resid, cut_off = 2.58),
    `z >= 3.29` = get_cum_percent(.std.resid, cut_off = 3.29)
  ) %>% 
  knitr::kable(digits = 3)

z >= 1.96	z >= 2.58	z >= 3.29
5.195	3.03	2.165

model_rsd %>% 
  dplyr::filter(abs(.std.resid) >= 1.96) %>%
  dplyr::select(case_no, .std.resid, .resid) %>%
  dplyr::arrange(desc(.std.resid)) %>% 
  knitr::kable(digits = 3)

case_no	.std.resid	.resid
135	4.717	-68.085
5	4.697	-67.073
198	3.531	-51.148
116	3.440	-49.865
155	3.319	-47.458
191	3.178	-45.939
127	2.778	-40.113
41	2.421	-35.139
24	2.242	-32.523
2	2.215	-31.853
170	2.200	-31.625
91	2.099	-30.046

There are six cases that have a standardized residual greater than 3, and two of these are fairly substantial (case 5 and 135). We have 5.19% of cases with standardized residuals above 2, so that’s as we expect, but 3% of cases with residuals above 2.5 (we’d expect only 1%), which indicates possible outliers.

Normality of errors

plot(model_lm, which = 2)

The normal Q-Q plot shows a highly non-normal distribution of residuals because the dashed line deviates considerably from the straight line (which indicates what you’d get from normally distributed errors).

Homoscedasticity and independence of errors

plot(model_lm, which = c(1, 3))

The scatterplots of fitted values vs residuals do not show a random pattern. There is a distinct funnelling, and the red trend line is not flat, indicating heteroscedasticity.

Multicollinearity

car::vif(model_lm)

##       age     years    status 
## 12.652841 12.156757  1.153364

1/car::vif(model_lm)

##        age      years     status 
## 0.07903364 0.08225878 0.86702902

car::vif(model_lm) %>% 
  mean()

## [1] 8.65432

For the age and experience variables in the model, VIF values are above 10 (or alternatively, tolerance values are all well below 0.2), indicating multicollinearity in the data. In fact, the correlation between these two variables is around .9! So, these two variables are measuring very similar things. Of course, this makes perfect sense because the older a model is, the more years she would’ve spent modelling! So, it was fairly stupid to measure both of these things! This also explains the weird result that the number of years spent modelling negatively predicted salary (i.e. more experience = less salary!): in fact if you do a simple regression with experience as the only predictor of salary you’ll find it has the expected positive relationship. This hopefully demonstrates why multicollinearity can bias the regression model. All in all, several assumptions have not been met and so this model is probably fairly unreliable.

Task 8.5

A study was carried out to explore the relationship between aggression and several potential predicting factors in 666 children who had an older sibling. Variables measured were parenting_style (high score = bad parenting practices), computer_games (high score = more time spent playing computer games), television (high score = more time spent watching television), diet (high score = the child has a good diet low in harmful additives), and sibling_aggression (high score = more aggression seen in their older sibling). Past research indicated that parenting style and sibling aggression were good predictors of the level of aggression in the younger child. All other variables were treated in an exploratory fashion. Analyse them with a linear model (child_aggression.csv).

Load the data and fit the model. We need to conduct this analysis hierarchically, entering parenting_style and sibling_aggression in the first step (forced entry) and the remaining variables in a second step using a stepwise method (applying the step() function:

aggress_tib <- discovr::child_aggression

agress_lm <- lm(aggression ~ parenting_style + sibling_aggression, data = aggress_tib, na.action = na.exclude)
agress_full_lm <- update(agress_lm, .~. + television + computer_games + diet) %>%
  step()

## Start:  AIC=-1566.61
## aggression ~ parenting_style + sibling_aggression + television + 
##     computer_games + diet
## 
##                      Df Sum of Sq    RSS     AIC
## - television          1   0.04817 62.288 -1568.1
## <none>                            62.240 -1566.6
## - sibling_aggression  1   0.41840 62.658 -1564.2
## - diet                1   0.77357 63.014 -1560.4
## - computer_games      1   1.39822 63.638 -1553.8
## - parenting_style     1   1.42809 63.668 -1553.5
## 
## Step:  AIC=-1568.1
## aggression ~ parenting_style + sibling_aggression + computer_games + 
##     diet
## 
##                      Df Sum of Sq    RSS     AIC
## <none>                            62.288 -1568.1
## - sibling_aggression  1   0.48054 62.769 -1565.0
## - diet                1   0.81815 63.106 -1561.4
## - computer_games      1   1.42689 63.715 -1555.0
## - parenting_style     1   2.28530 64.573 -1546.1

agress_full_lm %>%
  broom::glance() %>% 
  knitr::kable(digits = 3)

r.squared	adj.r.squared	sigma	statistic	p.value	df	logLik	AIC	BIC	deviance	df.residual	nobs
0.082	0.076	0.307	14.735	0	4	-155.963	323.927	350.935	62.288	661	666

agress_full_lm %>%
  broom::tidy() %>% 
  knitr::kable(digits = 3)

term	estimate	std.error	statistic	p.value
(Intercept)	-0.006	0.012	-0.497	0.619
parenting_style	0.062	0.013	4.925	0.000
sibling_aggression	0.086	0.038	2.258	0.024
computer_games	0.143	0.037	3.891	0.000
diet	-0.112	0.038	-2.947	0.003

agress_full_lm %>%
  parameters::model_parameters(standardize = "refit", digits = 3) %>% 
  knitr::kable()

Parameter	Coefficient	SE	CI_low	CI_high	t	df_error	p
(Intercept)	0.0000000	0.0372413	-0.0731256	0.0731256	0.000000	661	1.0000000
parenting_style	0.1937732	0.0393481	0.1165109	0.2710356	4.924586	661	0.0000011
sibling_aggression	0.0883114	0.0391070	0.0115225	0.1651004	2.258198	661	0.0242586
computer_games	0.1534858	0.0394434	0.0760363	0.2309353	3.891291	661	0.0001098
diet	-0.1177625	0.0399661	-0.1962384	-0.0392867	-2.946560	661	0.0033264

agress_full_lm$anova

##           Step Df   Deviance Resid. Df Resid. Dev       AIC
## 1              NA         NA       660   62.23999 -1566.614
## 2 - television  1 0.04816702       661   62.28816 -1568.099

Based on the final model (which is actually all we’re interested in) the following variables predict aggression:

Parenting style (b = 0.062, $ \beta $ = 0.194, t = 4.93, p < 0.001) significantly predicted aggression. The beta value indicates that as parenting increases (i.e. as bad practices increase), aggression increases also.
Sibling aggression (b = 0.086, $ \beta $= 0.088, t = 2.26, p = 0.024) significantly predicted aggression. The beta value indicates that as sibling aggression increases (became more aggressive), aggression increases also.
Computer games (b = 0.143, $ \beta $ = 0.037, t= 3.89, p < .001) significantly predicted aggression. The beta value indicates that as the time spent playing computer games increases, aggression increases also.
Good diet (b = –0.112, $ \beta $ = –0.118, t = –2.95, p = 0.003) significantly predicted aggression. The beta value indicates that as the diet improved, aggression decreased.

The only factor not to predict aggression significantly was:

Television did not predict aggression (base don the change in the AIC).

Based on the standardized beta values, the most substantive predictor of aggression was actually parenting style, followed by computer games, diet and then sibling aggression.

$ R^2 $ is the squared correlation between the observed values of aggression and the values of aggression predicted by the model, 8.2% of the variance in aggression is explained. With all four of these predictors in the model only 8.2% of the variance in aggression can be explained.

The Q-Q plot suggests that errors may well deviate from normally distributed:

plot(agress_full_lm, which = 2)

The scatterplot of predicted values vs residuals helps us to assess both homoscedasticity and independence of errors. The plot does show a random scatter, but the trend line is not completely flat. It would be useful to run a robust model.

plot(agress_full_lm, which = c(1, 3))

Using robust standard errors, sibling aggression is no longer significant.

agress_full_lm %>%
  parameters::model_parameters(robust = TRUE, vcov.type = "HC4") %>% 
  knitr::kable(digits = 3)

Parameter	Coefficient	SE	CI_low	CI_high	t	df_error	p
(Intercept)	-0.006	0.012	-0.029	0.018	-0.494	661	0.621
parenting_style	0.062	0.016	0.031	0.093	3.905	661	0.000
sibling_aggression	0.086	0.045	-0.002	0.174	1.927	661	0.054
computer_games	0.143	0.048	0.050	0.237	3.019	661	0.003
diet	-0.112	0.049	-0.208	-0.015	-2.269	661	0.024

The influence plot looks fine. There are no extreme Cook’s distances and the trend line is fairly flat.

plot(agress_full_lm, which = 5)

Task 8.6

Repeat the analysis in Labcoat Leni’s Real Research 8.1 using bootstrapping for the confidence intervals. What are the confidence intervals for the regression parameters?

To recap, the models are fit as follows (see also the Labcoat Leni answers). Load the data directly from the discovr package:

ong_tib <- discovr::ong_2011

Fit the models that look at whether narcissism predicts, above and beyond the other variables, the frequency of status updates.

# Fit the models
ong_base_lm <- lm(status ~ sex + age + grade, data = ong_tib, na.action = na.exclude)
ong_ext_lm <- update(ong_base_lm, .~. + extraversion)
ong_ncs_lm <- update(ong_ext_lm, .~. + narcissism)

# Compare models
anova(ong_base_lm, ong_ext_lm, ong_ncs_lm) %>% 
  broom::tidy() %>% 
  knitr::kable(digits = 3)

res.df	rss	df	sumsq	statistic	p.value
246	1481.515	NA	NA	NA	NA
245	1458.360	1	23.155	4.017	0.046
244	1406.600	1	51.759	8.979	0.003

Fit the models that look at whether narcissism predicts, above and beyond the other variables, the Facebook profile picture ratings.

# Fit models
prof_base_lm <- lm(profile ~ sex + age + grade, data = ong_tib, na.action = na.exclude)
prof_ext_lm <- update(prof_base_lm, .~. + extraversion)
prof_ncs_lm <- update(prof_ext_lm, .~. + narcissism)

# Compare models

anova(prof_base_lm, prof_ext_lm, prof_ncs_lm) %>% 
  broom::tidy()  %>% 
  knitr::kable(digits = 3)

res.df	rss	df	sumsq	statistic	p.value
188	2405.233	NA	NA	NA	NA
187	2104.969	1	300.264	29.613	0
186	1885.958	1	219.011	21.600	0

Facebook status update frequency

Get the bootstrap CIs:

ong_ncs_lm %>% 
  parameters::model_parameters(bootstrap = TRUE, digits = 3) %>% 
  knitr::kable(digits = 3)

Parameter	Coefficient	CI_low	CI_high	p
(Intercept)	0.136	-5.152	5.158	0.943
sexMale	-0.947	-1.572	-0.363	0.004
age	-0.012	-0.341	0.363	0.933
gradeSec 2	-0.432	-1.280	0.505	0.348
gradeSec 3	-1.006	-2.101	-0.001	0.050
extraversion	0.010	-0.049	0.063	0.735
narcissism	0.067	0.025	0.107	0.004

The main benefit of the bootstrap confidence intervals and significance values is that they do not rely on assumptions of normality or homoscedasticity, so they give us an accurate estimate of the true population value of b for each predictor. The bootstrapped confidence intervals in the output do not affect the conclusions reported in Ong et al. (2011). Ong et al.’s prediction was still supported in that, after controlling for age, grade and sex, narcissism significantly predicted the frequency of Facebook status updates over and above extroversion, b = 0.066 [0.03, 0.10], p < 0.001.

Facebook profile picture rating

Get the bootstrap CIs:

prof_ncs_lm %>% 
  parameters::model_parameters(bootstrap = TRUE) %>% 
  knitr::kable(digits = 3)

Parameter	Coefficient	CI_low	CI_high	p
(Intercept)	-3.843	-18.934	8.564	0.563
sexMale	0.568	-0.539	1.650	0.344
age	0.348	-0.541	1.425	0.426
gradeSec 2	-0.522	-2.163	0.920	0.501
gradeSec 3	-0.634	-3.056	1.364	0.567
extraversion	0.109	0.023	0.203	0.006
narcissism	0.173	0.104	0.242	0.000

Similarly, the bootstrapped confidence intervals for the second regression are consistent with the conclusions reported in Ong et al. (2011). That is, after adjusting for age, grade and sex, narcissism significantly predicted the Facebook profile picture ratings over and above extroversion, b = 0.171 [0.10, 0.24], p < 0.001.

Task 8.7

Coldwell, Pike and Dunn (2006) investigated whether household chaos predicted children’s problem behaviour over and above parenting. From 118 families they recorded the age and gender of the youngest child (child_age and child_gender). They measured dimensions of the child’s perceived relationship with their mum: (1) warmth/enjoyment (child_warmth), and (2) anger/hostility (child_anger). Higher scores indicate more warmth/enjoyment and anger/hostility respectively. They measured the mum’s perceived relationship with her child, resulting in dimensions of positivity (mum_pos) and negativity (mum_neg). Household chaos (chaos) was assessed. The outcome variable was the child’s adjustment (sdq): the higher the score, the more problem behaviour the child was reported to be displaying. Conduct a hierarchical linear model in three steps: (1) enter child age and gender; (2) add the variables measuring parent-child positivity, parent-child negativity, parent-child warmth, parent-child anger; (3) add chaos. Is household chaos predictive of children’s problem behaviour over and above parenting? (coldwell_2006.csv).

Load the data:

chaos_tib <- discovr::coldwell_2006

Write a function to round values in the output!

round_values <- function(tibble, digits = 3){
  require(dplyr)
  tibble <- tibble %>% 
  dplyr::mutate(
    dplyr::across(where(is.numeric), ~round(., 3))
  )
  return(tibble)
}

Model 1: sdq predicted from child_age and child_gender:

chaos_base_lm <- lm(sdq ~ child_age + child_age, data = chaos_tib, na.action = na.exclude)

chaos_base_lm %>% 
  broom::glance() %>% 
  round_values() %>% 
  knitr::kable()

r.squared	adj.r.squared	sigma	statistic	p.value	df	logLik	AIC	BIC	deviance	df.residual	nobs
0.001	-0.009	0.172	0.095	0.759	1	37.285	-68.569	-60.551	3.121	105	107

chaos_base_lm %>% 
  broom::tidy() %>% 
  round_values() %>% 
  knitr::kable()

term	estimate	std.error	statistic	p.value
(Intercept)	1.247	0.144	8.642	0.000
child_age	0.001	0.002	0.308	0.759

Model 2: In a new block, add child_anger, child_warmth, mum_pos and mum_neg into the model:

chaos_emo_lm <- update(chaos_base_lm, .~. + child_anger + child_warmth + mum_pos + mum_neg)
chaos_emo_lm %>% 
  broom::glance() %>% 
  round_values() %>% 
  knitr::kable()

r.squared	adj.r.squared	sigma	statistic	p.value	df	logLik	AIC	BIC	deviance	df.residual	nobs
0.061	0.008	0.174	1.159	0.335	5	34.65	-55.301	-37.35	2.731	90	96

chaos_emo_lm %>% 
  broom::tidy() %>% 
  round_values() %>% 
  knitr::kable()

term	estimate	std.error	statistic	p.value
(Intercept)	0.943	0.332	2.843	0.006
child_age	0.001	0.002	0.590	0.556
child_anger	0.007	0.017	0.381	0.704
child_warmth	-0.008	0.025	-0.340	0.735
mum_pos	0.003	0.005	0.529	0.598
mum_neg	0.013	0.006	2.158	0.034

Model 3: In a final block, add chaos to the model:

chaos_chaos_lm <- update(chaos_emo_lm, .~. + chaos)
chaos_chaos_lm %>% 
  broom::glance() %>% 
  round_values() %>% 
  knitr::kable()

r.squared	adj.r.squared	sigma	statistic	p.value	df	logLik	AIC	BIC	deviance	df.residual	nobs
0.104	0.044	0.171	1.725	0.124	6	36.934	-57.867	-37.352	2.604	89	96

chaos_chaos_lm %>% 
  broom::tidy() %>% 
  round_values() %>% 
  knitr::kable()

term	estimate	std.error	statistic	p.value
(Intercept)	0.763	0.337	2.266	0.026
child_age	0.002	0.002	0.656	0.514
child_anger	0.003	0.017	0.190	0.850
child_warmth	-0.002	0.025	-0.093	0.926
mum_pos	0.003	0.005	0.507	0.613
mum_neg	0.011	0.006	1.859	0.066
chaos	0.074	0.036	2.082	0.040

chaos_chaos_lm %>% 
  parameters::model_parameters(standardize = "refit", digits = 3) %>% 
  knitr::kable()

Parameter	Coefficient	SE	CI_low	CI_high	t	df_error	p
(Intercept)	0.0000000	0.0998042	-0.1983088	0.1983088	0.0000000	89	1.0000000
child_age	0.0675926	0.1030422	-0.1371501	0.2723353	0.6559696	89	0.5135360
child_anger	0.0198999	0.1047546	-0.1882453	0.2280451	0.1899668	89	0.8497677
child_warmth	-0.0101373	0.1088309	-0.2263818	0.2061073	-0.0931469	89	0.9259962
mum_pos	0.0523631	0.1032264	-0.1527456	0.2574718	0.5072645	89	0.6132239
mum_neg	0.1988785	0.1070043	-0.0137368	0.4114939	1.8586029	89	0.0663887
chaos	0.2163831	0.1039170	0.0099023	0.4228639	2.0822697	89	0.0401883

We can’t compare the models because they’re based on different amounts of data (missing values)

From the output we can conclude that household chaos significantly predicted younger sibling’s problem behaviour over and above maternal parenting, child age and gender, t(88) = 2.08, p = 0.040. The positive standardized beta value (0.216) indicates that there is a positive relationship between household chaos and child’s problem behaviour. In other words, the higher the level of household chaos, the more problem behaviours the child displayed. The value of $R^2$ (0.10) tells us that household chaos accounts for 10% of the variance in child problem behaviour.

Last updated on May 29, 2025