The discovr package will contain tutorials associated with my textbook Discovering Statistics using R and RStudio, due out in early 2021. It will include all datasets but, most importantly, it will contain a series of interactive tutorials that teach alongside the chapters of the book. The tutorials are written using a package called learnr. Once a tutorial is running, it’s a bit like reading excerpts of the book but with places where you can practice the R code that you have just been taught. The discovr package is free (as are all things R-related) and offered to support tutors and students using my textbook.
What are R and RStudio?
If you’re using a textbook about R then you probably already know what it is. If not, R is a free software environment for statistical analysis and graphics. RStudio is a user interface through which to use R. RStudio adds functionality that makes working with R easier, more efficient, and generally more pleasant than working in R alone.
You can get started with R and RStudio by completing this tutorial (includes videos).
The tutorials are named to correspond (roughly) to the relevant chapter of the book. For example, discovr_04 would be a good tutorial to run alongside teaching related to chapter 4, and so on. Some longer chapters have several tutorials that break the content into more manageable chunks. Given the current global situation and the fact that many instructors need to teach remotely, I may make what I have available in summer 2020 (for teaching in autumn 2020), and update the package as and when I write new tutorials.
discovr_01: Key concepts in R (functions and objects, packages and functions, style, data types, tidyverse, tibbles)
discovr_02: Summarizing data (frequency distributions, grouped frequency distributions, relative frequencies, histograms, mean, median, variance, standard deviation, interquartile range)
discovr_03: Confidence intervals: interactive app demonstrating what a confidence interval is, computing normal and bootstrap confidence intervals using R, adding confidence intervals to data summaries.
discovr_05: Visualizing data. The ggplot2 package, boxplots, plotting means, violin plots, scatterplots, grouping by colour, grouping using facets, adjusting scales, adjusting positions.
discovr_06: The beast of bias. Restructuring data from messy to tidy format (and back). Spotting outliers using histograms and boxplots. Calculating z-scores (standardizing scores). Writing your own function. Using z-scores to detect outliers. Q-Q plots. Calculating skewness, kurtosis and the number of valid cases. Grouping summary statistics by multiple categorical/grouping variables.
discovr_07: Associations. Plotting data with GGally. Pearson’s r, Spearman’s Rho, Kendall’s tau, robust correlations.
discovr_08: The general linear model (GLM). Visualizing the data, fitting GLMs with one and two predictors. Viewing model parameters with broom, model parameters, standard errors, confidence intervals, fit statistics, significance.
discovr_09: Categorical predictors with two categories (comparing two means). Comparing two independent means, comparing two related means, effect sizes.
discovr_10: Moderation and mediation. Centring variables (grand mean centring), specifying interaction terms, moderation analysis, simple slopes analysis, Johnson-Neyman intervals, mediation with one predictor, direct and indirect effects, mediation using lavaan.
discovr_11: Comparing several means. Essentially ‘One-way independent ANOVA’ but taught using a general linear model framework. Covers setting contrasts (dummy coding, contrast coding, and linear and quadratic trends), the F-statistic and Welch’s robust F, robust parameter estimation, heteroscedasticity-consistent tests of parameters, robust tests of means based on trimmed data, post hoc tests, Bayes factors.
discovr_12: Comparing means adjusted for other variables. Essentially ‘Analysis of Covariance (ANCOVA)’ designs but taught using a general linear model framework. Covers setting contrasts, Type III sums of squares, the F-statistic, robust parameter estimation, heteroscedasticity-consistent tests of parameters, robust tests of adjusted means, post hoc tests, Bayes factors.
discovr_13: Factorial designs. Fitting models for two-way factorial designs (independent measures) using both lm() and the afex package. This tutorial builds on previous ones to show how models can be fit with two categorical predictors to look at the interaction between them. We look at fitting the models, setting contrasts for the two categorical predictors, obtaining estimated marginal means, interaction plots, simple effects analysis, diagnostic plots, partial eta-squared and partial omega-squared, robust models and Bayes factors.
Installing discovr
This package is incomplete but under active development. I have released it early in case it is useful for instructors needing to move rapidly to remote learning because of the current global pandemic. Check the GitHub page for updates/new tutorials.
To use discovr you first need to install R and RStudio. To learn how to do this and to get oriented with R and RStudio, complete my interactive tutorial, getting started with R and RStudio.
Each tutorial’s name (for example, discovr_02) begins its entry in the list above. Once the command to run a tutorial is executed, the tutorial will spring to life in the tutorial pane.
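The commands themselves are not reproduced in the text above. As a minimal sketch, assuming the package is hosted on GitHub at profandyfield/discovr (an assumption, not something stated here) and that the remotes and learnr packages are used to install and run it, the steps might look like this. The tutorial_name() helper is hypothetical and only illustrates the naming pattern; chapters that are split into several tutorials may not follow it.

```r
# Hypothetical helper (not part of discovr): build a tutorial name from a
# chapter number, following the discovr_XX pattern described above.
tutorial_name <- function(chapter) {
  sprintf("discovr_%02d", as.integer(chapter))
}

# Installing and launching only make sense in an interactive session
# (for example, the RStudio console), so those steps are guarded.
if (interactive()) {
  # Install the remotes package if it is not already available
  if (!requireNamespace("remotes", quietly = TRUE)) {
    install.packages("remotes")
  }

  # Install discovr from GitHub (repository path is an assumption)
  remotes::install_github("profandyfield/discovr")

  # Launch the tutorial for chapter 2 (requires the learnr package)
  learnr::run_tutorial(tutorial_name(2), package = "discovr")
}
```

Run interactively, this installs the package and opens discovr_02. Outside an interactive session only the helper is defined and nothing is installed.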
Suggested workflow
The tutorials are self-contained (you practice code in code boxes) so you don’t need to use RStudio at the same time. However, to get the most from them I would recommend that you create an RStudio project and, within that, open (and save) a new R Markdown file each time you work through a tutorial. Within that R Markdown file, replicate parts of the code from the tutorial (in code chunks) and use Markdown to write notes about what you have done, to reflect on things that you have struggled with, and to note useful tips to help you remember things. Basically, write a learning journal. This workflow has the advantage of not just teaching you the code that you need to do certain things, but also giving you practice in using RStudio itself.
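One way to start such a journal is sketched below. The new_journal() function is hypothetical (it is not part of discovr or RStudio), and the file name and headings it writes are illustrative assumptions. It creates a skeleton R Markdown file with a notes section and one empty code chunk.

```r
# Hypothetical helper (not part of discovr): create a skeleton R Markdown
# learning journal for one tutorial and return the path to the new file.
new_journal <- function(tutorial, dir = ".") {
  fence <- strrep("`", 3)  # three backticks, the chunk delimiter in R Markdown
  path <- file.path(dir, paste0(tutorial, "_journal.Rmd"))

  # Refuse to overwrite an existing journal
  if (file.exists(path)) {
    stop("File already exists: ", path)
  }

  lines <- c(
    "---",
    paste0("title: \"Learning journal: ", tutorial, "\""),
    "output: html_document",
    "---",
    "",
    "## Notes",
    "",
    "What I did, what I struggled with, and tips to remember.",
    "",
    paste0(fence, "{r}"),
    "# Code replicated from the tutorial goes here",
    fence,
    ""
  )

  writeLines(lines, path)
  invisible(path)
}
```

For example, calling new_journal("discovr_02") from the console of an RStudio project would create discovr_02_journal.Rmd in the project folder, ready to open and edit alongside the tutorial.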
Here’s a video explaining how I suggest using the tutorials.
Other resources
Statistics
The tutorials typically follow examples described in detail in Discovering Statistics using R and RStudio. That book covers the theoretical side of the statistical models, and has more depth on conducting and interpreting the models in these tutorials.
If any of the statistical content doesn’t make sense, you could try my more introductory book, An adventure in statistics.
There are free lectures and screencasts on my YouTube channel.