R code Chapter 1

Mae Jemstone character from Discovering Statistics using R and RStudio

This document contains abridged sections from Discovering Statistics Using R and RStudio by Andy Field so there are some copyright considerations. You can use this material for teaching and non-profit activities but please do not meddle with it or claim it as your own work. See the full license terms at the bottom of the page.

# Make sure to load this packages
library(tidyverse)

Functions and objects

metallica <- c("Lars","James","Jason", "Kirk")
metallica <- metallica[metallica != "Jason"]
metallica <- c(metallica, "Rob")

print(metallica)
## [1] "Lars"  "James" "Kirk"  "Rob"
metallica %>% print(.)
## [1] "Lars"  "James" "Kirk"  "Rob"

R markdown

Basic text formatting

  • Italic (*): Make text *italic* by placing it between asterisks (with no spaces) will knit as: Make text italic by placing it between asterisks (with no spaces)
  • Bold (**): Make text **bold** by placing it between asterisks (with no spaces) will knit as: Make text bold by placing it between asterisks (with no spaces)
  • Superscript (^^): Make text^superscript^ by placing it between carats (with no spaces) will knit as: Make text^superscript^ by placing it between asterisks (with no spaces)
  • Subscript (~~): Make text~subscript~ by placing it between tildes (with no spaces) will knit as: Make text~subscript~ by placing it between tildes (with no spaces)
  • Footnote [^\1]: A footnote[^1] and a second footnote[^2] will knit as: A footnote1 and a second footnote2

Headings

You can use hashes to make headings of different levels. For example:

# Level 1 heading
## Level 2 heading
### Level 3 heading
#### Level 4 heading
##### Level 5 heading
###### Level 6 heading

will knit as:

Level 1 heading

Level 2 heading

Level 3 heading

Level 4 heading

Level 5 heading
Level 6 heading

Bullet lists

Single bullet lists, this:

* This is the first bullet
* This is the second
* This is the third

will knit as:

  • This is the first bullet
  • This is the second
  • This is the third

Numbered bullet lists, this:

1. This is the first entry
2. This is the second
3. This is the third

will knit as:

  1. This is the first entry
  2. This is the second
  3. This is the third

Complex lists. This:

* This is the first bullet point
    + this is a sub-bullet
    + so is this
* This is the second bullet
    + This is a sub-bullet
        - I've gone crazy and done a third level of bullets
        - It had to be done
* and this is the third

will knit as:

  • This is the first bullet point
    • this is a sub-bullet
    • so is this
  • This is the second bullet
    • This is a sub-bullet
      • I’ve gone crazy and done a third level of bullets
      • It had to be done
  • and this is the third

You use [text do display](web address) to insert hyperlinks. For example:

My favourite band is [Iron Maiden](https://ironmaiden.com/) will knit as:

My favourite band is Iron Maiden

Images

You use ![Image caption (optional)](path to image) to insert images. For example:

![Figure 1: I love my spaniel](andy_milton.png) knits as:

Figure 1: I love my spaniel

Tables

Insert tables using raw text. | denotes a column and the colon position denotes the alignment of the column, for example |---:| is right justified, |:---| is left justified, and |:---:| is centred.

: My top 3 Iron Maiden albums

| Name                   | Year   | Cover rating | Favourite track |
|:-----------------------|:----:|------:|:--------------------------------:|
|Piece of mind           | 1983 | ****  | The Flight of Icarus             |
|The Number of the beast | 1982 | ****  | Children of the damned           |
|Powerslave              | 1984 | ***** | The rime of the ancient mariner  |
knits as:

My top 3 Iron Maiden albums

NameYearCover ratingFavourite track
“Piece of mind”“1983”“****”“The Flight of Icarus”
“The Number of the beast”“1982”“****”“Children of the damned”
“Powerslave”“1984”“*****”“The rime of the ancient mariner”

Alternative, put your data in a tibble (more on this later) and use knitr::kable():

tibble::tribble(
~Name, ~Year, ~`Cover rating`, ~`Favourite track`,
"Piece of mind", "1983",  "****", "The Flight of Icarus",
"The Number of the beast", "1982", "****", "Children of the damned",
"Powerslave", "1984", "*****", "The rime of the ancient mariner"
) %>% 
knitr::kable(caption = "My top 3 Iron Maiden albums")

will knit as:

tibble::tribble(
~Name, ~Year, ~`Cover rating`, ~`Favourite track`,
"Piece of mind", "1983",  "****", "The Flight of Icarus",
"The Number of the beast", "1982", "****", "Children of the damned",
"Powerslave", "1984", "*****", "The rime of the ancient mariner"
) %>% 
knitr::kable(caption = "My top 3 Iron Maiden albums")

Table: Table 1: My top 3 Iron Maiden albums

NameYearCover ratingFavourite track
Piece of mind1983****The Flight of Icarus
The Number of the beast1982****Children of the damned
Powerslave1984*****The rime of the ancient mariner

Equations

You can include equations in R markdown using latex commands. You include an equation by enclosing it within $$ (or a single $ if you want the equation within the line of text you’re writing). To give you a flavour:

We can include the linear model in a sentence like this: (Y_i = b_0 + b_1X_i + \epsilon_i)``

which will knit as:

We can include the linear model in a sentence like this: \(Y_i = b_0 + b_1X_i + \epsilon_i\)

Or, if we want it within its own paragraph we’d write it as this:

$$
Y_i = b_0 + b_1X_i + \epsilon_i
$$

which knits as:

$$ Y_i = b_0 + b_1X_i + \epsilon_i $$

Tidyverse and the pipe operator (%>%)

# library(tidyverse) or library(magrittr) to access the pipe

core_members <- metallica %>% 
  subset(., metallica != "Rob") %>% 
  sort(.)

core_members
## [1] "James" "Kirk"  "Lars"

Getting data into R

name <- c("Lars Ulrich","James Hetfield", "Kirk Hammett", "Rob Trujillo", "Jason Newsted", "Cliff Burton", "Dave Mustaine")

# Numeric variables stored as double
songs_written <-  c(111, 112, 56, 16, 3, 11, 6)
net_worth <- c(300000000, 300000000, 200000000, 20000000, 40000000, 1000000, 20000000)

# Numeric variables stored as integer
songs_written_int <-  c(111L, 112L, 56L, 16L, 3L, 11L, 6L)
net_worth_int <- c(300000000L, 300000000L, 200000000L, 20000000L, 40000000L, 1000000L, 20000000L)

# Date variables
birth_date <- c("1963-12-26", "1963-08-03", "1962-11-18", "1964-10-23", "1963-03-04", "1962-02-10", "1961-09-13") %>% lubridate::ymd()

death_date <- c(NA, NA, NA, NA, NA, "1986-09-27", NA) %>% 
  lubridate::ymd()

# Logical variables
current_member <- c(TRUE, TRUE, TRUE, TRUE, FALSE, FALSE, FALSE)

# Factor variables
instrument <- c(2, 0, 0, 1, 1, 1, 0) %>% factor(levels = 0:2, labels = c("Guitar", "Bass", "Drums"))


instrument <- c("Drums", "Guitar", "Guitar", "Bass", "Bass", "Bass", "Guitar") %>%
  forcats::as_factor() %>%
  forcats::fct_relevel("Guitar", "Bass", "Drums")

levels(instrument)
## [1] "Guitar" "Bass"   "Drums"
levels(instrument) <- c("Proper guitar", "Bass guitar", "Drums") 

Tibbles and data frames

Creating data frames

metalli_dat <- data.frame(name, birth_date, death_date, instrument, current_member, songs_written, net_worth)

Creating tibbles

metalli_tib <- tibble::tibble(name, birth_date, death_date, instrument, current_member, songs_written, net_worth)

Viewing dataframes and tibbles

metalli_tib
## # A tibble: 7 x 7
##   name   birth_date death_date instrument current_member songs_written net_worth
##   <chr>  <date>     <date>     <fct>      <lgl>                  <dbl>     <dbl>
## 1 Lars … 1963-12-26 NA         Drums      TRUE                     111 300000000
## 2 James… 1963-08-03 NA         Proper gu… TRUE                     112 300000000
## 3 Kirk … 1962-11-18 NA         Proper gu… TRUE                      56 200000000
## 4 Rob T… 1964-10-23 NA         Bass guit… TRUE                      16  20000000
## 5 Jason… 1963-03-04 NA         Bass guit… FALSE                      3  40000000
## 6 Cliff… 1962-02-10 1986-09-27 Bass guit… FALSE                     11   1000000
## 7 Dave … 1961-09-13 NA         Proper gu… FALSE                      6  20000000
# View(metalli_tib) 

Creating an empty tibble

#to create an empty tibble called empty_tib that has 50 rows, execute:

empty_tib <- tibble::tibble(.rows = 10)

Creating new variables within a tibble

Using base R:

metalli_tib$albums <-  c(10, 10, 10, 2, 4, 3, 0)

Using dplyrr:mutate():

metalli_tib <- metalli_tib %>% 
  dplyr::mutate(
    	albums = c(10, 10, 10, 2, 4, 3, 0)
  )

metalli_tib
## # A tibble: 7 x 8
##   name  birth_date death_date instrument current_member songs_written net_worth
##   <chr> <date>     <date>     <fct>      <lgl>                  <dbl>     <dbl>
## 1 Lars… 1963-12-26 NA         Drums      TRUE                     111 300000000
## 2 Jame… 1963-08-03 NA         Proper gu… TRUE                     112 300000000
## 3 Kirk… 1962-11-18 NA         Proper gu… TRUE                      56 200000000
## 4 Rob … 1964-10-23 NA         Bass guit… TRUE                      16  20000000
## 5 Jaso… 1963-03-04 NA         Bass guit… FALSE                      3  40000000
## 6 Clif… 1962-02-10 1986-09-27 Bass guit… FALSE                     11   1000000
## 7 Dave… 1961-09-13 NA         Proper gu… FALSE                      6  20000000
## # … with 1 more variable: albums <dbl>

You can create your data set by initializing a tibble and then defining each variable. Note that in this context we use = rather than <- to assign values to each variable, and that each variable definition ends with a comma (except the last). For example, we can create metalli_tib from scratch as follows:

metalli_tib <- tibble::tibble(
    name = c("Lars Ulrich","James Hetfield", "Kirk Hammett", "Rob Trujillo", "Jason Newsted", "Cliff Burton", "Dave Mustaine"),
    birth_date = c("1963-12-26", "1963-08-03", "1962-11-18", "1964-10-23", "1963-03-04", "1962-02-10", "1961-09-13") %>% lubridate::ymd(),
    death_date = c(NA, NA, NA, NA, NA, "1986-09-27", NA) %>% 
  lubridate::ymd(),
  instrument = c("Drums", "Guitar", "Guitar", "Bass", "Bass", "Bass", "Guitar") %>%
  forcats::as_factor() %>% forcats::fct_relevel("Guitar", "Bass", "Drums"),
  current_member = c(TRUE, TRUE, TRUE, TRUE, FALSE, FALSE, FALSE),
  songs_written =  c(111, 112, 56, 16, 3, 11, 6),
net_worth = c(300000000, 300000000, 200000000, 20000000, 40000000, 1000000, 20000000)
  )

and now add one that quantifies how much they have earned per song that they contributed to:

metalli_tib <- metalli_tib %>% 
  dplyr::mutate(
    	worth_per_song = net_worth/songs_written
  )
metalli_tib
## # A tibble: 7 x 8
##   name  birth_date death_date instrument current_member songs_written net_worth
##   <chr> <date>     <date>     <fct>      <lgl>                  <dbl>     <dbl>
## 1 Lars… 1963-12-26 NA         Drums      TRUE                     111 300000000
## 2 Jame… 1963-08-03 NA         Guitar     TRUE                     112 300000000
## 3 Kirk… 1962-11-18 NA         Guitar     TRUE                      56 200000000
## 4 Rob … 1964-10-23 NA         Bass       TRUE                      16  20000000
## 5 Jaso… 1963-03-04 NA         Bass       FALSE                      3  40000000
## 6 Clif… 1962-02-10 1986-09-27 Bass       FALSE                     11   1000000
## 7 Dave… 1961-09-13 NA         Guitar     FALSE                      6  20000000
## # … with 1 more variable: worth_per_song <dbl>

Entering data directly into a tibble

You can enter the data directly as a tibble (rather than creating an empty one and using dplyr::mutate():

metalli_tib <- tibble::tribble(
  ~name, ~birth_date, ~death_date, ~instrument, ~current_member, ~songs_written, ~net_worth,
  "Lars Ulrich", "1963-12-26", NA, "Drums", TRUE, 111, 300000000,
  "James Hetfield", "1963-08-03", NA, "Guitar", TRUE, 112, 300000000,
  "Kirk Hammett", "1962-11-18", NA, "Guitar", TRUE, 56, 200000000,
  "Rob Trujillo", "1964-10-23", NA, "Bass",  TRUE, 16, 20000000,
  "Jason Newsted", "1963-03-04", NA, "Bass", FALSE, 3, 40000000,
  "Cliff Burton", "1962-02-10", "1986-09-27", "Bass", FALSE, 11, 1000000,
  "Dave Mustaine", "1961-09-13", NA, "Guitar", FALSE, 6, 20000000
  ) %>% 
  dplyr::mutate(
    birth_date = lubridate::ymd(birth_date),
    death_date = lubridate::ymd(death_date)
  )

Finding a cell of a data frame or tibble

Three ways to discover which instrument Lars Ulrich ‘plays’:

metalli_tib[1, 4]
## # A tibble: 1 x 1
##   instrument
##   <chr>     
## 1 Drums
metalli_tib[1, "instrument"]
## # A tibble: 1 x 1
##   instrument
##   <chr>     
## 1 Drums
metalli_tib[name == "Lars Ulrich", "instrument"]
## # A tibble: 1 x 1
##   instrument
##   <chr>     
## 1 Drums

Selecting variables

Using base R

# These commands return the contents of the variable called 'name'
metalli_tib$name
## [1] "Lars Ulrich"    "James Hetfield" "Kirk Hammett"   "Rob Trujillo"  
## [5] "Jason Newsted"  "Cliff Burton"   "Dave Mustaine"
metalli_tib[1]
## # A tibble: 7 x 1
##   name          
##   <chr>         
## 1 Lars Ulrich   
## 2 James Hetfield
## 3 Kirk Hammett  
## 4 Rob Trujillo  
## 5 Jason Newsted 
## 6 Cliff Burton  
## 7 Dave Mustaine
metalli_tib["name"]
## # A tibble: 7 x 1
##   name          
##   <chr>         
## 1 Lars Ulrich   
## 2 James Hetfield
## 3 Kirk Hammett  
## 4 Rob Trujillo  
## 5 Jason Newsted 
## 6 Cliff Burton  
## 7 Dave Mustaine
# Both of these commands return the contents of the variables called 'name' and instrument
metalli_tib[c(1, 4)]
## # A tibble: 7 x 2
##   name           instrument
##   <chr>          <chr>     
## 1 Lars Ulrich    Drums     
## 2 James Hetfield Guitar    
## 3 Kirk Hammett   Guitar    
## 4 Rob Trujillo   Bass      
## 5 Jason Newsted  Bass      
## 6 Cliff Burton   Bass      
## 7 Dave Mustaine  Guitar
metalli_tib[c("name", "instrument")]
## # A tibble: 7 x 2
##   name           instrument
##   <chr>          <chr>     
## 1 Lars Ulrich    Drums     
## 2 James Hetfield Guitar    
## 3 Kirk Hammett   Guitar    
## 4 Rob Trujillo   Bass      
## 5 Jason Newsted  Bass      
## 6 Cliff Burton   Bass      
## 7 Dave Mustaine  Guitar

The tidyverse way

metalli_tib %>% 
  dplyr::select(name, instrument)
## # A tibble: 7 x 2
##   name           instrument
##   <chr>          <chr>     
## 1 Lars Ulrich    Drums     
## 2 James Hetfield Guitar    
## 3 Kirk Hammett   Guitar    
## 4 Rob Trujillo   Bass      
## 5 Jason Newsted  Bass      
## 6 Cliff Burton   Bass      
## 7 Dave Mustaine  Guitar

You can exclude variables too, try these out:

metalli_tib %>% 
  dplyr::select(-name)
## # A tibble: 7 x 6
##   birth_date death_date instrument current_member songs_written net_worth
##   <date>     <date>     <chr>      <lgl>                  <dbl>     <dbl>
## 1 1963-12-26 NA         Drums      TRUE                     111 300000000
## 2 1963-08-03 NA         Guitar     TRUE                     112 300000000
## 3 1962-11-18 NA         Guitar     TRUE                      56 200000000
## 4 1964-10-23 NA         Bass       TRUE                      16  20000000
## 5 1963-03-04 NA         Bass       FALSE                      3  40000000
## 6 1962-02-10 1986-09-27 Bass       FALSE                     11   1000000
## 7 1961-09-13 NA         Guitar     FALSE                      6  20000000
metalli_tib %>% 
  dplyr::select(-c(name, instrument))
## # A tibble: 7 x 5
##   birth_date death_date current_member songs_written net_worth
##   <date>     <date>     <lgl>                  <dbl>     <dbl>
## 1 1963-12-26 NA         TRUE                     111 300000000
## 2 1963-08-03 NA         TRUE                     112 300000000
## 3 1962-11-18 NA         TRUE                      56 200000000
## 4 1964-10-23 NA         TRUE                      16  20000000
## 5 1963-03-04 NA         FALSE                      3  40000000
## 6 1962-02-10 1986-09-27 FALSE                     11   1000000
## 7 1961-09-13 NA         FALSE                      6  20000000

You can save a subsetted tibble to a new object:

# Save a version of metalli_tib but exclude the variable called name

metalli_anon_tib <- metalli_tib %>% 
  dplyr::select(-name)

#View this new object
metalli_anon_tib
## # A tibble: 7 x 6
##   birth_date death_date instrument current_member songs_written net_worth
##   <date>     <date>     <chr>      <lgl>                  <dbl>     <dbl>
## 1 1963-12-26 NA         Drums      TRUE                     111 300000000
## 2 1963-08-03 NA         Guitar     TRUE                     112 300000000
## 3 1962-11-18 NA         Guitar     TRUE                      56 200000000
## 4 1964-10-23 NA         Bass       TRUE                      16  20000000
## 5 1963-03-04 NA         Bass       FALSE                      3  40000000
## 6 1962-02-10 1986-09-27 Bass       FALSE                     11   1000000
## 7 1961-09-13 NA         Guitar     FALSE                      6  20000000

Selecting cases (filtering tibbles)

Using base R we can do the following. View only the data for the current members of metallica:

metalli_tib[current_member == TRUE,]
## # A tibble: 4 x 7
##   name   birth_date death_date instrument current_member songs_written net_worth
##   <chr>  <date>     <date>     <chr>      <lgl>                  <dbl>     <dbl>
## 1 Lars … 1963-12-26 NA         Drums      TRUE                     111 300000000
## 2 James… 1963-08-03 NA         Guitar     TRUE                     112 300000000
## 3 Kirk … 1962-11-18 NA         Guitar     TRUE                      56 200000000
## 4 Rob T… 1964-10-23 NA         Bass       TRUE                      16  20000000

View only the instruments played by the current members of metallica:

metalli_tib[current_member == TRUE, "instrument"]
## # A tibble: 4 x 1
##   instrument
##   <chr>     
## 1 Drums     
## 2 Guitar    
## 3 Guitar    
## 4 Bass

View only the names, instruments played, and number of songs written by the current members of metallica:

metalli_tib[current_member == TRUE, c("name", "instrument", "songs_written")]
## # A tibble: 4 x 3
##   name           instrument songs_written
##   <chr>          <chr>              <dbl>
## 1 Lars Ulrich    Drums                111
## 2 James Hetfield Guitar               112
## 3 Kirk Hammett   Guitar                56
## 4 Rob Trujillo   Bass                  16
metalli_tib[songs_written > 50,]
## # A tibble: 3 x 7
##   name   birth_date death_date instrument current_member songs_written net_worth
##   <chr>  <date>     <date>     <chr>      <lgl>                  <dbl>     <dbl>
## 1 Lars … 1963-12-26 NA         Drums      TRUE                     111 300000000
## 2 James… 1963-08-03 NA         Guitar     TRUE                     112 300000000
## 3 Kirk … 1962-11-18 NA         Guitar     TRUE                      56 200000000

Using dplyr::filter()

dplyr::filter(metalli_tib, current_member == TRUE)
## # A tibble: 4 x 7
##   name   birth_date death_date instrument current_member songs_written net_worth
##   <chr>  <date>     <date>     <chr>      <lgl>                  <dbl>     <dbl>
## 1 Lars … 1963-12-26 NA         Drums      TRUE                     111 300000000
## 2 James… 1963-08-03 NA         Guitar     TRUE                     112 300000000
## 3 Kirk … 1962-11-18 NA         Guitar     TRUE                      56 200000000
## 4 Rob T… 1964-10-23 NA         Bass       TRUE                      16  20000000
# Or using a pipe:

metalli_tib %>% 
  dplyr::filter(current_member == TRUE)
## # A tibble: 4 x 7
##   name   birth_date death_date instrument current_member songs_written net_worth
##   <chr>  <date>     <date>     <chr>      <lgl>                  <dbl>     <dbl>
## 1 Lars … 1963-12-26 NA         Drums      TRUE                     111 300000000
## 2 James… 1963-08-03 NA         Guitar     TRUE                     112 300000000
## 3 Kirk … 1962-11-18 NA         Guitar     TRUE                      56 200000000
## 4 Rob T… 1964-10-23 NA         Bass       TRUE                      16  20000000

Again, we can save the filtered tibble to a new object:

metallica_current <- metalli_tib %>% 
  dplyr::filter(current_member == TRUE)

metallica_current
## # A tibble: 4 x 7
##   name   birth_date death_date instrument current_member songs_written net_worth
##   <chr>  <date>     <date>     <chr>      <lgl>                  <dbl>     <dbl>
## 1 Lars … 1963-12-26 NA         Drums      TRUE                     111 300000000
## 2 James… 1963-08-03 NA         Guitar     TRUE                     112 300000000
## 3 Kirk … 1962-11-18 NA         Guitar     TRUE                      56 200000000
## 4 Rob T… 1964-10-23 NA         Bass       TRUE                      16  20000000

Combining conditions:

metalli_tib %>% 
  dplyr::filter(is.na(death_date) & instrument == "Bass guitar")
## # A tibble: 0 x 7
## # … with 7 variables: name <chr>, birth_date <date>, death_date <date>,
## #   instrument <chr>, current_member <lgl>, songs_written <dbl>,
## #   net_worth <dbl>

If we change is.na() to !is.na() we get Cliff Burton’s data (the only member who is a bassist and does NOT have a value of ‘NA’ for the variable death_date):

metalli_tib %>% 
  dplyr::filter(!is.na(death_date) & instrument == "Bass guitar")
## # A tibble: 0 x 7
## # … with 7 variables: name <chr>, birth_date <date>, death_date <date>,
## #   instrument <chr>, current_member <lgl>, songs_written <dbl>,
## #   net_worth <dbl>

We can also use the OR operator (|) to set conditions for which only one has to be true. For example, to select members who are either bassists OR drummers we’d use:

metalli_tib %>% 
  dplyr::filter(instrument == "Drums" | instrument == "Bass guitar")
## # A tibble: 1 x 7
##   name   birth_date death_date instrument current_member songs_written net_worth
##   <chr>  <date>     <date>     <chr>      <lgl>                  <dbl>     <dbl>
## 1 Lars … 1963-12-26 NA         Drums      TRUE                     111 300000000

Combining selecting cases with selecting variables

This command filters metalli_tib according to whether the variable current_member is equal to TRUE and whether the variable instrument is NOT equal to (!=) the phrase “Bass guitar”:

metalli_worth  <- metalli_tib %>% 
  dplyr::filter(current_member == TRUE & instrument != "Bass guitar")

metalli_worth
## # A tibble: 4 x 7
##   name   birth_date death_date instrument current_member songs_written net_worth
##   <chr>  <date>     <date>     <chr>      <lgl>                  <dbl>     <dbl>
## 1 Lars … 1963-12-26 NA         Drums      TRUE                     111 300000000
## 2 James… 1963-08-03 NA         Guitar     TRUE                     112 300000000
## 3 Kirk … 1962-11-18 NA         Guitar     TRUE                      56 200000000
## 4 Rob T… 1964-10-23 NA         Bass       TRUE                      16  20000000

Having done this, we could pass the object net_worth object that we just created into the select function to select the variables name and net_worth:

metalli_worth  <- metalli_worth  %>% 
  dplyr::select(name, net_worth)

metalli_worth
## # A tibble: 4 x 2
##   name           net_worth
##   <chr>              <dbl>
## 1 Lars Ulrich    300000000
## 2 James Hetfield 300000000
## 3 Kirk Hammett   200000000
## 4 Rob Trujillo    20000000

Better still, combine the two operations into a single pipe:

metalli_worth <- metalli_tib %>% 
  dplyr::filter(current_member == TRUE & instrument != "Bass guitar") %>% 
  dplyr::select(name, net_worth)

metalli_worth
## # A tibble: 4 x 2
##   name           net_worth
##   <chr>              <dbl>
## 1 Lars Ulrich    300000000
## 2 James Hetfield 300000000
## 3 Kirk Hammett   200000000
## 4 Rob Trujillo    20000000

Doing the same with base R will make your eyes hurt

metalli_worth <- metalli_tib[current_member == TRUE & instrument != "Bass guitar", c("name", "net_worth")]
metalli_worth
## # A tibble: 3 x 2
##   name           net_worth
##   <chr>              <dbl>
## 1 Lars Ulrich    300000000
## 2 James Hetfield 300000000
## 3 Kirk Hammett   200000000

Exporting data

readr::write_csv(metalli_tib, "../data/metallica.csv")

# or 

readr::write_csv(metalli_tib, here::here("/data/metallica.csv"))

Using other software to get data in R

metalli_tib <- readr::read_csv("../data/metallica.csv")

# or 

metalli_tib <- here::here("/data/metallica.csv") %>% readr::read_csv()

We can specify data types:

metalli_tib <- readr::read_csv("../data/metallica.csv", col_types = cols(
  name = col_character(),
  birth_date = col_date(),
  death_date = col_date(),
  instrument = col_factor(),
  current_member = col_logical(),
  songs_written = col_double(),
  net_worth = col_double()
  )
)

#OR

metalli_tib <- readr::read_csv("../data/metallica.csv", col_types = "cDDfldd")

# OR

metalli_tib <- readr::read_csv("../data/metallica.csv") %>% 
  dplyr::mutate(
    instrument = forcats::as_factor(instrument)
  )

metalli_tib$instrument

Pieces of great

Pieces of great 1.5

husband <- c("1973-06-21", "1970-07-16", "1949-10-08", "1969-05-24")
wife <- c("1984-11-12", "1973-08-02", "1948-11-11", "1983-07-23")
agegap <- husband-wife

husband <- c("1973-06-21", "1970-07-16", "1949-10-08", "1969-05-24") %>%
  lubridate::ymd(.)
wife <- c("1984-11-12", "1973-08-02", "1948-11-11", "1983-07-23") %>%
  lubridate::ymd(.)

agegap <- husband-wife
agegap

Pieces of great 1.8

# Creates a list
metalli_lst <- list(name, instrument)
metalli_lst
## [[1]]
## [1] "Lars Ulrich"    "James Hetfield" "Kirk Hammett"   "Rob Trujillo"  
## [5] "Jason Newsted"  "Cliff Burton"   "Dave Mustaine" 
## 
## [[2]]
## [1] Drums         Proper guitar Proper guitar Bass guitar   Bass guitar  
## [6] Bass guitar   Proper guitar
## Levels: Proper guitar Bass guitar Drums
# creates a data frame using cbind
metalli_mtx <- cbind(name, instrument)
metalli_mtx
##      name             instrument
## [1,] "Lars Ulrich"    "3"       
## [2,] "James Hetfield" "1"       
## [3,] "Kirk Hammett"   "1"       
## [4,] "Rob Trujillo"   "2"       
## [5,] "Jason Newsted"  "2"       
## [6,] "Cliff Burton"   "2"       
## [7,] "Dave Mustaine"  "1"

  1. Contents of my first footnote. ↩︎

  2. Contents of my second footnote. ↩︎

Next