Remember to change the author: field in this Rmd file to your own name.

Learning objectives

In today’s Lab you will gain practice with the following concepts from today’s class:

  • Using the t.test and wilcox.test commands to run 2-sample tests (the t-test and the Wilcoxon rank-sum test)
  • Interpreting the results of statistical significance tests
  • Using qqnorm and qqline to construct normal quantile-quantile plots, and using them to assess whether data appear to be normally distributed
  • Using fisher.test on 2x2 tables and interpreting the results

We’ll begin by loading all the packages we might need.

library(tidyverse)
## ── Attaching packages ──────────────────────────────── tidyverse 1.2.1 ──
## ✔ ggplot2 3.2.1     ✔ purrr   0.3.3
## ✔ tibble  2.1.3     ✔ dplyr   0.8.3
## ✔ tidyr   1.0.0     ✔ stringr 1.4.0
## ✔ readr   1.3.1     ✔ forcats 0.4.0
## ── Conflicts ─────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
Cars93 <- as_tibble(MASS::Cars93)

Testing means between two groups

Here is a command that generates density plots of MPG.highway from the Cars93 data. Separate densities are constructed for US and non-US vehicles.

qplot(data = Cars93, x = MPG.highway, 
      fill = Origin, geom = "density", alpha = I(0.5))

(a) Using the Cars93 data and the t.test() function, run a t-test to see if average MPG.highway differs between US and non-US vehicles. Interpret the results.

Try doing this both using the formula style input and the x, y style input.
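As a reminder of the two input styles, here is a minimal sketch on a different data set, the built-in mtcars, comparing mpg between automatic (am = 0) and manual (am = 1) cars. The names fit.formula and fit.xy are illustrative only and do not come from the lecture.

```r
# Formula style: response ~ grouping variable
# (the grouping variable must have exactly two levels)
fit.formula <- t.test(mpg ~ am, data = mtcars)

# x, y style: pass the two groups as separate numeric vectors
fit.xy <- t.test(x = mtcars$mpg[mtcars$am == 0],
                 y = mtcars$mpg[mtcars$am == 1])

# Both calls run the same Welch two-sample t-test,
# so the two p-values should agree
fit.formula$p.value
fit.xy$p.value
```

The same pattern should carry over to Cars93 by swapping in MPG.highway and Origin.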

# Edit me

(b) What is the confidence interval for the difference? Interpret this confidence interval.
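The object returned by t.test() stores the interval in its conf.int component. A small sketch, again using the built-in mtcars data rather than Cars93 (the variable names are illustrative):

```r
fit <- t.test(mpg ~ am, data = mtcars)

# 95% confidence interval for mean(am = 0) - mean(am = 1);
# with formula input the difference is first level minus second level
fit$conf.int

# The confidence level is stored as an attribute of the interval
attr(fit$conf.int, "conf.level")

# A different level can be requested with the conf.level argument
fit99 <- t.test(mpg ~ am, data = mtcars, conf.level = 0.99)
fit99$conf.int
```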

# Edit me

(c) Repeat part (a) using the wilcox.test() function.
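wilcox.test() accepts the same formula and x, y inputs as t.test(). A brief sketch on the built-in mtcars data follows. Because mpg contains tied values, an exact p-value cannot be computed, so this sketch passes exact = FALSE to request the normal approximation explicitly, which also avoids the warning about ties. Whether your Cars93 analysis needs this argument is left for you to check.

```r
# Wilcoxon rank-sum test of mpg between automatic and manual cars
fit.w <- wilcox.test(mpg ~ am, data = mtcars, exact = FALSE)

fit.w$statistic  # the rank-sum statistic W
fit.w$p.value
```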

# Edit me

(d) Are your results for (a) and (c) very different?

Is the data normal?

(a) Modify the density plot code provided in problem 1 to produce a plot with better axis labels. Also add a title.

# Edit me

(b) Do the data appear to be normally distributed? If not, describe why.

(c) Construct qqplots of MPG.highway, one plot for each Origin category. Overlay a line on each plot as illustrated in lecture.
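One possible way to draw a separate QQ plot per group with base graphics is sketched below on the built-in mtcars data, with one panel per value of am. The loop and the variable names are illustrative, and the approach shown in lecture may differ.

```r
# Put two plots side by side; par() returns the previous settings
old.par <- par(mfrow = c(1, 2))

for (a in c(0, 1)) {
  y <- mtcars$mpg[mtcars$am == a]
  # Sample quantiles against theoretical normal quantiles
  qqnorm(y, main = paste("mpg, am =", a))
  # Reference line through the first and third quartiles
  qqline(y, col = "red")
}

par(old.par)  # restore the previous plotting layout
```

Points that follow the line closely are consistent with normality. Systematic curvature at the ends suggests skewness or heavy tails.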

# Edit me

(d) Based on the qqplots, do the data appear to be normally distributed? If not, describe why.

Testing 2 x 2 tables

Doll and Hill’s 1950 article studying the association between smoking and lung cancer contains one of the most important 2 x 2 tables in history.

Here’s their data:

smoking <- as.table(rbind(c(688, 650), c(21, 59)))
dimnames(smoking) <- list(has.smoked = c("yes", "no"),
                    lung.cancer = c("yes","no"))
smoking
##           lung.cancer
## has.smoked yes  no
##        yes 688 650
##        no   21  59

(a) Use fisher.test() to test if there’s an association between smoking and lung cancer.
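fisher.test() takes a 2 x 2 table (or matrix) of counts directly. As a sketch on different data, the table below cross-classifies transmission type (am) against engine shape (vs) in the built-in mtcars data. The names tab and fit.f are illustrative.

```r
# Build a 2 x 2 contingency table from two binary variables
tab <- table(am = mtcars$am, vs = mtcars$vs)
tab

# Fisher's exact test of independence between rows and columns
fit.f <- fisher.test(tab)
fit.f$p.value
```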

# Edit me

(b) What is the odds ratio? Interpret this quantity.
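The odds ratio reported by fisher.test() lives in the estimate component, with its interval in conf.int. Note that R reports the conditional maximum likelihood estimate, which is usually close to, but not identical to, the sample cross-product ratio. A sketch on the built-in mtcars data (variable names are illustrative):

```r
tab <- table(am = mtcars$am, vs = mtcars$vs)
fit.f <- fisher.test(tab)

fit.f$estimate  # conditional MLE of the odds ratio
fit.f$conf.int  # confidence interval for the odds ratio

# Sample (cross-product) odds ratio, for comparison
sample.or <- (tab[1, 1] * tab[2, 2]) / (tab[1, 2] * tab[2, 1])
sample.or
```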

# Edit me

(c) Are your findings statistically significant?

# Edit me

(d) Write an inline code chunk similar to the one you saw in class where you interpret the results of this hypothesis test.
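One way to prepare values for inline reporting is to store them in variables inside a regular chunk first. The sketch below does this for a t-test on the built-in mtcars data. The variable names are illustrative, and the chunk shown in class may be organized differently.

```r
fit <- t.test(mpg ~ am, data = mtcars)

# Round the quantities you plan to report in the text
p.val <- signif(fit$p.value, 3)
ci.lower <- round(fit$conf.int[1], 2)
ci.upper <- round(fit$conf.int[2], 2)
```

In the body text of the Rmd file, these values can then be embedded with inline code, for example: The difference in mean mpg is statistically significant (p = `r p.val`), with a 95% confidence interval of (`r ci.lower`, `r ci.upper`).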