Remember to change the author: field in this Rmd file to your own name.

For the first two problems we’ll use the Cars93 data set from the MASS library.

library(tidyverse)
## ── Attaching packages ─────────────────────────────────────────────── tidyverse 1.2.1 ──
## ✔ ggplot2 3.2.1     ✔ purrr   0.3.3
## ✔ tibble  2.1.3     ✔ dplyr   0.8.3
## ✔ tidyr   1.0.0     ✔ stringr 1.4.0
## ✔ readr   1.3.1     ✔ forcats 0.4.0
## ── Conflicts ────────────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
Cars93 <- MASS::Cars93

1. Manipulating data frames

In some situations we want to transform right-skewed data before analysing it. Taking the log compresses large values much more than small ones, so it often makes right-skewed data more symmetric and closer to normally distributed.
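Here is a minimal illustration using made-up numbers rather than Cars93. Each value is double the previous one. On the raw scale the gaps between neighbours keep widening. After `log()`, every gap is the same constant, `log(2)`. That compression of the upper end is what pulls in a long right tail.

```r
# Made-up values that double at each step (illustrative only, not Cars93 data)
x <- c(10, 20, 40, 80)

diff(x)       # gaps on the raw scale grow: 10 20 40
diff(log(x))  # gaps on the log scale are all log(2), about 0.693
```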

Here are histograms of the MPG.highway and MPG.city variables.

qplot(MPG.city, data = Cars93, bins = 10)

qplot(MPG.highway, data = Cars93, bins = 10)
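Newer ggplot2 releases deprecate `qplot()`, so you may prefer to draw the same histograms with the full `ggplot()` interface. The sketch below assumes ggplot2 and MASS are installed. The object names `p_city` and `p_highway` are only illustrative.

```r
library(ggplot2)

Cars93 <- MASS::Cars93

# Same 10-bin histograms as the qplot() calls, written with ggplot() + geom_histogram()
p_city <- ggplot(Cars93, aes(x = MPG.city)) +
  geom_histogram(bins = 10)
p_highway <- ggplot(Cars93, aes(x = MPG.highway)) +
  geom_histogram(bins = 10)

print(p_city)
print(p_highway)
```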

(a) Do the city and highway gas-mileage figures appear to have right-skewed distributions?

Your answer: Yes. Most of the mass is concentrated at low MPG values, and a long right tail shows that a small proportion of cars have very high MPG.

(b) Use the mutate() and log() functions to create a new data frame called Cars93.log that has MPG.highway and MPG.city replaced with log(MPG.highway) and log(MPG.city).

# Overwrite both mileage columns with their natural logs
Cars93.log <- mutate(Cars93, MPG.highway = log(MPG.highway), MPG.city = log(MPG.city))
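You can also write the same transformation with `mutate_at()`, which applies one function to a set of selected columns. This is an alternative sketch, not the required answer. In dplyr 1.0 and later, `mutate_at()` is superseded by `across()`, but it is still available. `Cars93.log.alt` is a hypothetical name.

```r
library(dplyr)

Cars93 <- MASS::Cars93

# Original approach: name each column explicitly
Cars93.log <- mutate(Cars93, MPG.highway = log(MPG.highway), MPG.city = log(MPG.city))

# Alternative: apply log() to every column selected by vars()
Cars93.log.alt <- mutate_at(Cars93, vars(MPG.highway, MPG.city), log)

all.equal(Cars93.log, Cars93.log.alt)  # both approaches should give the same data frame
```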

(c) Run the histogram commands again, this time using your new Cars93.log dataset instead of Cars93.

qplot(MPG.city, data = Cars93.log, bins = 10)

qplot(MPG.highway, data = Cars93.log, bins = 10)

(d) Do the distributions appear less skewed than before?

Your answer: The log-transformed MPG.highway distribution does look more symmetric than the original one.
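To back up the visual impression with a number, you can compute a sample skewness before and after the transformation. The helper `skew()` below is hypothetical. It uses one simple moment-based definition: the mean cubed deviation divided by the cubed standard deviation. Other definitions apply small-sample corrections.

```r
Cars93 <- MASS::Cars93

# Moment-based sample skewness (hypothetical helper, one of several definitions)
skew <- function(v) mean((v - mean(v))^3) / sd(v)^3

# Positive values indicate a right tail; values near 0 indicate rough symmetry
skew_city    <- c(raw = skew(Cars93$MPG.city),    log = skew(log(Cars93$MPG.city)))
skew_highway <- c(raw = skew(Cars93$MPG.highway), log = skew(log(Cars93$MPG.highway)))

round(rbind(MPG.city = skew_city, MPG.highway = skew_highway), 2)
```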

2. Table function

(a) Use the table() function to tabulate the data by DriveTrain and Origin.

table(Cars93$DriveTrain, Cars93$Origin)
##        
##         USA non-USA
##   4WD     5       5
##   Front  34      33
##   Rear    9       7

(b) Repeat part (a), this time using the count() function.

Cars93 %>% 
  count(DriveTrain, Origin)
## # A tibble: 6 x 3
##   DriveTrain Origin      n
##   <fct>      <fct>   <int>
## 1 4WD        USA         5
## 2 4WD        non-USA     5
## 3 Front      USA        34
## 4 Front      non-USA    33
## 5 Rear       USA         9
## 6 Rear       non-USA     7
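To lay out the `count()` result like the `table()` output, one option is `tidyr::pivot_wider()`, available from tidyr 1.0.0. It spreads the `Origin` levels into separate columns. This is a sketch, and `counts` and `wide` are illustrative names.

```r
library(dplyr)
library(tidyr)

Cars93 <- MASS::Cars93

# Long format: one row per DriveTrain/Origin combination
counts <- Cars93 %>% count(DriveTrain, Origin)

# Wide format: one row per DriveTrain, one column per Origin level
wide <- counts %>% pivot_wider(names_from = Origin, values_from = n)
wide
```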

(c) Does it look like foreign car manufacturers had different DriveTrain production preferences compared to US manufacturers?

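Your answer: The counts in each DriveTrain category are nearly the same for US and non-US manufacturers, and the two groups are about the same size (48 US cars vs. 45 non-US cars). Front-wheel drive dominates for both, so the table suggests that they had similar DriveTrain production preferences.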
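The two groups differ slightly in size (48 US and 45 non-US cars), so column proportions give a fairer comparison than raw counts. `prop.table()` with `margin = 2` divides each count by its column total. `tab` and `props` are illustrative names.

```r
Cars93 <- MASS::Cars93

tab <- table(Cars93$DriveTrain, Cars93$Origin)

# margin = 2: each column (Origin) sums to 1
props <- prop.table(tab, margin = 2)
round(props, 2)
```

For front-wheel drive this gives 34/48 (about 0.71) for US cars and 33/45 (about 0.73) for non-US cars.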

3. Functions, lists, and if-else practice

(a) Write a function called isPassingGrade whose input x is a number, and which returns FALSE if x is lower than 50 and TRUE otherwise.

isPassingGrade <- function(x) {
  x >= 50
}
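`>=` is vectorized in R, so `isPassingGrade()` also works on a whole vector of grades at once. A missing grade gives `NA` rather than `TRUE` or `FALSE`.

```r
isPassingGrade <- function(x) {
  x >= 50
}

isPassingGrade(c(49, 50, 75))  # FALSE TRUE TRUE
isPassingGrade(NA)             # NA: the comparison is undefined for a missing value
```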

(b) Write a function called sendMessage whose input x is a number, and which prints Congratulations if isPassingGrade(x) is TRUE and prints Oh no! if isPassingGrade(x) is FALSE.

sendMessage <- function(x) {
  if(isPassingGrade(x)) {
    print("Congratulations!")
  } else {
    print("Oh no!")
  }
}

# Here's another way of accomplishing the same thing

sendMessage2 <- function(x) print(ifelse(isPassingGrade(x), "Congratulations!", "Oh no!"))
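Here is a quick usage check. `print()` returns its argument invisibly, so you can also capture the printed string for testing. The two versions differ in one way. `if()` requires a single `TRUE`/`FALSE`, so `sendMessage()` handles one grade at a time; since R 4.2.0, a longer condition is an error. `ifelse()` lets `sendMessage2()` handle a vector of grades.

```r
isPassingGrade <- function(x) x >= 50

sendMessage <- function(x) {
  if (isPassingGrade(x)) {
    print("Congratulations!")
  } else {
    print("Oh no!")
  }
}

sendMessage2 <- function(x) print(ifelse(isPassingGrade(x), "Congratulations!", "Oh no!"))

sendMessage(75)           # [1] "Congratulations!"
sendMessage(30)           # [1] "Oh no!"
sendMessage2(c(75, 30))   # one message per grade
```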

(c) Write a function called gradeSummary whose input x is a number. Your function will return a list with two elements, named letter.grade and passed. The letter grade will be "A" if x is at least 90, "B" if x is at least 80 but less than 90, and "F" if x is lower than 80. If the student’s letter grade is an A or B, passed should be TRUE; otherwise passed should be FALSE.

gradeSummary <- function(x) {
  if(x >= 90) {
    letter.grade <- "A"
    passed <- TRUE
  } else if (x >= 80) {
    letter.grade <- "B"
    passed <- TRUE
  } else {
    letter.grade <- "F"
    passed <- FALSE
  }
  list(letter.grade = letter.grade, passed = passed)
}
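The `if`/`else` chain above handles one score at a time. For a whole vector of scores, base R's `cut()` offers a vectorized alternative. This is a sketch, not part of the assignment, and `gradeSummaryVec` is a hypothetical name. `right = FALSE` makes each interval include its lower bound, so 80 maps to "B" and 90 maps to "A". This matches the thresholds above.

```r
gradeSummaryVec <- function(x) {
  # Intervals [-Inf, 80), [80, 90), [90, Inf]; include.lowest = TRUE closes the top end
  letter.grade <- as.character(cut(x,
                                   breaks = c(-Inf, 80, 90, Inf),
                                   labels = c("F", "B", "A"),
                                   right = FALSE,
                                   include.lowest = TRUE))
  list(letter.grade = letter.grade,
       passed = letter.grade %in% c("A", "B"))
}

gradeSummaryVec(c(62, 80, 91))
```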

gradeSummary(91)
## $letter.grade
## [1] "A"
## 
## $passed
## [1] TRUE
gradeSummary(62)
## $letter.grade
## [1] "F"
## 
## $passed
## [1] FALSE

To check if your function works, try the following cases:

x = 91 should return

## $letter.grade
## [1] "A"
## 
## $passed
## [1] TRUE

x = 62 should return

## $letter.grade
## [1] "F"
## 
## $passed
## [1] FALSE
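The two cases above stay away from the grade boundaries, which is where off-by-one mistakes (`>` instead of `>=`) usually hide. The sketch below adds a few extra checks with `stopifnot()`. We chose these test values ourselves; they are not part of the assignment.

```r
gradeSummary <- function(x) {
  if (x >= 90) {
    letter.grade <- "A"
    passed <- TRUE
  } else if (x >= 80) {
    letter.grade <- "B"
    passed <- TRUE
  } else {
    letter.grade <- "F"
    passed <- FALSE
  }
  list(letter.grade = letter.grade, passed = passed)
}

# Boundary cases: exactly 90 and 80 map to A and B; just below 80 fails
stopifnot(
  gradeSummary(90)$letter.grade == "A",
  gradeSummary(89.9)$letter.grade == "B",
  gradeSummary(80)$letter.grade == "B",
  gradeSummary(79.9)$letter.grade == "F",
  gradeSummary(80)$passed,
  !gradeSummary(79.9)$passed
)
```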