Remember to change the author: field on this Rmd file to your own name.

Learning objectives

In today’s Lab you will gain practice with the following concepts from today’s class:

  • Using the qplot and ggplot commands from the ggplot2 package
  • Specifying shape and color attributes
  • Using facet_grid to create plots that show the data broken down by various subgroups
  • Constructing geographic heatmaps

Problems

We’ll begin by loading all the required packages.

library(tidyverse)
## ── Attaching packages ─────────────────── tidyverse 1.2.1 ──
## ✔ ggplot2 3.2.1     ✔ purrr   0.3.2
## ✔ tibble  2.1.3     ✔ dplyr   0.8.3
## ✔ tidyr   1.0.0     ✔ stringr 1.4.0
## ✔ readr   1.3.1     ✔ forcats 0.4.0
## ── Conflicts ────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
Cars93 <- as_tibble(MASS::Cars93)

1. facet_grid

Using the diamonds data set and the facet_grid command, create a figure that shows a scatterplot of price against carat for each combination of cut and clarity.

There are 8 levels of clarity, and 5 levels of cut. Your figure should therefore contain 40 scatterplots.

ggplot(diamonds, aes(x = carat, y = price)) +
  geom_point() +
  facet_grid(cut ~ clarity)
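
The panel count can be checked directly from the factor levels. This is a minimal sketch, assuming ggplot2 is installed so that the diamonds data set is available; the names n_cut and n_clarity are introduced here for illustration only.

```r
library(ggplot2)

# Count the levels of each faceting variable in the diamonds data
n_cut <- nlevels(diamonds$cut)          # rows of the facet grid
n_clarity <- nlevels(diamonds$clarity)  # columns of the facet grid

# facet_grid(cut ~ clarity) draws one panel per (cut, clarity) combination
n_cut * n_clarity
```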

2. Plotting the Cars93 data

This problem uses the Cars93 dataset from the MASS package.

(a) Use qplot to create a scatterplot with Price on the y-axis and EngineSize on the x-axis.

qplot(x = EngineSize, y = Price, data = Cars93)

Describe the relationship between Price and EngineSize.

Price tends to increase as engine size increases. The variability in prices also seems to increase with engine size.
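
One way to back up this reading of the plot with numbers is sketched below. It assumes the MASS package is installed; the split at the median engine size is an arbitrary choice made for illustration, not part of the lab.

```r
# Cars93 ships with the MASS package
cars <- MASS::Cars93

# Pearson correlation between engine size (litres) and price (thousands of USD)
price_cor <- cor(cars$EngineSize, cars$Price)
price_cor

# Standard deviation of price for cars at or below (FALSE) and above (TRUE)
# the median engine size
big_engine <- cars$EngineSize > median(cars$EngineSize)
price_sd <- tapply(cars$Price, big_engine, sd)
price_sd
```

A positive correlation supports the first sentence of the answer, and comparing the two standard deviations gives a rough check on the claim about variability.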

(b) Repeat part (a) using the ggplot function and geom_point() layer.

ggplot(Cars93, aes(x = EngineSize, y = Price)) + 
  geom_point()

(c) Repeat part (b), but this time specifying that the color mapping should depend on Type and the shape mapping should depend on DriveTrain.

ggplot(Cars93, aes(x = EngineSize, y = Price, colour = Type, shape = DriveTrain)) +
  geom_point()

Do you see any obvious patterns in how the different Types of cars cluster in the plot? Describe any clear patterns that you see.

Car types do appear to cluster. For instance, small cars tend to have small engines and low prices. Large cars tend to have mid-to-large sized engines and moderate prices.

Do you see any obvious patterns in how the different DriveTrains of cars cluster in the plot? Describe any clear patterns that you see.

This is harder to discern. There certainly isn’t as much clustering as there is by car Type. Switching the roles of Type and DriveTrain, so that colour maps to DriveTrain and shape to Type, makes patterns easier to see. Even then, the points are quite diffuse within each drivetrain type.
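
A minimal sketch of the swapped plot, assuming it simply exchanges the two aesthetics from part (c) so that colour maps to DriveTrain and shape to Type. The object name p_swapped is introduced here for illustration.

```r
library(ggplot2)

cars <- MASS::Cars93

# Same scatterplot as in part (c), with the two aesthetics swapped:
# colour now encodes DriveTrain and shape encodes Type
p_swapped <- ggplot(cars, aes(x = EngineSize, y = Price,
                              colour = DriveTrain, shape = Type)) +
  geom_point()
p_swapped
```

Type has six levels, which is the most that the default discrete shape scale handles without a warning.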

(d) Construct boxplots showing Price on the y-axis and AirBags on the x-axis. (Hint: boxplot is a valid ggplot2 geometry)

qplot(data = Cars93, x = AirBags, y = Price, geom = "boxplot")

Do you observe any association between AirBag type and Price? Explain.

The more airbags a vehicle has, the greater the price tends to be.
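
The boxplot comparison can also be summarised numerically. The sketch below computes the median price within each AirBags category, assuming the MASS package is installed; median_price is a name introduced for illustration.

```r
cars <- MASS::Cars93

# Median price (thousands of USD) within each AirBags category
median_price <- tapply(cars$Price, cars$AirBags, median)
median_price
```

These medians correspond to the centre lines of the three boxes in the plot.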

3. Plotting a map

At the end of lecture we used the following code to generate a heatmap of murder rates in the US.

library(maps)
## 
## Attaching package: 'maps'
## The following object is masked from 'package:purrr':
## 
##     map
# Create data frame for map data (US states)
states <- map_data("state")

# Here's what the states data frame looks like
str(states)
## 'data.frame':    15537 obs. of  6 variables:
##  $ long     : num  -87.5 -87.5 -87.5 -87.5 -87.6 ...
##  $ lat      : num  30.4 30.4 30.4 30.3 30.3 ...
##  $ group    : num  1 1 1 1 1 1 1 1 1 1 ...
##  $ order    : int  1 2 3 4 5 6 7 8 9 10 ...
##  $ region   : chr  "alabama" "alabama" "alabama" "alabama" ...
##  $ subregion: chr  NA NA NA NA ...
# Make a copy of the data frame to manipulate
arrests <- USArrests

# Convert everything to lower case
names(arrests) <- tolower(names(arrests))
arrests$region <- tolower(rownames(USArrests))

# Merge the map data with the arrests data based on region
choro <- merge(states, arrests, sort = FALSE, by = "region")
choro <- choro[order(choro$order), ]

# Plot a map, filling in the states based on murder rate
qplot(long, lat, data = choro, group = group, fill = murder,
  geom = "polygon") + scale_fill_gradient(low = "#56B1F7", high = "#132B43")
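
Because merge() keeps only rows whose region appears in both data frames by default, it can be worth checking which regions fail to match. This is a minimal sketch, assuming the ggplot2 and maps packages are installed; the names arrest_regions, map_only and arrests_only are introduced here for illustration.

```r
library(ggplot2)

# map_data() needs the maps package to be installed
states <- map_data("state")
arrest_regions <- tolower(rownames(USArrests))

# Regions drawn on the map that have no row in USArrests
map_only <- setdiff(unique(states$region), arrest_regions)
map_only

# States in USArrests that the "state" map does not draw
arrests_only <- setdiff(arrest_regions, unique(states$region))
arrests_only
```

Any region listed in map_only is dropped from choro by the merge and so is not drawn on the map.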

Modify the code above to produce a heatmap of assault rates instead, with orange colours instead of blue colours for the gradient.

Here’s a document that may help you pick colors: Hex colour picker

# Make a copy of the data frame to manipulate
arrests <- USArrests

# Convert everything to lower case
names(arrests) <- tolower(names(arrests))
arrests$region <- tolower(rownames(USArrests))

# Merge the map data with the arrests data based on region
choro <- merge(states, arrests, sort = FALSE, by = "region")
choro <- choro[order(choro$order), ]

# Plot a map, filling in the states based on assault rate
qplot(long, lat, data = choro, group = group, fill = assault,
  geom = "polygon") + scale_fill_gradient(low = "#FFAD33", high = "#1A0F00")
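
qplot() has been deprecated since ggplot2 3.4.0, so on newer installations the same map may be easier to maintain when written with ggplot(). The sketch below is one possible translation under that assumption, not the version used in lecture. It repeats the data preparation so that it runs on its own, and p_assault is a name introduced for illustration.

```r
library(ggplot2)

# map_data() needs the maps package to be installed
states <- map_data("state")

# Prepare the arrests data exactly as above
arrests <- USArrests
names(arrests) <- tolower(names(arrests))
arrests$region <- tolower(rownames(USArrests))

# Merge on region and restore the polygon drawing order
choro <- merge(states, arrests, sort = FALSE, by = "region")
choro <- choro[order(choro$order), ]

# Same map as the qplot() call, written with ggplot() and geom_polygon()
p_assault <- ggplot(choro, aes(x = long, y = lat, group = group, fill = assault)) +
  geom_polygon() +
  scale_fill_gradient(low = "#FFAD33", high = "#1A0F00")
p_assault
```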