In today’s Lab you will gain practice with the following concepts from today’s class:
- Using the qplot and ggplot commands from the ggplot2 library
- Specifying shape and color attributes
- Using facet_grid to create plots that show the data broken down by various subgroups
- Constructing geographic heatmaps
We’ll begin by loading all the required packages.
library(tidyverse)
## ── Attaching packages ─────────────────── tidyverse 1.2.1 ──
## ✔ ggplot2 3.2.1 ✔ purrr 0.3.2
## ✔ tibble 2.1.3 ✔ dplyr 0.8.3
## ✔ tidyr 1.0.0 ✔ stringr 1.4.0
## ✔ readr 1.3.1 ✔ forcats 0.4.0
## ── Conflicts ────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
Cars93 <- as_tibble(MASS::Cars93)
Using the diamonds data set and the facet_grid command, create a figure that shows a scatterplot of price against carat for each combination of cut and clarity.
There are 8 levels of clarity, and 5 levels of cut. Your figure should therefore contain 40 scatterplots.
ggplot(diamonds, aes(x = carat, y = price)) +
  geom_point() +
  facet_grid(cut ~ clarity)
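The panel count stated in the problem can be checked directly from the factor levels. This is a minimal sketch, assuming diamonds is the copy shipped with ggplot2; the names n_cut, n_clarity and n_panels are illustrative and not part of the lab.

```r
library(ggplot2)

# Number of levels in each faceting variable
n_cut <- nlevels(diamonds$cut)
n_clarity <- nlevels(diamonds$clarity)

# facet_grid(cut ~ clarity) lays out one panel per row/column combination
n_panels <- n_cut * n_clarity
n_panels
```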
This problem uses the Cars93 dataset from the MASS package.
(a) Use qplot to create a scatterplot with Price on the y-axis and EngineSize on the x-axis.
qplot(x = EngineSize, y = Price, data = Cars93)
Describe the relationship between Price and EngineSize.
Price tends to increase as engine size increases. The variability in prices also seems to increase with engine size.
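One possible way to back up the description of the trend is to compute a correlation and overlay a fitted line. This sketch is not part of the original solution; it assumes the MASS package is installed, and price_engine_cor and p_trend are illustrative names. It addresses only the upward trend, not the change in variability.

```r
library(ggplot2)

Cars93 <- MASS::Cars93

# Pearson correlation between engine size and price
price_engine_cor <- cor(Cars93$EngineSize, Cars93$Price)
price_engine_cor

# Scatterplot with a least-squares line overlaid
p_trend <- ggplot(Cars93, aes(x = EngineSize, y = Price)) +
  geom_point() +
  geom_smooth(method = "lm", formula = y ~ x)
p_trend
```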
(b) Repeat part (a) using the ggplot function and a geom_point() layer.
ggplot(Cars93, aes(x = EngineSize, y = Price)) +
  geom_point()
(c) Repeat part (b), but this time specifying that the color mapping should depend on Type and the shape mapping should depend on DriveTrain.
ggplot(Cars93, aes(x = EngineSize, y = Price,
                   colour = Type, shape = DriveTrain)) +
  geom_point()
Do you see any obvious patterns in how the different Types of cars cluster in the plot? Describe any clear patterns that you see.
Car types do appear to cluster. For instance, small cars tend to have small engines and low prices. Large cars tend to have mid-to-large sized engines and moderate prices.
Do you see any obvious patterns in how the different DriveTrains of cars cluster in the plot? Describe any clear patterns that you see.
This is harder to discern. There certainly isn’t as much clustering as there is by car Type. Below we switch the roles of Type and DriveTrain to make it easier to see patterns. The points are quite diffuse within each drivetrain type.
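The plot with the roles of Type and DriveTrain switched does not appear in this copy of the solution. A sketch of what such a plot could look like is given here, assuming it simply swaps the colour and shape mappings from part (c); p_swapped is an illustrative name.

```r
library(ggplot2)

Cars93 <- MASS::Cars93

# Same scatterplot as part (c) with the two aesthetics swapped:
# colour now encodes DriveTrain and shape encodes Type
p_swapped <- ggplot(Cars93, aes(x = EngineSize, y = Price,
                                colour = DriveTrain, shape = Type)) +
  geom_point()
p_swapped
```

Colouring by DriveTrain leaves only a few colours to compare, which may make any drivetrain grouping easier to spot than it is with shapes.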
(d) Construct boxplots showing Price on the y-axis and AirBags on the x-axis. (Hint: boxplot is a valid ggplot2 geometry)
qplot(data = Cars93, x = AirBags, y = Price, geom = "boxplot")
Do you observe any association between AirBag type and Price? Explain.
The more airbags a vehicle has, the greater the price tends to be.
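The association seen in the boxplots can also be summarised numerically. This is a minimal sketch using dplyr, not part of the original solution; price_by_airbags is an illustrative name, and the median is just one reasonable choice of summary.

```r
library(dplyr)

Cars93 <- MASS::Cars93

# Count of cars and median price within each AirBags category
price_by_airbags <- Cars93 %>%
  group_by(AirBags) %>%
  summarise(n = n(), median_price = median(Price))
price_by_airbags
```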
At the end of lecture we used the following code to generate a heatmap of murder rates in the US.
library(maps)
##
## Attaching package: 'maps'
## The following object is masked from 'package:purrr':
##
## map
# Create data frame for map data (US states)
states <- map_data("state")
# Here's what the states data frame looks like
str(states)
## 'data.frame': 15537 obs. of 6 variables:
## $ long : num -87.5 -87.5 -87.5 -87.5 -87.6 ...
## $ lat : num 30.4 30.4 30.4 30.3 30.3 ...
## $ group : num 1 1 1 1 1 1 1 1 1 1 ...
## $ order : int 1 2 3 4 5 6 7 8 9 10 ...
## $ region : chr "alabama" "alabama" "alabama" "alabama" ...
## $ subregion: chr NA NA NA NA ...
# Make a copy of the data frame to manipulate
arrests <- USArrests
# Convert everything to lower case
names(arrests) <- tolower(names(arrests))
arrests$region <- tolower(rownames(USArrests))
# Merge the map data with the arrests data based on region
choro <- merge(states, arrests, sort = FALSE, by = "region")
choro <- choro[order(choro$order), ]
# Plot a map, filling in the states based on murder rate
qplot(long, lat, data = choro, group = group, fill = murder,
      geom = "polygon") +
  scale_fill_gradient(low = "#56B1F7", high = "#132B43")
Modify the code above to produce a heatmap of assault rates instead, with orange colours instead of blue colours for the gradient.
Here’s a document that may help you pick colors: Hex colour picker
# Make a copy of the data frame to manipulate
arrests <- USArrests
# Convert everything to lower case
names(arrests) <- tolower(names(arrests))
arrests$region <- tolower(rownames(USArrests))
# Merge the map data with the arrests data based on region
choro <- merge(states, arrests, sort = FALSE, by = "region")
choro <- choro[order(choro$order), ]
# Plot a map, filling in the states based on assault rate
qplot(long, lat, data = choro, group = group, fill = assault,
      geom = "polygon") +
  scale_fill_gradient(low = "#FFAD33", high = "#1A0F00")
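More recent ggplot2 releases deprecate qplot(), so it may be useful to see the same map written with ggplot() and a geom_polygon() layer. This is a sketch under the assumption that the maps package is installed; the data preparation repeats the steps above so that the block runs on its own, and p_assault is an illustrative name.

```r
library(ggplot2)
library(maps)  # map_data("state") relies on the maps package

states <- map_data("state")

# Lower-case the column names and add a region column to join on
arrests <- USArrests
names(arrests) <- tolower(names(arrests))
arrests$region <- tolower(rownames(USArrests))

# Join the polygons to the arrest data and restore the drawing order
choro <- merge(states, arrests, sort = FALSE, by = "region")
choro <- choro[order(choro$order), ]

# Same mappings as the qplot call, expressed with ggplot()
p_assault <- ggplot(choro, aes(x = long, y = lat,
                               group = group, fill = assault)) +
  geom_polygon() +
  scale_fill_gradient(low = "#FFAD33", high = "#1A0F00")
p_assault
```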