We’ll begin by doing all the same data processing as in lecture.
library(MASS)
library(plyr)
library(dplyr)
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:plyr':
##
## arrange, count, desc, failwith, id, mutate, rename, summarise,
## summarize
## The following object is masked from 'package:MASS':
##
## select
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
library(ggplot2)
# Assign more descriptive variable names
colnames(birthwt) <- c("birthwt.below.2500", "mother.age", "mother.weight",
                       "race", "mother.smokes", "previous.prem.labor",
                       "hypertension", "uterine.irr", "physician.visits",
                       "birthwt.grams")
# Assign more descriptive factor levels and convert variables to factors as needed
birthwt <- transform(birthwt,
  race = as.factor(mapvalues(race, c(1, 2, 3),
                             c("white", "black", "other"))),
  mother.smokes = as.factor(mapvalues(mother.smokes,
                                      c(0, 1), c("no", "yes"))),
  hypertension = as.factor(mapvalues(hypertension,
                                     c(0, 1), c("no", "yes"))),
  uterine.irr = as.factor(mapvalues(uterine.irr,
                                    c(0, 1), c("no", "yes"))),
  birthwt.below.2500 = as.factor(mapvalues(birthwt.below.2500,
                                           c(0, 1), c("no", "yes")))
)
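As a small illustration of the recoding pattern used above, the sketch below applies mapvalues() to a made-up vector of numeric codes. The names codes, race.labels, and race.factor are hypothetical and are not part of the lab data. mapvalues() is called with the plyr:: prefix so that the snippet runs on its own.

```r
# Hypothetical vector of numeric codes (not taken from the birthwt data)
codes <- c(1, 2, 3, 1)

# Replace each code with its label; the result is a character vector
race.labels <- plyr::mapvalues(codes, from = c(1, 2, 3),
                               to = c("white", "black", "other"))

# Convert to a factor; levels are sorted alphabetically by default
race.factor <- as.factor(race.labels)
levels(race.factor)
```

Because as.factor() orders the levels alphabetically, the level order need not match the order of the original numeric codes.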
One of the advantages of aggregate() is that it makes it easier to view summary tables when grouping on more than two factors.
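To make this concrete, here is a minimal sketch of aggregate() grouping on three variables at once. So that it runs on its own, it assumes the raw MASS::birthwt column names (bwt, race, smoke, ht) rather than the renamed columns above. The object names bw.raw and bwt.means are hypothetical.

```r
# Illustrative sketch: works on an untouched copy of MASS::birthwt,
# so it uses the original column names (bwt, race, smoke, ht)
bw.raw <- MASS::birthwt

# Mean birth weight for every observed combination of the three grouping variables
bwt.means <- aggregate(bwt ~ race + smoke + ht, data = bw.raw, FUN = mean)
bwt.means
```

Each row of the result is one combination of the grouping variables, so the table stays flat however many factors are used. Combinations with no observations are simply absent from the output.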
(a) Use the tapply() function to calculate mean birthwt.grams grouped by race, mother’s smoking status, and hypertension.
# Edit me
One of the cells in the tapply output is equal to NA. Explain why.
Replace this text with your solution.
(b) Repeat part (a), this time using the ddply() function.
# Edit me
Do you see an NA result? Explain.
Replace this text with your solution.
(c) Repeat part (b), this time adding the argument .drop = FALSE as part of your ddply call. What happens now?
# Edit me
(a) Construct a violin plot showing how the distribution of diamond prices varies by diamond cut.
# Edit me
(b) Use facet_grid with geom_histogram to construct 7 histograms showing the distribution of price within each category of diamond color.
# Edit me