library(ggplot2)

# Load bikes data
bikes <- read.csv("http://www.andrew.cmu.edu/user/achoulde/94842/data/bikes.csv", header = TRUE)

Jittering points

When constructing a scatterplot with a discrete x axis (e.g., x might denote month, rating, or some kind of binned score), it may be helpful to apply “jitter” to spread out the points. Here’s an illustration of how to do so with the bikeshare data.

# Basic plot
qplot(data = bikes, x = mnth, y = cnt, color = as.factor(mnth)) + guides(color = "none")

# Jittered plot
qplot(data = bikes, x = mnth, y = cnt, color = as.factor(mnth), geom = "jitter") + guides(color = "none")

# Equivalently:
ggplot(data = bikes, aes(x = mnth, y = cnt, color = as.factor(mnth))) + geom_jitter() + guides(color = "none")

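# With more precise control:
# (qplot()'s position argument is deprecated in current ggplot2, so this uses ggplot() directly)
ggplot(data = bikes, aes(x = mnth, y = cnt, color = as.factor(mnth))) +
  geom_point(position = position_jitter(width = 0.2, height = 0)) +
  guides(color = "none") +
  stat_smooth(aes(group = 1))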
## geom_smooth: method="auto" and size of largest group is <1000, so using loess. Use 'method = x' to change the smoothing method.

Notes:

  • geom_jitter() can be viewed as a replacement for geom_point() in cases where you want to produce a jittered scatterplot.

  • Using the position = position_jitter(...) formulation allows you to directly specify how much jitter to apply in each direction: width (which can be abbreviated w) controls horizontal jitter, and height (abbreviated h) controls vertical jitter. Setting height = 0 leaves the plotted y values unchanged and jitters only the x values.
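
To check the effect of the width and height settings described in the notes above, one option is to inspect the coordinates that ggplot2 will actually draw. The sketch below uses a small made-up data frame (toy is hypothetical, not the bikeshare data) and the seed argument of position_jitter(), which makes the jitter reproducible from run to run.

```r
library(ggplot2)

# Small made-up data set with a discrete x variable (illustration only)
toy <- data.frame(
  mnth = rep(1:3, each = 4),
  cnt  = c(10, 12, 11, 13, 20, 22, 21, 19, 30, 28, 31, 29)
)

# Horizontal jitter of at most 0.2, no vertical jitter, fixed seed
p <- ggplot(toy, aes(x = mnth, y = cnt)) +
  geom_point(position = position_jitter(width = 0.2, height = 0, seed = 42))

# Extract the point coordinates after the jitter has been applied
drawn <- ggplot_build(p)$data[[1]]

# Largest horizontal shift; it should not exceed the requested width
max_shift <- max(abs(drawn$x - toy$mnth))

# With height = 0 the y values should be exactly the original counts
y_unchanged <- all(drawn$y == toy$cnt)
```

Fixing the seed is worth considering for reports, since otherwise the points land in slightly different places each time the document is rebuilt.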




Collinearity for categorical variables

For the purpose of this project, do not worry about collinearity between categorical variables. It’s a tricky concept to define well, and there’s no simple analogue to the pairs plot.

Note: The pairs plot should not be used for categorical variables.

If you are interested in diagnosing collinearity in a way that extends, at least partly, to categorical variables, you may want to read about the Variance Inflation Factor (VIF). VIF can help determine whether a continuous variable is collinear with a categorical variable.
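
As a rough illustration of the idea, the VIF of a continuous predictor can be computed as 1 / (1 - R^2), where R^2 comes from regressing that predictor on all of the other predictors, categorical ones included. The sketch below uses simulated data rather than the bikeshare data; the variables month, temp, and wind and the helper vif_continuous() are made up for this example.

```r
# Simulated stand-in data (all values are made up for illustration)
set.seed(1)
n <- 240
month <- factor(rep(1:12, each = n / 12))

# temp depends strongly on month; wind is generated independently of it
temp <- 15 - 12 * cos(2 * pi * (as.integer(month) - 1) / 12) + rnorm(n, sd = 3)
wind <- rnorm(n, mean = 10, sd = 2)
sim <- data.frame(month, temp, wind)

# Hypothetical helper: VIF of the response in `formula`, i.e. 1 / (1 - R^2)
# from regressing that variable on the remaining predictors.
# lm() expands the factor `month` into indicator variables automatically.
vif_continuous <- function(formula, data) {
  r2 <- summary(lm(formula, data = data))$r.squared
  1 / (1 - r2)
}

vif_temp <- vif_continuous(temp ~ month + wind, sim)  # large: month explains temp
vif_wind <- vif_continuous(wind ~ month + temp, sim)  # near 1: little shared variation
```

A VIF close to 1 suggests the variable shares little information with the other predictors, while large values point to collinearity. Cutoffs such as 5 or 10 are common rules of thumb rather than strict thresholds.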

Note: For the purpose of the project, you are not required to assess collinearity between categorical variables.