library(tidyverse)
## ── Attaching packages ──────────────────
## ✔ ggplot2 3.2.1     ✔ purrr   0.3.2
## ✔ tibble  2.1.3     ✔ dplyr   0.8.3
## ✔ tidyr   1.0.0     ✔ stringr 1.4.0
## ✔ readr   1.3.1     ✔ forcats 0.4.0
## ── Conflicts ── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
library(plotly)  # for interactive graphics
## 
## Attaching package: 'plotly'
## The following object is masked from 'package:ggplot2':
## 
##     last_plot
## The following object is masked from 'package:stats':
## 
##     filter
## The following object is masked from 'package:graphics':
## 
##     layout
library(DT)

options(scipen = 4)

We’ll illustrate some examples using a bunch of different data sets.

# You'll need to run install.packages("nycflights13") and
# install.packages("gapminder")
flights <- nycflights13::flights
# Load the data from the gapminder library
data(gapminder, package = "gapminder")
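
If you are not sure whether those two packages are already installed, a quick check along the following lines can help. This is only a sketch: the variable names `needed` and `installed` are made up for illustration, and the snippet only reports what is missing rather than installing anything.

```r
# Packages the examples rely on (names taken from the comments above)
needed <- c("nycflights13", "gapminder")

# requireNamespace() returns TRUE/FALSE without attaching the package
installed <- vapply(needed, requireNamespace, logical(1), quietly = TRUE)

if (!all(installed)) {
  message("Missing packages: ", paste(needed[!installed], collapse = ", "),
          "\nInstall them with install.packages() before continuing.")
}
```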

Interactive tables: datatable

Sometimes it’s helpful to output interactive summary or data tables into our reports. We can do this with the datatable function.

# Printing data
flights %>%
  group_by(carrier, origin) %>%
  summarize(`Average delay (mins)` = round(mean(dep_delay, na.rm = TRUE), 0))
## # A tibble: 35 x 3
## # Groups:   carrier [16]
##    carrier origin `Average delay (mins)`
##    <chr>   <chr>                   <dbl>
##  1 9E      EWR                         6
##  2 9E      JFK                        19
##  3 9E      LGA                         9
##  4 AA      EWR                        10
##  5 AA      JFK                        10
##  6 AA      LGA                         7
##  7 AS      EWR                         6
##  8 B6      EWR                        13
##  9 B6      JFK                        13
## 10 B6      LGA                        15
## # … with 25 more rows
# datatable
flights %>%
  group_by(carrier, origin) %>%
  summarize(`Average delay (mins)` = round(mean(dep_delay, na.rm = TRUE), 0)) %>%
  datatable(options = list(pageLength = 12))  # pass settings via the options argument
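
datatable takes its table settings as a named list through the options argument, and it has a few other arguments of its own. Here is a small sketch using the built-in mtcars data (chosen only so the example runs without the flights data); the caption text is made up.

```r
library(DT)

# Built-in mtcars data keeps the example self-contained
tbl <- datatable(
  mtcars[, c("mpg", "cyl", "hp")],
  options = list(pageLength = 5),  # rows shown per page
  filter  = "top",                 # per-column filter boxes above the table
  caption = "Fuel economy, cylinders and horsepower for 32 cars"
)
tbl
```

The filter boxes are handy when readers of the report want to narrow the table down themselves.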

Interactive graphics with (gg)plotly

One of the simplest ways to get started with interactive graphics in R is to use the ggplotly function in the plotly library. It converts ggplot objects into their interactive counterparts.

Let’s create some plots with ggplot and see what happens when we make them interactive.

Bar charts

# Form a bar chart showing the number of flights from each airport
p <- ggplot(flights, aes(x = origin)) + 
  geom_bar()
p

ggplotly(p)

Box plots

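Here’s a boxplot example which shows the distribution of departure delays across airports. The y-axis is on a log2 scale, so flights with zero or negative delays (on-time or early departures) have no finite value on that axis and are dropped, which is what the warnings below report.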

p <- ggplot(flights, aes(x = origin, y = dep_delay)) +
  geom_boxplot() + 
  scale_y_continuous(trans='log2')
p
## Warning in self$trans$transform(x): NaNs produced
## Warning: Transformation introduced infinite values in continuous y-axis
## Warning: Removed 208344 rows containing non-finite values (stat_boxplot).

ggplotly(p)
## Warning in self$trans$transform(x): NaNs produced
## Warning: Transformation introduced infinite values in continuous y-axis
## Warning: Removed 208344 rows containing non-finite values (stat_boxplot).
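
The warnings come from taking the log of delays that are zero or negative. Here is a sketch of what is going on, using a small made-up table of delays (the `delays` data below is hypothetical, not taken from flights). Filtering to strictly positive delays before plotting is one way to make the choice explicit.

```r
library(dplyr)

# Hypothetical departure delays in minutes (negative = left early)
delays <- tibble(
  origin    = c("EWR", "EWR", "JFK", "JFK", "LGA", "LGA"),
  dep_delay = c(-5, 12, 0, 30, -2, 8)
)

# log2() is undefined for negatives (NaN) and gives -Inf at zero
suppressWarnings(log2(delays$dep_delay))

# Keeping only strictly positive delays avoids the non-finite values
late <- delays %>% filter(dep_delay > 0)
log2(late$dep_delay)
```

Keep in mind that a plot built this way describes only the late flights, not all departures.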

Note that plotly is its own graphing library. It just happens to be particularly convenient to use ggplotly, because it enables us to make interactive graphics that we already have experience constructing. Here’s an example of a ggplotly version vs a plotly version of the boxplot. I’m switching to the gapminder data because htmlwidgets are super resource intensive for large data.

p <- ggplot(gapminder, aes(continent, lifeExp, color=continent)) +
  geom_boxplot()

ggplotly(p)
plot_ly(gapminder, x = ~continent, y = ~lifeExp, color = ~continent, type = "box")

Here’s how we would do log-scaling for a plotly plot. First, a plot without log scaling on the y-axis.

plot_ly(gapminder, x = ~continent, y = ~gdpPercap, color = ~continent, type = "box")

Now a plot with logarithmic y-axis scaling, as controlled through the layout command:

plot_ly(gapminder, x = ~continent, y = ~gdpPercap, color = ~continent, type = "box") %>%
  layout(yaxis = list(type = "log"))
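
The same layout call can carry other axis settings, such as titles. Here is a sketch with the built-in mtcars data (used only to keep the example self-contained); the axis titles are made up.

```r
library(plotly)

# Built-in mtcars data; cyl is converted to a factor so each value gets its own box
fig <- plot_ly(mtcars, x = ~factor(cyl), y = ~hp, type = "box") %>%
  layout(
    xaxis = list(title = "Cylinders"),
    yaxis = list(title = "Horsepower (log scale)", type = "log")
  )
fig
```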

Dot plots

Now let’s look at an example where we calculate the average departure delay for flights out of LGA for each destination airport with more than 50 flights, and produce a plot that contains that information. In this plot the dot size represents the number of flights from LGA to that destination.

p <- flights %>% 
  filter(origin == "LGA") %>%
  group_by(dest) %>%
  summarize(av_dep_delay = mean(dep_delay, na.rm = TRUE),
            count = n()) %>%
  filter(count > 50) %>%
  mutate(dest = reorder(dest, av_dep_delay)) %>%
  ggplot(aes(x = dest, y = av_dep_delay, 
             size = count)) + 
  geom_point(alpha = 0.5) +
  scale_size_area() +
  ylab("Average departure delay") +
  xlab("Destination airport") +
  theme(axis.text.x = element_text(angle = 90, hjust = 1))

p

ggplotly(p)

Scatterplots

Now here’s a scatterplot example with the diamonds data. We’ll start by subsampling the data so we don’t have so many points. The sample_n command makes it easy to sample a subset of the rows of the data.

diamonds.sub <- diamonds %>%
  sample_n(2000)
p <- ggplot(diamonds.sub, aes(x = carat, y = price, color = color)) + 
  geom_point() 
p

ggplotly(p)
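
Since sample_n draws rows at random, the subsample (and so the plot) changes every time the code runs. If you want the same subsample each time, one option is to set the random seed first. A sketch, with an arbitrary seed value:

```r
library(dplyr)
library(ggplot2)  # provides the diamonds data

set.seed(2024)                      # arbitrary seed, chosen for illustration
sub1 <- diamonds %>% sample_n(2000)

set.seed(2024)                      # same seed, same rows
sub2 <- diamonds %>% sample_n(2000)

identical(sub1, sub2)
```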

The default behavior for ggplotly is to provide the values of all aesthetic mappings in the hover text. It is also possible to customize what gets displayed. The most general way of doing this is to specify a text aesthetic that contains the information you want to see, and then ask ggplotly to show only that aesthetic by setting tooltip = "text". In the example below we specify text to be the carat, clarity, color and cut of the diamond. The paste command pastes together values into a single string, with values separated by the sep argument. Setting sep = "\n" causes every element to be displayed on a new line.

p <- ggplot(diamonds.sub, aes(x = carat, y = price, color = color, 
                              text = paste(carat, clarity, color, cut, sep = "\n"))) + 
  geom_point() 
p

ggplotly(p, tooltip = "text")
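
To see what paste is building for each point, here is a sketch with one diamond’s values typed in by hand (the values are made up for illustration). The second version adds field names, which can make the tooltip easier to read.

```r
# One diamond's attributes, written out by hand for illustration
label <- paste(0.31, "VS1", "E", "Ideal", sep = "\n")
cat(label)

# Adding field names makes the tooltip easier to read
labelled <- paste0("Carat: ", 0.31, "\nClarity: ", "VS1",
                   "\nColor: ", "E", "\nCut: ", "Ideal")
cat(labelled)
```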
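
ggplotly also handles plots with several layers. Here we add a smoothed trend line for each color on top of semi-transparent points.

p <- ggplot(diamonds.sub, aes(x = carat, y = price, color = color)) + 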
  geom_point(alpha = 0.5) + 
  geom_smooth()
p
## `geom_smooth()` using method = 'loess' and formula 'y ~ x'

ggplotly(p)
## `geom_smooth()` using method = 'loess' and formula 'y ~ x'

Line charts

Here we’ll have a look at how home sales have varied over time. We’ll focus first on sales in Austin, TX.

p <- txhousing %>%
  filter(city == "Austin") %>%
  ggplot(aes(x = month, y = sales, group = year)) + 
  geom_line()

ggplotly(p)

ggplot and plotly make it really easy to create animations across time (or across any other variable of interest). To do this, you simply need to specify a frame variable.

p <- txhousing %>%
  filter(city == "Austin") %>%
  ggplot(aes(x = month, y = sales, frame = year)) + 
  geom_line()

ggplotly(p)
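
The same idea carries over if you build the plot with plot_ly directly instead of going through ggplot. A sketch, assuming the txhousing data that ships with ggplot2:

```r
library(plotly)
library(dplyr)

austin <- ggplot2::txhousing %>%
  filter(city == "Austin")

# frame = ~year asks plotly to draw one animation frame per year
fig <- plot_ly(austin, x = ~month, y = ~sales, frame = ~year,
               type = "scatter", mode = "lines")
fig
```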

You can animate certain layers while keeping others static. It all depends on where you specify the frame variable: a layer that maps frame is animated, while layers without it stay static. Here’s an example where we have all of the years in the background, with the current year highlighted in blue.

p <- txhousing %>%
  filter(city == "Austin") %>%
  ggplot(aes(x = month, y = sales)) +
  geom_line(aes(group = year), alpha = 0.2) +
  geom_line(aes(frame = year), color = "steelblue", size = 2)
## Warning: Ignoring unknown aesthetics: frame
ggplotly(p)

Let’s have a look at several cities at the same time. Note that we’re using the animation_opts() function here to change properties of the plotly animation. The frame argument controls the amount of time between frames (in milliseconds).

p <- txhousing %>%
  filter(city %in% c("Austin", "Dallas", "Houston", "San Antonio")) %>%
  ggplot(aes(x = month, y = sales)) +
  geom_line(aes(group = year), alpha = 0.2) +
  geom_line(aes(frame = year), color = "steelblue", size = 1) +
  facet_grid(. ~ city)
## Warning: Ignoring unknown aesthetics: frame
ggplotly(p) %>%
  animation_opts(frame = 1000)

Through the animation options you can also change how the frames transition from one to the next by setting the easing parameter. There are many options; the help page for animation_opts() points to the full list.

ggplotly(p) %>%
  animation_opts(frame = 1000, easing = "elastic")
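
Besides animation_opts, plotly also has animation_slider and animation_button for adjusting the slider and the play button. Here is a sketch on a simplified single-city version of the plot; the prefix text and the button position are arbitrary choices for illustration.

```r
library(ggplot2)
library(plotly)
library(dplyr)

# Simplified single-city plot; geom_line() warns that frame is an unknown
# aesthetic, but ggplotly() still uses it to build the animation
p <- txhousing %>%
  filter(city == "Austin") %>%
  ggplot(aes(x = month, y = sales)) +
  geom_line(aes(frame = year))

fig <- ggplotly(p) %>%
  animation_opts(frame = 1000, easing = "linear") %>%
  # Label shown above the slider, followed by the current frame value
  animation_slider(currentvalue = list(prefix = "Year: ")) %>%
  # Move the play button to the bottom-right corner
  animation_button(x = 1, xanchor = "right", y = 0, yanchor = "bottom")
fig
```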

Animating the gapminder data

First we’ll look at how life expectancy changes over time across countries. We’ll start the animation in 1952, with the countries ordered by their life expectancy in that year (the reorder call uses each country’s first observation, which is 1952 in this data).

p <- gapminder %>%
  mutate(country = reorder(country, lifeExp, function(.x) .x[1])) %>%
  ggplot(aes(x = country, y = lifeExp, color = continent, size = pop)) +
  geom_point(aes(frame = year)) +
  theme(axis.text.x = element_text(angle = 60, vjust = 1, hjust = 1)) 
## Warning: Ignoring unknown aesthetics: frame
ggplotly(p) %>%
  animation_opts(1000)
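
The reorder call in that pipeline orders the countries by whatever the supplied function returns for each country’s lifeExp values. With function(.x) .x[1] that is the first value in row order, which is not necessarily the smallest one. A sketch on a made-up three-country table (the `toy` data is hypothetical) shows the difference:

```r
# Hypothetical data: three countries, three years each, rows sorted by year
toy <- data.frame(
  country = rep(c("A", "B", "C"), each = 3),
  year    = rep(c(1952, 1957, 1962), times = 3),
  lifeExp = c(50, 45, 60,   # A: first = 50, min = 45
              48, 52, 55,   # B: first = 48, min = 48
              47, 40, 65)   # C: first = 47, min = 40
)

# Order by the first (earliest) value of each country
by_first <- reorder(toy$country, toy$lifeExp, function(.x) .x[1])
levels(by_first)

# Order by the minimum value of each country instead
by_min <- reorder(toy$country, toy$lifeExp, min)
levels(by_min)
```

Swapping in min, max or median is an easy way to change the ordering along the x-axis.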

Here’s an animated plot that shows life expectancy and GDP evolving over time. The redraw = FALSE option means that the base plot won’t be redrawn at every transition.

p <- ggplot(gapminder, aes(x = gdpPercap, y = lifeExp, color = continent, size = pop)) +
  geom_point(alpha = 0.1) +
  geom_point(aes(frame = year, ids = country)) +
  scale_x_continuous(trans = "log10")
## Warning: Ignoring unknown aesthetics: frame, ids
ggplotly(p) %>% 
  animation_opts(1000, redraw = FALSE)

Want to learn more?

There’s a ton more that one can do with interactive graphics (and tables!) in R.

Some of the examples used in today’s lecture were borrowed from Carson Sievert’s awesome slides. I encourage you to have a further look through those slides to see some of the other things you can do with ggplotly. Things like joint “brushing” and “filtering” are particularly useful if you’re designing interactive dashboards.

You should also have a look at htmlwidgets, which you can learn about here.