Lecture 14: The End

Prof. Alexandra Chouldechova

Agenda

  • What have we learned?

  • Where do you go from here?

    • Useful packages you should know about
    • Shiny demo

What have we learned?

Packages

  • base, stats
  • MASS
    • Contains a lot of simple data sets
  • ggplot2
    • Awesome graphics
  • plyr
    • Enables simple syntax for split-apply-combine operations
    • mapvalues() is from here
  • dplyr
    • Fast, readable verbs for data frames: filter(), mutate(), group_by(), summarise()

Programming basics

  • Loops, apply/sapply/lapply alternatives
  • Functions
  • If-else statements
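Here is a minimal base-R sketch of these three constructs. A for loop and its sapply() alternative compute the same result, and a small function uses if-else logic. The vector x and the helper sign_label() are made up for illustration.

```r
# Square each element of a vector, two ways.
x <- c(3, -1, 4, -1, 5)

# 1. An explicit for loop that fills a pre-allocated vector
squares_loop <- numeric(length(x))
for (i in seq_along(x)) {
  squares_loop[i] <- x[i]^2
}

# 2. The sapply() alternative: apply an anonymous function to each element
squares_sapply <- sapply(x, function(v) v^2)

# A small (hypothetical) function combining an if-statement with vectorized ifelse()
sign_label <- function(v) {
  if (!is.numeric(v)) {
    stop("v must be numeric")
  }
  ifelse(v >= 0, "non-negative", "negative")
}

identical(squares_loop, squares_sapply)  # both approaches give the same vector
sign_label(x)
```

sapply() removes the manual pre-allocation and indexing. For plain arithmetic like this, the vectorized x^2 is simpler still.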

Tabular summaries

  • table()
  • tapply()
  • aggregate()
  • plyr functions
  • dplyr::summarise()
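The sketch below compares the base R tools on the built-in mtcars data set. That data set is an illustrative choice and may not be the one used in lecture. dplyr::summarise() is left out so the code runs without extra packages.

```r
# Built-in mtcars data: cyl = number of cylinders, am = transmission (0 = automatic, 1 = manual)

# table(): counts of cars for each cylinder / transmission combination
cyl_am <- table(mtcars$cyl, mtcars$am)
cyl_am

# tapply(): mean mpg within each cylinder group (a named 1-d array)
mpg_by_cyl <- tapply(mtcars$mpg, mtcars$cyl, mean)
mpg_by_cyl

# aggregate(): the same summary, returned as a data frame via the formula interface
mpg_agg <- aggregate(mpg ~ cyl, data = mtcars, FUN = mean)
mpg_agg
```

tapply() is handy for a quick look. aggregate() returns a data frame, which is easier to pass on to ggplot2 or to merge with other tables.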

Graphical summaries

  • ggplot2

Statistics: Quantitative outcomes

  • t-tests
    • Does the mean of y differ between 2 groups?
  • \( k \)-way ANOVA (analysis of variance)
    • Does the mean of y differ across various combinations of \( k \) factors?
  • linear regression
    • (How) does the mean of y differ across various covariates?
    • Interpreting coefficients of categorical variables
    • Interpreting interaction terms
    • Using anova() to compare 2 nested models
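Here is a sketch of these methods on the built-in ToothGrowth data, an illustrative choice, with dose treated as a factor. It shows where categorical coefficients such as suppVC come from. It also shows how anova() compares an additive model against the larger model that adds the supp:dose interaction.

```r
# Built-in ToothGrowth data: len = tooth length, supp = supplement (OJ or VC), dose = 0.5, 1, 2 mg/day
tg <- ToothGrowth
tg$dose <- factor(tg$dose)  # treat dose as a categorical variable

# t-test: does mean tooth length differ between the two supplements?
tt <- t.test(len ~ supp, data = tg)
tt$p.value

# Two-way ANOVA with interaction: supp, dose, and supp:dose
fit_aov <- aov(len ~ supp * dose, data = tg)
summary(fit_aov)

# Linear regression: additive model vs. a model with interaction terms
fit_add <- lm(len ~ supp + dose, data = tg)
fit_int <- lm(len ~ supp * dose, data = tg)
coef(fit_add)  # suppVC = mean difference VC vs. baseline OJ, holding dose fixed

# anova() on two nested lm fits: F-test for whether the interaction terms improve the fit
anova(fit_add, fit_int)
```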

Statistics: Binary outcomes

  • odds ratios
  • Fisher's exact test, chi-squared test
    • (2 x 2 tables) Is smoking associated with lung cancer?
    • (j x k tables) Is there an association between political party affiliation and gender?
  • logistic regression
    • Fitting it with the glm() command (family = binomial)
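The sketch below uses the built-in mtcars data, an illustrative choice. First it cross-tabulates vs (engine shape: 0 = V-shaped, 1 = straight) with am (0 = automatic, 1 = manual). Then it fits a logistic regression of am on weight. Note that fisher.test() reports a conditional maximum-likelihood estimate of the odds ratio, so its estimate can differ slightly from the simple cross-product ratio computed by hand.

```r
# 2 x 2 table: rows = vs (0 = V-shaped, 1 = straight), columns = am (0 = automatic, 1 = manual)
tab <- table(vs = mtcars$vs, am = mtcars$am)
tab

# Sample odds ratio by hand: (a * d) / (b * c)
or_hat <- (tab[1, 1] * tab[2, 2]) / (tab[1, 2] * tab[2, 1])
or_hat

# Fisher's exact test and the chi-squared test of independence
ft <- fisher.test(tab)
ft$estimate  # conditional MLE of the odds ratio
ft$p.value
chisq.test(tab)

# Logistic regression: odds of a manual transmission as a function of weight (1000 lbs)
fit_logit <- glm(am ~ wt, family = binomial, data = mtcars)
summary(fit_logit)
exp(coef(fit_logit))  # the wt entry is the odds ratio per extra 1000 lbs
```

Exponentiating a logistic regression slope turns it into an odds ratio. This is the same quantity the 2 x 2 table summarizes, but now adjusted for whatever covariates are in the model.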

Data challenges

  • Missing values
  • Corrupted data
  • Collinearity
    • pairs() and GGally::ggpairs() plots
  • Regression diagnostics
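This sketch covers missing values, collinearity, and regression diagnostics on the built-in airquality data, an illustrative choice whose Ozone and Solar.R columns contain NAs. By default, lm() silently drops incomplete rows (via na.action = na.omit), so it is worth checking nobs(). The correlation matrix is a quick numeric companion to a pairs() plot when looking for collinearity.

```r
# Built-in airquality data: Ozone and Solar.R contain missing values (NA)
aq <- airquality

colSums(is.na(aq))             # number of NAs in each column
mean(aq$Ozone)                 # NA: missing values propagate by default
mean(aq$Ozone, na.rm = TRUE)   # mean of the observed values only

# Keep only rows with no missing values in any column
aq_complete <- aq[complete.cases(aq), ]
nrow(aq_complete)

# Collinearity check: pairwise correlations among candidate predictors
round(cor(aq_complete[, c("Solar.R", "Wind", "Temp")]), 2)

# Regression diagnostics; incomplete rows are dropped by the default na.action
fit <- lm(Ozone ~ Solar.R + Wind + Temp, data = aq)
nobs(fit)                      # rows actually used in the fit
par(mfrow = c(2, 2))
plot(fit)                      # residuals vs fitted, normal Q-Q, scale-location, residuals vs leverage
```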

Where do you go from here?

Data import/export

foreign - Read Data Stored by Minitab, S, SAS, SPSS, Stata, Systat, Weka, dBase, …

xlsx - Read/write Excel data

RSQLite - SQLite Interface for R

RMySQL - MySQL Interface for R

Data summarization and manipulation

tidyr - Tools for reshaping your data into “tidy” formatting

R for Data Science - New book by Garrett Grolemund and Hadley Wickham, available for free online.

  • Introduces the “tidyverse” set of R packages and workflows

The handy Data wrangling cheatsheet provides a quick reference to the various dplyr and tidyr functions.

Interfacing R with other languages

Rcpp - Call C++ functions from R.

rPython - Call Python functions from R.

R Notebooks make it even easier to mix R with Python, C++, SQL, and bash code chunks in a single document.

Visualization, interactive graphics

shiny - A web application framework for R

ggvis - Interactive web-based graphics

plotly - Make ggplots interactive

htmlwidgets - “Bring the best of JavaScript data visualization to R”

Todo

  • Course evaluations
    • I really appreciate your feedback
    • Today is the last day to submit evaluations.