Lecture 1: Introduction and Basics

Prof. Alexandra Chouldechova
94-842

What are we trying to accomplish?

Here's a sample analysis.

The analysis was shown only in class and is not viewable in this version of the notes.

Agenda

  • Course overview

  • Introduction to R, RStudio and R Notebooks/R Markdown

  • Programming basics

How this class will work

  • No programming knowledge presumed

  • Some stats knowledge presumed. E.g.:

    • Hypothesis testing (t-tests, confidence intervals)
    • Linear regression
  • Class attendance is mandatory

  • Class will be very cumulative

Mechanics

  • Two 80-minute lectures a week:
    • First 60-80 minutes: concepts, methods, examples
    • Last 0-20 minutes: short labs (time permitting)
  • Class participation (10%)
  • Quizzes (10%)
  • Weekly homework (35%)
  • Final project (2.5 weeks) (45%)
    • Disclaimer: To pass the class, you must achieve a passing score on the final project (at least 23 / 45)

Mechanics

  • Class participation (10%)

    • Labs: Each lecture has an accompanying lab assignment.
    • Friday Lab sessions give you an opportunity to work on the labs
    • Course website shows how participation grade will be calculated
  • Quizzes (10%)

    • 4 quizzes in the second half of term. Dates TBA.
  • Homework assignments (35%)

    • There will be 5 weekly HW assignments
    • The single lowest HW score will be dropped
    • HW assigned on Thursdays, due Thursdays at 2:50pm
    • Late homework will not be accepted for credit
  • Final project (45%)

    • You will write a report analyzing a policy question using a publicly available data set

Course resources

  • Assignments, office hours, class notes, grading policies, useful references on R: http://www.andrew.cmu.edu/~achoulde/94842/

  • Canvas for gradebook and for turning in homework

  • Piazza for forum

    • Please post class- and homework-related questions on Piazza instead of emailing the teaching staff
  • Check the class website for everything else

  • No required textbook, but several are recommended:

    • Garrett Grolemund and Hadley Wickham, R for Data Science
    • Phil Spector, Data Manipulation with R
    • Winston Chang, R Graphics Cookbook

Goal of this class

This class will teach you to use R to:

  • Generate graphical and tabular data summaries
  • Perform statistical analyses (e.g., hypothesis testing, regression modeling)
  • Produce reproducible statistical reports using R Markdown and R Notebooks
  • Integrate R with other tools (e.g., databases, the web)
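As a rough illustration of the first two goals, here is a minimal sketch. It assumes the built-in mtcars data set, which ships with R and is not the data from the in-class sample analysis. The variable names (mpg_by_cyl, mpg_test, mpg_fit) are made up for this example.

```r
# Illustrative only: mtcars is a data set that ships with R,
# not the data used in the lecture's sample analysis.
data(mtcars)

# Tabular summary: average miles per gallon by number of cylinders
mpg_by_cyl <- aggregate(mpg ~ cyl, data = mtcars, FUN = mean)
print(mpg_by_cyl)

# Hypothesis test: do automatic (am = 0) and manual (am = 1) cars
# differ in average mpg?
mpg_test <- t.test(mpg ~ am, data = mtcars)
print(mpg_test$p.value)

# Linear regression: mpg as a function of car weight
mpg_fit <- lm(mpg ~ wt, data = mtcars)
print(coef(mpg_fit))
```

Each step is a single function call, which is typical of how short an R analysis can be once the data is loaded.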

Why R?

  • Free (open-source)
  • Programming language (not point-and-click)
  • Excellent graphics
  • Offers a very broad range of statistical tools
  • Easy to generate reproducible reports
  • Easy to integrate with other tools

The R Console

Basic interaction with R happens by typing commands into the console

The console is R's terminal, or command-line, interface
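A short sketch of what a console session might look like. The specific commands are chosen for illustration and are not from the lecture. Each line is typed at the prompt and R responds right away.

```r
# Arithmetic typed at the prompt is evaluated immediately
2 + 3

# Results can be stored under a name and reused
x <- 10
x * 2

# A command R cannot evaluate produces an error instead of an answer.
# try() is used here only so the remaining lines still run in a script;
# at the console you would simply see the error message.
result <- try(log("a"), silent = TRUE)
class(result)
```

An error at the console is harmless: R reports the problem and waits for the next command.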

The R Console

  • You type in commands, R gives back answers (or errors)

  • Menus and other graphical interfaces are extras built on top of the console

  • We will use RStudio in this class

  1. Download R: http://lib.stat.cmu.edu/R/CRAN

  2. Then download RStudio: http://www.rstudio.com/

RStudio is an IDE for R

RStudio has 4 main windows ('panes'):

  • Source
  • Console
  • Workspace/History
  • Files/Plots/Packages/Help

Console pane

  • Use the Console pane to type or paste commands to get output from R

  • To look up the help file for a function or data set, type ? followed by its name into the Console

    • E.g., try typing in ?mean
  • Use the tab key to auto-complete function and object names
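The help lookup described above can be tried as follows. This is a small sketch: the scores vector is invented for illustration, and the na.rm argument is one of the options documented on the help page for mean.

```r
# Open the help page for the mean() function
?mean

# help() is the function form of the same request
help("mean")

# Help pages also exist for built-in data sets
?mtcars

# The help page for mean documents an na.rm argument, which
# drops missing values (NA) before averaging
scores <- c(1, NA, 3)
avg <- mean(scores, na.rm = TRUE)
avg
```

In RStudio, typing the first few letters of a name (for example, mea) and pressing Tab brings up a list of matching functions and objects.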

Source pane

  • Use the Source pane to create and edit R and Rmd files
  • The menu bar of this pane contains handy shortcuts for sending code to the Console for evaluation

Files/Plots/Packages/Help pane

  • By default, any figures you produce in R will be displayed in the Plots tab
    • Menu bar allows you to Zoom, Export, and Navigate back to older plots
  • When you request a help file (e.g., ?mean), the documentation will appear in the Help tab
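To see the Plots tab in action, a minimal sketch like the following can be pasted into the Console. The data (x and y) are invented for illustration.

```r
# Illustrative data: the integers 1 to 10 and their squares
x <- 1:10
y <- x^2

# In RStudio, this scatterplot appears in the Plots tab
plot(x, y, main = "y = x^2", xlab = "x", ylab = "y")

# A second plot is shown in place of the first; the earlier one
# stays reachable through the tab's navigation arrows
hist(y, main = "Histogram of y")
```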

RStudio: Panes overview

  1. Source pane: create a file that you can save and run later

  2. Console pane: type or paste in commands to get output from R

  3. Workspace/History pane: see a list of variables or previous commands

  4. Files/Plots/Packages/Help pane: see plots, help pages, and other items in this window
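The list of variables shown in the Workspace pane can also be inspected from the Console. A minimal sketch, with variable names a and b chosen only for illustration:

```r
# Create two variables; both show up in the Workspace pane
a <- 1
b <- "text"

# ls() lists the names of the objects currently defined
ls()

# rm() removes an object; it also disappears from the Workspace pane
rm(a)
ls()
```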

RStudio: Source and Console panes

RStudio: Console

RStudio: Toolbar