---
# title: "70207 - Probability and Statistics for Business Applications"
author: "Professor Anh Nguyen"
date: "Spring 2020, last updated `r format(Sys.time(), '%d %B, %Y')`"
output:
  bookdown::gitbook:
    # toc: false
    css: "../note_style.css"
    split_by: "chapter"
    toc_depth: 2
    number_sections: false
---
# Syllabus
#### Instructor
Professor Anh Nguyen
Office: Tepper 5209
anhnguyen@cmu.edu
Office hours: Monday 11:30am-1pm or by appointment
#### Course website
- Canvas course: https://canvas.cmu.edu/courses/13151/
- Piazza discussion page: http://piazza.com/cmu/spring2020/70207/home
**All questions related to class materials should be posted on Piazza**. Please use the following link to sign up http://piazza.com/cmu/spring2020/70207. The TAs will post responses to questions on Piazza daily (except on weekends and holidays) at 5pm. Questions posted after 5pm will be answered the following business day.
#### Teaching Assistants
- Yijin Kim, yijink@andrew.cmu.edu
Office Hours: Friday 2-3:30PM, Location: Tepper 3805
- Majid Mahzoon, mmahzoon@andrew.cmu.edu
Office Hours: Tuesday 2-3:30PM, Location: Tepper 3805
#### Lectures
- Lecture 1: Mondays and Wednesdays, 9:30 - 10:20 am, Tepper 2612.
- Lecture 2: Mondays and Wednesdays, 10:30 - 11:20 am, Tepper 2612.
#### Recitations
Please bring your laptop to recitations.
- Recitations A & B: Friday, 9:30 - 10:20 am, Tepper 2612.
- Recitations C & D: Friday, 10:30 - 11:20 am, Tepper 2702.
#### Class overview
This class is part of the core statistics sequence for undergraduate business majors and precedes 70-208. The course covers univariate probability concepts, which describe the behavior of a single random variable. Throughout the course, we will look at how the analysis of single variables is applied in business contexts.
#### Learning objectives
By the end of the course, you will be able to do the following:
1. Determine how to model real-world stochastic variables.
1. Determine the accuracy of sample averages.
1. Test hypotheses involving single variables.
1. Identify particular features of data in business settings.
#### Textbook
**Optional**: “Statistics for Business: Decision Making and Analysis”, 3rd edition, by Robert Stine and Dean Foster. The book is available at the CMU bookstore.
#### Class topics and readings
1. Data (Chapters 2, 3, and 4)
    - Categorical Data
    - Numerical Data
    - Time Series
1. Probability and Random Variables (Chapters 7, 9, 11, and 12)
    - Probability Models for Counts
    - Normal Probability Model, Student's t-distribution
    - Probit and Logit Model
1. Inference (Chapters 13, 14, 15, 16, 17, and 18)
    - Samples
    - Confidence Intervals
    - Statistical Tests
    - Comparing Sample Means
    - Chi-Squared Tests
#### Classroom etiquette
You are expected to attend all lectures and recitations on time and to stay for the entire class period. You are expected to participate in lectures and recitations, both by answering questions and by asking your own questions.
#### Lecture and recitation materials
Lecture notes will be posted on Canvas prior to the start of each class. These are just outlines, and not substitutes for attending each class. You are expected to attend class and take notes.
In recitations, the TA will bring a question sheet to each class. The TA will work through these problems with you during class. This sheet will be posted on Canvas, along with any data we work with during recitation. The answers will be posted after class.
#### Grading
The grading is as follows:
- Homeworks 9\%
- Quizzes 6\%
- 3 Midterms 45\% (9%, 18%, 18%)
- Final exam 30\%
- Participation 10\%
Participation grades are based on practice problems we will do in lecture and on recitation attendance. Periodically, I will present practice problems in class. I will give you a few minutes to solve each problem and submit your answer on Canvas, and then I will go over the answers. Your answers to these questions will not be graded for correctness, but these exercises give you an opportunity to check whether you are keeping up with the material. You will receive full credit only if you make an attempt to answer the questions. Half of your participation grade will be determined by these attendance quizzes (I will excuse one attendance quiz over the course of the semester). The TAs will record attendance at each recitation, and the other half of your participation grade will be based on recitation attendance (again, one absence will be excused).
#### Homeworks
There are 3 homeworks in this course. You may work in groups to solve them. If you work in a group, you must still write up your answers yourself and list the names of the people you worked with on your assignment. It is not acceptable to simply copy a classmate's answer as part of group work: you may discuss the problems together, but you must write your answers independently. Copied homeworks will be treated as academic integrity violations. The problem sets are intended to help you learn the material, so it is important that you think through and understand each problem; doing so will help you prepare for the exams. Homeworks will periodically include case studies that give you an opportunity to work with data sets on your own. We will discuss the case studies in class on the day they are due. Homeworks will be posted on Canvas on Wednesdays and are due the following Wednesday. Homeworks must be turned in as paper copies during lecture, and no late assignments will be accepted. Solutions will be posted on Canvas after the homeworks are turned in. The TAs will return graded homeworks in recitation.
#### Quizzes
There will be 6 quizzes. The quizzes are posted on Canvas and must be submitted via Canvas by 5 PM on the Thursday of the week they are assigned. The quiz solutions will be discussed in recitation. Late quizzes will not be graded under any circumstances.
#### Exam Dates
The exam dates are as follows:
- Midterm 1: February 5
- Midterm 2: March 4
- Midterm 3: April 8
- Final: TBA
Midterms take place in class. The date of the final exam will be set by the university later in the semester. The only way to change the date of your final exam is if you have 3 exams in a 24-hour period. We will not allow you to take the exam at a different time for any other reason.
If you cannot take an exam, contact me to justify your absence. Excused absences include CMU activities, serious illness, and family emergencies. For medical emergencies, we need documentation of your illness from a doctor. If you have an excused absence from an exam, the weight of that exam will be moved to your final exam. There are no make-up exams.
If you think there was a mistake in the grading of your exam, you need to submit a written regrade request within 1 week of the day the exams are returned in class. The request must explain the source of the grading error. I will review your request and the exam paper and determine whether a mistake was made. Note that the entire exam will be reviewed, not only the question in your request.
#### Accommodations for Students with Disabilities
If you have a disability and are registered with the Office of Disability Resources, I encourage you to use their online system to notify me of your accommodations and discuss your needs with me as early in the semester as possible. I will work with you to ensure that accommodations are provided as appropriate. If you suspect that you may have a disability and would benefit from accommodations but are not yet registered with the Office of Disability Resources, I encourage you to contact them at access@andrew.cmu.edu.
#### Statement of Support for Students' Health and Well-being
Take care of yourself. Do your best to maintain a healthy lifestyle this semester by eating well, exercising, avoiding drugs and alcohol, getting enough sleep and taking some time to relax. This will help you achieve your goals and cope with stress.
If you or anyone you know experiences any academic stress, difficult life events, or feelings like anxiety or depression, we strongly encourage you to seek support. Counseling and Psychological Services (CaPS) is here to help: call 412-268-2922 and visit http://www.cmu.edu/counseling/. Consider reaching out to a friend, faculty member, or family member you trust for help getting connected to the support you need.
# Schedule
*Note: subject to change.*
Last updated: `r format(Sys.time(), '%d %B, %Y')`
- January 13: Introduction; Display data, data types.
- January 15: Categorical data.
- January 17: Recitation 1
- January 20: *Martin Luther King Jr. Day; no classes.*
- January 22: Numerical data (Quiz 1)
- January 24: Recitation 2
- January 27: Numerical data
- January 29: Time series (Homework 1)
- January 31: Recitation 3
- February 3: Probability, Independence
- February 5: **Midterm 1**
- February 7: Recitation 4
- February 10: Random Variables
- February 12: Association between Random Variables (Quiz 2)
- February 14: Recitation 5
- February 17: Probability Models for Counts
- February 19: Normal Distribution
- February 21: Recitation 6
- February 24: Normal Distribution
- February 26: Other distributions (Homework 2)
- February 28: Recitation 7
- March 2: Review
- March 4: **Midterm 2**
- March 6: *No recitations*
- March 9: *Spring Break; no classes.*
- March 11: *Spring Break; no classes.*
- March 13: *Spring Break; no recitations.*
- March 16: Samples
- March 18: Samples (Quiz 3)
- March 20: Recitation 8
- March 23: Sample Properties, Sampling Variation
- March 25: Confidence Intervals (Quiz 4)
- March 27: Recitation 9
- March 30: Confidence Intervals
- April 1: Hypothesis testing (Homework 3)
- April 3: Recitation 10
- April 6: Review
- April 8: **Midterm 3**
- April 10: Recitation 11
- April 13: Hypothesis testing
- April 15: Comparing samples (Quiz 5)
- April 17: *Spring Carnival; no recitation.*
- April 20: Comparing samples
- April 22: Comparing samples (Quiz 6)
- April 24: Recitation 12
- April 27: Chi-Squared Test
- April 29: Final Exam Review
# R guide^[Much of this material was taken from Professor Dennis Epple's statistics course. All errors are mine.]
If you are already familiar with R, you can skip this guide.
## Install R
Download and install R. Even if you have previously installed R, it is a good idea for you to re-install to be sure your R is up to date. During installation, you may be asked to affiliate with a site (a CRAN Mirror). It doesn’t matter which you choose. I scrolled down to USA and chose the CMU stat department.
https://www.r-project.org/
## Updating R Studio
If you don’t already have R Studio, go to the next step. If you already have R Studio, update as follows. Launch R Studio. On the main ribbon, click Help/Check for Updates. If there are updates, follow the update instructions.
## Installing R Studio
If you have not previously installed R Studio, download and install the free desktop version of R Studio.
https://rstudio.com/products/rstudio/download/
You will probably want to print this tutorial so that you have it to refer to as you work your way through the R script that you will load below.
## Some basics of R
You can type commands one at a time at the Console prompt, marked by the blue arrow (`>`). For example, suppose you want to know the square root of 7. Click next to the blue arrow in your R Studio window, type `7^.5`, and hit Enter to see the answer.
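For reference, `^` is R's exponentiation operator, so `7^.5` raises 7 to the power 0.5. A minimal sketch comparing it with base R's built-in `sqrt()` function, which gives the same result:

```{r, eval=FALSE}
# Raise 7 to the power 0.5, i.e. take its square root
7^.5
# The built-in square-root function gives the same value, about 2.645751
sqrt(7)
# all.equal() checks that two numbers agree up to floating-point tolerance
all.equal(7^.5, sqrt(7))
```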
You can enter commands as described above, but it is typically more convenient to collect a set of commands in an R file. You can then save the file and avoid retyping commands that you wish to use again in the future. We will use the term “script” to refer to a file that contains R commands. The script for this tutorial is named `R_Script_Tutorial_1.R`.
For R Studio to recognize a file as an R script, the file name must end with `.R`, as above.
Choose a folder on your computer to hold your R materials. This will be your “working directory”.
### Set Your Working Directory
To set a working directory, you can either type the following command
```{r, eval=FALSE}
setwd(YOURDIRECTORY)
```
where `YOURDIRECTORY` is the path to the folder you would like to use as your working directory. The path must be enclosed in quotation marks. For example, on my computer, I chose my desktop as the working directory:
```{r, eval=FALSE}
setwd('/Users/anhnguye/Desktop')
```
Alternatively, click the following sequence in the menu bar at the top of R Studio: Session/Set Working Directory/Choose Directory
Then, navigate to the directory containing your R script and click Select Folder.
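To confirm that the working directory is set where you expect, you can print it with `getwd()` and list the files R sees there with `list.files()`; your script and data files should appear in that list. One detail for Windows users: inside an R string a single backslash starts an escape sequence, so write paths with forward slashes (or doubled backslashes). The Windows folder in the last comment below is a hypothetical example, not a path you need to create.

```{r, eval=FALSE}
# Print the current working directory
getwd()
# List the files in the working directory;
# your R script and data file should appear here
list.files()
# Windows paths: use forward slashes, e.g. (hypothetical folder)
# setwd('C:/Users/yourname/Documents/70207')
```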
### Install Packages
R code is shared and organized in “packages”: collections of functions (and often data) written by other users, so that procedures that would otherwise require combining many commands can be run with a few function calls.
For this course, I would like you to install a package called `ggplot2` that will help us produce charts and graphs.
The command is the following:
```{r, eval=FALSE}
install.packages('ggplot2')
```
The package may take a while to download and install, and a lot of text will scroll by in the Console as this proceeds. Some of the text may be in red font, but that is normal while packages are downloading. You may be asked whether you want to update some packages you already have; if so, click yes. You only need to run this command again when you want to install updates, which you will do infrequently (e.g., once or twice per year).
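If you rerun a script on a computer where `ggplot2` may or may not already be installed, one common pattern is to install it only when it is missing. This is an optional sketch, not a required step: `requireNamespace()` returns `TRUE` if the package is installed and can be loaded, and `FALSE` otherwise. The `repos` argument names a CRAN mirror explicitly (here the CRAN cloud mirror, https://cloud.r-project.org) so that R does not prompt you to pick one.

```{r, eval=FALSE}
# Install ggplot2 only if it is not already installed.
# requireNamespace() returns FALSE when the package cannot be found.
if (!requireNamespace('ggplot2', quietly = TRUE)) {
  # repos names a CRAN mirror explicitly so R does not prompt for one
  install.packages('ggplot2', repos = 'https://cloud.r-project.org')
}
```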
### Load Packages
Packages need to be loaded each time you launch R. Run the following command to load `ggplot2`:
```{r, eval=FALSE}
library('ggplot2')
```
Alternatively, you can run:
```{r, eval=FALSE}
require('ggplot2')
```
**Note**: If you are writing a script, the package only needs to be loaded once (usually by placing the command at the beginning of the script).
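`library()` and `require()` differ in what happens when a package is not installed: `library()` stops with an error, while `require()` gives a warning and returns `FALSE`, which you can test in an `if` statement. The sketch below shows both; `notARealPackage` is a deliberately made-up package name used only to trigger the error.

```{r, eval=FALSE}
# require() returns TRUE if the package loaded, or FALSE (with a warning)
# if it is not installed, so it can be used inside if()
if (!require('ggplot2')) {
  message('ggplot2 is not installed; run install.packages("ggplot2") first.')
}
# library() stops with an error instead; tryCatch() catches that error
# so this example keeps running ('notARealPackage' is hypothetical)
result <- tryCatch(library('notARealPackage'), error = function(e) 'error')
result
```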
### Sample R script
Save the following commands into a text file in your working directory and name it *sample_script.R*. Also, save the following data file into your working directory folder.
Link: TBA
```{r, eval=FALSE}
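# Set the working directory (replace with the path to your own folder)
setwd('yourworkingdir')
require('ggplot2')
# Read the chocolate market-share data into a data frame
chocolate_data = read.csv('data_chocolate.csv')
# One bar per company; stat='identity' makes each bar's height the Market.Share value
ggplot(chocolate_data, aes(x=Company, y=Market.Share)) +
  geom_bar(stat='identity')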
```
The output looks like this:
```{r, eval=TRUE, echo=FALSE}
require('ggplot2')
# Read the file by its relative path rather than calling setwd() inside a chunk
chocolate_data = read.csv('../Data_upload/data_chocolate.csv')
ggplot(chocolate_data, aes(x=Company, y=Market.Share)) +
  geom_bar(stat='identity')
```
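Before plotting, it helps to look at what `read.csv()` produced, for example with `str(chocolate_data)` and `head(chocolate_data)`. Since the data file link is not posted yet, the sketch below runs the same functions on a tiny made-up CSV (hypothetical values, not the course data) supplied as text. It also shows that `read.csv()` converts a header containing a space, such as `Market Share`, into a valid R name, `Market.Share`; whether the course file's header is written that way is an assumption.

```{r, eval=FALSE}
# A tiny made-up CSV (hypothetical values, not the course data),
# supplied as text so the example runs without any file
toy <- read.csv(text = 'Company,Market Share
Company A,0.6
Company B,0.4')
# read.csv() turns the header 'Market Share' into the valid name 'Market.Share'
names(toy)
# str() lists each column with its type; head() prints the first rows
str(toy)
head(toy)
```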
If a line ends in the middle of a command (for example, with a trailing `+`), R treats the command as unfinished and keeps reading on the next line.
For example, BOTH of the following commands work:
```{r, eval=FALSE}
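# The trailing + tells R the command continues on the next line
ggplot(chocolate_data, aes(x=Company, y=Market.Share)) +
  geom_bar(stat='identity')
# The same command written on a single line
ggplot(chocolate_data, aes(x=Company, y=Market.Share)) + geom_bar(stat='identity')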
```
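But **NOT** this. Here the first line is already a complete command, so R runs it on its own (producing a plot with no bars), and the second line, which starts with `+`, is then run as a separate command and produces an error: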
```{r, eval=FALSE}
ggplot(chocolate_data, aes(x=Company, y=Market.Share))
+ geom_bar(stat='identity')
```
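The same rule applies to any R expression, not just `ggplot2` commands. With plain arithmetic, a misplaced `+` does not even produce an error; the second line silently does something different. A minimal base-R sketch:

```{r, eval=FALSE}
# The trailing + tells R the expression is unfinished,
# so it keeps reading on the next line: total becomes 3
total <- 1 +
  2
total
# Here the first line is already complete, so total2 is set to 1.
# The next line, + 2, runs as its own command: it just prints 2
# and leaves total2 unchanged
total2 <- 1
+ 2
total2
```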
To run all commands in an R script at once, use the function `source()`:
```{r, eval=FALSE}
source('sample_script.R')
```
```
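R's `source()` function runs every command in a script file. By default it does not print the value of each command, so a chart such as the `ggplot()` call in *sample_script.R* may not appear; with `echo = TRUE`, each command is printed as it runs and its results are printed too, as in `source('sample_script.R', echo = TRUE)`. To keep the sketch below runnable anywhere, it writes a small hypothetical two-line script to a temporary file created with `tempfile()` and runs it both ways.

```{r, eval=FALSE}
# Write a small, hypothetical two-line script to a temporary .R file
script_path <- tempfile(fileext = '.R')
writeLines(c('x <- 2', 'y <- x^2'), script_path)
# Run every command in the file; x and y are created in your workspace
source(script_path)
y
# echo = TRUE also prints each command as it runs
source(script_path, echo = TRUE)
```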