This course introduces students to R, a widely used statistical programming language. Students will learn to manipulate data objects, produce graphics, analyse data using common statistical methods, and generate reproducible statistical reports. They will also gain experience in applying these acquired skills in various policy areas.
By the end of the class, students learn to:
Instructor: Jeremy C. Weiss email@example.com, where yyy=jeremyweiss
Office: HBH 2101F
Office Hours: Jeremy C. Weiss, Thursdays 1pm, virtual (see Canvas for link)
Teaching Assistants (zzz), append @andrew.cmu.edu to zzz:
This Website: http://www.andrew.cmu.edu/~jweiss2/21f_r/
All course materials will be posted on this site.
Homework submission: Assignments to be submitted via Canvas.
Prerequisites: Students must be enrolled in a graduate program in Heinz College. Special permission can be granted by the College.
All of the course materials on this page are licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.
There are no required textbooks for this class. However, I highly recommend the freely available R for Data Science by Garrett Grolemund and Hadley Wickham.
There are many resources online that may help you to learn R. A few that are particularly relevant for this course are listed below.
Your grade in this course will be determined by a series of 5 weekly homework assignments (40%), lab submissions (10%), quizzes (10%) and a final project (40%).
Each weekly assignment will take the form of a single R Markdown text file, that is, code snippets integrated with captions and other narrative. Except where otherwise noted, assignments are due on Wednesdays at 1:30pm ET on the dates indicated on Canvas.
Your assignment score for the course will be calculated by averaging your four (4) highest homework scores. That is, your lowest homework score will not count toward your grade.
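As a rough illustration of the drop-lowest rule, the sketch below averages the four highest of five homework scores in R. The scores are made-up values on the 10-point scale from the rubric, and the function name `homework_average` is hypothetical, not part of any course tooling.

```r
# Hypothetical helper: average the scores that remain after dropping the lowest one(s).
homework_average <- function(scores, n_drop = 1) {
  stopifnot(is.numeric(scores), length(scores) > n_drop)
  # Sort from highest to lowest and keep all but the last n_drop scores.
  kept <- sort(scores, decreasing = TRUE)[seq_len(length(scores) - n_drop)]
  mean(kept)
}

# Made-up scores for five assignments, each out of 10 points.
hw_scores <- c(8, 9.5, 6, 10, 7.5)
homework_average(hw_scores)  # drops the 6 and averages the remaining four: 8.75
```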
Each homework assignment will have 5 problems, each of which may have several parts. Your score for each assignment will be assigned according to the scheme outlined in the rubric below.
Total: 10 points
Correctness: Each of the 5 problems, which will often have multiple parts, is worth 2 points. Deductions will be made at the discretion of the grader.
Style: Coding style is very important. With the exception of Homework 1, you will receive a deduction of up to 1 point if you do not adhere to good coding style.
No deduction if your homework is submitted with:
-0.5 if coding style is acceptable, but fails on a couple of the criteria above.
-1 if coding style is overall poor and fails to adhere to many of the above criteria.
The Lab session is scheduled for Fridays. Lab attendance is encouraged, but is not mandated due to the challenges this would present for students in remote timezones. During the lab sessions, students will get hands-on practice with the week’s material by working on assigned lab activities. Members of the teaching staff will be available over Zoom to introduce the activities and to answer any questions you may have. Tasks may include but are not limited to: running or modifying code from the lecture, pair coding, or completing short coding exercises. During weeks where Friday sessions are cancelled due to holidays, you are still required to submit the labs in order for them to count toward your “participation” score.
All thirteen (13) scheduled lectures will have an associated lab component. Your Lab participation score for the course will be calculated based on the number of labs that you submit, as indicated in the table below.
There will be 3-4 short quizzes scheduled during the later weeks of class. Dates and times will be announced in advance. The purpose of these quizzes is to assess your understanding of various concepts that are central to the class. Your score on the quizzes will count for 10% of your final grade.
The final project for the class will ask you to explore a broad policy question using a large publicly available dataset. This project is intended to provide students with the complete experience of going from a study question and a rich data set to a full statistical report. Students will be expected to (a) explore the data to identify important variables; (b) perform statistical analyses to address the policy question; (c) produce tabular and graphical summaries to support their findings; and (d) write a report describing their methodological approach, findings, and limitations thereof.
Regardless of grading basis, students must receive a score of at least 50% on the final project in order to pass the class.
Your final course grade will be calculated according to the following breakdown.
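The component weights stated in the grading overview (homework 40%, labs 10%, quizzes 10%, final project 40%) combine as a weighted sum. The following is a minimal sketch in R, assuming each component has already been converted to a percentage; the example scores and the function name `course_percentage` are hypothetical, and no letter-grade cutoffs are implied.

```r
# Component weights as stated in the grading overview.
weights <- c(homework = 0.40, labs = 0.10, quizzes = 0.10, project = 0.40)

# Hypothetical helper: weighted course percentage from component percentages.
course_percentage <- function(components, w = weights) {
  stopifnot(setequal(names(components), names(w)))
  # Reorder the components to match the weights before multiplying.
  sum(components[names(w)] * w)
}

# Made-up component scores, each on a 0-100 scale.
example <- c(homework = 87.5, labs = 100, quizzes = 80, project = 90)
course_percentage(example)

# The final project must also reach at least 50% in order to pass the class.
example[["project"]] >= 50
```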
Homework is to be submitted by 1:30pm ET on Wednesdays on the due date indicated, unless an alternate due date is announced. Late homework will not be accepted for credit.
Note that your lowest homework score will not count toward your grade, so you can miss one homework without it counting toward your course grade.
You are encouraged to discuss homework problems with your fellow students. However, the work you submit must be your own. You must acknowledge in your submission any help received on your assignments. That is, you must include a comment in your homework submission that clearly states the name of the student, book, or online reference from which you received assistance.
Submissions that fail to properly acknowledge help from other students or non-class sources will receive no credit. Copied work will receive no credit. Any and all violations will be reported to Heinz College administration.
All students are expected to comply with the CMU policy on academic integrity. This policy can be found online at http://www.cmu.edu/academic-integrity/.
The course collaboration policy allows you to discuss the problems with other students, but requires that you complete the work on your own. Every line of text and line of code that you submit must be written by you personally. You may not refer to another student’s code, or a “common set of code” while writing your own code. You may, of course, copy/modify lines of code that you saw in lecture or lab.
The following discussion of code copying is taken from the Computer Science and Engineering Department at the University of Washington. You may find this discussion helpful in understanding the bounds of the collaboration policy.
“[It is] important to make sure that the assistance you receive consists of general advice that does not cross the boundary into using code or answers written by someone else. It is fine to discuss ideas and strategies, but you should be careful to write your programs on your own.”
“You must not share actual program code with other students. In particular, you should not ask anyone to give you a copy of their code or, conversely, give your code to another student who asks you for it; nor should you post your solutions on the web, in public repositories, or any other publicly accessible place. [You may not work out a full communal solution on a whiteboard/blackboard/paper and then transcribe the communal code for your submission.] Similarly, you should not discuss your algorithmic strategies to such an extent that you and your collaborators end up turning in [essentially] the same code. Discuss ideas together, but do the coding on your own.”
“Modifying code or other artifacts does not make it your own. In many cases, students take deliberate measures – rewriting comments, changing variable names, and so forth – to disguise the fact that their work is copied from someone else. It is still not your work. Despite such cosmetic changes, similarities between student solutions are easy to detect. Programming style is highly idiosyncratic, and the chance that two submissions would be the same except for changes of the sort made easy by a text editor is vanishingly small. In addition to solutions from previous years or from other students, you may come across helpful code on the Internet or from other sources outside the class. Modifying it does not make it yours.”
“[I] allow exceptions in certain obvious instances. For example, you might be assigned to work with a project team. In that case, developing a solution as a team is expected. The instructor might also give you starter code, or permit use of local libraries. Anything which the instructor explicitly gives you doesn’t normally need to be cited. Likewise, help you receive from course staff doesn’t need to be cited.”
If you have any questions about any of the course policies, please don’t hesitate to ask. You may post your questions on Piazza or ask me directly.
Computing: The statistical computing package we will use in this course is R, which is available on many campus computers. You may download your own copy from http://www.r-project.org. You are required to complete your assignments in R Markdown, which RStudio supports well.
Communication: Assignments and class information will be posted on Canvas and the class website.
Email: The Piazza forum should be used for general course-related questions that may be of interest to others in the class. For other types of questions (e.g., to report illness, request various permissions) please contact Dr. Weiss via email. Please include the course code 94842 in the subject line of your email.
Disability Services: If you have a disability and need special accommodations in this class, please contact the instructor. You may also want to contact the Disability Resources office at 412-268-2013.
Note 1: Links will go live as the course progresses.
Note 2: The course schedule is subject to change.
Note 3: Recordings of classes (at least 1 section per lecture) are here: recordings
|Week 1: Introduction and basics|
Lecture 1 Introductions. Installing R on personal machines. Retrieving R packages.
Basics of R, RStudio, R Markdown.
Basic data types and operations: numbers, characters and composites.
Vectors, creating sequences, common functions.
Homework 0 assigned.
Lecture 2 Importing tabular data.
Simple summaries of categorical and continuous data.
R style basics
|Week 2: Data frames, functions, loops, if/else|
Lecture 3 More on data frames and lists.
Writing functions in R.
HW 1 due
|Week 3: Data summaries and Graphics|
Lecture 5 notes html
Introduction to ggplot2 graphics
Homework 3 assigned.
HW 2 due
|Week 4: Statistical tests and models|
Lecture 7
Tests for 2x2 tables
Lab 8 Rmd
HW 3 due
|Week 5: Linear regression|
Tests for jxk tables
Plotting error bars
HW 4 due, Quiz Friday
|Week 6: Regression, more graphics|
Interaction terms in regression
HW 5 due Wednesday
|Week 7: Interactive graphics|
Final project introduction
Lab 13 R
Lab 13 solutions R