18-785: Data, Inference and Applied Machine Learning


Instructor: Prof. Patrick McSharry (patrick@mcsharry.net)

Semester: Fall 2023

Lecture:

Kigali: Tue. and Thu., 3:00 pm - 4:20 pm CAT (Zoom Link)

Pittsburgh: Tue. and Thu., 8:00 am - 9:20 am ET (Zoom Link)


Recitation:

Kigali: Friday, 3:00 pm - 3:50 pm CAT (Zoom Link)

Pittsburgh: Friday, 8:00 am - 8:50 am ET (Zoom Link)


Course Overview

This course provides the methods and skills required to use data and quantitative models to automate predictive analytics and make better decisions. From descriptive statistics to data analysis to machine learning, the course demonstrates the process of collecting, cleaning, interpreting, transforming, exploring, analyzing and modeling data with the goal of extracting information, communicating insights and supporting decision-making. The advantages and disadvantages of linear, nonlinear, parametric, nonparametric and ensemble methods are discussed while exploring the challenges of both supervised and unsupervised learning. The course emphasizes quantifying uncertainty, statistical hypothesis testing and communicating confidence in model results. The advantages of using visualization techniques to explore the data and communicate the outcomes are highlighted throughout. Applications include visualization, clustering, ranking, pattern recognition, anomaly detection, data mining, classification, regression, forecasting and risk analysis. Participants gain hands-on experience through project assignments that use publicly available datasets and address practical challenges.

Course Syllabus

The full course syllabus contains additional information and can be found in PDF format HERE. 


Outcomes

After completing this course, students should be able to:

Textbooks

There will be no required textbooks, though we suggest the following to help you to study (all available online):

Piazza

We will use Piazza for class discussions. Please go to the course Piazza site to join the course forum (note: you must use a cmu.edu email account to join the forum). We strongly encourage students to post on this forum rather than emailing the course staff directly (this will be more efficient for both students and staff). Students should use Piazza to: 

Academic Integrity

The course Academic Integrity Policy must be followed when doing assignments and on the message boards at all times. Details on ECE's Academic Integrity Policy can be found in the course syllabus and HERE. 


Grading

The grades for this course will be based on students' performance on seven homework assignments, a mid-semester exam, a final exam and class participation. Homework assignments must be done individually and submitted via Canvas by the designated due date. Late work will be accepted up to 24 hours past the deadline, but it will lose 10%. Each assignment will be graded on both a written report and the code used to produce the results presented in the report. Class participation will be evaluated based on each student's contributions to discussions, both in class and on the Piazza Discussion Board. When posting or reacting to online discussion threads, students are expected to use their own words, and each post should be relevant to the topic under discussion. When sharing an article, introduce, summarize and explain it in your own words so that the audience understands the point the article is making.

The following is the weight distribution of the grades:

Class participation                                5%

Kahoot Quiz                                         2.5%

Piazza Participation                              2.5%

Homework Assignment 1                      5%

Homework Assignment 2                      10%

Homework Assignment 3                      10%

Homework Assignment 4                      10%

Mid-Semester Exam (Multiple Choice)  10%

Homework Assignment 5                      10%

Homework Assignment 6                      12.5%

Homework Assignment 7                      12.5%

Kaggle Project                                       Bonus 2.5%

Final Exam (Multiple Choice)                10%
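
As a rough illustration of how the weights above combine, the following Python sketch computes an overall course percentage from per-component scores. The function names (`course_grade`, `apply_late_penalty`) and the 0-100 score scale are assumptions made for this example, not part of the course tooling. The late rule is also an assumed reading: it treats "lose 10%" as a 10% reduction of the earned score and gives no credit after 24 hours. The course staff's interpretation takes precedence.

```python
# Weights in percent, copied from the grade distribution above.
WEIGHTS = {
    "Class participation": 5.0,
    "Kahoot Quiz": 2.5,
    "Piazza Participation": 2.5,
    "Homework Assignment 1": 5.0,
    "Homework Assignment 2": 10.0,
    "Homework Assignment 3": 10.0,
    "Homework Assignment 4": 10.0,
    "Mid-Semester Exam": 10.0,
    "Homework Assignment 5": 10.0,
    "Homework Assignment 6": 12.5,
    "Homework Assignment 7": 12.5,
    "Final Exam": 10.0,
}
KAGGLE_BONUS = 2.5  # bonus percent, added on top of the 100% above


def apply_late_penalty(score, hours_late):
    """Assumed reading of the late policy: on time keeps the full score,
    up to 24 hours late keeps 90% of it, anything later earns nothing."""
    if hours_late <= 0:
        return score
    if hours_late <= 24:
        return score * 0.9
    return 0.0


def course_grade(scores, kaggle_score=0.0):
    """Combine component scores (each on a 0-100 scale) into a course percentage.

    Components missing from `scores` count as zero.
    """
    base = sum(weight * scores.get(name, 0.0) / 100.0
               for name, weight in WEIGHTS.items())
    return base + KAGGLE_BONUS * kaggle_score / 100.0


if __name__ == "__main__":
    # Hypothetical student: 85 on everything, Homework 3 submitted 12 hours late.
    scores = {name: 85.0 for name in WEIGHTS}
    scores["Homework Assignment 3"] = apply_late_penalty(85.0, hours_late=12)
    print(round(course_grade(scores, kaggle_score=50.0), 2))
```

Keeping the weights in a single dictionary makes it easy to check that the non-bonus components sum to 100%.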


Teaching Assistants Contact Info

Name                      Email                           Office Hours                   Zoom Meeting ID
Christian Iradukunda      Christianiradukunda2@gmail.com  Tuesday 6:00PM - 8:00PM CAT    https://cmu.zoom.us/my/christian.iradukunda
Isaac Coffie              coffie@andrew.cmu.edu           Wednesday 6:00PM - 8:00PM CAT  https://cmu.zoom.us/my/pages.coffie
Tanya Akumu               takumu@andrew.cmu.edu           Thursday 6:00PM - 8:00PM CAT   https://cmu.zoom.us/my/tanya.akumu
Eliphaz Niyodusenga       eniyodus@andrew.cmu.edu         Friday 5:00PM - 7:00PM CAT     https://cmu.zoom.us/my/eliphaz
Akem Aristide Tanyi-Jong  aaristid@alumni.cmu.edu         Saturday 7:00PM - 9:00PM CAT   https://cmu.zoom.us/my/akem.aristide
Innocent Mukoki           imukoki@andrew.cmu.edu          Sunday 8:00PM - 10:00PM CAT    https://cmu.zoom.us/my/imukoki


Schedule (Subject to Change)

Date           Topics                   Slides      Assignment
Tues, Sep 01   Measurement              [Week 1A]   Assignment 1 released
Thurs, Sep 03  Data Collection          [Week 1]
Fri, Sep 04    Recitation
Mon, Sep 07                                         Assignment 1 due
Tues, Sep 08   Data Manipulation        [Week 2A]   Assignment 2 released
Thurs, Sep 10  Data Exploration         [Week 2]
Fri, Sep 11    Recitation
Tues, Sep 15   Descriptive Statistics   [Week 3A]
Thurs, Sep 17  Distributions            [Week 3]
Fri, Sep 18    Recitation
Mon, Sep 21                                         Assignment 2 due
Tues, Sep 22   Statistical Hypotheses   [Week 4A]   Assignment 3 released
Thurs, Sep 24  Quantifying Confidence   [Week 4]
Fri, Sep 25    Recitation
Tues, Sep 29   Trends                   [Week 5A]
Thurs, Oct 01  Decision Making          [Week 5]
Fri, Oct 02    Recitation
Mon, Oct 05                                         Assignment 3 due
Tues, Oct 06   Forecasting              [Week 6A]   Assignment 4 released
Thurs, Oct 08  Model Evaluation         [Week 6]
Fri, Oct 09    Recitation
Mon, Oct 19                                         Assignment 4 due
Thurs, Oct 22  Mid-Semester Exam
Tues, Oct 27   Statistical Learning A   [Week 7A]   Assignment 5 released
Thurs, Oct 29  Statistical Learning B   [Week 7]
Fri, Oct 30    Recitation
Tues, Nov 03   Linear Models A          [Week 8A]
Thurs, Nov 05  Linear Models B          [Week 8]
Fri, Nov 06    Recitation
Mon, Nov 09                                         Assignment 5 due
Tues, Nov 10   Nonlinear Models A       [Week 9A]   Assignment 6 released
Thurs, Nov 12  Nonlinear Models B       [Week 9]
Fri, Nov 13    Recitation
Tues, Nov 17   Supervised Learning A    [Week 10A]
Thurs, Nov 19  Supervised Learning B    [Week 10]
Fri, Nov 20    Recitation
Mon, Nov 23                                         Assignment 6 due
Tues, Nov 24   Unsupervised Learning A  [Week 11A]  Assignment 7 released
Thurs, Nov 26  Unsupervised Learning B  [Week 11]
Fri, Nov 27    Recitation
Tues, Dec 01   Ensemble Approaches A    [Week 12A]
Thurs, Dec 03  Ensemble Approaches B    [Week 12]
Fri, Dec 04    Recitation
Mon, Dec 07                                         Assignment 7 due; Kaggle competition due
Thurs, Dec 17  Final Exam