Lectures, time and location:
Recitations for Pittsburgh: Fridays 1:30pm-2:50pm Eastern Time, HBH A301
Recitations for Adelaide: Thursdays 5pm-7pm Australian Central Time, Classroom 1
Instructor: George Chen (email: georgechen ♣ cmu.edu) ‐ replace "♣" with the "at" symbol
Teaching assistants for Pittsburgh: Daniel Chen (dpchen ♣ andrew.cmu.edu), Emaad Manzoor (emaad ♣ cmu.edu)
Teaching assistants for Adelaide: Erick Rodriguez (erickger ♣ andrew.cmu.edu)
Office hours:
Contact: Please use Piazza (follow the link to it within Canvas) and, whenever possible, post so that everyone can see (if you have a question, chances are other people can benefit from the answer as well!).
Companies, governments, and other organizations now collect massive amounts of data such as text, images, audio, and video. How do we turn this heterogeneous mess of data into actionable insights? A common problem is that we often do not know ahead of time what structure underlies the data, which is why such data are often referred to as "unstructured". This course takes a practical, two-step approach to unstructured data analysis:
We will be coding lots of Python and working with Amazon Web Services (AWS) for cloud computing (including using GPUs).
Prerequisite: If you are a Heinz student, then you must have either (1) passed the Heinz Python exemption exam, or (2) taken 95-888 "Data-Focused Python" or 16-791 "Applied Data Science". If you are not a Heinz student and would like to take the course, please contact the instructor and clearly state what Python courses you have taken/what Python experience you have.
Helpful but not required: Math at the level of calculus and linear algebra may help you appreciate some of the material more.
Grading: Homework 20%, quiz 1 40%, quiz 2 40%
Syllabus: [pdf]
🔥 Previous version of course (including lecture slides and demos): 95-865 Spring 2019 mini 3 🔥
Date | Topic | Supplemental Material |
---|---|---|
Part I. Exploratory data analysis | ||
Week 1: Oct 21-25 Reminder: Section A2 meets on Tuesdays and Thursdays, B2 meets on Mondays and Wednesdays, and K2 meets on Tuesdays (the Adelaide section gets all lectures for the week in a single ~3 hour session) |
Lecture 1: Course overview, analyzing text using frequencies
Recitation 1: Basic Python review
HW1 released (check Canvas)! |
|
Week 2: Oct 28-Nov 1 |
Lecture 3: Finding possibly related entities
Recitation 2: More on PCA, bookkeeping with np.argsort
HW1 due Thursday 11:59pm Eastern Time |
Causality additional reading:
PCA additional reading (technical):
Python examples for dimensionality reduction:
Additional dimensionality reduction reading (technical):
|
Week 3: Nov 4-8 |
Lecture 5: t-SNE
Recitation 3: t-SNE, review session for quiz 1
HW2 released at the start of the week |
Additional dimensionality reduction reading (technical):
Additional clustering reading (technical): |
Week 4: Nov 11-15 |
Lecture 7: More clustering, topic modeling
HW2 due Thursday 11:59pm Eastern Time
Recitation 4: Quiz 1 |
Python cluster evaluation:
Additional reading (technical): |
Part II. Predictive data analysis | ||
Week 5: Nov 18-22 |
George is in Adelaide this week and will attempt to give Pittsburgh lectures remotely. His usual Pittsburgh office hours are cancelled (if you would like to meet via Skype, please email to schedule).
Lecture 9: Introduction to predictive analytics, model validation
Mike Jordan's Medium article on where AI is at (April 2018):
Recitation 5: More classical classification models, ROC curves
HW3 released at the start of the week |
Video introduction on neural nets:
Additional reading: |
Week 6: Nov 25-29 |
Lecture 11: Image analysis with CNNs (also called convnets)
In lecture 12 (i.e., the second half of Adelaide's week 6 lecture), I mentioned that I will post a demo on interpreting CNNs. The demo is available here:
Recitation 6 (Adelaide only; Pittsburgh has Thanksgiving break): Word embeddings as self-supervised learning, review session for quiz 2 |
CNN reading:
LSTM reading: |
Week 7: Dec 2-6 |
Lecture 12 (Pittsburgh only): same as Adelaide Lecture 12
HW3 due Thursday 11:59pm Eastern Time
Recitation 7: Quiz 2 |