Lectures, time and location:
Recitations for all three sections: Fridays 3pm-4:20pm, HBH A301
Instructor: George Chen (georgechen [at symbol] cmu.edu)
Teaching assistants: Emaad Manzoor (emaad [at symbol] cmu.edu), Yucheng Huang (huangyucheng [at symbol] cmu.edu)
Office hours (starting second week of class):
Contact: Please use Piazza (follow the link to it within Canvas) and, whenever possible, post so that everyone can see (if you have a question, chances are other people can benefit from the answer as well!).
Companies, governments, and other organizations now collect massive amounts of data such as text, images, audio, and video. How do we turn this heterogeneous mess of data into actionable insights? A common problem is that we often do not know ahead of time what structure underlies the data, which is why such data is often called "unstructured". This course takes a practical, two-step approach to unstructured data analysis:
We will be coding lots of Python and working with Amazon Web Services (AWS) for cloud computing (including using GPUs).
Prerequisite: If you are a Heinz student, then you must have either (1) passed the Heinz Python exemption exam, or (2) taken 95-888 "Data-Focused Python" or 16-791 "Applied Data Science". If you are not a Heinz student and would like to take the course, please contact the instructor and clearly state what Python courses you have taken/what Python experience you have.
Helpful but not required: Math at the level of calculus and linear algebra may help you appreciate some of the material more.
Grading: Homework 20%, mid-mini quiz 35%, final exam 45%. If you do better on the final exam than the mid-mini quiz, then your final exam score clobbers your mid-mini quiz score (thus, the quiz does not count for you, and instead your final exam counts for 80% of your grade).
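To make the grading arithmetic concrete, here is a minimal Python sketch of the rule above. The function name `course_grade` and the 0-100 scale for each component are assumptions for illustration only; this is not an official grade calculator.

```python
def course_grade(homework, quiz, final):
    """Overall course percentage from component scores (each assumed 0-100)."""
    # If the final exam score is higher, it replaces the quiz score,
    # so the final effectively counts for 35% + 45% = 80% of the grade.
    effective_quiz = max(quiz, final)
    # Integer weights (percent) keep the arithmetic exact before dividing.
    return (20 * homework + 35 * effective_quiz + 45 * final) / 100


print(course_grade(90, 70, 80))  # final replaces the quiz
print(course_grade(90, 85, 80))  # quiz score is kept
```

With homework 90, quiz 70, and final 80, the final exam replaces the quiz and the overall score is 82.0; with quiz 85 instead, the quiz counts and the overall score is 83.75.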
Syllabus: [pdf]
🔥 Previous version of course (including lecture slides and demos): 95-865 Fall 2018 mini 2 🔥
Date  Topic  Supplemental Material 

Part I. Exploratory data analysis  
Mon-Tue Jan 14-15 Reminder: Sections A3 and B3 meet Mondays and Wednesdays; Section C3 meets Tuesdays and Thursdays 
Lecture 1: Course overview, basic text processing, and frequency analysis
HW1 released (check Canvas)! 

Wed-Thur Jan 16-17 
Lecture 2: Basic text analysis demo, co-occurrence analysis


Fri Jan 18 
Recitation 1: Basic Python review


Mon-Tue Jan 21-22 
No class due to MLK Jr. Day (even though Tuesday is not a holiday, to keep the three sections synchronized, there will be no class on Tuesday for Section C3) 

Wed-Thur Jan 23-24 
Lecture 3: Finding possibly related entities, PCA, Isomap 
Causality additional reading:
PCA additional reading (technical): 
Fri Jan 25 
Recitation 2: TBA
HW1 due 11:59pm, HW2 released 

Mon-Tue Jan 28-29 
Lecture 4: t-SNE 
Python examples for dimensionality reduction:
Additional dimensionality reduction reading (technical):

Wed-Thur Jan 30-31 
Lecture 5: Introduction to clustering, k-means, Gaussian mixture models 
Additional clustering reading (technical): 
Fri Feb 1 
Recitation 3: t-SNE 

Mon-Tue Feb 4-5 
Lecture 6: Clustering and clustering interpretation demo, automatic selection of k with CH index 
Additional clustering reading (technical):
Python cluster evaluation: 
Wed-Thur Feb 6-7 
Lecture 7: Hierarchical clustering, topic modeling 
Additional reading (technical): 
Fri Feb 8 
Recitation 4: Quiz review session
HW2 due 11:59pm, HW3 released 

Part II. Predictive data analysis  
Mon-Tue Feb 11-12 
Lecture 8: Introduction to predictive analytics, nearest neighbors, evaluating prediction methods, decision trees 

Wed-Thur Feb 13-14 
Lecture 9: Support vector machines, decision boundaries, ROC curves 

Fri Feb 15  Mid-mini quiz (same time/place as recitation); in case of space issues, we do have an overflow room booked (HBH 1002)  
Mon-Tue Feb 18-19 
Lecture 10: Introduction to neural nets and deep learning
Mike Jordan's Medium article on where AI is at (April 2018): 
Video introduction on neural nets:
Additional reading: 
Wed-Thur Feb 20-21 
Lecture 11: Image analysis with CNNs (also called convnets) 
CNN reading:

Fri Feb 22  Recitation 5: Final exam review 

Mon-Tue Feb 25-26 
Lecture 12: Time series analysis with RNNs, roughly how learning a deep net works (gradient descent and variants) 
LSTM reading:
Videos on learning neural nets (warning: the loss function used is not the same as what we are using in 95-865):
Recent heuristics/theory on gradient descent variants for deep nets (technical): 
Wed-Thur Feb 27-28 
Lecture 13: Interpreting what a deep net is learning, other deep learning topics, wrap-up
Gary Marcus's Medium article on limitations of deep learning and his heated debate with Yann LeCun (December 2018): 
Some interesting reads (technical): 
Fri Mar 1 
Final exam (same time/place as recitation); in case of space issues, we do have an overflow room booked (HBH 1002)
HW3 due 11:59pm 

Mini-3 final exam week Mar 4-7 
No class 