Class time and location:
Instructor: George Chen (georgechen [at symbol] cmu.edu)
Teaching assistants: Emaad Manzoor (emaad [at symbol] cmu.edu), Mallory Nobles (mnobles [at symbol] andrew.cmu.edu)
Office hours:
Contact: Please use Piazza (follow the link to it within Canvas) and, whenever possible, post so that everyone can see (if you have a question, chances are other people can benefit from the answer as well!).
Companies, governments, and other organizations now collect massive amounts of data such as text, images, audio, and video. How do we turn this heterogeneous mess of data into actionable insights? A common problem is that we often do not know ahead of time what structure underlies the data, which is why such data is often called "unstructured". This course takes a practical, two-step approach to unstructured data analysis:
We will be coding lots of Python and working with Amazon Web Services (AWS) for cloud computing (including using GPUs).
Prerequisite: Python coding experience
Helpful but not required: Math at the level of calculus and linear algebra, which may help you appreciate some of the material more
Grading: Homework 20%, mid-mini quiz 35%, final exam 45%. If you do better on the final exam than the mid-mini quiz, then your final exam score clobbers your mid-mini quiz score (thus, the quiz does not count for you, and instead your final exam counts for 80% of your grade).
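To make the grading rule concrete, here is a minimal sketch of the arithmetic in Python. The function name `course_grade` and the assumption that every component is scored on a 0 to 100 scale are illustrative and not part of the official policy.

```python
def course_grade(homework, quiz, final):
    """Overall course percentage from component scores.

    Assumes (for illustration) that each score is on a 0-100 scale.
    """
    # If the final exam score is higher than the quiz score, it replaces
    # the quiz score, so the final effectively counts for 35% + 45% = 80%.
    effective_quiz = max(quiz, final)
    # Integer weights divided by 100: homework 20%, quiz 35%, final 45%.
    return (20 * homework + 35 * effective_quiz + 45 * final) / 100


# Final (80) beats quiz (70): the quiz is replaced by the final.
print(course_grade(homework=90, quiz=70, final=80))  # 82.0

# Quiz (85) beats final (80): all three components count as listed.
print(course_grade(homework=90, quiz=85, final=80))  # 83.75
```

In the first case the result is the same as weighting homework at 20% and the final exam at 80%.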
Syllabus: [pdf]
Warning: As this course is still relatively new, the lecture slides are a bit rough and may contain bugs. To provide feedback/bug reports, please directly contact the instructor, George (georgechen [at symbol] cmu.edu). The Spring 2018 mini-3 course website is available here.
Date | Topic | Supplemental Material |
---|---|---|
Part I. Exploratory data analysis | | |
Mon Mar 19 | Course overview, basic text processing and frequency analysis; HW0 released! (Check Canvas) | Some Python resources |
Wed Mar 21 | Basic text analysis demo, co-occurrence analysis | |
Fri Mar 23 | HW0 due 11:59pm, HW1 released! | |
Mon Mar 26 | Wrap up co-occurrence analysis, scatter plots, correlation, causation, visualizing high-dimensional data: PCA | Causality additional reading; PCA additional reading |
Wed Mar 28 | Manifold learning (isomap, t-SNE) | Python examples for dimensionality reduction; additional dimensionality reduction reading |
Mon Apr 2 | Introduction to clustering, k-means, Gaussian mixture models (GMMs); HW1 due 10:30am, HW2 released! | Additional clustering reading |
Wed Apr 4 | DP-GMMs, DP-means, CH index, hierarchical clustering | Python cluster evaluation; additional reading |
Mon Apr 9 | Clustering (wrap-up), topic modeling, intro to predictive data analysis | Additional reading |
Tue Apr 10 | Just for this week, George's office hours are Tuesday 5pm-7pm, HBH 2216 (and not on Wednesday!) | |
Wed Apr 11 | Mid-mini quiz | |
Part II. Predictive data analysis | | |
Mon Apr 16 | Introduction to predictive analytics, some classics of classification: nearest neighbors, evaluating prediction methods, naive Bayes | |
Wed Apr 18 | Support vector machines, decision trees and forests; HW3 released | |
Mon Apr 23 | Intro to neural nets and deep learning; Mike Jordan's Medium article (from just a few days ago!) on where AI is currently at; HW2 due 10:30am | Video introduction on neural nets; additional reading |
Wed Apr 25 | Image analysis with CNNs (also called convnets) | Additional reading |
Mon Apr 30 | Time series analysis with RNNs, roughly how learning a deep net works (gradient descent and variants) | LSTM reading; videos on learning neural nets |
Wed May 2 | Interpreting what a deep net is learning, other deep learning topics, wrap-up; HW3 due 10:30am | |
Tue May 8 | Final exam 1pm, HBH 1002 | |