Class time and location:
Instructor: George Chen (contact info on website)
Teaching assistants: Castiel Huang (Adelaide), Emaad Manzoor, Rashmi Raghunandan, Runshan Fu, Yoonjung Kim
Office hours:
Contact: Please use Piazza (follow the link to it within Canvas) and, whenever possible, post publicly so that everyone can see your question; chances are other people will benefit from the answer as well. If your question is specific to you and would not benefit others, you can reach all of the course staff (TAs and instructor) by emailing:
uda-course-f17 [at symbol here!] lists.andrew.cmu.edu
Companies, governments, and other organizations now collect massive amounts of data such as text, images, audio, and video. How do we turn this heterogeneous mess of data into actionable insights? A common problem is that we often do not know ahead of time what structure underlies the data, which is why such data is often called "unstructured". This course takes a practical, two-step approach to unstructured data analysis:
We will be coding lots of Python and working with Amazon Web Services (AWS) for cloud computing (including using GPUs).
Prerequisite: Python coding experience
Helpful but not required: Math at the level of calculus and linear algebra may help you appreciate some of the material more
Grading: Equal weights on HW1, HW2, HW3, final exam
Warning: As this is the first offering of this course, the slides are a bit rough and may contain bugs. To send feedback or bug reports, please contact the instructor, George, directly (email info is on his homepage).
Date | Topic | Supplemental Material |
---|---|---|
Part 1. Exploratory data analysis | ||
Tuesday Oct 24, 2017 |
Course introduction, basic text processing, frequency analysis |
Python review by Emaad:
Some Python resources: |
Thursday Oct 26, 2017 |
Finding possibly related features:
co-occurrence analysis, scatter plots, correlation, causation. HW1 released! (Check Canvas) |
Causality additional reading: |
Tuesday Oct 31, 2017 |
Visualizing high-dimensional vectors: PCA, introduction to manifold learning, Isomap, t-SNE |
Python examples for dimensionality reduction:
Additional reading: |
Thursday Nov 2, 2017 |
Clustering: introduction, k-means, Gaussian mixture models |
Additional reading: |
Tuesday Nov 7, 2017 |
Automatically choosing the number of clusters: DP-GMMs, DP-means, CH index (see also gap statistic). HW1 due at 4:30pm, HW2 released! |
Python cluster evaluation:
Additional reading: |
Thursday Nov 9, 2017 |
Hierarchical clustering, topic modeling |
Additional reading: [see Section 14.3.12 "Hierarchical Clustering" of the book "Elements of Statistical Learning"] [David Blei's general intro to topic modeling] |
Part 2. Predictive data analysis | ||
Tuesday Nov 14, 2017 | ||
Thursday Nov 16, 2017 | No class — optional AWS tutorial instead at the same time as class (led by Yoonjung) | |
Tuesday Nov 21, 2017 |
Adaptive nearest neighbor methods: decision trees and their use in ensembles (such as in random forests, AdaBoost, gradient tree boosting), and why they're nearest neighbor methods. HW2 due at 11:59pm Wed Nov 22 (originally 4:30pm), HW3 released! |
Python code example:
Additional reading: |
Thursday Nov 23, 2017 | Thanksgiving: no class | |
Tuesday Nov 28, 2017 |
Introduction to deep learning |
Video introduction on neural nets:
Additional reading: |
Thursday Nov 30, 2017 |
Deep learning for analyzing images and time series |
Additional reading: |
Tuesday Dec 5, 2017 |
Wrap-up of deep learning and of 95-865. HW3 due at 4:30pm |
Videos on learning neural nets: |
Thursday Dec 7, 2017 | Review session | |
Friday Dec 15, 2017 | Final exam: 1pm-4pm, HBH 1202 |
Date | Topic | Supplemental Material |
---|---|---|
Part 1. Exploratory data analysis | ||
Friday Oct 27, 2017 |
Course introduction, basic text processing, frequency analysis
Finding possibly related features:
co-occurrence analysis, scatter plots, correlation, causation. HW1 released! (Check Canvas) |
Python review by Emaad:
Some Python resources:
Causality additional reading: |
Friday Nov 3, 2017 |
Visualizing high-dimensional vectors: PCA, introduction to manifold learning, Isomap, t-SNE
Clustering: introduction, k-means, Gaussian mixture models |
Python examples for dimensionality reduction:
Additional reading on dimensionality reduction:
Additional reading on clustering: |
Wednesday Nov 8, 2017 |
HW1 due at 8am, HW2 released! |
|
Friday Nov 10, 2017 |
Automatically choosing the number of clusters: DP-GMMs, DP-means, CH index (see also gap statistic)
Hierarchical clustering, topic modeling |
Python cluster evaluation:
Additional reading: |
Part 2. Predictive data analysis | ||
Week of Friday Nov 17, 2017 | ||
Wednesday Nov 22, 2017 |
HW2 due at 3:29pm Thu Nov 23 (originally 8am), HW3 released! |
|
Week of Friday Nov 24, 2017 |
Adaptive nearest neighbor methods: decision trees and their use in ensembles (such as in random forests, AdaBoost, gradient tree boosting), and why they're nearest neighbor methods
Introduction to deep learning |
Python code example that includes adaptive nearest neighbor methods (not deep learning):
Additional reading:
Video introduction on neural nets: |
Friday Dec 1, 2017 |
Deep learning for analyzing images and time series, wrap-up of deep learning and 95-865 |
Additional reading:
Videos on learning neural nets: |
Wednesday Dec 6, 2017 |
HW3 due at 8am |
|
Friday Dec 8, 2017 | Final exam: 9am-12pm, classroom 1 |