Date

Lectures and Readings

Out / Due

3/21
Review

(Recitation) Lecture 0: Setup
 Installation of Hadoop and Spark on your local machine
 Setting up AWS clusters
Please take this Python mini-quiz before the course begins, and take this Python mini-course if you need to learn Python or refresh your Python knowledge.


3/21

Lecture 1: Introduction
 Big Data applications
 Technologies for handling big data
 Apache Hadoop and Spark overview


3/23
3/28

Lecture 2: Hadoop Fundamentals
 Hadoop architecture
 HDFS and the MapReduce paradigm
 Hadoop ecosystem: Mahout, Pig, Hive, HBase, Spark

HW1 out 
3/28
3/30

Lecture 3: Introduction to Apache Spark
 Big data and hardware trends
 History of Apache Spark
 Spark's Resilient Distributed Datasets (RDDs)
 Transformations and actions


4/4

Lecture 4: Machine Learning Overview
 Basic machine learning concepts
 Steps of typical supervised learning pipelines
 Linear algebra review
 Computational complexity / Big O notation review
 
4/6
4/11

Lecture 5: Linear Regression and Distributed ML Principles
 Linear regression
 formulation and closed-form solution
 gradient descent
 grid search
 Distributed machine learning principles
 computation, storage, and communication
HW1 due; HW2 out

4/13
4/18

Lecture 6: Logistic Regression and Click-through Rate Prediction
 Online advertising
 Linear classification
 Logistic regression
 working with probabilistic predictions
 categorical data and one-hot encoding
 feature hashing for dimensionality reduction

HW2 due; HW3 out
4/20

No classes; Spring Carnival


4/18
4/25

Lecture 7: Principal Component Analysis and Neuroimaging
 Exploratory data analysis
 Principal Component Analysis (PCA)
 Formulations and solution
 Distributed PCA


4/27

Lecture 8: Big Data ML with MLlib
 k-means Clustering
 Decision Trees and Random Forests
 Recommenders

HW3 due; HW4 out
5/2

Lecture 9: Introduction to SparkSQL
 Working with tables in Spark
 Higher-level declarative programming


5/4

Lecture 10: Analyzing Networks with GraphX
 Understanding network structure
 Computing graph statistics

HW4 due; Project out
TBD

Final Exam
