95-869 Big Data and Large Scale Computing
Fall 2017


Tentative Syllabus


Date

Lectures and Readings

Out / Due


 

Review

Please take this Python mini-quiz before the course begins. If you need to learn Python or refresh your Python knowledge, take this Python mini-course as well.

   

10/24

 

Lecture 1: Introduction

  • Big Data applications
  • Technologies for handling big data
  • Apache Hadoop and Spark overview



10/26

10/31

Lecture 2: Hadoop Fundamentals

  • Hadoop architecture
  • HDFS and the MapReduce paradigm
  • Hadoop ecosystem: Mahout, Pig, Hive, HBase, Spark




HW0 out


10/31

11/2

Lecture 3: Introduction to Apache Spark

  • Big data and hardware trends
  • History of Apache Spark
  • Spark's Resilient Distributed Datasets (RDDs)
  • Transformations and actions


HW1 out

   

11/7

Lecture 4: Machine Learning Overview

  • Basic machine learning concepts
  • Steps of typical supervised learning pipelines
  • Linear algebra review
  • Computational complexity / Big O notation review


HW1 due    HW2 out



   

11/9


11/14

Lecture 5: Linear Regression and Distributed ML Principles

  • Linear regression
    • formulation and closed-form solution
    • gradient descent
    • grid search
  • Distributed machine learning principles
    • computation, storage, and communication


11/16


11/21

Lecture 6: Logistic Regression and Click-through Rate Prediction

  • Online advertising
  • Linear classification
  • Logistic regression
    • working with probabilistic predictions
    • categorical data and one-hot encoding
    • feature hashing for dimensionality reduction

HW2 due    HW3 out




11/23

No class: Thanksgiving

HW3 due    HW4 out

   

11/21

11/28

Lecture 7: Principal Component Analysis and Neuroimaging

  • Exploratory data analysis
  • Principal Component Analysis (PCA)
  • Formulations and solution
  • Distributed PCA

 

11/30

 

Lecture 8: Big Data ML with MLlib

  • k-means Clustering
  • Decision Trees and Random Forests
  • Recommenders

HW4 due (Dec 3)    HW5 out (Dec 3)



12/5

Lecture 9: Introduction to Spark SQL

  • Working with tables in Spark
  • Higher-level declarative programming

   

12/7

Lecture 10: Analyzing Networks with GraphX

  • Understanding network structure
  • Computing graph statistics

HW5 due (Dec 10)

12/12, 6:00 PM

Final Exam