SBU
95-869 Big Data and Large Scale Computing
Spring 2024

Tentative Syllabus


Week

Lectures and Readings

Out / Due


 

Review

Before the course, please take some of the Python mini-quizzes. If you need to refresh your Python knowledge, take the Python mini-course as well.

   

Week 1

 

Lecture 1: Introduction

  • Big Data applications
  • Technologies for handling big data
  • Apache Hadoop and Spark overview




Week 2

Lecture 2: Hadoop Fundamentals

  • Hadoop architecture
  • HDFS and the MapReduce paradigm
  • Hadoop ecosystem: Mahout, Pig, Hive, HBase, Spark




HW0 out


Lecture 3: Introduction to Apache Spark

  • Big data and hardware trends
  • History of Apache Spark
  • Spark's Resilient Distributed Datasets (RDDs)
  • Transformations and actions




HW1 out

   

Week 3

Lecture 4: Machine Learning Overview

  • Basic machine learning concepts
  • Steps of typical supervised learning pipelines
  • Linear algebra review
  • Computational complexity / Big O notation review

   



Week 4

Lecture 5: Linear Regression and Distributed ML Principles

  • Linear regression
    • formulation and closed-form solution
    • gradient descent
    • grid search
  • Distributed machine learning principles
    • computation, storage, and communication

HW1 due, HW2 out







Week 5

Lecture 6: Logistic Regression and Click-through Rate Prediction

  • Online advertising
  • Linear classification
  • Logistic regression
    • working with probabilistic predictions
    • categorical data and one-hot-encoding
    • feature hashing for dimensionality reduction

HW2 due, HW3 out



   

 

Week 6

Lecture 7: Principal Component Analysis and Neuroimaging

  • Exploratory data analysis
  • Principal Component Analysis (PCA)
  • Formulations and solution
  • Distributed PCA

HW3 due, HW4 out


 

Week 7

Lecture 8: Big Data ML with MLlib

  • k-means Clustering
  • Decision Trees and Random Forests
  • Recommenders

HW4 due, HW5 out



Lecture 9: Introduction to SparkSQL

  • Working with tables in Spark
  • Higher-level declarative programming

   

Bonus Lecture

Lecture 10: Analyzing Networks with GraphX

  • Understanding network structure
  • Computing graph statistics

HW5 due

Final Exam