95-869 Big Data and Large Scale Computing
Spring 2022


Tentative Syllabus


Lectures and Readings / Homework Out and Due



Please take some of these Python mini-quizzes before the course, and take this Python mini-course if you need to refresh your Python knowledge.


Week 1


Lecture 1: Introduction

  • Big Data applications
  • Technologies for handling big data
  • Apache Hadoop and Spark overview

Week 2

Lecture 2: Hadoop Fundamentals

  • Hadoop architecture
  • HDFS and the MapReduce paradigm
  • Hadoop ecosystem: Mahout, Pig, Hive, HBase, Spark
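
As a rough illustration of the MapReduce paradigm listed above, the sketch below runs a word count on a single machine in plain Python. It is not Hadoop code: the function names (map_phase, shuffle, reduce_phase, word_count) are hypothetical and only mirror the map, shuffle, and reduce stages that the framework would run across a cluster.

```python
from collections import defaultdict


def map_phase(line):
    # Map: emit a (word, 1) pair for every word in one input line.
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    # Shuffle: group all values by key, as the framework does
    # between the map and reduce stages.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    # Reduce: combine all values for one key into a single result.
    return key, sum(values)


def word_count(lines):
    pairs = [pair for line in lines for pair in map_phase(line)]
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())


counts = word_count(["big data big compute", "data at scale"])
print(counts)
```

In a real cluster each input split would be mapped on a different node and the shuffle would move data over the network; here everything lives in one process so that the three stages are easy to see.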

HW0 out

Lecture 3: Introduction to Apache Spark

  • Big data and hardware trends
  • History of Apache Spark
  • Spark's Resilient Distributed Datasets (RDDs)
  • Transformations and actions
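
The distinction between transformations and actions can be sketched with a toy class. This is not Spark's API: ToyRDD is a hypothetical stand-in written in plain Python, assuming only the general idea that transformations are recorded lazily and that work happens when an action is called.

```python
class ToyRDD:
    """Toy stand-in for an RDD (hypothetical, not Spark's API)."""

    def __init__(self, data, ops=()):
        self._data = list(data)
        self._ops = tuple(ops)  # recorded transformations (the lineage)

    # Transformations: return a new ToyRDD and do no work yet.
    def map(self, fn):
        return ToyRDD(self._data, self._ops + (("map", fn),))

    def filter(self, pred):
        return ToyRDD(self._data, self._ops + (("filter", pred),))

    def _compute(self):
        # Replay the recorded transformations over each element.
        for item in self._data:
            keep = True
            for kind, fn in self._ops:
                if kind == "map":
                    item = fn(item)
                elif not fn(item):
                    keep = False
                    break
            if keep:
                yield item

    # Actions: trigger the computation and return a result.
    def collect(self):
        return list(self._compute())

    def count(self):
        return sum(1 for _ in self._compute())


calls = []


def square(x):
    calls.append(x)  # record when the function actually runs
    return x * x


rdd = ToyRDD(range(1, 6)).map(square).filter(lambda v: v % 2 == 1)
work_before_action = len(calls)  # 0: nothing has run yet
result = rdd.collect()           # [1, 9, 25]
```

Because each ToyRDD keeps its list of recorded operations, a result can always be recomputed from the source data, which is loosely the idea behind the "resilient" part of the RDD name.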

HW1 out


Week 3

Lecture 4: Machine Learning Overview

  • Basic machine learning concepts
  • Steps of typical supervised learning pipelines
  • Linear algebra review
  • Computational complexity / Big O notation review


Week 4

Lecture 5: Linear Regression and Distributed ML Principles

  • Linear regression
    • formulation and closed-form solution
    • gradient descent
    • grid search
  • Distributed machine learning principles
    • computation, storage, and communication
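
The topics above can be tied together in a small sketch: gradient descent for one-variable linear regression, where the gradient is computed as a sum of per-partition contributions. This is plain Python run in one process, not course or MLlib code; the names partial_gradient and fit, the learning rate, and the toy data are all assumptions chosen for illustration.

```python
def partial_gradient(partition, w, b):
    # Work done "on one worker": sum the gradient contributions
    # of the local data points only.
    gw = gb = 0.0
    for x, y in partition:
        err = w * x + b - y
        gw += err * x
        gb += err
    return gw, gb


def fit(partitions, lr=0.1, steps=2000):
    n = sum(len(p) for p in partitions)
    w = b = 0.0
    for _ in range(steps):
        # Each partition sends back two numbers, regardless of its size.
        grads = [partial_gradient(p, w, b) for p in partitions]
        gw = sum(g[0] for g in grads)
        gb = sum(g[1] for g in grads)
        # Gradient of the mean squared error, then one descent step.
        w -= lr * 2.0 * gw / n
        b -= lr * 2.0 * gb / n
    return w, b


# Toy data generated from y = 2x + 1, split into two "partitions".
partitions = [[(0.0, 1.0), (1.0, 3.0)], [(2.0, 5.0), (3.0, 7.0)]]
w, b = fit(partitions)
print(w, b)  # approaches w = 2, b = 1
```

The design point is that the data stays where it is stored, the per-point computation runs in parallel, and only a gradient-sized summary is communicated per step.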
HW1 due    HW2 out

Week 5

Lecture 6: Logistic Regression and Click-through Rate Prediction

  • Online advertising
  • Linear classification
  • Logistic regression
    • working with probabilistic predictions
    • categorical data and one-hot encoding
    • feature hashing for dimensionality reduction
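
To make the last two bullets concrete, here is a minimal sketch of one-hot encoding and feature hashing in plain Python. The helper names (one_hot, hashed_features), the toy records, and the use of MD5 as the hash function are assumptions for illustration; a real pipeline would typically use sparse vectors and a library implementation.

```python
import hashlib


def one_hot(records):
    # Assign one index to every distinct (feature, value) pair.
    index = {}
    for rec in records:
        for pair in sorted(rec.items()):
            index.setdefault(pair, len(index))
    # Encode each record as a 0/1 vector over those indices.
    vectors = []
    for rec in records:
        vec = [0] * len(index)
        for pair in rec.items():
            vec[index[pair]] = 1
        vectors.append(vec)
    return index, vectors


def hashed_features(record, num_buckets):
    # Hash each "feature=value" string straight to a bucket, so no
    # dictionary of categories has to be built or stored.
    vec = [0] * num_buckets
    for name, value in record.items():
        digest = hashlib.md5(f"{name}={value}".encode("utf-8")).hexdigest()
        vec[int(digest, 16) % num_buckets] += 1
    return vec


records = [
    {"animal": "mouse", "color": "black"},
    {"animal": "cat", "color": "tabby"},
    {"animal": "bear", "color": "black"},
]
index, ohe = one_hot(records)
hashed = [hashed_features(r, num_buckets=4) for r in records]
print(len(index), ohe)
print(hashed)
```

One-hot vectors grow with the number of distinct categories (5 here), while the hashed vectors have a fixed length (4 here) at the cost of possible collisions between unrelated features.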
HW2 due    HW3 out



Week 6

Lecture 7: Principal Component Analysis and Neuroimaging

  • Exploratory data analysis
  • Principal Component Analysis (PCA)
  • Formulations and solution
  • Distributed PCA
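
As a small illustration of the PCA topics above, the sketch below finds the first principal component of a toy dataset with power iteration on the covariance matrix. Power iteration is one possible method, chosen here because it needs no libraries; the function name top_component and the data are assumptions, not course material.

```python
import math


def top_component(rows, iters=200):
    n, d = len(rows), len(rows[0])
    # Center the data so each feature has zero mean.
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    # d x d covariance matrix: a sum of per-row outer products,
    # so it could be accumulated partition by partition.
    cov = [
        [sum(r[i] * r[j] for r in centered) / n for j in range(d)]
        for i in range(d)
    ]
    # Power iteration: repeatedly multiply by cov and renormalize.
    v = [1.0] * d
    for _ in range(iters):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v


# Toy data lying exactly on the line y = 2x.
rows = [[1.0, 2.0], [2.0, 4.0], [3.0, 6.0], [4.0, 8.0]]
pc = top_component(rows)
print(pc)  # direction proportional to (1, 2)
```

The covariance matrix has size d x d no matter how many rows there are, which is why this formulation suits data with many points and comparatively few features.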
HW3 due    HW4 out


Week 7

Lecture 8: Big Data ML with MLlib

  • k-means Clustering
  • Decision Trees and Random Forests
  • Recommenders
HW4 due   HW5 out

Lecture 9: Introduction to SparkSQL

  • Working with tables in Spark
  • Higher-level declarative programming


Bonus Lecture

Lecture 10: Analyzing Networks with GraphX

  • Understanding network structure
  • Computing graph statistics
HW5 due   

Final Exam: see here