| Week | Lectures and Readings | Out / Due |
| | Review: Please take some of these Python mini-quizzes before the course, and take this Python mini-course if you need to refresh your Python knowledge. | |
| Week 1 | Lecture 1: Introduction. Topics: Big Data applications; Technologies for handling big data; Apache Hadoop and Spark overview | |
| Week 2 | Lecture 2: Hadoop Fundamentals. Topics: Hadoop architecture, HDFS, and the MapReduce paradigm; Hadoop ecosystem: Mahout, Pig, Hive, HBase, Spark | HW0 out |
| | Lecture 3: Introduction to Apache Spark. Topics: Big data and hardware trends; History of Apache Spark; Spark's Resilient Distributed Datasets (RDDs); Transformations and actions | HW1 out |
| Week 3 | Lecture 4: Machine Learning Overview. Topics: Basic machine learning concepts; Steps of typical supervised learning pipelines; Linear algebra review; Computational complexity / Big O notation review | |
| Week 4 | Lecture 5: Linear Regression and Distributed ML Principles. Topics: Linear regression: formulation and closed-form solution, gradient descent, grid search; Distributed machine learning principles: computation, storage, and communication | HW1 due; HW2 out |
| Week 5 | Lecture 6: Logistic Regression and Click-through Rate Prediction. Topics: Online advertising; Linear classification; Logistic regression; working with probabilistic predictions; categorical data and one-hot encoding; feature hashing for dimensionality reduction | HW2 due; HW3 out |
| Week 6 | Lecture 7: Principal Component Analysis and Neuroimaging. Topics: Exploratory data analysis; Principal Component Analysis (PCA); Formulations and solution; Distributed PCA | HW3 due; HW4 out |
| Week 7 | Lecture 8: Big Data ML with MLlib. Topics: k-means Clustering; Decision Trees and Random Forests; Recommenders | HW4 due; HW5 out |
| | Lecture 9: Introduction to SparkSQL. Topics: Working with tables in Spark; Higher-level declarative programming | |
| Bonus Lecture | Lecture 10: Analyzing Networks with GraphX. Topics: Understanding network structure; Computing graph statistics | HW5 due |
| See here | Final Exam | |