Carnegie Mellon University
95-828 Machine Learning for Problem Solving
Spring 2018

Assignments

ASSIGNMENTS ARE DUE AT THE BEGINNING OF LECTURE ON THE DUE DATE


COURSEWORK:

Coursework consists of (grading weight in parentheses):
  • Homework (40%)
  • Midterm exam (15%)
  • Final exam (25%)
  • Project (20%)

NOTE: All assignments (except projects) are to be done individually. Please see the Collaboration policy.

IMPORTANT DATES:

Assignment | Note | Out | Due | Weight
Homework 1 | EDA, Linear Models, Naive Bayes | Jan 30 | Feb 20 | 10%
Homework 2 | Logistic Regression, Decision Trees, Model Selection and Evaluation | Feb 20 | Mar 12 | 10%
Homework 3 | SVM, Kernels, Instance-based learning, Ensembles | Mar 12 | Apr 12 | 10%
Homework 4 | Clustering, Semi-supervised learning, Topic modeling, Collective classification | Apr 12 | May 3 | 10%
Midterm Exam | (in class) | Mar 8 | -- | 15%
Final Exam | | May 8, 8:30-11:30AM | -- | 25%
Project proposal | List of [datasets] [more project ideas] | -- | Feb 22 | 1%
Midway report | | -- | Apr 10 | 5%
Project presentation | (in class) | -- | May 1 & 3 | 5%
Project final writeup | | -- | May 1 & 3 | 9%

HOMEWORK:

Homework should be turned in at the beginning of class on the day it is due. If you are taking late day(s), please upload all your homework (code and .pdf) to Canvas to mark the submission time and ALSO submit a hard copy the next time you are in class (or ask a friend to hand in your hard copy in class). Write the number of late days you used at the top of the first page of your hard copy.

We ask that you submit all code used to complete the assignment electronically, via Canvas only (no printouts, unless explicitly stated otherwise in the HW question).


EXAMS:

There will be a midterm and a final exam. Note: Both the midterm and the final will be open book, notes, papers, etc., but you are not allowed to use a computer. The tentative dates are posted above; the finalized dates will be announced during the semester.


PROJECTS:

Your class project is an opportunity for you to explore an interesting machine learning problem of your choice in the context of a real-world data set.
NOTE: Your class project must be about new things you have done this semester; you cannot use results you have developed in previous semesters.

Projects can be done by you as an individual, or in teams of 2 or 3 students (we recommend the latter). Each course TA will be assigned to a subset of the project teams (to be announced once we know the list of teams) and will consult with you on your ideas. Of course, the final responsibility to define and execute an interesting piece of work is yours.

Your project will be worth 20% of your final class grade, broken into four main deliverables:

  • Project proposal (1% of the course grade)
  • Project milestone report (5% of the course grade) (** 4 pages maximum **, including references) describing the results of your first experiments by the milestone due date (see above). Note that, as with any conference, the page limits are strict. Reports over the limit will not be graded.
  • Final project writeup (9% of the course grade), preferably in ACM format (** 8 pages maximum, 4 pages minimum **, including references; page limit is strict)
  • Final project presentation (last week in-class) (5% of the course grade)
Remark: You will get the most out of the project if you interact with the TA assigned to your team as well as me during the development of your ideas. Talk to us especially before choosing your research problem and dataset. And please feel free to come talk to us about your ideas as often as you would like.

Project Proposal:

You must turn in a brief project proposal (** 1 page maximum **) on the due date (see above), in class. Lists of suggested projects and data sets are posted at the links in the assignments table above.

Project proposal format: Proposals should be 1 page maximum. The proposal should state the following information as clearly as possible, so that we can give you feedback.
  • Names and Andrew IDs of team members on top of the page. Maximum team size is 3 students. Each team should submit only 1 proposal.
  • Project title
  • Problem: What is the exact problem? What are the use scenarios? How exactly would solving this problem add (business) value? etc.
  • Data set: What is a data instance? How do you plan to acquire and pre-process/setup the data?
  • Project idea (approximately 2 paragraphs): What precisely is the machine learning problem? Is it supervised or unsupervised? How will you approach it?
  • Papers to read. Include 2-3 relevant papers. You will probably want to read at least one of them before submitting your proposal.
  • What will you complete by the project milestone due date? Preliminary experimental results are expected by the midway report.

Project Writeups:


Your write-ups should include the information detailed below, in approximately the order given. Your write-up need not have corresponding sections or bullet points, but course staff should be able to find the information without searching too hard. Be as precise/specific as you can.

The Midway Report will be a relatively incomplete version of the final write-up. It should include similar sections and address similar questions, but need not contain all the details. Think of the midway report as a preliminary version of the final draft. It is a status report, including preliminary results, issues that you are facing in developing your project, and how you plan to modify your approach to tackle some of those issues moving forward.
  • Introduction/Motivation/Problem Definition (15%)
    • What is it that you are trying to solve/achieve? Who cares and why does it matter?
    • Identify, define, and motivate the problem that you are addressing.
    • How (precisely) will a machine learning solution address the problem?

  • Data Understanding and Preparation (15%)
    • Identify and describe the data (and data sources) that will support machine learning to address the problem.
    • Include various aspects of the data such as its size (GB/TB/etc), type(s), format, etc.
    • Specify how these data are integrated to produce the format required for machine learning.

  • Methodology (30%)
    This is where you give a detailed description of your primary contributions. It is especially important that this part be clear and well written so that we can fully understand what you did.
    • How did you approach the problem? What challenges did you face? In what (unique) ways did you handle those challenges?
    • Specify the type of model(s) built and/or information/knowledge extracted.
    • Discuss choices for machine learning algorithm: what are other alternatives, and what are their pros and cons (in the context of the problem and as compared to your proposed solution)?
    • Discuss why and how this model should "solve" the problem (i.e., improve along some dimension of interest).

  • Evaluation and Results (30%)
    We are interested in seeing a clear and conclusive set of experiments which successfully evaluate the problem you set out to solve. Make sure to interpret the results and talk about what we can conclude and learn from your approach.
    • How do you evaluate your machine learning solution to the specific question(s) you have addressed?
    • What do these evaluation methods tell you about your solution?
    It is not so important how well your method performs but rather, (a) how thorough and careful your evaluation is, and (b) how interesting and clever your results and findings are.

  • Style and writing (10%)
    Overall writing, grammar, organization, figures and illustrations.
We suggest that you use the ACM conference format (2 columns, single-spaced, see example) to write your project reports. Reports should be 8 pages maximum, 4 pages minimum, including references and appendix; this page limit is strict.
Use external sources where appropriate, and provide clear citations and bibliography. All team members should contribute to the analysis and write-up.

Project Presentations:

In the last 2 lectures you will present to the class the results of your research. Depending on the number of project teams, I will give you a time limit for your presentation beforehand (typically 5-8 minutes), and your presentation will be expected to remain within the time limit. Please keep in mind that this is a very important skill to master: if a VC or a corporate board member tells you they will give you 5 minutes to present your idea or proposal, you present your proposal in 5 minutes -- not 7 or 10. Going over the time by more than one minute will be reflected negatively in your grade, but I will warn you when you are getting close.
  • Think of this as an oral version of your final project writeup.
  • Present your work in a meaningful and interesting flow (e.g., motivation, problem definition, data description, challenges, proposed methods, results and their interpretation).
  • Make sure to include enough details and background of your methodology (similar to a conference talk).
  • See here and here for some how-to on giving a good/bad talk.

Datasets for Project:

We provide a long list of potential data sources for your project right here. The project is open-ended and you are expected to come up with your own project description and problem definition. In addition to your technical approach, we will evaluate your creativity in formulating an interesting and important problem for the project.



Last modified by Leman Akoglu, Dec 2017