Project Due: April 8, 2020 at 11:59pm
Late Policy: Up to two slip days can be used for the final submission.
Please submit all required documents to CMS.
This is a partner project. You can either work alone or work only with your chosen partner. Failure to adhere to this rule (e.g., by copying code) may result in an Academic Integrity Violation.
Overview: In this project, you will be testing some of the methods we discussed in class for accelerating stochastic gradient descent, on the same MNIST task as in Programming Assignment 1. These methods are all used in practice in machine learning systems, even at the largest scales, and the goal of this assignment is to give you some experience working with them so that you can build your intuition for how they work.
Background: In the last programming assignment, we looked at minibatched SGD with sequential sampling (Algorithm 4 from that project):
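As a rough illustration (this is a sketch, not the exact pseudocode of Algorithm 4 from that project), a minibatched SGD loop with sequential sampling might look like the following, where Xs, Ys, w0, and grad_f are hypothetical names: Xs and Ys hold the training examples and labels (examples along the first axis), w0 is the initial parameter vector, and grad_f(Xs_batch, Ys_batch, w) returns the average gradient of the loss over that minibatch.

import numpy as np

def sgd_sequential_scan(Xs, Ys, grad_f, w0, alpha, B, num_epochs):
    """Sketch of minibatched SGD, scanning the data in order each epoch."""
    n = len(Ys)                 # number of training examples; assume B divides n
    w = w0.copy()
    for _ in range(num_epochs):
        for i in range(0, n, B):                    # sequential (non-random) minibatches
            Xb, Yb = Xs[i:i + B], Ys[i:i + B]
            w = w - alpha * grad_f(Xb, Yb, w)       # plain SGD step with step size alpha
    return w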
As usual, num_epochs denotes the number of epochs, or passes through the dataset, used in training, and the total number of iterations here will be T = num_epochs · n / B (for a dataset of n examples and minibatch size B), where we assume that B divides n evenly. When using stochastic gradients in this programming assignment, we are going to base our algorithms on this one, using both (1) minibatching and (2) a sequential scan through the data for all of our stochastic training algorithms.
Instructions: This project is split into three parts: the implementation and evaluation of momentum on top of gradient descent, the implementation and evaluation of momentum with stochastic gradients, and the exploration of Adam (an algorithm that uses adaptive learning rates).
Part 1: Momentum with gradient descent.
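As a sketch of one common formulation (Polyak/heavy-ball momentum, not necessarily the exact update required by the assignment), full-batch gradient descent with momentum maintains a velocity vector alongside the parameters; grad_f here is the same hypothetical gradient function as above, applied to the whole training set.

import numpy as np

def gd_with_momentum(Xs, Ys, grad_f, w0, alpha, beta, num_iters):
    """Sketch of full-batch gradient descent with a momentum (velocity) term."""
    w = w0.copy()
    v = np.zeros_like(w)                            # velocity starts at zero
    for _ in range(num_iters):
        v = beta * v - alpha * grad_f(Xs, Ys, w)    # accumulate momentum from past steps
        w = w + v                                   # take the momentum step
    return w

With beta = 0, this reduces to ordinary gradient descent; larger beta gives past gradients more influence on the current step.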
Part 2: Momentum with SGD.
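A sketch combining the sequential-scan minibatching from the Background with the momentum update from Part 1 might look like the following; again this is illustrative, with the same hypothetical grad_f, and not necessarily the exact update the handout specifies.

import numpy as np

def sgd_with_momentum(Xs, Ys, grad_f, w0, alpha, beta, B, num_epochs):
    """Sketch of minibatched SGD with momentum, scanning the data in order each epoch."""
    n = len(Ys)                                     # assume B divides n evenly
    w = w0.copy()
    v = np.zeros_like(w)
    for _ in range(num_epochs):
        for i in range(0, n, B):
            g = grad_f(Xs[i:i + B], Ys[i:i + B], w) # minibatch gradient
            v = beta * v - alpha * g                # momentum update
            w = w + v
    return w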
Part 3: Adam.
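As a reference point, here is a sketch of the standard Adam update (Kingma & Ba, 2015) applied to sequential minibatches, using the commonly cited default hyperparameters; these defaults and the helper names are illustrative, not values taken from the assignment.

import numpy as np

def adam(Xs, Ys, grad_f, w0, alpha, B, num_epochs,
         beta1=0.9, beta2=0.999, eps=1e-8):
    """Sketch of Adam with bias-corrected first and second moment estimates."""
    n = len(Ys)
    w = w0.copy()
    m = np.zeros_like(w)        # first moment (running mean of gradients)
    v = np.zeros_like(w)        # second moment (running mean of squared gradients)
    t = 0
    for _ in range(num_epochs):
        for i in range(0, n, B):
            t += 1
            g = grad_f(Xs[i:i + B], Ys[i:i + B], w)
            m = beta1 * m + (1 - beta1) * g
            v = beta2 * v + (1 - beta2) * g * g
            m_hat = m / (1 - beta1 ** t)            # bias correction
            v_hat = v / (1 - beta2 ** t)
            w = w - alpha * m_hat / (np.sqrt(v_hat) + eps)  # adaptive per-coordinate step
    return w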
What to submit:
Setup: Run
pip3 install -r requirements.txt
to install the required Python packages.