In this project, we will visualize and manipulate AlexNet [1].
For this project, we are using Caffe, an open-source deep learning library that has an efficient implementation of AlexNet. Other similar libraries include Torch, Theano, and TensorFlow.
Some parts of this assignment were adapted from, or inspired by, a Stanford cs231n assignment. The similar parts have been heavily modified and ported to Caffe.
The assignment is contained in an IPython Notebook; see below.
[1] Krizhevsky et al., "ImageNet Classification with Deep Convolutional Neural Networks", NIPS 2012.
There is a written part that each person must complete separately. All submissions should be in PDF format and include your name and netid.
Download the written part here.
The coding part will be completed in teams of 2.
There are many pieces to the assignment, but each piece is just a few lines of code.
Unit tests: to help verify the correctness of your solutions, you can run pytest in a shell from the same directory as the notebook.
Running out of memory: by default, the VM has enough memory to hold exactly one AlexNet, which is sufficient to complete the assignment. If you run the unit tests while the notebook is open, you will need to either close the notebook server or give the VM more memory (2 GB or more).
This section contains images to illustrate what kinds of qualitative results we expect.
Saliency: we expect that pixels related to the class have a higher value. Left: Input image. Right: saliency.
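This page does not spell out how the saliency value is computed. One common choice, assumed here, is to take the gradient of the class score with respect to the input image and keep, for each pixel, the largest absolute value across the color channels. The sketch below applies that reduction to a made-up gradient stored as nested lists; `saliency_map` and `toy_grad` are illustrative names, not part of the assignment code.

```python
def saliency_map(grad):
    """Collapse a (channels, height, width) gradient into a (height, width) map.

    Each output pixel is the largest absolute gradient across the channels.
    """
    channels = len(grad)
    height = len(grad[0])
    width = len(grad[0][0])
    return [
        [max(abs(grad[c][y][x]) for c in range(channels)) for x in range(width)]
        for y in range(height)
    ]


# Hypothetical gradient of a class score w.r.t. a 3-channel, 2x2 image.
toy_grad = [
    [[0.1, -0.5], [0.0, 0.2]],   # channel 0
    [[-0.3, 0.4], [0.0, -0.9]],  # channel 1
    [[0.2, 0.1], [0.0, 0.3]],    # channel 2
]
toy_saliency = saliency_map(toy_grad)
```

If the gradient is held in a NumPy array of shape (channels, height, width), the same reduction is `np.abs(grad).max(axis=0)`.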
Fooling image
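As a rough illustration of how a fooling image can be produced, the sketch below runs gradient ascent on the target class score until the prediction flips. It swaps AlexNet for a tiny linear classifier so that it runs without Caffe; the weights, the step size, and the helper names are all hypothetical. The last helper mirrors the kind of difference image shown in this section, with the gray level of 128 assumed for 8-bit pixels.

```python
def scores(weights, image):
    """Class scores of a toy linear classifier: one dot product per class."""
    return [sum(w * p for w, p in zip(row, image)) for row in weights]


def predict(weights, image):
    """Index of the highest-scoring class."""
    s = scores(weights, image)
    return s.index(max(s))


def make_fooling_image(weights, image, target, step=0.5, max_iters=100):
    """Nudge a copy of `image` until the toy classifier predicts `target`."""
    fooled = list(image)
    for _ in range(max_iters):
        if predict(weights, fooled) == target:
            break
        # For a linear model, d(score[target]) / d(image) is weights[target].
        grad = weights[target]
        fooled = [p + step * g for p, g in zip(fooled, grad)]
    return fooled


def magnified_difference(original, fooled, factor=5.0, gray=128.0):
    """Scale the pixel difference and re-center zero at a mid-gray value."""
    return [gray + factor * (f - o) for o, f in zip(original, fooled)]


toy_weights = [[1.0, 0.0, 0.5, 0.0], [0.0, 1.0, 0.0, 0.5]]
toy_image = [10.0, 8.0, 4.0, 6.0]  # flattened 4-pixel "image", class 0
fooled_image = make_fooling_image(toy_weights, toy_image, target=1)
difference = magnified_difference(toy_image, fooled_image)
```

With a real network the gradient would come from a backward pass rather than from a fixed weight row, but the loop has the same shape.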
These images look nearly identical, yet AlexNet classifies each image in the middle as "snail". If you look very closely, you can notice some tiny visual differences. The right image shows the difference magnified by 5x (with 0 re-centered at gray).
Class visualization
AlexNet assigns each of these images to its target class with 100% confidence. If you run the optimization for longer or adjust the hyperparameters, you may see a more salient result.
Many classes don't give very good results; here we show some of the better classes.
strawberry | throne | mushroom |
tarantula | flamingo | king penguin |
goblet | sax | llama |
cloak | moped | indigo bunting |
bulbul | squirrel monkey | cock |
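Class visualizations like the ones above are usually obtained by starting from noise and running gradient ascent on the class score, with a regularizer that keeps pixel values from growing without bound. The sketch below shows that loop on a toy linear classifier instead of AlexNet; the L2 penalty, the hyperparameter values, and all names are assumptions made for illustration.

```python
import random


def class_scores(weights, image):
    """Scores of a toy linear classifier: one dot product per class."""
    return [sum(w * p for w, p in zip(row, image)) for row in weights]


def visualize_class(weights, target, l2_reg=0.5, lr=0.5, iters=50, seed=0):
    """Gradient ascent on score[target] - l2_reg * ||image||^2, from noise."""
    rng = random.Random(seed)
    image = [rng.gauss(0.0, 1.0) for _ in weights[target]]
    for _ in range(iters):
        # Linear model: d(score[target]) / d(image) = weights[target];
        # the L2 penalty contributes -2 * l2_reg * image.
        grad = [w - 2.0 * l2_reg * p for w, p in zip(weights[target], image)]
        image = [p + lr * g for p, g in zip(image, grad)]
    return image


toy_weights = [[1.0, 0.0, -1.0], [0.0, 2.0, 0.0]]
class_image = visualize_class(toy_weights, target=1)
class_image_scores = class_scores(toy_weights, class_image)
```

For this toy objective the iterates converge to `weights[target] / (2 * l2_reg)`, which makes the effect of the regularization strength easy to see: a larger `l2_reg` shrinks the result, a smaller one lets it grow.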
Feature inversion
Note that we could probably obtain higher-quality reconstructions if we ran the optimization for longer or added a better regularizer. To keep things simple, your images only need to be mostly converged.
original | conv1 | conv2 |
conv3 | conv4 | conv5 |
fc6 | fc7 | fc8 |
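Feature inversion can be framed as finding an image whose features at a chosen layer match those of the original, typically by gradient descent on the squared feature difference plus a regularizer. The sketch below assumes that formulation and replaces the network layer with a small linear map so that it runs on its own; `toy_layer`, the L2 regularizer, and the hyperparameters are hypothetical choices.

```python
def features(matrix, image):
    """Toy 'layer': a linear map from pixels to feature values."""
    return [sum(a * p for a, p in zip(row, image)) for row in matrix]


def invert_features(matrix, target, l2_reg=1e-3, lr=0.05, iters=500):
    """Minimize ||features(x) - target||^2 + l2_reg * ||x||^2 over x."""
    n_pixels = len(matrix[0])
    image = [0.0] * n_pixels
    for _ in range(iters):
        residual = [f - t for f, t in zip(features(matrix, image), target)]
        # Gradient of the data term is 2 * A^T * residual;
        # the regularizer adds 2 * l2_reg * x.
        grad = [
            2.0 * sum(matrix[k][i] * residual[k] for k in range(len(matrix)))
            + 2.0 * l2_reg * image[i]
            for i in range(n_pixels)
        ]
        image = [p - lr * g for p, g in zip(image, grad)]
    return image


toy_layer = [[1.0, 0.0], [0.0, 2.0], [1.0, 1.0]]
original_image = [1.0, -2.0]
target_features = features(toy_layer, original_image)
reconstruction = invert_features(toy_layer, target_features)
```

Here the toy layer loses no information, so the reconstruction lands very close to the original. Deeper layers of a real network discard detail, which is why reconstructions tend to degrade from conv1 toward fc8.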
Last updated 28 April 2016