GR5242 HW01 Problem 3: Early stopping and basic "deep dream"
Fill in your code below each ############# YOUR CODE HERE ############# marker, and write your answers to the reflection questions in the text boxes marked Your Answer Here.
In this exercise, you will explore some basic methods for preventing overfitting (early stopping and dropout) and explore model introspection by a basic version of Alex Mordvintsev's famous "deep dream" experiment.
Early stopping and dropout
If you train a model with many parameters (such as a neural network) on a relatively simple task, and your training dataset is relatively small, you are at risk of overfitting to the training dataset. Overfitting can lead to worse performance on data that wasn't used during training, such as test datasets or the real new datapoints that your model will be applied to in production.
One way to avoid overfitting is called "early stopping": split your training dataset into two pieces, which we'll call the "training" and "validation" splits. Then, train your model on the training split until the loss on the validation split stops going down. At this point, we have some evidence that the model is starting to memorize the training set, since its performance on the validation set is no longer improving. This method is not foolproof, but it's easy to use and gives one answer to the question "When should I stop training?", which you would have to answer anyway.
Another way to avoid overfitting is called "dropout": during training, neurons are randomly turned off, meaning their outputs are set to zero with some fixed probability. This makes it harder for the model to memorize specific inputs. The neural network architecture below (defined for you) makes use of this.
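The stopping rule itself can be sketched independently of any model. The snippet below is a minimal illustration, not the required solution: the function name `early_stopping_epoch` and the `patience` and `min_delta` parameters are assumptions introduced here, and the loss values are invented purely to exercise the rule.

```python
def early_stopping_epoch(val_losses, patience=1, min_delta=0.0):
    """Return the index of the epoch at which training would stop.

    Training stops once the validation loss has failed to improve on the
    best value seen so far for `patience` consecutive epochs. If that never
    happens, the index of the last epoch is returned.
    """
    best_loss = float("inf")
    epochs_without_improvement = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss - min_delta:
            # Validation loss improved: remember it and reset the counter.
            best_loss = loss
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                return epoch
    return len(val_losses) - 1


# Invented validation losses: they fall for three epochs, then rise.
illustrative_losses = [0.90, 0.70, 0.60, 0.65, 0.70]
stop_epoch = early_stopping_epoch(illustrative_losses, patience=1)
print("stop at epoch index:", stop_epoch)
```

With `patience=1` the rule stops at the first epoch whose validation loss fails to improve (index 3 here). A larger `patience` tolerates noisy validation curves at the cost of a few extra epochs of training.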
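To see what "turned off" means in practice, here is a small sketch using `nn.Dropout` on a vector of ones. The drop probability 0.2 and the vector length are arbitrary choices for the illustration. In training mode, PyTorch zeroes each element with probability `p` and rescales the survivors by `1 / (1 - p)`; in evaluation mode the layer passes its input through unchanged.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

dropout = nn.Dropout(p=0.2)
x = torch.ones(10000)

# Training mode: elements are randomly zeroed, survivors are rescaled.
dropout.train()
y_train = dropout(x)

# Evaluation mode: dropout is disabled and the input is returned as-is.
dropout.eval()
y_eval = dropout(x)

fraction_zeroed = (y_train == 0).float().mean().item()
print("fraction zeroed in training mode:", fraction_zeroed)
print("distinct values in training mode:", y_train.unique().tolist())
print("unchanged in eval mode:", torch.equal(y_eval, x))
```

This is why a model containing dropout layers should be switched with `model.train()` before training steps and `model.eval()` before computing validation or test losses.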
Deep dream
The goal of "deep dream" is to produce an image which produces strong activity in a unit in your neural network. This will help us understand what that unit is doing, since we can see what kinds of data it responds strongly to. We will perform a very simple version of the original deep dream experiment: find the input image which maximizes the activity of a neuron in a neural network trained to classify the MNIST digits. This will allow us to get some idea of what the network thinks a 0 is, or a 4 is, et cetera.
Setup cells
In [ ]:
# Imports
import numpy as np
import matplotlib.pyplot as plt
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
from torchvision import datasets, transforms
from torch.utils.data import Subset
from torch.optim.lr_scheduler import StepLR
from torch.utils.data import DataLoader, Dataset
torch.__version__

In [ ]:
# Load MNIST using torchvision
transform = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize((0.1307,), (0.3081,))
])
MNIST_train = datasets.MNIST('../data', train=True, download=True, transform=transform)
MNIST_test = datasets.MNIST('../data', train=False, transform=transform)

In [ ]:
print('training samples:', len(MNIST_train))
print('testing samples:', len(MNIST_test))

# Access a specific data point (e.g., the data point at index 10)
index = 10  # Change this to the index you want to access
sample_image, label = MNIST_train[index]

# Display the label and other information
print("MNIST raw data")
print(f"Data at index {index}:")
print(f"Label: {label}")
print(f"Image shape: {sample_image.shape}")
Now let's show some example images
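The plotting cell is not reproduced here, so the following is only a sketch of one way to display a few examples. The helper name `show_examples` and its `n_examples` parameter are hypothetical. It assumes the dataset yields `(image, label)` pairs with images shaped `(1, 28, 28)`, as printed above.

```python
import matplotlib.pyplot as plt
import torch


def show_examples(dataset, n_examples=8):
    """Plot the first `n_examples` images of a dataset of (image, label) pairs."""
    fig, axes = plt.subplots(1, n_examples, figsize=(1.5 * n_examples, 2),
                             squeeze=False)
    for i, ax in enumerate(axes[0]):
        image, label = dataset[i]
        # Drop the channel dimension: (1, 28, 28) -> (28, 28).
        ax.imshow(image.squeeze().numpy(), cmap="gray")
        ax.set_title(str(int(label)))
        ax.axis("off")
    return fig
```

In the notebook this could be called as `show_examples(MNIST_train)`. Because the images were normalized by the transform, the pixel values are not in [0, 1], but `imshow` rescales them automatically for a single-channel image.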
Question 1: Training and validation split
Using numpy and pytorch, randomly split the data into a training split and a validation split. The training split should include 2/3 of the original data, and the validation split should include the remaining 1/3.
(Hint): you can try creating random lists of indices that go into training and validation, then use torch.utils.data.Subset() (imported as Subset ) to split the torchvision dataset.
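One possible reading of the hint, shown on a small stand-in dataset so that it runs without downloading MNIST. The helper name `split_train_val`, its `seed` parameter, and the toy `TensorDataset` are assumptions made for the illustration; this is a sketch rather than the expected answer.

```python
import numpy as np
import torch
from torch.utils.data import Subset, TensorDataset


def split_train_val(dataset, train_fraction=2 / 3, seed=0):
    """Randomly partition a dataset into training and validation Subsets."""
    rng = np.random.default_rng(seed)
    # A random permutation of all indices guarantees the splits are disjoint.
    indices = rng.permutation(len(dataset))
    n_train = int(round(train_fraction * len(dataset)))
    train_indices = indices[:n_train].tolist()
    val_indices = indices[n_train:].tolist()
    return Subset(dataset, train_indices), Subset(dataset, val_indices)


# Toy stand-in with the same item structure as MNIST: (image, label) pairs.
toy_dataset = TensorDataset(torch.randn(30, 1, 28, 28),
                            torch.randint(0, 10, (30,)))
train_data, val_data = split_train_val(toy_dataset)
print(len(train_data), len(val_data))
```

Applied to the notebook's data, the call would be `split_train_val(MNIST_train)`. Permuting the indices once and slicing avoids any overlap between the two splits, which sampling the two index lists independently would not guarantee.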
In [ ]:
### QUESTION
# Please fill in the following cell by splitting the datasets
# `x_train_and_val` and `y_train_and_val`. You must name your
# results using the variable names below.

# Create a train/validation split
############# YOUR CODE HERE ###############

In [ ]:
# Define a simple MLP network in PyTorch
class MLP(nn.Module):
    def __init__(self):
        super(MLP, self).__init__()
        self.flatten = nn.Flatten()
        self.fc1 = nn.Linear(28*28, 256)
        self.fc2 = nn.Linear(256, 256)
        self.fc3 = nn.Linear(256, 256)
        self.fc4 = nn.Linear(256, 10)
        self.dropout = nn.Dropout(0.2)

    def forward(self, x):
        x = self.flatten(x)
        x = F.relu(self.fc1(x))
        x = self.dropout(x)
        x = F.relu(self.fc2(x))
        x = self.dropout(x)
        x = F.relu(self.fc3(x))
        x = self.dropout(x)
        x = self.fc4(x)
        return x

# Set up loss and optimization
model = MLP()
criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(model.parameters())

# train_data, val_data, and test_data must already be defined in the cell above
train_dataloader = DataLoader(train_data, batch_size=32, shuffle=True)
test_dataloader = DataLoader(test_data, batch_size=32, shuffle=True)
val_dataloader = DataLoader(val_data, batch_size=32, shuffle=True)
Question 2: Early stopping
Write a for loop which alternates between training the model on 1 pass through the training split (also known as 1 epoch of training) and checking whether we should stop early by measuring the validation loss and seeing if it is still decreasing.
Please fill in the code to perform the validation step, including logic for early stopping. The code should have a similar structure to how you might optimize on training data or evaluate on testing data.
Please also print your validation loss at each epoch.
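The overall shape of such a loop might look like the sketch below. It is one possible structure under stated assumptions, not the reference solution: the helper names `average_loss` and `train_with_early_stopping`, the `max_epochs` and `patience` parameters, and the small random dataset and model used to run it are all introduced here for illustration.

```python
import torch
import torch.nn as nn
import torch.optim as optim
from torch.utils.data import DataLoader, TensorDataset


def average_loss(model, criterion, dataloader):
    """Mean per-sample loss over a dataloader, with dropout off and no gradients."""
    model.eval()
    total_loss, n_samples = 0.0, 0
    with torch.no_grad():
        for inputs, targets in dataloader:
            loss = criterion(model(inputs), targets)
            total_loss += loss.item() * len(targets)
            n_samples += len(targets)
    return total_loss / n_samples


def train_with_early_stopping(model, criterion, optimizer,
                              train_dataloader, val_dataloader,
                              max_epochs=20, patience=1):
    """Alternate one training epoch with one validation check."""
    best_val_loss = float("inf")
    epochs_without_improvement = 0
    val_history = []
    for epoch in range(max_epochs):
        model.train()  # enable dropout for the training pass
        for inputs, targets in train_dataloader:
            optimizer.zero_grad()
            loss = criterion(model(inputs), targets)
            loss.backward()
            optimizer.step()

        val_loss = average_loss(model, criterion, val_dataloader)
        val_history.append(val_loss)
        print(f"epoch {epoch}: validation loss {val_loss:.4f}")

        if val_loss < best_val_loss:
            best_val_loss = val_loss
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                break  # validation loss stopped decreasing
    return val_history


# Toy stand-in data and model so the sketch runs on its own.
torch.manual_seed(0)
toy_x, toy_y = torch.randn(96, 20), torch.randint(0, 3, (96,))
toy_train = DataLoader(TensorDataset(toy_x[:64], toy_y[:64]), batch_size=16, shuffle=True)
toy_val = DataLoader(TensorDataset(toy_x[64:], toy_y[64:]), batch_size=16)
toy_model = nn.Sequential(nn.Linear(20, 32), nn.ReLU(), nn.Dropout(0.2), nn.Linear(32, 3))
history = train_with_early_stopping(toy_model, nn.CrossEntropyLoss(),
                                    optim.Adam(toy_model.parameters()),
                                    toy_train, toy_val, max_epochs=10)
```

In the notebook, the same function would be called with `model`, `criterion`, `optimizer`, `train_dataloader`, and `val_dataloader`, and `average_loss` could be reused on `test_dataloader` afterwards. The validation pass mirrors the training pass except that it runs under `model.eval()` and `torch.no_grad()` and takes no optimizer steps.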
As an extra check, we can look at the loss on the test dataset.
Question 3: Test data written answer question
Would it have been good practice to use the test dataset instead of the validation split to perform early stopping above? Why or why not?
Your Answer Here
No, because we could then no longer use the test dataset for an actual evaluation of the model. Our empirical risk estimate would become biased because the test data influenced the training loop (it decided when to stop), even though we did not perform optimization steps with it.
Question 4: Basic "deep dream"
(4.a) Implement the basic "deep dream"
Our goal in this part of the problem is to find input images which maximally activate the output neuron corresponding to a particular class (for MNIST, that means corresponding to a particular digit). Let's pick the target class 0.
We'll do this the same way we trained our neural net: start from random images, and use stochastic gradient descent on some cost function to improve the images.
Below, we have included code for randomly initializing the images, and for using the Adam optimizer to minimize a cost function. You are asked to fill in the cost function so that minimizing the cost function leads to maximizing the value of the "0" neuron in the output layer when the neural net is given dream_images as input. Make sure to use dream_images in your definition of cost_function.
(Hint): What value of the model output might you want to optimize if you want an image that is most likely labeled 0? Remember to sum over the batch of data as well, and that you will be minimizing the cost. Remember as well that our optimizer is taking the dream images as parameters, not the model parameters.
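One way to read the hint is sketched below on an untrained stand-in network, so it runs without the notebook's trained model. Everything outside the names `dream_images`, `cost_function`, and the Adam optimizer is an assumption for illustration: the stand-in `model`, the batch size of 4, the learning rate, and the number of steps. The cost is the negative of the class-0 output, summed over the batch, so minimizing it maximizes that output.

```python
import torch
import torch.nn as nn
import torch.optim as optim

torch.manual_seed(0)

# Stand-in for the trained classifier (assumption: same input/output shapes).
model = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 64), nn.ReLU(),
                      nn.Linear(64, 10))
model.eval()  # with the real MLP, this disables dropout
for param in model.parameters():
    param.requires_grad_(False)  # the model stays fixed; only the images change

target_class = 0
dream_images = torch.randn(4, 1, 28, 28, requires_grad=True)
# The optimizer's parameters are the images, not the model weights.
optimizer = optim.Adam([dream_images], lr=0.05)


def cost_function(images):
    """Negative class-0 output, summed over the batch."""
    return -model(images)[:, target_class].sum()


initial_activation = model(dream_images)[:, target_class].mean().item()
for step in range(100):
    optimizer.zero_grad()
    cost = cost_function(dream_images)
    cost.backward()
    optimizer.step()
final_activation = model(dream_images)[:, target_class].mean().item()

print("mean class-0 output before:", initial_activation)
print("mean class-0 output after: ", final_activation)
```

Using the raw output (logit) is one choice. Another would be the log-softmax probability of class 0, which also penalizes large outputs for the other digits.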
(4.b) After running the cell above to look at the result, do you have any reactions to what appears? In general, what can this basic version of deep dream tell us about our model?