FashionMNIST with PyTorch & fastAI
[Part 2] Solving FashionMNIST for Google Code-In for Julia
Task Statement:
Fashion MNIST is a good way to introduce the concept of autoencoders and classification tasks. Write an efficient Fashion MNIST implementation using Flux and benchmark it against equivalent implementations in TensorFlow and PyTorch. A good extension might be to have it run smoothly on GPUs too. The FashionMNIST dataset can be easily obtained and unpackaged into ready-to-use Julia data types with the help of MLDatasets.jl. A working example of using Flux for classification of handwritten digits from the MNIST dataset can be found here, for students who are already familiar with basic image detection techniques and want to hit the ground running. Flux's documentation can be found here.
I am going to use a pretrained CNN called resnet34. (That's about the only thing I took away from the first three fastAI lectures: use this for image-classification tasks.) I'm hoping to understand more of the theory by reading this article.
But honestly, I don't know the complete theory behind a CNN myself. I'm still trying to learn it from the lectures in the Deep Learning Specialisation. I do completely know how to build simple multilayer perceptrons though, and the theory behind them too. xD So I'll also try to make some of them on this data-set; a rough sketch of what that might look like is below.
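As a placeholder for that idea, here is a minimal sketch of a plain-PyTorch multilayer perceptron for FashionMNIST. The layer sizes, optimiser, and learning rate are my own assumptions, not part of the task or of the notebook below.
import torch
import torch.nn as nn
#A simple MLP sketch (my own assumptions): 784 flattened pixels -> two hidden layers -> 10 clothing classes.
mlp = nn.Sequential(
    nn.Linear(28 * 28, 256),
    nn.ReLU(),
    nn.Linear(256, 64),
    nn.ReLU(),
    nn.Linear(64, 10),
)
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(mlp.parameters(), lr=1e-3)
#One training step on a dummy batch, just to show the shape of the training loop.
images = torch.randn(32, 28 * 28)          #a batch of 32 flattened 28x28 images
targets = torch.randint(0, 10, (32,))      #random labels standing in for real ones
loss = criterion(mlp(images), targets)
optimizer.zero_grad()
loss.backward()
optimizer.step()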
Also, the fastAI course follows a top-down approach to things, so yeah, some concepts remain unclear, but with reference to some of the image classification tasks we did in lectures 1 and 2 of the course, I was able to make this!
Julia code will be submitted separately.
P.S: Special thanks to my mentor Kartikey Gupta for all his support and his implementation in Keras which provided me a path to write the notebook.
# This Python 3 environment comes with many helpful analytics libraries installed
# It is defined by the kaggle/python docker image: https://github.com/kaggle/docker-python
# For example, here's several helpful packages to load in
import numpy as np # linear algebra
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)
# Input data files are available in the "../input/" directory.
# For example, running this (by clicking run or pressing Shift+Enter) will list all files under the input directory
import os
for dirname, _, filenames in os.walk('/kaggle/input'):
    for filename in filenames:
        print(os.path.join(dirname, filename))
# Any results you write to the current directory are saved as output.
import pandas as pd
fmnist_test = pd.read_csv("../input/fashionmnist/fashion-mnist_test.csv")
fmnist_train = pd.read_csv("../input/fashionmnist/fashion-mnist_train.csv")
%reload_ext autoreload
%autoreload 2
%matplotlib inline
#autoreload reloads modules automatically before executing code, so changes to imported libraries are picked up
#every time a cell is run; %matplotlib inline renders plots directly in the notebook.
from fastai.imports import *
from fastai.transforms import *
from fastai.conv_learner import *
from fastai.model import *
from fastai.dataset import *
from fastai.sgdr import *
from fastai.plots import *
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import os
torch.cuda.is_available()
torch.backends.cudnn.enabled
print(os.listdir('../input/'))
PATH = "../input/"
TMP_PATH = "/tmp/tmp"
MODEL_PATH = "/tmp/model/"
arch = resnet34
sz = 14
#collapse
fmnist_test = pd.read_csv("../input/fashionmnist/fashion-mnist_test.csv")
fmnist_train = pd.read_csv("../input/fashionmnist/fashion-mnist_train.csv")
#Shape of the data-sets.
print(f'fmnist_train shape : {fmnist_train.shape}') #60,000 rows and 785 columns
print(f'fmnist_test shape : {fmnist_test.shape}') #10,000 rows and 785 columns
#Seeing some of the data distribution.
fmnist_train.head(7)
- As we can see, the first column holds the label of the image; from the official repository of the data-set, the labels are:
Labels
Each training and test example is assigned to one of the following labels:
Label | Description |
---|---|
0 | T-shirt/top |
1 | Trouser |
2 | Pullover |
3 | Dress |
4 | Coat |
5 | Sandal |
6 | Shirt |
7 | Sneaker |
8 | Bag |
9 | Ankle boot |
#I'll be now splitting 20% of the training data into validation data-set.
fmnist_valid = fmnist_train.sample(frac=0.2)
print(fmnist_valid.shape, '| Shape of Validation Set')
#Dropping the sampled validation rows from the training set so the two sets don't overlap.
fmnist_train = fmnist_train.drop(fmnist_valid.index)
print(fmnist_train.shape, '| Shape of Training Set')
#Defining labels to predict
labels = ['T-shirt/top', 'Trouser', 'Pullover', 'Dress', 'Coat', 'Sandal', 'Shirt', 'Sneaker', 'Bag', 'Ankle boot']
#Getting the images as X (reshaping the images into 28x28) and labels (flattened) as y from the data-sets. (Changing the dimensions)
def split(data):
    '''Returns a tuple (X, y) where
    X : image inputs reshaped to (samples, height, width), i.e. (n, 28, 28); the colour channel is added later
    y : flattened (one-D) label vector
    '''
    y = data['label'].values.flatten()
    X = data.drop('label', axis=1).values
    X = X.reshape(X.shape[0], 28, 28)
    return (X, y)
X_train, y_train = split(fmnist_train)
X_valid, y_valid = split(fmnist_valid)
X_test, y_test = split(fmnist_test)
print("Training Set Shape")
print(X_train.shape,'\n',y_train.shape)
print("Validation Set Shape")
print(X_valid.shape,'\n',y_valid.shape)
print("Test Set Shape")
print(X_test.shape,'\n',y_test.shape)
Some image processing tasks
- Normalising the image data (learnt here): scaling the individual pixel values from 0-255 down to 0-1 so training behaves better numerically.
- Adding the missing colour channels: the ImageNet-pretrained resnet34 expects 3-channel RGB input, so the single grayscale channel is repeated three times (I saw this in many models on the same data-set and will dig deeper to understand it properly).
X_train = X_train.astype('float64') / 255
X_valid = X_valid.astype('float64') / 255
X_test = X_test.astype('float64') / 255
X_train = np.stack((X_train,) * 3, axis=-1)
X_valid = np.stack((X_valid,) * 3, axis=-1)
X_test = np.stack((X_test,) * 3, axis=-1)
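A quick sanity check (my own addition, not part of the original pipeline) that the stacking did what we expect: each image should now be (28, 28, 3) with three identical channels.
#Sanity check (my addition): shapes now end in a channel dimension of 3,
#and the three channels of any image are exact copies of each other.
print(X_train.shape, X_valid.shape, X_test.shape)
print(np.allclose(X_train[0, :, :, 0], X_train[0, :, :, 2]))   #True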
index = 42 #THE ANSWER TO LIFE, THE UNIVERSE AND EVERYTHING is a Pullover.
plt.imshow(X_train[index,], cmap='gray')
plt.title(labels[y_train[index]])
#Code inspiration from Kartikey's Keras implementation of the same
plt.figure(figsize=(10, 10))
for i in range(25):
    plt.subplot(5, 5, i+1)
    plt.xticks([])
    plt.yticks([])
    plt.grid(False)
    plt.imshow(X_train[i], cmap='gray')
    plt.title(labels[y_train[i]])
plt.show()
data = ImageClassifierData.from_arrays(PATH,
                                       trn=(X_train, y_train),
                                       val=(X_valid, y_valid),
                                       classes=[0, 1, 2, 3, 4, 5, 6, 7, 8, 9],
                                       tfms=tfms_from_model(arch, 28),
                                       test=X_test)
learn = ConvLearner.pretrained(arch, data, precompute=True, tmp_name=TMP_PATH, models_name=MODEL_PATH)
learn.fit(7e-3, 3, cycle_len=1, cycle_mult=2)
We get around 85.55% accuracy, which is good and not inflated like the 99% figures usually reported on the original MNIST data-set. From what I've scavenged from the web, the out-of-the-box high accuracy of the fastai library can be explained by:
- TTA involves taking a series of different versions of the original image (for example cropping different areas, or changing the zoom) and passing them through the model. The average output is then calculated for the different versions and this is given as the final output score for the image (a rough sketch of this averaging step follows this list).
- Dropout combats overfitting and so would have proved crucial in winning on a relatively small dataset such as CIFAR-10. Dropout is implemented automatically by fastai when creating a learn object, though it can be adjusted via the ps parameter (not used here though).
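Here is a minimal sketch of what the averaging step in TTA amounts to, assuming we already have per-augmentation class probabilities; it only illustrates the idea and is not how fastai implements TTA internally.
#Illustrative sketch only (my own, not fastai internals): average predictions over augmented copies.
#aug_probs is assumed to hold class probabilities with shape (n_augmentations, n_images, n_classes).
aug_probs = np.random.rand(5, 4, 10)
aug_probs /= aug_probs.sum(axis=-1, keepdims=True)   #normalise each row into a probability distribution
tta_probs = aug_probs.mean(axis=0)                   #average over the augmented versions
tta_labels = tta_probs.argmax(axis=1)                #final predicted class per image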
log_preds, _ = learn.TTA(is_test=True)   #log-probabilities for the test set, one set per augmented version
probs = np.exp(log_preds)                #back to probabilities
probs = np.mean(probs, 0)                #average over the augmented versions
accuracy_np(probs, y_test)
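For reference, accuracy_np boils down to an argmax comparison; a rough numpy equivalent (my paraphrase, not the library source) is:
#Rough numpy equivalent of the accuracy computation (my paraphrase, not the fastai source):
#take the most probable class per image and compare it against the true labels.
manual_accuracy = (np.argmax(probs, axis=1) == y_test).mean()
print(manual_accuracy)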
-PseudoCodeNerd