This work is a snippet reconstructed from coursework for 95-891 Introduction to Artificial Intelligence at Carnegie Mellon University.

<aside> ⚠️

This work was NOT built from the ground up. The partially prepared ‘skeleton code’ was provided by the Teaching Team of 95-891.

</aside>

import matplotlib.pyplot as plt
# import intel_npu_acceleration_library
from collections import OrderedDict

from copy import deepcopy
import torch
from torch import nn
from torch import optim
from torchvision import datasets, transforms, models

from sklearn.metrics import confusion_matrix
import seaborn as sns
import numpy as np
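
Since the NPU acceleration import above is commented out, training presumably falls back to a standard device check; a minimal sketch of the usual pattern (the device variable is my addition, not part of the skeleton):

# Use a CUDA GPU when available; otherwise run on the CPU.
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')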

# download the data and unzip it. You should have three sets of data
data_dir = 'flowers'
train_dir = data_dir + '/train'
valid_dir = data_dir + '/valid'
test_dir = data_dir + '/test'
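
As a quick sanity check before building the datasets, one can confirm that all three directories were unzipped correctly; this small check is my addition, not part of the skeleton:

import os

# Fail early with a clear message if any expected folder is missing.
for d in (train_dir, valid_dir, test_dir):
    assert os.path.isdir(d), 'Missing data directory: ' + d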

from PIL import Image  # show one example image

Image.open("flowers/train/1/image_06734.jpg")

[Output: a sample flower image from flowers/train/1]

# Process with some data transformation, do not change
data_transforms = {
    'training': transforms.Compose([transforms.RandomResizedCrop(224),
                                    transforms.RandomHorizontalFlip(),
                                    transforms.RandomRotation(30),
                                    transforms.ToTensor(),
                                    transforms.Normalize([0.485, 0.456, 0.406],
                                                         [0.229, 0.224,
                                                          0.225])]),

    'validation': transforms.Compose([transforms.Resize(256),
                                      transforms.CenterCrop(224),
                                      transforms.ToTensor(),
                                      transforms.Normalize(
                                          [0.485, 0.456, 0.406],
                                          [0.229, 0.224, 0.225])]),

    'testing': transforms.Compose([transforms.Resize(256),
                                   transforms.CenterCrop(224),
                                   transforms.ToTensor(),
                                   transforms.Normalize([0.485, 0.456, 0.406],
                                                        [0.229, 0.224,
                                                         0.225])])
}
# This sets how many images are processed per batch during training/validation
batch_size = 256

image_datasets = {
    'training': datasets.ImageFolder(train_dir,
                                     transform=data_transforms['training']),
    'testing': datasets.ImageFolder(test_dir,
                                    transform=data_transforms['testing']),
    'validation': datasets.ImageFolder(valid_dir,
                                       transform=data_transforms['validation'])
}

dataloaders = {
    'training': torch.utils.data.DataLoader(image_datasets['training'],
                                            batch_size=batch_size,
                                            shuffle=True),
    'testing': torch.utils.data.DataLoader(image_datasets['testing'],
                                           batch_size=batch_size,
                                           shuffle=False),
    'validation': torch.utils.data.DataLoader(image_datasets['validation'],
                                              batch_size=batch_size,
                                              shuffle=True)
}

# get the length of each dataloader; with batch_size=256, you should have 26 batches of training samples, each with 256 images
training_len = len(dataloaders['training'])
test_len = len(dataloaders['testing'])
validation_len = len(dataloaders['validation'])
class_to_idx = image_datasets['training'].class_to_idx

print('We have ', training_len, 'batches of training images;', 'each with',
      batch_size, 'images')
print('We have ', validation_len, 'batches of validation images;', 'each with',
      batch_size, 'images')
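
Predictions later come back as numeric indices, so it helps to invert the class-to-index mapping up front; a minimal sketch (the name idx_to_class is my assumption, not from the skeleton):

# Map numeric indices back to the flower class (folder) names.
idx_to_class = {idx: cls for cls, idx in class_to_idx.items()}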

# Let us preview the size of a single batch
print('Single batch', next(iter(dataloaders['training']))[0].shape)
We have  26 batches of training images; each with 256 images
We have  4 batches of validation images; each with 256 images
Single batch torch.Size([256, 3, 224, 224])
  1. Flipping and rotating images makes the data ‘appear’ larger and more diverse, helping the model learn the relevant features instead of relying on specific visual patterns in the training set. The resulting model is more generalizable and resilient to slight appearance variations, as the sketch after this list illustrates.
  2. If we applied the same transformations to the test or validation sets, that benefit would be undermined: the model could be evaluated as better than it actually is. We need to make sure the model is evaluated on the flower itself, not on the augmented variations.
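
To make the first point concrete, here is a rough sketch that draws several random variants of the sample image opened earlier; the un-normalization step is my addition so the images display with natural colors:

# Each call to the training transform yields a different random
# crop/flip/rotation of the same flower.
img = Image.open("flowers/train/1/image_06734.jpg")
mean = torch.tensor([0.485, 0.456, 0.406]).view(3, 1, 1)
std = torch.tensor([0.229, 0.224, 0.225]).view(3, 1, 1)
fig, axes = plt.subplots(1, 4, figsize=(12, 3))
for ax in axes:
    augmented = data_transforms['training'](img)
    # Undo Normalize before plotting.
    ax.imshow((augmented * std + mean).permute(1, 2, 0).clamp(0, 1).numpy())
    ax.axis('off')
plt.show()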

I use an AlexNet-like convolutional neural network as the pretrained model.

https://bouzouitina-hamdi.medium.com/alexnet-imagenet-classification-with-deep-convolutional-neural-networks-d0210289746b
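
For reference, a minimal sketch of how such a pretrained AlexNet might be adapted here, assuming torchvision's models.alexnet with its default ImageNet weights; the classifier layout, layer sizes, and learning rate are my illustrative assumptions, not the skeleton's:

# Load pretrained AlexNet and freeze its convolutional feature extractor.
model = models.alexnet(weights=models.AlexNet_Weights.DEFAULT)
for param in model.parameters():
    param.requires_grad = False

# Replace the classifier head with one sized for our flower classes.
num_classes = len(class_to_idx)
model.classifier = nn.Sequential(OrderedDict([
    ('fc1', nn.Linear(9216, 4096)),   # 9216 = 256 * 6 * 6 flattened features
    ('relu1', nn.ReLU()),
    ('drop1', nn.Dropout(0.5)),
    ('fc2', nn.Linear(4096, num_classes)),
    ('output', nn.LogSoftmax(dim=1))
]))

# Only the new classifier's parameters are trained.
optimizer = optim.Adam(model.classifier.parameters(), lr=0.001)
criterion = nn.NLLLoss()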