Dog Breed Classification — Udacity Capstone

Christopher Murphy
8 min read · Mar 23, 2021


Photo by Hannah Lim on Unsplash

For my Udacity Data Scientist Nanodegree capstone, I chose to work on the dog breed classification project.

Project Overview

In this project, I have trained Convolutional Neural Networks (CNNs) to classify the breed of dog in an image! CNNs are a class of deep neural networks that are incredibly powerful at analyzing and learning patterns in visual data. However, CNNs can require large datasets and many compute hours to train. In order to reach an acceptable level of classification accuracy with a limited amount of training data, it can be incredibly useful to apply transfer learning during training.

Transfer learning is a process where you reuse many of the feature extractors learned on a previously solved problem as a starting point for training your model. With this method, we can take advantage of the fact that some models have been trained for thousands of GPU hours on millions of training images. As a result, it is possible to get higher accuracy with less training data and less training time.

Project goal

My goal is to create a dog breed classifier that correctly classifies at least 75% of the dogs in the test dataset. Once this test accuracy is achieved, I will then use the dog breed classifier in the following program.

The program takes an image as input, and completes two tasks. The first task completed by the program is to determine if a dog or a human is present in the image. If neither is present, the program will ask the user to enter a new photo that is clear and contains either a dog or human. If a dog is detected in the image, the program will predict which breed of dog is in the image. If a human is detected in the image, the program will determine what dog breed the human most closely resembles.

Metrics

The classification metric that I will be using to evaluate each model is the accuracy metric. To measure the accuracy of a model, I will use the model to predict the dog breed of all images in the test set, and then calculate the fraction of images the model correctly predicted.

Although the classes of the dataset are not perfectly balanced, accuracy is still a good metric to use in this case because of the large number of classes: if the model were to simply predict that every image is the most common dog breed in the train set, it would correctly classify less than one percent of the test images.
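As a concrete illustration of the metric, here is a minimal sketch of the calculation, where y_true and y_pred are made-up placeholder arrays of breed indices rather than results from the project:

import numpy as np

# accuracy = fraction of test images whose predicted breed matches the true breed
y_true = np.array([3, 17, 17, 52])
y_pred = np.array([3, 17, 90, 52])
accuracy = np.mean(y_true == y_pred)
print(f'Accuracy: {accuracy:.2%}')  # Accuracy: 75.00%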

Strategy to solve the problem:

  • Import and explore the dog and human dataset
  • Build a face detector using OpenCV
  • Build a dog detector using Resnet50 (a rough sketch of both detectors follows this list)
  • Create a CNN to classify dog breeds using Keras, and evaluate it using the accuracy achieved on the test set
  • Since the CNN above is unlikely to perform well enough on the test set, experiment with VGG-16 and Resnet50 as starting points for training the CNN, and see whether the desired accuracy can be reached
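The face and dog detectors in the list above could look roughly like the sketch below. It assumes OpenCV's pre-trained frontal-face Haar cascade (the file path is an assumption) and the fact that ImageNet classes 151–268 are all dog breeds; path_to_tensor is the image-loading helper defined later in this post.

import cv2
import numpy as np
from keras.applications.resnet50 import ResNet50, preprocess_input

face_cascade = cv2.CascadeClassifier('haarcascades/haarcascade_frontalface_alt.xml')
ResNet50_model = ResNet50(weights='imagenet')

def face_detector(img_path):
    # True if OpenCV's Haar cascade finds at least one human face
    img = cv2.imread(img_path)
    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
    return len(face_cascade.detectMultiScale(gray)) > 0

def dog_detector(img_path):
    # ImageNet classes 151-268 are all dog breeds, so a top prediction
    # in that range means the image most likely contains a dog
    img = preprocess_input(path_to_tensor(img_path))
    prediction = np.argmax(ResNet50_model.predict(img))
    return 151 <= prediction <= 268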

Putting it all together to build a program

After an acceptable accuracy is achieved, I will write an algorithm that takes in a file path to an image, determines whether a dog or a human is present, and, if either is present, predicts the breed of the dog or the breed the human most closely resembles. Finally, I will use this algorithm on some images from my computer and from online.

Import and explore the data

In order to train and test our dog breed classifier, Udacity was kind enough to provide me with many labeled images of dogs. By exploring the data, we see that there are 8300 labeled dog images, where each label is one of 133 dog breeds. In order to train a dog breed classifier, I have split the data into training, validation, and test datasets.
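A minimal sketch of one way to load such a split, assuming the images are organized into train/valid/test folders with one sub-folder per breed (the folder names and the use of scikit-learn's load_files are my assumptions):

import numpy as np
from sklearn.datasets import load_files
from keras.utils import to_categorical

def load_dataset(path):
    # load_files returns every file path plus an integer label per breed folder
    data = load_files(path)
    files = np.array(data['filenames'])
    targets = to_categorical(np.array(data['target']), 133)  # one-hot over 133 breeds
    return files, targets

train_files, train_targets = load_dataset('dogImages/train')
valid_files, valid_targets = load_dataset('dogImages/valid')
test_files, test_targets = load_dataset('dogImages/test')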

Before training, it was important to understand the distribution of dog breeds in the data. We can see below that the data is not perfectly balanced between breeds, but the lowest-represented breed still has more than 25 training images.
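One way to produce such a breed-count chart, assuming the one-hot train_targets array from the loading sketch above:

import numpy as np
import matplotlib.pyplot as plt

# sum the one-hot labels to get the number of training images per breed
breed_counts = np.sort(train_targets.sum(axis=0))
plt.figure(figsize=(12, 4))
plt.bar(range(len(breed_counts)), breed_counts)
plt.xlabel('Breed (sorted by frequency)')
plt.ylabel('Training images')
plt.title('Training images per dog breed')
plt.show()

print(f'Least represented breed: {int(breed_counts[0])} images')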

Data Visualization

Also before training, it was important to visualize some of the images to determine what type of environments the dogs were photographed in. If all of the dogs were photographed indoors, the model may not work so well on images that were taken outdoors.

Below we can see just two examples that give a good representation of what the overall dataset looks like. The dogs in this dataset were photographed in a wide variety of environments, which may make our model more difficult to train, but once trained it should work well in many environments.

Data Preprocessing

When using TensorFlow as the backend, Keras CNNs require a 4D array as input, with shape (number of samples, rows, columns, channels). First, I created a function that loads an image and resizes it to a square image that is 224×224 pixels. Next, the image is converted to an array, which is then expanded into a 4D tensor.

from keras.preprocessing import image
import numpy as np

def path_to_tensor(img_path):
    # loads RGB image as PIL.Image.Image type
    img = image.load_img(img_path, target_size=(224, 224))
    # convert PIL.Image.Image type to 3D tensor with shape (224, 224, 3)
    x = image.img_to_array(img)
    # convert 3D tensor to 4D tensor with shape (1, 224, 224, 3) and return it
    return np.expand_dims(x, axis=0)

Then, in order to speed up training time, I combined all of the individual 4D image tensors into a single 4D tensor containing every image, using the function below:

from tqdm import tqdm

def paths_to_tensor(img_paths):
    list_of_tensors = [path_to_tensor(img_path) for img_path in tqdm(img_paths)]
    return np.vstack(list_of_tensors)

Before we can use the Resnet50 model, we must apply some preprocessing to our images, which I did by using Keras's preprocess_input function.
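A minimal sketch of that step, assuming the ResNet50 variant of Keras's preprocess_input (which converts RGB to BGR and subtracts the ImageNet channel means):

from keras.applications.resnet50 import preprocess_input

# preprocess_input expects raw pixel values in the 0-255 range,
# so it is applied before any scaling of the tensors
resnet_inputs = preprocess_input(paths_to_tensor(train_files))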

There is one more preprocessing step, which isn’t explicitly required, but is common practice. Currently our images have pixel values between 0 and 255. By dividing each pixel value by 255, each pixel will be transformed into a value between 0 and 1.

train_tensors = paths_to_tensor(train_files).astype('float32')/255
valid_tensors = paths_to_tensor(valid_files).astype('float32')/255
test_tensors = paths_to_tensor(test_files).astype('float32')/255

Implementation

Creating a CNN to Classify Dog Breeds (from Scratch)

Using Keras, it’s pretty simple to build a multilayer CNN. I do so using the following code:

from keras.models import Sequential
from keras.layers import Conv2D, MaxPooling2D, GlobalAveragePooling2D, Dense

model = Sequential()
model.add(Conv2D(input_shape=(224, 224, 3), filters=16, kernel_size=2, activation='relu', name='conv2d_1'))
model.add(MaxPooling2D(pool_size=(2, 2), name='max_pooling2d_1'))
model.add(Conv2D(filters=32, kernel_size=2, activation='relu', name='conv2d_2'))
model.add(MaxPooling2D(pool_size=(2, 2), name='max_pooling2d_2'))
model.add(Conv2D(filters=64, kernel_size=2, activation='relu', name='conv2d_3'))
model.add(GlobalAveragePooling2D(name='global_average_pooling2d_1'))
# softmax output so the 133 units form a probability distribution over breeds
model.add(Dense(units=133, activation='softmax', name='dense_1'))

After building the model, we can summarize it with the following command:

model.summary()

After compiling the above model and fitting it using the train and validation data, I tested the model on our test images and saw that it predicted the dog breed correctly for only 0.96% of them.
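Compiling and fitting are not shown above, so here is a minimal sketch of how that step might look; the optimizer, epoch count, batch size, and checkpoint path are assumptions rather than the exact settings used:

from keras.callbacks import ModelCheckpoint

model.compile(optimizer='rmsprop', loss='categorical_crossentropy', metrics=['accuracy'])

# keep only the weights that perform best on the validation set
checkpointer = ModelCheckpoint(filepath='saved_models/weights.best.from_scratch.hdf5',
                               verbose=1, save_best_only=True)

model.fit(train_tensors, train_targets,
          validation_data=(valid_tensors, valid_targets),
          epochs=10, batch_size=20, callbacks=[checkpointer], verbose=1)

model.load_weights('saved_models/weights.best.from_scratch.hdf5')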

In order to get a much better accuracy, we will use transfer learning, and see how much of an improvement we can get.

Refinement

Creating a CNN to Classify Dog Breeds (using Transfer Learning)

To reduce training time without sacrificing too much accuracy, it is possible to train a CNN using transfer learning. As you can see in the previous step, our model accuracy was nowhere close to acceptable. In order to try to improve the accuracy, I first tried using the VGG-16 model as a starting point. By training this model on our train and validation data, the model was able to predict the dog breed correctly on 46% of the test images. Below is the model summary:

The accuracy increase is a huge improvement, but it is still pretty far from where we want to be. Next, I experimented with using Resnet50 as a starting point, and reached an accuracy level of 80% on the testing images, which I was happy with. Below is the model summary:
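A rough sketch of how such a transfer-learning model can be set up in Keras: the pre-trained Resnet50 (ImageNet weights, without its top layers) turns each image into a stack of feature maps, and only a small classification head is trained on top. The variable names and hyperparameters below are assumptions rather than my exact code:

from keras.applications.resnet50 import ResNet50, preprocess_input
from keras.models import Sequential
from keras.layers import GlobalAveragePooling2D, Dense

# bottleneck features: the output of Resnet50 (without its classifier) for each image
resnet_base = ResNet50(weights='imagenet', include_top=False)
train_resnet = resnet_base.predict(preprocess_input(paths_to_tensor(train_files)))
valid_resnet = resnet_base.predict(preprocess_input(paths_to_tensor(valid_files)))

# small trainable head: pool the feature maps and map them to 133 breed probabilities
Resnet50_model = Sequential()
Resnet50_model.add(GlobalAveragePooling2D(input_shape=train_resnet.shape[1:]))
Resnet50_model.add(Dense(133, activation='softmax'))

Resnet50_model.compile(loss='categorical_crossentropy', optimizer='rmsprop', metrics=['accuracy'])
Resnet50_model.fit(train_resnet, train_targets,
                   validation_data=(valid_resnet, valid_targets),
                   epochs=20, batch_size=20, verbose=1)

With the 2048 features Resnet50 produces per image, a head like this has 2048 × 133 + 133 = 272,517 trainable parameters, which matches the parameter count discussed in the evaluation section below.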

Putting it into a program

I built a function that takes an image path as input, and first determines if a dog or a human is present in the image. If neither is present, the program asks the user to enter a new image. If a dog is present, the program predicts what breed of dog is in the image using the Resnet50 transfer learning model. If a human is present, the program determines what breed of dog the human most closely resembles, using the Resnet50 transfer learning model. The program outputs the image and the prediction to the screen.

from IPython.display import Image, display

def predict_bread_dog_or_human(img_path):
    # show the input image alongside the prediction
    display(Image(img_path, width=300))

    if dog_detector(img_path):
        breed = Resnet50_predict_breed(img_path).split('.')[-1]
        print(f'The predicted breed of dog in this image is {breed}')

    elif face_detector(img_path):
        breed = Resnet50_predict_breed(img_path).split('.')[-1]
        print(f'The dog breed with the closest resemblance to the provided face is {breed}')

    else:
        print('Neither a dog nor a human was detected. Please make sure that the image is clear, and then try again.')

Below you can see some examples of the program in action. Apparently I show some resemblance to a Silky Terrier.

Here’s one prediction the program got wrong:

Here’s one prediction the program got right:

Results

The following is the accuracy each model achieved on the test set.

  • CNN from scratch: 1%
  • CNN from VGG-16: 46%
  • CNN from Resnet50: 80.6%

Model Evaluation, validation, and justification

The CNN trained with Resnet50 achieved an acceptable accuracy of 80.6% on the test set. I believe this accuracy is acceptable because of the purpose of the application. Using the program is meant to be fun, so it should be decently accurate, but a little bit of inaccuracy can actually add to the fun!

One reason for the differences in accuracy between models is the number of parameters used in each. The CNN from scratch used 19,189 parameters, the VGG-16 model used 68,229 parameters, and the Resnet50 model used 272,517 parameters. More parameters can lead to overfitting, where the model learns overly specific details of the training set. However, the accuracies above were calculated on the test set, using data that was not seen during training, so the fact that test accuracy increased along with model size shows that overfitting was not an issue here.

Reflection

Building a CNN from scratch using Keras may have been simple, but it clearly was not effective, classifying only about one percent of the test images correctly. Thankfully, transfer learning was there to help. After experimenting with two different pre-trained models as starting points, I landed on the Resnet50 model, which helped me achieve an accuracy of 80.6% on the test set, which I deemed acceptable.

Improvements

I would like to try the following experiments to see if they can push the accuracy even higher:

  • Augment the training data to create even more training examples (a rough sketch follows this list)
  • Try out different pre-trained models
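For the first idea, a rough sketch of how the training tensors could be augmented with Keras's ImageDataGenerator (the specific shift, zoom, and flip settings are assumptions):

from keras.preprocessing.image import ImageDataGenerator

# randomly shift, zoom, and flip the training images so the network sees a
# slightly different version of each dog on every epoch
datagen = ImageDataGenerator(width_shift_range=0.1,
                             height_shift_range=0.1,
                             zoom_range=0.1,
                             horizontal_flip=True)
datagen.fit(train_tensors)

model.fit_generator(datagen.flow(train_tensors, train_targets, batch_size=20),
                    steps_per_epoch=len(train_tensors) // 20,
                    validation_data=(valid_tensors, valid_targets),
                    epochs=10)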

Thanks for reading!

Link to github: https://github.com/chrismurphy5/dog-breed-classifier
