Image Classification of Rock-Paper-Scissors Pictures using Convolutional Neural Network (CNN)
This project is my submission for the Dicoding Academy Machine Learning for Beginner class. It covers the use of a Convolutional Neural Network for classifying rock-paper-scissors images. To build the model, we will use Python with the TensorFlow and Keras packages. Please refer here for the full notebook. Note that some parts might not work in Jupyter Notebook because I used a Colab notebook (e.g., from google.colab import files).
A Convolutional Neural Network (CNN) is a deep learning architecture designed for computer vision tasks on image and video data.
Dataset rock-paper-scissors
This is a glimpse of the directory structure of the “rockpaperscissors.zip” archive that we will use in this project. You can find the dataset here on Kaggle.
Import Packages/Library
The packages that we will use are TensorFlow 2 and Keras. In TensorFlow 2, Keras is integrated with TensorFlow, which means that once you have imported TensorFlow, you can use Keras.
As for ImageDataGenerator, we will use that for image pre-processing in the image augmentation stage.
Download and Extract Dataset
Use the wget command to download the dataset from the link.
To be able to use the dataset, we need to extract it first using the zipfile library.
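The extraction step might look like the sketch below. It assumes the archive has already been downloaded to the working directory; the function name extract_dataset and the paths in the example call are illustrative and not taken from the original notebook.

```python
import zipfile
from pathlib import Path


def extract_dataset(zip_path, dest_dir):
    """Extract every file in the zip archive into dest_dir and return that path."""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    # The context manager closes the archive even if extraction fails.
    with zipfile.ZipFile(zip_path, "r") as archive:
        archive.extractall(dest)
    return dest


# Example call (hypothetical destination directory):
# extract_dataset("rockpaperscissors.zip", "data")
```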
Check Directory
We use listdir from the os library to read the content of the “rockpaperscissors” directory.
The same code goes for “rockpaperscissors/rps-cv-images” directory. This is the directory that will be used for the base_dir.
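As a small sketch of this check: os.listdir returns both files and directories in arbitrary order, so the hypothetical helper below keeps only the sub-directories and sorts them. This is handy because each sub-directory of base_dir later becomes one class label.

```python
import os


def list_subdirectories(path):
    """Return the names of the sub-directories directly inside path, sorted."""
    return sorted(
        name for name in os.listdir(path)
        if os.path.isdir(os.path.join(path, name))
    )


# Example call (path taken from the text above):
# list_subdirectories("rockpaperscissors/rps-cv-images")
```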
Data Pre-processing with Image Augmentation
Image augmentation is part of the data pre-processing phase. It generates new versions of training images from the existing dataset. Some examples of image augmentation are flipping images horizontally or vertically, rotating, zooming, and many more. With Keras, we can use the ImageDataGenerator class to perform image augmentation.
In this project, we will use 6 parameters in ImageDataGenerator:
- rescale = 1./255 : multiply each pixel value by 1/255, mapping the 0–255 range to the 0–1 range.
- rotation_range = 20 : rotate the image randomly by up to 20 degrees.
- horizontal_flip = True : randomly flip the image horizontally.
- shear_range = 0.2 : shear the image randomly, with a shear angle (counter-clockwise) of up to 0.2 degrees.
- fill_mode = ‘wrap’ : fill points outside the image boundary using wrap mode (abcdabcd|abcd|abcdabcd).
- validation_split = 0.4 : reserve 40% of the dataset for the validation step.
ImageDataGenerator also labels the pictures automatically based on their directory (through flow_from_directory), so we don’t have to enter each picture and its label by hand.
For more explanation of each parameter, you can read the documentation here.
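Put together, the six parameters could be passed as in the sketch below. The variable name train_datagen is an assumption; the parameter values are the ones listed above.

```python
from tensorflow.keras.preprocessing.image import ImageDataGenerator

train_datagen = ImageDataGenerator(
    rescale=1.0 / 255,      # map pixel values from 0-255 to 0-1
    rotation_range=20,      # random rotation of up to 20 degrees
    horizontal_flip=True,   # randomly mirror images left-to-right
    shear_range=0.2,        # random shear intensity
    fill_mode="wrap",       # fill new pixels by wrapping the image around
    validation_split=0.4,   # hold out 40% of the images for validation
)
```

Because one generator is used for both subsets, the validation images go through the same random augmentation as the training images. A separate generator with only rescale would avoid that, but that is a design choice outside what this post describes.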
Prepare the Data for the Model
Before moving to the modeling step, we take the path to the target directory and generate batches of augmented data. We use the subset parameter because we already set the share of data for the training and validation phases in the previous step (the validation_split parameter).
- base_dir : path to the target directory.
- target_size = (100, 150) : all images found will be resized to 100 x 150 pixels (height x width).
- class_mode = ‘categorical’ : determines the type of label arrays that are returned; we use ‘categorical’ (one-hot encoded labels) because we have 3 classes of images.
- subset : the subset of data to use (‘training’ or ‘validation’) when validation_split is set in ImageDataGenerator.
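The two generators could be created as sketched below. The helper name make_generators is hypothetical, and datagen is assumed to be an ImageDataGenerator built with validation_split. The batch size is left at the Keras default because the text does not state one.

```python
def make_generators(datagen, base_dir):
    """Build the training and validation iterators from one ImageDataGenerator."""
    common = dict(
        target_size=(100, 150),    # (height, width) every image is resized to
        class_mode="categorical",  # one-hot labels for the 3 classes
    )
    train_generator = datagen.flow_from_directory(
        base_dir, subset="training", **common
    )
    validation_generator = datagen.flow_from_directory(
        base_dir, subset="validation", **common
    )
    return train_generator, validation_generator


# Example call (assumes train_datagen and base_dir exist):
# train_generator, validation_generator = make_generators(train_datagen, base_dir)
```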
Build CNN Architecture
Since we are going to build a model to classify pictures, we are going to create a two-dimensional (2D) CNN architecture. Basically, a 2D CNN is formed by an input layer, convolutional layers, max-pooling layers, a layer that flattens the 2D arrays, and finally an output layer. In this project, we will build a model using 4 pairs of convolutional and max-pooling layers.
The convolutional layer extracts the features of the picture, while the max-pooling layer reduces the size of each feature map coming out of the convolution so that training is much faster.
The first layer is an input layer that takes a 100 x 150 RGB image array, represented by input_shape = (100, 150, 3). In the same line of code (second line), we have the first 2D convolutional layer with 32 filters, a 3 x 3 kernel, and the ReLU (Rectified Linear Unit) activation function. ReLU adds non-linearity to the model, which is needed because pictures are naturally non-linear. You can read more here about the role of the ReLU activation function in the convolutional layer.
In the next line of code, we have a 2D max-pooling layer with a pool size of 2 x 2. Max-pooling slides a 2 x 2 window over the feature map and keeps only the maximum value in each window, so the output has half the height and half the width of its input.
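To make the pooling step concrete, here is a minimal pure-Python sketch of 2 x 2 max-pooling with stride 2 on a single small feature map. It is for illustration only; Keras performs this on whole batches of tensors.

```python
def max_pool_2x2(feature_map):
    """Apply 2 x 2 max-pooling with stride 2 to a 2D list of numbers."""
    rows, cols = len(feature_map), len(feature_map[0])
    pooled = []
    # Step through the input two rows and two columns at a time;
    # a trailing odd row or column is dropped.
    for r in range(0, rows - 1, 2):
        pooled_row = []
        for c in range(0, cols - 1, 2):
            window = (
                feature_map[r][c], feature_map[r][c + 1],
                feature_map[r + 1][c], feature_map[r + 1][c + 1],
            )
            pooled_row.append(max(window))
        pooled.append(pooled_row)
    return pooled


feature_map = [
    [1, 3, 2, 0],
    [4, 6, 1, 1],
    [0, 2, 9, 5],
    [7, 1, 3, 8],
]
print(max_pool_2x2(feature_map))  # [[6, 2], [7, 9]]
```

The 4 x 4 input becomes a 2 x 2 output: each value is the largest of the four pixels in its window.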
After being processed in a max-pooling layer, the feature maps are passed to the next convolutional layer, then the next max-pooling layer, and so on. After the last max-pooling layer, the feature maps are flattened into a 1-dimensional array and processed again in a fully connected hidden layer.
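The effect of stacking these layers on the image size can be traced with a little arithmetic. The sketch below assumes the Keras defaults (no padding, convolution stride 1, pooling stride equal to the pool size) and a 3 x 3 kernel in all four convolutional layers; the text only states the kernel size for the first one.

```python
def trace_shapes(height, width, num_blocks=4, kernel=3, pool=2):
    """Return the (height, width) after each convolution + max-pooling pair."""
    shapes = []
    for _ in range(num_blocks):
        # A 'valid' convolution with stride 1 trims kernel - 1 pixels per axis.
        height, width = height - (kernel - 1), width - (kernel - 1)
        # Max-pooling with stride equal to the pool size floors the division.
        height, width = height // pool, width // pool
        shapes.append((height, width))
    return shapes


print(trace_shapes(100, 150))  # [(49, 74), (23, 36), (10, 17), (4, 7)]
```

Under these assumptions, each feature map is only 4 x 7 by the time it reaches the flatten layer.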
After that, the 1-dimensional array is passed to the output layer and processed with an activation function again. This time, instead of ReLU, we use the softmax function, which is the usual choice for multi-class classification because it turns the outputs into class probabilities that sum to 1. Since we have 3 classes, the number of output nodes is 3.
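The architecture described above could be written as the sketch below. Only the first convolutional layer (32 filters, 3 x 3 kernel, ReLU), the 2 x 2 pooling, the input shape, and the 3-node softmax output come from the text. The filter counts of the other convolutional layers and the width of the hidden layer are assumptions, and build_model is a hypothetical helper name.

```python
import tensorflow as tf


def build_model():
    """Four convolution + max-pooling pairs, then Flatten and two Dense layers."""
    return tf.keras.models.Sequential([
        # Stated in the text: 32 filters, 3 x 3 kernel, ReLU, 100 x 150 RGB input.
        tf.keras.layers.Conv2D(32, (3, 3), activation="relu",
                               input_shape=(100, 150, 3)),
        tf.keras.layers.MaxPooling2D(2, 2),
        # Assumed filter counts for the remaining convolutional layers.
        tf.keras.layers.Conv2D(64, (3, 3), activation="relu"),
        tf.keras.layers.MaxPooling2D(2, 2),
        tf.keras.layers.Conv2D(128, (3, 3), activation="relu"),
        tf.keras.layers.MaxPooling2D(2, 2),
        tf.keras.layers.Conv2D(128, (3, 3), activation="relu"),
        tf.keras.layers.MaxPooling2D(2, 2),
        tf.keras.layers.Flatten(),
        # Assumed hidden-layer width.
        tf.keras.layers.Dense(512, activation="relu"),
        # One output node per class, with softmax for multi-class output.
        tf.keras.layers.Dense(3, activation="softmax"),
    ])


model = build_model()
model.summary()
```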
Compile the Model: Determine the Loss Function and Optimizer
After building the architecture, we compile the model and specify the loss function, optimizer, and evaluation metric. Since this is a multi-class classification problem, we will use the categorical cross-entropy loss function. As for the optimizer, we will use the Adam optimizer, as this adaptive optimizer works well in most cases. Finally, to monitor the model’s performance, we will evaluate it using the accuracy metric.
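In Keras, these three choices would typically be passed as model.compile(loss="categorical_crossentropy", optimizer="adam", metrics=["accuracy"]). To show what the loss actually measures, here is a small pure-Python sketch of categorical cross-entropy for a single sample; the probability values are made up for illustration.

```python
import math


def categorical_cross_entropy(y_true, y_pred, eps=1e-7):
    """Loss for one sample: -sum(true_i * log(predicted_i))."""
    # Clip predictions away from 0 so that log() is always defined.
    return -sum(t * math.log(max(p, eps)) for t, p in zip(y_true, y_pred))


one_hot_label = [0.0, 1.0, 0.0]  # the second class is the correct one
confident = categorical_cross_entropy(one_hot_label, [0.05, 0.90, 0.05])
unsure = categorical_cross_entropy(one_hot_label, [0.40, 0.30, 0.30])
print(confident < unsure)  # True
```

The loss is small when the model puts a high probability on the correct class and grows as that probability shrinks, which is what the optimizer tries to minimize.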
Use Callback for Early Stopping
To speed up training, we will use a callback that stops training early once the model has reached the accuracy threshold. Early stopping also helps reduce the model’s tendency to overfit. For this project, we set the accuracy threshold to 98%.
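One way to write such a callback is sketched below. The class name is hypothetical, and the sketch assumes the threshold is checked against training accuracy at the end of each epoch; the text does not say whether training or validation accuracy was used.

```python
import tensorflow as tf


class AccuracyThresholdCallback(tf.keras.callbacks.Callback):
    """Stop training once training accuracy reaches the given threshold."""

    def __init__(self, threshold=0.98):
        super().__init__()
        self.threshold = threshold

    def on_epoch_end(self, epoch, logs=None):
        # Keras reports the metric as "accuracy" when the model is compiled
        # with metrics=["accuracy"].
        accuracy = (logs or {}).get("accuracy")
        if accuracy is not None and accuracy >= self.threshold:
            print(f"\nReached {self.threshold:.0%} accuracy, stopping training.")
            self.model.stop_training = True
```

Checking "val_accuracy" instead would stop on validation accuracy, which is usually a better guard against overfitting.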
Train the Model
We are finally at the last stage of building an image classifier. In this stage, we train the model we built for up to 20 epochs using the training dataset prepared in train_generator and evaluate it using the validation dataset prepared in validation_generator.
In this training process, each epoch runs 25 batches of training samples and 5 batches of validation samples. We also set verbose to 2 so that one line of progress per epoch is printed in the output cell. Since we implement early stopping, training ends as soon as the model reaches at least 98% accuracy, even if fewer than 20 epochs have run.
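The training call could look like the sketch below. It assumes the 25 and 5 batches correspond to the steps_per_epoch and validation_steps arguments of model.fit, which the text does not name explicitly; the wrapper function train is hypothetical.

```python
def train(model, train_generator, validation_generator, callbacks=()):
    """Fit the model with the settings described in the text."""
    return model.fit(
        train_generator,
        steps_per_epoch=25,    # batches drawn from the training data per epoch
        epochs=20,             # upper limit; a callback may stop earlier
        validation_data=validation_generator,
        validation_steps=5,    # batches drawn from the validation data per epoch
        verbose=2,             # print one summary line per epoch
        callbacks=list(callbacks),
    )


# Example call (assumes the objects from the previous steps exist):
# history = train(model, train_generator, validation_generator,
#                 callbacks=[AccuracyThresholdCallback()])
```

The returned History object holds the per-epoch loss and accuracy values, which is useful for plotting the training curve afterwards.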
Predict an Image to Check the Model
Yeay, finally the image classifier model is ready to be used!
To classify new data, we build a script that uploads new pictures and then passes them to the model with the model.predict function.
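For each image, model.predict returns one row of three probabilities, which still has to be mapped back to a class name. The sketch below shows that last step with a hypothetical helper. The class_indices mapping is assumed to match the one exposed by train_generator.class_indices (Keras assigns indices in alphanumeric order of the sub-directory names), and the probabilities are made up for illustration. The new image also needs to be resized to 100 x 150 and scaled by 1/255, as in training, before it is passed to the model.

```python
def decode_prediction(probabilities, class_indices):
    """Map one row of softmax outputs to (class name, probability)."""
    # class_indices maps class name -> column index, e.g. {"paper": 0, ...}.
    index_to_name = {index: name for name, index in class_indices.items()}
    best_index = max(range(len(probabilities)), key=lambda i: probabilities[i])
    return index_to_name[best_index], probabilities[best_index]


# Illustrative values only; a real row would come from model.predict(...)[0].
class_indices = {"paper": 0, "rock": 1, "scissors": 2}
print(decode_prediction([0.10, 0.20, 0.70], class_indices))  # ('scissors', 0.7)
```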
When predicting new data, there might be some incorrect predictions. This is due to the training dataset we used previously. In the training dataset, all of the pictures use a green background, so if we don’t use a similar background, the model might predict the result incorrectly.
The example above does not seem to show the error, but after some trial and error, I discovered that most of the time the model will misclassify a picture that does not use a green background.
Hello reader,
Thank you for reading this far, I really appreciate it!
Also, if you think this post is useful, please don’t hesitate to leave a comment or some claps, it would mean a lot to me! :)
You can also visit here to see a CNN tutorial on the official TensorFlow site.