A comparison of models will be carried out on the MNIST dataset, a collection of 28×28 grayscale images of handwritten digits.
More details on the dataset are available on the official MNIST website.
This report includes only the code for the neural networks, not that of the other statistical models, since they are not the focus here. The comparison of all the results can be found in the conclusion.
Run the code below if you wish to set a seed and obtain reproducible results.
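What follows is a minimal sketch of this step; the seed value 123 is arbitrary, and the TensorFlow call assumes the TensorFlow 2.x API (a different seeding function may be needed with other versions).
library(keras)
library(tensorflow)
# Fix the seed of the R random number generator
set.seed(123)
# Fix the seed used by TensorFlow/Keras (TensorFlow 2.x API)
tf$random$set_seed(123L)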
Data will be downloaded automatically with the following commands, which are available thanks to the keras library.
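A sketch of what this loading step presumably looks like; the variable names train_images, train_labels, test_images and test_labels are the ones referenced by the code later in the report.
library(keras)
# Download MNIST (cached after the first call) and split it into training and test sets
mnist <- dataset_mnist()
train_images <- mnist$train$x   # 60000 x 28 x 28 array of pixel intensities
train_labels <- mnist$train$y   # 60000 digit labels (0-9)
test_images  <- mnist$test$x    # 10000 x 28 x 28 array
test_labels  <- mnist$test$y    # 10000 digit labels (0-9)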
Normalize the input data, rescaling the pixel values to the interval \([0, 1]\).
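A sketch of the rescaling, assuming the raw pixel intensities are integers between 0 and 255 (as returned by dataset_mnist()):
# Rescale pixel intensities from [0, 255] to [0, 1]
train_images <- train_images / 255
test_images  <- test_images / 255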
Transform the target variable into a categorical variable (using one-hot encoding).
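A sketch of the encoding, using to_categorical() from keras:
# Encode the digit labels 0-9 as 10-dimensional one-hot vectors
train_labels <- to_categorical(train_labels, num_classes = 10)
test_labels  <- to_categorical(test_labels, num_classes = 10)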
Specify the neural network’s architecture:
model <- keras_model_sequential() %>%
layer_dense(units = 256, activation = "relu", input_shape = c(28 * 28)) %>%
layer_dense(units = 128, activation = "relu") %>%
layer_dense(units = 64, activation = "relu") %>%
layer_dense(units = 10, activation = "softmax")
model
## Model
## Model: "sequential"
## ________________________________________________________________________________
## Layer (type) Output Shape Param #
## ================================================================================
## dense (Dense) (None, 256) 200960
## ________________________________________________________________________________
## dense_1 (Dense) (None, 128) 32896
## ________________________________________________________________________________
## dense_2 (Dense) (None, 64) 8256
## ________________________________________________________________________________
## dense_3 (Dense) (None, 10) 650
## ================================================================================
## Total params: 242,762
## Trainable params: 242,762
## Non-trainable params: 0
## ________________________________________________________________________________
Compile the model, using the adam optimizer with the learning rate equal to \(0.001\):
model %>% compile(
optimizer = optimizer_adam(lr = 0.001),
loss = "categorical_crossentropy",
metrics = c("accuracy")
)
Train the neural network.
history <- model %>% fit(
x = array_reshape(train_images, c(60000, 28 * 28)),
y = train_labels,
epochs = 10,
batch_size = 32,
validation_split = 0.2,
verbose = 1
)
The plot below shows the loss function and the accuracy as a function of the number of epochs.
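A sketch of how such a plot can be obtained from the history object returned by fit(), using the plot() method that keras provides for training histories:
# Plot training and validation loss and accuracy per epoch
plot(history)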
Evaluate the model on the test set:
results <- model %>% evaluate(
x = array_reshape(test_images, c(10000, 28 * 28)),
y = test_labels,
verbose = 0
)
print(paste("Loss on test data:", results["loss"]))
## [1] "Loss on test data: 0.0877719819545746"
## [1] "Accuracy on test data: 0.979099988937378"
Specify the neural network’s architecture (conv -> pool -> conv -> pool -> conv, followed by a flatten layer and two dense layers):
model <- keras_model_sequential() %>%
layer_conv_2d(filters = 32, kernel_size = c(3, 3), activation = "relu", input_shape = c(28, 28, 1)) %>%
layer_max_pooling_2d(pool_size = c(2, 2)) %>%
layer_conv_2d(filters = 64, kernel_size = c(3, 3), activation = "relu") %>%
layer_max_pooling_2d(pool_size = c(2, 2)) %>%
layer_conv_2d(filters = 64, kernel_size = c(3, 3), activation = "relu") %>%
layer_flatten() %>%
layer_dense(units = 64, activation = "relu") %>%
layer_dense(units = 10, activation = "softmax")
model
## Model
## Model: "sequential_1"
## ________________________________________________________________________________
## Layer (type) Output Shape Param #
## ================================================================================
## conv2d (Conv2D) (None, 26, 26, 32) 320
## ________________________________________________________________________________
## max_pooling2d (MaxPooling2D) (None, 13, 13, 32) 0
## ________________________________________________________________________________
## conv2d_1 (Conv2D) (None, 11, 11, 64) 18496
## ________________________________________________________________________________
## max_pooling2d_1 (MaxPooling2D) (None, 5, 5, 64) 0
## ________________________________________________________________________________
## conv2d_2 (Conv2D) (None, 3, 3, 64) 36928
## ________________________________________________________________________________
## flatten (Flatten) (None, 576) 0
## ________________________________________________________________________________
## dense_4 (Dense) (None, 64) 36928
## ________________________________________________________________________________
## dense_5 (Dense) (None, 10) 650
## ================================================================================
## Total params: 93,322
## Trainable params: 93,322
## Non-trainable params: 0
## ________________________________________________________________________________
Compile the model, using the adam optimizer with the learning rate equal to \(0.001\):
model %>% compile(
optimizer = optimizer_adam(lr = 0.001),
loss = "categorical_crossentropy",
metrics = c("accuracy")
)
Train the neural network.
history <- model %>% fit(
x = array_reshape(train_images, c(60000, 28, 28, 1)),
y = train_labels,
epochs = 10,
batch_size = 32,
validation_split = 0.2,
verbose = 1
)
The plot below shows the loss function and the accuracy as a function of the number of epochs.
Evaluate the model on the test set:
results <- model %>% evaluate(
x = array_reshape(test_images, c(10000, 28, 28, 1)),
y = test_labels,
verbose = 0
)
print(paste("Loss on test data:", results["loss"]))
## [1] "Loss on test data: 0.0345294252038002"
## [1] "Accuracy on test data: 0.991999983787537"
The table below compares the accuracy obtained on the test set by all the models considered.

Model | Accuracy |
---|---|
Convolutional neural network | 99.2% |
Deep neural network | 97.9% |
Support vector machine | 97.6% |
Gradient boosting | 97.1% |
K-nearest neighbours | 96.3% |
Random forest | 96.1% |
Classification tree | 87.6% |