Document Type

Event

Start Date

1-12-2022 5:00 PM

Description

The asl-alphabet dataset, hosted on Kaggle, is a collection of 87,000 color images sized 200x200x3, grouped into 29 classes of 3,000 images apiece (dataset A). The classes consist of the 26 English letters plus three classes for space, delete, and nothing. As a proof of concept, the dataset was first truncated by deleting 2,700 images from each class, leaving only the first 300 images per class for a total of 8,700 images, or 10% of the original count (dataset B). Transfer learning was then applied to dataset B using AlexNet with ImageNet weights, and >99% accuracy was readily achieved. Dataset B was then processed into a third dataset (dataset C) by converting each image to black and white and reducing it to 1/16 of its original size. This too quickly yielded >99% accuracy with AlexNet. Next, a custom neural network was trained on the small datasets B and C, with similar results of >99% accuracy. Training on dataset A in its entirety would have taken too long and was not completed for the current project; however, the same image processing steps outlined above were applied to dataset A to yield a fourth dataset (dataset D) containing 87,000 black-and-white, reduced-size images. Using the custom neural network on dataset D, after 6 epochs and more than 10 hours of training, the model consistently classified test images with >99% accuracy. This suggests that the problem of image classification can, at least in some contexts, be heavily simplified in order to speed up training by reducing computational complexity, in other words, by forcing the neural network to focus only on what counts the most. This idea, neatly summed up by the newly coined term "parsimonics", prioritizes the essential over the superfluous by posing the question: "What is the minimum amount of data that can be used from the original dataset without sacrificing classification accuracy beyond a given threshold?"
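As an illustration of the truncation and image-reduction steps described above, here is a minimal Python sketch, assuming the Kaggle asl-alphabet folder-per-class layout. The folder paths, the .jpg extension, and the use of Pillow are assumptions; the 300-image cutoff comes from the abstract, and the 50x50 target follows from reducing a 200x200 image to 1/16 of its pixel count.

from pathlib import Path
from PIL import Image

SRC = Path("asl_alphabet_train")      # hypothetical path to dataset A
DST = Path("asl_alphabet_reduced")    # hypothetical output folder
KEEP_PER_CLASS = 300                  # first 300 images per class (dataset B)
TARGET_SIZE = (50, 50)                # 200x200 -> 50x50, i.e. 1/16 of the pixels

for class_dir in sorted(SRC.iterdir()):
    if not class_dir.is_dir():
        continue
    out_dir = DST / class_dir.name
    out_dir.mkdir(parents=True, exist_ok=True)
    for img_path in sorted(class_dir.glob("*.jpg"))[:KEEP_PER_CLASS]:
        img = Image.open(img_path).convert("L")   # grayscale ("black and white")
        img = img.resize(TARGET_SIZE)             # shrink to 1/16 of the original size
        img.save(out_dir / img_path.name)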
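The abstract does not name the deep learning framework used for transfer learning; the sketch below assumes PyTorch with torchvision (which ships an ImageNet-pretrained AlexNet) and uses placeholder paths and hyperparameters. Only the 29-class output and the 6-epoch figure come from the abstract.

import torch
import torch.nn as nn
from torch.utils.data import DataLoader
from torchvision import datasets, models, transforms

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# AlexNet pretrained on ImageNet (torchvision >= 0.13 weights API);
# swap the final layer for a 29-way classifier.
model = models.alexnet(weights=models.AlexNet_Weights.IMAGENET1K_V1)
model.classifier[6] = nn.Linear(4096, 29)
model = model.to(device)

# AlexNet expects 3-channel 224x224 inputs normalized with ImageNet statistics.
preprocess = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])
train_set = datasets.ImageFolder("asl_alphabet_truncated", transform=preprocess)  # hypothetical path
loader = DataLoader(train_set, batch_size=64, shuffle=True)

optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)  # placeholder hyperparameters
criterion = nn.CrossEntropyLoss()

model.train()
for epoch in range(6):                      # the abstract reports 6 epochs on dataset D
    for images, labels in loader:
        images, labels = images.to(device), labels.to(device)
        optimizer.zero_grad()
        loss = criterion(model(images), labels)
        loss.backward()
        optimizer.step()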
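The custom neural network itself is not described in the abstract; the sketch below is one plausible small convolutional network sized for the 50x50 single-channel images of datasets C and D, with assumed layer widths.

import torch
import torch.nn as nn

class SmallASLNet(nn.Module):
    # Small CNN for 50x50 grayscale inputs; layer sizes are illustrative only.
    def __init__(self, num_classes: int = 29):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, padding=1),  # single grayscale channel in
            nn.ReLU(),
            nn.MaxPool2d(2),                             # 50x50 -> 25x25
            nn.Conv2d(16, 32, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(2),                             # 25x25 -> 12x12
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Linear(32 * 12 * 12, 128),
            nn.ReLU(),
            nn.Linear(128, num_classes),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.classifier(self.features(x))

# Example: a batch of four 50x50 grayscale images yields logits of shape (4, 29).
logits = SmallASLNet()(torch.randn(4, 1, 50, 50))
print(logits.shape)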


GR-245 Parsimonics: Achieving High Classification Accuracy even with High Dimensional Image Reduction
