Date of Submission
Spring 5-7-2018
Degree Type
Thesis
Degree Name
Master of Science in Computer Science (MSCS)
Department
Computer Science
Committee Chair/First Advisor
Dan Chia-Tien Lo
Track
Big Data
Chair
Dan Chia-Tien Lo
Committee Member
Mingon Kang
Committee Member
Kai Qian
Abstract
In this work, I examine a dataset of Amazon product metadata and propose a heterogeneous multiple classifier system for identifying best-selling products in multiple categories. The system takes the product description and the featured product image as input and feeds them through binary classifiers of the following types: Convolutional Neural Network, Naïve Bayes, Random Forest, Ridge Regression, and Support Vector Machine. While each individual model is largely successful in distinguishing best-selling products from non-best-selling products and from worst-selling products, the multiple classifier system is shown to outperform every individual model in the majority of cases when distinguishing best-selling from non-best-selling products, and achieves up to 83.3% accuracy, depending on the product category. To the best of my knowledge, this research is the first application of ensemble learning to Amazon product data of this type and the first use of product images and Convolutional Neural Networks to predict product success.
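As a rough illustration of how a set of binary classifiers can be combined into one decision, the sketch below uses simple majority voting. The abstract does not state which combination rule the thesis uses, so majority voting is an assumption here, and the names `majority_vote`, `ensemble_predict`, and the toy stand-in classifiers are hypothetical placeholders rather than the trained models from the thesis.

```python
from typing import Callable, Dict, Sequence

# Hypothetical interface: each base classifier maps a product record
# to 1 (best-selling) or 0 (not best-selling).
Classifier = Callable[[dict], int]


def majority_vote(votes: Sequence[int]) -> int:
    """Return 1 if strictly more than half of the votes are 1, else 0."""
    return 1 if 2 * sum(votes) > len(votes) else 0


def ensemble_predict(classifiers: Dict[str, Classifier], product: dict) -> int:
    """Collect one binary vote per base classifier and combine them."""
    votes = [clf(product) for clf in classifiers.values()]
    return majority_vote(votes)


# Toy stand-ins for the five model types named in the abstract.
# Real models would be trained on descriptions and images; these
# rules exist only to make the sketch runnable.
toy_classifiers: Dict[str, Classifier] = {
    "cnn": lambda p: int(p["has_image"]),
    "naive_bayes": lambda p: int("new" in p["description"]),
    "random_forest": lambda p: int(len(p["description"]) > 20),
    "ridge": lambda p: int(p["has_image"] and len(p["description"]) > 40),
    "svm": lambda p: int("best" in p["description"]),
}

product = {"description": "brand new wireless headphones", "has_image": True}
prediction = ensemble_predict(toy_classifiers, product)
```

With five voters a tie cannot occur, which is one reason an odd number of base classifiers is convenient for this kind of rule.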