Date of Submission

Spring 5-7-2018

Degree Type

Thesis

Degree Name

Master of Science in Computer Science (MSCS)

Department

Computer Science

Track

Big Data

Faculty Advisor

Dan Chia-Tien Lo

Chair

Dan Chia-Tien Lo

Committee Member

Mingon Kang

Committee Member

Kai Qian

Abstract

In this work, I examine a dataset of Amazon product metadata and propose a heterogeneous multiple classifier system for the task of identifying best-selling products in multiple categories. The system takes the product description and the featured product image as input and feeds them through binary classifiers of the following types: Convolutional Neural Network, Naïve Bayes, Random Forest, Ridge Regression, and Support Vector Machine. While each individual model is largely successful in distinguishing best-selling products from non-best-selling products and from worst-selling products, the multiple classifier system is shown to be stronger than any individual model in the majority of cases when distinguishing best-selling from non-best-selling products, and it achieves up to 83.3% accuracy, depending on the product category. To the best of my knowledge, this research is the first application of ensemble learning to Amazon product data of this type and the first use of product images and Convolutional Neural Networks to predict product success.
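
The abstract does not state how the outputs of the individual binary classifiers are combined. As an illustration only, the sketch below assumes simple majority voting, one common rule for a multiple classifier system. The names `majority_vote`, `Product`, and `stand_ins` are hypothetical, and the constant stand-in functions take the place of the five trained models; none of this is taken from the thesis itself.

```python
from typing import Callable, Dict, Sequence

# Hypothetical types: a product record and a binary classifier over it.
Product = Dict[str, object]
Classifier = Callable[[Product], int]  # returns 1 (best-selling) or 0


def majority_vote(classifiers: Sequence[Classifier], product: Product) -> int:
    """Return 1 if more than half of the classifiers vote 1, else 0.

    Majority voting is an assumption made for this sketch; the abstract
    does not say which combination rule the thesis uses.
    """
    votes = [clf(product) for clf in classifiers]
    return 1 if 2 * sum(votes) > len(votes) else 0


# Constant stand-ins for five trained models (for example a CNN on the
# image and four text models on the description). Purely illustrative.
stand_ins = [
    lambda p: 1,
    lambda p: 1,
    lambda p: 0,
    lambda p: 1,
    lambda p: 0,
]

product = {"description": "Stainless steel water bottle", "image": None}
label = majority_vote(stand_ins, product)
```

With five voters a tie cannot occur, so no tie-breaking rule is needed in this sketch. A weighted vote or a meta-classifier trained on the base outputs would be alternative combination rules.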
