Detecting Relevance and Spam in Product Reviews
Disciplines
Data Science | Other Computer Sciences
Abstract (300 words maximum)
Online retailers such as Amazon allow customers to post reviews of purchased products, which creates the challenge of managing irrelevant and deceptive reviews. Such reviews can mislead customers and erode their trust in the marketplace. This project aims to create a framework for detecting review relevance and spam through semantic analysis. Term Frequency-Inverse Document Frequency (TF-IDF) features with logistic regression will establish a baseline, while SBERT embeddings will be used to measure the semantic similarity between a review and the product description. To advance the project further, Aspect-Based Sentiment Analysis with Explicit Sentiment Augmentations (ABSA-ESA) will be used. ABSA-ESA is a deep learning approach that transforms implicit sentiment into explicit statements. A transformer model such as RoBERTa will be fine-tuned on both the original and augmented data to capture aspect-sentiment relationships. The extracted aspects can be cross-referenced with known product attributes to assess relevance, while inconsistencies between sentiment and ratings, along with other irregularities, can be used to detect spam. The framework will be trained and evaluated on the Amazon Fine Food and Commerce Review datasets; the UCI SMS Spam dataset will inform spam detection, and the SemEval laptop and restaurant datasets will be used for ABSA-ESA development. Evaluation metrics will include accuracy, precision, recall, F1-score, and area under the precision-recall curve. The framework is designed to produce a high-precision system capable of distinguishing relevant reviews from spam or irrelevant ones. The goal is to improve transparency and reliability in online review systems such as Amazon's.
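The idea of scoring a review against its product description, and then evaluating the resulting labels, can be illustrated with a minimal sketch. It is not the project's implementation. As a dependency-free stand-in for SBERT embeddings it uses TF-IDF vectors with cosine similarity, and it does not train the logistic regression baseline. The product description, the four reviews and their labels, the 0.1 threshold, and all function names are hypothetical values chosen only for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and keep runs of letters and digits as tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def tfidf_vectors(docs):
    """Return one sparse TF-IDF vector (term -> weight dict) per document.

    Uses the smoothed IDF ln((1 + n) / (1 + df)) + 1.
    """
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    doc_freq = Counter(term for toks in tokenized for term in set(toks))
    idf = {t: math.log((1 + n_docs) / (1 + df)) + 1.0 for t, df in doc_freq.items()}
    return [{t: c * idf[t] for t, c in Counter(toks).items()} for toks in tokenized]


def cosine(u, v):
    """Cosine similarity between two sparse vectors; 0.0 if either is empty."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def precision_recall_f1(y_true, y_pred):
    """Precision, recall, and F1 for the positive class (label 1)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Hypothetical toy data: one product description and four labeled reviews
# (1 = relevant, 0 = irrelevant or spam). Not drawn from any real dataset.
description = "Organic dark roast coffee beans, 12 oz bag, whole bean"
reviews = [
    ("The coffee beans smell fresh and the dark roast is smooth", 1),
    ("Great whole bean coffee, the bag arrived sealed", 1),
    ("Click here to win a free phone today", 0),
    ("My cat loves sitting by the window", 0),
]
texts = [text for text, _ in reviews]
labels = [label for _, label in reviews]

# Vectorize the description together with the reviews so they share one IDF table.
vectors = tfidf_vectors([description] + texts)
scores = [cosine(vectors[0], review_vec) for review_vec in vectors[1:]]

THRESHOLD = 0.1  # assumed cutoff; a real system would tune it on validation data
predicted = [1 if s >= THRESHOLD else 0 for s in scores]

for text, score, pred in zip(texts, scores, predicted):
    print(f"{score:.3f}  predicted={pred}  {text}")
print("precision, recall, F1:", precision_recall_f1(labels, predicted))
```

Lexical overlap of this kind misses paraphrases, such as a review that says "espresso" where the description says "coffee". That gap is the motivation for using sentence embeddings for the similarity step.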
Use of AI Disclaimer
no
Academic department under which the project should be listed
CCSE – Computer Science
Primary Investigator (PI) Name
Md Abdullah Al Hafiz Khan