Date of Submission
Summer 7-20-2021
Degree Type
Thesis
Degree Name
Master of Science in Computer Science (MSCS)
Department
Computer Science
Committee Chair/First Advisor
Dr. Dan Lo
Track
Big Data
Committee Member
Dr. Reza Meimandi Parizi
Committee Member
Dr. Yong Shi
Abstract
Big data analytics is gaining popularity among enterprises, from retailers and supply chains to online stores, as a way to optimize their business processes. However, the raw data these businesses actually hold are often far from usable for this goal. A sound data pre-processing approach is therefore required and is a key step to success. We propose to study the effectiveness of data pre-processing on business processes using a real-world database. Our methodology involves natural language processing. Our key goal is to identify methods, built on big data analysis techniques, that can handle the errors, ambiguity, and repeated descriptions introduced by human language. In this study, we performed a simple language similarity check to assess the state of the database. We also applied a logical representation system to our database as a proof of concept.
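The abstract does not name the similarity measure used in the language similarity check. The following is a minimal sketch that assumes a character-level ratio from Python's standard `difflib.SequenceMatcher`, applied after lowercasing and whitespace normalization, to flag repeated descriptions of the same item. The helper names (`normalize`, `similarity`, `find_near_duplicates`), the 0.8 threshold, and the sample records are all hypothetical illustrations, not data or code from this thesis.

```python
from difflib import SequenceMatcher
from itertools import combinations


def normalize(text: str) -> str:
    """Lowercase and collapse runs of whitespace so trivial differences do not count."""
    return " ".join(text.lower().split())


def similarity(a: str, b: str) -> float:
    """Return a ratio in [0, 1]; 1.0 means the normalized strings are identical."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def find_near_duplicates(descriptions, threshold=0.8):
    """Return (i, j, score) for every pair of descriptions scoring at or above threshold.

    The threshold is an assumed value; a real pipeline would tune it on labeled pairs.
    """
    pairs = []
    for (i, a), (j, b) in combinations(enumerate(descriptions), 2):
        score = similarity(a, b)
        if score >= threshold:
            pairs.append((i, j, round(score, 3)))
    return pairs


# Hypothetical product descriptions: the first two describe the same item.
records = [
    "Stainless steel water bottle, 750 ml",
    "stainless steel  water bottle 750ml",
    "Cotton T-shirt, size M",
]
print(find_near_duplicates(records))  # prints [(0, 1, 0.971)]
```

Comparing every pair is quadratic in the number of records, which is acceptable for a quick survey of a database's state but would need blocking or indexing at big data scale.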