Date of Submission
Spring 5-13-2019
Degree Type
Thesis
Degree Name
Master of Science in Computer Science (MSCS)
Department
Computer Science
Committee Chair/First Advisor
Dr. Chih-Cheng Hung
Track
Others
Machine Learning
Chair
Dr. Chih-Cheng Hung
Committee Member
Dr. Mingon Kang
Committee Member
Dr. Xiaohua Xu
Abstract
Imbalanced data is a common problem in machine learning in which the number of observations belonging to one class is significantly lower than the number belonging to the other classes. Because of this skewed class distribution, most classification algorithms fail to classify minority instances effectively. The class imbalance problem arises in many domains, such as credit card fraud detection and rare disease diagnosis.
Imbalanced data is also a prominent issue in remote sensing images (RSI), which are used to obtain information about earth resources and the surrounding environment. RSI are collected by special cameras that capture information from a specific wavelength range of the electromagnetic spectrum. They play an essential role in agriculture, military applications, and weather forecasting. Accurate classification of RSI is challenging because these images may contain areas represented by only a small number of pixels, known as the minority class. Because of this imbalanced class distribution, most classification algorithms are unable to classify RSI correctly. There are three main approaches to handling imbalanced data: the data-level approach, the algorithm-level approach, and the cost-sensitive approach. However, these approaches fail to detect the minority class effectively in most imbalanced RSI.
In this research, a new Constrained Box Algorithm (CBA) is proposed to detect the minority class in RSI accurately. The proposed algorithm finds the minority class through an iterative process that discovers appropriate decision boundaries by clustering. The CBA reduces the misclassification of majority instances as minority instances by restricting the maximum number of majority instances allowed within a candidate decision boundary. This restriction can eliminate majority instances from the initial boundary set. A threshold parameter is used in the search process to find acceptable boundaries. The set of accepted boundaries is then used to discover the minority instances within the test images. Experimental results demonstrate that the minority class was correctly identified in the RSI.
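The idea of constrained boundaries can be illustrated with a minimal sketch. This is not the thesis implementation: it assumes the decision boundaries are axis-aligned bounding boxes around groups of minority training samples, it substitutes a simple median bisection for the clustering step, and the names `find_boxes`, `split`, `predict`, and `max_majority` are hypothetical. A box is accepted only if it contains at most `max_majority` majority samples; otherwise its group is split and the search repeats.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]
Box = Tuple[Point, Point]  # (lower corner, upper corner)


def bounding_box(points: Sequence[Point]) -> Box:
    """Smallest axis-aligned box that contains all given points."""
    dims = range(len(points[0]))
    lower = tuple(min(p[d] for p in points) for d in dims)
    upper = tuple(max(p[d] for p in points) for d in dims)
    return lower, upper


def contains(box: Box, point: Point) -> bool:
    """True if the point lies inside the box (edges included)."""
    lower, upper = box
    return all(lo <= x <= hi for lo, x, hi in zip(lower, point, upper))


def split(points: Sequence[Point]) -> Tuple[List[Point], List[Point]]:
    """Assumed stand-in for clustering: bisect at the median of the widest dimension."""
    lower, upper = bounding_box(points)
    widest = max(range(len(lower)), key=lambda d: upper[d] - lower[d])
    ordered = sorted(points, key=lambda p: p[widest])
    mid = len(ordered) // 2
    return ordered[:mid], ordered[mid:]


def find_boxes(minority: Sequence[Point],
               majority: Sequence[Point],
               max_majority: int = 0) -> List[Box]:
    """Iteratively search for boxes holding at most max_majority majority samples."""
    accepted: List[Box] = []
    pending: List[List[Point]] = [list(minority)]
    while pending:
        group = pending.pop()
        box = bounding_box(group)
        majority_inside = sum(contains(box, p) for p in majority)
        if majority_inside <= max_majority:
            accepted.append(box)          # constraint satisfied: keep the box
        elif len(group) > 1:
            pending.extend(split(group))  # too many majority samples: refine
        # A single-sample group that still violates the constraint is dropped.
    return accepted


def predict(boxes: Sequence[Box], point: Point) -> bool:
    """Label a sample as minority if any accepted box contains it."""
    return any(contains(box, point) for box in boxes)


# Toy two-band "pixels": two minority groups separated by a majority sample.
minority = [(0, 0), (1, 1), (9, 9), (10, 10)]
majority = [(5, 5), (20, 20)]
boxes = find_boxes(minority, majority, max_majority=0)
```

In this toy example the single box around all minority samples is rejected because it contains the majority sample `(5, 5)`, so the search refines it into two smaller boxes. Raising `max_majority` loosens the constraint and yields fewer, larger boxes at the cost of admitting more majority samples.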