Date of Submission

Spring 5-13-2019

Degree Type

Thesis

Degree Name

Master of Science in Computer Science (MSCS)

Department

Computer Science

Committee Chair/First Advisor

Dr. Chih-Cheng Hung

Track

Others

Machine Learning

Chair

Dr. Chih-Cheng Hung

Committee Member

Dr. Mingon Kang

Committee Member

Dr. Xiaohua Xu

Abstract

Imbalanced data is a common problem in machine learning: the number of observations belonging to one class is significantly lower than the number belonging to the other classes. Because of this skewed class distribution, most classification algorithms fail to classify minority instances effectively. The class imbalance problem arises in many domains, such as credit card fraud detection and rare disease diagnosis.

Imbalanced data is also a prominent issue in remote sensing images (RSI), which are used to obtain information about Earth's resources and the surrounding environment. RSI are collected by special cameras that capture information from a specific wavelength range of the electromagnetic spectrum. They play an essential role in agriculture, military applications, and weather forecasting. Accurate classification of RSI is challenging because an image may contain areas represented by only a small number of pixels, known as the minority class. Because of this imbalanced class distribution, most classification algorithms are unable to classify RSI correctly. There are three main approaches to handling imbalanced data: the data-level approach, the algorithm-level approach, and the cost-sensitive approach. However, these approaches fail to detect the minority class effectively in most imbalanced RSI.
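Two of the approaches named above can be illustrated with a minimal Python sketch. It is not taken from the thesis: the function names `class_weights` and `random_oversample` are hypothetical, the weighting formula is the common inverse-frequency heuristic, and random duplication stands in for the data-level approach in general.

```python
import random
from collections import Counter


def class_weights(labels):
    """Cost-sensitive idea: weight each class by n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {c: n_samples / (n_classes * m) for c, m in counts.items()}


def random_oversample(samples, labels, seed=0):
    """Data-level idea: duplicate randomly chosen samples of the smaller
    classes until every class is as large as the largest one."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_samples, out_labels = list(samples), list(labels)
    for c, m in counts.items():
        pool = [s for s, l in zip(samples, labels) if l == c]
        for _ in range(target - m):
            out_samples.append(rng.choice(pool))
            out_labels.append(c)
    return out_samples, out_labels


# Toy data: 8 majority pixels (label 0) and 2 minority pixels (label 1).
labels = [0] * 8 + [1] * 2
pixels = list(range(10))
weights = class_weights(labels)
balanced_pixels, balanced_labels = random_oversample(pixels, labels)
```

Either output would then be passed to an ordinary classifier: the weights through its loss function, the balanced set as its training data.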

In this research, a new Constrained Box Algorithm (CBA) is proposed to detect the minority class in RSI accurately. The algorithm finds the minority class through an iterative process that discovers appropriate decision boundaries by clustering. The CBA reduces the misclassification of majority instances as minority instances by restricting the maximum number of majority instances allowed within a candidate decision boundary. This process can eliminate the majority instances from the initial boundary set. A threshold parameter is used in the search process to find acceptable boundaries. The set of accepted boundaries is then used to identify the minority instances in the test images. Experimental results demonstrate that the minority class was correctly identified in the RSI.
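The abstract does not give the details of the CBA, so the following Python sketch is only one possible reading of the idea, not the author's algorithm. It assumes that decision boundaries are axis-aligned boxes in feature space, that the threshold is a cap (`max_majority`, a hypothetical name) on majority training samples inside a box, and it substitutes a simple median split for the clustering step. All function names are illustrative.

```python
def bounding_box(points):
    """Smallest axis-aligned box (lo, hi) containing all points."""
    dims = range(len(points[0]))
    lo = tuple(min(p[d] for p in points) for d in dims)
    hi = tuple(max(p[d] for p in points) for d in dims)
    return lo, hi


def inside(box, point):
    """True if the point lies within the box (edges included)."""
    lo, hi = box
    return all(l <= x <= h for l, x, h in zip(lo, point, hi))


def find_boxes(minority, majority, max_majority=0):
    """Iteratively split the minority samples until each group's bounding
    box contains at most max_majority majority samples (assumed threshold)."""
    accepted = []
    pending = [list(minority)]
    while pending:
        group = pending.pop()
        box = bounding_box(group)
        n_major = sum(inside(box, p) for p in majority)
        if n_major <= max_majority:
            accepted.append(box)
            continue
        lo, hi = box
        widest = max(range(len(lo)), key=lambda i: hi[i] - lo[i])
        if hi[widest] == lo[widest]:
            continue  # box cannot be split further; reject it
        # Assumption: a median split along the widest dimension stands in
        # for the clustering step described in the abstract.
        group.sort(key=lambda p: p[widest])
        mid = len(group) // 2
        pending.append(group[:mid])
        pending.append(group[mid:])
    return accepted


def is_minority(boxes, point):
    """Label a test sample as minority if it falls in any accepted box."""
    return any(inside(box, point) for box in boxes)


# Toy two-band feature vectors; values are made up for illustration.
minority = [(1.0, 1.0), (1.2, 0.9), (5.0, 5.0), (5.1, 5.2)]
majority = [(3.0, 3.0), (0.0, 4.0), (4.0, 0.0), (6.0, 1.0)]
boxes = find_boxes(minority, majority, max_majority=0)
```

In this toy case the single box around all minority samples contains one majority sample, so with `max_majority=0` it is split into two smaller boxes that exclude it. Raising the cap to 1 would accept the initial box and label that majority sample as minority.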
