Date of Submission

Spring 4-27-2017

Degree Type

Thesis

Degree Name

Master of Science in Computer Science (MSCS)

Department

Computer Science

Committee Chair/First Advisor

Dr. Dan Chia-Tien Lo

Track

CyberSecurity

Chair

Dr. Dan Chia-Tien Lo

Committee Member

Dr. Jing Selena He

Committee Member

Dr. Kai Qian

Abstract

The ubiquitous advance of technology has been conducive to the proliferation of cyber threats, resulting in attacks that have grown exponentially. Consequently, researchers have developed models based on machine learning algorithms for detecting malware. However, these methods require a large number of extracted features for correct malware classification, so feature extraction, training, and testing take significant time; moreover, it remains unexplored which features are most important for achieving correct classification.

In this thesis, a dataset of malware and clean files (goodware) is created and analyzed using the static and dynamic features provided by the online framework VirusTotal. The purpose was to select the smallest number of features that keeps classification accuracy as high as that reported in state-of-the-art research. Selecting the most representative features for malware detection makes it possible to reduce training time, which grows as O(n²) in the number of features n, and to create an embedded program that monitors processes executed by the OS. Thus, feature selection retained only the most important features.
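The abstract does not state which importance measure was used to rank the VirusTotal features. As a minimal sketch only, assuming a simple univariate Fisher score as a stand-in importance measure (not necessarily the thesis's method), ranking features and keeping the top k could look like this. The names `fisher_score` and `select_top_k` and the toy data are hypothetical.

```python
from statistics import mean, pvariance

def fisher_score(values, labels):
    """Hypothetical stand-in importance: Fisher score of one feature.

    labels: 1 = malware, 0 = goodware. Higher score = better class separation.
    """
    pos = [v for v, y in zip(values, labels) if y == 1]
    neg = [v for v, y in zip(values, labels) if y == 0]
    gap = (mean(pos) - mean(neg)) ** 2
    spread = pvariance(pos) + pvariance(neg)
    if spread == 0:
        # Zero within-class spread: perfectly separating if the means differ.
        return float("inf") if gap > 0 else 0.0
    return gap / spread

def select_top_k(X, labels, k):
    """Return the column indices of the k highest-scoring features, best first."""
    scored = []
    for j in range(len(X[0])):
        column = [row[j] for row in X]
        scored.append((fisher_score(column, labels), j))
    # Sort by descending score; break ties by lower column index.
    scored.sort(key=lambda s: (-s[0], s[1]))
    return [j for _, j in scored[:k]]

# Hypothetical toy data: 6 samples x 3 features.
X = [
    [1.0, 5.0, 0.2],
    [1.1, 4.0, 0.9],
    [0.9, 6.0, 0.5],
    [3.0, 5.5, 0.4],
    [3.2, 4.5, 0.6],
    [2.9, 5.0, 0.3],
]
labels = [0, 0, 0, 1, 1, 1]

print(select_top_k(X, labels, 2))  # → [0, 2]
```

Feature 0 separates the two classes almost perfectly, while feature 1 has identical class means and ranks last. Under the abstract's stated O(n²) scaling, keeping k of n features would cut training cost by roughly a factor of (k/n)².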

In addition, classification algorithms such as Random Forest, Support Vector Machine, and Neural Networks were used in a novel combination that increased not only accuracy but also training speed, reducing training time from hours to just minutes. Next, the model was tested on an additional dataset of unseen malware files. Results showed that nine features were enough to distinguish malware from goodware files with an accuracy of 99.60%.
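The abstract does not describe how the Random Forest, SVM, and neural network outputs were combined. Purely as an illustration, assuming plain majority voting (one common ensemble scheme, not necessarily the thesis's "novel combination"), the idea of merging three classifiers' labels and scoring accuracy can be sketched as follows. The prediction lists are hypothetical.

```python
from collections import Counter

def majority_vote(*predictions):
    """Combine per-sample labels from several classifiers by majority vote.

    With an odd number of binary voters there are no ties.
    """
    combined = []
    for votes in zip(*predictions):
        label, _ = Counter(votes).most_common(1)[0]
        combined.append(label)
    return combined

def accuracy(predicted, actual):
    """Fraction of samples whose predicted label matches the true label."""
    correct = sum(p == a for p, a in zip(predicted, actual))
    return correct / len(actual)

# Hypothetical outputs (1 = malware, 0 = goodware) for five samples.
truth = [1, 0, 1, 1, 0]
rf = [1, 1, 1, 1, 0]   # stand-in Random Forest output
svm = [1, 0, 0, 1, 0]  # stand-in SVM output
nn = [0, 0, 1, 1, 1]   # stand-in neural network output

ensemble = majority_vote(rf, svm, nn)
print(ensemble)                   # → [1, 0, 1, 1, 0]
print(accuracy(rf, truth))        # → 0.8
print(accuracy(ensemble, truth))  # → 1.0
```

Each stand-in classifier errs on a different sample, so the vote recovers every label; real ensembles only gain when their members' errors are not strongly correlated.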

CAC_MSCS_KSU_Thesis_02_06_2017.pdf (6327 kB)
Final Version of the Thesis


Included in

Other Computer Engineering Commons


Master of Science in Computer Science Theses

Feature Selection and Improving Classification Performance for Malware Detection