Date of Award

Fall 12-20-2021

Degree Type

Thesis

Degree Name

Master of Science in Information Technology (MSIT)

Department

Information Technology

First Advisor

Dr. Mohammed Aledhari

Abstract

The discovery of microRNA (miRNA) sparked medical breakthroughs, leading to the development of drugs, vaccines, and biomarkers for some terminal diseases, such as cancer. Because mature miRNAs are relatively short, research has shown that identifying them directly is difficult; the focus has therefore shifted to predicting precursor miRNAs (pre-miRNAs), which are longer. Computational techniques for this task continue to evolve in response to flaws found in existing designs, new discoveries, and the desire to make the process as seamless as possible. Most recent studies indicate that using few input features, together with a lack of domain understanding of the selected features, can reduce the accuracy of the results, introduce significant bias, and make the models appear to be a 'black box.' This study aims to gain more insight into the feature selection used in building these models and to ensure that all relevant sub-characteristics/features discovered are included in future models. The studies cited in this work were chosen based on their year of publication, their use of deep neural networks for prediction, and a study population focused solely on human pre-miRNAs. AMSTAR 2, a critical appraisal tool, was used to assess the risk of bias and the quality of evidence of this review, while the textual narrative method was used to synthesize the results.

A total of five studies were reviewed. All but one built their models with a single input feature; the remaining study used 58 input features. The 58-feature model achieved more than 99 percent accuracy, outperforming all the other machine learning classifiers it was compared with. This study proposes that entropy, minimum free energy, stem-loop structures, sequence information, and secondary structures are essential features in miRNA prediction. The limitations of this study stemmed from insufficient data for training the models without over- or underfitting. Finally, including more significant sub-features of the input features, selected with domain understanding, gives a clearer picture of the basis of each prediction and so brings clarity to the biologists who use these models. Future research should investigate how well models with more distinct input features can predict the pre-miRNAs of organisms other than humans.

Available for download on Tuesday, December 20, 2022
