Date of Award

Spring 4-17-2024

Degree Type


Degree Name

Ph.D. in Data Science and Analytics


Data Science and Analytics

Committee Chair/First Advisor

Dr. Dominic Thomas, Dr. Md Abdullah Al Hafiz Khan

Second Advisor

Dr. Monica Nandan

Third Advisor

Dr. Sherry Ni

Fourth Advisor

Dr. Yong Pei


Identifying behavioral health cases is paramount for law enforcement officers to provide appropriate follow-up community care. In current practice, law enforcement offices identify these cases manually so that the relevant follow-up resources can be assigned. Police reports generated by officers' responses to 911 calls remain an untapped resource for identifying such incidents. We therefore advocate combining expert manual annotation, natural language processing (NLP), active learning, advanced machine learning, and ensemble techniques to detect behavioral health cases within police reports. In this dissertation, we develop tools and frameworks that automatically detect behavioral health cases in police public narrative reports by identifying behavioral health indicator signals, working with a collaborative cross-disciplinary team of academics and practitioners. This dissertation makes four contributions. Firstly, we introduce a framework tailored for annotating and expanding ground truth samples of behavioral health cases. Secondly, we propose a human-in-the-loop active learning approach built on our novel uncertainty-based informative cluster sampling strategy, a querying strategy that selects the most informative and diverse samples for expert annotation. Thirdly, we introduce an adaptive attention-aware fusion model, which combines behavioral health keyword cues with self-attention contextual extraction to enhance semantic understanding and detection performance. Lastly, we present a novel hierarchical domain-enhanced ensemble learning framework. It is constructed from multiple sub-models that detect keyword presence in sentences and extract contextual meaning by concatenating embedding representations. This yields twelve sub-models, each addressing a different perspective on a single police report, based on keyword information influenced by the differing biases of the collaborators.
Our experimental results show that the proposed adaptive attention-aware fusion model outperforms state-of-the-art classifiers on a dataset of 300 manually annotated ground truth police reports, achieving an accuracy of 87.58% and an F1-score of 85.67%. After applying our querying strategy to the proposed model for behavioral health detection, we achieved an accuracy of 92% and an F1-score of 91.1%. The proposed model also achieves an accuracy of 93.75% and an F1-score of 93.61% on unseen samples. Additionally, the model demonstrates its interpretability by extracting the keywords associated with each behavioral health category. Retraining the proposed model on a new set of 3,293 manually annotated ground truth police reports achieved an accuracy of 96.57% and an F1-score of 96.20%. Finally, our proposed hierarchical domain-enhanced ensemble learning framework functions effectively as a weak sub-model learner, adept at detecting and interpreting the contextual nuances of each keyword within its respective sentence. This framework achieves an accuracy of 99.79% and an F1-score of 99.76%.
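
As an illustration only, the idea behind an uncertainty-based cluster sampling query can be sketched in a few lines of Python. The abstract does not specify the dissertation's actual algorithm, so everything here is an assumption: prediction entropy stands in for the uncertainty measure, a small k-means with a fixed, evenly spaced initialization stands in for the clustering step, and the names `entropy`, `kmeans`, and `select_queries` are hypothetical. The sketch picks the most uncertain unlabeled report from each cluster, so the queried set is both informative and diverse.

```python
import math

def entropy(probs):
    """Shannon entropy of a predicted class distribution (higher = less certain)."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def kmeans(points, k, iters=20):
    """Tiny k-means returning a cluster index per point.

    Assumption: centers start at evenly spaced points so the run is deterministic.
    """
    centers = [points[i * len(points) // k] for i in range(k)]
    labels = [0] * len(points)
    for _ in range(iters):
        # Assign each point to its nearest center.
        labels = [min(range(k), key=lambda c: math.dist(p, centers[c]))
                  for p in points]
        # Move each center to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = tuple(sum(x) / len(members) for x in zip(*members))
    return labels

def select_queries(embeddings, probs, k):
    """Return the index of the most uncertain sample in each of k clusters."""
    labels = kmeans(embeddings, k)
    picks = []
    for c in range(k):
        members = [i for i, lab in enumerate(labels) if lab == c]
        if members:
            picks.append(max(members, key=lambda i: entropy(probs[i])))
    return sorted(picks)

# Hypothetical toy data: six report embeddings forming two groups,
# with the model's predicted class probabilities for each report.
embeddings = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1),
              (5.0, 5.0), (5.1, 5.0), (5.0, 5.1)]
probs = [(0.9, 0.1), (0.5, 0.5), (0.8, 0.2),
         (0.99, 0.01), (0.7, 0.3), (0.6, 0.4)]

queries = select_queries(embeddings, probs, k=2)
print(queries)  # [1, 5]
```

In this toy run, reports 1 and 5 are sent to the expert annotator: each is the least confident prediction within its own cluster, whereas ranking by uncertainty alone would not guarantee that both groups are represented.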

Available for download on Wednesday, May 07, 2025