
Proceedings
2026
Wednesday, April 1st
9:00 AM

AgriPestDatabase-v1.0: A Structured Insect Dataset for Training Agricultural Large Language Models

Yagizhan Bilal Durak, Sam Houston State University
Ashley Morgan-Olvera, Texas Invasive Species Institute, Sam Houston State University
Ahsan Ul Islam, Sam Houston State University
Iftekhar Ibne Basith, Sam Houston State University
Shahidul Islam, Kennesaw State University
Syed Hasib Akhter Faruqui, Sam Houston State University

Kennesaw, Georgia

9:00 AM - 10:15 AM

Agricultural pest management increasingly relies on timely and accurate access to expert knowledge, yet high-quality labeled data and continuous expert support remain limited, particularly for farmers operating in rural regions with unstable or no internet connectivity. At the same time, the rapid growth of artificial intelligence (AI) and large language models (LLMs) has created new opportunities to deliver practical decision-support tools directly to end users in agriculture through compact and deployable systems. This work addresses (i) generating a structured insect information dataset for LLM training, and (ii) adapting a lightweight LLM (≤ 7B parameters) by fine-tuning it for possible future edge-device use in agricultural pest management. Textual data were collected by reviewing available pest databases and published manuscripts on nine selected pest species. The resulting structured reports were reviewed and validated by a domain expert. From these reports, we constructed question-answer (Q/A) pairs to support model training and evaluation. A LoRA-based fine-tuning approach was applied to multiple lightweight LLMs and evaluated. Initial evaluation shows that Mistral 7B achieves an 88.9% pass rate on the domain-specific Q/A task, substantially outperforming Qwen 2.5 7B (63.9%) and LLaMA 3.1 8B (58.7%). Notably, Mistral demonstrates higher semantic alignment (embedding similarity: 0.865) despite lower lexical overlap (BLEU: 0.097), indicating that semantic understanding and robust reasoning are more predictive of task success than surface-level conformity to reference text in specialized domains. By combining expert-organized data, well-structured Q/A pairs, semantic quality control, and efficient model adaptation, this work contributes toward farmer-facing agricultural decision-support tools and demonstrates the feasibility of deploying compact, high-performing language models for practical field-level pest management guidance.
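
The abstract does not specify the fine-tuning configuration, so the following is a minimal sketch of a LoRA setup using the Hugging Face transformers and peft libraries; the base model name, rank, and target modules are illustrative assumptions, not the authors' settings.

    # Minimal LoRA fine-tuning sketch; hyperparameters are illustrative.
    from transformers import AutoModelForCausalLM, AutoTokenizer
    from peft import LoraConfig, get_peft_model

    base = "mistralai/Mistral-7B-v0.1"  # assumed base checkpoint
    tokenizer = AutoTokenizer.from_pretrained(base)  # used to build Q/A prompts
    model = AutoModelForCausalLM.from_pretrained(base)

    lora_cfg = LoraConfig(
        r=8,                    # low-rank adapter dimension
        lora_alpha=16,          # adapter scaling factor
        lora_dropout=0.05,
        target_modules=["q_proj", "v_proj"],  # attention projections
        task_type="CAUSAL_LM",
    )
    model = get_peft_model(model, lora_cfg)
    model.print_trainable_parameters()  # only the small adapter matrices train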

9:00 AM

Cross Correlation-Based Physiological Synchronization for Multi-Modal Wearable Cognitive Stress Monitoring System

Fariha Alam, Kennesaw State University
Nora Tin, Kennesaw State University
Razvan Voicu, Kennesaw State University

Kennesaw, Georgia

9:00 AM - 10:15 AM

Cognitive stress arises from sustained mental demand, information overload, or task complexity, and requires continuous monitoring for timely detection and intervention in real-world settings. It produces subtle physiological changes, including shifts in heart rate and autonomic activity, that require continuous monitoring to detect reliably. Wearable sensors such as photoplethysmography (PPG) and electrodermal activity (EDA) enable non-invasive tracking of these responses. However, multi-modal sensor integration remains challenging, as hardware- or software-level synchronization does not account for physiological response delays. To address this, the study collected PPG and EDA sensor data from ten participants under two cognitive stress-inducing conditions. After preprocessing, heart rate and skin conductance response data were extracted from the PPG and EDA signals. A cross-correlation framework was then used to estimate the temporal lag between the responses of the two modalities, capturing participant-specific and condition-dependent delays. The identified lag was applied to phase-shift the EDA signals to align them with heart-rate-based responses. Results showed that heart rate responded quickly to stress onset, whereas EDA exhibited consistent but variable delays across individuals and conditions, averaging 2.1 s in the Stroop phase and 3.7 s in the PASAT phase. Incorporating these physiological lags improved synchronization quality and enhanced classification performance compared to unsynchronized data. After synchronization, model accuracy increased for RF (0.5067 to 0.6353), kNN (0.6000 to 0.7185), and CatBoost (0.6022 to 0.6241). These findings demonstrate that accounting for individualized physiological delays strengthens multi-modal fusion and increases the reliability of wearable cognitive stress monitoring systems.
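
As a concrete illustration of the lag-estimation step, the sketch below cross-correlates a toy heart-rate series with a delayed EDA series using SciPy, then phase-shifts the EDA signal by the recovered lag; the sampling rate and synthetic signals are assumptions for illustration only.

    import numpy as np
    from scipy.signal import correlate, correlation_lags

    fs = 4.0  # assumed common sampling rate (Hz) after resampling
    rng = np.random.default_rng(0)
    hr = rng.standard_normal(1200)                            # toy heart-rate response
    eda = np.roll(hr, 12) + 0.1 * rng.standard_normal(1200)   # EDA trails by ~3 s

    hr_c, eda_c = hr - hr.mean(), eda - eda.mean()
    xcorr = correlate(eda_c, hr_c, mode="full")
    lags = correlation_lags(len(eda_c), len(hr_c), mode="full")
    lag = lags[np.argmax(xcorr)]        # positive lag => EDA trails heart rate
    print(lag / fs, "seconds")          # ~3.0 s for this toy example

    eda_aligned = np.roll(eda, -lag)    # phase-shift EDA into alignment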

9:00 AM

Vibroacoustic Characterization of Manipulator Robot via Kinematic Discretization

Gershom Richards, PE, Kennesaw State University
David A. Guerra-Zubiaga, Kennesaw State University

Kennesaw, Georgia

9:00 AM - 10:15 AM

Using vibroacoustic signals to characterize the health and status of industrial equipment is a well-established practice. Many recent studies leverage a black-box approach to developing a vibroacoustic machine learning (ML) architecture capable of identifying anomalous behaviors in robot operation. While useful for routine automation systems, this approach is not suitable for dynamic robot systems with flexible or behavioral programming, or for robots that experience consistently changing environmental factors. To address these shortcomings, a discretized approach is explored. The proposed Long Short-Term Memory (LSTM)-Transformer hybrid ML architecture segments its training data by manipulator joint to incorporate the impact of each subsystem on the overall system response. The RMSE and R² of the initial joint position predictions were 22.818 and 0.482; for the final joint position predictions, they were 37.899 and 0.278. These results indicate a positive influence on the model's predictive ability, demonstrating the potential of this novel approach, with room for model improvement in future research.
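
A minimal sketch of one plausible LSTM-Transformer hybrid regressor is shown below; the layer sizes, joint count, and window length are illustrative assumptions rather than the authors' architecture.

    import torch
    import torch.nn as nn

    class LSTMTransformerHybrid(nn.Module):
        def __init__(self, n_features=6, d_model=64, n_joints=6):
            super().__init__()
            self.lstm = nn.LSTM(n_features, d_model, batch_first=True)
            layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
            self.encoder = nn.TransformerEncoder(layer, num_layers=2)
            self.head = nn.Linear(d_model, n_joints)  # one position per joint

        def forward(self, x):            # x: (batch, time, vibroacoustic features)
            h, _ = self.lstm(x)          # local temporal dynamics
            h = self.encoder(h)          # longer-range dependencies
            return self.head(h[:, -1])   # regress from the final time step

    model = LSTMTransformerHybrid()
    out = model(torch.randn(8, 100, 6))  # -> (8, 6) joint position estimates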

10:15 AM

MedUncertainVLM: Multi-Modal Uncertainty Quantification in Vision-Language Models for Clinical Documentation

Shikhar Patel, Northeastern University
Rushabh Darji, Northeastern University

Kennesaw, Georgia

10:15 AM - 11:30 AM

Vision-language models are increasingly deployed for clinical documentation tasks in radiology, such as extracting diagnostic labels from chest radiographs with reports. These models must communicate calibrated uncertainty to avoid incorrectly influencing patient care: overconfident predictions from poorly calibrated models pose a documented patient safety risk. Existing medical vision-language models produce only point estimates without confidence bounds, and standard post-hoc calibration methods such as temperature scaling omit cross-modal signals entirely, so they cannot detect cases where the image and text branches make genuinely inconsistent predictions. This paper presents MedUncertainVLM, a framework that addresses this gap directly and systematically. The system uses deep ensembles on a fine-tuned BioViL-T model with a novel disagreement component: when image-branch and text-branch predictions diverge beyond a learned threshold, the case is flagged as uncertain and routed to a human radiologist for review before any decision is finalized. We evaluate the framework on the MIMIC-CXR benchmark across 14 diagnostic labels, with a held-out test set of 5,000 samples drawn from the official evaluation split. MedUncertainVLM reduces Expected Calibration Error by 52.8 percent versus the uncalibrated baseline while maintaining a micro-averaged F1 of 84.1 percent across all 14 diagnostic label categories. A selective prediction protocol abstains on the 19.4 percent highest-uncertainty cases; on the retained predictions, micro-F1 reaches 90.3 percent. Cross-modal disagreement predicts true classification error better than ensemble variance alone, achieving an AUROC of 0.81 versus 0.69 for variance-only uncertainty. We further present a lightweight MC Dropout approximation that recovers 89.5% of the ensemble's calibration benefit at one-fifth the inference cost, a threshold sensitivity analysis demonstrating robustness across operating points, and a comparative evaluation against additional medical VLM architectures including MedCLIP and PubMedCLIP.
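
For reference, Expected Calibration Error is typically computed by binning predictions by confidence and comparing per-bin accuracy to per-bin confidence; the sketch below implements that standard formulation (the bin count is an illustrative choice, not the paper's).

    import numpy as np

    def expected_calibration_error(confidences, correct, n_bins=15):
        """confidences: predicted probabilities; correct: 0/1 prediction hits."""
        confidences = np.asarray(confidences)
        correct = np.asarray(correct, dtype=float)
        edges = np.linspace(0.0, 1.0, n_bins + 1)
        ece = 0.0
        for lo, hi in zip(edges[:-1], edges[1:]):
            mask = (confidences > lo) & (confidences <= hi)
            if mask.any():
                gap = abs(correct[mask].mean() - confidences[mask].mean())
                ece += mask.mean() * gap  # weight the gap by bin population
        return ece

    rng = np.random.default_rng(1)
    conf = rng.uniform(0.5, 1.0, 10_000)
    hits = (rng.uniform(size=10_000) < conf).astype(int)  # well-calibrated toy
    print(expected_calibration_error(conf, hits))         # close to zero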

10:15 AM

QBiasNet: Quantum-Enhanced Variational Classifiers for Ethical Bias Detection in Multimodal AI Models

Nikunj Doshi, Northeastern University
Kavach Shah, Boston University
Shikhar Patel, Northeastern University

Kennesaw, Georgia

10:15 AM - 11:30 AM

Bias in multimodal AI systems that jointly process image and text inputs creates measurable risks in sensitive deployment contexts, including public health, financial services, and automated hiring. Classical detection approaches face a fundamental architectural limitation: they cannot efficiently model intersectional bias, which emerges from the nonlinear interaction of multiple protected attributes simultaneously across visual and linguistic modalities. This paper introduces QBiasNet, a hybrid quantum-classical system that encodes cross-modal CLIP embeddings into an 8-qubit Variational Quantum Circuit (VQC) implemented in PennyLane. By exploiting quantum entanglement, the VQC learns high-order feature correlations that linear probes and shallow neural networks systematically miss. Evaluated on a 12,000-sample benchmark merging the FairFace and WinoBias datasets, QBiasNet achieves 91.4% bias detection accuracy and reduces the intersectional bias false negative rate by 33% relative to the strongest classical baseline. An entanglement ablation study confirms that the structural properties of the VQC, not its parameter count, are responsible for the observed performance advantage. These findings suggest that quantum-enhanced classifiers are a practical and theoretically grounded tool for next-generation AI governance and regulatory compliance auditing.
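
A minimal sketch of an 8-qubit variational classifier in PennyLane appears below; the angle embedding, entangling-layer template, and layer count are illustrative choices, and the 8 input features stand in for a dimensionality-reduced CLIP embedding.

    import pennylane as qml
    from pennylane import numpy as np

    n_qubits = 8
    dev = qml.device("default.qubit", wires=n_qubits)

    @qml.qnode(dev)
    def circuit(weights, features):
        qml.AngleEmbedding(features, wires=range(n_qubits))           # encode features
        qml.StronglyEntanglingLayers(weights, wires=range(n_qubits))  # entangle
        return qml.expval(qml.PauliZ(0))   # read out a single score in [-1, 1]

    shape = qml.StronglyEntanglingLayers.shape(n_layers=3, n_wires=n_qubits)
    weights = np.random.random(size=shape, requires_grad=True)
    features = np.random.random(size=n_qubits)
    print(circuit(weights, features))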

10:15 AM

Quantifying the Impact of Mobile Distractions on College Students' Attention Performance

Sonipriya Paul, Georgia State University
Jessica L. Bolton, Georgia State University
Ashwin Ashok, Georgia State University

Kennesaw, Georgia

10:15 AM - 11:30 AM

This research identifies the importance of understanding how humans sustain and recover attention to their primary activity amid mobile device interruptions. In this paper, we present key findings and insights from a user-study-based measurement of the attention performance of graduate students. Ten participants (graduate students) independently completed attention tasks in an office environment with no background noise while receiving periodic messages (distractions), with continuous data recording on a wearable electroencephalography (EEG) headset. The attention tasks were designed in line with the standard GO/NO-GO test from the psychology domain. Our experimentation included three variations of the GO/NO-GO test across three scenarios: no distraction, forced to respond to the distraction, and forced NOT to respond to the distraction. We quantify attention performance through our proposed Attention Performance Score (APS) metric, which is computed from behavioral metrics (attention test accuracy and reaction time), EEG metrics (channel signal power levels, frontal alpha asymmetry), and recovery time metrics. Our quantitative results reveal that mobile distractions largely affect students' attention negatively, which correlates well with the participants' survey responses on the impact of the distractions on completion of their primary (attention) tasks.
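
The abstract does not give the exact APS formula, so the sketch below is a purely hypothetical illustration of combining normalized behavioral, EEG, and recovery components into one score; the weights and normalizations are invented.

    import numpy as np

    def attention_performance_score(accuracy, reaction_time_s,
                                    eeg_feature, recovery_time_s,
                                    weights=(0.4, 0.2, 0.2, 0.2)):
        # Hypothetical: map each component to [0, 1]; shorter times score higher.
        rt_score = 1.0 / (1.0 + reaction_time_s)
        recovery_score = 1.0 / (1.0 + recovery_time_s)
        parts = np.array([accuracy, rt_score, eeg_feature, recovery_score])
        return float(np.dot(weights, parts))

    print(attention_performance_score(0.92, 0.45, 0.60, 3.0))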

12:30 PM

AI-Driven Enterprise Architecture and Value Realization Framework

Hemant Soni, Independent Researcher

Kennesaw, Georgia

12:30 PM - 1:45 PM

For over 40 years, enterprise architecture has promised to align technology investment with organizational strategy, yet only a portion of that promise has been fulfilled. The field's main output has been extensive documentation and artifact repositories that describe the enterprise in detail but hardly ever influence its direction in any discernible way. This paper begins with that diagnosis, expressed most accurately by Tamm et al. and, in the public sector context, by Dang and Pekkola, and asks whether artificial intelligence can solve the underlying structural issue instead of merely speeding up the documentation process. We contend that it can, but only if the integration is theoretically grounded as opposed to feature-driven. Using necessity arguments rather than design preferences, we construct the AI-Driven Enterprise Architecture and Value Realization Framework (AI-EA-VRF). Based on Teece's Dynamic Capabilities theory, Henderson and Venkatraman's Strategic Alignment Model, and Weill and Ross's IT governance framework, we demonstrate that precisely four components are required (a Strategic Alignment Engine, an AI Decision Intelligence Module, an Architecture Governance Layer, and a Value Realization Dashboard) and that eliminating or collapsing any of them would violate at least one fundamental theoretical premise. A five-level Organizational Maturity Adoption Model, based on Cohen and Levinthal's absorptive capacity construct, outlines quantifiable entry requirements at each level. Seven falsifiable hypotheses define a specific empirical research agenda. Additionally, the paper offers a theoretical rebuttal that pinpoints the boundary conditions under which the competing explanation, that well-resourced manual governance can achieve equivalent outcomes, fails.

12:30 PM

Enabling Parameter-Efficient Multi-Tasking via Multi-Head Architecture for Edge Device Deployment

Redwanul Islam Arif, Kennesaw State University
Mohammod Akib Khan, Kennesaw State University
Syed Hasib Akhter Faruqui, Sam Houston State University
Sahidul Islam, Kennesaw State University

Kennesaw, Georgia

12:30 PM - 1:45 PM

Deploying Deep Neural Networks (DNNs) on edge devices requires balancing task performance with strict memory and computational constraints. Conventional approaches to multi-task learning often rely on replicating heavy backbones, which is infeasible for resource-constrained environments. In this work, we propose and rigorously analyze a parameter-efficient multi-branch architecture based on a shared, ImageNet-pretrained ResNet-18 backbone. Instead of deploying independent models for distinct visual tasks, we consolidate feature extraction for multiple tasks into a single, unified framework. We first conduct an ablation study to determine the optimal depth of task-specific fully connected (FC) classification heads, maximizing learning capacity while minimizing computational overhead. Subsequently, we systematically investigate architectural branching points, diverging the network at different depths to identify where early feature sharing and late-stage task specialization best balance accuracy and parameter reuse. Our empirical results demonstrate that an optimized branching strategy significantly reduces the total parameter count compared to independent baseline models, while maintaining highly competitive task-specific accuracy.
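
A minimal sketch of the shared-backbone, multi-head pattern is shown below; the head widths, class counts, and single-linear heads are illustrative assumptions rather than the configuration found optimal in the paper.

    import torch
    import torch.nn as nn
    from torchvision.models import resnet18, ResNet18_Weights

    class MultiHeadResNet(nn.Module):
        def __init__(self, task_classes=(10, 100)):
            super().__init__()
            backbone = resnet18(weights=ResNet18_Weights.IMAGENET1K_V1)
            backbone.fc = nn.Identity()       # strip the ImageNet classifier
            self.backbone = backbone          # shared feature extractor
            self.heads = nn.ModuleList(
                nn.Linear(512, n) for n in task_classes  # one FC head per task
            )

        def forward(self, x):
            feats = self.backbone(x)          # (batch, 512) shared features
            return [head(feats) for head in self.heads]

    model = MultiHeadResNet()
    logits_a, logits_b = model(torch.randn(4, 3, 224, 224))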

12:30 PM

Optimization, Co-Design and NextGen of Agentic AI

Bhawesh Singh, ScienceLogic Inc

Kennesaw, Georgia

12:30 PM - 1:45 PM

As generative artificial intelligence and multi-agent systems transition from experimental research prototypes into the backbone of enterprise-critical infrastructure, the industry has reached a pivotal juncture where computing bottlenecks represent the primary barrier to progress. The historical reliance on "brute-force" scaling, simply increasing model parameter counts, has reached a point of diminishing returns, as the exponential escalation in training costs and energy consumption becomes economically and environmentally unsustainable. To address these challenges, this paper provides a comprehensive examination of six cutting-edge technological pillars currently redefining neural architecture and optimization: Neural Architecture Search (NAS) for automated design, advanced efficient training techniques, energy-efficient neuromorphic computing, Quantum Machine Learning (QML), hardware-software co-design, and the latest methodologies in model compression and pruning. For each of these six pillars, the study presents verified, empirical results derived from peer-reviewed research, highlighting the specific techniques that are currently reshaping the landscape of enterprise AI deployment. Beyond identifying successes, the paper meticulously documents the technical and physical constraints that currently limit these technologies. Finally, we integrate the recently proposed AgentOS framework (arXiv:2602.20934, Feb. 24, 2026) as a unifying, OS-level abstraction. We demonstrate how AgentOS serves as the essential connective tissue between these six pillars, providing a standardized architectural layer that manages hardware resources and algorithmic efficiency to support the next generation of scalable, autonomous agentic systems.

12:30 PM

QCausalMed: Hybrid Quantum-AI Approaches for Optimizing Causal Inference in Personalized Oncology

Nikunj Doshi, Northeastern University
Kavach Shah, Boston University
Shikhar Patel, Northeastern University

Kennesaw, Georgia

12:30 PM - 1:45 PM

Estimating individualized treatment effects from observational clinical data is a central challenge, and personalized oncology depends on this estimation for meaningful patient-level decisions. Standard variable selection methods tend to over-adjust for spurious covariates in practice and under-adjust for true confounders when the causal graph is high-dimensional; this failure arises because optimal adjustment set identification is NP-hard in the general case. This paper proposes QCausalMed, a hybrid quantum-classical pipeline that addresses this problem. The system formulates confounder selection as a Quadratic Unconstrained Binary Optimization (QUBO) problem, which is solved using the Quantum Approximate Optimization Algorithm (QAOA). The resulting quantum-selected minimal adjustment set then feeds into a TARNet neural outcome model, which estimates individualized treatment effects from the causally validated covariate subset. We evaluate the full pipeline on TCGA Breast Invasive Carcinoma data covering 876 patients, with 14 clinical and genomic covariates across 2 treatment arms. QCausalMed achieves a normalized root mean square error of 0.38 on the PEHE metric, a 19.1% improvement over LASSO-adjusted TARNet and a 26.9% improvement over CausalForest under identical evaluation conditions. The QAOA circuit uses 5 qubits and is simulated via PennyLane throughout all experiments. The quantum-selected adjustment set contains only 6 variables versus 11 for LASSO selection, lowering over-adjustment bias while preserving full backdoor criterion validity. These results indicate that quantum combinatorial optimization can meaningfully improve causal identification, outperforming classical variable selection methods on this clinical genomics task.
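
To make the QUBO-to-QAOA step concrete, the sketch below maps a toy symmetric QUBO onto an Ising cost Hamiltonian (via x_i = (1 - Z_i)/2, dropping the constant term) and evaluates a two-layer QAOA circuit in PennyLane; the Q matrix, circuit depth, and parameters are illustrative only.

    import numpy as np
    import pennylane as qml
    from pennylane import qaoa

    n = 5  # one qubit per candidate covariate
    rng = np.random.default_rng(0)
    Q = rng.uniform(-1, 1, (n, n)); Q = (Q + Q.T) / 2   # toy symmetric QUBO

    coeffs, ops = [], []
    for i in range(n):
        coeffs.append(-Q[i].sum() / 2); ops.append(qml.PauliZ(i))
        for j in range(i + 1, n):
            coeffs.append(Q[i, j] / 2); ops.append(qml.PauliZ(i) @ qml.PauliZ(j))
    cost_h = qml.Hamiltonian(coeffs, ops)
    mixer_h = qaoa.x_mixer(range(n))

    dev = qml.device("default.qubit", wires=n)

    @qml.qnode(dev)
    def cost(params):
        for w in range(n):
            qml.Hadamard(wires=w)            # uniform superposition start
        for gamma, alpha in params:
            qaoa.cost_layer(gamma, cost_h)
            qaoa.mixer_layer(alpha, mixer_h)
        return qml.expval(cost_h)

    print(cost(np.array([[0.5, 0.5], [0.3, 0.3]])))  # minimized over params in practice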

1:45 PM

Connected Automated Vehicles as Emergent Control Agents: Reinforcement Learning for Cooperative Traffic Control in Urban Networks

Ayomide Afolabi, Kennesaw State University
Duleep Rathgamage Don, Kennesaw State University
Mahyar Amirgholy, Kennesaw State University

Kennesaw, Georgia

1:45 PM - 3:00 PM

Traffic congestion continues to pose a significant challenge, particularly in urban areas characterized by large populations and increasing vehicle ownership. Despite the implementation of various traffic management systems to address this issue, there remains substantial potential to introduce more advanced, intelligent, and data-driven traffic management systems that could further mitigate congestion when integrated with existing measures. In this paper, we propose a reinforcement learning-based system (CAVRLS) that uses connected automated vehicles (CAVs) in mixed traffic environments to enable real-time centralized cooperative traffic control, with the aim of reducing congestion in a congested inner network encircled by an uncongested outer network. To demonstrate the effectiveness of CAVRLS integrated with Adaptive Traffic Signal Control Systems (ATCS), we simulate a traffic scenario with exactly this structure. We then use the REINFORCE algorithm to control the following distance (spacing) of CAVs within the congested inner network based on the traffic state of the general network. Our experimental results show that CAVRLS integrated with ATCS reduces traffic density and increases vehicular speed and traffic flow during the centralized cooperative control implementation in the Boston inner traffic network, compared to using ATCS alone.
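
The sketch below shows the core REINFORCE update for a policy that chooses among discrete CAV spacing levels; the state and action dimensions, network shape, and reward handling are illustrative assumptions.

    import torch
    import torch.nn as nn

    policy = nn.Sequential(nn.Linear(4, 64), nn.Tanh(), nn.Linear(64, 3))
    optimizer = torch.optim.Adam(policy.parameters(), lr=1e-3)

    def reinforce_update(states, actions, returns):
        """states: (T, 4) traffic states; actions: (T,) spacing choices;
        returns: (T,) discounted returns collected from one episode."""
        log_probs = torch.distributions.Categorical(
            logits=policy(states)).log_prob(actions)
        loss = -(log_probs * returns).mean()   # policy gradient objective
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        return loss.item()

    s = torch.randn(20, 4); a = torch.randint(0, 3, (20,)); g = torch.randn(20)
    print(reinforce_update(s, a, g))           # toy rollout data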

1:45 PM

Distance-Based Clustering for Identifying Patterns in Lifetime Smoking Trajectories

Maryam Eghbalizarch, Kennesaw State University
Niloufar Eghbali, Michigan State University

Kennesaw, Georgia

1:45 PM - 3:00 PM

Cigarette smoking remains a major public health issue in the United States, causing more than 480,000 deaths annually and leading to significant healthcare and economic costs. Understanding smoking behavior over the life course is important for developing effective public health interventions. In this study, the Cancer Intervention and Surveillance Modeling Network (CISNET) Smoking History Generator (SHG), which simulates individual life histories of smoking and mortality in the United States, was used to generate the smoking trajectories. A total of 200,000 individuals (100,000 males and 100,000 females) from the 1960 birth cohort were simulated. An unsupervised distance-based clustering (k-means) algorithm was developed to learn patterns in the smoking trajectories and to group individuals with similar smoking behaviors. The optimal number of clusters was determined using the elbow method and silhouette analysis. Cluster validity was evaluated using the silhouette score, Davies-Bouldin index (DBI), and Calinski-Harabasz index (CHI), yielding values of 0.38, 0.95, and 41,680.45, respectively, indicating acceptable cluster separation and compactness. The results revealed several distinct smoking trajectory patterns that resemble those observed in real populations, including light, moderate, heavy, early quitter, and long-term smoking behaviors. Identifying these trajectory groups provides valuable insights into how smoking behavior evolves over the life course. Such information can help public health programs better target prevention and smoking cessation strategies.
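
A minimal sketch of the clustering-and-validation step follows, using scikit-learn and the three indices reported above; the synthetic trajectories stand in for the SHG output, and the cluster count is illustrative.

    import numpy as np
    from sklearn.cluster import KMeans
    from sklearn.metrics import (silhouette_score, davies_bouldin_score,
                                 calinski_harabasz_score)

    rng = np.random.default_rng(0)
    # Toy stand-in: 1,000 individuals x 60 annual cigarettes-per-day values.
    X = np.vstack([rng.normal(mu, 2.0, (250, 60))
                   for mu in (0.0, 5.0, 15.0, 30.0)])

    km = KMeans(n_clusters=4, n_init=10, random_state=0).fit(X)
    print("silhouette:", silhouette_score(X, km.labels_))
    print("DBI:", davies_bouldin_score(X, km.labels_))
    print("CHI:", calinski_harabasz_score(X, km.labels_))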

1:45 PM

Voice-Controlled Unmanned Aerial System Using a Fine-Tuned Small Language Model for Real-Time Command Parsing

Owais Ahmed, Kennesaw State University
Joseph Stanziano, Kennesaw State University
Ram Sudharsanan, Kennesaw State University
Raj Kondragunta, Kennesaw Mountain High School STEM Magnet Program
Adeel Khalid, Kennesaw State University

Kennesaw, Georgia

1:45 PM - 3:00 PM

This paper presents the concept and architectural design of a voice-controlled Unmanned Aerial System (UAS) that leverages a fine-tuned Small Language Model (SLM) to convert natural language voice commands into structured MAVLink flight instructions in real time. A custom quadcopter platform has been designed and assembled with a Pixhawk flight controller, and a web-based interface has been developed to integrate browser-based speech recognition with on-device SLM inference. We describe a systematic evaluation methodology for five candidate SLMs spanning encoder-decoder and decoder-only architectures, a custom dataset of 5,450 labeled drone command samples covering 11 operational command types and an unknown rejection class, and a QLoRA-based fine-tuning pipeline targeting the best-performing candidate. A dual-layer rejection architecture is proposed to ensure that non-command inputs are reliably filtered. A key advantage of SLMs is their compact footprint: the models evaluated in this work are small enough to run inference on a standard CPU, although a consumer GPU can optionally be used to accelerate processing. The complete system is designed to operate entirely on-device without cloud connectivity, aiming to demonstrate the feasibility of deploying fine-tuned SLMs for safety-critical voice interfaces on edge hardware.
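
The sketch below illustrates one plausible second-layer rejection gate between the SLM's parsed output and command dispatch; the command schema, intent set, and confidence threshold are hypothetical, and a real system would construct an actual MAVLink message (e.g., via pymavlink) at the final step.

    from dataclasses import dataclass

    KNOWN_COMMANDS = {"takeoff", "land", "hover", "move", "rotate"}  # assumed subset

    @dataclass
    class DroneCommand:
        intent: str
        params: dict
        confidence: float

    def gate_command(cmd: DroneCommand, threshold: float = 0.85):
        """Reject unknown intents and low-confidence parses before dispatch."""
        if cmd.intent not in KNOWN_COMMANDS or cmd.confidence < threshold:
            return None  # rejected: no flight instruction is issued
        return {"command": cmd.intent.upper(), **cmd.params}

    print(gate_command(DroneCommand("takeoff", {"alt_m": 10}, 0.97)))
    print(gate_command(DroneCommand("open_pod_bay", {}, 0.99)))  # rejected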

1:45 PM

XAgentMaint: Explainable Agentic Systems for Predictive Maintenance in Industrial Digital Twin Environments

Shikhar Patel, Northeastern University
Sankeerth Adisha, Northeastern University

Kennesaw, Georgia

1:45 PM - 3:00 PM

Predictive maintenance in manufacturing demands more than accurate failure forecasting: maintenance engineers require transparent and auditable decision rationale to act confidently. Industrial digital twin environments produce rich, continuous streams of sensor data, yet existing AI models typically surface only numerical anomaly scores without contextual explanation. This paper presents XAgentMaint, an explainable agentic AI framework for predictive maintenance. The system integrates a Retrieval-Augmented Generation (RAG) knowledge base with a ReAct reasoning agent, with an XGBoost failure classifier serving as the core prediction component throughout the pipeline. All three components are evaluated on the NASA CMAPSS turbofan degradation dataset across all four sub-datasets (FD001-FD004), spanning single-fault and multi-fault scenarios. The RAG layer indexes 1,800 maintenance records and six failure mode documents for retrieval, and the retrieved evidence translates classifier predictions into natural language explanations for engineers. A logged decision trail accompanies each explanation to support regulatory auditability. XAgentMaint achieves a Root Mean Square Error of 14.8 Remaining Useful Life cycles on FD001 and demonstrates consistent performance under multi-fault conditions. A systematic ablation study isolates the contribution of each system component: the XGBoost classifier, the RAG retrieval layer, and the ReAct reasoning agent. Comparative analysis against SHAP-based explanation pipelines demonstrates that XAgentMaint produces explanations rated significantly more actionable by domain practitioners, with explanation fidelity reaching 83.7% as rated by three independent domain expert evaluators. A 22-participant user study measures the practical impact of explanations on engineer performance: the XAgentMaint group completes diagnostic tasks 28% faster than the numerical-score-only group, a statistically significant difference (p < 0.01) across all tested scenarios. A discussion of locally deployable language models and real industrial maintenance log integration addresses key deployment considerations for production environments.
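
As a sketch of the prediction core, the snippet below trains an XGBoost regressor to predict Remaining Useful Life from windowed sensor features and reports RMSE in cycles; the synthetic features and hyperparameters are illustrative, not the paper's configuration.

    import numpy as np
    from xgboost import XGBRegressor
    from sklearn.model_selection import train_test_split
    from sklearn.metrics import mean_squared_error

    rng = np.random.default_rng(0)
    X = rng.standard_normal((2000, 24))    # stand-in windowed sensor features
    rul = np.maximum(0, 130 - 20 * X[:, 0] + rng.normal(0, 5, 2000))  # toy RUL

    X_tr, X_te, y_tr, y_te = train_test_split(X, rul, test_size=0.2,
                                              random_state=0)
    model = XGBRegressor(n_estimators=300, max_depth=6, learning_rate=0.05)
    model.fit(X_tr, y_tr)
    rmse = mean_squared_error(y_te, model.predict(X_te)) ** 0.5
    print(f"RMSE: {rmse:.1f} cycles")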

3:00 PM

Adaptive Task Relevance Modeling in Sequential Neural Training

Priyanka Thakur, University of the Cumberlands

Kennesaw, Georgia

3:00 PM - 4:15 PM

Sequential neural training frameworks typically emphasize the preservation of performance across previously encountered tasks as the primary indicator of learning stability. However, in environments characterized by evolving objectives and task distributions, intelligent systems may benefit from dynamically adapting internal representations in accordance with changing task relevance. This paper explores the role of adaptive task relevance modeling in continual neural learning, where sequentially trained models are permitted to reconfigure behavioral policies based on shifts in task importance over time. We evaluate the impact of relevance-aware adaptation through sequential training experiments conducted on multiple classification datasets under varying adaptation constraints. Results indicate that models incorporating adaptive task relevance mechanisms achieve improved performance on later-stage tasks despite observable reductions in earlier task accuracy.
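
The paper's exact relevance mechanism is not specified in the abstract; the sketch below is a hypothetical illustration of the general idea, scaling per-task losses by time-varying relevance weights during sequential training.

    import torch
    import torch.nn as nn

    model = nn.Linear(16, 4)               # toy shared model
    optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
    criterion = nn.CrossEntropyLoss()

    def train_step(batches, relevance):
        """batches: {task_id: (x, y)}; relevance: {task_id: weight in [0, 1]}."""
        optimizer.zero_grad()
        total = sum(relevance[t] * criterion(model(x), y)
                    for t, (x, y) in batches.items())
        total.backward()
        optimizer.step()
        return total.item()

    x = torch.randn(8, 16); y = torch.randint(0, 4, (8,))
    # As objectives evolve, higher weights re-prioritize newer tasks.
    print(train_step({"old": (x, y), "new": (x, y)}, {"old": 0.3, "new": 1.0}))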

3:00 PM

InvisGrid: Invisible AI Embeddings in Human-Machine Collaboration for Smart Grid Energy Optimization

Nikunj Doshi, Northeastern University
Kavach Shah, Boston University
Shikhar Patel, Northeastern University

Kennesaw, Georgia

3:00 PM - 4:15 PM

AI advisory tools for smart grid management face a persistent and widely recognized problem: operators frequently ignore recommendations surfaced through separate AI dashboard panels. This rejection arises from alert fatigue induced by disrupted SCADA workflows and heightened cognitive load. Operators managing complex real-time systems already carry a substantial mental burden throughout their shifts, and adding a new panel to monitor compounds that burden rather than alleviating it. This paper proposes invisible AI, a paradigm that resolves this adoption problem differently: instead of introducing new interface components, AI intelligence is embedded into existing visual elements. We present InvisGrid, which fine-tunes a compact language model on twelve months of logs from GridLAB-D smart grid simulations spanning a complete operational year. The model learns to predict demand spikes, renewable curtailment windows, and voltage excursions. Predictions are surfaced only as ambient visual cues within existing SCADA interface elements, including color gradients, micro-animations, and enhanced tooltips on hover; no new UI components are introduced to the operator's dashboard at any point during use. We evaluate InvisGrid in a 100-node, 60 MW GridLAB-D distribution network simulation environment, collecting thirty-two simulated operator interactions in a between-subjects study design. InvisGrid reduces grid imbalance events by 38 percent versus the baseline condition without AI and by 15 percent versus an explicit AI panel condition. Operators complete tasks 24 percent faster with InvisGrid than with the explicit panel, and a NASA Task Load Index evaluation confirms a 44 percent cognitive load reduction versus the panel. These findings establish invisible AI embedding as a high-adoption paradigm for energy systems.
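
As a hypothetical illustration of the ambient-cue idea, the sketch below maps a predicted risk score onto a color gradient that an existing SCADA widget could adopt in place of a new alert panel; the colors and mapping are invented.

    def risk_to_rgb(risk: float) -> tuple:
        """Blend an element's color from neutral gray toward amber-red as the
        model's predicted risk rises; endpoints are illustrative choices."""
        risk = max(0.0, min(1.0, risk))
        neutral, alert = (128, 128, 128), (220, 60, 40)
        return tuple(round(n + risk * (a - n)) for n, a in zip(neutral, alert))

    for r in (0.0, 0.5, 0.9):
        print(r, risk_to_rgb(r))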

3:00 PM

Multimodal Speech-Based Dementia Detection Using wav2vec2 and Demographic Features

Carlos Blanco, Kennesaw State University
Prajwal Shetty, Kennesaw State University
Tyler Hood, Kennesaw State University

Kennesaw, Georgia

3:00 PM - 4:15 PM

Early detection of dementia is essential for timely clinical intervention, yet existing diagnostic procedures remain costly and resource-intensive. Speech-based analysis has emerged as a promising non-invasive alternative due to its sensitivity to cognitive decline. In this work, we propose a multimodal framework that integrates self-supervised speech representations from wav2vec2 with demographic metadata, including age, gender, and ethnicity. We evaluate our approach against a strong audio-only baseline under a controlled experimental setup. Results demonstrate a +12.5% improvement in validation accuracy and consistent gains in macro F1-score. These findings indicate that demographic information provides complementary predictive signals beyond acoustic features alone, highlighting the effectiveness of multimodal approaches for scalable dementia screening.
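
A minimal sketch of the fusion pattern follows: mean-pooled wav2vec2 embeddings are concatenated with encoded demographic features before a linear classifier. The checkpoint, pooling, and demographic encoding are illustrative assumptions.

    import torch
    import torch.nn as nn
    from transformers import Wav2Vec2Model, Wav2Vec2FeatureExtractor

    name = "facebook/wav2vec2-base-960h"   # assumed checkpoint
    extractor = Wav2Vec2FeatureExtractor.from_pretrained(name)
    encoder = Wav2Vec2Model.from_pretrained(name)
    head = nn.Linear(768 + 3, 2)  # 768-d speech embedding + age/gender/ethnicity

    waveform = torch.randn(16000)          # 1 s of 16 kHz audio (toy)
    inputs = extractor(waveform.numpy(), sampling_rate=16000, return_tensors="pt")
    with torch.no_grad():
        hidden = encoder(**inputs).last_hidden_state    # (1, T, 768)
    speech_emb = hidden.mean(dim=1)                     # mean-pool over time
    demographics = torch.tensor([[0.65, 1.0, 0.0]])     # normalized/encoded (toy)
    print(head(torch.cat([speech_emb, demographics], dim=1)))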

3:00 PM

Voice as a Bio-Marker of Surgical Recovery: Classifying Post-Operative ENT Patients Using Vowel Acoustics

Habeeb Kotun Jr., Meharry Medical College
Shumit Saha, Kennesaw State University

Kennesaw, Georgia

3:00 PM - 4:15 PM

Voice has garnered significant interest as a biomarker for diagnosing and managing neurological, respiratory, and cardiovascular diseases. However, upper airway surgeries such as septoplasty, tonsillectomy, and functional endoscopic sinus surgery (FESS) can substantially alter vocal tract anatomy, thereby changing vocal acoustics and potentially confounding voice-based diagnostic systems. This study investigates whether acoustic features from sustained vowels can differentiate between these procedures and a control group. Audio was collected from 105 Spanish-speaking participants 15 days pre surgery and 15 days post surgery. For each vowel, 121 chroma, cepstral, spectral, and temporal features were extracted and then reduced to 90 longitudinally responsive features using mixed repeated-measures ANOVA and post-hoc pairwise contrasts. Ten supervised classifiers (Random Forest, Logistic Regression, Decision Tree, Gaussian Naive Bayes, K-Nearest Neighbors, SVM with linear and RBF kernels, LightGBM, XGBoost, CatBoost) were evaluated for Control vs. Surgery at 15 days post-op, using stratified 5-fold cross-validation with F1 score as the primary metric. At 15 days post-surgery, classification of control versus surgical patients achieved F1 scores in the high-60% to low-70% range for the best models, with LightGBM, CatBoost, and KNN providing the strongest overall performance. Model interpretability using SHapley Additive exPlanations indicated that this early discrimination was driven by a compact set of vowel-based features, particularly MFCC statistics, higher-order delta/delta-squared coefficients, and chroma intensity for specific pitch classes. These results provide proof of concept that surgery-aware voice analytics can detect early post-operative status from sustained vowels and underscore the value of incorporating surgical history and recovery stage into future voice-based monitoring systems.