Location
https://www.kennesaw.edu/ccse/events/computing-showcase/fa25-cday-program.php
Document Type
Event
Start Date
24-11-2025 4:00 PM
Description
Graphics Processing Unit (GPU) resources in High-Performance Computing (HPC) systems are frequently underutilized because user-provided runtime estimates are inaccurate. This research develops a machine learning framework that predicts neural network training time from architectural features, dataset size, and other hyperparameters. Unlike prior methods, this approach requires neither hardware access nor runtime profiling, so it can be implemented on any HPC system. We sampled neural network architectures from the NATS-Bench benchmark and combined them with three benchmark datasets to generate 400 training configurations. Using these 400 data points to build regression models, we found that the best model, a Gradient Boosting Regressor, achieves an R² of 0.961 with prediction errors averaging less than one minute, demonstrating the feasibility of the proposed framework.
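The abstract does not describe the feature encoding or the model's hyperparameters, so the following is only a minimal sketch, in pure Python, of the technique it names: gradient boosting for squared-error regression, here with depth-1 trees (stumps) as weak learners, plus the two metrics the results are reported in, R² and mean absolute error. The feature tuple (parameter count, dataset size, epochs), the names `GradientBoostingStumps` and `fit_stump`, and the toy runtimes are hypothetical stand-ins, not the authors' features, code, or data. In practice a library implementation such as scikit-learn's `GradientBoostingRegressor` would normally be used instead.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class Stump:
    """Depth-1 regression tree: one threshold split on one feature."""
    feature: int
    threshold: float
    left: float   # value returned when x[feature] <= threshold
    right: float  # value returned when x[feature] > threshold

    def predict(self, x):
        return self.left if x[self.feature] <= self.threshold else self.right


def fit_stump(X, residuals):
    """Return the stump minimizing squared error against the residuals."""
    best, best_sse = None, float("inf")
    for f in range(len(X[0])):
        values = sorted({row[f] for row in X})
        # Candidate thresholds: midpoints between consecutive distinct values.
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2
            left = [r for row, r in zip(X, residuals) if row[f] <= t]
            right = [r for row, r in zip(X, residuals) if row[f] > t]
            lm, rm = mean(left), mean(right)
            sse = sum((r - lm) ** 2 for r in left) + sum((r - rm) ** 2 for r in right)
            if sse < best_sse:
                best, best_sse = Stump(f, t, lm, rm), sse
    if best is None:  # every feature is constant: fall back to a constant stump
        m = mean(residuals)
        best = Stump(0, float("inf"), m, m)
    return best


class GradientBoostingStumps:
    """Hypothetical gradient boosting for squared loss with stumps as weak learners."""

    def __init__(self, n_estimators=100, learning_rate=0.1):
        self.n_estimators = n_estimators
        self.learning_rate = learning_rate

    def fit(self, X, y):
        self.base = mean(y)  # initial model: the mean training runtime
        self.stumps = []
        pred = [self.base] * len(y)
        for _ in range(self.n_estimators):
            # For squared loss the negative gradient is simply the residual.
            residuals = [yi - pi for yi, pi in zip(y, pred)]
            stump = fit_stump(X, residuals)
            self.stumps.append(stump)
            pred = [p + self.learning_rate * stump.predict(x) for p, x in zip(pred, X)]
        return self

    def predict(self, X):
        return [self.base + self.learning_rate * sum(s.predict(x) for s in self.stumps)
                for x in X]


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_bar = mean(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - y_bar) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot


def mean_absolute_error(y_true, y_pred):
    """Average absolute prediction error, in the target's units (minutes here)."""
    return mean([abs(t - p) for t, p in zip(y_true, y_pred)])


if __name__ == "__main__":
    # Hypothetical features: (parameters in millions, dataset size in thousands, epochs).
    X = [(p, d, e) for p in (0.5, 1.0, 2.0) for d in (10, 50) for e in (10, 50, 100)]
    # Made-up runtimes in minutes, for illustration only.
    y = [2 * p + 0.1 * d + 0.05 * e for p, d, e in X]
    model = GradientBoostingStumps(n_estimators=300, learning_rate=0.1).fit(X, y)
    preds = model.predict(X)
    print(f"R^2 = {r2_score(y, preds):.3f}, MAE = {mean_absolute_error(y, preds):.3f} min")
```

For brevity the sketch scores the model on its own training data; a meaningful estimate of predictive accuracy would instead hold out some configurations, for example through a train/test split over the 400 data points.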
Included in
GRP-0214 User-Level GPU Right-Sizing in HPC: A Framework for Predicting Training Runtime