Location

https://www.kennesaw.edu/ccse/events/computing-showcase/fa25-cday-program.php

Document Type

Event

Start Date

24-11-2025 4:00 PM

Description

Graphics Processing Unit (GPU) resources in High-Performance Computing (HPC) systems are frequently underutilized because of inaccurate user-provided runtime estimates. This research develops a machine learning framework that predicts neural network training time from architectural features, dataset size, and other hyperparameters. Unlike earlier methods, the approach requires neither hardware access nor runtime profiling, so it can be deployed on any HPC system. We sampled neural network models from the NATS-Bench benchmark and used three benchmark datasets to generate 400 training configurations. We used these 400 data points to build regression models and found that the best model, a Gradient Boosting Regressor, achieves an R² of 0.961 with prediction errors averaging less than one minute, demonstrating the feasibility of the proposed framework.
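As a rough illustration of the idea (not the authors' implementation), the sketch below fits a small gradient-boosted ensemble of decision stumps to predict training time from configuration features. Everything in it is an assumption made for the example: the four feature names, the toy cost formula, the synthetic data, the choice to regress on log-seconds, and the pure-Python booster that stands in for a library Gradient Boosting Regressor. None of the numbers it prints correspond to the results reported above.

```python
import math
import random


def make_synthetic_configs(n, seed=0):
    """Generate made-up (features, log-runtime) pairs; NOT the study's data."""
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n):
        params_m = rng.uniform(0.1, 1.5)       # hypothetical: parameters, millions
        samples_k = rng.choice([50, 60, 100])  # hypothetical: dataset size, thousands
        epochs = rng.randint(1, 20)            # hypothetical: training epochs
        batch = rng.choice([64, 128, 256])     # hypothetical: batch size
        # Assumed toy cost model with multiplicative noise.
        seconds = 0.05 * params_m * samples_k * epochs * (128 / batch) ** 0.5
        seconds *= math.exp(rng.gauss(0, 0.05))
        X.append([params_m, samples_k, epochs, batch])
        y.append(math.log(seconds))
    return X, y


def fit_stump(X, residuals):
    """Find the one-feature threshold split that minimises squared error."""
    n, total, best = len(X), sum(residuals), None
    for j in range(len(X[0])):
        order = sorted(range(n), key=lambda i: X[i][j])
        left_sum = 0.0
        for k in range(1, n):
            left_sum += residuals[order[k - 1]]
            lo, hi = X[order[k - 1]][j], X[order[k]][j]
            if lo == hi:
                continue  # cannot split between identical feature values
            # Maximising this score is equivalent to minimising squared error.
            score = left_sum ** 2 / k + (total - left_sum) ** 2 / (n - k)
            if best is None or score > best[0]:
                best = (score, j, (lo + hi) / 2,
                        left_sum / k, (total - left_sum) / (n - k))
    return best[1:]  # (feature index, threshold, left value, right value)


def fit_boosting(X, y, n_rounds=300, learning_rate=0.2):
    """Least-squares gradient boosting: each stump fits the current residuals."""
    base = sum(y) / len(y)
    pred = [base] * len(y)
    stumps = []
    for _ in range(n_rounds):
        residuals = [yi - pi for yi, pi in zip(y, pred)]
        j, thr, left, right = fit_stump(X, residuals)
        stumps.append((j, thr, left, right))
        pred = [p + learning_rate * (left if x[j] <= thr else right)
                for p, x in zip(pred, X)]
    return base, learning_rate, stumps


def predict(model, x):
    """Predict log-seconds for one configuration."""
    base, lr, stumps = model
    return base + lr * sum(left if x[j] <= thr else right
                           for j, thr, left, right in stumps)


def r2_score(y_true, y_pred):
    """Coefficient of determination."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot


# 400 synthetic configurations, split 320 / 80 into train and held-out sets.
X, y = make_synthetic_configs(400)
X_train, y_train, X_test, y_test = X[:320], y[:320], X[320:], y[320:]
model = fit_boosting(X_train, y_train)
test_pred = [predict(model, x) for x in X_test]
r2 = r2_score(y_test, test_pred)
mae_seconds = sum(abs(math.exp(p) - math.exp(t))
                  for p, t in zip(test_pred, y_test)) / len(y_test)
print(f"held-out R^2 (log seconds): {r2:.3f}")
print(f"held-out mean absolute error: {mae_seconds:.1f} s")
```

In a scheduler workflow, a prediction like `math.exp(predict(model, features))` could inform the wall-time a user requests, typically with a safety margin added on top. How the framework described above turns predictions into resource requests is not stated in this record.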


GRP-0214 User-Level GPU Right-Sizing in HPC: A Framework for Predicting Training Runtime
