Location

https://www.kennesaw.edu/ccse/events/computing-showcase/fa25-cday-program.php

Document Type

Event

Start Date

24-11-2025 4:00 PM

Description

Graphics Processing Unit (GPU) resources in High-Performance Computing (HPC) systems are frequently underutilized because user-provided runtime estimates are inaccurate. This research develops a machine learning framework that predicts neural network training time from architectural features, dataset size, and other hyperparameters. Unlike preceding methods, the approach requires neither hardware access nor runtime profiling, so it can be implemented on any HPC system. We sampled neural network models from the NATS-Bench benchmark and combined them with three benchmark datasets to generate 400 training configurations. Using these 400 data points to build regression models, we found that the best model, a Gradient Boosting Regressor, achieves an R² of 0.961 with an average prediction error of less than one minute, demonstrating the feasibility of the proposed framework.
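The regression step described above can be illustrated with a minimal, self-contained sketch. It is not the authors' pipeline. The four features (parameter count in millions, dataset size in thousands of samples, epochs, batch size), the runtime formula, and the helpers `synthetic_config`, `fit_stump`, and `GradientBoostingStumps` are hypothetical. The 400 rows are synthetic rather than NATS-Bench measurements. A small pure-Python boosted ensemble of depth-1 trees stands in for a library model such as scikit-learn's `GradientBoostingRegressor`, and fitting on log-minutes is our own assumption.

```python
import math
import random


def fit_stump(X, orders, r):
    """Least-squares depth-1 tree (stump) fitted to residuals r.

    Minimizing a split's squared error is equivalent to maximizing
    sum_left**2 / n_left + sum_right**2 / n_right, scanned with prefix sums.
    """
    n, total = len(r), sum(r)
    best = (-math.inf, 0, math.inf, total / n, total / n)  # fallback: constant leaf
    for j, order in enumerate(orders):
        left = 0.0
        for k in range(n - 1):
            left += r[order[k]]
            a, b = X[order[k]][j], X[order[k + 1]][j]
            if a == b:
                continue  # no threshold separates equal feature values
            nl, nr = k + 1, n - k - 1
            gain = left * left / nl + (total - left) ** 2 / nr
            if gain > best[0]:
                best = (gain, j, (a + b) / 2, left / nl, (total - left) / nr)
    return best[1:]  # (feature index, threshold, left value, right value)


def stump_predict(stump, x):
    j, thr, lval, rval = stump
    return lval if x[j] <= thr else rval


class GradientBoostingStumps:
    """Gradient boosting under squared loss: each stump fits the current residuals."""

    def __init__(self, n_estimators=200, learning_rate=0.1):
        self.n_estimators = n_estimators
        self.learning_rate = learning_rate

    def fit(self, X, y):
        self.base = sum(y) / len(y)
        # Sort order per feature never changes, so compute it once.
        orders = [sorted(range(len(X)), key=lambda i, j=j: X[i][j]) for j in range(len(X[0]))]
        pred = [self.base] * len(y)
        self.stumps = []
        for _ in range(self.n_estimators):
            residuals = [yi - pi for yi, pi in zip(y, pred)]  # negative gradient of 0.5*(y - f)**2
            stump = fit_stump(X, orders, residuals)
            self.stumps.append(stump)
            pred = [p + self.learning_rate * stump_predict(stump, x) for p, x in zip(pred, X)]
        return self

    def predict(self, X):
        return [self.base + self.learning_rate * sum(stump_predict(s, x) for s in self.stumps)
                for x in X]


def r2_score(y_true, y_pred):
    mu = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mu) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def synthetic_config(rng):
    """HYPOTHETICAL features: (params in millions, dataset size in thousands, epochs, batch size)."""
    params = rng.uniform(0.1, 2.0)
    dataset_k = rng.choice([10, 50, 150])
    epochs = rng.randint(1, 20)
    batch = rng.choice([64, 128, 256])
    # Made-up runtime law in minutes with multiplicative noise; NOT measured data.
    minutes = (0.01 * params * dataset_k * epochs * (128 / batch) ** 0.3
               * math.exp(rng.gauss(0, 0.05)))
    return [params, dataset_k, epochs, batch], minutes


rng = random.Random(0)
data = [synthetic_config(rng) for _ in range(400)]
train, test = data[:300], data[300:]
X_train = [x for x, _ in train]
# Assumption: fit on log-minutes so the multiplicative runtime law becomes additive.
model = GradientBoostingStumps().fit(X_train, [math.log(m) for _, m in train])
pred_minutes = [math.exp(v) for v in model.predict([x for x, _ in test])]
true_minutes = [m for _, m in test]
print(f"held-out R^2: {r2_score(true_minutes, pred_minutes):.3f}")
```

Each stump is fitted to the residuals of the ensemble built so far, which makes this gradient boosting under squared loss. The printed R² reflects only the synthetic data and says nothing about the reported value of 0.961.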

Included in

Computer Sciences Commons

GRP-0214 User-Level GPU Right-Sizing in HPC: A Framework for Predicting Training Runtime

C-Day Fall 2025 Doctoral Research
