Presenter Information

MOHAMMOD AKIB KHAN

Location

https://www.kennesaw.edu/ccse/events/computing-showcase/fa25-cday-program.php

Document Type

Event

Start Date

24-11-2025 4:00 PM

Description

Modern applications rely on AI models that must perform real-time predictions on resource-constrained edge devices such as laptops. The default OS scheduler often increases context switching, which slows down deep learning and LLM inference. Since these models depend heavily on parallel processing, efficient CPU scheduling becomes essential. In this project, we analyze how core pinning and thread-level parallelism improve inference performance on a Windows system. Using multiple micro-batch sizes, we compare latency, throughput, and per-sample inference time. The goal is to show how simple OS-level optimizations can significantly improve real-time performance for both deep learning models and LLMs.
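As a concrete illustration of the two knobs the abstract names, the sketch below pins the process to a fixed core set (via psutil, which supports Windows) and sizes ONNX Runtime's intra-op thread pool to match, then times inference across several micro-batch sizes. This is a minimal sketch, not the project's actual benchmark: the model path, input shape, core list, and batch sizes are placeholder assumptions.

```python
# Minimal sketch of OS-level tuning for ONNX Runtime CPU inference.
# Placeholder assumptions: "model.onnx", cores [0-3], a 3x224x224 input.
import time

import numpy as np
import onnxruntime as ort
import psutil

# Core pinning: restrict the process to a fixed set of cores so the
# Windows scheduler stops migrating inference threads between cores.
psutil.Process().cpu_affinity([0, 1, 2, 3])  # hypothetical core set

# Thread-level parallelism: match ONNX Runtime's intra-op thread pool
# to the number of pinned cores.
opts = ort.SessionOptions()
opts.intra_op_num_threads = 4
opts.execution_mode = ort.ExecutionMode.ORT_SEQUENTIAL

session = ort.InferenceSession(
    "model.onnx", sess_options=opts, providers=["CPUExecutionProvider"]
)
input_name = session.get_inputs()[0].name

# Compare latency, throughput, and per-sample time across micro-batch sizes.
for batch in (1, 4, 8, 16):  # hypothetical micro-batch sizes
    x = np.random.rand(batch, 3, 224, 224).astype(np.float32)
    start = time.perf_counter()
    session.run(None, {input_name: x})
    elapsed = time.perf_counter() - start
    print(
        f"batch={batch:3d}  latency={elapsed * 1e3:7.2f} ms  "
        f"throughput={batch / elapsed:7.1f} samples/s  "
        f"per-sample={elapsed / batch * 1e3:6.2f} ms"
    )
```

In practice, cpu_affinity and intra_op_num_threads would be varied together to find the configuration that minimizes context switching for a given model.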

Nov 24th, 4:00 PM

GRP-1265 Optimizing CPU Scheduling for Deep Learning and LLM Inference Using ONNX Runtime