Presenter Information

MOHAMMOD AKIB KHAN

Location

https://www.kennesaw.edu/ccse/events/computing-showcase/fa25-cday-program.php

Document Type

Event

Start Date

24-11-2025 4:00 PM

Description

Modern applications rely on AI models that must perform real-time predictions on resource-constrained edge devices such as laptops. The default OS scheduler often increases context switching, which slows down deep learning and LLM inference. Since these models depend heavily on parallel processing, efficient CPU scheduling becomes essential. In this project, we analyze how core pinning and thread-level parallelism improve inference performance on a Windows system. Using multiple micro-batch sizes, we compare latency, throughput, and per-sample inference time. The goal is to show how simple OS-level optimizations can significantly improve real-time performance for both deep learning models and LLMs.
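As a concrete illustration of the two knobs the abstract names, the sketch below pins the process to a fixed core set (via psutil, which supports Windows) and sizes ONNX Runtime's intra-op thread pool to match, then times inference across several micro-batch sizes. This is a minimal sketch, not the project's actual benchmark: the model path, input shape, core list, and batch sizes are placeholder assumptions.

```python
# Minimal sketch of OS-level tuning for ONNX Runtime CPU inference.
# Placeholder assumptions: "model.onnx", cores [0-3], a 3x224x224 input.
import time

import numpy as np
import onnxruntime as ort
import psutil

# Core pinning: restrict the process to a fixed set of cores so the
# Windows scheduler stops migrating inference threads between cores.
psutil.Process().cpu_affinity([0, 1, 2, 3])  # hypothetical core set

# Thread-level parallelism: match ONNX Runtime's intra-op thread pool
# to the number of pinned cores.
opts = ort.SessionOptions()
opts.intra_op_num_threads = 4
opts.execution_mode = ort.ExecutionMode.ORT_SEQUENTIAL

session = ort.InferenceSession(
    "model.onnx", sess_options=opts, providers=["CPUExecutionProvider"]
)
input_name = session.get_inputs()[0].name

# Compare latency, throughput, and per-sample time across micro-batch sizes.
for batch in (1, 4, 8, 16):  # hypothetical micro-batch sizes
    x = np.random.rand(batch, 3, 224, 224).astype(np.float32)
    start = time.perf_counter()
    session.run(None, {input_name: x})
    elapsed = time.perf_counter() - start
    print(
        f"batch={batch:3d}  latency={elapsed * 1e3:7.2f} ms  "
        f"throughput={batch / elapsed:7.1f} samples/s  "
        f"per-sample={elapsed / batch * 1e3:6.2f} ms"
    )
```

In practice, cpu_affinity and intra_op_num_threads would be varied together to find the configuration that minimizes context switching for a given model.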

Nov 24th, 4:00 PM

GRP-1265 Optimizing CPU Scheduling for Deep Learning and LLM Inference Using ONNX Runtime