Location
https://www.kennesaw.edu/ccse/events/computing-showcase/fa25-cday-program.php
Document Type
Event
Start Date
24-11-2025 4:00 PM
Description
Modern applications rely on AI models that must make real-time predictions on resource-constrained edge devices such as laptops. The default OS scheduler often increases context switching, which slows down deep learning and LLM inference. Because these models depend heavily on parallel processing, efficient CPU scheduling is essential. In this project, we analyze how core pinning and thread-level parallelism improve inference performance on a Windows system. Using multiple micro-batch sizes, we compare latency, throughput, and per-sample inference time. The goal is to show how simple OS-level optimizations can significantly improve real-time performance for both deep learning models and LLMs.
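The abstract does not publish the project's code, but the core-pinning idea it describes can be sketched as follows. This is a minimal illustration, not the team's implementation: the helper name `pin_to_cores` is hypothetical, and the Linux `os.sched_setaffinity` call stands in for the Windows-specific mechanism (on Windows one would typically use `psutil.Process().cpu_affinity(...)` or the Win32 `SetProcessAffinityMask` API instead). ONNX Runtime's own thread-count knob (`SessionOptions.intra_op_num_threads`) is noted in a comment rather than invoked, so the sketch stays self-contained.

```python
import os

def pin_to_cores(core_ids):
    """Pin the current process to the given CPU cores (illustrative sketch).

    On Linux/WSL this uses os.sched_setaffinity; on Windows the same effect
    is achieved with psutil's Process.cpu_affinity or SetProcessAffinityMask
    (not shown here). Pinning the inference process to a fixed core set
    reduces the context switching mentioned in the abstract.
    """
    if hasattr(os, "sched_setaffinity"):
        os.sched_setaffinity(0, set(core_ids))       # 0 = current process
        return sorted(os.sched_getaffinity(0))       # report effective set
    # Fallback for platforms without the call: report the request unchanged.
    return sorted(core_ids)

# Example: restrict the process to cores 0 and 1, then pair this with
# ONNX Runtime thread settings, e.g.:
#   so = onnxruntime.SessionOptions()
#   so.intra_op_num_threads = 2   # match the number of pinned cores
pinned = pin_to_cores([0, 1])
print("Process pinned to cores:", pinned)
```

Matching the session's intra-op thread count to the number of pinned cores is the usual pairing: it keeps ONNX Runtime's worker threads from being scheduled onto cores the process can no longer run on.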
Included in
GRP-1265 Optimizing CPU Scheduling for Deep Learning and LLM Inference Using ONNX Runtime
https://www.kennesaw.edu/ccse/events/computing-showcase/fa25-cday-program.php