Location
https://www.kennesaw.edu/ccse/events/computing-showcase/sp25-cday-program.php
Document Type
Event
Start Date
15-4-2025 4:00 PM
Description
Recent advances in large language models have significantly increased their ability to write code. While tools such as ChatGPT are useful and improve efficiency for many programmers, they pose a major problem when used in academically dishonest ways. To identify code written by language models, we offer a novel, lightweight classification solution based on a transformer architecture. We compare the performance of three transformer models (GraphCodeBERT, PLBART, and CodeBERT) for tokenization and processing, and then perform classification using a random forest classifier. Preliminary results indicate that the GraphCodeBERT-based model achieves 100% train and test accuracy at distinguishing human-written from AI-generated code, and that the PLBART-based model achieves a 100% train and 95% test F1-score on source categories such as chatbot, model, IDE extension, or human.
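For readers unfamiliar with how an F1-score is computed over several categories, the following is a minimal sketch in plain Python. The four category names come from the description above. Everything else is an assumption made for illustration: the description does not state which averaging scheme was used (macro averaging is assumed here), the helper names `per_class_f1` and `macro_f1` are hypothetical, and the sample labels are invented and are not results from this work.

```python
# Category names taken from the description; all data below is made up.
CATEGORIES = ["chatbot", "model", "IDE extension", "human"]


def per_class_f1(y_true, y_pred, label):
    """F1 for one category: 2*TP / (2*TP + FP + FN), or 0.0 if undefined."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == label and p == label)
    fp = sum(1 for t, p in pairs if t != label and p == label)
    fn = sum(1 for t, p in pairs if t == label and p != label)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def macro_f1(y_true, y_pred, labels=CATEGORIES):
    """Unweighted mean of the per-category F1 scores (assumed averaging)."""
    return sum(per_class_f1(y_true, y_pred, lab) for lab in labels) / len(labels)


# Invented example: eight code samples, one chatbot sample misread as "model".
y_true = ["chatbot", "chatbot", "model", "model",
          "IDE extension", "IDE extension", "human", "human"]
y_pred = ["chatbot", "model", "model", "model",
          "IDE extension", "IDE extension", "human", "human"]

for lab in CATEGORIES:
    print(f"{lab}: {per_class_f1(y_true, y_pred, lab):.3f}")
print(f"macro F1: {macro_f1(y_true, y_pred):.3f}")
```

Macro averaging weights every category equally, so a weak score on a small category pulls the total down as much as a weak score on a large one. A weighted or micro average would behave differently when the categories are imbalanced.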
GRM-012 (TCC) Transformer Embedded Synthetic Source Code Multiclass Classification