DigitalCommons@Kennesaw State University - C-Day Computing Showcase: GRM-012 (TCC) Transformer Embedded Synthetic Source Code Multiclass Classification

 

Presenter Information

Rene Lisasi
Patrick Wu

Location

https://www.kennesaw.edu/ccse/events/computing-showcase/sp25-cday-program.php

Document Type

Event

Start Date

15-4-2025 4:00 PM

Description

Recent advances in large language models have significantly increased their ability to write code. While tools such as ChatGPT are useful and boost efficiency for many programmers, they pose a serious problem when used in academically dishonest ways. To address the problem of identifying code written by language models, we offer a novel, lightweight classification solution based on a transformer architecture. We compare three transformer models (GraphCodeBERT, PLBART, and CodeBERT) for tokenization and embedding, and then perform classification with a random forest classifier. Preliminary results indicate that the GraphCodeBERT-based model achieves 100% train and test accuracy in distinguishing human-written from AI-generated code, and that PLBART achieves a 100% train and 95% test F1-score when classifying code by source category (chatbot, model, IDE extension, or human).
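The two-stage pipeline described above (transformer embedding followed by a random forest classifier) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the transformer embedding step (GraphCodeBERT, PLBART, or CodeBERT in the abstract) is stood in for here by a toy bag-of-tokens featurizer so the example is self-contained, and the snippets and labels are hypothetical.

```python
# Sketch of the two-stage approach: (1) turn source code into feature
# vectors, (2) classify the vectors with a random forest.
# NOTE: in the actual work the features would be transformer embeddings
# (GraphCodeBERT / PLBART / CodeBERT); a CountVectorizer stands in here.
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import CountVectorizer

# Hypothetical tiny corpus: label 0 = human-written, 1 = AI-generated.
snippets = [
    "def add(a, b): return a + b",
    "x=1\ny=2\nprint(x+y)",
    "def add_numbers(first_number: int, second_number: int) -> int:\n"
    "    return first_number + second_number",
    "def multiply_values(value_one: int, value_two: int) -> int:\n"
    "    return value_one * value_two",
]
labels = [0, 0, 1, 1]

# Stand-in embedding: token-count vectors over identifier/keyword tokens.
vectorizer = CountVectorizer(token_pattern=r"\w+")
X = vectorizer.fit_transform(snippets)

# Second stage: random forest classifier over the feature vectors.
clf = RandomForestClassifier(n_estimators=100, random_state=0)
clf.fit(X, labels)

# Classify an unseen snippet.
query = "def sum_values(first: int, second: int) -> int: return first + second"
pred = clf.predict(vectorizer.transform([query]))
```

For the multiclass setting in the abstract, `labels` would instead range over the source categories (chatbot, model, IDE extension, human); the random forest handles multiclass targets without modification.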
