24th Annual Symposium of Student Scholars - 2020

Topic clustering of COVID-19 open research dataset (CORD-19) using graph clustering approach

Srivatsa Mallapragada, Kennesaw State University

Disciplines

Abstract (300 words maximum)

Topic clustering is an important approach in text analytics because labeled documents are rarely available for training a classifier on a specific problem. The global COVID-19 pandemic, caused by a novel coronavirus, has raised specific problems of this kind for COVID-19 research. A large corpus of scientific research articles was released publicly as a dataset (CORD-19) to help find the articles most relevant to coronavirus vaccine research. This paper uses tf-idf preprocessing to create a document similarity matrix, which serves as the weighted adjacency matrix for graph clustering. K-means was also run as a standalone method so that its results could be compared with those of the graph clustering algorithms. Clustering efficiency is measured with inter-cluster and intra-cluster distance metrics. Decision trees are trained on the clustered data to compare the clustering algorithms by classification accuracy. Finally, conclusions and future directions are given for retrieving COVID-19-specific documents from the entire corpus.
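The pipeline described in the abstract can be illustrated with a minimal sketch. This is not the author's implementation: the abstract does not name the graph clustering algorithm, so the sketch assumes the simplest stand-in (connected components of a thresholded similarity graph). The four toy documents, the 0.2 threshold, and all function names are hypothetical, and the K-means and decision tree comparisons are not covered.

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """Return one sparse tf-idf vector (dict: term -> weight) per document."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)
        # Smoothed idf keeps a positive weight even for very common terms.
        vectors.append({t: (c / len(tokens)) * (math.log((1 + n_docs) / (1 + df[t])) + 1)
                        for t, c in tf.items()})
    return vectors

def cosine(u, v):
    """Cosine similarity between two sparse vectors."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm = math.sqrt(sum(w * w for w in u.values())) * math.sqrt(sum(w * w for w in v.values()))
    return dot / norm if norm else 0.0

def similarity_matrix(vectors):
    """Dense similarity matrix, read as a weighted adjacency matrix."""
    n = len(vectors)
    return [[cosine(vectors[i], vectors[j]) for j in range(n)] for i in range(n)]

def graph_clusters(sim, threshold):
    """Assumed stand-in: label each document by its connected component
    in the graph that keeps only edges with weight >= threshold."""
    n = len(sim)
    labels = [-1] * n
    current = 0
    for start in range(n):
        if labels[start] != -1:
            continue
        labels[start] = current
        stack = [start]
        while stack:
            i = stack.pop()
            for j in range(n):
                if labels[j] == -1 and sim[i][j] >= threshold:
                    labels[j] = current
                    stack.append(j)
        current += 1
    return labels

def mean_distances(sim, labels):
    """Return (mean intra-cluster, mean inter-cluster) cosine distance over pairs."""
    intra, inter = [], []
    n = len(labels)
    for i in range(n):
        for j in range(i + 1, n):
            bucket = intra if labels[i] == labels[j] else inter
            bucket.append(1.0 - sim[i][j])

    def mean(values):
        return sum(values) / len(values) if values else 0.0

    return mean(intra), mean(inter)

# Hypothetical toy corpus, not taken from CORD-19.
docs = [
    "coronavirus vaccine trial immune response",
    "vaccine trial antibody immune response",
    "hospital ventilator capacity icu beds",
    "icu beds ventilator shortage hospital",
]
sim = similarity_matrix(tfidf_vectors(docs))
labels = graph_clusters(sim, threshold=0.2)
intra, inter = mean_distances(sim, labels)
print(labels, round(intra, 3), round(inter, 3))
```

A lower mean intra-cluster distance together with a higher mean inter-cluster distance indicates tighter, better-separated clusters. On a real corpus the threshold and the choice of graph clustering algorithm would need tuning.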

Academic department under which the project should be listed

CCSE - Data Science and Analytics

Primary Investigator (PI) Name

Dr. Joe DeMaio
