# Similarity of Research Papers Based on Vector Space Model

## Disciplines

Discrete Mathematics and Combinatorics | Theory and Algorithms

## Abstract (300 words maximum)

A great number of research papers are published each year across many fields, in conference proceedings and journals. Conference proceedings and journals organize papers into categories and subcategories that group similar papers. These papers exhibit both inter-class and intra-class similarities. This project aims to construct a network of papers and a rigorous model, using text mining algorithms and graph-theoretic measures, to analyze the relationships among a focused set of scientific papers published within a limited time period. We are interested in discovering the most common as well as the least common topics being researched in this focused subset. In the first phase of the project, we used Term Frequency-Inverse Document Frequency (TF-IDF) weighting and the cosine similarity measure of the Vector Space Model to build a network and categorize research papers. In the second phase, we use graph theory techniques to examine the connectivity of the network. Finally, our goal is to understand what makes the network connected or disconnected.
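The two phases described above can be illustrated with a minimal sketch. It is not the project's implementation: the whitespace tokenizer, the particular TF-IDF variant (relative term frequency times `log(N / df)`), the similarity threshold of `0.1`, and the four toy paper texts are all assumptions chosen only to show how TF-IDF vectors, cosine similarity, and connected components fit together.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return one sparse TF-IDF vector (dict: term -> weight) per document.

    Assumed variant: tf = count / document length, idf = log(N / df).
    """
    tokenized = [doc.lower().split() for doc in docs]  # naive tokenizer (assumption)
    n_docs = len(tokenized)
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        vectors.append({
            term: (count / len(tokens)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors; 0.0 if either is all zeros."""
    dot = sum(weight * v[term] for term, weight in u.items() if term in v)
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def build_network(vectors, threshold):
    """Phase 1: join two papers by an edge when their similarity meets the threshold."""
    adjacency = {i: set() for i in range(len(vectors))}
    for i in range(len(vectors)):
        for j in range(i + 1, len(vectors)):
            if cosine(vectors[i], vectors[j]) >= threshold:
                adjacency[i].add(j)
                adjacency[j].add(i)
    return adjacency


def connected_components(adjacency):
    """Phase 2: find connected components with an iterative depth-first search."""
    seen, components = set(), []
    for start in adjacency:
        if start in seen:
            continue
        stack, component = [start], set()
        while stack:
            node = stack.pop()
            if node in seen:
                continue
            seen.add(node)
            component.add(node)
            stack.extend(adjacency[node] - seen)
        components.append(sorted(component))
    return components


# Hypothetical toy "papers"; real input would be titles, abstracts, or full text.
papers = [
    "graph coloring chromatic number bounds",
    "chromatic number of planar graph families",
    "neural network training with gradient descent",
    "gradient descent convergence for neural models",
]
network = build_network(tfidf_vectors(papers), threshold=0.1)  # threshold is an assumption
components = connected_components(network)
print(components)
```

In this sketch the network is connected exactly when `components` has length one. Raising or lowering the threshold changes which edges exist, which is one simple way to probe what connects or disconnects such a network.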

## Academic department under which the project should be listed

CSM - Mathematics

## Primary Investigator (PI) Name

Dr. Joe DeMaio
