Similarity of Research Papers Based on Vector Space Model

CSM - Mathematics

Dr. Joe DeMaio

A great number of research papers are published each year across many fields in conference proceedings and journals. Conference proceedings and journals are organized into categories and subcategories that group similar papers. These papers exhibit both inter-class and intra-class similarities. This project aims to construct a network of papers and a rigorous model, using text mining algorithms and graph-theoretical measures, to analyze the relationships among a focused section of scientific papers published within a limited time span. We are interested in discovering the most common as well as the least common topics being researched in the focused subsection. In the first phase of the project, we used Term Frequency-Inverse Document Frequency (TF-IDF) weighting and the Vector Space Model, with cosine similarity as the similarity measure, to build a network and categorize research papers. In the second phase, we use graph theory techniques to determine the connectivity of the network. Finally, our goal is to understand what makes the network connected or disconnected.
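The two phases described above can be illustrated with a minimal sketch. It is not the project's actual implementation. Several details are assumptions made only for illustration: whitespace tokenization, length-normalized term frequency, a natural-log inverse document frequency, a fixed similarity threshold for drawing edges, and the four sample titles, which are invented. The function names are hypothetical as well.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return one sparse TF-IDF vector (dict: term -> weight) per document."""
    # Assumption: lowercase whitespace tokenization, no stemming or stop words.
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        vectors.append({
            term: (count / len(tokens)) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return vectors


def cosine_similarity(u, v):
    """Cosine of the angle between two sparse vectors."""
    dot = sum(weight * v.get(term, 0.0) for term, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def similarity_graph(vectors, threshold):
    """Adjacency sets: papers i and j are joined if similarity >= threshold."""
    n = len(vectors)
    adjacency = {i: set() for i in range(n)}
    for i in range(n):
        for j in range(i + 1, n):
            if cosine_similarity(vectors[i], vectors[j]) >= threshold:
                adjacency[i].add(j)
                adjacency[j].add(i)
    return adjacency


def connected_components(adjacency):
    """List the connected components using an iterative depth-first search."""
    seen = set()
    components = []
    for start in adjacency:
        if start in seen:
            continue
        stack, component = [start], set()
        while stack:
            node = stack.pop()
            if node in seen:
                continue
            seen.add(node)
            component.add(node)
            stack.extend(adjacency[node] - seen)
        components.append(sorted(component))
    return components


# Invented sample titles, used only to exercise the functions.
papers = [
    "graph coloring of planar graphs",
    "coloring planar graphs with four colors",
    "neural network training with gradient descent",
    "gradient descent for deep neural network",
]
vectors = tfidf_vectors(papers)
for threshold in (0.2, 0.05):
    graph = similarity_graph(vectors, threshold)
    print(threshold, connected_components(graph))
```

In this toy corpus the only link between the two topic groups is the shared word "with", so the choice of threshold decides whether the network is connected: a stricter threshold splits it into two components, while a looser one joins all four papers.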

Discrete Mathematics and Combinatorics | Theory and Algorithms

