Similarity of Research Papers Based on Vector Space Model

Presenters

Primary Investigator (PI) Name

Dr. Joe DeMaio

Department

CSM - Mathematics

Abstract

A great number of research papers are published each year across many fields, in conference proceedings and journals. Conference proceedings and journals organize their papers into categories and subcategories that group similar papers together. These papers exhibit both inter-class and intra-class similarities. This project aims to construct a network of papers and a rigorous model, using text mining algorithms and graph-theoretic measures, to analyze the relationships among a focused section of scientific papers published within a limited time period. We are interested in discovering the most common as well as the least common topics being researched in that focused subsection. In the first phase of the project, we used Term Frequency-Inverse Document Frequency (TF-IDF) weighting and a Vector Space Model similarity measure, cosine similarity, to build a network and categorize research papers. In the second phase, we use graph theory techniques to determine the connectivity of the network. Finally, our goal is to understand what makes the network connected or disconnected.
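The two phases described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the whitespace tokenizer, the particular TF-IDF variant (relative term frequency times the natural log of N/df), the similarity threshold of 0.1, the four sample titles, and all function names are assumptions chosen for illustration, since the abstract does not specify them.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return one sparse TF-IDF vector (dict: term -> weight) per document."""
    # Assumption: lowercase whitespace tokenization, no stemming or stop words.
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)
        vectors.append({
            term: (count / len(tokens)) * math.log(n_docs / df[term])
            for term, count in tf.items()
        })
    return vectors


def cosine_similarity(u, v):
    """Cosine of the angle between two sparse vectors."""
    dot = sum(weight * v[term] for term, weight in u.items() if term in v)
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def similarity_graph(vectors, threshold):
    """Phase 1: join two papers by an edge when their similarity >= threshold."""
    adjacency = {i: set() for i in range(len(vectors))}
    for i in range(len(vectors)):
        for j in range(i + 1, len(vectors)):
            if cosine_similarity(vectors[i], vectors[j]) >= threshold:
                adjacency[i].add(j)
                adjacency[j].add(i)
    return adjacency


def connected_components(adjacency):
    """Phase 2: find connected components with an iterative depth-first search."""
    seen = set()
    components = []
    for start in adjacency:
        if start in seen:
            continue
        seen.add(start)
        stack, component = [start], set()
        while stack:
            node = stack.pop()
            component.add(node)
            for neighbor in adjacency[node]:
                if neighbor not in seen:
                    seen.add(neighbor)
                    stack.append(neighbor)
        components.append(component)
    return components


# Hypothetical paper titles, used only to exercise the sketch.
papers = [
    "graph coloring of planar graphs",
    "chromatic number and graph coloring",
    "neural network training with gradient descent",
    "gradient descent convergence for neural network models",
]
vectors = tfidf_vectors(papers)
graph = similarity_graph(vectors, threshold=0.1)  # threshold is an assumption
components = connected_components(graph)
print(components)
```

In this toy setting the network is connected exactly when `connected_components` returns a single component. Raising the threshold removes edges and tends to split the network, which is one simple way to probe what keeps it connected.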

Disciplines

Discrete Mathematics and Combinatorics | Theory and Algorithms


 
