Similarity of Research Papers Based on Vector Space Model

Presenters

    Primary Investigator (PI) Name

    Dr. Joe DeMaio

    Department

    CSM - Mathematics

    Abstract

    A great number of research papers are published each year, across many fields, in conference proceedings and journals. Proceedings and journals are organized into categories and subcategories that group similar papers, so papers exhibit both inter-class and intra-class similarities. This project aims to construct a network of papers and a rigorous model, using text mining algorithms and graph-theoretic measures, to analyze the relationships among a focused section of scientific papers published within a limited time period. We are interested in discovering the most common as well as the least common topics being researched in that focused subsection. In the first phase of the project, we used Term Frequency-Inverse Document Frequency (TF-IDF) weighting and a Vector Space Model similarity measure, cosine similarity, to build a network and categorize research papers. In the second phase, we use graph theory techniques to determine the connectivity of the network. Finally, our goal is to understand what makes the network connected or disconnected.
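
    The two phases described above can be illustrated with a minimal sketch. It is not the authors' implementation: the tokenizer, the TF-IDF variant (length-normalized term frequency times log(N/df)), the similarity thresholds, and the four toy abstracts are all assumptions made for illustration. Papers become TF-IDF vectors, pairs whose cosine similarity meets a threshold are joined by an edge, and a depth-first search reports the connected components of the resulting network.

```python
import math
from collections import Counter

def tokenize(text):
    # Assumed tokenizer: lowercase, whitespace split, alphabetic words only.
    return [w for w in text.lower().split() if w.isalpha()]

def tfidf_vectors(docs):
    # One sparse vector (dict term -> weight) per document.
    tokenized = [tokenize(d) for d in docs]
    n = len(tokenized)
    # Document frequency: number of documents containing each term.
    df = Counter(term for toks in tokenized for term in set(toks))
    vectors = []
    for toks in tokenized:
        tf = Counter(toks)
        total = len(toks) or 1  # avoid division by zero for empty documents
        vectors.append({t: (c / total) * math.log(n / df[t]) for t, c in tf.items()})
    return vectors

def cosine(u, v):
    # Cosine similarity between two sparse vectors.
    dot = sum(w * v[t] for t, w in u.items() if t in v)
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def build_network(vectors, threshold):
    # Phase 1: connect papers i and j when their similarity meets the threshold.
    adj = {i: set() for i in range(len(vectors))}
    for i in range(len(vectors)):
        for j in range(i + 1, len(vectors)):
            if cosine(vectors[i], vectors[j]) >= threshold:
                adj[i].add(j)
                adj[j].add(i)
    return adj

def connected_components(adj):
    # Phase 2: iterative depth-first search over the undirected network.
    seen, components = set(), []
    for start in adj:
        if start in seen:
            continue
        stack, comp = [start], set()
        while stack:
            node = stack.pop()
            if node in seen:
                continue
            seen.add(node)
            comp.add(node)
            stack.extend(adj[node] - seen)
        components.append(sorted(comp))
    return components

# Hypothetical toy "abstracts", not data from the project.
docs = [
    "graph coloring bounds for planar graphs",
    "planar graphs and chromatic coloring",
    "neural networks for image classification",
    "image classification with deep neural networks",
]
vectors = tfidf_vectors(docs)
strict = connected_components(build_network(vectors, threshold=0.2))
loose = connected_components(build_network(vectors, threshold=0.1))
print("threshold 0.2:", strict)
print("threshold 0.1:", loose)
```

    On these toy abstracts, the stricter threshold separates the graph-theory pair from the machine-learning pair, while the looser threshold merges them into one component through the shared word "for". This suggests that the threshold choice and the handling of common words are among the factors that decide whether such a network is connected.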

    Disciplines

    Discrete Mathematics and Combinatorics | Theory and Algorithms
