Event Website

https://github.com/JoelStansbury/ipypdf

Document Type

Event

Start Date

28-4-2022 5:00 PM

Description

In this paper, we present a rule-based algorithm for extracting tabular data from images. Recent advances in Graph Neural Networks have significantly improved on previously attainable accuracy, and these models still outperform the best rule-based methods. Nevertheless, lightweight pipelines such as the model proposed here remain useful because they are easy to integrate, have low memory requirements, and do not depend on GPU availability. The proposed model achieves 22.3% accuracy on a subset of the PubTabNet dataset. Accuracy here measures whether the number of column and row boundaries present in the image is identified exactly. Advantages over previous algorithms include relative invariance to hierarchical relationships and complete invariance to the presence of gridlines.
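
The accuracy metric described above can be sketched as follows. This is a minimal illustration and not the paper's evaluation code: it assumes that an image counts as correct only when both the row-boundary count and the column-boundary count match the ground truth, and that accuracy is the fraction of such images. The names `BoundaryCounts` and `exact_count_accuracy` are hypothetical, and the sample values are made up for illustration only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BoundaryCounts:
    """Row and column boundary counts for one table image (hypothetical type)."""
    rows: int
    cols: int


def exact_count_accuracy(predicted, expected):
    """Fraction of images whose row AND column boundary counts both match.

    Assumption: a near miss (for example, one column boundary off) scores zero.
    """
    if len(predicted) != len(expected):
        raise ValueError("predicted and expected must have the same length")
    if not expected:
        raise ValueError("need at least one image to score")
    # Dataclass equality compares both fields, so a hit requires both counts to agree.
    hits = sum(p == e for p, e in zip(predicted, expected))
    return hits / len(expected)


# Illustrative values only; these are not results from the paper.
truth = [BoundaryCounts(4, 3), BoundaryCounts(6, 5), BoundaryCounts(2, 2), BoundaryCounts(10, 4)]
preds = [BoundaryCounts(4, 3), BoundaryCounts(6, 4), BoundaryCounts(2, 2), BoundaryCounts(9, 4)]
score = exact_count_accuracy(preds, truth)  # two of the four images match on both counts
```

Under this reading the metric is strict: it gives no partial credit, which may help explain why a headline figure such as 22.3% can look low next to cell-level or structure-similarity scores.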

GR-164 - Rule-based table parsing

https://digitalcommons.kennesaw.edu/cday/Spring_2022/Graduate-Research/3