Event Website
https://github.com/JoelStansbury/ipypdf
Document Type
Event
Start Date
April 28, 2022, 5:00 PM
Description
In this paper, we present a rule-based algorithm for extracting tabular data from images. Recent approaches based on Graph Neural Networks (GNNs) have substantially improved on previously attainable accuracy and still outperform the best rule-based methods. Nevertheless, lightweight pipelines such as the one proposed here remain useful: they are easy to integrate, have low memory requirements, and do not depend on GPU availability. The proposed model achieves 22.3% accuracy on a subset of the PubTabNet dataset, where accuracy measures the ability to identify exactly the number of column and row boundaries present in an image. Advantages over previous algorithms include relative invariance to hierarchical relationships and complete invariance to the presence of gridlines.
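To make the accuracy metric concrete, here is a minimal sketch in Python. It assumes the 22.3% figure is the fraction of test images for which both the predicted row-boundary count and the predicted column-boundary count equal the annotated counts, with no partial credit. The names `BoundaryCounts` and `exact_count_accuracy` are hypothetical and are not taken from the ipypdf repository.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class BoundaryCounts:
    """Hypothetical container: row and column boundary counts for one table image."""
    rows: int
    cols: int


def exact_count_accuracy(
    predicted: Sequence[BoundaryCounts],
    ground_truth: Sequence[BoundaryCounts],
) -> float:
    """Fraction of images whose predicted row AND column counts both match exactly.

    Assumption: an image scores 1 only if both counts are correct, otherwise 0.
    """
    if len(predicted) != len(ground_truth):
        raise ValueError("predicted and ground_truth must have the same length")
    if not ground_truth:
        raise ValueError("need at least one image")
    hits = sum(
        p.rows == g.rows and p.cols == g.cols
        for p, g in zip(predicted, ground_truth)
    )
    return hits / len(ground_truth)


if __name__ == "__main__":
    # Made-up counts for three images; only the first one matches on both axes.
    preds = [BoundaryCounts(5, 3), BoundaryCounts(4, 3), BoundaryCounts(6, 2)]
    truth = [BoundaryCounts(5, 3), BoundaryCounts(4, 4), BoundaryCounts(7, 2)]
    print(exact_count_accuracy(preds, truth))  # prints 0.3333333333333333
```

Under this reading, an accuracy of 22.3% means that roughly 22 of every 100 evaluated images had both boundary counts exactly right. A near miss on either axis earns no credit, which makes the metric strict.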
GR-164 - Rule-based table parsing
https://digitalcommons.kennesaw.edu/cday/Spring_2022/Graduate-Research/3