During the development of document management software at EBA d.o.o. Ljubljana, a problem was identified with extracting data from tables in various scanned documents. Until now, a proprietary OCR model has been used for this task, but it no longer achieves the desired results. An analysis of existing solutions was therefore conducted, and Microsoft’s Table Transformer was identified as the most suitable option. It is a deep learning object detection model used to recognize tables in PDF files and images. In this thesis, Microsoft’s Table Transformer was studied, adapted, and tested for use in the EBA DMS document management system. The model was trained on a dataset of 296 documents annotated with VIA (VGG Image Annotator) and tested on 50 documents prepared by the company. The results showed that the Table Transformer achieved slightly lower accuracy than the existing OCR model in table recognition, but slightly higher accuracy in table structure recognition. Based on these findings, it was decided to retain the existing OCR model for now, while continuing to research and improve table recognition methods.