Repository of the University of Ljubljana
Prepoznavanje tabel in njihovih struktur iz dokumentov v formatu PDF in slik (Recognition of Tables and their structure from PDF documents and images)
Author: Korbar, Bogdan
Mentor: Žabkar, Jure
PDF - Presentation file (1.52 MB)
MD5: 5DC415E590D4966ACA7124E333603990
Abstract
As part of developing document management software at EBA d.o.o. Ljubljana, a problem was identified in extracting data from tables in various scanned documents. Until now, an in-house OCR model has been used for this task, but it no longer achieves the desired results. An analysis of existing solutions was therefore carried out, and Microsoft's Table Transformer proved the most suitable. It is a deep learning model designed for object detection and is used to recognize tables in PDF files and images. In this thesis, Microsoft's Table Transformer was studied, adapted, and tested for use in the EBA DMS document system. The model was trained on a set of 296 documents annotated with VIA (VGG Image Annotator) and tested on 50 documents prepared by the company. The results showed that Table Transformer achieved slightly lower accuracy than the existing OCR system in table recognition but slightly higher accuracy in table structure recognition. Even so, the existing OCR model still slightly outperforms Table Transformer in table recognition. Based on these findings, it was decided to keep the existing OCR model for now while continuing to research and improve table recognition methods.
Language:
Slovenian
Keywords:
table recognition, data extraction, deep learning, Table Transformer, OCR, digital document management, evaluation
Work type:
Bachelor thesis/paper
Typology:
2.11 - Undergraduate Thesis
Organization:
FRI - Faculty of Computer and Information Science
Year:
2024
PID:
20.500.12556/RUL-161307
COBISS.SI-ID:
211904259
Publication date in RUL:
09.09.2024
Views:
409
Downloads:
77
Cite this work
KORBAR, Bogdan, 2024, Prepoznavanje tabel in njihovih struktur iz dokumentov v formatu PDF in slik [online]. Bachelor’s thesis. [Accessed 19 May 2025]. Retrieved from: https://repozitorij.uni-lj.si/IzpisGradiva.php?lang=eng&id=161307
Secondary language
Language:
English
Title:
Recognition of Tables and their structure from PDF documents and images
Abstract:
As part of the development of document management software at EBA d.o.o. Ljubljana, an issue was identified in extracting data from tables in various scanned documents. Until now, a proprietary OCR model was used to address this problem, but it no longer achieves the desired results. Therefore, an analysis of existing solutions was conducted, and Microsoft’s Table Transformer was identified as the most suitable option. This is a deep learning model designed for object detection, used for recognizing tables in PDF files and images. In this thesis, Microsoft’s Table Transformer was studied, adapted, and tested for use in the EBA DMS document management system. A dataset of 296 documents, annotated using VIA (VGG Image Annotator), was used for training the model. For testing, 50 documents prepared by the company were used. The results showed that the Table Transformer achieved slightly lower accuracy in table recognition compared to the existing OCR system, but it achieved slightly higher accuracy in recognizing table structures. Nonetheless, the existing OCR model still slightly outperforms the Table Transformer in table recognition. Based on these findings, it was decided to retain the existing OCR model for now, while continuing to research and improve table recognition methods.
Keywords:
table recognition, data extraction, deep learning, Table Transformer, OCR, digital document management, evaluation