Ekstrakcija gradnikov PDF datotek
Kastelec, Erik (Author), Mihelič, Jurij (Mentor), Preželj, Andrej (Co-mentor)

PDF - Presentation file (13.41 MB)
MD5: 8C0426CA2E82ABF17EF17277AF029A71

Abstract
PDF documents make up a large share of the documents found in companies and on the web. The content of these documents is hard for software to read, which makes analysing and searching them difficult. Companies wanted to search for strings in text and tables, but no open-source solution offered this in full. Numerous solutions existed that each address part of the problem, e.g. text extraction, table extraction, and OCR analysis. The existing methods were sensibly extended and combined into PDFScraper, a program and library that simplifies the extraction and searching of document elements. The software supports a wide range of document types: each document is suitably prepared and analysed, and its elements are made searchable.

Language:Slovenian
Keywords:PDF, extraction, OCR
Work type:Bachelor thesis/paper
Typology:2.11 - Undergraduate Thesis
Organization:FRI - Faculty of Computer and Information Science
Year:2020
PID:20.500.12556/RUL-120076
COBISS.SI-ID:31440899
Publication date in RUL:15.09.2020
Views:866
Downloads:163

Secondary language

Language:English
Title:Extraction of elements from PDF documents
Abstract:
PDF documents make up a large share of business and online documents. They focus on the visual representation of a document and do not contain structural information, which complicates analysis by software. Companies were looking for an open-source solution for searching the content of text and tables, but none was available. Much of the required functionality already existed in separate tools, which were improved and combined into an all-in-one solution called PDFScraper, consisting of an easy-to-use program and a backend library. PDFScraper supports different input formats, which are transformed and analysed as appropriate to make searching possible.

Keywords:PDF, extraction, OCR
