Knowledge graph-based document embedding enrichment
KOLOSKI, BOSHKO (Author), Robnik Šikonja, Marko (Mentor), Škrlj, Blaž (Co-mentor)

Presentation file: PDF (2.99 MB)
MD5: 9CF17C0B03F5D17EBE1611B6A385261A

Abstract
Structured and unstructured textual data require an efficient representation for computation and manipulation. Many methods have been developed to represent text in numerical form. Some of them rely only on statistical metrics, while others introduce the concept of word context. Structured textual data about concepts and entities is stored in knowledge graphs, for which various numerical representations have also been developed. Using the facts about concepts, semantics can be introduced into the representation of documents. We propose an approach that merges the numerical representation of a text with the representations, induced from knowledge bases, of the entities that appear in that text. We analyze the proposed method on two use cases. The results show that the use of external knowledge significantly improves the performance of machine learning models. In addition, we show that the proposed method outperforms non-enriched representations.
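
The merging step described in the abstract can be illustrated with a minimal sketch. It is not the thesis implementation: the toy embedding table `KG_EMBEDDINGS`, the bag-of-words text representation, entity detection by exact token match, averaging of entity vectors, and merging by concatenation are all assumptions made here for illustration.

```python
from collections import Counter

# Hypothetical toy knowledge-graph embeddings (entity -> vector). In practice
# these would come from a trained knowledge graph embedding model.
KG_EMBEDDINGS = {
    "ljubljana": [0.9, 0.1, 0.0],
    "slovenia": [0.8, 0.2, 0.1],
    "python": [0.0, 0.7, 0.6],
}
KG_DIM = 3


def text_vector(tokens, vocabulary):
    """Bag-of-words term frequencies over a fixed vocabulary."""
    counts = Counter(tokens)
    total = len(tokens) or 1  # avoid division by zero on empty input
    return [counts[word] / total for word in vocabulary]


def entity_vector(tokens):
    """Average the embeddings of the entities found among the tokens."""
    found = [KG_EMBEDDINGS[t] for t in tokens if t in KG_EMBEDDINGS]
    if not found:
        # No known entity in the text: fall back to a zero vector.
        return [0.0] * KG_DIM
    return [sum(column) / len(found) for column in zip(*found)]


def enriched_vector(text, vocabulary):
    """Concatenate the text vector with the averaged entity vector."""
    tokens = text.lower().split()
    return text_vector(tokens, vocabulary) + entity_vector(tokens)


vocabulary = ["ljubljana", "is", "in", "slovenia"]
vector = enriched_vector("Ljubljana is in Slovenia", vocabulary)
```

Here `vector` has seven components: four term frequencies followed by the three-dimensional average of the embeddings for "ljubljana" and "slovenia". A downstream classifier would be trained on such enriched vectors instead of the text-only ones.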

Language: English
Keywords: knowledge graphs, word embedding, knowledge graph embedding, natural language processing
Work type: Bachelor thesis/paper
Typology: 2.11 - Undergraduate Thesis
Organization: FRI - Faculty of Computer and Information Science
Year: 2020
PID: 20.500.12556/RUL-119701
COBISS.SI-ID: 30743555
Publication date in RUL: 10.09.2020
Views: 1187
Downloads: 259

Secondary language

Language: Slovenian
Title: Obogatitev dokumentnih vložitev z grafi znanja (Enrichment of document embeddings with knowledge graphs)
Abstract:
Structured and unstructured textual data require an efficient representation for computation and processing. Many different methods have been developed for representing text in numerical form. Some of these methods rely solely on statistical metrics, while others introduce the concept of word context. Structured textual data about concepts and entities are stored in knowledge graphs, for which numerous numerical representations have been developed. Using facts about concepts, semantics can be brought into the representation of documents. We propose an approach that combines the numerical representation of texts with that of the entities from knowledge bases that appear in those texts. We analyze the proposed method on two use cases. The results show that using external knowledge significantly improves the performance of machine learning models. In addition, we show that the proposed method outperforms non-enriched representations.

Keywords: data graphs, word vector embeddings, data graph embeddings, natural language processing
