Knowledge graph-based document embedding enrichment

KOLOSKI, BOSHKO

KOLOSKI, BOSHKO (Author), Robnik Šikonja, Marko (Mentor), Škrlj, Blaž (Comentor)

PDF - Presentation file (2.99 MB)
MD5: 9CF17C0B03F5D17EBE1611B6A385261A

Abstract

Structured and unstructured textual data require efficient representations for computation and manipulation. Many methods have been developed to represent text in numerical form. Some of these methods rely only on statistical metrics, while others introduce the concept of word context. Structured textual data about concepts and entities is stored in knowledge graphs, for which different numerical representations have also been developed. By using facts about concepts, semantics can be introduced into the representation of documents. We propose an approach that merges the numerical representation of texts with the numerical representation of the entities that appear in those texts, the latter induced from knowledge bases. We analyze the proposed method on two use cases. The results show that the use of external knowledge significantly improves the performance of machine learning models, and that the proposed method outperforms non-enriched representations.
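
To make the idea of merging a text representation with entity representations concrete, the following is a minimal sketch and not the thesis's actual implementation. It assumes a term-frequency vector as the text representation, a hypothetical toy table `ENTITY_EMBEDDINGS` standing in for trained knowledge graph embeddings, entity detection by exact token match, and simple concatenation as the merge step. All names and values are illustrative.

```python
from collections import Counter

# Hypothetical toy knowledge graph entity embeddings (2-dimensional).
# In practice these would come from a trained knowledge graph embedding model.
ENTITY_EMBEDDINGS = {
    "ljubljana": [1.0, 0.0],
    "slovenia": [0.0, 1.0],
}
EMBEDDING_DIM = 2


def text_vector(tokens, vocabulary):
    """Term-frequency vector of the document over a fixed vocabulary."""
    counts = Counter(tokens)
    total = len(tokens) or 1  # avoid division by zero for empty documents
    return [counts[term] / total for term in vocabulary]


def entity_vector(tokens):
    """Average embedding of the distinct known entities found in the document."""
    found = sorted({t for t in tokens if t in ENTITY_EMBEDDINGS})
    if not found:
        return [0.0] * EMBEDDING_DIM
    return [
        sum(ENTITY_EMBEDDINGS[e][i] for e in found) / len(found)
        for i in range(EMBEDDING_DIM)
    ]


def enriched_vector(text, vocabulary):
    """Concatenate the text vector with the averaged entity vector."""
    tokens = text.lower().split()
    return text_vector(tokens, vocabulary) + entity_vector(tokens)


vocabulary = ["capital", "ljubljana", "slovenia"]
doc = "Ljubljana is the capital of Slovenia"
vector = enriched_vector(doc, vocabulary)
```

Here `vector` has five components: three term frequencies followed by the two-dimensional average of the entity embeddings. A downstream classifier would be trained on such enriched vectors in place of the text-only ones.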

Language:	English
Keywords:	knowledge graphs, word embedding, knowledge graph embedding, natural language processing
Work type:	Bachelor thesis/paper
Typology:	2.11 - Undergraduate Thesis
Organization:	FRI - Faculty of Computer and Information Science
Year:	2020
PID:	20.500.12556/RUL-119701
COBISS.SI-ID:	30743555
Publication date in RUL:	10.09.2020
Views:	1641
Downloads:	288

Secondary language

Language:	Slovenian
Title:	Obogatitev dokumentnih vložitev z grafi znanja
Abstract:
Structured and unstructured textual data require an efficient representation for computation and processing. Many different methods have been developed to represent text in numerical form. Some of these methods rely solely on statistical metrics, while others introduce the concept of word context. Structured textual data about concepts and entities is stored in knowledge graphs, for which numerous numerical representations have been developed. By using facts about concepts, semantics can be introduced into the representation of documents. We propose an approach that combines the numerical representation of texts with that of the entities from knowledge bases that appear in the texts. We analyze the proposed method using two use cases. The results show that the use of external knowledge substantially improves the performance of machine learning models. In addition, we show that the proposed method outperforms non-enriched representations.
Keywords:	data graphs, word vector embeddings, data graph embeddings, natural language processing
