Odprta ekstrakcija informacij za slovenski jezik

BOGATAJ, MIHA

Repository of the University of Ljubljana

Details

BOGATAJ, MIHA (Author), Žitnik, Slavko (Mentor)

PDF - Presentation file, Download (1,85 MB)
MD5: 52E65E155B5E79AAF1553628D3AB8FB6

Abstract

Open information extraction is a natural language processing task that extracts possible dependencies from individual sentences. Each dependency is expressed as a semantic triple: the first element is the subject being queried, the second is the relation that describes how the first element connects to the third, and the third is the object. The open information extraction system for Slovenian is rule-based and consists of a preprocessor and an extractor. The preprocessor processes the input text with the CLASSLA system, which grammatically analyzes each sentence, and also handles lemmatization and the construction of a semantic tree. The extractor applies rules to find the relations in a sentence. These rules are more complex than for English because word order in Slovenian is freer. Slovenian also has several declensions, which allow the subject and object to be determined more accurately. The extractions found can be searched in two ways: sentence search and parameter completion. Sentence search requires all three parameters of the semantic triple to be filled in and returns a list of sentences that match the triple. Parameter completion requires two filled-in parameters, one of which must be the relation, and returns a list of possible values for the missing parameter.

Language:	Slovenian
Keywords:	extraction, information, Slovenian
Work type:	Bachelor thesis/paper
Typology:	2.11 - Undergraduate Thesis
Organization:	FRI - Faculty of Computer and Information Science
Year:	2022
PID:	20.500.12556/RUL-136260
COBISS.SI-ID:	105616387
Publication date in RUL:	21.04.2022
Views:	2202
Downloads:	172

Secondary language

Language:	English
Title:	Open information extraction for Slovenian language
Abstract:	Open information extraction is a natural language processing task that extracts possible dependencies from individual sentences. Each dependency is expressed as a semantic triple: the first element is the subject being queried, the second is the relation that describes how the first element relates to the third, and the third is the object. The open information extraction system for Slovenian is rule-based and consists of a preprocessor and an extractor. The preprocessor processes the input text with the CLASSLA system, which grammatically analyzes sentences, lemmatizes them, and builds a semantic tree. The extractor uses the given rules to find relations in sentences. These rules are more complex than for English because word order in Slovenian is freer. Slovenian also has several declensions, which allow the subject and object to be determined more precisely. The extractions found can be searched in two ways: sentence search and parameter completion. Sentence search requires all parameters of the semantic triple to be filled in and returns a list of sentences that match the triple. Parameter completion requires two filled-in parameters, one of which must be the relation, and returns a list of possible values for the missing parameter.
Keywords:	extraction, information, Slovenian language
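
The two search modes described in the abstract can be illustrated with a minimal Python sketch. This is not the thesis implementation: the `Extraction` class, the function names, and the sample triples are hypothetical, and the sketch assumes that extractions are already available as lemmatized (subject, relation, object) triples together with their source sentence.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Extraction:
    """One extracted semantic triple plus its source sentence (hypothetical type)."""
    subject: str
    relation: str
    obj: str
    sentence: str


def search_sentences(extractions, subject, relation, obj):
    """Sentence search: all three parameters are required.

    Returns the sentences whose triple matches exactly.
    """
    return [e.sentence for e in extractions
            if (e.subject, e.relation, e.obj) == (subject, relation, obj)]


def complete_parameter(extractions, relation, subject=None, obj=None):
    """Parameter completion: the relation plus exactly one of subject/object.

    Returns the distinct candidate values for the missing parameter,
    in order of first appearance.
    """
    if not relation:
        raise ValueError("the relation is mandatory")
    if (subject is None) == (obj is None):
        raise ValueError("give exactly one of subject or obj")
    values = []
    for e in extractions:
        if e.relation != relation:
            continue
        if subject is not None and e.subject == subject:
            candidate = e.obj
        elif obj is not None and e.obj == obj:
            candidate = e.subject
        else:
            continue
        if candidate not in values:
            values.append(candidate)
    return values


# Made-up sample data; the triples hold lemmas, the sentences keep inflected forms.
EXTRACTIONS = [
    Extraction("Janez", "brati", "knjiga", "Janez bere knjigo."),
    Extraction("Ana", "brati", "časopis", "Ana bere časopis."),
    Extraction("Janez", "brati", "časopis", "Časopis bere Janez."),
]

print(search_sentences(EXTRACTIONS, "Janez", "brati", "knjiga"))
print(complete_parameter(EXTRACTIONS, "brati", subject="Janez"))
```

Storing lemmas in the triples is one plausible design choice here, since it lets a query match regardless of the inflected form or word order used in the original sentence.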
