Automatic construction of verb valency patterns for Slovene (original title: Avtomatska izdelava vezljivostnih vzorcev za slovenske glagole)

VOJE, KRISTJAN

VOJE, KRISTJAN (Author), Robnik Šikonja, Marko (Mentor), Gantar, Apolonija (Comentor)

Abstract

Large annotated training corpora are of key importance for natural language processing. When dealing with smaller amounts of data, we can enrich them with a more detailed analysis of language structure. The property of natural language addressed in this thesis is valency. Valency relates to the meaning of a sentence. The carriers of valency are often verbs, but they can also be adjectives and nouns. In theory, a particular sense of a valency carrier corresponds to a particular valency pattern. Valency patterns are easily machine-readable and contain enough information to disambiguate the sense of the valency carrier. Our work is based on the ssj500k 2.1 corpus. A good half of the corpus contains sentences with manually annotated semantic roles, from which we extracted valency patterns. We developed a program that lets the user interactively browse the valency patterns in the corpus. Different senses of the same verb form different valency patterns. We tested a set of clustering algorithms on the corpus sentences in order to find valency patterns characteristic of a particular verb sense. We implemented three variants of the Lesk algorithm and two variants of the k-means algorithm. The data for the Lesk algorithm were drawn from the SloWNet lexicon and the SSKJ dictionary.
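
As a rough illustration of what reading valency patterns from role-annotated sentences can look like, the sketch below counts role patterns per verb lemma. It is not the thesis's program: the input format, the role labels (`ACT`, `PAT`, `REC`, `LOC`), the example verbs, and the choice to ignore argument order are all assumptions made for this example.

```python
from collections import Counter, defaultdict

# Hypothetical, hand-made input: one entry per verb occurrence, listing the
# semantic roles of its arguments. The labels are illustrative only and are
# not read from the ssj500k corpus.
occurrences = [
    ("dati", ["ACT", "PAT", "REC"]),
    ("dati", ["REC", "ACT", "PAT"]),
    ("dati", ["ACT", "PAT"]),
    ("iti", ["ACT", "LOC"]),
]


def valency_patterns(occurrences):
    """Count role patterns per verb lemma, ignoring argument order (an assumption)."""
    patterns = defaultdict(Counter)
    for lemma, roles in occurrences:
        # Sorting makes two occurrences with the same roles map to one pattern.
        pattern = tuple(sorted(roles))
        patterns[lemma][pattern] += 1
    return patterns


patterns = valency_patterns(occurrences)
for lemma, counts in patterns.items():
    for pattern, n in counts.most_common():
        print(lemma, " ".join(pattern), n)
```

Treating a pattern as an unordered set of roles keeps the feature space small, which matches the abstract's remark that valency patterns are compact; a real implementation might also record the morphosyntactic form of each argument.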

Language:	Slovenian
Keywords:	vezljivostni vzorci (valency patterns), vezljivost (valency), glagol (verb)
Work type:	Bachelor thesis/paper
Organization:	FRI - Faculty of Computer and Information Science
Year:	2019
PID:	20.500.12556/RUL-106000
Publication date in RUL:	11.01.2019
Views:	1254
Downloads:	252

Secondary language

Language:	English
Title:	Automatic construction of verb valency patterns for Slovene
Abstract:	Natural language processing depends heavily on a sufficient amount of training data. When working with smaller datasets, we can enrich the data by analyzing the semantic structure of the language. In this thesis, we work with valency. Valency carries information about the meaning of a sentence. While valency is usually a feature of verbs, it can also be observed in adjectives and nouns. Valency forms valency patterns around its carriers. In theory, each sense of a valency carrier should form a distinguishable valency pattern. Valency patterns have a small feature space and are suitable for training machine learning algorithms. They contain enough information to distinguish the sense of the valency carrier. Our work is based on the ssj500k 2.1 corpus. Over half of the corpus contains hand-annotated semantic roles, from which we extracted valency patterns. We built a program for listing and analyzing the valency patterns. In theory, different verb senses form different valency patterns. We tested a number of clustering algorithms on the corpus sentences. The goal was to cluster the valency frames by sense similarity and to find sense-specific valency patterns. We implemented three versions of the Lesk algorithm and two versions of the k-means algorithm. We used data from SloWNet and SSKJ for the knowledge-based Lesk algorithms.
Keywords:	valency frame, valency, verb
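
The abstract does not describe the three Lesk variants that were implemented. For orientation only, the following is a minimal sketch of the generic simplified Lesk idea: pick the sense whose dictionary gloss shares the most words with the sentence. The function name `simplified_lesk`, the sense identifiers, and the English glosses are hypothetical stand-ins; the thesis drew its glosses from SloWNet and SSKJ, which are not used here.

```python
def simplified_lesk(context_words, sense_glosses):
    """Return the sense id whose gloss shares the most words with the context."""
    context = {w.lower() for w in context_words}
    best_sense, best_overlap = None, -1
    for sense_id, gloss in sense_glosses.items():
        # Overlap = number of distinct words common to the gloss and the context.
        overlap = len(context & set(gloss.lower().split()))
        if overlap > best_overlap:
            best_sense, best_overlap = sense_id, overlap
    return best_sense


# Hypothetical glosses for an English stand-in verb (not from SloWNet or SSKJ).
glosses = {
    "run.move": "move fast on foot",
    "run.operate": "operate a machine or program",
}

sentence = "she started to run the program on the machine".split()
print(simplified_lesk(sentence, glosses))
```

This version counts raw word overlap without lemmatization or stop-word removal; for a morphologically rich language such as Slovene, comparing lemmas rather than surface forms would likely matter.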
