Ugotavljanje berljivosti besedil z uporabo statističnih mer in strojnega učenja

ANDOVA, ANDREJAANA

ANDOVA, ANDREJAANA (Author); Robnik Šikonja, Marko (Mentor)

PDF - Presentation file (270.30 KB)
MD5: 000AC8A37D280B347830F7955AB3FC76
PID: 20.500.12556/rul/9f23cd1c-4f92-4a47-818b-0bb676dad07a

Abstract

This thesis describes a prototype system that estimates the readability of a given text in Slovene. We used two different methods to estimate readability: regression and classification. The regression method returns, as its readability estimate, a number corresponding to years of education, while the classification method tries to assign the text to one of two classes, one defined as more readable and the other as less readable. We used the Šolar corpus of essays as the training set. We estimated readability with various statistical measures and with machine learning algorithms. We also evaluated the quality of our prototypes on newspapers and magazines from the ccGigafida corpus.

Language:	English
Keywords:	readability, natural language processing, machine learning
Work type:	Bachelor thesis/paper
Organization:	FRI - Faculty of Computer and Information Science
Year:	2017
PID:	20.500.12556/RUL-99044
Publication date in RUL:	22.12.2017
Views:	2493
Downloads:	231

Secondary language

Language:	English
Title:	Assessment of text readability using statistical and machine learning approaches
Abstract:	This thesis describes a prototype of a system that evaluates the readability of a given text in Slovene. To estimate the readability of a text, we used two methods: regression and classification. The regression method returns a numerical estimate of the readability of a text, expressed as years of education, while the classification method tries to assign the input to one of two classes, where one class is defined as more readable and the other as less readable. We used the Šolar corpus as a training set and first estimated readability using statistical measures. Using features extracted from the texts, we then trained different machine learning algorithms. To assess the quality of our prototypes, we used newspapers and magazines from the ccGigafida corpus as a testing set.
Keywords:	readability, natural language processing, machine learning
