Ugotavljanje berljivosti besedil z uporabo statističnih mer in strojnega učenja
ANDOVA, ANDREJAANA (Author), Robnik Šikonja, Marko (Mentor)

PDF - Presentation file (270,30 KB)
MD5: 000AC8A37D280B347830F7955AB3FC76
PID: 20.500.12556/rul/9f23cd1c-4f92-4a47-818b-0bb676dad07a

Abstract
This thesis describes a prototype of a system that assesses the readability of a given text in Slovene. To assess readability, we used two different methods: regression and classification. The regression method returns a number corresponding to years of education as its readability estimate, while the classification method tries to assign the text to one of two classes, one defined as more readable and the other as less readable. We used the Šolar corpus of essays as the training set. We estimated readability with various statistical measures and with machine learning algorithms. We also evaluated the quality of our prototypes on newspapers and magazines from the ccGigafida corpus.

Language:English
Keywords:readability, natural language processing, machine learning
Work type:Bachelor thesis/paper
Organization:FRI - Faculty of Computer and Information Science
Year:2017
PID:20.500.12556/RUL-99044
Publication date in RUL:22.12.2017
Views:1872
Downloads:216

Secondary language

Language:Slovenian
Title:Assessment of text readability using statistical and machine learning approaches
Abstract:
This thesis describes a prototype of a system that evaluates the readability of a given text in Slovene. To estimate the readability of a text, we used two methods: regression and classification. The regression method returns a numerical estimate of readability expressed as years of education, while the classification method tries to assign the input to one of two classes, one defined as more readable and the other as less readable. We used the Šolar corpus as the training set and first estimated readability using statistical measures. We then trained several machine learning algorithms on features extracted from the texts. To assess the quality of our prototypes, we used newspapers and magazines from the ccGigafida corpus as the test set.
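
The abstract does not name the statistical measures or features that were used. As a purely illustrative sketch, the Python snippet below computes two surface statistics that readability formulas commonly rely on (average sentence length and average word length) and applies a two-class rule to them. The function names, the regular-expression tokenization, and the threshold of 20 words are assumptions made for this example and are not taken from the thesis.

```python
import re


def surface_features(text: str) -> dict:
    """Return two simple surface statistics for a text (illustrative only)."""
    # Rough sentence split on ., ! and ?; a real system would use a proper tokenizer.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    # In Python 3, \w matches Unicode letters, so Slovene characters (š, č, ž) are kept.
    words = re.findall(r"\w+", text)
    if not sentences or not words:
        return {"avg_sentence_len": 0.0, "avg_word_len": 0.0}
    return {
        "avg_sentence_len": len(words) / len(sentences),
        "avg_word_len": sum(len(w) for w in words) / len(words),
    }


def is_less_readable(features: dict, max_sentence_len: float = 20.0) -> bool:
    """Hypothetical two-class rule: long average sentences count as less readable.

    The threshold is an arbitrary placeholder, not a value from the thesis.
    """
    return features["avg_sentence_len"] > max_sentence_len


if __name__ == "__main__":
    sample = "To je kratek stavek. Tudi ta je kratek."
    feats = surface_features(sample)
    print(feats, is_less_readable(feats))
```

In a learned setting such as the one the abstract describes, features like these would be passed to a trained regression or classification model rather than compared against a fixed threshold.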

Keywords:readability, natural language processing, machine learning
