Uporaba globokih nevronskih mrež za ločevanje avtomatsko generiranih in ročno napisanih člankov

STOPINŠEK, AMON

Repository of the University of Ljubljana

STOPINŠEK, AMON (Author), Kononenko, Igor (Mentor)

PDF - Presentation file (1.46 MB)
MD5: 9629A500E136A5AC9AB333C4A8FD7C50

Abstract

This thesis addresses text classification, applied to the problem of distinguishing articles written by a human from articles written by a machine. We tried various convolutional and recurrent deep neural network architectures and various text representations on this problem. The models were tested on a dataset of manually written and generated articles about weight loss. The best results were obtained with the BLSTM architecture using word2vec word embeddings. This model achieved 96.71% classification accuracy on manually written articles in the test set, 100% classification accuracy on articles generated with the bad model, and 97.41% classification accuracy on articles generated with the good model.
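
The abstract reports accuracy separately for three groups of articles (manually written, generated with the bad model, generated with the good model) rather than as one overall figure. A minimal sketch of that kind of per-group evaluation follows. The function name `accuracy_by_source`, the group labels, and the toy records are all hypothetical and chosen only for illustration; none of them come from the thesis, and the numbers are not its results.

```python
from collections import defaultdict


def accuracy_by_source(records):
    """Compute classification accuracy separately for each article source.

    `records` is an iterable of (source, true_label, predicted_label) tuples.
    Returns a dict mapping each source to the fraction of correct predictions.
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for source, true_label, predicted_label in records:
        total[source] += 1
        if true_label == predicted_label:
            correct[source] += 1
    return {source: correct[source] / total[source] for source in total}


# Hypothetical toy predictions, NOT data from the thesis.
toy_records = [
    ("manual", "human", "human"),
    ("manual", "human", "machine"),
    ("bad_generator", "machine", "machine"),
    ("bad_generator", "machine", "machine"),
    ("good_generator", "machine", "machine"),
    ("good_generator", "machine", "human"),
    ("good_generator", "machine", "machine"),
    ("good_generator", "machine", "machine"),
]

per_source = accuracy_by_source(toy_records)
for source, acc in per_source.items():
    print(f"{source}: {acc:.2%}")
```

Reporting accuracy per group shows whether a classifier handles one kind of article worse than the others, which a single pooled accuracy would hide.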

Language:	Slovenian
Keywords:	artificial intelligence, deep neural networks, text classification, natural language processing
Work type:	Bachelor thesis/paper
Organization:	FRI - Faculty of Computer and Information Science
Year:	2019
PID:	20.500.12556/RUL-106122
Publication date in RUL:	30.01.2019
Views:	1396
Downloads:	208

Secondary language

Abstract:
Language:	English
Title:	Using deep neural networks for differentiating automatically generated from manually written articles
This thesis addresses text classification for the problem of distinguishing manually written articles from automatically generated ones. We tested various convolutional and recurrent deep neural network architectures and various text representations. The models were evaluated on a dataset of manually written and automatically generated articles about weight loss. The best results were achieved by a model using the BLSTM architecture with word2vec word embeddings. This model achieved 96.71% classification accuracy on manually written articles in the test set, 100% classification accuracy on articles generated with the bad model, and 97.41% classification accuracy on articles generated with the good model.
Keywords:	artificial intelligence, deep neural networks, text classification, natural language processing
