Empirična evalvacija procesa avtomatske klasifikacije sentimenta na finančni domeni
Rutar, Sašo (Author), Robnik Šikonja, Marko (Mentor), Mozetič, Igor (Co-mentor)

PDF - Presentation file (1.44 MB)
MD5: 01259D8F0AC45878C040A4CA600A0FCC
PID: 20.500.12556/rul/dac2638b-dc31-4d7d-a2d0-a2af7ba5f911

Abstract
In this thesis, we address specific aspects of a system for automatic sentiment analysis of tweets. Our sentiment analysis system is based on machine learning and text mining techniques, such as the bag-of-words representation of texts and the support vector machine method. We use the system to process a data stream of short messages (tweets) about financial markets, specifically about stock trading, spanning two years. Each message is automatically classified into the positive, negative, or neutral class, which represents the sentiment or stance towards the stock mentioned in the message. Sentiment in our case therefore reflects the speaker's stance and, for the positive or negative class, represents a leaning towards buying or selling the stock. To build the classification model, we use a relatively large set of labeled data consisting of approximately half a million tweets manually labeled by experts. For the analysis, we developed an evaluation platform and an accompanying methodology that let us, through a sequence of experiments, answer many of the questions that arise when sentiment analysis is applied in industrial settings. In the analyses, we take the temporal order of messages in the data streams into account and thus enable ongoing measurement of system performance in production environments as well. Among other things, the results reveal (i) the most suitable classification algorithm, (ii) the optimal size and sampling (thinning) of the data for manual labeling, (iii) the dependence between classification performance and the temporal distance from the labeled examples, (iv) the effect of duplicates in the data, and (v) the behavior of the chosen classification method in the uncertainty area near the hyperplane of the support vector machine classifier.
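As an illustration of points (i) and (v), the sketch below scores a tweet with a linear model over a bag-of-words representation and treats a band around the hyperplane as uncertain. It is a minimal sketch, not the thesis's implementation: the abstract does not say how the neutral class or the uncertainty area is defined, so mapping the band to "neutral", the `margin` value, the whitespace tokenizer, and the hand-picked `WEIGHTS` are all assumptions made for illustration.

```python
from collections import Counter

def bag_of_words(text):
    """Lowercase, split on whitespace, and count tokens (assumed tokenizer)."""
    return Counter(text.lower().split())

def decision_score(features, weights, bias=0.0):
    """Signed score w.x + b of a linear classifier.

    The sign tells which side of the hyperplane the message falls on, and
    the magnitude is proportional to its distance from the hyperplane.
    """
    return sum(weights.get(tok, 0.0) * cnt for tok, cnt in features.items()) + bias

def classify(text, weights, bias=0.0, margin=0.5):
    """Assign a class, treating scores within +/- margin as uncertain.

    Mapping the uncertain band to "neutral" is an assumption of this sketch.
    """
    score = decision_score(bag_of_words(text), weights, bias)
    if score > margin:
        return "positive"
    if score < -margin:
        return "negative"
    return "neutral"

# Hypothetical weights; a real model would learn them from labeled tweets.
WEIGHTS = {"buy": 1.0, "bullish": 1.2, "sell": -1.0, "bearish": -1.2}

if __name__ == "__main__":
    for tweet in ["Buy $AAPL bullish", "sell now", "buy or sell ?"]:
        print(tweet, "->", classify(tweet, WEIGHTS))
```

Widening `margin` sends more messages to the uncertain band, which is one way to study how a classifier behaves near the hyperplane.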

Language: Slovenian
Keywords: sentiment analysis, machine learning, opinion mining, Twitter, natural language processing, classification, support vector machine, empirical evaluation, financial trading, stocks
Work type: Undergraduate thesis
Organization: FRI - Faculty of Computer and Information Science
Year: 2016
PID: 20.500.12556/RUL-91200
Publication date in RUL: 24.03.2017
Views: 1621
Downloads: 391
RUTAR, SAŠO, 2016, Empirična evalvacija procesa avtomatske klasifikacije sentimenta na finančni domeni [online]. Bachelor’s thesis. [Accessed 31 March 2025]. Retrieved from: https://repozitorij.uni-lj.si/IzpisGradiva.php?lang=eng&id=91200

Secondary language

Language: English
Title: Empirical evaluation of automatic sentiment classification process in financial domain
Abstract:
In this thesis, we explore several specific aspects of Twitter sentiment analysis. Our system for sentiment analysis is based on machine learning and text mining techniques, such as the bag-of-words representation of texts and the support vector machine classifier. We employ our system to analyze a stream of short messages (tweets) about financial markets, specifically about stock trading, over a span of two years. We classify each message into the positive, negative, or neutral class, which represents the sentiment or stance towards the stock mentioned in the message. The term sentiment in our case thus denotes the stance of the author (speaker) and, for the positive or negative class, represents the author’s leaning towards buying or selling the stock. To build the classification model, we employ a relatively large gold standard consisting of approximately half a million tweets hand-labeled by domain experts. For the purpose of this analysis, we developed an evaluation platform and a methodology that allow us, by conducting a series of experiments, to answer various questions that arise when applying sentiment analysis in industrial settings. In the evaluation process, we take the temporal nature of the data into account and thus enable continuous monitoring of the performance of live systems. The results of the analysis reveal (i) the most appropriate classification algorithm, (ii) the optimal size of the labeled data and subsampling method, (iii) the relationship between the classifier performance and the time lag from the training data, (iv) the effect of duplicated tweets (e.g., retweets), and (v) the behavior of the employed classification method in the uncertainty area near the hyperplane of the support vector machine classifier.
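Points (iii) and (iv) can be illustrated with a small sketch of a time-ordered evaluation: train on the earliest block of tweets, then test on successive later blocks so that performance can be tracked as the time lag grows. This is a sketch under stated assumptions rather than the thesis's evaluation platform: the `(timestamp, text, label)` tuple layout, the fixed block sizes, and the function names `deduplicate` and `temporal_splits` are hypothetical, and exact-text matching is only one possible definition of a duplicate.

```python
def deduplicate(tweets):
    """Keep the first occurrence of each text, preserving input order.

    Duplicates are detected by case-insensitive exact text match, which is
    an assumption of this sketch (retweets may need looser matching).
    """
    seen = set()
    unique = []
    for timestamp, text, label in tweets:
        key = text.strip().lower()
        if key not in seen:
            seen.add(key)
            unique.append((timestamp, text, label))
    return unique

def temporal_splits(tweets, train_size, test_size):
    """Yield (lag, train, test) with every test block later than the training block.

    The training block is the earliest `train_size` tweets; test blocks are
    consecutive, non-overlapping windows that follow it. `lag` counts how
    many test blocks separate the window from the training data.
    """
    ordered = sorted(tweets, key=lambda t: t[0])
    train = ordered[:train_size]
    rest = ordered[train_size:]
    starts = range(0, len(rest) - test_size + 1, test_size)
    for lag, start in enumerate(starts):
        yield lag, train, rest[start:start + test_size]

if __name__ == "__main__":
    sample = [(i, f"tweet {i}", "neutral") for i in range(10)]
    for lag, train, test in temporal_splits(sample, train_size=4, test_size=2):
        print(lag, [t[0] for t in train], [t[0] for t in test])
```

Running the same splits with and without `deduplicate` applied first gives a simple way to compare how duplicates affect a measured score.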

Keywords: sentiment analysis, machine learning, opinion mining, Twitter, natural language processing, classification with support vector machine, empirical evaluation, financial trading, stocks
