Repository of the University of Ljubljana
Details
Empirična evalvacija procesa avtomatske klasifikacije sentimenta na finančni domeni
RUTAR, SAŠO (Author)
Robnik Šikonja, Marko (Mentor)
Mozetič, Igor (Co-mentor)
PDF - Presentation file (1.44 MB)
MD5: 01259D8F0AC45878C040A4CA600A0FCC
PID: 20.500.12556/rul/dac2638b-dc31-4d7d-a2d0-a2af7ba5f911
Abstract
In this thesis we address specific aspects of a system for automatic sentiment analysis of tweets. Our sentiment analysis system is based on machine learning and text mining techniques, such as the bag-of-words representation of texts and the support vector machine method. We use the system to process a data stream of short messages (tweets) about financial markets, specifically about stock trading, spanning two years. Each message is automatically classified into the positive, negative, or neutral class, which represents the sentiment or stance towards the stock mentioned in the message. In our case, sentiment thus reflects the stance of the speaker, and for the positive or negative class it represents an inclination to buy or sell the stock. To build the classification model, we use a relatively large set of labeled data consisting of approximately half a million tweets hand-labeled by experts. For the analysis we developed an evaluation platform and an accompanying methodology that allow us, through a sequence of experiments, to answer numerous questions that arise in applications of sentiment analysis in industrial settings. In the analyses we take into account the temporal order of messages in the data streams and thereby enable ongoing measurement of system performance in production environments as well. Among other things, the results of the analysis reveal (i) the most suitable classification algorithm, (ii) the optimal size and sampling (thinning) of the data for manual labeling, (iii) the dependence between classification performance and the temporal distance from the labeled examples, (iv) the effect of duplicates in the data, and (v) the behavior of the chosen classification method in the region of uncertainty near the hyperplane of the support vector machine classifier.
Language: Slovenian
Keywords: sentiment analysis, machine learning, opinion mining, Twitter, natural language processing, classification, support vector machines, empirical evaluation, financial trading, stocks
Work type: Undergraduate thesis
Organization: FRI - Faculty of Computer and Information Science
Year: 2016
PID: 20.500.12556/RUL-91200
Publication date in RUL: 24.03.2017
Views: 1615
Downloads: 391
Cite this work
RUTAR, SAŠO, 2016, Empirična evalvacija procesa avtomatske klasifikacije sentimenta na finančni domeni [online]. Bachelor's thesis. [Accessed 29 March 2025]. Retrieved from: https://repozitorij.uni-lj.si/IzpisGradiva.php?lang=eng&id=91200
Secondary language
Language: English
Title: Empirical evaluation of automatic sentiment classification process in financial domain
Abstract:
In this thesis, we explore several specific aspects of Twitter sentiment analysis. Our system for sentiment analysis is based on machine learning and text mining techniques, such as the bag-of-words representation of texts and a support vector machine classifier. We employ our system to analyze a stream of short messages (tweets) about financial markets, specifically about stock trading, over a time span of two years. We classify each message into a positive, negative, or neutral class, which represents the sentiment or stance towards the stock mentioned in the message. The term sentiment in our case thus denotes the stance of the author (speaker), and in the case of the positive or negative class it represents the author's leaning towards buying or selling the stock. To build the classification model, we employ a relatively large gold standard consisting of approximately half a million tweets hand-labeled by domain experts. For the purpose of this analysis, we developed an evaluation platform and a methodology that allow us, by conducting a series of experiments, to answer various questions that arise when applying sentiment analysis in industrial settings. In the evaluation process, we take the temporal nature of the data into account and thus enable continuous monitoring of the performance of live systems. The results of the analysis reveal (i) the most appropriate classification algorithm, (ii) the optimal size of the labeled data and the subsampling method, (iii) the relationship between classifier performance and the time lag from the training data, (iv) the effect of duplicated tweets (e.g., retweets), and (v) the behavior of the employed classification method in the uncertainty area near the hyperplane of the support vector machine classifier.
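The ideas named in the abstract (bag-of-words features, a linear decision function with an uncertainty band around the hyperplane, and time-ordered evaluation) can be illustrated with a small Python sketch. This is not the thesis's implementation: the weights, the `neutral_band` threshold, the `timestamp` field, and all function names are hypothetical, and mapping near-hyperplane scores to the neutral class is only one possible reading of the "uncertainty area". The raw score is used here; a true distance to the hyperplane would divide it by the norm of the weight vector.

```python
from collections import Counter

# Hypothetical token weights standing in for a trained linear SVM.
WEIGHTS = {"buy": 1.0, "bullish": 1.2, "sell": -1.0, "bearish": -1.2}


def bag_of_words(text):
    """Lowercase the text, split on whitespace, and count each token."""
    return Counter(text.lower().split())


def linear_score(features, weights, bias=0.0):
    """Signed score w.x + b; its sign says which side of the hyperplane we are on."""
    return sum(weights.get(tok, 0.0) * cnt for tok, cnt in features.items()) + bias


def classify(text, weights, bias=0.0, neutral_band=0.5):
    """Assumption: scores inside the band around the hyperplane count as neutral."""
    score = linear_score(bag_of_words(text), weights, bias)
    if abs(score) < neutral_band:
        return "neutral"
    return "positive" if score > 0 else "negative"


def temporal_split(messages, train_fraction=0.5):
    """Order messages by timestamp; train on the earlier part, test on the later."""
    ordered = sorted(messages, key=lambda m: m["timestamp"])
    cut = int(len(ordered) * train_fraction)
    return ordered[:cut], ordered[cut:]
```

Splitting by time rather than at random keeps the test tweets strictly later than the training tweets, which matches the abstract's point about measuring performance as a function of time lag from the labeled data.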
Keywords: sentiment analysis, machine learning, opinion mining, Twitter, natural language processing, classification with support vector machine, empirical evaluation, financial trading, stocks
Similar documents
Similar works from RUL:
Using machine learning for sentiment analysis of Slovene web commentaries
Classification of viral genomes using machine learning
System for sentiment analysis of comments about mobile applications
Computer-vision-based tree trunk recognition
Calculation of price indices with machine learning for automatic product classification
Similar works from other Slovenian collections:
CLASSIFICATION OF AUDIO SIGNALS INTO MUSIC GENRES
Use of vector embedding for intelligent processing of slovene text
Evaluation of machine learning methods with natural language processing
DETERMINING GENETIC PREDICTORS BY USING INTELLIGENT SYSTEMS
Using words from daily news headlines to predict the movement of stock market indices