Repository of the University of Ljubljana
Medjezikovna klasifikacija sentimenta tvitov
Reba, Kristjan (Author)
Robnik Šikonja, Marko (Mentor)
Mozetič, Igor (Comentor)
PDF - Presentation file (946,25 KB)
MD5: E317063244789747F42C005746F53081
Abstract
Vektorske vložitve besed so predstavitve besed v obliki vektorjev realnih števil. Predstavljajo temelj mnogih aplikacij v procesiranju naravnega jezika in so potrebne za procesiranje z globokimi nevronskimi mrežami. Medjezikovne vložitve besed preslikajo besede iz več jezikov v isti vektorski prostor, kjer so istopomenske besede poravnane. Uporabljajo se za prenos naučenih modelov med jeziki in širjenje podatkovne množice. Za izgradnjo kakovostnih klasifikacijskih modelov za jezikovne probleme potrebujemo velike množice označenih učnih primerov, ki niso vedno na voljo za vse jezike in vse probleme, zato si želimo, da bi lahko izkoristili učne množice iz drugih, podatkovno bolj bogatih jezikov. V diplomski nalogi želimo za prenos znanja med jeziki izkoristiti medjezikovne vektorske vložitve. Uporabimo podatkovne množice tvitov v 15 različnih jezikih s pripadajočo oceno sentimenta. Klasifikacija sentimenta je naloga klasifikacije besedil, katere cilj je razvrstiti besedilo glede na sentimentno polarnost mnenj, ki jih vsebuje. Nad označenimi podatkovnimi množicami tvitov v različnih jezikih testiramo medjezikovne prenose z modelom BERT in knjižnico LASER. Eksperimenti pokažejo, da prenos informacij med podatkovnimi množicami različnih jezikov tipično ne prinese izboljšav klasifikacijske točnosti.
Language:
Slovenian
Keywords:
sentiment besedil, vektorske vložitve besed, jezikovni model, medjezikovne vložitve, tviti
Work type:
Bachelor thesis/paper
Organization:
FRI - Faculty of Computer and Information Science
Year:
2019
PID:
20.500.12556/RUL-109295
COBISS.SI-ID:
1538310339
Publication date in RUL:
29.08.2019
Views:
1705
Downloads:
240
Cite this work
REBA, Kristjan, 2019, Medjezikovna klasifikacija sentimenta tvitov [online]. Bachelor’s thesis. [Accessed 24 April 2025]. Retrieved from: https://repozitorij.uni-lj.si/IzpisGradiva.php?lang=eng&id=109295
Secondary language
Language:
English
Title:
Cross-lingual classification of tweet sentiment
Abstract:
Word embeddings are representations of words as vectors of real numbers. They underlie many natural language processing applications and are required for processing with deep neural networks. Cross-lingual word embeddings map words from multiple languages into the same vector space, where words with the same meaning are aligned. They are used to transfer trained models between languages and to expand data sets. Building good classification models for language problems requires large sets of labeled training examples, which are not available for every language and every problem, so we would like to take advantage of training sets from other, data-rich languages. In this thesis, we use cross-lingual word embeddings to transfer knowledge between languages. We use data sets of tweets in 15 different languages with assigned sentiment labels. Sentiment classification is a text classification task that aims to classify a text according to the sentiment polarity of the opinions it contains. On labeled data sets of tweets in different languages, we test cross-lingual transfer using the BERT model and the LASER library. Experiments show that the transfer of information between data sets of different languages does not necessarily lead to improvements in classification accuracy.
Keywords:
text sentiment, word embeddings, language model, cross-lingual embeddings, tweets
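Illustration:
The cross-lingual transfer setup summarized in the abstract (train a sentiment classifier on one language, apply it to another through a shared vector space) can be sketched roughly as follows. This is not the thesis's method: the thesis uses the BERT model and the LASER library, whereas this sketch substitutes synthetic vectors and a simple nearest-centroid classifier. The three sentiment classes, the vector dimension, the noise level, and the `shift` parameter (a stand-in for imperfect alignment between languages) are all assumptions made for illustration.

```python
import numpy as np

def fit_centroids(X, y):
    """Return the sorted class labels and one mean vector per class."""
    labels = np.unique(y)
    centroids = np.stack([X[y == c].mean(axis=0) for c in labels])
    return labels, centroids

def predict(X, labels, centroids):
    """Assign each row of X to the class with the nearest centroid."""
    dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
    return labels[dists.argmin(axis=1)]

def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return float(np.mean(y_true == y_pred))

# Synthetic stand-ins for tweet vectors in a shared cross-lingual space.
# Assumed labels: -1 = negative, 0 = neutral, 1 = positive.
rng = np.random.default_rng(0)
DIM = 8
CLASS_MEANS = {-1: -np.ones(DIM), 0: np.zeros(DIM), 1: np.ones(DIM)}

def make_language(n_per_class, shift):
    """Sample labeled vectors; `shift` is a hypothetical misalignment offset."""
    X, y = [], []
    for label, mean in CLASS_MEANS.items():
        X.append(rng.normal(mean + shift, 0.3, size=(n_per_class, DIM)))
        y.append(np.full(n_per_class, label))
    return np.concatenate(X), np.concatenate(y)

# "Source" language has many labeled examples; "target" has few.
X_src, y_src = make_language(n_per_class=200, shift=0.0)
X_tgt, y_tgt = make_language(n_per_class=50, shift=0.1)

# Train on the source language only, then evaluate on the target language.
labels, centroids = fit_centroids(X_src, y_src)
source_acc = accuracy(y_src, predict(X_src, labels, centroids))
target_acc = accuracy(y_tgt, predict(X_tgt, labels, centroids))
print(f"source accuracy: {source_acc:.3f}, transfer accuracy: {target_acc:.3f}")
```

The synthetic classes are well separated, so transfer works easily here. On real tweets the alignment between languages is far less clean, which is consistent with the abstract's finding that transfer does not necessarily improve accuracy.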