Medjezikovna klasifikacija sentimenta tvitov
Reba, Kristjan (Author), Robnik Šikonja, Marko (Mentor), Mozetič, Igor (Co-mentor)

PDF - Presentation file (946.25 KB)
MD5: E317063244789747F42C005746F53081

Abstract
Word embeddings are representations of words as vectors of real numbers. They form the basis of many natural language processing applications and are required for processing with deep neural networks. Cross-lingual word embeddings map words from several languages into the same vector space, where words with the same meaning are aligned. They are used to transfer trained models between languages and to expand data sets. Building high-quality classification models for language problems requires large sets of labeled training examples, which are not always available for all languages and all problems, so we would like to exploit training sets from other, more data-rich languages. In this thesis, we aim to use cross-lingual embeddings to transfer knowledge between languages. We use data sets of tweets in 15 different languages with associated sentiment labels. Sentiment classification is a text classification task whose goal is to categorize a text according to the sentiment polarity of the opinions it contains. On labeled data sets of tweets in different languages, we test cross-lingual transfer with the BERT model and the LASER library. The experiments show that transferring information between data sets of different languages typically does not improve classification accuracy.

Language:Slovenian
Keywords:text sentiment, word embeddings, language model, cross-lingual embeddings, tweets
Work type:Bachelor thesis/paper (mb11)
Organization:FRI - Faculty of computer and information science
Year:2019
COBISS.SI-ID:1538310339
Secondary language

Language:English
Title:Cross-lingual classification of tweet sentiment
Abstract:
Word embeddings are representations of words in the form of numeric vectors. They are the basic representation for many natural language processing applications and are required for processing with deep neural networks. Cross-lingual word embeddings map words from multiple languages into the same vector space, where words with similar meanings are aligned. They are used to transfer machine learning models between languages and to expand data sets. Building good classification models for language problems requires large sets of labeled training examples, which are not always available for all languages and all problems, so we aim to take advantage of data sets from data-rich languages. In this work, we use cross-lingual word embeddings to transfer knowledge between languages. We use data sets of tweets in 15 different languages with assigned sentiment labels. The sentiment analysis task aims to classify a text according to the sentiment polarity of the opinions it contains. On labeled data sets of tweets in different languages, we test cross-lingual transfer using the BERT model and the LASER library. Experiments show that the transfer of information between data sets of different languages does not necessarily lead to improvements in classification accuracy.

Keywords:text sentiment, word embeddings, language model, cross-lingual embeddings, tweets
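
The transfer setting described in the abstract can be illustrated with a minimal sketch. It is not the thesis implementation: the 2-D vectors are hypothetical toy values standing in for sentence embeddings in a shared cross-lingual space (such as an encoder like LASER or BERT might produce), the two-label scheme is an assumption, and the nearest-centroid classifier is a simple stand-in chosen only to keep the example self-contained. The classifier is fitted on "source-language" examples and evaluated on "target-language" examples it has never seen.

```python
from math import sqrt

# Hypothetical toy vectors standing in for sentence embeddings in a shared
# cross-lingual space. A real system would obtain them from an encoder.
SOURCE_TRAIN = [  # labeled "source-language" examples used for training
    ([0.9, 0.1], "positive"),
    ([0.8, 0.3], "positive"),
    ([0.1, 0.9], "negative"),
    ([0.2, 0.7], "negative"),
]
TARGET_TEST = [  # labeled "target-language" examples, never used for training
    ([0.7, 0.2], "positive"),
    ([0.3, 0.8], "negative"),
    ([0.6, 0.5], "negative"),  # lies closer to the positive centroid
]


def centroid(vectors):
    """Component-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(v[i] for v in vectors) / n for i in range(len(vectors[0]))]


def cosine(a, b):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b)))


def fit_centroids(examples):
    """Compute one centroid per label from (vector, label) pairs."""
    by_label = {}
    for vec, label in examples:
        by_label.setdefault(label, []).append(vec)
    return {label: centroid(vecs) for label, vecs in by_label.items()}


def predict(centroids, vec):
    """Return the label whose centroid is most similar to vec."""
    return max(centroids, key=lambda label: cosine(centroids[label], vec))


def accuracy(centroids, examples):
    """Fraction of examples whose predicted label matches the given label."""
    correct = sum(predict(centroids, vec) == label for vec, label in examples)
    return correct / len(examples)


# Train on the source language only, then evaluate on the target language.
model = fit_centroids(SOURCE_TRAIN)
transfer_accuracy = accuracy(model, TARGET_TEST)
print(f"zero-shot transfer accuracy on toy data: {transfer_accuracy:.2f}")
```

The toy numbers say nothing about the thesis results. They only show the evaluation pattern: a model trained on one language's labeled data is scored on another language's data, which is possible only because both are embedded in the same vector space.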
