Word embeddings represent words as numeric vectors.
They are the basic input representation for many natural language processing applications and a prerequisite for processing text with deep neural networks.
Cross-lingual word embeddings map words from multiple languages into a single shared vector space, where words with similar meanings lie close together regardless of language.
They enable the transfer of machine-learning models between languages and the expansion of training data sets.
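To make the idea of a shared aligned space concrete, the following toy sketch uses hypothetical three-dimensional vectors (real embeddings have hundreds of dimensions, and the words and values here are invented for illustration): in an aligned space, the nearest neighbour of an English word's vector should be its translation.

```python
import numpy as np

# Hypothetical vectors in a shared cross-lingual space (toy values,
# not taken from any real embedding model).
space = {
    "dog":  np.array([0.90, 0.10, 0.00]),  # English word
    "Hund": np.array([0.88, 0.12, 0.00]),  # German translation, nearby
    "car":  np.array([0.00, 0.20, 0.90]),  # unrelated word, far away
}

def cosine(a, b):
    # Cosine similarity: 1.0 for identical directions, ~0 for unrelated ones.
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

query = space["dog"]
ranked = sorted(space, key=lambda w: -cosine(query, space[w]))
print(ranked)  # ['dog', 'Hund', 'car']: the translation is the nearest neighbour
```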
Building good classification models for language tasks requires large sets of labeled training examples, which are not available for every language and every problem.
Our goal is to take advantage of data sets from data-rich languages.
In this work, we use cross-lingual word embeddings to transfer knowledge between languages.
We use data sets of tweets in 15 different languages with assigned sentiment labels.
The sentiment analysis task aims to classify a text according to the sentiment polarity of the opinions it contains.
On these labeled tweet data sets, we test cross-lingual knowledge transfer using the BERT model and the LASER library.
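The following minimal sketch shows one form such a transfer experiment can take, assuming the third-party laserembeddings package (with its pretrained models downloaded via `python -m laserembeddings download-models`) and scikit-learn; the tweets, labels, and language pair are placeholder data, not the paper's actual data sets.

```python
from laserembeddings import Laser
from sklearn.linear_model import LogisticRegression

laser = Laser()

# Placeholder training data in a data-rich source language (English) ...
en_tweets = ["I love this!", "Terrible service."]
en_labels = [1, 0]  # 1 = positive, 0 = negative

# ... and placeholder test data in a different target language (Slovak).
sk_tweets = ["Toto sa mi páči!", "Hrozná služba."]
sk_labels = [1, 0]

# LASER embeds sentences from many languages into one shared vector space,
# so a classifier trained on source-language vectors can be applied
# directly to target-language vectors.
X_train = laser.embed_sentences(en_tweets, lang="en")
X_test = laser.embed_sentences(sk_tweets, lang="sk")

clf = LogisticRegression().fit(X_train, en_labels)
print("Cross-lingual accuracy:", clf.score(X_test, sk_labels))
```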
Our experiments show that transferring knowledge between data sets in different languages does not necessarily improve classification accuracy.