Označevanje skupin dokumentov z uporabo vložitev besed
Đukić, Nikola (Author)
Zupan, Blaž (Mentor)
PDF - Presentation file (583,22 KB)
MD5: 86F404C7BB34FF824B761B9CF38B52D5
Abstract
Dokumente lahko na različne načine predstavimo z vektorji ter jih vizualiziramo v dvorazsežnem prostoru. V tem prostoru lahko poiščemo skupine podobnih dokumentov in nato poiščemo besede, ki dobro opisujejo posamezne skupine. Vizualizacijo dokumentov lahko obogatimo s prikazom najdenih besed. Za to se uporabljajo metode za označevanje skupin dokumentov, ki temeljijo na uporabi mer pomembnosti, ki upoštevajo le frekvence besed v danem korpusu. V tem diplomskem delu predlagamo novo metodo za označevanje skupin dokumentov, ki za vložitev dokumentov in besed uporablja prednaučene modele za vložitev besed ter temelji na predpostavki, da so podobne besede predstavljene s podobnimi vektorji. Modele za vložitev besed med sabo primerjamo s stališča medsebojne podobnosti in uspešnosti na klasifikacijskih nalogah, da bi izbrali tistega, ki ga bomo uporabili v kombinaciji z metodo za označevanje skupin dokumentov. Metodo empirično ovrednotimo ter jo primerjamo z že obstoječim pristopom in pokažemo, da zaradi uporabe prednaučenih modelov lahko uspešno dela tudi na zelo majhnih podatkovnih množicah, česar že obstoječi pristop ne zmore.
Language:
Slovenian
Keywords:
vložitve besed, vizualizacija, gručenje
Work type:
Bachelor thesis/paper
Typology:
2.11 - Undergraduate Thesis
Organization:
FRI - Faculty of Computer and Information Science
Year:
2020
PID:
20.500.12556/RUL-119839
COBISS.SI-ID:
31040003
Publication date in RUL:
11.09.2020
Views:
1133
Downloads:
152
Secondary language
Language:
English
Title:
Labeling document clusters using word embeddings
Abstract:
Documents can be represented as vectors in various ways and visualized in two-dimensional space. In that space, we can find clusters of similar documents and then find the words that best describe each cluster. These words can be added to the visualization to enrich it. Existing methods for labeling document clusters rely on importance measures that consider only word frequencies in the given corpus. In this thesis, we propose a novel method for labeling document clusters. The method uses pre-trained word embedding models to embed both words and documents, and it relies on the assumption that similar words are represented by similar vectors. We compare word embedding models by their mutual similarity and their performance on classification tasks in order to choose the one to use in combination with our method. We empirically evaluate the method and compare it with the existing approach, and show that, because it uses pre-trained models, it works well even on very small datasets, which the existing approach cannot handle.
Keywords:
word embeddings, visualization, clustering
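Illustrative sketch:
The abstract does not state how candidate labels are scored, so the following is only one possible reading of the idea, not the method from the thesis. It assumes that a document is embedded as the mean of its word vectors, that a cluster is summarized by the centroid of its documents, and that candidate words are ranked by cosine similarity to that centroid. The three-dimensional vectors in `TOY_VECTORS` are made up and stand in for a pre-trained model; the function names are hypothetical. Clustering and the two-dimensional projection are omitted.

```python
import math

# Made-up 3-dimensional vectors standing in for a pre-trained embedding model.
TOY_VECTORS = {
    "cat": [0.9, 0.1, 0.0],
    "dog": [0.8, 0.2, 0.0],
    "pet": [0.85, 0.15, 0.05],
    "stock": [0.0, 0.9, 0.3],
    "market": [0.1, 0.8, 0.4],
    "finance": [0.05, 0.85, 0.35],
}


def mean_vector(vectors):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(component) / n for component in zip(*vectors)]


def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def embed_document(tokens, vectors=TOY_VECTORS):
    """Assumed document embedding: mean of the vectors of known tokens."""
    known = [vectors[t] for t in tokens if t in vectors]
    if not known:
        raise ValueError("document contains no known words")
    return mean_vector(known)


def label_cluster(documents, candidates, vectors=TOY_VECTORS, top_k=1):
    """Rank candidate words by similarity to the cluster centroid (assumed rule)."""
    centroid = mean_vector([embed_document(d, vectors) for d in documents])
    ranked = sorted(
        candidates, key=lambda w: cosine(vectors[w], centroid), reverse=True
    )
    return ranked[:top_k]


# Two tiny hand-made clusters; the label words never occur in the documents.
animal_docs = [["cat", "dog"], ["dog", "cat", "cat"]]
money_docs = [["stock", "market"], ["market", "stock", "stock"]]

animal_label = label_cluster(animal_docs, ["pet", "finance"])
money_label = label_cluster(money_docs, ["pet", "finance"])
print(animal_label, money_label)
```

In this toy setup the chosen labels do not appear in any document, which is something a purely frequency-based measure cannot produce. It hints at why an embedding-based approach may still work when the corpus is very small.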