Hierarchical Clustering for Large Data Sets (Hierarhično gručenje na velikih podatkih)

Debevec, Nejc

Debevec, Nejc (Author), Zupan, Blaž (Mentor)

PDF - Presentation file (3.70 MB)
MD5: DED0A2944FCDE5662DE870448D529EBA

Abstract

Hierarchical clustering is a very popular and useful clustering method. It lets us build a dendrogram, an informative visualization of the hierarchies in the data. A problem arises when processing larger amounts of data, because the method has high time and space complexity. In this master's thesis we present an approach for reducing the complexity of hierarchical clustering, based on preprocessing the data with faster clustering techniques. For this purpose we test DBSCAN, BIRCH, MeanShift, k-means, and network clustering. Each method is tested on various synthetic and real data sets. We also propose a conceptual visualization for displaying the results of our approach. The results show that our approach makes hierarchical clustering substantially faster, but at a loss of accuracy: it does not return exactly the same results as hierarchical clustering.

Language:	Slovenian
Keywords:	knowledge discovery from data, clustering, hierarchical clustering, data visualization
Work type:	Master's thesis/paper
Typology:	2.09 - Master's Thesis
Organization:	FRI - Faculty of Computer and Information Science
Year:	2020
PID:	20.500.12556/RUL-124149
COBISS.SI-ID:	51746051
Publication date in RUL:	07.01.2021
Views:	1710
Downloads:	250

Secondary language

Abstract:
Language:	English
Title:	Hierarchical Clustering for Large Data Sets
Hierarchical clustering is a very popular and useful clustering method. It allows us to build a dendrogram, an informative visualization of the hierarchies in the data. Problems arise when processing large amounts of data, as the method has high time and space complexity. In this master's thesis, we present an approach to reducing the complexity of hierarchical clustering. It is based on preprocessing the data with faster clustering techniques. For this purpose, we test the following methods: DBSCAN, BIRCH, MeanShift, K-means and Louvain clustering. Each method is tested on different synthetic and real data sets. We also provide a conceptual visualization to show the results of our approach. The results show that our approach makes hierarchical clustering substantially faster, but at some cost in accuracy: it does not return exactly the same results as hierarchical clustering applied directly.
Keywords:	data mining, clustering, hierarchical clustering, data visualization
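
The two-stage idea described in the abstract (reduce the data with a fast clustering method, then run hierarchical clustering on the reduced set) can be illustrated with a minimal sketch. This is not the author's implementation: it uses only k-means as the preprocessing step, plain Python instead of a library, a deterministic farthest-first seeding chosen here for reproducibility, and naive average-linkage agglomeration. All function names (`farthest_first`, `kmeans`, `agglomerate`, `two_stage`) and the parameters `n_pre` and `n_final` are hypothetical.

```python
import random
from math import dist  # Euclidean distance, Python 3.8+


def farthest_first(points, k):
    """Pick k spread-out seeds: start at points[0], then repeatedly take the
    point farthest from all seeds chosen so far (an assumption of this sketch)."""
    seeds = [points[0]]
    while len(seeds) < k:
        seeds.append(max(points, key=lambda p: min(dist(p, s) for s in seeds)))
    return seeds


def kmeans(points, k, iters=10):
    """Plain Lloyd iterations; returns (centroids, label of each point)."""
    centroids = farthest_first(points, k)

    def nearest(p):
        return min(range(k), key=lambda c: dist(p, centroids[c]))

    for _ in range(iters):
        labels = [nearest(p) for p in points]
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = tuple(sum(xs) / len(members) for xs in zip(*members))
    return centroids, [nearest(p) for p in points]


def agglomerate(items, n_clusters):
    """Naive average-linkage agglomerative clustering.
    Returns the final clusters (lists of item indices) and the merge history."""
    clusters = [[i] for i in range(len(items))]
    merges = []

    def linkage(a, b):
        return sum(dist(items[i], items[j]) for i in a for j in b) / (len(a) * len(b))

    while len(clusters) > n_clusters:
        pairs = ((i, j) for i in range(len(clusters)) for j in range(i + 1, len(clusters)))
        i, j = min(pairs, key=lambda ij: linkage(clusters[ij[0]], clusters[ij[1]]))
        merges.append((clusters[i], clusters[j], linkage(clusters[i], clusters[j])))
        clusters[i] = clusters[i] + clusters[j]  # j > i, so deleting j keeps i valid
        del clusters[j]
    return clusters, merges


def two_stage(points, n_pre, n_final):
    """Stage 1: shrink the data to n_pre centroids.
    Stage 2: hierarchical clustering on the centroids only."""
    centroids, pre_labels = kmeans(points, n_pre)
    groups, _ = agglomerate(centroids, n_final)
    centroid_to_group = {c: g for g, members in enumerate(groups) for c in members}
    return [centroid_to_group[lab] for lab in pre_labels]


# Synthetic demo: three well-separated Gaussian blobs of 100 points each.
rng = random.Random(0)
centers = [(0.0, 0.0), (100.0, 0.0), (0.0, 100.0)]
points = [(cx + rng.gauss(0, 1), cy + rng.gauss(0, 1))
          for cx, cy in centers for _ in range(100)]
labels = two_stage(points, n_pre=12, n_final=3)
```

The expensive agglomerative step here runs on 12 centroids instead of 300 points, which is where the time saving comes from. The trade-off matches the one stated in the abstract: the hierarchy is built over pre-clusters, so it need not coincide with the dendrogram obtained by clustering every point directly.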
