Hierarchical clustering is a popular clustering method that produces a dendrogram, an informative visualization of the hierarchies in the data. It becomes problematic when processing large amounts of data, because the method has high time and space complexity. In this master's thesis, we present an approach to reducing the complexity of hierarchical clustering that is based on processing the data with faster clustering techniques. For this purpose, we test DBSCAN, BIRCH, MeanShift, K-means and Louvain clustering, each on several synthetic and real data sets. We also provide a conceptual visualization of the results of our approach. The results show that our approach significantly reduces the time complexity of hierarchical clustering, but at the cost of some accuracy: it does not return exactly the same results as hierarchical clustering itself.