In order to predict the behaviour of networks with machine-learning algo-
rithms, the vector representation of nodes in a low dimensional vector space
is required. The current state-of-the-art algorithm for the calculation of node
embeddings in vector space is Node2vec. Node2vec samples the network
through the 2nd order random walks. Unfortunately, Node2vec has a high
memory complexity due to the preprocessed probability-distribution tables.
Due to high memory complexity, an average user is unable to use it for larger
networks. In this thesis, we present a heuristic approach to the random walk
simulation. The heuristic approach replaces probability tables with binary
trees and guarantees linear time and space complexity, while retaining the
quality of computed features. The heuristic approach requires from 6 up to
40 times less memory than Node2vec on tested datasets.
|