In this work, we examine the applicability of two kinds of local network patterns, labelled frequent patterns and graphlets, to network evolution analysis and to the prediction of either the further temporal development of a network or its missing elements.
Networks and other data with complex structure are attracting ever more attention in the research community. This is partly because the analysis of large networks has finally become feasible with the hardware developed in the last two decades. In addition, methods for collecting large amounts of network-structured data have emerged, closely followed by an industry-driven need for their analysis. For instance, World Wide Web-based social networking services offer unprecedented insight into social structures and, at the same time, provide a strong incentive to develop methods that analyse networks and predict their future evolution. The data gathered by those services are a rich source for social science researchers, and network analysis also provides useful tools for scientists in the field of biology.
Our work was motivated by our own need for a method to generate sensible hypotheses about missing elements in small labelled networks. No prior research on this problem exists in the published literature. We therefore began by establishing the theoretical foundations of the problem: we defined it formally and pointed out its similarities to and differences from related problems. The key differences required us to propose a new methodology for evaluating prediction quality, which constructs test networks by removing a single edge, in every possible way, from each network in a given set. We also defined an evaluation metric based on the rank of the correct prediction. The justification of the problem definition and the setup of the evaluation methodology are important scientific contributions of this work.
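The evaluation idea described above can be illustrated with a minimal sketch. It assumes an unlabelled, undirected network stored as a set of two-element frozensets, and it uses a common-neighbours count as a stand-in scorer; the function names, the scorer, and the optimistic handling of ties are all assumptions made for illustration, not the methodology or the methods of this work.

```python
from itertools import combinations


def neighbours(edges, node):
    """Return the nodes adjacent to `node` in an undirected edge set."""
    return {other for e in edges if node in e for other in e if other != node}


def common_neighbours(edges, candidate):
    """Stand-in scorer (an assumption): number of neighbours shared by the pair."""
    u, v = tuple(candidate)
    return len(neighbours(edges, u) & neighbours(edges, v))


def leave_one_edge_out(edges):
    """Yield (incomplete_edges, removed_edge) once for every edge."""
    for removed in edges:
        yield edges - {removed}, removed


def rank_of_removed(nodes, incomplete, removed, score):
    """Rank of the removed edge among all node pairs not joined in `incomplete`."""
    candidates = [frozenset(p) for p in combinations(sorted(nodes), 2)
                  if frozenset(p) not in incomplete]
    target = score(incomplete, removed)
    # Ties are resolved optimistically (rank 1 = best); an illustrative choice.
    return 1 + sum(score(incomplete, c) > target for c in candidates)


def evaluate(nodes, edges, score=common_neighbours):
    """Return one rank per test network built by removing a single edge."""
    return [rank_of_removed(nodes, incomplete, removed, score)
            for incomplete, removed in leave_one_edge_out(edges)]


# Hypothetical example network: a triangle a-b-c with a pendant node d.
nodes = {"a", "b", "c", "d"}
edges = {frozenset(p) for p in [("a", "b"), ("b", "c"), ("c", "a"), ("c", "d")]}
ranks = evaluate(nodes, edges)
mean_rank = sum(ranks) / len(ranks)
```

A lower mean rank indicates that the scorer tends to place the truly missing edge near the top of its list of hypotheses. Node labels, which matter in the problem studied here, are ignored in this sketch.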
We propose a family of methods for solving the defined problem that use frequent patterns to model the domain. Hypotheses about missing elements are constructed and scored by looking for partial embeddings of frequent patterns in the network, which is assumed to be incomplete. Evaluation results on real and synthetic data sets indicate not only the usefulness of the proposed family of methods, but also the need to treat this type of problem separately from previously researched network analysis and completion problems: methods proposed in the literature for those problems performed far worse on our small-network completion problem.
The proposed method is not suitable for the analysis of large networks. Its time complexity begins to hinder its usefulness even in slightly larger networks, and experiments show that its prediction quality declines with network size as well. This outcome is expected, as the method was not designed to handle large networks.
While we were working with local network patterns, another type caught our attention: graphlets. They were first considered in the field of bioinformatics, but their use has since spread to other domains. We first considered the process of graphlet evolution in growing networks from a theoretical viewpoint. We showed how these structures can transform into one another and defined mathematical and algorithmic tools for an empirical analysis of multiple real and synthetic networks. The analysis showed that some evolution properties are shared among all analysed networks, while others are domain-specific.
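As a small illustration of how one graphlet can transform into another when a network grows, the following sketch counts the two connected 3-node graphlets (the open path and the triangle) before and after a single edge is added. The function name and the example network are assumptions made for illustration; the tools defined in this work are more general.

```python
from itertools import combinations


def count_3node_graphlets(nodes, edges):
    """Count induced connected 3-node subgraphs: open paths and triangles."""
    counts = {"path": 0, "triangle": 0}
    for trio in combinations(sorted(nodes), 3):
        # Number of edges present among the three chosen nodes.
        k = sum(frozenset(p) in edges for p in combinations(trio, 2))
        if k == 2:
            counts["path"] += 1
        elif k == 3:
            counts["triangle"] += 1
    return counts


# Hypothetical growing network: a path a-b-c-d, then the edge a-c arrives.
nodes = {"a", "b", "c", "d"}
before = {frozenset(p) for p in [("a", "b"), ("b", "c"), ("c", "d")]}
after = before | {frozenset(("a", "c"))}

counts_before = count_3node_graphlets(nodes, before)
counts_after = count_3node_graphlets(nodes, after)
```

In this example the new edge closes the open path on a, b and c into a triangle, and at the same time creates a new open path on a, c and d, so a single edge can change several graphlets at once.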
This doctoral dissertation also proposes a method for predicting the further temporal evolution of a network, based on the analysis of graphlet evolution. The evaluation results do not indicate that it is useful for practical applications. On all analysed networks, the well-known Random Walk with Restarts (RWR) method achieved better prediction quality. The predictions generated by our method did not improve prediction stacking either. Surprisingly, however, the evaluation results showed that stacking RWR predictions obtained with different parameter values can increase prediction quality.
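For readers unfamiliar with RWR, the following is a minimal sketch of the general technique, computed by power iteration on an undirected network given as an adjacency dictionary. The function name, the default restart probability, the iteration count, and the treatment of nodes without neighbours are assumptions made for illustration and do not reproduce the experimental setup of this work.

```python
def rwr_scores(adj, source, restart=0.15, iterations=100):
    """Approximate Random Walk with Restarts proximities from `source`."""
    nodes = sorted(adj)
    p = {n: 0.0 for n in nodes}
    p[source] = 1.0
    for _ in range(iterations):
        nxt = {n: 0.0 for n in nodes}
        for n in nodes:
            walk_mass = (1.0 - restart) * p[n]
            if adj[n]:
                # Spread the walking mass evenly over the neighbours.
                share = walk_mass / len(adj[n])
                for m in adj[n]:
                    nxt[m] += share
            else:
                # Assumption: a walker stuck on an isolated node jumps back.
                nxt[source] += walk_mass
        # With probability `restart` the walker returns to the source.
        nxt[source] += restart * sum(p.values())
        p = nxt
    return p


# Hypothetical example: the path a-b-c-d, scored from node a.
adj = {"a": {"b"}, "b": {"a", "c"}, "c": {"b", "d"}, "d": {"c"}}
scores = rwr_scores(adj, "a")

# Candidate new links from a are ranked by the proximity of the other endpoint.
candidates = sorted((n for n in adj if n != "a" and n not in adj["a"]),
                    key=lambda n: scores[n], reverse=True)
```

Scores computed with several values of `restart` could serve as separate inputs to a stacked predictor; how such inputs are combined is not shown here.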