The main goal of this master’s thesis was to develop a procedure that automates the construction of the knowledge base for a virtual assistant that answers questions about municipalities in Slovenia. The procedure is intended to replace, or at least facilitate, the manual preparation of the virtual assistant's knowledge base. To gain a better understanding of the topic, we examined the theoretical background of several machine learning fields, such as multi-label classification, text mining, and learning from weakly labeled data. In this thesis, we present a procedure that finds the most relevant websites for answering various questions about a municipality's activities. The procedure's parameters were first optimized on test data, and the procedure was then evaluated manually on data from new municipalities, which gave us a realistic estimate of its quality. The results show that the procedure recommends more relevant answers than a commercial search engine. The developed procedure therefore effectively speeds up and simplifies data preparation for the municipal virtual assistant. It also eases the work of municipal staff, who until now had to enter answers into the virtual assistant's knowledge base manually.