The growing complexity of services drives a corresponding complexity and heterogeneity in the systems that providers use to measure and bill for those services. This complexity can lead to errors and, consequently, to revenue leakage. In the telecom industry, which strives to adapt its services both to customer needs and to new technological opportunities, the most complex task is billing. The telecom sector was the first to recognize the need for a systematic way to tackle the errors that cause revenue leakage, and it developed a comprehensive framework of procedures called "revenue assurance". With the liberalization of the electricity market, electricity distribution services became as complicated as telecommunications services. This development also significantly accelerated the deployment of smart grids, advanced metering infrastructure, and smart meters, whose abundant data opens opportunities for new services. This master's thesis presents the approach we took to revenue assurance. We applied general revenue assurance procedures while taking into account the peculiarities of the electricity distribution domain. Particular attention was given to data quality.
In the practical part of the thesis, we focused on extracting knowledge from data that would help us detect revenue leakage, in particular from smart meter data collected by the advanced metering infrastructure. In the data warehouse, this data was combined with data from the billing system and the geographic information system. While building the data warehouse, we encountered several data quality problems. Having identified these problems, we indicated how to eliminate them and how to establish a mechanism for monitoring possible recurrences.
The thesis focused on collecting information about consumer characteristics. Once we had acquired this knowledge, we used it to look for electricity theft. We compared machine learning methods for classifying consumers' daily load curves into typical groups. Based on an analysis of the results, we selected the best method, expectation maximization. At the same time, we determined the best number of clusters of typical daily consumption dynamics. Once all measuring points with 15-minute consumption data had been classified, we determined which consumer characteristics correspond most closely to the dynamics of the consumer's electricity consumption. Again, we tested several machine learning methods and found decision trees to be the most appropriate tool for this task. Using the behavior established in this way, we estimated daily consumption for all other measuring points. The data prepared in this way were used to develop an analytical structure that proved to be an excellent basis for discovering revenue leakage. We automated the entire process: loading the warehouse, discovering and applying knowledge, and processing the analytical structure. We demonstrated the usefulness of this approach with a few examples of actual reports.
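The grouping of daily load curves by expectation maximization can be illustrated with a minimal sketch. It is not the implementation used in the thesis: the diagonal-covariance Gaussian mixture, the fixed starting means, the variance floor, and the toy four-interval curves (real curves built from 15-minute data would have 96 values per day) are all assumptions made for illustration, and the function names are hypothetical.

```python
import math

def normalize(curve):
    """Scale a daily load curve to sum to 1, keeping its shape but not its volume."""
    total = sum(curve)
    return [v / total for v in curve]

def log_gauss_diag(x, mean, var):
    """Log density of a Gaussian with diagonal covariance (assumed model)."""
    return sum(-0.5 * (math.log(2 * math.pi * s) + (a - m) ** 2 / s)
               for a, m, s in zip(x, mean, var))

def em_cluster(curves, init_means, n_iter=50, var_floor=1e-4):
    """Fit a Gaussian mixture by EM; return hard labels and cluster mean curves."""
    k, dim, n = len(init_means), len(curves[0]), len(curves)
    means = [list(m) for m in init_means]
    variances = [[0.01] * dim for _ in range(k)]
    weights = [1.0 / k] * k
    for _ in range(n_iter):
        # E-step: responsibility of each cluster for each curve (log-sum-exp for stability)
        resp = []
        for x in curves:
            logs = [math.log(weights[j]) + log_gauss_diag(x, means[j], variances[j])
                    for j in range(k)]
            top = max(logs)
            exps = [math.exp(v - top) for v in logs]
            z = sum(exps)
            resp.append([e / z for e in exps])
        # M-step: re-estimate weights, means, and variances from the responsibilities
        for j in range(k):
            nj = sum(r[j] for r in resp) + 1e-12  # guard against an empty cluster
            weights[j] = nj / n
            means[j] = [sum(r[j] * x[d] for r, x in zip(resp, curves)) / nj
                        for d in range(dim)]
            variances[j] = [max(var_floor,
                                sum(r[j] * (x[d] - means[j][d]) ** 2
                                    for r, x in zip(resp, curves)) / nj)
                            for d in range(dim)]
    labels = [max(range(k), key=lambda j: r[j]) for r in resp]
    return labels, means

# Toy data: three morning-peak and three evening-peak consumers (made-up values).
raw = [
    [1.0, 4.0, 2.0, 1.0], [1.2, 3.8, 2.1, 0.9], [0.9, 4.2, 1.9, 1.0],
    [1.0, 1.0, 2.0, 4.0], [0.9, 1.1, 2.2, 3.8], [1.1, 0.9, 1.8, 4.2],
]
curves = [normalize(c) for c in raw]
labels, means = em_cluster(curves, init_means=[curves[0], curves[3]])
print(labels)
```

In such a sketch the cluster means play the role of the typical daily consumption profiles. Choosing the number of clusters and relating cluster membership to consumer characteristics with decision trees, as described above, would be separate steps that this example does not cover.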