The telecommunication sector is one of the most profitable, and thus one of the most competitive, in the world. Most countries have several service providers, whose common goal is to maximize revenue by acquiring ever more subscribers. In the last decade, mobile telecommunication markets have become saturated in many countries; i.e., the number of mobile numbers in use has reached or even exceeded the number of residents. To retain or even increase their market share, service providers have two options: to retain their existing users or to acquire users from competing providers. Since the cost of retention is about one-sixth of that of acquisition, providers mainly focus on the former. Nowadays, the use of mobile services has become a necessity. Therefore, users more often churn by moving to another service provider than by ceasing to use mobile services altogether. In many markets around the world, the churn rate has increased significantly since the introduction of mobile number portability, because it makes churning transparent to all contacts of a churner. Users usually transfer their subscriber numbers to another service provider because they are dissatisfied with their current provider or have received a better offer from the competition. Although annual churn rates can reach and even exceed 40 %, significant damage is already inflicted at much lower churn rates, e.g., under 5 %. A number of churn prevention methods have been developed and proposed in recent years, focusing mainly on the task of churn prediction. To predict churn successfully, service providers use all the data they hold on their customers. By knowing who is going to churn, providers have time to prevent these users from churning by offering them appropriate incentives.
Two main approaches to churn prediction can be found in the literature. The most widely used approach applies machine learning and data mining techniques to build classification models. Using such models, customers are assigned churn probability scores, which are then used to label them as churners or non-churners. The top k percent of customers deemed most likely to churn are then offered incentives to prevent them from actually churning. The second, more recent, approach treats the mobile user network
as a social network, from which churn-relevant patterns and other information are extracted using various social network analysis tools. Although churn prediction has been of high interest for more than 10 years, several open issues remain. This thesis addresses some of them through five novel scientific contributions: (i) a diffusion model for churn
prediction based on sociometric theory, (ii) a novel churn-influence-prediction model, (iii) a novel hybrid model for churn prediction based on a classical churn-prediction model and the churn-influence-prediction model, (iv) improving churn prediction by modelling influential and other customers separately, and (v) explaining churn reasons through decision tree visualization. The first contribution proposes a novel churn prediction diffusion model
based on sociometric clique and social status theory. It interprets the energy in the diffusion model as the opinion of users, which is transformed into user influence using a derived social status function. Furthermore, a novel diffusion model prediction scheme applicable to a single user or a small subset of users is described. The diffusion model is evaluated on a real dataset of users obtained from a selected mobile service provider. The empirical results show a significant improvement in the prediction accuracy of the proposed
method compared with the basic diffusion model from the literature. The aim of the second contribution is to find good predictors of churn influence in a mobile service network. To this end, a procedure for determining a weak ground truth on churn influence is presented and used to determine the churn influence of prepaid customers. The resulting scores are used to identify good churn-influence predictors among several candidate features.
The identified predictors are finally used to build the influence-prediction model.
In the third contribution, the influence-prediction model is combined with the classical churn-prediction model to obtain a hybrid churn-influence-prediction model. This hybrid model is used to predict influential churners before they churn and thereby also influence their peers to churn. The results show that considerably better churn prediction results can be achieved with the proposed hybrid model than with the classical churn-prediction model alone. Moreover, the churners successfully
predicted by the hybrid approach also have a greater number of churn followers. Successful retention of the predicted churners could substantially reduce churn, since it could also prevent the churn of their followers. The fourth contribution examines whether churn prediction can be improved by modelling influential and other customers separately. To this end, real users from the training set are split into a set of influential users and a set of all other users,
and churn prediction models are built on each set. The results of both models are then combined and compared with those of a prediction model built on all users together. The comparison is performed using appropriate statistical tests, which find no statistically significant difference between the two approaches. We believe the reason is that the churn prediction task is similarly difficult regardless of whether the model is built on influential users, other users, or all users. Good interpretability of a churn-prediction model is just as important as the accuracy of the model itself. In the fifth contribution, a solution for the visualization
of decision trees is presented, which takes a textual representation of a decision tree as input and visualizes it in a form suitable for simple interpretation. We build the decision tree using the random decision trees method implemented in Weka and use Matlab to transform it into a form ready for drawing. In addition, the information on user rates and churn
rates in each tree branch is used in the transformation to design the nodes and edges of the tree. The final tree visualization contains all the key information needed to identify the churn reasons of the churners modelled in the tree. Finally, a tree interpretation procedure is presented that can be used in real scenarios by companies engaged in customer care.
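
As a rough illustration of the idea behind the fifth contribution, the following Python sketch parses a textual decision tree and computes, for every leaf, the share of users that reach it and the churn rate among them. It is only a sketch under stated assumptions, not the Weka and Matlab procedure used in the thesis: the indented, pipe-delimited input format, the leaf annotation `(users/churners)`, the feature names, and the helpers `parse_tree` and `summarize` are all hypothetical and chosen for the example.

```python
import re
from dataclasses import dataclass

# Hypothetical leaf format: "<condition> : <label> (<users>/<churners>)".
# This is an assumed format for illustration, not Weka's exact output.
LEAF = re.compile(
    r"^(?P<cond>.+?)\s*:\s*(?P<label>\w+)\s*\((?P<users>\d+)/(?P<churners>\d+)\)$"
)
# Leading "|   " markers encode the depth of a line in the tree.
INDENT = re.compile(r"(?:\|\s*)*")

# Made-up example tree; feature names and counts are invented.
EXAMPLE = """\
tenure < 6
|   complaints >= 2 : churn (40/30)
|   complaints < 2 : stay (60/15)
tenure >= 6 : stay (200/20)
"""


@dataclass
class Leaf:
    path: list[str]  # conditions from the root down to this leaf
    label: str       # class predicted at the leaf
    users: int       # users reaching the leaf
    churners: int    # churners among those users


def parse_tree(text: str) -> list[Leaf]:
    """Turn the indented textual tree into a flat list of leaves."""
    leaves: list[Leaf] = []
    stack: list[str] = []  # conditions of the currently open branches
    for raw in text.splitlines():
        if not raw.strip():
            continue
        prefix = INDENT.match(raw).group(0)
        depth = prefix.count("|")
        body = raw[len(prefix):].strip()
        del stack[depth:]  # close branches deeper than this line
        m = LEAF.match(body)
        if m:
            leaves.append(
                Leaf(stack + [m["cond"]], m["label"],
                     int(m["users"]), int(m["churners"]))
            )
        else:
            stack.append(body)  # inner node: open a new branch
    return leaves


def summarize(leaves: list[Leaf]) -> list[dict]:
    """Compute the user share and churn rate of every leaf."""
    total = sum(leaf.users for leaf in leaves)
    return [
        {
            "path": " AND ".join(leaf.path),
            "user_share": leaf.users / total if total else 0.0,
            "churn_rate": leaf.churners / leaf.users if leaf.users else 0.0,
        }
        for leaf in leaves
    ]


if __name__ == "__main__":
    for row in summarize(parse_tree(EXAMPLE)):
        print(f"{row['path']}: share={row['user_share']:.2f}, "
              f"churn={row['churn_rate']:.2f}")
```

In a drawing step, such per-branch values could, for instance, be mapped to edge thickness (user share) and node colour (churn rate), so that branches with many users and a high churn rate stand out; this mapping is an assumption made for the example.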