One could say that telecommunications fraud is the plague of the 21st century, because of the huge losses it causes and the speed at which it spreads. It includes the theft of goods or telecommunication services, concealing the use of a service, the location of origin or the user's destination, disclosing information about clients' accounts, and so on, with the aim of profiting at the expense of the telecommunication operator, the service provider or the user of the telephone network.
It follows from this definition that fraud takes many different forms. We classify them by two criteria. The first is the type of fraud, that is, the way in which the network or service is misused. The second is the method of fraud, that is, the way in which access to the network or service is gained in order to misuse it.
In this thesis, fraudsters are treated as outliers in the set of all users of the telecommunications network. Huge call detail record (CDR) files, which contain call metadata, are analysed with data mining methods. The experimental work poses many challenges, such as unusual time formats, missing values, very large databases, dimensionality reduction, time complexity and the difficulty of distinguishing normal from fraudulent behaviour. We search for potential fraudsters with the k-means and LOF (local outlier factor) methods.
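
To make the outlier view concrete, the following is a minimal sketch of the LOF score in plain Python. It is an illustration only, not the implementation used in the thesis: the two per-user features (calls per day, mean call duration in minutes), the sample values and the choice k = 3 are all hypothetical, and ties at the k-th nearest distance are ignored for brevity.

```python
import math

def lof_scores(points, k):
    """Return the Local Outlier Factor of every point.

    Simplifications (assumptions of this sketch): exactly k neighbours
    are used even when distances tie, and the data must not contain
    more than k identical points (that would give a zero distance sum).
    """
    n = len(points)
    # For each point, the k nearest other points as (distance, index) pairs.
    neighbours = []
    for i in range(n):
        dists = sorted(
            (math.dist(points[i], points[j]), j) for j in range(n) if j != i
        )
        neighbours.append(dists[:k])
    # k-distance: distance to the k-th nearest neighbour.
    k_distance = [nb[-1][0] for nb in neighbours]
    # Local reachability density: inverse of the mean reachability distance,
    # where reach-dist(p, o) = max(k-distance(o), d(p, o)).
    lrd = []
    for i in range(n):
        reach = [max(k_distance[j], d) for d, j in neighbours[i]]
        lrd.append(len(reach) / sum(reach))
    # LOF: mean ratio of the neighbours' density to the point's own density.
    return [
        sum(lrd[j] for _, j in neighbours[i]) / (k * lrd[i]) for i in range(n)
    ]

# Hypothetical per-user features: (calls per day, mean call duration in minutes).
users = [(10, 3.0), (11, 3.2), (9, 2.8), (10, 3.1), (12, 2.9), (80, 0.5)]
scores = lof_scores(users, k=3)
suspect = scores.index(max(scores))  # index of the most outlying user
print(suspect, [round(s, 2) for s in scores])
```

Scores close to 1 indicate a user whose local density matches that of its neighbours, while a score well above 1 marks a candidate outlier. In practice the features would need to be scaled before distances are computed, since otherwise the feature with the largest range dominates.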