Purpose: Mammographic density, which reflects breast tissue composition, is an important independent breast cancer risk factor. The purpose of this thesis was to build a model for mammographic density classification based on mammographic images, and to test the hypothesis that the model's reliability can be improved by including longitudinal data.
Data and methods: We used images from the DORA database. The classification model was built from 12190 processed images recorded with Siemens mammographs and 4787 processed images recorded with Hologic mammographs. We then applied the model to images of women who were imaged at least twice, with an average time between imaging of 2.1 years; this set comprised 34053 processed images recorded with Siemens mammographs and 12601 processed images recorded with Hologic mammographs. Segmentation and feature extraction were performed with the LIBRA software. Feature selection was performed with the minimum redundancy maximum relevance (MRMR) method, and multinomial logistic regression was used as the classifier. Prediction reliability was assessed by calculating Cohen's $\kappa$ coefficient. We compared our results with results from the literature.
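As an illustration of the reliability measure named above, the following is a minimal sketch of unweighted Cohen's $\kappa$ for two sets of category labels. The function name `cohen_kappa` and the example labels are hypothetical and not taken from the thesis, and the choice of the unweighted form is an assumption: the thesis may have used a weighted variant.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Unweighted Cohen's kappa for two equally long label sequences.

    Assumption: the unweighted form is used here for illustration only.
    """
    if len(labels_a) != len(labels_b) or len(labels_a) == 0:
        raise ValueError("label sequences must be non-empty and of equal length")

    n = len(labels_a)

    # Observed agreement: fraction of cases where both labels match.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n

    # Chance agreement: sum over categories of the product of marginal frequencies.
    count_a = Counter(labels_a)
    count_b = Counter(labels_b)
    expected = sum(count_a[c] * count_b[c] for c in count_a) / (n * n)

    if expected == 1.0:
        raise ValueError("kappa is undefined when chance agreement equals 1")

    return (observed - expected) / (1.0 - expected)


# Hypothetical density categories (1-4) for eight images.
reference = [1, 2, 2, 3, 4, 2, 1, 3]
predicted = [1, 2, 3, 3, 4, 2, 2, 3]
kappa = cohen_kappa(reference, predicted)
print(f"kappa = {kappa:.4f}")
```

In this toy example the observed agreement is 6/8 and the chance agreement is 18/64, so $\kappa$ = 15/23, or about 0.65.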
Results: The reliability of the mammographic density prediction model was significantly improved by considering the reference images together with images taken prior to the reference. The improvement was 0.06 $\pm$ 0.03 for the closest and 0.38 $\pm$ 0.29 for the furthest measurements. Density prediction reliability was not significantly improved by considering images taken after the reference images. The improvement in that case was 0.02 $\pm$ 0.01 for the closest and 0.34 $\pm$ 0.53 for the furthest measurements.
Conclusion: Our density prediction model reaches at best a substantial agreement, 0.64 < $\kappa$ < 0.81, and at worst a moderate agreement, 0.35 < $\kappa$ < 0.63. We concluded that considering multiple longitudinal measurements can significantly improve our model's reliability.