This master’s thesis addresses the problem of interpreting black-box machine learning models, explaining their predictions through attribute contributions, and identifying attribute interactions in data sets. We study the problem in the case of grapevine disease, which causes crop losses and economic damage; identifying useful gene markers would allow earlier detection and containment of infection. Our main contribution is a new algorithm for identifying attribute interactions that combines SHAP, a method for interpreting model predictions, with Apriori, an algorithm for association rule mining. On artificially generated data sets, the proposed algorithm successfully identified the important attributes and their interactions. On the biological data set, the important attributes (genes) we found confirm the results of a previous study that served as a reference point. We also discovered strong association rules that describe possible attribute (gene) interactions, but we could not draw conclusions from them because too little biological information is available about gene interactions in the defense response of grapevine. An advantage of the algorithm is that it generates association rules from attribute importance rather than from actual gene expression values, so it is not affected by differing expression levels between genes and considers only each gene’s contribution to the machine learning model. It can also be used to find higher-order interactions, and its results (association rules) are interpretable. A disadvantage is that the association rules are generated from discretized numerical attributes, so the results depend on the discretization method.
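The idea of mining association rules from attribute contributions rather than from raw expression values can be illustrated with a minimal sketch. It is not the thesis implementation: the contribution matrix below is a hand-written toy stand-in for per-sample SHAP values, the gene names are hypothetical, and the discretization threshold, `min_support`, and `min_confidence` are arbitrary assumptions chosen only for the example.

```python
from itertools import combinations

# Toy per-sample contribution values (a stand-in for SHAP values; not real data).
contributions = [
    {"geneA": 0.9, "geneB": 0.8, "geneC": 0.0},
    {"geneA": 0.7, "geneB": 0.6, "geneC": 0.1},
    {"geneA": 0.8, "geneB": 0.9, "geneC": -0.7},
    {"geneA": -0.1, "geneB": 0.0, "geneC": 0.8},
]

def discretize(row, threshold=0.5):
    """Map numeric contributions to items; contributions near zero are dropped."""
    items = set()
    for name, value in row.items():
        if value >= threshold:
            items.add(f"{name}=pos")
        elif value <= -threshold:
            items.add(f"{name}=neg")
    return frozenset(items)

def apriori(transactions, min_support):
    """Return {itemset: support} for every itemset with support >= min_support."""
    n = len(transactions)
    candidates = {frozenset([item]) for t in transactions for item in t}
    frequent = {}
    k = 1
    while candidates:
        # Count support of the current candidates and keep the frequent ones.
        level = {}
        for cand in candidates:
            support = sum(cand <= t for t in transactions) / n
            if support >= min_support:
                level[cand] = support
        frequent.update(level)
        # Join frequent k-itemsets into (k+1)-candidates and prune any candidate
        # that has an infrequent k-subset (the Apriori property).
        k += 1
        candidates = set()
        for a, b in combinations(list(level), 2):
            cand = a | b
            if len(cand) == k and all(
                frozenset(sub) in level for sub in combinations(cand, k - 1)
            ):
                candidates.add(cand)
    return frequent

def association_rules(frequent, min_confidence):
    """Derive rules (antecedent, consequent, support, confidence) from frequent itemsets."""
    rules = []
    for itemset, support in frequent.items():
        for r in range(1, len(itemset)):
            for antecedent in combinations(sorted(itemset), r):
                antecedent = frozenset(antecedent)
                confidence = support / frequent[antecedent]
                if confidence >= min_confidence:
                    rules.append((antecedent, itemset - antecedent, support, confidence))
    return rules

transactions = [discretize(row) for row in contributions]
frequent = apriori(transactions, min_support=0.5)
found_rules = association_rules(frequent, min_confidence=0.8)

for antecedent, consequent, support, confidence in found_rules:
    print(sorted(antecedent), "->", sorted(consequent), support, confidence)
```

Because the items encode the sign and size of a contribution rather than an expression level, two genes with very different expression ranges can still appear in the same rule. The choice of threshold in `discretize` directly changes which items exist, which illustrates the dependence on the discretization method noted above.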