As our society ages, an increasing number of healthcare workers are retiring, which places an additional workload on the remaining staff. The aim of our master’s thesis was to reduce non-emergency walk-in visits to public health facilities. To this end, we created a chatbot named ZdraBOT, which can provide a rough diagnosis for 24 diseases based on the user’s current symptoms.
Since medical databases are not publicly available, especially in less widely spoken languages such as Slovene, we decided to translate one of the publicly available English collections into Slovene. The resulting dataset, which contained 88,000 sentences, was used to train our BERT model, which was based on the SloBERT model. We then used the trained model inside a Rasa client, which gathered all the necessary information from the user; from this information we then tried to find an approximate diagnosis with a sufficiently high level of confidence. The diagnosis was made by computing the cosine similarity between the user’s symptoms and each of the 24 known diseases. For the user interface, we created an Android application that connects to the aforementioned Rasa client.
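The matching step can be sketched as follows. This is a minimal illustration under assumptions not stated in the thesis: symptoms are treated as binary vectors (stored here as Python sets), the disease names and symptom lists are invented for illustration, and the confidence threshold of 0.5 is a hypothetical value. For 0/1 vectors, the dot product equals the number of shared symptoms and each norm is the square root of the number of symptoms, so the cosine similarity reduces to |U ∩ D| / √(|U|·|D|).

```python
import math
from typing import Optional


def cosine_similarity(user: set[str], disease: set[str]) -> float:
    """Cosine similarity between two binary symptom vectors stored as sets.

    For 0/1 vectors the dot product is the number of shared symptoms and
    each vector's norm is the square root of its number of symptoms.
    """
    if not user or not disease:
        return 0.0
    return len(user & disease) / math.sqrt(len(user) * len(disease))


def diagnose(
    user: set[str],
    diseases: dict[str, set[str]],
    threshold: float = 0.5,  # hypothetical confidence threshold
) -> tuple[Optional[str], float]:
    """Return the best-matching disease, or None if no score reaches the threshold."""
    scores = {name: cosine_similarity(user, symptoms) for name, symptoms in diseases.items()}
    best = max(scores, key=scores.get)
    if scores[best] < threshold:
        return None, scores[best]
    return best, scores[best]


# Hypothetical knowledge base: disease names and symptom lists are illustrative only.
DISEASES = {
    "common_cold": {"cough", "sneezing", "sore_throat", "runny_nose"},
    "influenza": {"fever", "cough", "headache", "muscle_pain", "fatigue", "sore_throat"},
    "migraine": {"headache", "nausea", "light_sensitivity"},
}

if __name__ == "__main__":
    # Symptoms as they might be extracted from the user's messages.
    print(diagnose({"cough", "sore_throat"}, DISEASES))
```

The threshold turns a raw best match into a "no confident diagnosis" answer, which is one plausible way to implement the requirement of a sufficiently high level of confidence.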
During the BERT training phase, we found that the translated dataset alone might not be sufficient, as the model appeared to overfit the provided data. Despite these issues, our BERT model was able to identify, on average, one to two symptoms from the user’s messages. During testing, we made 87 diagnoses, of which 62 % were correct. In most cases, an incorrect diagnosis was chosen because many symptoms overlap among diseases in the same group; this was most evident for pulmonary and infectious diseases. We also noticed that cosine similarity is not the best option for matching diseases with the user’s symptoms: because diseases are described by different numbers of symptoms, it favours diseases with fewer symptoms.
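The bias towards diseases with fewer symptoms can be reproduced with a small worked example. The symptom sets below are hypothetical and chosen only to expose the effect: the larger disease matches both reported symptoms, yet it scores lower than the smaller disease, which matches only one, because the denominator √(|U|·|D|) grows with the number of disease symptoms |D|. Here the scores are 1/√(2·2) = 0.5 and 2/√(2·9) ≈ 0.471.

```python
import math


def cosine_similarity(user: set[str], disease: set[str]) -> float:
    """Cosine similarity of binary symptom vectors: |U ∩ D| / sqrt(|U| * |D|)."""
    if not user or not disease:
        return 0.0
    return len(user & disease) / math.sqrt(len(user) * len(disease))


# Hypothetical symptoms reported by the user.
user = {"fever", "cough"}

# Hypothetical disease with few symptoms: matches only one of the user's symptoms.
small = {"fever", "rash"}

# Hypothetical disease with many symptoms: matches both of the user's symptoms.
large = {
    "fever", "cough", "fatigue", "headache", "sore_throat",
    "muscle_pain", "chills", "nausea", "runny_nose",
}

print(round(cosine_similarity(user, small), 3))  # 0.5
print(round(cosine_similarity(user, large), 3))  # 0.471
```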