This thesis addresses the prediction of sentence-final punctuation. Punctuation prediction models are useful in speech recognition and potentially in correcting various kinds of text. The goal is to predict where sentences end and whether each one ends with a period, a question mark or an exclamation mark. Our implementation predicts, for each word, whether a punctuation mark follows it and, if so, which one. We used two BERT-type models that support Slovene, both of which have been successful in natural language processing tasks. The CroSloEngual BERT model was pretrained on Slovene, Croatian and English text. We compared it with the SloBERTa model, which was trained exclusively on Slovene corpora. We fine-tuned both models on prepared data sets. The results show that SloBERTa predicts punctuation better than CroSloEngual BERT, and that exclamation marks are difficult to predict because of the low number of training instances.
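
The per-word formulation described above can be illustrated with a minimal sketch of how training labels might be derived from punctuated text. This is not the thesis implementation: the function name `label_words` and the label names (`O`, `PERIOD`, `QUESTION`, `EXCLAMATION`) are hypothetical, and the sketch assumes whitespace-separated words with final punctuation attached directly to the preceding word.

```python
# Hypothetical label set; the thesis does not state the names it uses.
LABELS = {".": "PERIOD", "?": "QUESTION", "!": "EXCLAMATION"}
NO_PUNCT = "O"  # assumed label for "no final punctuation after this word"


def label_words(text):
    """Return (word, label) pairs for whitespace-separated words.

    The label names the final punctuation mark that directly follows
    the word, or NO_PUNCT if none does. Other punctuation (commas,
    quotes) is left attached to the word in this simplified sketch.
    """
    pairs = []
    for token in text.split():
        word = token.rstrip(".?!")   # remove trailing final punctuation
        marks = token[len(word):]    # the characters that were removed
        if not word:                 # token was punctuation only; skip it
            continue
        # If several marks follow (e.g. "?!"), keep only the first one.
        label = LABELS[marks[0]] if marks else NO_PUNCT
        pairs.append((word, label))
    return pairs


if __name__ == "__main__":
    for word, label in label_words("Are you home? Yes I am. Great!"):
        print(word, label)
```

A token-classification model can then be trained to predict the label column from the word column alone. In such data most words receive the `O` label, and a rare class like `EXCLAMATION` has few examples to learn from.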