Incorrect comma placement is the most frequent orthographic error in Slovene. This thesis addresses automatic comma placement using deep neural networks. We present two architectures: one based on neural networks with GRU cells and another based on a pretrained BERT language model.
The pretrained BERT language model yields better classification accuracy. We attribute this to its more complex architecture and to the learning process, which fine-tunes a pretrained model that already carries substantial language knowledge. With multilingual BERT, trained on 104 languages with only a small amount of Slovene text, we achieve results comparable to those of the Slovene-Croatian-English BERT model, which was trained on much more Slovene text.
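Comma placement can be framed as a per-word classification task: for each word, predict whether a comma follows it. The sketch below illustrates that framing only. The function name `comma_labels` and the whitespace-based splitting are assumptions made for illustration, not the preprocessing used in the thesis.

```python
def comma_labels(sentence):
    """Split a punctuated sentence into words and per-word labels.

    Label 1 means a comma follows the word in the original text,
    label 0 means no comma follows. Hypothetical helper for illustration.
    """
    words, labels = [], []
    for token in sentence.split():
        has_comma = token.endswith(",")
        word = token.rstrip(",")
        if not word:
            # A comma standing alone belongs to the previous word, if any.
            if labels:
                labels[-1] = 1
            continue
        words.append(word)
        labels.append(1 if has_comma else 0)
    return words, labels


# "Mislim, da bo jutri deževalo." ("I think that it will rain tomorrow.")
words, labels = comma_labels("Mislim, da bo jutri deževalo.")
```

A classifier, whether GRU-based or BERT-based, would then receive the words with commas removed and be trained to reproduce the labels.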