In recent years, large language models have been the most successful approach to natural language processing. An important problem in this field is natural language inference, which requires models to understand the real world to some degree. Requiring models to explain their reasoning offers additional insight into how they function. We tested several approaches to natural language inference in Slovene. We used two Slovene large language models, SloBERTa and SloT5, as well as the much larger English model GPT-3.5-turbo. The training data consisted of the Slovene dataset SI-NLI and an additional 50,000 machine-translated samples from the English dataset ESNLI. The SloBERTa model was fine-tuned on both datasets. Fine-tuning it on SI-NLI achieves a classification accuracy of 74.4 % on the SI-NLI test set. Pretraining it on ESNLI improves its accuracy to 75.3 %. We observe that the models make different types of errors than humans do and that they generalize poorly across datasets. SloT5 was fine-tuned on ESNLI to generate explanations for natural language inference samples. Fewer than a third of its explanations were appropriate: the model learned common sentence patterns from the domain while producing semantically meaningless explanations. We assume that Slovene large language models with several hundred million parameters are capable of identifying and using language patterns, but that language understanding is not inherently tied to an understanding of reality. The much larger GPT-3.5-turbo was used for both classification and explanation generation. It achieves an accuracy of 56.5 % on the SI-NLI test set using zero-shot learning, with 81 % of its explanations being appropriate for the correctly classified samples. Compared with the smaller Slovene models, this model shows a reasonably good understanding of reality but is limited by its weaker proficiency in Slovene.