Historical parliamentary records are valuable sources for understanding the political and social processes of the past, yet exploring large corpora of such records remains time-consuming. This thesis investigates the use of Retrieval-Augmented Generation (RAG) for question answering over a corpus of session records from the Carniolan Provincial Assembly (1861–1913) and the National Assembly of the Kingdom of Yugoslavia (1919–1939). The corpus contains texts in Slovenian, German, Croatian, and Serbian. We compare various text segmentation strategies, vector embedding methods, retrieval techniques, and open-source generative models. The results show that qwen3:8b achieves the best answer quality for both Slovenian and English questions, while gemma3:4b and llama3.1:8b offer a good balance between quality and speed. We also find that most models support English better than the other languages, which presents a challenge when applying RAG to less-resourced languages.