The aim of this thesis was to develop a system that automatically generates news drafts from public data obtained through web scraping. First, web scraping was used to collect unemployment data from the website of the Employment Service of the Republic of Slovenia and football match data from the SofaScore website. The collected data was then cleaned, transformed, and reduced to its key information, and reorganized into formats suitable as input for natural language generation (NLG) models. OpenAI's GPT-3.5-turbo model, accessed through the OpenAI library, was used to generate coherent and meaningful texts from predefined templates and the input data. The generated drafts were analyzed with the readability metrics Flesch Reading Ease, Gunning Fog Index, Automated Readability Index, and Läsbarhetsindex (LIX), with the lexical measures Type-Token Ratio and lexical density, and with VADER sentiment analysis. In addition, feedback was obtained from the Slovenian Press Agency (STA), which is considering using the generated news drafts in its workflow. The system was developed in Python using several libraries, including Selenium WebDriver, Requests, and xlrd. The results demonstrate that automated news draft generation with advanced AI models is feasible and effective: it saves journalists significant time while maintaining high accuracy and readability of the generated texts.
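To make the evaluation step more concrete, three of the metrics named above (Type-Token Ratio, LIX, and the Automated Readability Index) can be sketched in a few lines of Python using their standard published formulas. This is only an illustrative sketch, not the implementation used in the thesis: the function names, the regex-based tokenizer, and the naive sentence splitter are all assumptions introduced here.

```python
import re


def tokenize(text):
    # Assumption: a word is any run of Unicode word characters, lowercased.
    return re.findall(r"\w+", text.lower())


def count_sentences(text):
    # Naive splitter (assumption): sentences end at '.', '!' or '?'.
    # Decimal numbers and abbreviations will be over-split.
    parts = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    return max(len(parts), 1)


def type_token_ratio(text):
    # Unique tokens divided by total tokens.
    tokens = tokenize(text)
    return len(set(tokens)) / len(tokens) if tokens else 0.0


def lix(text):
    # LIX = words / sentences + 100 * long_words / words,
    # where a long word has more than six characters.
    words = tokenize(text)
    if not words:
        return 0.0
    long_words = sum(1 for w in words if len(w) > 6)
    return len(words) / count_sentences(text) + 100 * long_words / len(words)


def automated_readability_index(text):
    # ARI = 4.71 * chars / words + 0.5 * words / sentences - 21.43
    words = tokenize(text)
    if not words:
        return 0.0
    chars = sum(len(w) for w in words)
    return (4.71 * chars / len(words)
            + 0.5 * len(words) / count_sentences(text)
            - 21.43)
```

Metrics that depend on syllable counts (Flesch Reading Ease, Gunning Fog Index) are left out of the sketch because syllabification is language-specific and would need a dedicated library or rule set.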