Repository of the University of Ljubljana
Generiranje osnutkov novic iz javno dostopnih spletnih podatkov
Džaferagić, Dino (Author)
Rožanc, Igor (Mentor)
PDF - Presentation file (854,12 KB)
MD5: 14E0AD9FDF4FC783EC92AB7035AD6D82
Abstract
The aim of the thesis was to develop a system for the automated generation of news article drafts from publicly available data. To achieve this, web scraping methods were used, demonstrated on unemployment data from the Employment Service of the Republic of Slovenia and on football match data from the SofaScore website. The data were processed with cleaning and transformation techniques and then used to generate a natural-language draft with the GPT-3.5-turbo model from the OpenAI library. The news drafts were analyzed using several readability metrics, such as Flesch Reading Ease, Gunning Fog Index, Automated Readability Index, Läsbarhets Index, Type-Token Ratio, VADER sentiment analysis, and lexical density. In addition, feedback was obtained from the Slovenian Press Agency (STA), which recognized the system's potential for use in journalistic work. The system was developed in the Python programming language with several libraries, such as Selenium WebDriver, Requests, and xlrd. The results showed that automated generation of news drafts is feasible and effective, because it can save journalists a great deal of time and ensure a high level of accuracy and comprehensibility of the texts.
Language:
Slovenian
Keywords:
automated news writing, web scraping, natural language generation, OpenAI, GPT, readability metrics
Work type:
Bachelor thesis/paper
Typology:
2.11 - Undergraduate Thesis
Organization:
FRI - Faculty of Computer and Information Science
Year:
2024
PID:
20.500.12556/RUL-161313
COBISS.SI-ID:
211527171
Publication date in RUL:
09.09.2024
Views:
294
Downloads:
73
DŽAFERAGIĆ, Dino, 2024, Generiranje osnutkov novic iz javno dostopnih spletnih podatkov [online]. Bachelor’s thesis. [Accessed 17 May 2025]. Retrieved from: https://repozitorij.uni-lj.si/IzpisGradiva.php?lang=eng&id=161313
Secondary language
Language:
English
Title:
Generation of news drafts from public web data
Abstract:
The aim of the thesis was to develop a system for the automated generation of news drafts from public data obtained through web scraping and subsequent processing. First, web scraping techniques were used to collect unemployment data from the website of the Employment Service of the Republic of Slovenia and football match data from the SofaScore website. The collected data were then cleaned, transformed, and reduced to key information, and reorganized into formats suitable for natural language generation (NLG) models. The GPT-3.5-turbo model, accessed through the OpenAI library, was used to generate coherent and meaningful texts from predefined templates and the input data. The generated drafts were analyzed using readability metrics such as Flesch Reading Ease, Gunning Fog Index, Type-Token Ratio, Automated Readability Index, Läsbarhets Index, VADER sentiment analysis, and lexical density. Additionally, feedback was obtained from the Slovenian Press Agency (STA), which is considering using the generated news drafts in its workflow. The system was developed in Python with several libraries, such as Selenium WebDriver, Requests, and xlrd. The results demonstrate that automated news draft generation using advanced AI models is feasible and effective, saving journalists significant time while ensuring high accuracy and readability of the generated texts.
Keywords:
automated news writing, web scraping, natural language generation, OpenAI, GPT, readability metrics
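Three of the readability metrics named in the abstract (Automated Readability Index, Läsbarhets Index, and Type-Token Ratio) can be computed from simple word, character, and sentence counts. The following is a minimal sketch using the standard published formulas; it is not taken from the thesis, and the function names, the tokenizer, and the naive sentence splitter are assumptions made for illustration only.

```python
import re


def tokenize(text):
    """Lowercased word tokens: runs of letters or digits (accented letters included)."""
    return re.findall(r"[^\W_]+", text.lower())


def split_sentences(text):
    """Naive sentence split on '.', '!' and '?' (assumption: no abbreviations or decimals)."""
    return [s for s in re.split(r"[.!?]+", text) if s.strip()]


def automated_readability_index(text):
    """ARI = 4.71 * (characters / words) + 0.5 * (words / sentences) - 21.43."""
    words = tokenize(text)
    sentences = split_sentences(text)
    chars = sum(len(w) for w in words)
    return 4.71 * chars / len(words) + 0.5 * len(words) / len(sentences) - 21.43


def lix(text):
    """LIX = words / sentences + 100 * (words longer than 6 letters) / words."""
    words = tokenize(text)
    sentences = split_sentences(text)
    long_words = sum(1 for w in words if len(w) > 6)
    return len(words) / len(sentences) + 100 * long_words / len(words)


def type_token_ratio(text):
    """TTR = distinct tokens / total tokens."""
    words = tokenize(text)
    return len(set(words)) / len(words)


if __name__ == "__main__":
    draft = "The cat sat. The cat ran away quickly."
    print("ARI:", automated_readability_index(draft))
    print("LIX:", lix(draft))
    print("TTR:", type_token_ratio(draft))
```

The functions assume non-empty input with at least one word and one sentence. Flesch Reading Ease and the Gunning Fog Index additionally require syllable counts, which depend on the language of the text, so they are left out of this sketch.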