Benchmarking Machine Learning Methods on Unified Stroke Data
Trajkov, Dimitar (Author); Robnik Šikonja, Marko (Mentor); Kocev, Dragi (Co-mentor); Kostovska, Ana (Co-mentor)

PDF - Presentation file (6.47 MB)
MD5: 8B0C1FE4931CA8D3192E9E709EA8C9CA

Abstract
Stroke is one of the leading causes of death and disability, and predictive models built with machine learning (ML) have the potential to reduce the number of fatalities. However, progress is hampered by a lack of high-quality public datasets and by challenges in research reproducibility. This thesis presents a framework for evaluating ML models on public stroke data. We collected several public datasets and used nested cross-validation to evaluate different ML algorithms. We created a semantic model based on public ontologies (OntoExp, Schema.org) to document the entire experimental process, making the data and results FAIR (findable, accessible, interoperable, reusable). The annotated data is stored in a public knowledge graph, accessible via a SPARQL endpoint. For easier access, we developed an interactive online catalog (http://semantichub.ijs.si/StrokeBench/) that allows the data to be explored without technical knowledge. The framework supports the development of reliable ML models for stroke prediction. Future work will add new datasets, more advanced models, and a natural language interface powered by large language models for easier data querying.
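
As a rough illustration of the nested cross-validation mentioned in the abstract, the sketch below uses only the Python standard library. It is not the thesis's implementation: the one-feature k-nearest-neighbours classifier, the synthetic data, the hyperparameter grid, the fold counts, and all function names are assumptions chosen to keep the example small. The point it shows is the separation of roles: the inner loop selects a hyperparameter using only the outer training part, and the outer loop scores the tuned model on rows that played no part in that selection.

```python
import random


def make_folds(n_items, n_folds):
    """Partition the indices 0..n_items-1 into n_folds interleaved folds."""
    return [list(range(start, n_items, n_folds)) for start in range(n_folds)]


def split(data, held_out):
    """Return (train, test), where test holds the rows at the held_out indices."""
    held = set(held_out)
    train = [row for i, row in enumerate(data) if i not in held]
    test = [data[i] for i in held_out]
    return train, test


def knn_predict(train, x, k):
    """Majority vote among the k training rows nearest to x (one feature)."""
    nearest = sorted(train, key=lambda row: abs(row[0] - x))[:k]
    positives = sum(label for _, label in nearest)
    return 1 if 2 * positives > len(nearest) else 0


def accuracy(train, test, k):
    """Fraction of test rows that k-NN, fitted on train, labels correctly."""
    hits = sum(knn_predict(train, x, k) == y for x, y in test)
    return hits / len(test)


def cv_score(data, k, n_folds):
    """Mean accuracy of k-NN over a plain n_folds cross-validation."""
    scores = []
    for held_out in make_folds(len(data), n_folds):
        train, test = split(data, held_out)
        scores.append(accuracy(train, test, k))
    return sum(scores) / len(scores)


def nested_cv(data, grid, outer_folds=5, inner_folds=3):
    """Return one held-out accuracy per outer fold."""
    outer_scores = []
    for held_out in make_folds(len(data), outer_folds):
        train, test = split(data, held_out)
        # Inner loop: pick the hyperparameter using the outer training part only.
        best_k = max(grid, key=lambda k: cv_score(train, k, inner_folds))
        # Outer loop: score the tuned model on rows it has never seen.
        outer_scores.append(accuracy(train, test, best_k))
    return outer_scores


def synthetic_data(n_rows=100, seed=0):
    """Toy stand-in for a stroke dataset (assumption): one feature, binary label."""
    rng = random.Random(seed)
    rows = [(rng.gauss(2.0 * (i % 2), 1.0), i % 2) for i in range(n_rows)]
    rng.shuffle(rows)
    return rows


if __name__ == "__main__":
    scores = nested_cv(synthetic_data(), grid=[1, 3, 5, 7])
    print("outer-fold accuracies:", scores)
    print("mean accuracy:", sum(scores) / len(scores))
```

Because the hyperparameter is chosen separately inside each outer fold, the mean of the outer scores estimates the performance of the whole tuning procedure rather than of one fixed model, which is the usual reason for preferring nested over plain cross-validation in benchmarking.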

Language:English
Keywords:Stroke prediction, ML benchmarking, Reproducibility, Ontology-based annotation
Work type:Bachelor thesis/paper
Typology:2.11 - Undergraduate Thesis
Organization:FRI - Faculty of Computer and Information Science
Year:2025
PID:20.500.12556/RUL-171663
COBISS.SI-ID:247607043
Publication date in RUL:29.08.2025

Secondary language

Language:Slovenian
Title:Vrednotenje metod strojnega učenja na enotni bazi podatkov o možganski kapi (Evaluation of machine learning methods on a unified stroke database)
Abstract:
Stroke is one of the leading causes of death and disability worldwide. Developing predictive models with machine learning (ML) can reduce the number of deaths. Progress is hindered mainly by a lack of high-quality, publicly available datasets and by problems with research reproducibility. In this work we present a framework for evaluating ML models trained on public stroke datasets. We collected several datasets and evaluated different ML algorithms using nested cross-validation. To ensure quality and transparency, we created a semantic model based on public ontologies (OntoExp, Schema.org). We used this model to annotate the entire experimental process, which ensures that the data and results comply with the FAIR data principles. The annotated data is stored in a public knowledge graph accessible via SPARQL. For easier access, we developed an interactive online catalog (http://semantichub.ijs.si/StrokeBench/) that allows the data to be browsed without technical knowledge. The framework enables the development of reliable ML models for stroke prediction. In future work we plan to add new datasets, more advanced models, and a natural-language data search interface based on large language models.

Keywords:Stroke prediction, Comparative analysis of machine learning, Reproducibility, Ontology-based annotation
