Details

Graph Representation Learning for Evaluation of Synthetic Relational Data
Jurkovič, Martin (Author), Šubelj, Lovro (Mentor)

PDF - Presentation file (1.30 MB)
MD5: 1662BABFD52A2EAC417CD6239C33D3EC

Abstract
Evaluating the utility of synthetic relational databases is challenging, as existing approaches rely on manual feature engineering or single-table flattening, which obscure relational structure and reduce scalability. This thesis introduces RDL-utility, a general framework that represents relational databases as heterogeneous graphs and trains graph neural networks (GNNs) directly on the graphs. Using a standardized AutoComplete task, RDL-utility measures how well models trained on synthetic data perform on real held-out data. Experiments on five real-world databases, including an ablation study across six GNN architectures, show that diffusion-based generative methods achieve the highest utility, although no single method consistently outperforms all others. RDL-utility provides a reproducible, structure-sensitive evaluation, establishing a foundation for future research and applications.
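The two ideas in the abstract, turning a relational database into a heterogeneous graph and scoring a model trained on synthetic data against real held-out data, can be illustrated with a small sketch. This is not the thesis implementation: the table names, the `id` primary-key convention, and the functions `to_hetero_graph`, `fit_majority`, and `utility` are all hypothetical, and a majority-class predictor stands in for a GNN so the example runs with the standard library alone.

```python
from collections import Counter, defaultdict


def to_hetero_graph(tables, foreign_keys):
    """Map tables to typed nodes and foreign keys to typed edges.

    tables: {table_name: [row_dict, ...]}; every row has a primary key "id"
            (an assumption made for this sketch).
    foreign_keys: [(child_table, fk_column, parent_table), ...]
    """
    # One node type per table, one node per row.
    nodes = {name: {row["id"]: row for row in rows} for name, rows in tables.items()}
    # One edge type per foreign key, one edge per child row that references a parent.
    edges = defaultdict(list)
    for child, fk_col, parent in foreign_keys:
        for row in tables[child]:
            parent_id = row.get(fk_col)
            if parent_id in nodes[parent]:  # skip dangling or missing references
                edges[(child, fk_col, parent)].append((row["id"], parent_id))
    return nodes, dict(edges)


def fit_majority(rows, target):
    """Stand-in for a GNN: always predict the most common target value."""
    most_common = Counter(row[target] for row in rows).most_common(1)[0][0]
    return lambda row: most_common


def utility(train_rows, real_test_rows, target):
    """Train on one set of rows, report accuracy on real held-out rows."""
    model = fit_majority(train_rows, target)
    hits = sum(model(row) == row[target] for row in real_test_rows)
    return hits / len(real_test_rows)


# Toy real database with one foreign key: order.customer_id -> customer.id
real = {
    "customer": [{"id": 1, "segment": "a"}, {"id": 2, "segment": "b"}],
    "order": [
        {"id": 10, "customer_id": 1, "status": "paid"},
        {"id": 11, "customer_id": 1, "status": "paid"},
        {"id": 12, "customer_id": 2, "status": "open"},
    ],
}
foreign_keys = [("order", "customer_id", "customer")]
nodes, edges = to_hetero_graph(real, foreign_keys)

# Toy synthetic rows for the same table, as a generative method might produce.
synthetic_orders = [
    {"id": 0, "customer_id": 1, "status": "paid"},
    {"id": 1, "customer_id": 2, "status": "paid"},
    {"id": 2, "customer_id": 2, "status": "open"},
]

# Utility: train on synthetic rows, evaluate on real held-out rows.
score = utility(synthetic_orders, real["order"], target="status")
print(sorted(nodes), list(edges), round(score, 3))
```

In a fuller pipeline the node and edge dictionaries would feed a heterogeneous GNN library, and the same score computed with real training data would serve as the reference point for comparing generative methods.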

Language:English
Keywords:synthetic data, relational databases, data generation, graph representation learning, graph neural networks, empirical comparison, data quality evaluation, utility
Work type:Master's thesis/paper
Typology:2.09 - Master's Thesis
Organization:FRI - Faculty of Computer and Information Science
Year:2025
PID:20.500.12556/RUL-173771
COBISS.SI-ID:254172931
Publication date in RUL:22.09.2025
Views:475
Downloads:210

Secondary language

Language:Slovenian
Title:Učenje grafovskih predstavitev za evalvacijo sintetičnih relacijskih podatkov
Abstract (English translation):
Evaluating the utility of synthetic relational databases is challenging, as existing approaches rely on manual feature creation or on merging tables into a single one, which obscures relational structure and reduces scalability. This master's thesis introduces RDL-utility, a general approach that converts relational databases into heterogeneous graphs and trains graph neural networks (GNNs) directly on them. Using the standardized predictive task AutoComplete, RDL-utility measures how well models trained on synthetic data perform on real, held-out test data. Experiments on five real-world databases, including a study across six GNN architectures, show that diffusion-based generative approaches achieve the highest utility, although no single approach consistently outperforms all others. RDL-utility provides reproducible, structure-sensitive evaluation and lays the foundation for future research and applications.

Keywords:synthetic data, relational databases, data generation, graph representation learning, graph neural networks, empirical comparison, data quality evaluation, utility
