
Deep Learning Methods for Synthetic Relational Data Generation
Hudovernik, Valter (Author), Štrumbelj, Erik (Mentor)

PDF - Presentation file (5.60 MB)
MD5: 3DA56745ACAE4A4DAD9D7DB4C83E84B9

Abstract
Real-world databases are predominantly relational, comprising multiple interlinked tables that contain complex structural and statistical dependencies. Learning generative models on relational data has shown great promise in generating synthetic data, which can power privacy-sensitive workloads and unlock access to previously underutilized data. However, existing methods often struggle to capture this complexity, typically reducing relational data to conditionally generated individual tables, imposing limiting structural assumptions and a fixed ordering of tables where there is none. To address these limitations, we introduce RelDiff, a novel diffusion generative method that jointly synthesizes all tables in a relational database by explicitly modeling their foreign key graph structure. RelDiff combines a joint graph-conditioned diffusion process for attribute synthesis and a graph generator based on the Stochastic Block Model for structure generation. The decomposition of graph structure and relational attributes ensures high fidelity and referential integrity, both of which are crucial aspects of synthetic relational database generation. Experiments on 11 benchmark datasets demonstrate that RelDiff consistently outperforms state-of-the-art methods in producing realistic and coherent synthetic relational databases.
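
As a rough illustration of the structure-generation idea mentioned in the abstract, the Python sketch below samples foreign key links between a parent table and a child table in the style of a stochastic block model. It is not the thesis's implementation: the function name `sample_foreign_keys`, the block labels, and the `affinity` matrix are hypothetical and were chosen only to show how sampling exactly one existing parent per child row keeps referential integrity by construction.

```python
import random

def sample_foreign_keys(parent_blocks, child_blocks, affinity, rng):
    """Return one parent row index per child row (hypothetical helper).

    parent_blocks: block label of each parent row (assumed latent groups)
    child_blocks:  block label of each child row
    affinity:      affinity[c][p] is the relative weight of a child in
                   block c linking to any single parent in block p
    """
    # Group parent row indices by their block label.
    rows_in_block = {}
    for row, block in enumerate(parent_blocks):
        rows_in_block.setdefault(block, []).append(row)
    blocks = sorted(rows_in_block)

    foreign_keys = []
    for c in child_blocks:
        # The chance of picking parent block p is proportional to the
        # per-parent affinity times the number of parents in that block.
        weights = [affinity[c][p] * len(rows_in_block[p]) for p in blocks]
        p = rng.choices(blocks, weights=weights, k=1)[0]
        # Pick a concrete parent row uniformly inside the chosen block.
        foreign_keys.append(rng.choice(rows_in_block[p]))
    return foreign_keys

rng = random.Random(0)
parent_blocks = [0, 0, 1, 1, 1]        # five parent rows in two blocks
child_blocks = [0, 0, 0, 1, 1, 1, 1]   # seven child rows in two blocks
affinity = [[0.9, 0.1],
            [0.2, 0.8]]
foreign_keys = sample_foreign_keys(parent_blocks, child_blocks, affinity, rng)

# Every child references a parent row that actually exists.
assert all(0 <= fk < len(parent_blocks) for fk in foreign_keys)
```

In this sketch the structure is sampled first and on its own; attribute values would then be generated conditioned on the sampled links, which mirrors the separation of structure and attributes that the abstract describes.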

Language: English
Keywords: relational deep learning, graph neural networks, relational database, diffusion models, stochastic block models
Work type: Master's thesis/paper
Typology: 2.09 - Master's Thesis
Organization: FRI - Faculty of Computer and Information Science
Year: 2025
PID: 20.500.12556/RUL-174272
COBISS.SI-ID: 254428675
Publication date in RUL: 30.09.2025
Views: 235
Downloads: 51

Secondary language

Language: Slovenian
Title: Uporaba metod globokega učenja za generiranje sintetičnih relacijskih podatkov (Using Deep Learning Methods to Generate Synthetic Relational Data)
Abstract:
Databases are predominantly relational, consisting of interlinked tables with complex structural and statistical dependencies. Learning generative models on relational databases holds great potential for generating synthetic data. Such data are useful for modeling and analyzing sensitive data and thus enable access to still-untapped data sources. Existing methods for generating synthetic relational databases usually simplify the problem to the conditional generation of individual tables, which imposes additional assumptions and a fixed ordering of tables where no order exists. To address these limitations, we present RelDiff, a new generative model that simultaneously synthesizes all tables in a relational database by explicitly modeling their structure through the foreign key graph. RelDiff defines a joint diffusion process for synthesizing the attributes of all tables and a graph generator based on stochastic block models. Separating the generation of graph structure from that of attributes enables high data fidelity and referential integrity, two key aspects of synthetic relational data. Results on 11 relational databases show that RelDiff performs better than existing methods at generating faithful synthetic relational databases.

Keywords: relational deep learning, graph neural networks, relational databases, diffusion models, stochastic block models
