Evaluation and Comparison of Data Mining and Machine Learning Capabilities Within Relational Database Management Systems
Antolović, Vanda (Author), Kukar, Matjaž (Mentor)

PDF - Presentation file (1.28 MB)
MD5: A9924E5292039ED9D45B964E033D5D2A

Abstract
Advances in data science technology have begun bringing machine learning algorithms to the data, which eases training and reduces the risk of data corruption caused by transfers from system to system. We picked five combinations of relational database management systems and integrated or semi-integrated machine learning toolsets: SQLite with Python, PostgresML with Python, MariaDB with MindsDB, PostgreSQL with MindsDB, and Oracle with Oracle Machine Learning. All five combinations were compared on the predictive performance and training time they achieved over seven datasets. MariaDB with MindsDB had the slowest training time, while MindsDB in general could not evaluate datasets containing longer strings or produce quality measures for datasets with a regression target, such as a proper measurement of the squared differences between the actual and estimated values. Oracle with Oracle Machine Learning produced the best results, as it was able to accurately evaluate all datasets with a fast training time. Although the same is true for Python with SQLite, the data had to be optimized and transformed into numerical form before the main Python machine learning library, Scikit-learn, could process it. Based on these findings, a simple decision support system was created to help users choose the toolset that best suits their needs.
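
To make the two comparison criteria and the numerical transformation concrete, here is a minimal sketch using only the Python standard library. It is not the thesis's code. The table `houses`, its rows, and the helper names `load_rows`, `encode_column`, and `mean_squared_error` are hypothetical, and the "model" is a trivial mean predictor standing in for a real learner such as one from Scikit-learn. The sketch reads rows from SQLite, replaces a string column with integer codes, times the training step, and computes the mean of the squared differences between actual and estimated values.

```python
import sqlite3
import time


def load_rows(conn, query):
    """Fetch all rows for a query as a list of tuples."""
    return conn.execute(query).fetchall()


def encode_column(rows, index):
    """Replace the string values in one column with integer codes.

    Codes are assigned in sorted order of the distinct values, so the
    mapping is deterministic.
    """
    categories = sorted({row[index] for row in rows})
    codes = {value: code for code, value in enumerate(categories)}
    encoded = [row[:index] + (codes[row[index]],) + row[index + 1:] for row in rows]
    return encoded, codes


def mean_squared_error(actual, predicted):
    """Average of the squared differences between actual and predicted values."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


# Hypothetical toy table; not one of the seven datasets used in the thesis.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE houses (city TEXT, rooms INTEGER, price REAL)")
conn.executemany(
    "INSERT INTO houses VALUES (?, ?, ?)",
    [("Ljubljana", 3, 300.0), ("Maribor", 2, 150.0),
     ("Ljubljana", 2, 250.0), ("Koper", 4, 320.0)],
)

rows = load_rows(conn, "SELECT city, rooms, price FROM houses ORDER BY rowid")
numeric_rows, city_codes = encode_column(rows, 0)

# "Training" here is a trivial baseline that predicts the mean price.
start = time.perf_counter()
targets = [row[2] for row in numeric_rows]
mean_price = sum(targets) / len(targets)
training_time = time.perf_counter() - start

mse = mean_squared_error(targets, [mean_price] * len(targets))
print(city_codes, mse, training_time)
```

Integer codes are the simplest possible encoding and impose an artificial order on the categories. Whether the thesis used this or another scheme (for example one-hot encoding) is not stated in the abstract.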

Language:English
Keywords:machine learning, data mining, RDBMS, classification, regression, training time
Work type:Master's thesis/paper
Typology:2.09 - Master's Thesis
Organization:FRI - Faculty of Computer and Information Science
Year:2022
PID:20.500.12556/RUL-143255
COBISS.SI-ID:136461059
Publication date in RUL:09.12.2022
Views:753
Downloads:120

Secondary language

Language:Slovenian
Title:Ovrednotenje in primerjava orodij za podatkovno rudarjenje in strojno učenje znotraj sistemov za upravljanje relacijskih podatkovnih baz
Abstract (translated from Slovenian):
The development of data and data science has begun bringing machine learning algorithms directly to the data, which has eased the training process and reduced the possibility of data corruption caused by transfers from system to system. In this master's thesis we selected five combinations of relational database management systems and integrated or semi-integrated machine learning toolsets: Python with SQLite, Python with PostgresML, MariaDB with MindsDB, PostgreSQL with MindsDB, and Oracle with Oracle Machine Learning. We compared all five combinations using the accuracy metric and the training time they achieved over seven datasets. MariaDB with MindsDB had the slowest training time, while MindsDB could not evaluate datasets containing larger strings or produce quality measures for assessing a regression dataset. Oracle with Oracle Machine Learning achieved the best results, as it accurately evaluated all datasets with a fast training time. The same holds for Python with SQLite, but the data had to be optimized and converted to numerical form so that Python's main machine learning library, Scikit-learn, could process it. Based on all of this, a simple decision support system was created that helps decide which toolset to use for given user needs.

Keywords (translated from Slovenian):machine learning, data mining, relational databases, classification, regression, training time
