
Podatkovne baze za podatkovno rudarjenje
LANGOF, LADO (Author), Kukar, Matjaž (Mentor)

PDF - Presentation file (2,08 MB)
MD5: 2C45794905C223F3E1FB4DFFE5305218
PID: 20.500.12556/rul/1d058f4a-c33b-42df-814e-59bd39385160

Abstract
The thesis explores synergies between data mining tools and database management systems (DBMS; Slovenian: SUPB). Consider an analytical problem over data that are too large to be processed entirely in main memory, yet too small to justify setting up a data warehouse or a distributed analytical system. The target setting is therefore a single personal computer used to solve data mining problems. We ask whether tools exist that make it possible to process and prepare this amount of data efficiently for further analysis. The thesis relates mainly to the second and third steps of the CRISP-DM standard process model for data mining, that is, data understanding and data preparation, rather than to data mining itself. It examines how DBMS functionality and extract, transform, load (ETL) tools can be used to prepare data for data mining as efficiently as possible. The data are reduced in volume and converted into a suitable form. Optimized data that contain no duplicate records, typos, or other unwanted properties, and that include only those attributes that can actually be used, have a positive effect on the speed and accuracy of data mining. The goal of the thesis is therefore to find suitable ways (tools or combinations of tools, methodologies) to obtain relatively large amounts of data from different sources and in different formats, combine them, and convert them into a form that can be used directly for data mining, using DBMS and ETL tools.

Language:Slovenian
Keywords:ETL, ELT, CRISP-DM, DBMS (SUPB), data preparation, transformations, data mining
Work type:Master's thesis/paper
Organization:FRI - Faculty of Computer and Information Science
Year:2015
PID:20.500.12556/RUL-30834
Publication date in RUL:19.06.2015

Secondary language

Language:English
Title:Databases for Data Mining
Abstract:
This work looks for synergies between data mining tools and database management systems (DBMS). Imagine an analytical problem over data that are too large to be processed solely in main memory and at the same time too small to justify putting a data warehouse or a distributed analytical system in place. The target setting is therefore a single personal computer used to solve data mining problems. We look for tools that allow us to process and prepare such a quantity of data efficiently for further analysis. The main focus of this work is not on data mining itself but on the second and third steps of the CRISP-DM standard process for data mining, that is, data understanding and data preparation. The question is how to use the functionality of various DBMS and ETL tools to prepare data for data mining as efficiently as possible. Unneeded data should be dropped and the remainder transformed into an appropriate form. Data mining execution time and accuracy should improve when using optimized data that do not contain unneeded attributes, duplicate records, typos, and other unwanted properties. The objective of this work is thus to find appropriate practical methods (tools or combinations of tools, methodologies) for collecting relatively large amounts of data from different sources and in different forms, joining them, and transforming them into a format that can be used directly by data mining algorithms, using DBMS and ETL tools.

Keywords:ETL, ELT, CRISP-DM, DBMS, data preparation, transformations, data mining
