Repository of the University of Ljubljana
Details
Strojno učenje v porazdeljenem okolju z uporabo paradigme MapReduce : magistrsko delo (Machine learning in a distributed environment using the MapReduce paradigm: master's thesis)
ORAČ, ROMAN (Author); Robnik Šikonja, Marko (Mentor); Lavrač, Nada (Co-mentor)
PDF - Presentation file (1.93 MB)
MD5: D22E1030CA8CADEFD767786AED062F08
PID:
20.500.12556/rul/6c360582-f842-4ce6-8f27-b3cf6fd1ee34
Abstract
Implementing machine learning algorithms in a distributed environment brings several advantages, such as the ability to process large datasets and a linear speedup of execution with additional computing units. In this master's thesis we describe the MapReduce paradigm, which enables distributed computing on a computer cluster, and the Disco framework, which implements it. We present the summation form, which is a prerequisite for efficient implementation of machine learning algorithms with the MapReduce paradigm, and describe the implementations of selected algorithms. In addition, we present new variants of distributed random forests that build the model on subsets of the data. We evaluate the implemented algorithms by comparing them with established machine learning programs. We conclude the thesis by describing the integration of the implemented algorithms into the ClowdFlows platform, which supports the construction, execution, and sharing of interactive data mining workflows. This integration enables the processing of large batch data with visual programming.
Language:
Slovenian
Keywords:
MapReduce, distributed computing, Disco, machine learning, summation form, DiscoMLL, distributed random forests, ClowdFlows, computer science, computer and information science, master's theses
Work type:
Master's thesis/paper
Typology:
2.09 - Master's Thesis
Organization:
FRI - Faculty of Computer and Information Science
Publisher:
R. Orač
Year:
2014
Number of pages:
123 pp.
PID:
20.500.12556/RUL-29969
COBISS.SI-ID:
1536017347
Publication date in RUL:
22.10.2014
Views:
1811
Downloads:
419
Cite this work:
ORAČ, ROMAN, 2014, Strojno učenje v porazdeljenem okolju z uporabo paradigme MapReduce : magistrsko delo [online]. Master’s thesis. R. Orač. [Accessed 11 April 2025]. Retrieved from: https://repozitorij.uni-lj.si/IzpisGradiva.php?lang=eng&id=29969
Secondary language
Language:
English
Title:
Machine learning algorithms in distributed environment with MapReduce paradigm
Abstract:
Implementing machine learning algorithms in a distributed environment offers several advantages, such as the ability to process large datasets and a linear speedup with additional processing units. We describe the MapReduce paradigm, which enables distributed computing, and the Disco framework, which implements it. We present the summation form, which is a prerequisite for efficient implementation of algorithms with the MapReduce paradigm, and describe the implementations of the selected algorithms. We propose novel distributed random forest algorithms that build models on subsets of the dataset. We compare the running time and accuracy of the algorithms with those of well-established data analytics tools. We conclude the master's thesis by describing the integration of the implemented algorithms into ClowdFlows, a web platform for the construction, execution, and sharing of interactive data mining workflows. This integration enables the processing of big batch data with visual programming.
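The summation form mentioned in the abstract can be illustrated with a minimal sketch. It is not taken from the thesis and does not use the Disco API; it is plain Python, and the function names, the choice of simple linear regression, and the sample data are all assumptions made for illustration. The idea shown is that each partition is reduced to a few partial sums in a map step, and the partial sums are added in a reduce step before the model is solved once.

```python
from functools import reduce


def map_partition(rows):
    """Map step (hypothetical name): reduce one data partition to partial sums."""
    n = sx = sy = sxx = sxy = 0.0
    for x, y in rows:
        n += 1
        sx += x
        sy += y
        sxx += x * x
        sxy += x * y
    return (n, sx, sy, sxx, sxy)


def combine(a, b):
    """Reduce step: partial sums from two partitions are added element-wise."""
    return tuple(u + v for u, v in zip(a, b))


def fit_line(partitions):
    """Fit y = slope * x + intercept from the globally summed statistics."""
    n, sx, sy, sxx, sxy = reduce(combine, map(map_partition, partitions))
    slope = (n * sxy - sx * sy) / (n * sxx - sx * sx)
    intercept = (sy - slope * sx) / n
    return slope, intercept


# Made-up data split across three "workers"; every point lies on y = 2x + 1.
partitions = [
    [(0.0, 1.0), (1.0, 3.0)],
    [(2.0, 5.0)],
    [(3.0, 7.0), (4.0, 9.0)],
]
slope, intercept = fit_line(partitions)
```

Because only five numbers per partition cross the network in this sketch, the map step can run independently on each worker, which is the property that makes such algorithms a natural fit for MapReduce.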
Keywords:
MapReduce, distributed computing, Disco, machine learning, summation form, DiscoMLL, distributed random forest, ClowdFlows, computer science, computer and information science, master's degree