Processing large amounts of data in the cloud: bachelor thesis
Holub, Sandi (Author), Zrnec, Aljaž (Mentor)

URL - Presentation file: http://eprints.fri.uni-lj.si/2618/

Abstract
Big data is gaining recognition in the world of information technology. The term refers to tools that make it possible to store and query very large amounts of data. Hadoop is an open-source Apache project that combines tools for storing, processing, and querying structured and unstructured data. Such data must be managed with appropriate infrastructure, most often a cluster of computers; if we do not want to host the infrastructure ourselves, we can turn to the cloud. YARN (Yet Another Resource Negotiator), MapReduce, Pig, and HDFS (Hadoop Distributed File System) are the basic components of the Hadoop project and make it straightforward to implement a first version of the software. The reader can use this thesis as a guide to setting up a basic Hadoop cluster and to developing a Java application or a Pig script. A comparison of the time and cost of running jobs in the cloud versus on a local cluster further helps when deciding whether to buy one's own infrastructure.
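
To make the workflow concrete, the following is a minimal sketch of the canonical Hadoop word-count job in Java, of the kind a reader following the thesis might write as a first MapReduce application. It uses only the standard org.apache.hadoop MapReduce API; the input and output paths are assumed to be HDFS directories passed on the command line, and this is an illustrative example rather than the thesis's own code.

import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

  // Mapper: emits (word, 1) for every token in an input line.
  public static class TokenizerMapper
      extends Mapper<Object, Text, Text, IntWritable> {
    private static final IntWritable ONE = new IntWritable(1);
    private final Text word = new Text();

    @Override
    public void map(Object key, Text value, Context context)
        throws IOException, InterruptedException {
      StringTokenizer itr = new StringTokenizer(value.toString());
      while (itr.hasMoreTokens()) {
        word.set(itr.nextToken());
        context.write(word, ONE);
      }
    }
  }

  // Reducer: sums the counts collected for each word.
  public static class IntSumReducer
      extends Reducer<Text, IntWritable, Text, IntWritable> {
    private final IntWritable result = new IntWritable();

    @Override
    public void reduce(Text key, Iterable<IntWritable> values, Context context)
        throws IOException, InterruptedException {
      int sum = 0;
      for (IntWritable val : values) {
        sum += val.get();
      }
      result.set(sum);
      context.write(key, result);
    }
  }

  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    Job job = Job.getInstance(conf, "word count");
    job.setJarByClass(WordCount.class);
    job.setMapperClass(TokenizerMapper.class);
    job.setCombinerClass(IntSumReducer.class); // local pre-aggregation on each node
    job.setReducerClass(IntSumReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));   // HDFS input directory
    FileOutputFormat.setOutputPath(job, new Path(args[1])); // HDFS output directory
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}

Packaged into a jar, the job would typically be submitted with hadoop jar wordcount.jar WordCount <input> <output> (jar name hypothetical); the same jar runs unchanged on a local Hadoop cluster or a cloud-hosted one, which is what makes a like-for-like time and cost comparison possible.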

Language: Slovenian
Keywords: Hadoop, MapReduce, big data, cluster, Pig, cloud, computer science, professional higher education study, computer and information science, bachelor theses
Work type: Bachelor thesis
Typology: 2.11 - Undergraduate Thesis
Organization: FRI - Faculty of Computer and Information Science
Publisher: [S. Holub]
Year: 2014
Number of pages: 82 pp.
PID: 20.500.12556/RUL-68693
UDC: 004.934(043.2)
COBISS.SI-ID: 10718292
Publication date in RUL: 10.07.2015

