Installing and configuring a Hadoop MapReduce distributed system is time-consuming and requires strict adherence to the instructions. It can therefore cause considerable inconvenience to new users who would like to become familiar with the MapReduce programming model.
The aim of this thesis is to explore options for straightforward installation and configuration of a Hadoop distributed system. The thesis focuses on building a distributed system with Hadoop using CDH, a solution developed by the American company Cloudera Inc. and based on the Apache Hadoop platform. The company has released several versions of CDH; the latest available version, CDH 5.5, can be run on several distributions of the Linux operating system. The bachelor thesis comprises the instructions and tips necessary for successfully setting up such a distributed system. It examines various setup options and the creation of a system that relies predominantly on virtualization. Furthermore, we describe in detail, with examples, the process of running MapReduce jobs. Finally, we present a brief analysis comparing the performance (scalability) of MapReduce jobs run on one, two, and more nodes.