This master's thesis presents the setup of a development environment for testing automation that supports a highly available data center. It describes the development of that automation, part of which I developed for the purposes of this thesis. While the thesis is my own work, the solutions presented in it were created in collaboration with colleagues at the company where I am employed, as we developed the solution for one of our clients. The solution is therefore in use in a production environment and has already been applied successfully in certain situations.
The first part of the document explains best practices for achieving high availability in data centers. It also describes the difference between disaster prevention and disaster recovery in a data center. It discusses some of the methods for making systems recoverable after unwanted incidents, as well as the reasons for pursuing high availability in data centers.
In any case, it is preferable to avoid outages altogether. The thesis presents a solution developed to prevent outages or disasters (disaster prevention). It describes in detail the components of the development environment and the automation that was developed and tested in it. The development environment and the implemented solution are based mainly on products from VMware and Cisco, while Ansible is used for automation.
The final part of the thesis reports tests conducted in the development environment, in which I measured the downtime of network gateways, virtual machines, and web applications during a migration from one data center to another in the event of a disaster. As expected, automated migration of virtual machines was faster, more reliable, and more precise than manual migration. The measurements showed that, with automation, network gateways were unreachable for 2 minutes and 19 seconds less on average, virtual machines for 5 minutes and 41 seconds less, and web applications for 4 minutes and 15 seconds less than with manual migration. Furthermore, an administrator performing the manual migration needed on average 5 minutes and 16 seconds more than the automated script.
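As an illustration of how downtime of this kind could be derived from periodic reachability probes, the following is a minimal sketch. It is not the measurement procedure used in the thesis, which this summary does not specify. The function names, the sample format (timestamp in seconds, reachable flag), and the probe log are all hypothetical.

```python
def total_downtime(samples):
    """Sum the seconds between each first failed probe and the next successful one.

    `samples` is an iterable of (timestamp_seconds, reachable) pairs in
    chronological order (assumed format). An outage still open at the end
    of the log is not counted.
    """
    downtime = 0.0
    down_since = None
    for timestamp, reachable in samples:
        if not reachable and down_since is None:
            down_since = timestamp          # outage starts at first failed probe
        elif reachable and down_since is not None:
            downtime += timestamp - down_since  # outage ends at next good probe
            down_since = None
    return downtime


def format_mm_ss(seconds):
    """Render a duration in seconds as minutes and seconds."""
    minutes, secs = divmod(int(round(seconds)), 60)
    return f"{minutes} min {secs} s"


# Hypothetical probe log for illustration only, not measured data.
samples = [(0, True), (10, False), (20, False), (30, True), (40, True)]
print(format_mm_ss(total_downtime(samples)))
```

The resolution of such a measurement is bounded by the probe interval, so a shorter interval gives a tighter estimate of the actual outage.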