Nowadays, many types of online services are deployed and operated in the cloud, since it offers a convenient on-demand model for renting resources and easy-to-use elastic infrastructures. Moreover, modern software engineering provides the means to design time-critical services as sets of components running in containers. Container technologies and orchestration platforms, such as Docker, Kubernetes, CoreOS, Docker Swarm and OpenShift Origin, enable highly dynamic cloud-based services capable of addressing continuously varying workloads. Owing to their lightweight nature, containers can be instantiated, terminated and managed very dynamically. Container-based cloud applications therefore require sophisticated auto-scaling methods in order to operate under diverse workload conditions, including drastically changing workloads.
Imagine a cloud-based social media website on which a piece of news suddenly goes viral. On the one hand, to preserve the users' experience, enough computational resources must be allocated before the workload intensity surges at runtime. On the other hand, renting expensive cloud-based resources can be unaffordable over a prolonged period of time. The choice of an auto-scaling method may therefore significantly affect important service quality parameters, such as response time and resource utilisation. Current cloud providers, such as Amazon EC2, and container orchestration systems, such as Kubernetes, employ auto-scaling rules with static thresholds and rely mainly on infrastructure-level monitoring data, such as CPU and memory utilisation.
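To make the limitation concrete, a static-threshold rule of the kind such systems employ can be expressed in a few lines. The sketch below is a deliberately simplified illustration: the threshold values, cooldown behaviour and metric names are assumptions for exposition, not any provider's actual configuration or API.

\begin{verbatim}
# Minimal sketch of a static-threshold auto-scaling rule (illustrative only;
# thresholds and replica bounds are assumed values, not a provider's defaults).

def static_threshold_decision(cpu_utilisation, current_replicas,
                              upper=0.80, lower=0.30,
                              min_replicas=1, max_replicas=10):
    """Scale out when CPU exceeds a fixed upper threshold,
    scale in when it drops below a fixed lower threshold."""
    if cpu_utilisation > upper and current_replicas < max_replicas:
        return current_replicas + 1   # scale out by one container
    if cpu_utilisation < lower and current_replicas > min_replicas:
        return current_replicas - 1   # scale in by one container
    return current_replicas           # no change
\end{verbatim}

Because the thresholds never move, such a rule reacts identically to a brief spike and to a sustained surge, which motivates the dynamically changing thresholds introduced next.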
This thesis presents a new Dynamic Multi-Level (DM) auto-scaling method whose auto-scaling rules use dynamically changing thresholds and exploit not only infrastructure-level but also application-level monitoring data. The DM method is implemented within our proposed viable architecture for auto-scaling containerised applications. It is compared with seven existing auto-scaling methods under different synthetic and real-world workload scenarios: Kubernetes Horizontal Pod Auto-scaling (HPA), the 1\textsuperscript{st} and 2\textsuperscript{nd} methods of Step Scaling (SS1, SS2), the 1\textsuperscript{st} and 2\textsuperscript{nd} methods of Target Tracking Scaling (TTS1, TTS2), and the 1\textsuperscript{st} and 2\textsuperscript{nd} methods of static threshold-based scaling (THRES1, THRES2). All of the investigated methods are considered advanced approaches and are used in production systems such as Kubernetes and Amazon EC2. The examined workload scenarios comprise a slowly rising/falling pattern, a drastically changing pattern, an on-off pattern, a gently shaking pattern, and a real-world pattern.
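The following sketch conveys only the intuition behind combining dynamically adapted thresholds with multi-level monitoring. It is not the DM algorithm as specified in this thesis: the exponential smoothing, the particular way the infrastructure-level (CPU) and application-level (response time) signals are combined, and all numeric constants are illustrative assumptions.

\begin{verbatim}
# Simplified illustration of the dynamic-threshold idea: the scale-out
# threshold adapts to application-level feedback instead of staying fixed.
# NOT the thesis's DM algorithm; smoothing and weights are assumptions.

def ewma(previous, observation, alpha=0.3):
    """Exponentially weighted moving average tracking recent load."""
    return alpha * observation + (1 - alpha) * previous

def dynamic_decision(cpu, response_time, state,
                     target_response_time=0.5,
                     min_replicas=1, max_replicas=10):
    """Combine infrastructure-level (CPU) and application-level
    (response time) monitoring data to adapt the scaling threshold."""
    state["load"] = ewma(state.get("load", cpu), cpu)
    # Tighten the threshold when the application-level target is missed.
    upper = 0.80 if response_time < target_response_time else 0.60
    replicas = state.get("replicas", min_replicas)
    if state["load"] > upper and replicas < max_replicas:
        replicas += 1          # scale out earlier under SLO pressure
    elif state["load"] < upper / 2 and replicas > min_replicas:
        replicas -= 1          # scale in when smoothed load is low
    state["replicas"] = replicas
    return replicas
\end{verbatim}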
Based on the experimental results obtained for each workload pattern, all eight auto-scaling methods are compared with respect to response time and the number of instantiated containers. Overall, the results show that the proposed DM method performs better under varying workloads than the other auto-scaling methods. Owing to these satisfactory results, the DM method has been implemented in the SWITCH software engineering system for time-critical cloud-based applications. Auto-scaling rules, along with other properties such as the characteristics of virtualisation platforms, the current workload, and periodic QoS fluctuations, are continuously stored as Resource Description Framework (RDF) triples in a Knowledge Base (KB) included in the proposed architecture. The primary reason for maintaining the KB is to address the different requirements of SWITCH stakeholders, such as cloud-based service providers, by allowing seamless information integration that can be used for long-term trend analysis and support for strategic planning.
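As an illustration of how an auto-scaling rule might be recorded as RDF triples, the snippet below uses the rdflib Python library. The namespace, property names and literal values are hypothetical placeholders for exposition; they do not reproduce the actual SWITCH Knowledge Base schema.

\begin{verbatim}
# Hypothetical sketch of storing an auto-scaling rule as RDF triples with
# rdflib; the namespace and property names are illustrative, not the
# actual SWITCH KB vocabulary.
from rdflib import Graph, Namespace, Literal

EX = Namespace("http://example.org/switch/kb#")  # placeholder namespace

g = Graph()
g.bind("ex", EX)
rule = EX["autoScalingRule1"]
g.add((rule, EX.appliesTo, EX["webFrontendContainer"]))
g.add((rule, EX.cpuUpperThreshold, Literal(0.80)))
g.add((rule, EX.observedWorkload, Literal("drastically-changing")))
g.add((rule, EX.responseTimeTarget, Literal(0.5)))  # seconds

print(g.serialize(format="turtle"))  # Turtle output for inspection
\end{verbatim}

Keeping such facts as triples lets stakeholders query them uniformly (e.g. with SPARQL) alongside other monitoring data, which is what enables the long-term trend analysis mentioned above.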