The dissertation focuses on improving the performance and efficiency of multi-core networking devices. The main motivation for the improvements presented in this work is the constantly growing demand for fast and reliable network communication between all types of electronic devices. These devices offer users an ever-increasing number of network-based services, which require continuous connectivity to computer networks. Service providers, who must maintain these large networks, demand efficient networking devices so that they can provide cost-effective, high-quality services. To meet these demands, single-core networking devices are being replaced by multi-core devices, which offer significantly higher performance while consuming less power. Using multi-core networking devices does not by itself improve performance, however, unless the devices are adapted so that all available hardware resources are utilized. To make multi-core networking devices efficient, network-traffic processing must therefore be parallelized, which can be achieved with two different concepts: a) distributing the network traffic among the available processor cores and b) implementing the offered network functionalities in parallel. The main goal of our work is to develop and evaluate two innovative improvements for networking devices implemented on multi-core architectures that increase networking performance and efficiency. The first improvement is an adaptive network-traffic distribution method that combines packet-based and flow-based traffic distribution. In this method, each core is assigned a specific number of tokens, which represents the number of network packets the core is allowed to process. Each processed packet consumes one token.
The tokens are redistributed periodically according to the average core load, so that packet processing is balanced among the available cores. If a core runs out of tokens, it assigns the packet to the nearest adjacent core, which may share its cache memory; this minimizes the time spent on inter-core communication. We validated the method experimentally by integrating it into the Linux Bridge and testing it in a “worst-case” scenario with one dominant flow and a “backbone-link” scenario with a large number of flows at similar packet rates. In the first scenario, traffic throughput improves by a factor of 2.8 when four processing cores are used. In the second scenario, with a large number of traffic flows, the performance remains similar to that of existing state-of-the-art flow-based methods.

The second improvement is a parallelization of the network-traffic encryption process, which ensures the security and privacy of network communications. By combining functional and data decomposition, we split common encryption-algorithm implementations into many tasks that can run in parallel. These tasks, however, must be synchronized, which reduces the efficiency of the parallelization. Because encryption algorithms have low computational complexity, even a small synchronization overhead can cancel out the gains of parallelization. We therefore minimized inter-task communication time by assigning tasks to adjacent cores that share cache memory and by using atomic variables and lock-free queues to store network packets. The results show that, on a computer with twelve cores, we achieve speedups of 1.9, 6.3, and 7.6 with the AES, 3DES, and RSA encryption algorithms, respectively.
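The token-based adaptive distribution of the first improvement can be illustrated with a minimal Python sketch. It is not the implementation integrated into the Linux Bridge: the class `TokenDistributor` and its methods are hypothetical, core-index distance is an assumed stand-in for "nearest adjacent core" (cores sharing a cache), and the rule that every core is refilled with the average per-core load of the previous period is one assumed reading of "redistributed according to the average core load".

```python
import math


class TokenDistributor:
    """Hypothetical sketch of token-based adaptive packet distribution.

    Assumptions (not from the original work): cores are numbered
    0..n_cores-1 and "nearest" means smallest index distance; flow_id is an
    integer flow hash; after each period every core receives as many tokens
    as the average number of packets processed per core in that period.
    """

    def __init__(self, n_cores: int, initial_tokens: int):
        self.n_cores = n_cores
        self.tokens = [initial_tokens] * n_cores  # remaining budget per core
        self.processed = [0] * n_cores            # packets handled this period

    def cores_by_distance(self, home: int) -> list:
        # Home core first, then neighbours at distance 1, 2, ... (lower index first).
        order = [home]
        for d in range(1, self.n_cores):
            for core in (home - d, home + d):
                if 0 <= core < self.n_cores:
                    order.append(core)
        return order

    def assign(self, flow_id: int) -> int:
        # Flow-based part: every flow has a fixed home core.
        home = flow_id % self.n_cores
        # Packet-based part: if the home core has no tokens left, hand the
        # packet to the nearest core that still has tokens.
        for core in self.cores_by_distance(home):
            if self.tokens[core] > 0:
                self.tokens[core] -= 1  # each processed packet consumes one token
                self.processed[core] += 1
                return core
        # All budgets exhausted: keep flow affinity until the next refill.
        self.processed[home] += 1
        return home

    def redistribute(self) -> None:
        # Called periodically: refill every core with the average per-core load.
        average = math.ceil(sum(self.processed) / self.n_cores)
        self.tokens = [max(1, average)] * self.n_cores
        self.processed = [0] * self.n_cores


if __name__ == "__main__":
    dist = TokenDistributor(n_cores=4, initial_tokens=3)
    # One dominant flow ("worst case"): 12 packets of flow 8, home core 0.
    print([dist.assign(8) for _ in range(12)])
    # -> [0, 0, 0, 1, 1, 1, 2, 2, 2, 3, 3, 3]
```

With one dominant flow, packets that a purely flow-based scheme would pin to core 0 spill over to cores 1, 2, and 3 once core 0 exhausts its tokens. With many flows of similar rate, most packets find tokens on their home core, so the behaviour approaches plain flow-based distribution.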
In addition to the two methods presented, we also defined a methodology for the systematic evaluation of the efficiency of multi-core networking devices. We defined key criteria that include standard performance and quality-of-service metrics as well as other indicators that evaluate the utilization of system resources, such as core load, cache hit ratio, speedup, and parallel efficiency. We described the steps required to perform the systematic evaluation: establishing a testing environment, preparing testing tools, conducting testing procedures defined according to the evaluation criteria, and analyzing the results. We used this methodology to compare different implementations of networking devices, focusing on the comparison of traditional hardware-defined networking devices with emerging software-defined networking devices, which are implemented entirely in software and run on commercial off-the-shelf hardware. The results show that hardware-defined networking devices achieve higher performance and are also significantly more energy efficient than software-defined devices. Software-defined devices, on the other hand, are much more flexible, which makes their development simple and cost-effective. Owing to this flexibility, the performance-improvement methods described above can be embedded more easily in software-defined devices. In addition, software-defined devices can easily be used within contemporary networking concepts such as network functions virtualization.
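Two of the resource-utilization criteria, speedup and parallel efficiency, reduce to simple ratios: S_p = T_1 / T_p (equivalently X_p / X_1 for throughput X) and E_p = S_p / p, where p is the number of cores used. The short Python sketch below applies them to the speedups reported above. The function names are illustrative, and two assumptions are made only for this example: the throughput factor of 2.8 is treated as a speedup over single-core processing with p = 4, and the encryption speedups are treated as using all p = 12 cores of the test machine.

```python
def speedup(time_one_core: float, time_p_cores: float) -> float:
    """S_p = T_1 / T_p; for throughput measurements use X_p / X_1 instead."""
    return time_one_core / time_p_cores


def parallel_efficiency(s: float, p: int) -> float:
    """E_p = S_p / p; 1.0 would mean perfectly linear scaling."""
    return s / p


if __name__ == "__main__":
    # Reported speedups with an ASSUMED core count p for each case.
    reported = {
        "Linux Bridge": (2.8, 4),
        "AES": (1.9, 12),
        "3DES": (6.3, 12),
        "RSA": (7.6, 12),
    }
    for name, (s, p) in reported.items():
        print(f"{name}: S = {s}, p = {p}, E = {parallel_efficiency(s, p):.3f}")
```

Under these assumptions, the low efficiency for AES is consistent with the observation that the synchronization overhead weighs most heavily on computationally cheap algorithms.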