In this work, we implemented, tested, and evaluated selected computational methods for improving speech quality in audio signals, using speech recordings in Slovenian. Speech enhancement methods are commonly used as a preprocessing step in automatic speech recognition systems, since removing interference and noise mixed with the speech signal reduces the likelihood of recognition errors. Such methods are indispensable, particularly in video calling applications. We tested two generative adversarial neural network models, SEGAN and Wave-U-Net. Both models were trained and tested on the Slovenian speech database acquired as part of the project Development of Slovene in the Digital Environment (RSDO). We then evaluated and compared the performance of the models using measures commonly used to assess the quality of speech recordings. Finally, we analyzed how both methods operate and how their performance differs between Slovenian and English speech recordings.