Manual transcription of speech is slow and is being replaced by automatic speech recognition systems. These systems are also used for voice control of various programs and devices. In this thesis, we used as a baseline for Slovene speech recognition GMM-HMM methods for acoustic model and n-grams for language model. We improved both models with deep neural networks, which have proven to be very successful. We tested several architectures of time-delayed neural networks and neural networks with long short-term memory for both acoustic and language model. We used a large lexicon, containing about a million words. Time-delayed neural networks achieved the best results on continuous speech, with 72,84% of correctly identified words.
|