In this thesis, we test the accuracy and usefulness of the Google speech recognition application for the Slovenian language. We also aim to identify the limitations and problems of the application.
The first chapter gives a short overview of the history of speech recognition and the most important developments in the field. It shows how deep learning and the use of neural networks have profoundly changed automatic speech recognition. Many commercial automatic speech recognition systems are now based on deep learning methods.
Some of these commercial systems are described in the second chapter. Commercial speech recognition systems have been developed by large corporations such as Microsoft, Google, and Apple. These systems are part of programs that work as intelligent personal assistants. Personal assistants use a natural-language user interface to answer questions, make recommendations, and perform actions.
We tested the accuracy of the Google application with the HResults tool. The tests were made on three groups of samples. The first group included samples of slow reading that is easy to understand. The second group contained samples of faster reading. The third group encompassed speech that is the hardest to understand. Recognition accuracy differed across the groups: it was 88% in the first group, 79% in the second group, and 62% in the third group.
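To make the accuracy figures concrete, the following minimal sketch computes word accuracy as (N - S - D - I) / N, where N is the number of reference words and S, D, and I are the substitutions, deletions, and insertions in the recognized text. This is the accuracy measure reported by HResults, but the sketch is not HResults itself: the function name `word_accuracy` is hypothetical, and it assumes a plain unit-cost word alignment and lowercase whitespace tokenization, so its results may differ slightly from those of the tool.

```python
def word_accuracy(reference, hypothesis):
    """Return word accuracy in percent: 100 * (N - S - D - I) / N.

    Assumption: S + D + I is taken as the unit-cost word-level edit
    distance between the reference and the recognized text.
    """
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            curr.append(min(
                prev[j] + 1,                       # deletion
                curr[j - 1] + 1,                   # insertion
                prev[j - 1] + (ref_word != hyp_word),  # match or substitution
            ))
        prev = curr

    errors = prev[-1]
    return 100.0 * (len(ref) - errors) / len(ref)
```

Because insertions count as errors, this measure can become negative when the recognizer outputs many extra words.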
The recognition results are very good for slow reading. They are similar to the results reported in advertisements for commercial recognition systems.
We can also see that the Google application has problems with faster reading. The recognizer has trouble recognizing short words such as conjunctions and prepositions.
For the speech that is the hardest to understand, the results are even worse because the speech contains many unnecessary interruptions and interjections.
We conclude that the Google application is very useful when the speaker speaks clearly and the application is not used in situations that require perfect recognition accuracy.
We also measured recognition accuracy at two different points in time. The test shows that recognition accuracy is improving over time.
In the last chapter, we describe the development of our own computer program for sending emails with the help of the Google speech recognition application. The Google application is easy to integrate into a computer program. It is also very reliable, but it limits the duration of the speech it accepts. This time limit can cause problems when we want to recognize longer speech files. We solve these problems by cutting audio files into smaller pieces without losing any of the important data. Our program for sending email uses PyAudio to record audio and Sound eXchange (SoX) to convert between audio formats.
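The abstract does not state how the cut points are chosen. One plausible approach, shown here only as a sketch under that assumption, is to cut near the duration limit at the quietest sample, so that a cut is less likely to fall in the middle of a word. The function name `split_at_quiet_points` and its parameters are hypothetical; the samples are assumed to be a list of decoded amplitude values, and lengths are given in samples.

```python
def split_at_quiet_points(samples, max_len, search_window):
    """Split samples into chunks of at most max_len samples.

    Each cut is placed at the quietest sample within the last
    search_window positions before the length limit. No sample is
    dropped or duplicated, so joining the chunks restores the input.
    """
    if max_len <= 0 or not 0 < search_window <= max_len:
        raise ValueError("need max_len > 0 and 0 < search_window <= max_len")

    chunks = []
    start = 0
    while len(samples) - start > max_len:
        hard_limit = start + max_len
        first_candidate = hard_limit - search_window + 1
        # Pick the candidate cut index with the smallest absolute amplitude.
        cut = min(range(first_candidate, hard_limit + 1),
                  key=lambda i: abs(samples[i]))
        chunks.append(samples[start:cut])
        start = cut
    chunks.append(samples[start:])
    return chunks
```

Each chunk could then be sent to the recognizer separately and the partial transcriptions joined in order.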
Once the program has the whole transcription of the speech from the Google application, it replaces all dictated punctuation marks with the corresponding punctuation symbols. Finally, the program sends the email to the chosen address. Programs of this type are useful in situations where the user cannot use a keyboard.
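The replacement of dictated punctuation can be sketched as a simple dictionary lookup. The spoken names below are hypothetical English placeholders; the actual program works with Slovenian dictation, and its word list and matching rules are not given here.

```python
import re

# Hypothetical mapping from spoken names to punctuation symbols.
PUNCTUATION_WORDS = {
    "comma": ",",
    "period": ".",
    "question mark": "?",
    "exclamation mark": "!",
}


def replace_dictated_punctuation(text, mapping=PUNCTUATION_WORDS):
    """Replace spoken punctuation names in text with their symbols.

    Whitespace before a spoken name is removed so that the symbol
    attaches to the preceding word.
    """
    # Try longer names first so multi-word names match as a whole.
    names = sorted(mapping, key=len, reverse=True)
    pattern = re.compile(
        r"\s*\b(" + "|".join(re.escape(name) for name in names) + r")\b",
        re.IGNORECASE,
    )
    return pattern.sub(lambda m: mapping[m.group(1).lower()], text)
```

A limitation of this approach is that it cannot tell a dictated mark from the same word used as ordinary text, for example "period" in "a long period of time".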