This paper evaluates whether objective measures can be used to assess the quality of synthetic speech.
Quality assessment is an important step in the development of speech synthesizers. At present, subjective methods are the best option for evaluating text-to-speech systems, even though they demand considerable time and resources. A method that could quickly and easily check the quality of the synthesizer itself, as well as the effect of different ways of calibrating it, would be most useful. In this work we explore two methods that could be used to evaluate synthetic speech: the Mel-Cepstrum distance and dynamic time warping (DTW). The tests were carried out on four different text-to-speech systems and evaluated using a simple subjective test.
The introductory chapter summarizes the history and composition of artificial speech, followed by a presentation of hidden Markov models for speech synthesis. The third chapter deals with both subjective and objective assessment methods, with emphasis on how the DTW and Mel-Cepstrum distances are calculated.
In the final chapter, we explain how the evaluation process was carried out and present the results, followed by our interpretation of those results, possible applications, and further development.
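To make the two objective measures named above more concrete, the following is a minimal sketch of how a Mel-Cepstrum distance and a DTW alignment cost might be computed. It is not the implementation used in this work. It assumes each speech frame is already given as a list of mel-cepstral coefficients, it uses the commonly cited decibel form of the mel-cepstral distortion with the zeroth (energy) coefficient skipped, and it uses the basic unconstrained DTW recursion. The function names `mel_cepstral_distance` and `dtw_distance` are hypothetical.

```python
import math


def mel_cepstral_distance(c_ref, c_syn):
    """Distance in dB between two mel-cepstral vectors of equal length.

    Assumed form: (10 / ln 10) * sqrt(2 * sum_d (c_ref[d] - c_syn[d])^2),
    with the zeroth coefficient (frame energy) left out of the sum.
    """
    diff_sq = sum((a - b) ** 2 for a, b in zip(c_ref[1:], c_syn[1:]))
    return (10.0 / math.log(10)) * math.sqrt(2.0 * diff_sq)


def dtw_distance(ref_frames, syn_frames, frame_dist=mel_cepstral_distance):
    """Accumulated cost of the cheapest DTW alignment of two frame sequences.

    Each step may advance the reference, the synthetic sequence, or both,
    so utterances of different durations can still be compared.
    """
    n, m = len(ref_frames), len(syn_frames)
    inf = float("inf")
    # cost[i][j] = cheapest alignment of the first i reference frames
    # with the first j synthetic frames.
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = frame_dist(ref_frames[i - 1], syn_frames[j - 1])
            cost[i][j] = d + min(cost[i - 1][j],      # reference advances
                                 cost[i][j - 1],      # synthetic advances
                                 cost[i - 1][j - 1])  # both advance
    return cost[n][m]
```

In this sketch the per-frame Mel-Cepstrum distance serves as the local cost inside DTW, so a synthetic utterance that differs from the reference only in timing receives a low score. In practice the accumulated cost is often normalized by the length of the alignment path, which is omitted here for brevity.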