The goal of the thesis was to develop a system for conversation with virtual characters based
on speech recognition, response generation, and speech synthesis.
During implementation, it was necessary to review existing artificial intelligence (AI) tools,
including those for text and speech generation, test them, and integrate them into a system that
enables speech interaction with a virtual character.
The tools were selected based on quality, feasibility, output results, response speed, and
compatibility with the planned system architecture. The key component was the large language
model (LLM) service, as it forms the central part of the system and generates responses to input
questions.
This was followed by implementing the conversion of text responses into speech and integrating
speech recognition, which converts spoken input into text.
The result of the thesis is a working system that combines advanced services of large language
models and enables speech interaction with a virtual character.
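The interaction pipeline described above (speech recognition, LLM response generation, speech synthesis) can be sketched as a single conversation turn. The function names below are illustrative placeholders, not the thesis's actual code; each stub stands in for a call to an external service (a speech-recognition API, an LLM API, and a text-to-speech API).

```python
# Minimal sketch of one turn of the conversation pipeline: ASR -> LLM -> TTS.
# All bodies are stubs; the real system would call external AI services here.

def recognize_speech(audio: bytes) -> str:
    """Stub for the speech-recognition step (audio -> text)."""
    return audio.decode("utf-8")  # stand-in: treat the bytes as the transcript

def generate_response(question: str) -> str:
    """Stub for the LLM step (question -> answer)."""
    return f"Echo: {question}"  # a real LLM service would generate the reply

def synthesize_speech(text: str) -> bytes:
    """Stub for the text-to-speech step (text -> audio)."""
    return text.encode("utf-8")  # a real TTS service would return audio data

def interaction_turn(audio_in: bytes) -> bytes:
    """One turn of speech interaction with the virtual character."""
    question = recognize_speech(audio_in)
    answer = generate_response(question)
    return synthesize_speech(answer)

audio_out = interaction_turn(b"Hello, who are you?")
```

In the real system each stub would be replaced by an integration with the selected service, while the `interaction_turn` structure of the loop stays the same.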
In addition, we developed a player that is primarily intended for audio playback but can also
display an arbitrary video clip to support interaction with the character. The system thus
provides a basis for further development, which will be directed towards advanced video display.