Digitization is essential for audio archives because it extends the lifespan and persistence of the stored data. It also opens up new possibilities for semantic analysis. This thesis addresses the segmentation of audio data, specifically the separation of speech from music in audio files. Such segmentation is useful for radio stations and for streaming services such as Spotify and Netflix. As part of this thesis, a working segmentation algorithm was developed. It takes frequency-domain input, meaning a signal that has been transformed with a discrete Fourier transform. It returns a list of features, each with its time stamp and the probability that the input signal at that time belongs to the class music. The algorithm is implemented as a Vamp plugin and written in Python with the help of Vampy, a wrapper plugin. The performance of the developed plugin was also analysed and compared with existing implementations in Matlab and C#.