Text to audio alignment of Slovenian speech and singing

Žakelj, Mark

Repository of the University of Ljubljana

Details

Poravnava besedil in zvočnih posnetkov slovenskega govora in petja
Žakelj, Mark (Author), Marolt, Matija (Mentor)

PDF - Presentation file (1.50 MB)
MD5: 2CCC5AD5E25CBFB8322A76E76D1813D6

Abstract

In this thesis we present a general-purpose solution to the problem of aligning an audio recording with its transcription. The solution consists of three components: segmentation of the recording, speech recognition, and text alignment. We focus on using different acoustic models for speech recognition and different methods for decoding the model outputs. We also propose an extension of an existing text alignment algorithm that guarantees an alignment for every word in the original text. We evaluate the system on non-dialectal and dialectal speech and on unaccompanied dialectal singing, using three metrics based on the absolute alignment error. Speech alignment proves to be of good quality, comparable to that of similar systems for other languages.
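
The abstract mentions decoding the outputs of a CTC-based acoustic model but does not say which decoders were compared. As an illustration only, the simplest such method is greedy (best-path) decoding: take the most probable symbol in each frame, collapse consecutive repeats, and drop the blank symbol. The function name, the alphabet, and the blank symbol `_` below are assumptions for this sketch, not details taken from the thesis.

```python
from typing import List, Sequence

BLANK = "_"  # hypothetical blank symbol, not taken from the thesis


def ctc_greedy_decode(
    frame_probs: Sequence[Sequence[float]],
    alphabet: Sequence[str],
    blank: str = BLANK,
) -> str:
    """Best-path CTC decoding: argmax per frame, collapse repeats, drop blanks."""
    decoded: List[str] = []
    previous = None
    for probs in frame_probs:
        # Index of the most probable symbol in this frame.
        best = max(range(len(probs)), key=probs.__getitem__)
        symbol = alphabet[best]
        # Emit a symbol only when it differs from the previous frame's symbol
        # and is not the blank; a blank between two equal symbols keeps both.
        if symbol != previous and symbol != blank:
            decoded.append(symbol)
        previous = symbol
    return "".join(decoded)


# Toy example: per-frame winners are a, a, _, a, b, b.
example_alphabet = ["_", "a", "b"]
example_frames = [
    [0.1, 0.8, 0.1],
    [0.2, 0.7, 0.1],
    [0.9, 0.05, 0.05],
    [0.1, 0.8, 0.1],
    [0.1, 0.2, 0.7],
    [0.1, 0.1, 0.8],
]
example_text = ctc_greedy_decode(example_frames, example_alphabet)
```

Decoders that combine the acoustic scores with a language model (one of the listed keywords) typically replace the per-frame argmax with a beam search over candidate prefixes; that variant is not sketched here.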

Language:	Slovenian
Keywords:	text alignment, speech recognition, CTC, language model, dialectal speech, convolutional neural network
Work type:	Bachelor thesis/paper
Organization:	FRI - Faculty of Computer and Information Science
Year:	2021
PID:	20.500.12556/RUL-130487
COBISS.SI-ID:	78692355
Publication date in RUL:	15.09.2021
Views:	1740
Downloads:	170

Secondary language

Language:	English
Title:	Text to audio alignment of Slovenian speech and singing
Abstract:	In this thesis, we build a general-purpose solution for aligning an audio recording with its transcription. The solution consists of three components: audio segmentation, speech recognition, and text alignment. The thesis focuses on the use of different acoustic models for speech recognition and of different methods for decoding the model outputs. We also propose an extension of an existing text alignment algorithm that provides an alignment for each word in the original text. The system is evaluated on non-dialectal and dialectal speech and on unaccompanied dialectal singing, using three metrics based on absolute alignment error. Speech alignment proves to be of good quality and is comparable to that of similar systems for other languages.
Keywords:	text alignment, speech recognition, CTC, language model, dialectal speech, convolutional neural network
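
The abstract states that evaluation uses three metrics based on absolute alignment error but does not name them. The sketch below shows how such metrics are commonly computed from predicted and reference word start times. The three summaries chosen here (mean, median, and share of words within a tolerance), the function name, and the 0.5 s tolerance are assumptions for illustration, not the metrics defined in the thesis.

```python
from statistics import mean, median
from typing import Dict, Sequence


def alignment_error_summary(
    predicted_starts: Sequence[float],
    reference_starts: Sequence[float],
    tolerance_s: float = 0.5,  # hypothetical tolerance, in seconds
) -> Dict[str, float]:
    """Summarise absolute errors between predicted and reference word start times."""
    if len(predicted_starts) != len(reference_starts) or not predicted_starts:
        raise ValueError("need two non-empty sequences of equal length")
    # Absolute error per word, in seconds.
    abs_errors = [abs(p - r) for p, r in zip(predicted_starts, reference_starts)]
    within = sum(1 for e in abs_errors if e <= tolerance_s)
    return {
        "mean_abs_error_s": mean(abs_errors),
        "median_abs_error_s": median(abs_errors),
        "share_within_tolerance": within / len(abs_errors),
    }


# Toy data: four words with absolute errors of 0.0, 0.25, 0.5 and 1.0 seconds.
summary = alignment_error_summary(
    predicted_starts=[0.0, 1.0, 2.0, 4.0],
    reference_starts=[0.0, 1.25, 2.5, 3.0],
)
```

Reporting a median alongside the mean is a common choice because a few badly misaligned words can dominate the mean.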
