Poravnava besedil in zvočnih posnetkov slovenskega govora in petja
Žakelj, Mark (Author), Marolt, Matija (Mentor)

PDF - Presentation file (1,50 MB)
MD5: 2CCC5AD5E25CBFB8322A76E76D1813D6

Abstract
V diplomski nalogi podamo splošno uporabno rešitev za problem poravnave zvočnega posnetka in pripadajoče transkripcije. Rešitev je sestavljena iz treh komponent: segmentacija posnetka, razpoznavanje govora in poravnava besedil. V nalogi se osredotočimo na uporabo različnih akustičnih modelov za razpoznavanje govora in uporabo različnih metod dekodiranja izhodov modela. Predlagamo tudi razširitev obstoječega algoritma za poravnavo besedil, s čimer zagotovimo poravnavo za vsako besedo v originalnem besedilu. Sistem ovrednotimo na nenarečnem in narečnem govoru ter na narečnem petju brez spremljave, pri čemer uporabimo tri metrike bazirane na absolutni napaki poravnav. Poravnava govora se izkaže za kvalitetno in je primerljiva s kvaliteto podobnih sistemov v tujih jezikih.

Language:Slovenian
Keywords:poravnava besedil, razpoznavanje govora, CTC, jezikovni model, narečni govor, konvolucijska nevronska mreža
Work type:Bachelor thesis/paper
Organization:FRI - Faculty of Computer and Information Science
Year:2021
PID:20.500.12556/RUL-130487
COBISS.SI-ID:78692355
Publication date in RUL:15.09.2021
Views:1168
Downloads:102

Secondary language

Language:English
Title:Text to audio alignment of Slovenian speech and singing
Abstract:
In this thesis, we present a general-purpose solution for aligning an audio recording with its transcription. The solution consists of three components: segmentation of the recording, speech recognition, and text alignment. We focus on the use of different acoustic models for speech recognition and of different methods for decoding the model outputs. We also propose an extension of an existing text alignment algorithm that guarantees an alignment for every word in the original text. The system is evaluated on non-dialectal and dialectal speech and on unaccompanied dialectal singing, using three metrics based on the absolute alignment error. Speech alignment proves to be of good quality, comparable to that of similar systems for other languages.

Keywords:text alignment, speech recognition, CTC, language model, dialectal speech, convolutional neural network
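
The abstract mentions different methods of decoding the model outputs, and the keywords name CTC. As an illustration only, the following minimal Python sketch shows greedy (best-path) CTC decoding, the simplest such method: take the most likely label in each frame, merge consecutive repeats, and drop blanks. The abstract does not say which decoders the thesis actually uses, so this is not necessarily the author's method; the function name `ctc_greedy_decode`, the label set, and the scores are hypothetical.

```python
from typing import List, Sequence


def ctc_greedy_decode(
    frame_scores: Sequence[Sequence[float]],
    labels: Sequence[str],
    blank: int = 0,
) -> str:
    """Greedy (best-path) CTC decoding.

    frame_scores[t][k] is the score of label k at frame t.
    labels[k] is the text for label k; labels[blank] is never emitted.
    """
    decoded: List[str] = []
    previous = None
    for frame in frame_scores:
        # Index of the highest-scoring label in this frame.
        best = max(range(len(frame)), key=lambda k: frame[k])
        # Emit a label only when it differs from the previous frame's
        # label (merging repeats) and is not the blank symbol.
        if best != previous and best != blank:
            decoded.append(labels[best])
        previous = best
    return "".join(decoded)


# Hypothetical example: three labels, index 0 is the CTC blank.
example_labels = ["<blank>", "d", "a"]
example_scores = [
    [0.10, 0.80, 0.10],  # "d"
    [0.20, 0.70, 0.10],  # "d" again, merged with the previous frame
    [0.90, 0.05, 0.05],  # blank
    [0.10, 0.10, 0.80],  # "a"
]
print(ctc_greedy_decode(example_scores, example_labels))  # prints "da"
```

A blank between two identical labels keeps them separate, which is how CTC represents doubled letters. Decoders that use a language model (also listed among the keywords) typically replace the per-frame argmax with a beam search over candidate prefixes.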
