Odstranjevanje mašil v govoru
Vesel, Urban (Author), Marolt, Matija (Mentor)

PDF - Presentation file (619.88 KB)
MD5: 798E8343FAC3A5F7FABE49FD7D725F1B

Abstract
Mašila so besede, besedne zveze ali glasovi, ki ne prispevajo k pomenu govora in so pogosto moteča. Diplomsko delo obravnava problem samodejnega odstranjevanja mašil v slovenskem govoru. Osnovno izhodišče predstavljajo sodobni modeli za samodejno razpoznavanje govora (modeli ASR), ki temeljijo na arhitekturah nevronskih mrež. Gre za end-to-end modele, ki z globokim učenjem na podlagi velike količine zvočnih posnetkov in pripadajočih transkripcij razpoznavajo govor. Glede na učne podatke in arhitekturo določeni modeli podpirajo časovne žige posameznih besed. Ta zmožnost se v nekaterih tujih jezikih že uporablja za zaznavanje in posledično odstranjevanje mašil v govoru. Za slovenski jezik takšno orodje še ne obstaja. V okviru diplomskega dela smo razvili programska orodja, ki omogočajo enostavno uporabo in primerjavo uspešnosti različnih modelov ASR pri zaznavanju ter odstranjevanju mašil v slovenskih zvočnih posnetkih. Osredotočili smo se na natančnost zaznavanja mašil, preverili pa smo tudi točnost časovnih žigov in subjektivno oceno kakovosti očiščenega posnetka. Rezultati kažejo, da so modeli ASR ob ustrezni prilagoditvi obetavna rešitev za samodejno odstranjevanje mašil v slovenskem jeziku, hkrati pa razvito orodje odpira možnosti za nadaljnje raziskave in izboljšave na področju govorne tehnologije.

Language: Slovenian
Keywords: samodejno razpoznavanje govora (ASR), samodejno odstranjevanje mašil, zaznavanje mašil, konformer, transformer, slovenski jezik
Work type: Bachelor thesis/paper
Typology: 2.11 - Undergraduate Thesis
Organization: FRI - Faculty of Computer and Information Science
Year: 2025
PID: 20.500.12556/RUL-172556
COBISS.SI-ID: 249369603
Publication date in RUL: 08.09.2025

Secondary language

Language: English
Title: Removal of filler words from speech
Abstract:
Fillers are words, phrases, or sounds that do not contribute to the meaning of speech and are often distracting. This thesis addresses automatic filler removal in Slovenian speech. The starting point is modern automatic speech recognition (ASR) models based on neural network architectures. These are end-to-end models that learn to recognize speech through deep learning on large amounts of audio recordings and their corresponding transcriptions. Depending on the training data and architecture, some models support word-level timestamps. This capability is already used for some other languages to detect and subsequently remove fillers from speech. No such tool exists yet for Slovenian. In this thesis, we developed software tools that make it easy to use different ASR models and compare their performance at detecting and removing fillers in Slovenian audio recordings. We focused primarily on the accuracy of filler detection, and also evaluated the accuracy of the timestamps and the subjective quality of the cleaned recordings. The results show that, with appropriate adaptation, ASR models are a promising solution for automatic filler removal in Slovenian, and the developed tools open up possibilities for further research and improvement in speech technology.
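The idea of using word-level timestamps to cut fillers out of a recording can be illustrated with a minimal sketch. It is not the thesis's implementation: the `Word` type, the `FILLERS` set, and the `keep_segments` function are hypothetical names introduced here, and the sketch assumes an ASR model has already produced a list of words with start and end times in seconds.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Word:
    """One recognized word with its timestamps (assumed ASR output format)."""
    text: str
    start: float  # seconds
    end: float    # seconds


# Hypothetical filler inventory; the list used in the thesis is not given here.
FILLERS = {"eee", "ee", "mmm", "hm"}


def keep_segments(words, total_duration, fillers=FILLERS):
    """Return the (start, end) spans of audio that remain after cutting fillers."""
    segments = []
    cursor = 0.0  # start of the span currently being kept
    for w in sorted(words, key=lambda w: w.start):
        if w.text.lower().strip(".,!?") in fillers:
            # Close the kept span just before the filler, if it is non-empty.
            if w.start > cursor:
                segments.append((cursor, w.start))
            # Resume keeping audio right after the filler ends.
            cursor = max(cursor, w.end)
    if cursor < total_duration:
        segments.append((cursor, total_duration))
    return segments


words = [
    Word("to", 0.0, 0.3),
    Word("eee", 0.3, 0.9),
    Word("je", 0.9, 1.1),
    Word("mmm", 1.1, 1.6),
    Word("test", 1.6, 2.0),
]
print(keep_segments(words, total_duration=2.0))
```

The returned spans could then be passed to any audio editing tool to concatenate the kept regions. In practice the quality of the result depends on how accurate the timestamps are, which is one of the aspects the thesis evaluates.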

Keywords: automatic speech recognition (ASR), automatic filler word removal, filler word detection, conformer, transformer, Slovenian language
