Details

Evalvacija in primerjava sodobnih razpoznavalnikov govora
Nemanič, Valentin (Author), Dobrišek, Simon (Mentor), Fajfar, Iztok (Comentor)

PDF - Presentation file (621.21 KB)
MD5: 477193F4DF68813E4F5CDD468E7EDB1F

Abstract
In this thesis, I compare three widely used automatic speech recognition (speech-to-text, STT) systems for Slovenian: Google Cloud Speech-to-Text, Microsoft Azure Speech Service, and the open-source OpenAI Whisper. For an objective assessment, I built a balanced test set of speech recordings from the Artur 1.0 corpus, with an emphasis on free monologic speech (Artur-N) and on read and studio recordings (Artur-B). The selection comprises approximately one hour of speech from 15–20 speakers with diverse demographic characteristics. I evaluated the systems by accuracy (word error rate, WER), time efficiency (transcription time), and practical aspects (ease of use). I ran Whisper locally on a laptop with an ordinary central processing unit (CPU) and a model loaded in advance, and I used the Google and Azure services through their official application programming interfaces (APIs). The entire process of preparing the speech data and of measuring and exporting the test results was automated in scripts, which makes the accuracy estimates obtained for these systems reproducible and extensible.

Language:Slovenian
Keywords:speech recognition, Slovenian, Google Cloud Speech-to-Text, Microsoft Azure Speech, OpenAI Whisper, WER, Levenshtein, Artur 1.0, comparative analysis
Work type:Bachelor thesis/paper
Typology:2.11 - Undergraduate Thesis
Organization:FE - Faculty of Electrical Engineering
Year:2025
PID:20.500.12556/RUL-173062
COBISS.SI-ID:267817219
Publication date in RUL:12.09.2025
Views:228
Downloads:42

Secondary language

Language:English
Title:Evaluation and Comparison of Modern Speech Recognizers
Abstract:
In this thesis, I compare three widely used speech-to-text (STT) systems for Slovenian: Google Cloud Speech-to-Text, Microsoft Azure Speech Service, and the open-source OpenAI Whisper. For an objective assessment, I constructed a balanced test set of speech recordings from the Artur 1.0 corpus, with an emphasis on spontaneous monologic speech (Artur-N) as well as read and studio recordings (Artur-B). The selection comprises approximately one hour of speech from 15–20 speakers with diverse demographic characteristics. I evaluated the systems in terms of accuracy (word error rate, WER), time efficiency (transcription time), and practical aspects (ease of use). I ran Whisper locally on a laptop CPU with a preloaded model, while the Google and Azure services were accessed via their official application programming interfaces (APIs). The entire pipeline for preparing the speech data and for measuring and exporting the evaluation results was automated via scripts, which makes the evaluation process and its results reproducible and scalable.

Keywords:speech recognition, Slovenian, Google Cloud Speech-to-Text, Microsoft Azure Speech, OpenAI Whisper, WER, Artur 1.0, benchmarking
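
The accuracy metric named in the abstract, WER, is conventionally computed as the word-level Levenshtein (edit) distance between a reference transcript and a system's output, divided by the number of reference words. The following Python sketch illustrates that general definition only; it is not taken from the thesis. The function name `word_error_rate` and the text normalization (lowercasing and whitespace splitting) are assumptions made for illustration, and the thesis scripts may normalize text differently.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    # Assumed normalization: lowercase and split on whitespace.
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] = edit distance between the reference words processed so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion of a reference word
                curr[j - 1] + 1,     # insertion of a hypothesis word
                prev[j - 1] + cost,  # substitution or exact match
            )
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # One of four reference words is missing from the hypothesis.
    print(word_error_rate("danes je lep dan", "danes lep dan"))
```

Because insertions count as errors, WER can exceed 1.0 when the hypothesis is much longer than the reference.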
