Odprtokodni sistemi za računalniško prepoznavo zvočnih zapisov
GABOR, PAVEL (Author), Kos, Anton (Mentor)

PDF - Presentation file (3,24 MB)
MD5: 6623630854FAEB533127B76DDFB7A1F6
PID: 20.500.12556/rul/c3af0137-7429-4c02-9d86-7335dae53bb6

Abstract
In this thesis I present the field of computer recognition of audio recordings, which includes increasingly popular applications such as Shazam that let users identify music playing in their surroundings. Such systems identify an unknown audio recording by comparing condensed summaries, or fingerprints, of the recording (acoustic/audio fingerprints), which they compute from short samples of the unknown recording and compare with the fingerprints stored in a catalog of known reference tracks. In doing so they face many difficulties. We want recognition systems to identify a recording correctly even if it has been compressed with lossy compression methods or contains ambient noise, interference, or degradation in any other form. It is also important that recognition is fast and efficient, which is not a simple task given the large number of reference tracks that catalogs usually contain. In the thesis I explored the application areas of the technology and the operating principle of the algorithms designed for it. I examined in detail the properties and requirements that the algorithms must satisfy, and described the problems the algorithms face in ensuring robustness and the ways of solving them. I also discuss the efficiency and accuracy of recognition. I then focus on a comparison of three open-source systems: SoundFingerprinting, Echoprint and Dejavu. I analyzed and tested in detail the selected systems and the algorithms on which they are based. I carried out the tests as a series of carefully designed test steps, which revealed the strengths and weaknesses of each system. In the first testing phase I checked the general properties of the systems that affect their suitability for use in different types of applications. I determined the minimum length of an audio sample from which each system can reliably perform recognition.
Based on a statistical analysis of the recognition results obtained on a larger set of test recordings, I evaluated the properties of the systems. I checked how well the systems distinguish different performances of the same song and to what extent they are resistant to different types of interference and degradation of the audio signal. I then tested the recognition of songs played in radio programs, which let me check the suitability of each system for use in an application that produces a playlist of the songs played. In the conclusion I give my own assessment of the suitability of each system for use in different types of applications and propose possible improvements to the systems.

Language:Slovenian
Keywords:audio recording recognition, audio recording fingerprint, system comparison, system for producing a radio program playlist, SoundFingerprinting, Echoprint, Dejavu
Work type:Undergraduate thesis
Organization:FE - Faculty of Electrical Engineering
Year:2016
PID:20.500.12556/RUL-83482
Publication date in RUL:16.06.2016
Views:2086
Downloads:318

Secondary language

Language:English
Title:Open-Source Systems for Computer Recognition of Audio Files
Abstract:
This thesis describes automatic content recognition, an identification technology used to recognize content played in the vicinity of the user or on a media device. It focuses on systems for audio content recognition based on audio fingerprinting, the technology used by increasingly popular applications such as Shazam. An audio fingerprint is a condensed summary made from musical features extracted from a short sample of a sound recording. An unknown recording is identified by comparing its fingerprints with a collection of fingerprints gathered from known recordings and stored in a reference database. In doing so, the systems may encounter many obstacles. A system should be able to recognize an unknown recording regardless of the level of compression, surrounding noise present in the signal, or any other form of degradation. It is also important that systems perform identification quickly and efficiently, which is not a trivial task because the databases usually contain a large number of reference tracks. I explored the scope of the technology and the principle of operation of the dedicated algorithms. In doing so, I examined the characteristics and requirements that the algorithms need to satisfy to ensure robustness, efficiency and accuracy of identification. The main focus of the thesis is a comparison of three open-source systems: SoundFingerprinting, Echoprint and Dejavu. I performed a detailed analysis of the selected systems and the algorithms on which they are based. I also performed test procedures designed to reveal the strengths and weaknesses of each system. In the first testing stage, I verified the overall system properties that affect the suitability of a system for use in different types of applications. For each system I measured the minimum length of audio clip required for reliable identification.
Based on a statistical analysis of data obtained during an identification test performed on a large set of musical recordings, I evaluated the general performance of the systems. I then checked how well they distinguish different versions of the same song and to what extent they are resistant to interference and degradation of the audio signal. Lastly, I examined the suitability of the selected systems for use in a broadcast monitoring system by performing audio recognition on radio broadcast recordings, which resulted in a playlist of the recognized songs. To conclude, I present my own opinion on the suitability of each system for use in different types of applications. I also suggest possible enhancements that could improve the performance of the systems.
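
The fingerprint-and-compare idea summarized in the abstract can be illustrated with a minimal Python sketch. It is not the implementation used by SoundFingerprinting, Echoprint or Dejavu. It assumes that feature extraction has already reduced each recording to one dominant frequency bin per time frame, and it uses a generic peak-pair hashing scheme with voting on the time offset. The names `FAN_OUT`, `CATALOG`, `fingerprint`, `build_index` and `identify`, as well as the toy data, are hypothetical and chosen only for this illustration.

```python
from collections import Counter, defaultdict

FAN_OUT = 3  # assumption: each anchor peak is paired with the next 3 peaks

# Toy reference catalog (made-up data): one dominant frequency bin per frame.
CATALOG = {
    "a": [3, 7, 1, 9, 4, 4, 8, 2, 6, 5, 0, 7],
    "b": [5, 5, 2, 8, 1, 3, 9, 0, 4, 6, 2, 7],
}


def fingerprint(peaks):
    """Return (hash, anchor_time) pairs; a hash is (f_anchor, f_target, dt)."""
    hashes = []
    for t, f1 in enumerate(peaks):
        for dt in range(1, FAN_OUT + 1):
            if t + dt < len(peaks):
                hashes.append(((f1, peaks[t + dt], dt), t))
    return hashes


def build_index(catalog):
    """Map every hash to the (track_id, time) positions where it occurs."""
    index = defaultdict(list)
    for track_id, peaks in catalog.items():
        for h, t in fingerprint(peaks):
            index[h].append((track_id, t))
    return index


def identify(index, sample_peaks):
    """Vote for (track, time offset); the most consistent offset wins."""
    votes = Counter()
    for h, t_sample in fingerprint(sample_peaks):
        for track_id, t_ref in index.get(h, ()):
            votes[(track_id, t_ref - t_sample)] += 1
    if not votes:
        return None  # no hash of the sample occurs in the catalog
    (track_id, offset), count = votes.most_common(1)[0]
    return track_id, offset, count


if __name__ == "__main__":
    index = build_index(CATALOG)
    # Frames 4..8 of track "a" with one frame corrupted (2 replaced by 0).
    print(identify(index, [4, 4, 8, 0, 6]))
```

Because each hash depends on only two frames, a corrupted frame invalidates some hashes while the remaining ones still agree on the same time offset. This is one simple way to obtain the kind of robustness to noise and degradation that the abstract calls for; real systems work on spectrogram peaks and use much larger catalogs.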

Keywords:audio content recognition, audio fingerprint, system comparison, broadcast monitoring system, SoundFingerprinting, Echoprint, Dejavu
