With the development of artificial intelligence and machine learning, many everyday human problems are being solved; in the field of sound, this work has mostly concerned human speech.
Because noise affects human health, detecting and minimizing noise is also important for improving quality of life.
Current detection solutions rely only on a low-level criterion, namely whether the allowable volume is exceeded, and need higher-level information about the sound source before further action can be taken.
Prior to implementation, it is necessary to review related work, key sound characteristics, and the most appropriate machine learning topologies.
Free datasets will be used as the data source, preferably those on which a similar task has already been performed, so that the results can be compared.
Once the selected features and model configuration perform satisfactorily, different ways of improving the results should be tried.
For all datasets, the best results are obtained with MFCC and mel features fed to a convolutional neural network, with tone (pitch) and time augmentations applied to the audio recordings.
Compared with related works that used the same dataset, the proposed solution to this task (10-fold average classification accuracy of 98.4%) is promising for further development.
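
To make the reported metric concrete, the following is a minimal sketch of how a 10-fold average classification accuracy can be computed, assuming the dataset is already split into ten folds. The function names `k_fold_accuracy` and `majority_baseline`, the `train_and_predict` callable, and the synthetic labels are all hypothetical and introduced only for illustration; the toy baseline stands in for the convolutional network and does not reproduce the reported 98.4%.

```python
from collections import Counter
from statistics import mean


def k_fold_accuracy(folds, train_and_predict):
    """Return (average accuracy, per-fold accuracies) over held-out folds.

    folds: list of folds, each a list of (features, label) pairs.
    train_and_predict: hypothetical callable taking (train_pairs, test_features)
        and returning one predicted label per test item.
    """
    scores = []
    for i, test_fold in enumerate(folds):
        # Train on every fold except the i-th, which is held out for testing.
        train = [pair for j, fold in enumerate(folds) if j != i for pair in fold]
        test_x = [x for x, _ in test_fold]
        test_y = [y for _, y in test_fold]
        predicted = train_and_predict(train, test_x)
        correct = sum(p == t for p, t in zip(predicted, test_y))
        scores.append(correct / len(test_y))
    return mean(scores), scores


def majority_baseline(train, test_x):
    """Toy stand-in for the CNN: always predicts the most common training label."""
    most_common_label = Counter(y for _, y in train).most_common(1)[0][0]
    return [most_common_label] * len(test_x)


# Synthetic placeholder data (an assumption, not the real dataset):
# 10 folds of 10 clips each, 7 labelled "siren" and 3 labelled "dog_bark".
folds = [[(None, "siren")] * 7 + [(None, "dog_bark")] * 3 for _ in range(10)]
average, per_fold = k_fold_accuracy(folds, majority_baseline)
print(f"10-fold average accuracy: {average:.3f}")
```

Each clip is used for testing exactly once, and the final figure is the mean of the ten per-fold accuracies, which makes results comparable with other works that follow the same fold split.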