The accuracy and robustness of song recommendation systems depend on the quality and type of data given to the system. Different types of data vary in how difficult they are to extract and analyse. We wish to replace data that is harder to extract and analyse with data that carries the same information yet is easier to extract and analyse. The extracted data sets comprise song lyrics, song popularity scores and song metadata. We build song similarity matrices for each data set, using text mining, network analysis and vector analysis. The song similarity matrices are then compared using five different measures, and the results are stored in data set similarity matrices. A thorough examination of the data set similarity matrices can reveal hidden similarities between different data sets. Results show that similarity between different data sets is limited to the type of data and the type of analysis.
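
The pipeline summarised above can be illustrated with a minimal sketch. It assumes that each data set has already been reduced to one feature vector per song, builds a cosine similarity matrix for each data set, and then compares two such matrices by the Pearson correlation of their off-diagonal entries. The cosine similarity, the Pearson comparison, the toy vectors and the function names `cosine_similarity_matrix` and `matrix_agreement` are all illustrative assumptions; they are not necessarily the analyses or any of the five measures used in this work.

```python
import numpy as np


def cosine_similarity_matrix(features):
    """Return the n x n cosine similarity matrix for n song feature vectors."""
    features = np.asarray(features, dtype=float)
    norms = np.linalg.norm(features, axis=1, keepdims=True)
    norms[norms == 0.0] = 1.0  # leave all-zero vectors unscaled instead of dividing by zero
    unit = features / norms
    return unit @ unit.T


def matrix_agreement(sim_a, sim_b):
    """Pearson correlation of the upper-triangle (off-diagonal) entries.

    This is one possible comparison measure, chosen here for illustration only.
    """
    rows, cols = np.triu_indices_from(sim_a, k=1)
    return float(np.corrcoef(sim_a[rows, cols], sim_b[rows, cols])[0, 1])


# Toy, made-up feature vectors for four songs in two hypothetical data sets.
lyrics_vectors = np.array([[1, 0, 2], [0, 1, 1], [1, 1, 0], [2, 0, 1]])
metadata_vectors = np.array([[0.9, 0.1], [0.2, 0.8], [0.5, 0.5], [0.7, 0.3]])

lyrics_sim = cosine_similarity_matrix(lyrics_vectors)
metadata_sim = cosine_similarity_matrix(metadata_vectors)

# One cell of a data set similarity matrix: how well the two data sets agree.
score = matrix_agreement(lyrics_sim, metadata_sim)
print(f"lyrics vs. metadata agreement: {score:.3f}")
```

Repeating the comparison for every pair of data sets would fill one data set similarity matrix per measure; a value close to 1 for a pair would suggest that the two data sets rank song pairs in a similar way.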