The rapid expansion of digital services and the resulting growth of digital data have transformed how information is collected and analyzed in the social sciences. Effective analysis and application, however, depend on the quality of such data. This master's thesis compares the quality of survey data from the American National Election Studies (ANES) with that of X/Twitter data used for sentiment analysis, in the context of the 2020 U.S. presidential election. Eight dimensions of data quality were analyzed: relevance, accessibility, completeness, representativeness, validity, timeliness, consistency, and accuracy. The research shows that, across these dimensions, ANES data is generally of higher quality and more suitable for understanding voting behavior than data from X/Twitter. This is primarily due to ANES’s scientific survey methodology, probability-based sampling, and appropriate data weighting, which together provide more representative and reliable insights into voter behavior. X/Twitter data, by contrast, offered greater timeliness due to its real-time nature, enabling the capture of current public discourse and shifts in sentiment. It also presented limitations, however, such as non-representative user demographics, linguistic bias, the presence of bots and disinformation, and challenges regarding data accessibility and validity. The findings suggest that despite the advantages of big data, traditional surveys remain essential for understanding voter behavior. Future research could explore ways to combine the strengths of both data sources.