Social media is an increasingly popular way of exchanging information and staying informed about current events, and it therefore represents an important source of data that is also valuable for research and analysis. The prerequisite is knowing how to collect that data systematically. Since in practice most of the effort goes into data collection and transformation, we will investigate the possibilities for comprehensive content acquisition from various social media platforms. In this case study we will implement a solution for parallel extraction of large data sets and find an efficient way to store and query the data. Finally, we will access the data with a visualization tool that allows monitoring with minimal delay relative to the time the content is published on social media. In conclusion, we will summarize our experience with the different technologies and outline possibilities for further improving the process.