In today’s connected world, we leave behind a trail of data, a so-called data trace,
at every step, from which information about an individual as well as about a broader
population can be extracted. Especially interesting is the data that users of such
systems express implicitly and often subconsciously, as it bypasses conscious
filtering and self-censorship. On the other hand, this kind of data is often limited,
only sporadically available, and noisy, which presents additional challenges for
analysis and final interpretation.
This dissertation presents advanced processing of IPTV data in the form of a data
stream of users’ channel changes. TV set-top boxes deployed in modern IPTV systems
can be viewed as capable sensor nodes that collect vast amounts of diagnostic data,
which also contain hidden information about user activity, quality of service, system
activity, etc. We focus mainly on user-generated events and analyze how the
pseudonymized data stream of channel change events received from the entire IPTV
network can be mined to gain insight into the content and into the opinion of
individual users, as well as of the broader population, about a given topic. We
demonstrate that it is possible to detect the occurrence of unwanted content, e.g.
TV ads, with high probability, and show that the approach could be extended to model
user behaviour and to classify the viewership along multiple dimensions. We propose
and describe a framework and a method for estimating public interest from the
implicit negative feedback collected from the IPTV audience. Our research focuses
primarily on channel change events and their correlation with content information
obtained from closed captions. The presented framework is based on concept modeling
and viewership profiling, and combines implicit viewer reactions (channel changes)
and content into an interest score for individual users and for the entire
viewership. The proposed framework addresses many of the shortcomings and concerns of
such systems and can cover a much broader population.
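As an illustration only, the following minimal Python sketch shows one plausible way implicit negative feedback could be turned into a detection signal for unwanted content such as ads: count how many viewers switch away from a channel in fixed time windows and flag windows with an unusually high count. The event schema (`ChannelChange`), the 60-second window, and the mean-plus-`k`-standard-deviations threshold are hypothetical choices, not the method of the dissertation; a realistic version would at least normalize the counts by the audience size of the channel.

```python
from collections import Counter
from dataclasses import dataclass
from statistics import mean, pstdev


@dataclass(frozen=True)
class ChannelChange:
    """One pseudonymized channel change event (hypothetical schema)."""
    timestamp: int      # seconds since the start of the observation period
    box_id: str         # pseudonymized set-top box identifier
    from_channel: str   # channel the viewer switched away from
    to_channel: str     # channel the viewer switched to


def switch_away_counts(events, channel, window=60):
    """Count the viewers leaving `channel` in each fixed-size time window (seconds)."""
    counts = Counter()
    for e in events:
        if e.from_channel == channel:
            counts[e.timestamp // window] += 1
    return counts


def flag_negative_feedback(counts, n_windows, k=2.0):
    """Return indices of windows whose switch-away count exceeds mean + k * std.

    Windows without events count as zero; n_windows must be at least 1.
    """
    series = [counts.get(w, 0) for w in range(n_windows)]
    mu, sigma = mean(series), pstdev(series)
    return [w for w, c in enumerate(series) if c > mu + k * sigma]


if __name__ == "__main__":
    # Synthetic stream: one viewer leaves "ch1" every minute for ten minutes,
    # plus a burst of nine extra departures in minute 5 (e.g. an ad break).
    events = [ChannelChange(w * 60 + 10, f"box{w}", "ch1", "ch2") for w in range(10)]
    events += [ChannelChange(320 + i, f"burst{i}", "ch1", "ch3") for i in range(9)]
    counts = switch_away_counts(events, "ch1")
    print(flag_negative_feedback(counts, n_windows=10))  # [5]
```

In the synthetic run the per-minute counts are `[1, 1, 1, 1, 1, 10, 1, 1, 1, 1]`, so only minute 5 exceeds the threshold of 1.9 + 2 × 2.7 = 7.3.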
The framework is validated on a large pseudonymized real-world IPTV dataset provided
by an ISP, and we show how the results correlate with various trending topics and with
classical long-term population surveys conducted in parallel. To further validate the
framework for determining public opinion and interest through the implicit feedback of
IPTV viewers, we first address the hypothesis that implicit viewer feedback in the
form of channel change events, paired with content metadata, can be used to model
viewers’ opinion and interest. For this purpose, we design a controlled experiment in
which users give explicit feedback by rating a set of general-interest news clips. In
addition to collecting demographic information, we survey viewers’ opinion, interest,
and probability of channel change during each clip. Furthermore, we extract weighted
feature vectors from the closed captions of the videos; combined with the reported
probability of channel change, these are used to build a model that classifies opinion
into five categories based on the probability of channel change and the content.
Next, we build a simplified model that classifies opinion into five categories based
on interest alone, which shows a linear relationship; in this case, however,
additionally considering the content provides better accuracy and makes it possible
to analyze anomalous cases.
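The weighting scheme for the closed-caption feature vectors and the classifier are not specified here, so the following minimal Python sketch assumes TF-IDF weights and a nearest-centroid classifier, both swapped in purely for illustration. It appends the self-reported probability of channel change to the L2-normalized TF-IDF vector of a clip's captions and assigns one of five opinion categories. The captions, probabilities, and labels in `TOY_CLIPS` are invented toy values, not data from the experiment, and all function and constant names are hypothetical.

```python
import math
import re
from collections import Counter, defaultdict

P_KEY = "__p_change__"  # hypothetical name of the extra non-text feature

# Invented toy data: (caption, self-reported P(channel change), opinion category 1..5).
TOY_CLIPS = [
    ("Tax increase announced by government", 0.9, 1),
    ("Tax reform debate in parliament", 0.7, 2),
    ("Local football team wins match", 0.5, 3),
    ("New hospital opens in the city", 0.3, 4),
    ("Free concert in the city park", 0.1, 5),
]


def tokenize(text):
    """Lowercase word tokens (no stop-word removal or lemmatization here)."""
    return re.findall(r"\w+", text.lower())


def fit_idf(captions):
    """Smoothed inverse document frequency: ln((1 + n) / (1 + df)) + 1."""
    n = len(captions)
    df = Counter(t for c in captions for t in set(tokenize(c)))
    return {t: math.log((1 + n) / (1 + d)) + 1 for t, d in df.items()}, n


def features(caption, p_change, idf, n):
    """L2-normalized TF-IDF vector of a caption plus the channel change probability."""
    tf = Counter(tokenize(caption))
    unseen = math.log(1 + n) + 1  # idf of a term that never occurred in training
    vec = {t: c * idf.get(t, unseen) for t, c in tf.items()}
    norm = math.sqrt(sum(w * w for w in vec.values())) or 1.0
    vec = {t: w / norm for t, w in vec.items()}
    vec[P_KEY] = p_change
    return vec


def distance(a, b):
    """Euclidean distance between two sparse vectors stored as dicts."""
    return math.sqrt(sum((a.get(k, 0.0) - b.get(k, 0.0)) ** 2 for k in a.keys() | b.keys()))


def fit_centroids(vectors, labels):
    """Nearest-centroid model: the mean feature vector of each opinion category."""
    groups = defaultdict(list)
    for v, y in zip(vectors, labels):
        groups[y].append(v)
    model = {}
    for y, vs in groups.items():
        acc = defaultdict(float)
        for v in vs:
            for k, w in v.items():
                acc[k] += w / len(vs)
        model[y] = dict(acc)
    return model


def predict(model, vec):
    """Return the opinion category whose centroid is closest to `vec`."""
    return min(model, key=lambda y: distance(model[y], vec))


if __name__ == "__main__":
    idf, n = fit_idf([c for c, _, _ in TOY_CLIPS])
    X = [features(c, p, idf, n) for c, p, _ in TOY_CLIPS]
    y = [label for _, _, label in TOY_CLIPS]
    model = fit_centroids(X, y)
    print(predict(model, features("Tax increase debate", 0.85, idf, n)))
```

A nearest-centroid model is used here only because it fits in a few lines without external libraries; any multi-class classifier could take its place.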