Event history analysis is a range of methods and tests used when events, states, the connections between them and changes over time are of interest. Data in event history analysis characteristically combine complete and incomplete observations: an event may or may not occur to an individual, or the information about it may remain unknown. Incomplete observations are called censored data, and they should not be left out of the analysis because doing so would bias the estimates. Various models and methods have been developed for censored data; however, visualisation is often neglected or omitted from the analysis altogether. Visualising censored data is difficult because of the censored information and the large number of variables, which make it hard to display everything in a two-dimensional graphic. Graphical displays can aid in detecting data errors, which are common in event history analysis. The errors can be random (e.g., data copying errors, impossible event sequences) or systematic (e.g., assigning the same time to multiple events, data that have not been updated). The data must be examined in detail before the analysis in order to eliminate errors and obtain high-quality, reliable results.
In this Master’s thesis, we examined the existing data visualisations in event history analysis: the survival curve, cumulative distribution function, cumulative hazard, hazard, censored data histogram, censored data boxplot, event charts, Lexis diagram and pencil diagram. We assessed their adequacy and drew them for our own data (with the exception of the pencil diagram). We wrote our own R functions for drawing the censored data histogram and the censored data boxplot (the code is in the appendix). For drawing the survival curve, cumulative distribution function, cumulative hazard, event charts and Lexis diagrams, we used existing R functions and libraries. We created a user-friendly interactive web application for detecting data errors before the analysis; it gives an overview of the entered data using event charts and identifies units with errors in the sequence of events or with identical event times. The application lists the units with errors by their identifier, displays them in a table and visualises them in an event chart. The application was built with the Shiny library in R.
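To illustrate the kind of check described above, the following minimal sketch flags units whose events occur out of sequence or share an identical time. It is written in Python rather than in the R and Shiny code of the thesis, and it is not the application's implementation: the event names, their assumed chronological order, the example data and the function name `find_unit_errors` are all hypothetical.

```python
from typing import Dict, List, Optional

# Hypothetical event names in their assumed chronological order.
EXPECTED_ORDER = ["diagnosis", "treatment", "death"]


def find_unit_errors(
    units: Dict[str, Dict[str, Optional[float]]],
    order: List[str] = EXPECTED_ORDER,
) -> Dict[str, List[str]]:
    """Return, for each problematic unit, a list of detected issues."""
    flagged: Dict[str, List[str]] = {}
    for unit_id, times in units.items():
        # Keep only observed events (None marks an event that was not observed).
        observed = [(e, times[e]) for e in order if times.get(e) is not None]
        issues = []
        # Compare each observed event with the next observed one.
        for (prev_e, prev_t), (next_e, next_t) in zip(observed, observed[1:]):
            if next_t < prev_t:
                issues.append(f"{next_e} before {prev_e}")
            elif next_t == prev_t:
                issues.append(f"{next_e} same time as {prev_e}")
        if issues:
            flagged[unit_id] = issues
    return flagged


# Made-up example data: unit B has an impossible sequence, unit C a tied time.
units = {
    "A": {"diagnosis": 0, "treatment": 12, "death": 40},
    "B": {"diagnosis": 5, "treatment": 3, "death": None},
    "C": {"diagnosis": 7, "treatment": 7, "death": 30},
}
flagged = find_unit_errors(units)
```

In this sketch an unobserved event is simply skipped, so incomplete units are still checked on the events they do have. The flagged identifiers could then be listed in a table or highlighted in an event chart.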
The original contribution of the Master’s thesis is a fast and simple visualisation of event history data, which gives an overview of the data and their distribution. In addition, the application makes it possible to inspect each individual’s data, search for errors and consequently eliminate them. This markedly shortens the time needed for editing and transforming the data before the analysis, and thus supports a better analysis without data errors.