We want to analyze time-ordered data, and we are interested in the difference between the values of two observations. A random variable of the studied units is observed at two different points in time, under the assumption that the values at the first point are lower than the values at the second point. An individual unit can be observed at both points or, when observation at both points is not possible, at only one of them. For a unit that can be observed at only one of the two points, the opposite (alternative) state is impossible for it, that is, counterfactual. Since we analyze time-ordered data, the event time at the first point is always smaller (the event happens earlier) than the event time at the second point. The same holds when the random variable does not represent an event time, since we have assumed that the values at the first point are lower than the values at the second point. Thus, for units observed at both points, the difference between the value at the second point and the value at the first point is always positive. Ideally, we would have continuously measured values of all the variables we want to analyze for all observed units, and we would know exactly which observed unit each measurement belongs to. In practice, however, we often encounter partially deficient data, in which part of the information is missing. This reduces the accuracy of the analysis results and makes them more variable.
For different research designs, we wrote down the likelihood functions and estimators for the parameters of the distribution of differences between the observations at the second and first points, and we analyzed how the properties of the estimates change with the gradual loss of information. The parameters of the distribution of the differences between the values of the two observations were estimated by the method of maximum likelihood. For the numerical calculation of the parameter estimates and their variability, the BFGS (Broyden-Fletcher-Goldfarb-Shanno) algorithm was used.
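The maximum-likelihood step can be sketched as follows for the simplest case, where the differences follow an exponential distribution. This is a minimal illustration, not the thesis's exact code: the rate value, sample size, and use of `scipy.optimize.minimize` with `method="BFGS"` are assumptions made for the sketch.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)

# Hypothetical paired design: differences d_i = y_i - x_i follow Exp(rate);
# the true rate and sample size are illustrative assumptions.
true_rate = 2.0
d = rng.exponential(scale=1 / true_rate, size=500)

def neg_log_lik(log_rate, d):
    # Work on the log scale so the optimizer is unconstrained (rate > 0).
    rate = np.exp(log_rate[0])
    return -np.sum(np.log(rate) - rate * d)

res = minimize(neg_log_lik, x0=[0.0], args=(d,), method="BFGS")
rate_hat = np.exp(res.x[0])

# Asymptotic standard error from the inverse Hessian that BFGS maintains,
# mapped back from the log scale with the delta method.
se = rate_hat * np.sqrt(res.hess_inv[0, 0])
```

For this particular model the numerical optimum coincides with the closed-form MLE, 1 / mean(d), which gives a convenient check that the optimizer has converged.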
The impact of information loss on variability was examined by simulation. The gradual loss of information was represented by continuous versus discrete measurements, by the presence or absence of information on which measurements belong to a particular statistical unit (whether we know which two measurements form a pair), and by measurements taken on the same or on different statistical units. We were particularly interested in the case where the measurements are made on different statistical units (this case represents a counterfactual alternative). The simulations showed that with discrete measurements, and with measurements for which we do not know the pairs, the deviation of the estimates of the parameters of interest from the true value is greater than with non-deficient data, and the variability of the estimates increases. Discrete measurements affected the estimates less than measurements for which we do not know the pairs.
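One of the simulated degradations, discretization of the measurements, can be sketched as follows. This is a simplified illustration under exponential differences; the rate, sample size, and rounding step are assumptions made for the sketch.

```python
import numpy as np

rng = np.random.default_rng(1)
true_rate = 2.0  # illustrative assumption
n = 5000
x = rng.exponential(scale=1.0, size=n)                 # first-point values
y = x + rng.exponential(scale=1 / true_rate, size=n)   # second-point values

# With known pairs and continuous measurements, the MLE of the exponential
# rate of the differences is 1 / mean(y - x).
rate_cont = 1 / np.mean(y - x)

# Discrete measurements: round both observations to a grid with a given step,
# mimicking coarse recording, and estimate from the rounded values.
step = 0.1
x_disc = np.round(x / step) * step
y_disc = np.round(y / step) * step
rate_disc = 1 / np.mean(y_disc - x_disc)
```

Repeating such a simulation many times and comparing the spread of `rate_cont` and `rate_disc` across replications is one way to quantify how much a given form of information loss inflates the variability of the estimate.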
The analysis was repeated for cases where the difference between the values of the two observations follows an exponential distribution, a gamma distribution, or an exponential distribution whose parameter depends on the value at the first point. Through simulations, we showed that in the case of the exponential distribution the parameter estimates are unbiased regardless of the information loss. In the other two cases, the differences between the research designs are greater, so we cannot claim that the estimate of the distribution parameter is unbiased for all of them.
For each of the distributions, we also examined how the variability of the estimates of the parameters of interest changes with the values of the quantities that can affect the variability of the difference between the values of the two observations. We found that an increase in the variability of the values of either observation does not necessarily produce a more variable estimate and may even decrease its variability.
In the case of counterfactual alternatives, conditional probability was used to construct the likelihood function for estimating the parameter values, comparing the value of each measurement of the first observation with all measurements of the second observation. The asymptotic variability of each parameter of the distribution of differences between the observations was calculated using the Fisher information. We concluded that the variability of the parameter estimates is influenced both by the variability of the measurements of the first observation and by the variability of the differences between the first and second observations.
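The idea of comparing each first-point measurement with all second-point measurements can be sketched as a pseudo-likelihood over all pairwise differences. This is a simplified illustration under exponential differences, not the thesis's exact formulation; the model, sample size, and mixture form of the likelihood are assumptions made for the sketch.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(2)
true_rate = 2.0  # illustrative assumption
n = 400
x = rng.exponential(scale=1.0, size=n)
y = x + rng.exponential(scale=1 / true_rate, size=n)
rng.shuffle(y)  # the pairing information is lost

def neg_log_lik(log_rate):
    rate = np.exp(log_rate[0])
    # All pairwise differences: diff[i, j] = y_j - x_i.
    diff = y[None, :] - x[:, None]
    # Exponential density of the difference, zero where y_j <= x_i.
    dens = np.where(diff > 0, rate * np.exp(-rate * np.clip(diff, 0, None)), 0.0)
    # Approximate marginal density of each y_j by averaging over all x_i;
    # a tiny constant guards the logarithm against underflow.
    marg = dens.mean(axis=0)
    return -np.sum(np.log(marg + 1e-300))

res = minimize(neg_log_lik, x0=[0.0], method="BFGS")
rate_hat = np.exp(res.x[0])
```

Because every second-point value is compared with every first-point value, the spread of the first-point measurements enters the likelihood directly, which is consistent with the finding that it contributes to the variability of the parameter estimates.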
In the master's thesis, we propose new estimators for the parameters of the distribution of differences between observations at the second and first points for the discussed research designs, which correspond to various degrees of information loss. The research designs and the variability of the estimates were compared for different distributions of the differences. For the case of counterfactual alternatives, we showed that we obtain estimates quite similar to those in the case of paired measurements in which we do not know which measurements belong to which statistical unit. Although the properties of the estimators were checked for differences following the exponential distribution, the gamma distribution, and the exponential distribution that depends on the value at the first point, the results generalize to any one-parameter distribution, two-parameter distribution, and distribution dependent on the value at the first point. Based on the similarity of the results of the analysis of the variability of the parameter estimates, the distributions of the differences fall into two groups: the first contains the one-parameter and two-parameter distributions, the second the distribution that depends on the value at the first point. For the first group, as the variability of the measurements of the first observation increases, the variability of the parameter estimates increases, while as the variability of the differences increases, the variability of the parameter estimates decreases. The opposite holds when the difference between the observations depends on the value at the first point: as the variability of the measurements of the first observation increases, the variability of the parameter estimates decreases, and as the variability of the differences increases, the variability of the parameter estimates first decreases slightly, after which the trend reverses and begins to increase.
For all the considered distributions of the differences, we conclude that obtaining less variable parameter estimates requires slightly more measurements of the second observation than of the first.