We want to analyze time-ordered data, and we are interested in the difference between the values of two observations. A random variable of the studied units is observed at two different points in time, under the assumption that the values at the first point are lower than the values at the second point. An individual unit can be observed at both points or, when observation at both points is not possible, at only one of them. For a unit that can be observed at only one of the two points, the opposite (alternative) state is impossible for it, that is, counterfactual. Since we analyze time-ordered data, the event time at the first point is always smaller (the event happens earlier) than the event time at the second point. The same holds when the random variable does not represent an event time, since we have assumed that the values at the first point are lower than the values at the second point. Thus, for units observed at both points, the difference between the value at the second point and the value at the first point is always positive. Ideally, we would have continuously measured values of all the variables we want to analyze for all observed units, and we would know exactly which observed unit each measurement belongs to. In practice, however, we often encounter partially deficient data, in which part of the information is missing. This reduces the accuracy of the analysis results and makes them more variable.
For different research designs, we wrote down the likelihood functions and estimators for the parameters of the distribution of differences between the observations at the second and first points, and we analyzed how the properties of the estimates change with the gradual loss of information. The parameters of the distribution of the differences between the values of the two observations were estimated by the method of maximum likelihood. For the numerical calculation of the parameter estimates and their variability, the BFGS (Broyden-Fletcher-Goldfarb-Shanno) algorithm was used.
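The maximum-likelihood step can be sketched as follows for the simplest case, where the differences follow an exponential distribution. This is a minimal illustration, not the thesis's exact code: the rate value, sample size, and use of `scipy.optimize.minimize` with `method="BFGS"` are assumptions made for the sketch.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)

# Hypothetical paired design: differences d_i = y_i - x_i follow Exp(rate);
# the true rate and sample size are illustrative assumptions.
true_rate = 2.0
d = rng.exponential(scale=1 / true_rate, size=500)

def neg_log_lik(log_rate, d):
    # Work on the log scale so the optimizer is unconstrained (rate > 0).
    rate = np.exp(log_rate[0])
    return -np.sum(np.log(rate) - rate * d)

res = minimize(neg_log_lik, x0=[0.0], args=(d,), method="BFGS")
rate_hat = np.exp(res.x[0])

# Asymptotic standard error from the inverse Hessian that BFGS maintains,
# mapped back from the log scale with the delta method.
se = rate_hat * np.sqrt(res.hess_inv[0, 0])
```

For this particular model the numerical optimum coincides with the closed-form MLE, 1 / mean(d), which gives a convenient check that the optimizer has converged.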
The impact of information loss on variability was examined by simulation. The gradual loss of information was represented by continuous versus discrete measurements, by the presence or absence of information on which measurements belong to a particular statistical unit (whether we know which two measurements form a pair), and by measurements taken on the same or on different statistical units. We were particularly interested in the case where the measurements are made on different statistical units (this case represents a counterfactual alternative). The simulations showed that with discrete measurements, and with measurements for which we do not know the pairs, the deviation of the estimates of the parameters of interest from the true value is greater than with non-deficient data, and the variability of the estimates increases. Discrete measurements affected the estimates less than measurements for which we do not know the pairs.
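One of the simulated degradations, discretization of the measurements, can be sketched as follows. This is a simplified illustration under exponential differences; the rate, sample size, and rounding step are assumptions made for the sketch.

```python
import numpy as np

rng = np.random.default_rng(1)
true_rate = 2.0  # illustrative assumption
n = 5000
x = rng.exponential(scale=1.0, size=n)                 # first-point values
y = x + rng.exponential(scale=1 / true_rate, size=n)   # second-point values

# With known pairs and continuous measurements, the MLE of the exponential
# rate of the differences is 1 / mean(y - x).
rate_cont = 1 / np.mean(y - x)

# Discrete measurements: round both observations to a grid with a given step,
# mimicking coarse recording, and estimate from the rounded values.
step = 0.1
x_disc = np.round(x / step) * step
y_disc = np.round(y / step) * step
rate_disc = 1 / np.mean(y_disc - x_disc)
```

Repeating such a simulation many times and comparing the spread of `rate_cont` and `rate_disc` across replications is one way to quantify how much a given form of information loss inflates the variability of the estimate.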
The analysis was repeated for cases where the difference between the values of the two observations follows an exponential distribution, a gamma distribution, or an exponential distribution whose parameter depends on the value at the first point. Through simulations, we showed that in the case of the exponential distribution the parameter estimates are unbiased regardless of the information loss. In the other two cases, the differences between the research designs are greater, so we cannot claim that the estimate of the distribution parameter is unbiased for all of them.
For each of the distributions, we also examined how the variability of the estimates of the parameters of interest changes with the values of the quantities that can affect the variability of the difference between the values of the two observations. We found that an increase in the variability of the values of either observation does not necessarily produce a more variable estimate and may even decrease its variability.
In the case of counterfactual alternatives, conditional probability was used to construct the likelihood function for estimating the parameter values, comparing the value of each measurement of the first observation with all measurements of the second observation. The asymptotic variability of each parameter of the distribution of differences between the observations was calculated using the Fisher information. We concluded that the variability of the parameter estimates is influenced both by the variability of the measurements of the first observation and by the variability of the differences between the first and second observations.
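The idea of comparing each first-point measurement with all second-point measurements can be sketched as a pseudo-likelihood over all pairwise differences. This is a simplified illustration under exponential differences, not the thesis's exact formulation; the model, sample size, and mixture form of the likelihood are assumptions made for the sketch.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(2)
true_rate = 2.0  # illustrative assumption
n = 400
x = rng.exponential(scale=1.0, size=n)
y = x + rng.exponential(scale=1 / true_rate, size=n)
rng.shuffle(y)  # the pairing information is lost

def neg_log_lik(log_rate):
    rate = np.exp(log_rate[0])
    # All pairwise differences: diff[i, j] = y_j - x_i.
    diff = y[None, :] - x[:, None]
    # Exponential density of the difference, zero where y_j <= x_i.
    dens = np.where(diff > 0, rate * np.exp(-rate * np.clip(diff, 0, None)), 0.0)
    # Approximate marginal density of each y_j by averaging over all x_i;
    # a tiny constant guards the logarithm against underflow.
    marg = dens.mean(axis=0)
    return -np.sum(np.log(marg + 1e-300))

res = minimize(neg_log_lik, x0=[0.0], method="BFGS")
rate_hat = np.exp(res.x[0])
```

Because every second-point value is compared with every first-point value, the spread of the first-point measurements enters the likelihood directly, which is consistent with the finding that it contributes to the variability of the parameter estimates.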
In the master's thesis, we propose new estimators for the parameters of the distribution of differences between observations at the second and first points for the discussed research designs, which correspond to various degrees of information loss. The research designs and the variability of the estimates were compared for different distributions of the differences. For the case of counterfactual alternatives, we showed that we obtain estimates quite similar to those in the case of paired measurements in which we do not know which measurements belong to which statistical unit. Although the properties of the estimators were checked for differences following the exponential distribution, the gamma distribution, and the exponential distribution that depends on the value at the first point, the results generalize to any one-parameter distribution, two-parameter distribution, and distribution dependent on the value at the first point. Based on the similarity of the results of the analysis of the variability of the parameter estimates, the distributions of the differences fall into two groups: the first contains the one-parameter and two-parameter distributions, the second the distribution that depends on the value at the first point. For the first group, as the variability of the measurements of the first observation increases, the variability of the parameter estimates increases, while as the variability of the differences increases, the variability of the parameter estimates decreases. The opposite holds when the difference between the observations depends on the value at the first point: as the variability of the measurements of the first observation increases, the variability of the parameter estimates decreases, and as the variability of the differences increases, the variability of the parameter estimates first decreases slightly, after which the trend reverses and begins to increase.
For all the considered distributions of the differences, we conclude that obtaining less variable parameter estimates requires slightly more measurements of the second observation than of the first.