Tuned penalized regression - a solution to the problem of separation in logistic regression?
Šinkovec, Hana (Author), Blagus, Rok (Mentor), Heinze, Georg (Co-mentor)
PDF - Presentation file (22.94 MB)
MD5: 37EFBBB7608045A2193FCD98AA26B4DE
PID:
20.500.12556/rul/1953c2ce-aef8-4565-948f-fb79b02b3f30
Abstract
In logistic regression, when we analyse small or sparse data sets, results obtained by classical maximum likelihood methods cannot generally be trusted. In such analyses it may even happen that the maximum likelihood parameter estimates do not exist. This situation has been termed "separation" or "monotone likelihood": it arises when the two outcome groups are perfectly separated by the values of a covariate, or a linear combination of covariates, so that the likelihood increases monotonically and never attains a maximum. This causes the maximum likelihood estimates to be undefined and the numerical algorithms for likelihood maximization to diverge. We provide some real-life data examples of separation and near-separation and discuss the possibility of using penalized likelihood methods – ridge, lasso and (generalized) Firth regression – to overcome the non-existence of parameter estimates. Penalized regression models shrink coefficients toward zero, so the estimates cannot diverge and finite point estimates are always available. The question, however, is how to find the penalty parameter that controls the amount of penalization: cross-validation of the log-likelihood and optimization of the AIC have their limitations in the presence of separation, since in simple situations they select a penalty parameter that collapses to zero. We show examples where tuning does not work. We first focus on a simple 2×2 contingency table and then observe what happens to the penalty parameter as the example is expanded by adding more covariates. Ridge, lasso and Firth regression models (the latter also in a generalized version with a non-fixed tuning parameter) are compared, and the general performance of both tuning approaches is evaluated in scenarios with a high probability of separation. The results show that tuned penalized regression is a questionable solution. In contrast, Firth-type penalization shows excellent behaviour in terms of reducing the mean squared error of the coefficient estimates in situations with a high probability of separation.
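To make the separation phenomenon described above concrete, here is a minimal sketch (not code from the thesis) of a perfectly separated one-covariate data set, contrasting the divergent unpenalized fit with ridge (L2) fits at a few penalty strengths. It assumes scikit-learn 1.2 or later (earlier versions spell the unpenalized option penalty='none'); the data are invented for illustration.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# One covariate that perfectly separates the outcomes:
# every x < 0 has y = 0 and every x > 0 has y = 1.
x = np.array([-2.0, -1.5, -1.0, -0.5, 0.5, 1.0, 1.5, 2.0]).reshape(-1, 1)
y = np.array([0, 0, 0, 0, 1, 1, 1, 1])

# Unpenalized maximum likelihood: the likelihood is monotone in the slope,
# so the reported coefficient is a large arbitrary number fixed by the
# solver's stopping rule, not a meaningful estimate.
mle = LogisticRegression(penalty=None, max_iter=10_000)
mle.fit(x, y)
print("unpenalized slope:", mle.coef_[0, 0])

# Ridge penalization shrinks the slope toward zero, so the estimate is
# finite, but its value depends entirely on the tuning parameter C = 1/lambda.
for C in (100.0, 1.0, 0.01):
    ridge = LogisticRegression(penalty="l2", C=C)
    ridge.fit(x, y)
    print(f"ridge slope at C={C}:", ridge.coef_[0, 0])
```

Cross-validating C on data like these also illustrates the tuning failure the abstract describes: the selected penalty tends to collapse toward zero (i.e. C is chosen as large as the search grid allows).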
Language:
English
Keywords:
separation, monotone likelihood, infinite estimates, (tuned) penalized logistic regression, tuning, MSE reduction
Work type:
Master's thesis/paper
Organization:
FE - Faculty of Electrical Engineering
Year:
2016
PID:
20.500.12556/RUL-86049
Publication date in RUL:
05.10.2016
Views:
3675
Downloads:
599
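For the Firth-type penalization that the abstract singles out, the following is an illustrative numpy sketch, not code from the thesis (the function name firth_logistic is ours). Firth's method maximizes the penalized log-likelihood l(beta) + 0.5 * log|I(beta)|, which for logistic regression amounts to adding h_i * (0.5 - pi_i) to each residual in the score, where h_i are the hat-matrix diagonals.

```python
import numpy as np

def firth_logistic(X, y, n_iter=50, tol=1e-8):
    """Newton iterations for Firth's bias-reduced logistic regression.

    Maximizes l(beta) + 0.5 * log|I(beta)| via the modified score
    U*(beta) = X.T @ (y - pi + h * (0.5 - pi)).
    Plain Newton steps; production implementations (e.g. R's logistf)
    add step-halving for robustness.
    """
    n, p = X.shape
    beta = np.zeros(p)
    for _ in range(n_iter):
        pi = 1.0 / (1.0 + np.exp(-(X @ beta)))
        w = pi * (1.0 - pi)                       # Fisher-information weights
        XtWX = X.T @ (X * w[:, None])             # I(beta) = X'WX
        XtWX_inv = np.linalg.inv(XtWX)
        # Diagonal of the hat matrix H = W^(1/2) X (X'WX)^(-1) X' W^(1/2)
        h = np.einsum("ij,jk,ik->i", X, XtWX_inv, X) * w
        score = X.T @ (y - pi + h * (0.5 - pi))   # Firth-modified score
        step = XtWX_inv @ score
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    return beta

# The same perfectly separated toy data as in the sketch above,
# with an intercept column added: the Firth estimates are finite.
X = np.column_stack([np.ones(8),
                     [-2.0, -1.5, -1.0, -0.5, 0.5, 1.0, 1.5, 2.0]])
y = np.array([0., 0., 0., 0., 1., 1., 1., 1.])
print("Firth estimates (intercept, slope):", firth_logistic(X, y))
```

On separated data this returns finite intercept and slope estimates without any tuning parameter to choose, which is exactly the property the thesis evaluates against tuned ridge and lasso regression.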