To tune or not to tune, a case study of ridge logistic regression in small or sparse datasets
Authors: Šinkovec, Hana; Heinze, Georg; Blagus, Rok; Geroldinger, Angelika
Abstract
Background: For finite samples with binary outcomes, penalized logistic regression such as ridge logistic regression has the potential to achieve smaller mean squared errors (MSE) of coefficients and predictions than maximum likelihood estimation. There is evidence, however, that ridge logistic regression can result in highly variable calibration slopes in small or sparse data situations.
Methods: In this paper, we elaborate on this issue by performing a comprehensive simulation study, investigating the performance of ridge logistic regression in terms of coefficients and predictions and comparing it to Firth's correction, which has been shown to perform well in low-dimensional settings. In addition to tuned ridge regression, where the penalty strength is estimated from the data by minimizing some measure of the out-of-sample prediction error or an information criterion, we also considered ridge regression with a pre-specified degree of shrinkage. We included "oracle" models in the simulation study, in which the complexity parameter was chosen based on the true event probabilities (prediction oracle) or regression coefficients (explanation oracle), to demonstrate the capability of ridge regression if the truth were known.
Results: Performance of ridge regression strongly depends on the choice of the complexity parameter. As shown in our simulation and illustrated by a data example, values optimized in small or sparse datasets are negatively correlated with optimal values and suffer from substantial variability, which translates into large MSE of coefficients and large variability of calibration slopes. In contrast, in our simulations pre-specifying the degree of shrinkage prior to fitting led to accurate coefficients and predictions even in non-ideal settings such as those encountered in the context of rare outcomes or sparse predictors.
Conclusions: Applying tuned ridge regression in small or sparse datasets is problematic as it results in unstable coefficients and predictions. In contrast, determining the degree of shrinkage according to some meaningful prior assumptions about true effects has the potential to reduce bias and stabilize the estimates.
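The contrast the abstract draws between a tuned and a pre-specified penalty can be illustrated with a minimal sketch. It is not the authors' simulation code: the single-predictor model, the penalty grid `GRID`, the fixed penalty `FIXED_LAM`, the true coefficients, the sample size and the use of k-fold cross-validated deviance as the tuning criterion are all assumptions chosen for illustration. The fit maximizes the log-likelihood minus `0.5 * lam * b1**2`, leaving the intercept unpenalized.

```python
import math
import random


def sigmoid(eta):
    # Clamp the linear predictor so math.exp cannot overflow.
    eta = max(-30.0, min(30.0, eta))
    return 1.0 / (1.0 + math.exp(-eta))


def fit_ridge(xs, ys, lam, n_iter=50, tol=1e-10):
    """Ridge logistic regression with one predictor via Newton-Raphson.

    Maximizes loglik(b0, b1) - 0.5 * lam * b1**2 (intercept unpenalized).
    Assumes lam > 0, which keeps the 2x2 Hessian invertible.
    """
    b0, b1 = 0.0, 0.0
    for _ in range(n_iter):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for x, y in zip(xs, ys):
            p = sigmoid(b0 + b1 * x)
            w = p * (1.0 - p)
            g0 += y - p
            g1 += (y - p) * x
            h00 += w
            h01 += w * x
            h11 += w * x * x
        g1 -= lam * b1   # gradient of the ridge penalty
        h11 += lam       # curvature of the ridge penalty
        det = h00 * h11 - h01 * h01
        d0 = (h11 * g0 - h01 * g1) / det
        d1 = (h00 * g1 - h01 * g0) / det
        b0 += d0
        b1 += d1
        if abs(d0) + abs(d1) < tol:
            break
    return b0, b1


def deviance(xs, ys, b0, b1):
    # Minus twice the Bernoulli log-likelihood of the given observations.
    total = 0.0
    for x, y in zip(xs, ys):
        p = sigmoid(b0 + b1 * x)
        total -= 2.0 * (y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total


def tune_lambda(xs, ys, grid, k=5):
    """Return the grid value with the smallest k-fold cross-validated deviance."""
    n = len(xs)
    best_lam, best_dev = None, float("inf")
    for lam in grid:
        cv_dev = 0.0
        for fold in range(k):
            train = [i for i in range(n) if i % k != fold]
            held = [i for i in range(n) if i % k == fold]
            b0, b1 = fit_ridge([xs[i] for i in train], [ys[i] for i in train], lam)
            cv_dev += deviance([xs[i] for i in held], [ys[i] for i in held], b0, b1)
        if cv_dev < best_dev:
            best_lam, best_dev = lam, cv_dev
    return best_lam


def simulate(n, b0, b1, rng):
    # One standard-normal predictor and a Bernoulli outcome.
    xs = [rng.gauss(0.0, 1.0) for _ in range(n)]
    ys = [1 if rng.random() < sigmoid(b0 + b1 * x) else 0 for x in xs]
    return xs, ys


# All settings below are hypothetical and not taken from the paper.
GRID = [0.01, 0.1, 1.0, 10.0, 100.0]   # candidate penalties for tuning
FIXED_LAM = 1.0                        # pre-specified penalty
TRUE_B0, TRUE_B1 = -1.5, 0.7           # true coefficients of the toy model

rng = random.Random(1)
tuned_lams, se_tuned, se_fixed = [], [], []
for _ in range(30):
    xs, ys = simulate(40, TRUE_B0, TRUE_B1, rng)
    lam_hat = tune_lambda(xs, ys, GRID)
    tuned_lams.append(lam_hat)
    se_tuned.append((fit_ridge(xs, ys, lam_hat)[1] - TRUE_B1) ** 2)
    se_fixed.append((fit_ridge(xs, ys, FIXED_LAM)[1] - TRUE_B1) ** 2)

print("distinct tuned penalties:", sorted(set(tuned_lams)))
print("MSE of slope, tuned penalty:", sum(se_tuned) / len(se_tuned))
print("MSE of slope, fixed penalty:", sum(se_fixed) / len(se_fixed))
```

The spread of the tuned penalties across replicates gives a rough sense of the variability the abstract describes. A toy run of this size says nothing about which approach is better in general, and the fixed value here is arbitrary rather than derived from prior assumptions about true effects.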
Language: English
Keywords: logistic regression, Firth's correction, statistics
Type of material: Journal article
Typology: 1.01 - Original scientific article
Organization: BF - Biotechnical Faculty
Publication status: Published
Publication version: Published version
Year of publication: 2021
Number of pages: 15 pp.
Numbering: Vol. 21
PID: 20.500.12556/RUL-153155
UDC: 311
ISSN of the article: 1471-2288
DOI: 10.1186/s12874-021-01374-y
COBISS.SI-ID: 79125251
Date published in RUL: 20.12.2023
Number of views: 451
Number of downloads: 28
The record is part of a journal
Title: BMC medical research methodology
Abbreviated title: BMC Med Res Methodol
Publisher: BioMed Central
ISSN: 1471-2288
COBISS.SI-ID: 2441236
Secondary language
Language: Slovenian
Keywords: logistična regresija, Firthov popravek, statistika