Joint use of over- and under-sampling techniques and cross-validation for the development and assessment of prediction models
Blagus, Rok (Author), Lusa, Lara (Author)

URL - Presentation file, available at http://www.biomedcentral.com/1471-2105/16/362

Abstract
Background: Prediction models are used in clinical research to develop rules that can accurately predict the outcome of patients based on some of their characteristics. They represent a valuable tool in the decision-making process of clinicians and health policy makers, as they enable them to estimate the probability that patients have or will develop a disease, will respond to a treatment, or that their disease will recur. The interest devoted to prediction models in the biomedical community has been growing in the last few years. Often the data used to develop the prediction models are class-imbalanced, as only a few patients experience the event (and therefore belong to the minority class).

Results: Prediction models developed using class-imbalanced data tend to achieve sub-optimal predictive accuracy in the minority class. This problem can be diminished by using sampling techniques aimed at balancing the class distribution. These techniques include undersampling, where only a fraction of the majority class samples is retained in the analysis, and oversampling, where new samples from the minority class are generated. The correct assessment of how the prediction model is likely to perform on independent data is of crucial importance; in the absence of an independent data set, cross-validation is normally used. While the importance of correct cross-validation is well documented in the biomedical literature, the challenges posed by the joint use of sampling techniques and cross-validation have not been addressed.

Conclusions: We show that care must be taken to ensure that cross-validation is performed correctly on sampled data, and that the risk of overestimating the predictive accuracy is greater when oversampling techniques are used. Examples based on the re-analysis of real datasets and simulation studies are provided. We identify some results from the biomedical literature in which incorrect cross-validation was performed and in which we expect that the performance of oversampling techniques was heavily overestimated.
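The following is a minimal sketch of how oversampling and cross-validation can interact. It is not the authors' code or experimental setup. It assumes random oversampling by duplication, a 1-nearest-neighbour classifier, and pure-noise features, so that no classifier can truly predict the minority class. All function names are hypothetical. Oversampling before splitting lets copies of a test sample leak into the training folds; oversampling only inside each training fold avoids this.

```python
import random

def oversample(X, y, rng):
    """Random oversampling (assumed variant): duplicate minority rows until balanced."""
    idx_min = [i for i, c in enumerate(y) if c == 1]
    idx_maj = [i for i, c in enumerate(y) if c == 0]
    if not idx_min:  # nothing to duplicate
        return list(X), list(y)
    extra = [rng.choice(idx_min) for _ in range(len(idx_maj) - len(idx_min))]
    keep = list(range(len(y))) + extra
    return [X[i] for i in keep], [y[i] for i in keep]

def predict_1nn(X_train, y_train, x):
    """Return the label of the nearest training point (squared Euclidean distance)."""
    best = min(range(len(X_train)),
               key=lambda i: sum((a - b) ** 2 for a, b in zip(X_train[i], x)))
    return y_train[best]

def make_folds(n, k, rng):
    """Shuffle the indices 0..n-1 and deal them into k folds."""
    idx = list(range(n))
    rng.shuffle(idx)
    return [idx[i::k] for i in range(k)]

def minority_accuracy_cv(X, y, k, rng, oversample_inside):
    """Cross-validated fraction of minority-class test samples predicted correctly."""
    hits = total = 0
    for test in make_folds(len(y), k, rng):
        held_out = set(test)
        X_tr = [X[i] for i in range(len(y)) if i not in held_out]
        y_tr = [y[i] for i in range(len(y)) if i not in held_out]
        if oversample_inside:
            # Correct: sampling sees only the training part of this fold.
            X_tr, y_tr = oversample(X_tr, y_tr, rng)
        for i in test:
            if y[i] == 1:
                total += 1
                hits += predict_1nn(X_tr, y_tr, X[i]) == 1
    return hits / total

rng = random.Random(0)
# 90 majority and 10 minority samples; the features carry no class information.
y = [0] * 90 + [1] * 10
X = [[rng.gauss(0, 1) for _ in range(5)] for _ in y]

# Incorrect: oversample the whole data set first, then cross-validate.
X_bal, y_bal = oversample(X, y, rng)
leaky = minority_accuracy_cv(X_bal, y_bal, 5, rng, oversample_inside=False)

# Correct: cross-validate the original data, oversampling inside each training fold.
honest = minority_accuracy_cv(X, y, 5, rng, oversample_inside=True)

print(f"oversampling before CV: {leaky:.2f}")
print(f"oversampling inside CV: {honest:.2f}")
```

Under these assumptions the first estimate is typically far higher than the second, even though the features are noise, because duplicated minority samples in the training folds sit at distance zero from their copies in the test fold.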

Language: English
Keywords: statistics, prediction models, sampling techniques
Work type: Not categorized (r6)
Typology: 1.01 - Original Scientific Article
Organization: MF - Faculty of Medicine
Year: 2015
Number of pages: pp. 1-10
Numbering: Vol. 16
UDC: 311
ISSN on article: 1471-2105
DOI: 10.1186/s12859-015-0784-9
COBISS.SI-ID: 32284377
Views: 687
Downloads: 305

Record is part of a journal

Title: BMC Bioinformatics
Publisher: BioMed Central
ISSN: 1471-2105
COBISS.SI-ID: 2433556

Secondary language

Language: Unknown language
Keywords: statistika, napoved modelov, tehnike vzorčenja
