
Boosting for high-dimensional two-class prediction
Blagus, Rok (Author), Lusa, Lara (Author)

URL - Presentation file, available at http://www.biomedcentral.com/1471-2105/16/300/abstract

Abstract
Background: In clinical research, prediction models are used to predict patient outcomes from some of the patients' characteristics. For high-dimensional prediction models (where the number of variables greatly exceeds the number of samples), the choice of an appropriate classifier is crucial, as it has been observed that no single classification algorithm performs optimally for all types of data. Boosting was proposed as a method that combines the classification results obtained with base classifiers, sequentially adjusting the sample weights based on the performance in previous iterations. Boosting generally outperforms any individual classifier, but studies with high-dimensional data have shown that the most standard boosting algorithm, AdaBoost.M1, cannot significantly improve the performance of its base classifier. More recently, other boosting algorithms were proposed (Gradient boosting, Stochastic Gradient boosting, LogitBoost); they were shown to perform better than AdaBoost.M1, but their performance had not been evaluated for high-dimensional data.

Results: In this paper we use simulation studies and real gene-expression data sets to evaluate the performance of boosting algorithms when data are high-dimensional. Our results confirm that AdaBoost.M1 can perform poorly in this setting, often failing to improve the performance of its base classifier. We provide an explanation for this and propose a modification, AdaBoost.M1.ICV, which uses cross-validated estimates of the prediction errors and outperforms the original algorithm when data are high-dimensional. The use of AdaBoost.M1.ICV is advisable when the base classifier overfits the training data: when the number of variables is large, the number of samples is small, and/or the difference between the classes is large. To a lesser extent, Gradient boosting suffers from similar problems. Contrary to the findings for low-dimensional data, shrinkage does not improve the performance of Gradient boosting when data are high-dimensional; it is, however, beneficial for Stochastic Gradient boosting, which outperformed the other boosting algorithms in our analyses. LogitBoost suffers from overfitting and generally performs poorly.

Conclusions: The results show that boosting can substantially improve the performance of its base classifier even when data are high-dimensional. However, not all boosting algorithms perform equally well. LogitBoost, AdaBoost.M1, and Gradient boosting seem less useful for this type of data. Overall, Stochastic Gradient boosting with shrinkage and AdaBoost.M1.ICV seem to be the preferable choices for high-dimensional class prediction.
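The abstract refers to boosting's sequential reweighting of samples based on the base classifier's performance in previous iterations. As an illustrative sketch only (this is not code from the paper, and all function names and the toy data are our own), here is a minimal pure-Python AdaBoost.M1 with decision stumps as base classifiers for two-class prediction with labels in {-1, +1}:

```python
import math

def stump_train(X, y, w):
    """Find the weighted-error-minimizing decision stump (err, feature, threshold, polarity)."""
    best = (float("inf"), 0, 0.0, 1)
    for j in range(len(X[0])):
        values = sorted(set(row[j] for row in X))
        thresholds = [(a + b) / 2 for a, b in zip(values, values[1:])] or values
        for t in thresholds:
            for pol in (1, -1):  # polarity: which side of the threshold is class +1
                err = sum(wi for row, yi, wi in zip(X, y, w)
                          if (pol if row[j] > t else -pol) != yi)
                if err < best[0]:
                    best = (err, j, t, pol)
    return best

def stump_predict(stump, row):
    _, j, t, pol = stump
    return pol if row[j] > t else -pol

def adaboost_m1(X, y, n_rounds=10):
    """Fit an ensemble of weighted stumps; sample weights grow on misclassified samples."""
    n = len(X)
    w = [1.0 / n] * n
    ensemble = []
    for _ in range(n_rounds):
        stump = stump_train(X, y, w)
        err = max(stump[0], 1e-10)   # avoid log(0) on a perfect stump
        if err >= 0.5:               # base classifier no better than chance: stop
            break
        alpha = 0.5 * math.log((1 - err) / err)
        ensemble.append((alpha, stump))
        # reweight: misclassified samples (yi * prediction < 0) gain weight
        w = [wi * math.exp(-alpha * yi * stump_predict(stump, row))
             for wi, yi, row in zip(w, y, X)]
        s = sum(w)
        w = [wi / s for wi in w]
    return ensemble

def predict(ensemble, row):
    """Weighted vote of the base classifiers."""
    score = sum(alpha * stump_predict(stump, row) for alpha, stump in ensemble)
    return 1 if score >= 0 else -1
```

The cross-validated variant proposed in the paper (AdaBoost.M1.ICV) replaces the resubstitution error used to compute `err` with a cross-validated estimate, which matters precisely when the base classifier overfits; that modification is not shown here.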

Language: English
Keywords: statistics, classification, stochastic gradient
Material type: work is not categorized (r6)
Typology: 1.01 - original scientific article
Organization: MF - Medicinska fakulteta (Faculty of Medicine)
Year of publication: 2015
Pages: pp. 1-17
Numbering: Vol. 16
UDC: 311
Article ISSN: 1471-2105
DOI: 10.1186/s12859-015-0723-9
COBISS.SI-ID: 32198617
Views: 736
Downloads: 204

This work is part of the journal

Title: BMC bioinformatics
Publisher: BioMed Central
ISSN: 1471-2105
COBISS.SI-ID: 2433556

Secondary language

Language: unknown
Keywords: statistics, classification, stochastic gradient
