Transfer Learning for Phenotype Prediction from Small Gene Expression Data Sets

Mohorčič, Domen

Repository of the University of Ljubljana

Mohorčič, Domen (Author), Zupan, Blaž (Mentor)

PDF - Presentation file (2.04 MB)
MD5: 69CB965CE1A7646EA95A4E0CDFFEB87D

Abstract

Recent advances in biotechnology have enabled researchers to collect large amounts of data, such as gene expression profiles from patients, which provide a foundation for personalized medicine. This approach requires machine learning; however, many medical studies are limited by small sample sizes, typically only a few hundred patients described by tens of thousands of features. In this thesis, we addressed this issue by combining multiple small gene expression data sets into a larger one, regardless of study type, and training deep learning models that produce informative gene expression encodings. We then used transfer learning to predict phenotypes on unseen data sets from these encodings. We experimented with two model architectures: autoencoders and multi-task models. Although multi-task models proved challenging to train, they achieved higher average results on the test data sets than autoencoders, but they never surpassed logistic regression. An examination of the encodings revealed that autoencoders preserved the original data structure, whereas multi-task models mixed samples from different studies. Both architectures nonetheless showed that a gene expression profile can be reduced to a few informative markers.
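The autoencoder variant of the pipeline described in the abstract can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the thesis implementation: the data are synthetic, the autoencoder has a single tanh hidden layer trained with plain gradient descent in NumPy, and the classifier is a hand-written logistic regression. All function names (make_study, train_autoencoder, encode, fit_logreg), sizes, and hyperparameters are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
N_GENES, N_FACTORS, N_LATENT = 50, 3, 4
# Assumption: every synthetic study mixes the same few hidden factors.
LOADINGS = rng.normal(size=(N_FACTORS, N_GENES))

def make_study(n_samples, batch_shift):
    """Synthetic stand-in for one small expression study (not real data)."""
    factors = rng.normal(size=(n_samples, N_FACTORS))
    noise = rng.normal(scale=0.5, size=(n_samples, N_GENES))
    X = factors @ LOADINGS + noise + batch_shift
    y = (factors[:, 0] > 0).astype(float)  # toy binary phenotype
    return X, y

def train_autoencoder(X, n_latent, epochs=300, lr=0.05):
    """One-hidden-layer tanh autoencoder, full-batch gradient descent."""
    n_genes = X.shape[1]
    W1 = rng.normal(scale=0.1, size=(n_genes, n_latent))
    b1 = np.zeros(n_latent)
    W2 = rng.normal(scale=0.1, size=(n_latent, n_genes))
    b2 = np.zeros(n_genes)
    losses = []
    for _ in range(epochs):
        H = np.tanh(X @ W1 + b1)           # encodings
        err = H @ W2 + b2 - X              # reconstruction error
        losses.append(float((err ** 2).mean()))
        dR = 2.0 * err / err.size          # gradient of the mean squared error
        dZ = (dR @ W2.T) * (1.0 - H ** 2)  # backpropagate through tanh
        W2 -= lr * (H.T @ dR)
        b2 -= lr * dR.sum(axis=0)
        W1 -= lr * (X.T @ dZ)
        b1 -= lr * dZ.sum(axis=0)
    return (W1, b1), losses

def encode(X, encoder):
    """Map standardized expression values to the learned encoding."""
    W1, b1 = encoder
    return np.tanh(X @ W1 + b1)

def fit_logreg(H, y, epochs=500, lr=0.5):
    """Logistic regression on encodings, fitted by gradient descent."""
    w, b = np.zeros(H.shape[1]), 0.0
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-(H @ w + b)))
        w -= lr * (H.T @ (p - y)) / len(y)
        b -= lr * float((p - y).mean())
    return w, b

# 1) Pool several small unlabeled studies and pretrain the encoder on them.
pooled = np.vstack([make_study(40, shift)[0] for shift in (0.0, 0.5, -0.5)])
mean, std = pooled.mean(axis=0), pooled.std(axis=0)
encoder, losses = train_autoencoder((pooled - mean) / std, N_LATENT)

# 2) Transfer: encode an unseen labeled study with the frozen encoder.
X_target, y_target = make_study(60, 0.3)
H_target = encode((X_target - mean) / std, encoder)

# 3) Fit a classifier on the encodings; hold out the last 30 samples.
w, b = fit_logreg(H_target[:30], y_target[:30])
pred = (H_target[30:] @ w + b) > 0
accuracy = float((pred == (y_target[30:] > 0.5)).mean())
print(f"reconstruction loss: {losses[0]:.3f} -> {losses[-1]:.3f}")
print(f"held-out accuracy on encodings: {accuracy:.2f}")
```

The encoder is fitted only on the pooled studies and kept frozen for the target study, which is the transfer step. A baseline in the spirit of the abstract would fit the same classifier directly on the standardized expression values of the target study and compare held-out accuracy.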

Language:	English
Keywords:	gene expression, small data set problem, transfer learning, autoencoders, multi-task models
Work type:	Master's thesis/paper
Typology:	2.09 - Master's Thesis
Organization:	FRI - Faculty of Computer and Information Science
Year:	2024
PID:	20.500.12556/RUL-161755
COBISS.SI-ID:	210383875
Publication date in RUL:	13.09.2024
Views:	973
Downloads:	1283

Secondary language

Language:	Slovenian
Title:	Učenje s prenosom pri napovedovanju fenotipa na majhnih podatkovnih naborih o izraženosti genov
Abstract (translated from Slovenian):
Recent advances in biotechnology have enabled researchers to collect large amounts of data, such as patients' gene expression profiles, which form the basis of personalized medicine. Such an approach requires machine learning, but the main limitation of many studies is the small sample size, typically a few hundred patients with tens of thousands of features. In this master's thesis, we tackled the problem by merging many small gene expression data sets into one larger set and training deep neural networks capable of informatively encoding the input data. We used transfer learning to predict the phenotype from the encoded data of the test sets. We experimented with two model architectures: autoencoders and multi-task models. Although training the multi-task models was demanding, on average they achieved higher results on the test data sets than autoencoders, but they did not surpass the results of logistic regression. Inspection of the encoded values showed that autoencoders preserve the original structure of the data, whereas multi-task models do not distinguish between samples from different studies. Both architectures showed that a gene expression profile can be represented with only a few values.
Keywords:	gene expression, small data set problem, transfer learning, autoencoders, multi-task models
