Transfer learning for prediction of transcription start sites across different plant species

Miškić, David

Repository of the University of Ljubljana

Miškić, David (Author), Curk, Tomaž (Mentor), Zrimec, Jan (Comentor)

PDF - Presentation file (9.71 MB)
MD5: D2B34A39E790B067D8651AAC557C5540

Abstract

Transcription start site (TSS) prediction is a classification problem at the intersection of machine learning and laboratory methods for measuring gene expression. The TSS is the location of the first nucleotide transcribed by RNA polymerase, and identifying it helps characterize an organism's genome. We developed two variants of prediction models for the plant model organism Arabidopsis thaliana, based on the existing expression model Enformer, using upscaling and a custom loss function that proved crucial for training success. The GFF model type uses genome annotation information to supplement the sequence context, which facilitated transfer between plant species, as demonstrated by transfer learning on corn. The MultiTSS model type uses the DNA sequence alone, with no substantial performance degradation compared to the GFF model, demonstrating that it can capture and learn important motifs that characterize a TSS. We show that the developed methods compare favorably with existing approaches and can also be applied without retraining. We also describe the procedure and pitfalls of the problem area, with potential solutions.

Language:	English
Keywords:	transcription start site, polymerase, bioinformatics, transformer, transfer learning
Work type:	Master's thesis/paper
Typology:	2.09 - Master's Thesis
Organization:	FRI - Faculty of Computer and Information Science
Year:	2023
PID:	20.500.12556/RUL-151112
COBISS.SI-ID:	169486083
Publication date in RUL:	29.09.2023
Views:	1059
Downloads:	87

Secondary language

Language:	Slovenian
Title:	Učenje s prenosom znanja za napovedovanje začetnega mesta transkripcije med različnimi vrstami rastlin
Abstract:	Napovedovanje začetnega mesta transkripcije (TSS) je klasifikacijski problem na presečišču strojnega učenja in laboratorijskih metod merjenja ekspresije. To mesto predstavlja položaj, kjer polimeraza RNA začne prepisovati prvi nukleotid in lahko pomaga pri karakterizaciji genoma organizma. Razvili smo dve različici modela na podatkih modelnega organizma pri rastlinah, A. thaliana, ki temeljita na jedru obstoječega modela napovedovanja izražanja Enformer. Temu smo dodali sloje za večanje ločljivosti in funkcije izgube po meri, ki se je izkazala ključna za uspeh učenja. Tip modela GFF uporablja informacijo iz anotacije genoma za dopolnjevanje konteksta, kar je dokazano olajšalo prenos med rastlinami, to smo pokazali tudi na primeru koruze. Tip modela MultiTSS uporablja samo zaporedje DNA in brez bistvenega poslabšanja zmogljivosti v primerjavi z GFF dokazuje, da je ta arhitektura sposobna zajeti in se naučiti pomembnih motivov, ki so značilni za TSS. Demonstriramo tudi, da so razvite metode primerljivo boljše od obstoječih pristopov in jih je mogoče uporabljati tudi brez ponovnega učenja. Opisali smo tudi postopek in pasti tega problema ter predlagali možne rešitve.
Keywords:	začetno mesto transkripcije, polimeraza, bioinformatika, transformer, prenos učenja
