Algorithm for reliable recognition of short DNA tag sequences in presence of high sequencing error rates

Močivnik, Luka

Močivnik, Luka (Author), Skrbinšek, Tomaž (Mentor)

	PDF - Presentation file (2.79 MB), MD5: AC65139F2E870636D871535BB0E8EF2A
	ZIP - Appendix (132.25 MB), MD5: 9297C629E4A10665C628FF09E47A8A0A

Abstract

Third-generation sequencing technologies, nanopore technology in particular, enable fast sequencing of long DNA sequences. Their drawback is a high error rate. In this master's thesis we present a bioinformatics pipeline for processing microsatellite sequences obtained by third-generation sequencing. To test the pipeline, we used microsatellite sequences obtained from non-invasive genetic samples of the brown bear (Ursus arctos) and sequenced on an Illumina sequencer. In these sequences we simulated substitutions, insertions, and deletions in various combinations and at different total error rates. In addition to the 8 bp DNA sample tags already in use, we also tested tags 12 and 16 bp long. The pipeline proved effective when only substitutions were simulated, but not when all three error types were simulated. Nevertheless, we found that the currently used 8 bp tags are unusable at high error rates, especially when all three error types are simulated, and that successful sample identification requires longer tags, preferably 16 bp long. We also found that problems can arise in the search for oligonucleotide primers and, consequently, in identifying the loci they mark. We identified weak points in the pipeline and propose possible solutions. The presented bioinformatics pipeline is therefore suitable as a basis for further work.
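The error simulation described in the abstract (substitutions, insertions, and deletions in various combinations and at different total error rates) can be illustrated with a minimal sketch. This is not the code from the thesis: the function name `simulate_errors`, the per-base error model, and the `weights` parameter are assumptions chosen only for illustration.

```python
import random

BASES = "ACGT"


def simulate_errors(seq, error_rate, weights=(1.0, 1.0, 1.0), rng=None):
    """Return a copy of seq with random sequencing-like errors.

    Assumed model (hypothetical, not taken from the thesis):
    each base is hit by an error with probability error_rate, and the
    error type is drawn according to weights, given in the order
    (substitution, insertion, deletion).
    """
    rng = rng or random.Random()
    out = []
    for base in seq:
        if rng.random() >= error_rate:
            out.append(base)  # base is read correctly
            continue
        kind = rng.choices(("sub", "ins", "del"), weights=weights)[0]
        if kind == "sub":
            # replace the base with a different one
            out.append(rng.choice([b for b in BASES if b != base]))
        elif kind == "ins":
            # keep the base and insert a random base after it
            out.append(base)
            out.append(rng.choice(BASES))
        # "del": the base is dropped, so nothing is appended
    return "".join(out)


if __name__ == "__main__":
    rng = random.Random(42)  # fixed seed so repeated runs are comparable
    tag = "ACGTTGCA"  # an arbitrary 8 bp example, not a tag from the thesis
    print(simulate_errors(tag, 0.15, rng=rng))
    # substitutions only: the length of the tag is preserved
    print(simulate_errors(tag, 0.15, weights=(1, 0, 0), rng=rng))
```

Setting a weight to zero switches that error type off, which makes it possible to compare a substitutions-only scenario with one that includes all three error types.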

Language:	Slovenian
Keywords:	third-generation sequencing, sequencing with high error rates, microsatellites, short DNA sequences
Work type:	Master's thesis/paper
Organization:	BF - Biotechnical Faculty
Year:	2021
PID:	20.500.12556/RUL-125033
COBISS.SI-ID:	55304195
Publication date in RUL:	02.03.2021

Secondary language

Abstract:
Language:	English
Title:	Algorithm for reliable recognition of short DNA tag sequences in presence of high sequencing error rates
Third-generation sequencing technologies, especially nanopore sequencing, enable fast sequencing of DNA and yield long reads. Their drawback is a high error rate. In this thesis, we present a bioinformatics pipeline for processing microsatellite sequences obtained using third-generation sequencing. For testing, we used brown bear (Ursus arctos) microsatellite sequences obtained from non-invasive genetic samples and sequenced on the Illumina platform. In these sequences, we simulated substitutions, insertions, and deletions in various combinations and at different total error rates. In addition to the previously used 8 bp DNA tags for sample marking, we also tested longer 12 and 16 bp tags. Our bioinformatics pipeline was effective when dealing with substitutions only. It was ineffective when all three error types were simulated. Nonetheless, we found that the currently used 8 bp tags are not usable at high error rates, especially when all three error types are present, and that successful sample identification requires longer tags, preferably 16 bp long. We also found issues with the primer search and, consequently, with the identification of the loci that the primers mark. We identified weak points in the pipeline and suggest possible solutions. The presented bioinformatics pipeline should therefore provide a useful basis for further work.
Keywords:	third-generation sequencing, sequencing with high error rates, microsatellites, short DNA sequences
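To make the tag recognition problem concrete, the sketch below assigns an observed, possibly erroneous tag to the closest known tag by edit (Levenshtein) distance, which counts substitutions, insertions, and deletions. The record does not say which matching method the thesis uses, so this is a swapped-in, generic technique; the names `edit_distance`, `assign_tag`, and `max_distance` are hypothetical.

```python
def edit_distance(a, b):
    """Levenshtein distance between strings a and b (row-by-row dynamic programming)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # deletion from a
                curr[j - 1] + 1,           # insertion into a
                prev[j - 1] + (ca != cb),  # match or substitution
            ))
        prev = curr
    return prev[-1]


def assign_tag(observed, known_tags, max_distance):
    """Return the unique closest known tag, or None.

    Assumption for illustration: a read is rejected when the best match is
    farther than max_distance or when two known tags tie for the best match.
    """
    scored = sorted((edit_distance(observed, tag), tag) for tag in known_tags)
    if not scored:
        return None
    best_distance, best_tag = scored[0]
    if best_distance > max_distance:
        return None
    if len(scored) > 1 and scored[1][0] == best_distance:
        return None  # ambiguous: two tags are equally close
    return best_tag


if __name__ == "__main__":
    # arbitrary example tags, not the tags used in the thesis
    tags = ["AAAAAAAA", "CCCCCCCC", "GGGGTTTT"]
    print(assign_tag("AAAATAAA", tags, max_distance=2))
    print(assign_tag("AAAACCCC", tags, max_distance=4))
```

Longer tags leave room for a larger minimum distance between any two tags in the set, so more errors can be tolerated before a read becomes ambiguous. This is consistent with the abstract's finding that 8 bp tags become unusable at high error rates.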
