Algoritmi za optimizacijo rabe kodonov : delo diplomskega seminarja
Mramor, Anže (Author), Žitnik, Arjana (Mentor)

Abstract
This final seminar paper addresses the mathematical solution of a biochemical problem: we want to construct an algorithm that finds, in real time, the optimal codon sequence of a protein for expression in E. coli cells. More precisely, for experimentally obtained protein synthesis data, we want to determine a codon sequence such that the translation times of the individual amino acids, computed with the formula for the local average of three consecutive codons, deviate from the data as little as possible. We solve the problem with dynamic programming. We first derive a recursive formula that solves the optimization problem. We then extend the recursive formula into a general computer algorithm, which we also implement in the Python programming language. We first verify the algorithm on data from three existing proteins. The sequence of target translation times is obtained from the actual proteins; this guarantees that the initial data being approximated can be matched by the algorithm's results, which lets us check how successful the algorithm is. On this kind of data the algorithm works in all three cases, so we extend the set of test data to 900 randomly generated proteins. Here, too, the algorithm finds a suitable codon sequence for every protein, with a 100 % success rate. We also examine the time complexity of the algorithm, which turns out to be linear. In the last part of the paper we perturb the data with noise (at levels of 1 %, 5 % and 10 %), so that the target times differ from the given codon translation times. We again check the success rate of the algorithm, which decreases as the noise grows. It turns out that the choice of norm within the recursive formula plays a key role in the algorithm's success. We try the first, second and infinity norms; the second proves the most successful, while the first and infinity norms give identical shares of correctly determined codons.

Language: Slovenian
Keywords: codons, proteins, translation time, dynamic programming, norms, time complexity
Work type: Final seminar paper
Typology: 2.11 - Undergraduate Thesis
Organization: FMF - Faculty of Mathematics and Physics
Year: 2021
PID: 20.500.12556/RUL-130278
UDC: 519.8
COBISS.SI-ID: 76299011
Publication date in RUL: 12.09.2021

Secondary language

Language: English
Title: Algorithms for optimization of codon usage
Abstract:
This thesis presents a mathematical solution to a biochemical problem: we aim to develop an algorithm that finds, in real time, the optimal codon sequence of a protein for expression in E. coli. More precisely, given experimentally obtained protein synthesis data, we wish to determine a codon sequence for which the translation times of the individual amino acids, computed as the local average over three consecutive codons, differ from the measured times as little as possible. We solve the problem with dynamic programming. We first derive a recursive formula that solves the optimization problem. We then extend it into a computer algorithm, which we implement in the Python programming language. We first check the algorithm on a small sample of three existing proteins. The sequence of target translation times is taken from the chosen proteins themselves; this way we know that the input data being approximated can be matched by the algorithm's output, and we can measure how successful the algorithm is. On this sample the accuracy rate of the algorithm is 100 %. We then expand the test set to 900 randomly generated proteins, and the accuracy rate remains 100 %. We also analyze the running time of the algorithm, which is linear. In the last part of the thesis we add noise (at levels of 1 %, 5 % and 10 %) to the data, so that the target times differ from the values the algorithm can compute from the given codon translation times. We again test the accuracy of the algorithm, which decreases as the noise level increases. We conclude that the choice of norm inside the recursive formula plays a key role in the accuracy of the algorithm. We test the first, second and infinity norms. The algorithm has the highest accuracy rate with the second norm, while the first and infinity norms give identical accuracy.
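
The dynamic-programming idea described in the abstract can be illustrated with a minimal Python sketch. It is not the thesis's implementation: the function name `optimize_codons`, the exact objective (a sum of |error|**p over interior windows only, with no special treatment of the first and last residue), and the toy translation times are all assumptions made for illustration. The state is the pair of the two most recently chosen codons, so each step examines a bounded number of combinations and the work grows linearly with protein length.

```python
def optimize_codons(protein, codons, times, targets, p=2):
    """Choose one codon per amino acid so that the local averages of three
    consecutive codon times stay close to `targets` (sum of |error|**p).

    Hypothetical interface: `codons` maps an amino acid to its synonymous
    codons, `times` maps a codon to its translation time, and `targets`
    holds one target per window of three consecutive residues.
    """
    n = len(protein)
    if n < 3 or len(targets) != n - 2:
        raise ValueError("need at least 3 residues and n - 2 targets")
    options = [codons[aa] for aa in protein]

    # cost[(a, b)]: best cost of any prefix whose last two codons are a, b
    cost = {(a, b): 0.0 for a in options[0] for b in options[1]}
    back = []  # back[k][(b, c)]: codon preceding the pair (b, c) on the best path

    for i in range(2, n):
        new_cost, pointers = {}, {}
        for (a, b), so_far in cost.items():
            for c in options[i]:
                avg = (times[a] + times[b] + times[c]) / 3
                total = so_far + abs(avg - targets[i - 2]) ** p
                if total < new_cost.get((b, c), float("inf")):
                    new_cost[(b, c)] = total
                    pointers[(b, c)] = a
        cost = new_cost
        back.append(pointers)

    # Trace the best final pair back to the start of the protein.
    last_pair = min(cost, key=cost.get)
    rev = [last_pair[1], last_pair[0]]
    for pointers in reversed(back):
        rev.append(pointers[(rev[-1], rev[-2])])
    return rev[::-1], cost[last_pair]


# Toy data: the codon assignments are standard, the times are invented.
codons = {"K": ["AAA", "AAG"], "F": ["TTT", "TTC"], "L": ["CTG", "CTA"]}
times = {"AAA": 1.0, "AAG": 2.0, "TTT": 3.0, "TTC": 5.0, "CTG": 4.0, "CTA": 8.0}
true_codons = ["AAG", "TTT", "CTA", "AAA"]
targets = [
    (times[x] + times[y] + times[z]) / 3
    for x, y, z in zip(true_codons, true_codons[1:], true_codons[2:])
]
path, cost = optimize_codons("KFLK", codons, times, targets)
print(path, cost)
```

As in the noise-free experiments the abstract describes, the targets here are generated from a known codon sequence, so the sketch can be checked by confirming that it recovers that sequence with zero cost. Setting `p=1` gives a first-norm objective; an infinity-norm variant would replace the running sum with a running maximum, which this sketch does not implement.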

Keywords: codons, proteins, translation time, dynamic programming, norms, running time
