This bachelor's thesis aims to develop an algorithm for designing nucleotide sequences of genes for expression in E. coli. The current method of codon usage optimization is based on the usage frequency of each codon across the whole genome. However, this procedure accounts neither for translation time and its relation to protein folding nor for the frequency of transcripts in a single cell.
The authors built an algorithm that designs an optimized codon sequence for a protein from its amino acid sequence. The algorithm is based on the connection between codon sequence and translation time, which is predicted by a long short-term memory neural network. The translation time of a protein is calculated as a local average of the translation times of its individual codons. The translation time is then converted into a codon sequence by selecting the combination of codons whose translation time fits it best.
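The two steps described here, local averaging and best-fit selection, can be illustrated with a minimal Python sketch. It is not the thesis implementation: the long short-term memory network is not reproduced, and a fixed lookup table of made-up per-codon times stands in for its predictions. The names `CODON_TIME`, `SYNONYMS`, `local_average`, `profile`, and `best_fit`, the window size of 3, the squared-error criterion, and the exhaustive search are all assumptions made for illustration.

```python
from itertools import product

# Hypothetical per-codon translation times (arbitrary units), illustration only.
CODON_TIME = {
    "ATG": 1.0,
    "AAA": 1.0, "AAG": 2.0,
    "GAA": 1.0, "GAG": 3.0,
}

# Synonymous codons for three amino acids (standard genetic code).
SYNONYMS = {"M": ["ATG"], "K": ["AAA", "AAG"], "E": ["GAA", "GAG"]}


def local_average(times, window=3):
    """Centered moving average; the window is truncated at the sequence ends."""
    half = window // 2
    averaged = []
    for i in range(len(times)):
        chunk = times[max(0, i - half): i + half + 1]
        averaged.append(sum(chunk) / len(chunk))
    return averaged


def profile(codons, window=3):
    """Translation-time profile of a codon sequence (assumed lookup table)."""
    return local_average([CODON_TIME[c] for c in codons], window)


def best_fit(protein, target, window=3):
    """Try every synonymous codon combination and keep the one whose
    profile has the smallest squared error against the target profile."""
    best, best_err = None, float("inf")
    for combo in product(*(SYNONYMS[aa] for aa in protein)):
        err = sum((p - t) ** 2 for p, t in zip(profile(combo, window), target))
        if err < best_err:
            best, best_err = list(combo), err
    return best


# Usage: recover a codon sequence from its own translation-time profile.
target = profile(["ATG", "AAG", "GAA"])
chosen = best_fit("MKE", target)
```

Exhaustive search grows exponentially with protein length, so a realistic version would need a windowed or dynamic-programming search; the brute-force form is used here only to keep the idea visible.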
The model's predictions are comparable to the results of the current codon optimization method. The comparison of translation times indicates that some sections of proteins have higher translation times. Codon optimization in those sections is important for the yield of protein synthesis because translation time affects protein folding.