Prediction of Protein Expression and Growth Rates by Supervised Machine Learning

Zhao, Simiao (2021) Prediction of Protein Expression and Growth Rates by Supervised Machine Learning. Natural Science, 13 (08). pp. 301-330. ISSN 2150-4091

ns_2021080214163103.pdf - Published Version

Download (15MB)

Abstract

The DNA sequences of an organism have an important influence on its transcription and translation processes, and thus affect its protein production and growth rate. Because of the complexity of DNA, predicting the macroscopic characteristics of organisms has been extremely difficult. However, with the rapid development of machine learning in recent years, it has become possible to use powerful machine learning algorithms to process and analyze biological data. Based on synthetic DNA sequences of a specific microbe, E. coli, I designed a process to predict its protein production and growth rate. After examining the properties of a data set constructed in previous work, I chose to use supervised learning regressors with encoded DNA sequences as input features. After comparing different encoders and algorithms, I selected three encoders to encode the DNA sequences as inputs and trained seven different regressors to predict the outputs. I then optimized the hyperparameters of the three regressors with the best potential prediction performance. Finally, by using the encoders to capture underlying features of the DNA sequences, I successfully predicted protein production and growth rate, with best R² scores of 0.55 and 0.77, respectively.
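The abstract does not name the three encoders, the seven regressors, or the tuning method, so the following is only a minimal, stdlib-only sketch of the general pipeline it describes, under loudly stated assumptions: one-hot and k-mer frequency encoding stand in for the unnamed encoders, a k-nearest-neighbour regressor stands in for the unnamed regressors, and a small grid search over the neighbour count on a held-out split stands in for hyperparameter optimization. All function names (`one_hot_encode`, `kmer_frequencies`, `knn_predict`, `r2_score`, `select_k`) are hypothetical and not taken from the paper. The score reported in the abstract, R², is the coefficient of determination, R² = 1 − SS_res / SS_tot, where SS_res is the sum of squared prediction errors and SS_tot is the sum of squared deviations of the true values from their mean.

```python
import math
from collections import Counter
from itertools import product

BASES = "ACGT"


def one_hot_encode(seq):
    """Assumed encoder: 4 indicator values per base (A, C, G, T), flattened.

    Vectors are only comparable when all sequences have the same length.
    """
    features = []
    for base in seq.upper():
        if base not in BASES:
            raise ValueError(f"unexpected base: {base!r}")
        features.extend(1.0 if base == b else 0.0 for b in BASES)
    return features


def kmer_frequencies(seq, k=2):
    """Assumed encoder: relative frequency of each of the 4**k possible k-mers.

    The output length is fixed (4**k) regardless of sequence length.
    """
    seq = seq.upper()
    windows = [seq[i:i + k] for i in range(len(seq) - k + 1)]
    counts = Counter(windows)
    total = max(len(windows), 1)  # avoid division by zero for very short inputs
    return [counts["".join(kmer)] / total for kmer in product(BASES, repeat=k)]


def knn_predict(train_x, train_y, query, k=3):
    """Stand-in regressor: mean target of the k closest training points (Euclidean)."""
    # Stable sort on distance only, so ties keep the training-set order.
    ranked = sorted(
        ((math.dist(x, query), y) for x, y in zip(train_x, train_y)),
        key=lambda pair: pair[0],
    )
    nearest = ranked[:k]
    return sum(y for _, y in nearest) / len(nearest)


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R² is undefined when all true values are equal")
    return 1.0 - ss_res / ss_tot


def select_k(train_x, train_y, val_x, val_y, candidates=(1, 3, 5)):
    """Tiny grid search: pick the neighbour count with the best validation R²."""
    scores = {}
    for k in candidates:
        preds = [knn_predict(train_x, train_y, q, k) for q in val_x]
        scores[k] = r2_score(val_y, preds)
    best = max(scores, key=scores.get)
    return best, scores


if __name__ == "__main__":
    # Made-up toy data, NOT from the paper: sequences and targets are illustrative only.
    seqs = ["ACGTAC", "ACGTAA", "TTGCAT", "TTGCAA", "ACGTTC", "TTGCTT"]
    ys = [1.0, 1.2, 3.0, 3.2, 1.1, 3.1]
    xs = [kmer_frequencies(s, k=2) for s in seqs]
    best_k, scores = select_k(xs[:4], ys[:4], xs[4:], ys[4:], candidates=(1, 3))
    print(best_k, scores)
```

The k-mer encoder is included alongside one-hot encoding because it yields fixed-length vectors for sequences of any length; a real study of this kind would more likely rely on an established library (for example scikit-learn) than on hand-written regressors.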

Item Type: Article
Subjects: STM Repository > Medical Science
Depositing User: Managing Editor
Date Deposited: 08 Nov 2023 08:55
Last Modified: 08 Nov 2023 08:55
URI: http://classical.goforpromo.com/id/eprint/4572