Fig. 3 | Microbial Cell Factories

Fig. 3

From: Characterization and optimization of 5´ untranslated region containing poly-adenine tracts in Kluyveromyces marxianus using machine-learning model

Construction and analysis of a machine-learning model that predicts GFP abundance from features of the 5´ UTR. (A) Validation of the MLP-NN model. The plot compares measured and predicted relative GFP abundance, with R² reported for the training and test sets. (B) 5´ UTR features ranked by their mean absolute SHAP values. Mean SHAP values were obtained by performing sensitivity analysis on the MLP-NN model. Description of the features: 5´ UTR length, length of the 5´ UTR; oofuAUG, number of out-of-frame upstream AUGs and upstream ORFs; MFE, minimum free energy; poly(A) position, the distance between the longest poly(A) tract and the AUG; CACC, the presence of at least one CACC motif in the 5´ UTR; GACA, the presence of at least one GACA motif in the 5´ UTR; GG, the presence of at least one GG motif in the 5´ UTR; CC in [-7, -6], the presence of the motif CC at positions [-7, -6] relative to the AUG; AA in [-3, -2], the presence of the motif AA at positions [-3, -2]; A in [-1], the presence of A at position −1; poly(A) length, length of the longest poly(A) tract in the 5´ UTR; A/G in [-3], the presence of A or G at position −3; T in [-3], the presence of T at position −3; AC in [-2, -1], the presence of the motif AC at positions [-2, -1]; CA in [-7, -6], the presence of the motif CA at positions [-7, -6]. (C) The relationship between the values of 5´ UTR features and SHAP values. Red and blue dots indicate high and low feature values, respectively. (D) A negative correlation between SHAP value and the length of the poly(A) tract. (E) The relationship between SHAP value and the position of the poly(A) tract. The majority of SHAP values were positive at distances between 10 and 30 nt, as indicated
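The sequence-derived features described in the caption can be illustrated with a minimal Python sketch. It is not the authors' code; the function name `utr_features` and the dictionary keys are hypothetical. It assumes that the 5´ UTR is supplied as DNA ending immediately before the start codon, that position −1 is the last nucleotide of the 5´ UTR, that poly(A) position is counted as the number of nucleotides between the 3´ end of the longest A-run and the AUG, and that oofuAUG can be approximated by counting upstream ATGs that are not in frame with the start codon. MFE is omitted because it requires an RNA folding tool.

```python
import re


def utr_features(utr: str) -> dict:
    """Compute a subset of the caption's 5' UTR features (illustrative only).

    `utr` is the 5' UTR as DNA, ending right before the ATG start codon.
    """
    utr = utr.upper().replace("U", "T")
    n = len(utr)

    def at(start: int, end: int) -> str:
        # Nucleotides at positions [start, end] relative to the start codon,
        # where -1 is the last nucleotide of the 5' UTR (assumption).
        if -start > n:
            return ""
        return utr[n + start : n + end + 1]

    # Longest run of A; ties resolve to the 5'-most run.
    runs = list(re.finditer(r"A+", utr))
    longest = max(runs, key=lambda m: m.end() - m.start(), default=None)

    # Upstream ATGs whose reading frame differs from the start codon's frame.
    # This is a simplified stand-in for the caption's oofuAUG feature.
    oof_atg = sum(
        1 for i in range(n - 2) if utr[i : i + 3] == "ATG" and (n - i) % 3 != 0
    )

    return {
        "utr_length": n,
        "oofuAUG": oof_atg,
        "polyA_length": longest.end() - longest.start() if longest else 0,
        # Nucleotides between the 3' end of the tract and the AUG (assumption).
        "polyA_position": n - longest.end() if longest else None,
        "CACC": "CACC" in utr,
        "GACA": "GACA" in utr,
        "GG": "GG" in utr,
        "CC_-7_-6": at(-7, -6) == "CC",
        "CA_-7_-6": at(-7, -6) == "CA",
        "AA_-3_-2": at(-3, -2) == "AA",
        "AC_-2_-1": at(-2, -1) == "AC",
        "A_-1": at(-1, -1) == "A",
        "A/G_-3": at(-3, -3) in ("A", "G"),
        "T_-3": at(-3, -3) == "T",
    }


if __name__ == "__main__":
    # Made-up example sequence, not taken from the study.
    for name, value in utr_features("CACCAAAAAAAAGACATGTCCTTAAA").items():
        print(f"{name}: {value}")
```

A table of such feature vectors, one row per 5´ UTR variant, is the kind of input a regression model such as an MLP could be trained on before computing SHAP values; the model and SHAP settings used in the study are not given in the caption.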
