Transcriptomic and fluxomic changes in Streptomyces lividans producing heterologous protein

Background: The Gram-positive Streptomyces lividans TK24 is an attractive host for heterologous protein production because of its high capacity to secrete proteins (which favors correct folding and facilitates downstream processing), its acceptance of methylated DNA, and its low endogenous protease activity. However, current inconsistencies in protein yields call for a deeper understanding of the burden that heterologous protein production places on the cell. In the current study, transcriptomics and ¹³C-based fluxomics were used to uncover gene expression and metabolic flux changes associated with heterologous protein production. The Rhodothermus marinus thermostable cellulase A (CelA), previously shown to be successfully overexpressed in S. lividans, was taken as an example protein.

Results: RNA-seq and ¹³C-based metabolic flux analysis were performed on a CelA-producing and an empty-plasmid strain under the same conditions. Differential gene expression analysis, followed by cluster analysis based on co-expression and co-localization, identified transcriptomic responses related to secretion-induced stress and DNA damage. Furthermore, the OsdR regulon (previously associated with hypoxia, oxidative stress, intercellular signaling, and morphological development) was consistently upregulated in the CelA-producing strain and exhibited co-expression with isoenzymes from the pentose phosphate pathway linked to secondary metabolism. Increased expression of these isoenzymes matches increased fluxes in the pentose phosphate pathway. Additionally, flux maps of the central carbon metabolism show increased flux through the tricarboxylic acid cycle in the CelA-producing strain. Redirection of fluxes in the CelA-producing strain leads to higher production of NADPH, which can only partly be attributed to increased secretion.

Conclusions: Transcriptomic and fluxomic changes uncover potential new leads for targeted strain improvement strategies, which may ease the secretion stress and metabolic burden associated with heterologous protein synthesis and secretion and may help create a more consistently performing S. lividans strain. Yet, links to secondary metabolism and redox balancing should be further investigated to fully understand the S. lividans metabolome under heterologous protein production.

Electronic supplementary material: The online version of this article (10.1186/s12934-018-1040-6) contains supplementary material, which is available to authorized users.

The local clustering algorithm uses an $n \times m$ gene expression matrix $M$ containing TPM (transcripts per million) values, where $n$ is the number of measured genes and $m$ the number of samples. Since clustering depends on the genes' physical location on the genome, it is important that genes in the dataset are ordered by location and that the dataset is either gapless or that omitted known genes are accounted for when calculating the physical distance between genes. The expression profile of gene $i$ can be denoted by the vector of its $m$ TPM values, i.e. row $i$ of $M$: $(M_{i1}, M_{i2}, \ldots, M_{im})$.

Measuring gene similarity
To measure the similarity between two genes, their TPM values are first normalized by z-scoring. The distance $D_{ij}$ between two genes $i$ and $j$ is then taken as the Euclidean distance between their normalized expression vectors.
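The normalization and distance step can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes that each gene is z-scored across its own samples using the population standard deviation, and that a gene with constant expression is mapped to an all-zero profile. The function names and the toy TPM values are hypothetical.

```python
import math

def zscore(values):
    """Z-score one gene's TPM values across samples (population std, an assumption)."""
    mean = sum(values) / len(values)
    std = math.sqrt(sum((v - mean) ** 2 for v in values) / len(values))
    if std == 0.0:
        # Assumption of this sketch: a flat profile becomes all zeros.
        return [0.0] * len(values)
    return [(v - mean) / std for v in values]

def euclidean(a, b):
    """Euclidean distance between two equally long vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def distance_matrix(tpm):
    """Symmetric matrix of distances between z-scored gene profiles (rows of tpm)."""
    z = [zscore(row) for row in tpm]
    n = len(z)
    dist = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            dist[i][j] = dist[j][i] = euclidean(z[i], z[j])
    return dist

# Hypothetical toy data: 3 genes (rows) measured in 3 samples (columns).
tpm = [
    [10.0, 20.0, 30.0],     # rising profile
    [100.0, 200.0, 300.0],  # same shape at a tenfold higher level
    [30.0, 20.0, 10.0],     # falling profile
]
D = distance_matrix(tpm)
```

Because z-scoring removes the expression level and scale, genes 0 and 1 end up at a distance of (numerically) zero, while the oppositely regulated gene 2 is far from both.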

Distribution of distances
The distances $D_{ij}$ are defined for all gene pairs with $1 \le i, j \le n$ and $i \ne j$. Because $D_{ij} = D_{ji}$, the total number of distinct distances in the dataset is $n(n-1)/2$. By determining and sorting the full set of distances, we obtain the cumulative distribution function (CDF) of distances, as illustrated in Figure 1. From this CDF it is possible to find the p-value corresponding to a given distance and vice versa, which can be used as an indicator of whether two genes are significantly co-expressed. If, for example, two genes are assumed to be co-expressed when their distance $D_{ij}$ is in the lowest 1% of all distances, the CDF in Figure 1 shows that $D_{ij}$ needs to be 2.07 or less.
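The lookup in both directions (distance to p-value, and p-value to cut-off distance) can be illustrated with an empirical CDF. The sketch below is an assumption about how such a lookup could be implemented; the function names and the toy distance matrix are hypothetical and not taken from the paper.

```python
import bisect

def sorted_distances(dist):
    """Collect the distance of every unordered gene pair once and sort ascending."""
    n = len(dist)
    return sorted(dist[i][j] for i in range(n) for j in range(i + 1, n))

def p_value(sorted_d, d):
    """Empirical CDF at d: fraction of all distances that are <= d."""
    return bisect.bisect_right(sorted_d, d) / len(sorted_d)

def cutoff_distance(sorted_d, p):
    """Largest observed distance whose empirical p-value is still <= p.

    Returns None if even the smallest distance has a p-value above p.
    """
    n_total = len(sorted_d)
    k = min(int(p * n_total), n_total)  # number of distances allowed below the cut-off
    # Step back over ties so that the returned distance really has p-value <= p.
    while 0 < k < n_total and sorted_d[k - 1] == sorted_d[k]:
        k -= 1
    return sorted_d[k - 1] if k > 0 else None

# Hypothetical symmetric distance matrix for 4 genes.
dist = [
    [0.0, 0.5, 1.0, 1.5],
    [0.5, 0.0, 2.0, 2.5],
    [1.0, 2.0, 0.0, 3.0],
    [1.5, 2.5, 3.0, 0.0],
]
sd = sorted_distances(dist)        # 4 * 3 / 2 = 6 distinct distances
p_of_1 = p_value(sd, 1.0)          # 2 of 6 distances are <= 1.0
d_half = cutoff_distance(sd, 0.5)  # distance at the 50% level
```

On real data the same functions would be called with a small p-value such as 0.01; the toy example uses 0.5 only because six distances are too few to resolve a 1% level.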
Note that it is more efficient to store only the distances corresponding to low p-values. Furthermore, a random selection of genes from the dataset might suffice to generate a reliable CDF. This becomes important once the number of genes in the dataset grows very large.

With the obtained CDF, it is now possible to match all genes that are co-expressed at a certain p-value. However, to make the matching process depend on genomic location, the p-value required for matching two genes is decreased as the physical distance on the genome increases. With the distance between genes $i$ and $j$ measured in number of genes, $|i - j|$, and a base p-value $p_0$ taken as the p-value required to match neighbouring genes, the specific cut-off p-value $p_{co,ij}$ for matching $i$ and $j$ is given by Equation (1). This function is shown in Figure 2 for $p_0 = 0.01$. A toy example with 7 genes showing the location-based cut-off p-values for $p_0 = 0.01$ and the resulting gene matches is given in Figure 3.

Figure 2: Dependence of the cut-off p-value for matching two genes, based on their expression profiles, on physical location. As the physical distance between the two genes increases, the cut-off p-value drops and the genes need more similar expression profiles to be matched. The base p-value $p_0$ (the value required for matching two neighbouring genes) is 0.01 in this figure.
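The location-dependent matching can be sketched as below. Equation (1) is not reproduced in this text, so the cut-off function is passed in as a parameter. The default `placeholder_cutoff` (`p0 / gap`) is a purely hypothetical stand-in that only shares the two stated properties: it equals $p_0$ for neighbouring genes and decreases with genomic distance. It should not be read as the authors' formula. All names and toy values are assumptions.

```python
import bisect

def p_value(sorted_d, d):
    """Empirical CDF at d: fraction of all distances that are <= d."""
    return bisect.bisect_right(sorted_d, d) / len(sorted_d)

def placeholder_cutoff(p0, gap):
    """HYPOTHETICAL stand-in for Equation (1); gap is |i - j| in number of genes."""
    return p0 / gap

def match_genes(dist, sorted_d, p0, cutoff=placeholder_cutoff):
    """Return the set of pairs (i, j), i < j, whose p-value meets the cut-off.

    Genes are assumed to be indexed in genome order, so j - i is the
    physical distance in number of genes.
    """
    n = len(dist)
    matches = set()
    for i in range(n):
        for j in range(i + 1, n):
            if p_value(sorted_d, dist[i][j]) <= cutoff(p0, j - i):
                matches.add((i, j))
    return matches

# Hypothetical symmetric distance matrix for 4 genes in genome order.
dist = [
    [0.0, 0.5, 1.5, 3.0],
    [0.5, 0.0, 1.0, 2.5],
    [1.5, 1.0, 0.0, 2.0],
    [3.0, 2.5, 2.0, 0.0],
]
sorted_d = sorted(dist[i][j] for i in range(4) for j in range(i + 1, 4))
matches = match_genes(dist, sorted_d, p0=0.5)
```

In this toy example genes 0 and 2 have a p-value of 0.5, which would be sufficient for neighbouring genes at `p0 = 0.5`, but they are two positions apart and therefore face a stricter cut-off and are not matched directly.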

Forming and splitting clusters
The initial cluster forming procedure is straightforward: all genes that match directly or indirectly (through matches with other genes) are joined into the same cluster, which can be represented as a graph (Figure 3B). This process can lead to large, heterogeneous clusters, especially for high values of $p_0$. To ensure expression homogeneity within a cluster, all cluster genes are required to be co-expressed with a p-value lower than a given tolerance $p_{tol}$. If $p_{tol} < p_{ij}$ for genes $i$ and $j$ in the same cluster, the cluster graph is split in two, separating $i$ and $j$, using a minimal cut algorithm. The weights $W_{ij}$ of the edges between matched genes (genes with a direct link) are computed from $D_{p=p_{co,ij}}$, the cut-off distance at which genes $i$ and $j$ are matched, and $D_{ij}$, the actual distance between the genes. Note that $D_{ij} \le D_{p=p_{co,ij}}$ always holds here, as otherwise the genes would not have been matched. For genes within the cluster that are not matched directly, $W_{ij} = 0$.

After splitting a cluster, both resulting clusters are again examined for p-values higher than $p_{tol}$ and split further if necessary. Cluster splitting continues until all genes within all resulting clusters meet the given $p_{tol}$. If a cluster contains multiple pairs $X = \{(i, j) \mid p_{tol} < p_{ij}\}$, a choice needs to be made as to which pair to split using the minimal cut algorithm; the order in which different pairs are split results in different clusters, and even in different numbers of clusters. For large clusters, the number of gene pairs that need to be split can become very large, making it impractical to find a globally optimal split (an NP-hard problem). The best results in terms of the total number of resulting clusters were obtained by first requiring a minimal cut between the pair with the smallest distance $D_{ij}$: splitting this pair is most likely to also separate additional pairs in $X$, so fewer splits are needed until all clusters meet the given criterion.
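The cluster forming step amounts to finding the connected components of the match graph. The sketch below does this with a union-find structure and then lists the gene pairs in a cluster that violate the tolerance, ordered so that the closest pair comes first. It is an illustrative assumption rather than the authors' code: the minimal cut itself is not implemented, the function names are hypothetical, and the match list and p-values are made-up toy data.

```python
def form_clusters(n_genes, matches):
    """Join all directly or indirectly matched genes (connected components)."""
    parent = list(range(n_genes))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps the trees shallow
            x = parent[x]
        return x

    for i, j in matches:
        root_i, root_j = find(i), find(j)
        if root_i != root_j:
            parent[root_j] = root_i

    groups = {}
    for gene in range(n_genes):
        groups.setdefault(find(gene), []).append(gene)
    # Genes without any match remain isolated and are not reported as clusters.
    return sorted(c for c in groups.values() if len(c) > 1)

def violating_pairs(cluster, pval, p_tol):
    """Pairs in a cluster with p-value above p_tol, smallest p-value first.

    pval maps (i, j) with i < j to the co-expression p-value. Since the CDF
    is monotone, the smallest p-value corresponds to the smallest distance.
    """
    pairs = [(i, j) for a, i in enumerate(cluster) for j in cluster[a + 1:]
             if pval[(i, j)] > p_tol]
    return sorted(pairs, key=lambda pair: pval[pair])

# Hypothetical matches for 7 genes (0-based); gene 3 has no match.
matches = [(0, 1), (1, 2), (2, 4), (4, 5), (5, 6)]
clusters = form_clusters(7, matches)

# Hypothetical p-values for a small 4-gene cluster.
pval = {(0, 1): 0.01, (1, 2): 0.02, (2, 3): 0.01,
        (0, 2): 0.08, (1, 3): 0.06, (0, 3): 0.30}
to_split = violating_pairs([0, 1, 2, 3], pval, p_tol=0.05)
```

The first entry of `to_split` is the pair that would be handed to the minimal cut routine first under the smallest-distance heuristic described above.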

Results
The result is a grouping of genes into clusters based on their expression profiles and their physical location on the genome. The maximal distance between two genes within one cluster is $D_{p=p_{tol}}$ (the distance corresponding to the given tolerance p-value). Note that not all genes are assigned to a cluster: genes that do not have nearby genes that are sufficiently co-expressed remain isolated.

Figure 3: Toy example of the gene matching mechanism for seven genes. (A) The requirements for matching genes increase (the cut-off p-value decreases) as the physical distance between genes increases. Example gene matches are given by the red dots. Note that the matching pattern is always symmetrical. (B) The resulting cluster in the form of a graph in which the edges are the matches (direct links) between genes. The cluster contains all genes except gene 4.