A toolset of constitutive promoters for metabolic engineering of Rhodosporidium toruloides

Background Rhodosporidium toruloides is a promising host for the production of bioproducts from lignocellulosic biomass. A key prerequisite for efficient pathway engineering is the availability of robust genetic tools and resources. However, there is a lack of characterized promoters to drive expression of heterologous genes for strain engineering in R. toruloides. Results This work describes a set of native R. toruloides promoters, characterized over time in four different media commonly used for cultivation of this yeast. The promoter sequences were selected using transcriptional analysis and several of them were found to drive expression bidirectionally. Promoter expression strength was determined by measurement of EGFP and mRuby2 reporters by flow cytometry. A total of 20 constitutive promoters (12 monodirectional and 8 bidirectional) were found, and are expected to be of potential value for genetic engineering of R. toruloides. Conclusions A set of robust and constitutive promoters to facilitate genetic engineering of R. toruloides is presented here, ranging from a promoter previously used for this purpose (P7, glyceraldehyde 3-phosphate dehydrogenase, GAPDH) to stronger monodirectional (e.g., P15, mitochondrial adenine nucleotide translocator, ANT) and bidirectional (e.g., P9 and P9R, histones H3 and H4, respectively) promoters. We also identified promoters that may be useful for specific applications such as late-stage expression (e.g., P3, voltage-dependent anion channel protein 2, VDAC2). This set of characterized promoters significantly expands the range of engineering tools available for this yeast and can be applied in future metabolic engineering studies. Electronic supplementary material The online version of this article (10.1186/s12934-019-1167-0) contains supplementary material, which is available to authorized users.


Monodirectional promoters
Of the 16 putative monodirectional promoters selected for investigation, 12 resulted in medium to high fluorescence when driving EGFP expression in R. toruloides. Hierarchical clustering of a heatmap generated from flow cytometry data (Figure 2) shows that the strongest monodirectional promoters under most conditions are P14 (translation elongation factor 1, TEF1), P17 (hypothetical protein, RTO4_15825), and P15 (mitochondrial adenine nucleotide translocator, ANT); see Table 1 for more information on promoter sequences. Overall, promoters P14, P17, P15, P4, P27, P18, and P10 are classified as strong promoters, while promoters P7, P1, P5, P3 and P6 are clustered as medium-strong promoters (in decreasing order of strength). It is worth noting that promoter P7, which natively drives expression of GAPDH and is commonly viewed as a benchmark strong promoter, is classified as medium-strong within this promoter set.
To evaluate whether expression was consistent across coding sequences, the promoters were also used to drive expression of mRuby2 (Figure SI2). Again, promoters P6, P3 and P1 were classified as medium-strong promoters and promoters P14, P15 and P17 were clustered as the strongest promoters.

Bidirectional promoters
Each of the 13 promoters that was predicted to be bidirectional (i.e., selected by transcriptomics and comprising a complete intergenic sequence in R. toruloides) was positioned in between the divergent coding sequences for EGFP and mRuby2, allowing expression in both directions to be monitored simultaneously. Of these, 8 resulted in medium-to-high reporter fluorescence in both directions in at least one condition. Hierarchical clustering of a heatmap generated from flow cytometry data shows that the strongest bidirectional promoter pairs are P9 and P9R (histones H3 and H4, respectively), P12 and P12R (small subunit ribosomal proteins S28e and S5e, respectively), and P19 and P19R (ribosomal proteins S15e and LP2, respectively) (Figure 3).
The forward (EGFP) and reverse (mRuby2) promoter strengths for these three bidirectional pairs are well-balanced across a range of conditions. Under most conditions, promoters P9 and P9R, constituting the strongest bidirectional promoter pair, are similar in strength to the strongest of the monodirectional promoters. Promoter pairs P13/R, P28/R, P22/R, P29/R and P8/R were clustered as medium-strength promoters in at least one direction. Of these, P29 and P29R are the most evenly-balanced, while P8 and P8R are the most divergent in terms of promoter strength.
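One way to put a number on how "balanced" a bidirectional pair is would be the log2 ratio of forward to reverse reporter signal. The sketch below is illustrative only: the function name, the pair names and the fluorescence values are hypothetical, and it assumes both reporter signals have already been background-subtracted and normalized to a common scale, since raw EGFP and mRuby2 intensities are not directly comparable.

```python
import math


def balance_log2_ratio(forward, reverse):
    """Return log2(forward / reverse); 0 means the two directions are equal."""
    if forward <= 0 or reverse <= 0:
        raise ValueError("fluorescence values must be positive")
    return math.log2(forward / reverse)


# Hypothetical normalized fluorescence values (forward, reverse), arbitrary units.
pairs = {
    "P_balanced/R": (1000.0, 1000.0),
    "P_skewed/R": (1600.0, 200.0),
}

ratios = {name: balance_log2_ratio(f, r) for name, (f, r) in pairs.items()}

# The pair with the smallest absolute log2 ratio is the most evenly balanced;
# the pair with the largest is the most divergent.
most_balanced = min(ratios, key=lambda name: abs(ratios[name]))
most_divergent = max(ratios, key=lambda name: abs(ratios[name]))
```

A log ratio treats a two-fold excess in either direction symmetrically, which a plain ratio does not.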
Fluorescence from a second reporter was measured for each of these bidirectional promoters by cloning them in orientation 2 (Figures SI1 and SI3). Promoters P9/R, P19/R and P12/R were again classified as the strongest bidirectional promoter pairs, while P8/R and P29/R were overall the weakest promoter pairs. Promoter P22 was classified as a medium-strength bidirectional promoter when assayed in both orientations.

Correlation of mRuby2 and EGFP expression
To evaluate promoter reliability and robustness, the expression levels of the two reporter genes (EGFP and mRuby2) driven by the same promoter were compared. Linear regression and correlation analysis of fluorescence from the two reporters were performed for all promoters, comparing EGFP from orientation 1 with mRuby2 from orientation 2, in the four different media at 48 hours (Figure 4). EGFP expression for these promoters is shown in Figures 2 and 3, and mRuby2 expression in Figures SI2 and SI3.
Promoters P1, P12, and P29 expressed EGFP at significantly higher levels than mRuby2. This could suggest a composability effect, in which contextual factors, such as the mRNA secondary structure surrounding the ATG, are less favorable for EGFP than for mRuby2 [17][18][19]. The benchmark promoter P7 (GAPDH) expressed mRuby2 at higher levels than EGFP in strains grown in YPD medium or SD with 1% xylose, indicating that expression levels from this promoter may be less predictable in certain media. Most of the strongest promoters, including P4, P9, P10, P14, P15, P17, P18 and P27, produce similar levels of EGFP and mRuby2 fluorescence, and are recommended for metabolic engineering needs.
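The regression and correlation analysis described in this section can be sketched with an ordinary least-squares fit and a Pearson correlation coefficient. The study used GraphPad Prism for its regressions; the code below is not that workflow, only a minimal stand-alone illustration. The function name and the fluorescence values are hypothetical.

```python
import math


def linear_fit(x, y):
    """Least-squares fit y = slope * x + intercept, plus Pearson's r."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    if sxx == 0 or syy == 0:
        raise ValueError("x and y must each contain at least two distinct values")
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r = sxy / math.sqrt(sxx * syy)
    return slope, intercept, r


# Hypothetical per-promoter fluorescence (arbitrary units), one value per promoter:
# EGFP measured in orientation 1, mRuby2 measured in orientation 2.
egfp = [1.0, 2.0, 3.0, 4.0]
mruby2 = [2.0, 4.0, 6.0, 8.0]

slope, intercept, r = linear_fit(egfp, mruby2)
```

Promoters whose points fall far from the fitted line are the ones whose output depends on the coding sequence they drive.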

Discussion
The lack of standard parts that can be reliably used for genetic manipulation is often a bottleneck in metabolic engineering studies involving non-model microbial hosts. We aimed to find and characterize a collection of promoters to support reliable gene expression in the fungal host R. toruloides. We selected 29 R. toruloides promoters, based on transcriptomics studies, and examined them in four different media over seven days using a dual-reporter system. PGAPDH is one of the most widely used promoters for engineering a variety of yeasts, including R. toruloides, where it was employed for expression of heterologous terpene synthases [1]. However, 13 of the promoters investigated in this study were found to be stronger than PGAPDH under the majority of conditions (Figures 2 and 3).
Temporal changes in promoter strength can be an important factor in guiding their selection for pathway engineering. Under control of several of the promoters, reporter fluorescence reaches its maximum early (at 8 or 24 hours) and then diminishes over time or drops to a lower but stable level (Figures 2 and 3). Interestingly, this occurs for two of the strongest promoters, P15 (ANT) and P9 (histone H3), when strains are cultured in YPD medium, but not in the SD media. This is likely due to the cells reaching stationary phase significantly faster in YPD compared to SD media (Figures SI4 and SI5). In contrast, expression from promoters P3 (voltage-dependent anion channel protein 2, VDAC2) and P6 (unknown function) increases over time under some conditions. These two promoters may be useful for applications where late expression of a protein is desired (e.g., generation of metabolic products that are inhibitory to growth or consumed during log phase).
Correlation of fluorescence from EGFP and mRuby2 at 48 hours indicates that most of the promoters express both reporter genes at similar levels, suggesting a reasonable level of composability, with a few exceptions (Figure 4). A similar analysis of fluorescence from the two reporters driven by promoter P9, which includes every time point and growth medium, indicates that the expression level of the two reporters is similar under all conditions for this promoter (Figure SI6).
We investigated promoters from R. toruloides that were posited to have medium or strong transcriptional activity in both directions. Bidirectional promoters can reduce design and assembly complexity and also help keep DNA construct sizes within reasonable limits, a useful feature when engineering multigene pathways [20][21][22]. Of the 14 promoters that were predicted to be bidirectional according to RNA-sequencing data (P1, P2, P8, P9, P11, P12, P13, P19, P21, P22, P24, P25, P28 and P29) fluorescence of both reporters was detected for 8 of them, indicating bidirectional transcription (Figure 3). Expression in the forward and reverse directions from the strongest bidirectional promoter pairs (P9/P9R, P12/P12R and P19/P19R) is balanced (Figures 3 and SI3), a feature that should prove valuable for metabolic engineering applications where equal expression of two genes is sought. Each of these three promoters drives expression of two genes that are closely related in function. Promoters P9 and P9R, the bidirectional pair natively responsible for expression of histones 3 and 4, respectively (Table 1) [10,11,13,14]. Since normalization of the reporter genes used to characterize these promoters differs between these studies, it is challenging to compare strength of these promoters with the collection presented here.
Synthetic biology studies often rely heavily on the fine-tuning of gene expression levels to achieve a metabolic balance and a high product titer, rate and yield [5,9,25,26]. The majority of engineering work performed to date in R. toruloides has relied on random-integration strategies such as ATMT [1,2,14,15], which offer little control over integration locus or copy number. The strong promoters characterized in this work will complement the recent development of CRISPR technology for R. toruloides, allowing for site-specific integration while maintaining high expression rates [27].

Conclusions
RNA sequencing data was found to be a useful starting point for identification of R. toruloides promoters that can be used for heterologous expression. The collection of 12 monodirectional and 8 bidirectional native promoters presented in this work is the largest promoter set published for R. toruloides, including the first bidirectional promoters reported for this organism. Among these were 13 promoters that are stronger than the benchmark GAPDH promoter, the most commonly used promoter for yeasts. This characterized promoter set expands the R. toruloides genetic toolbox and is likely to be valuable for future metabolic engineering efforts in this promising host.

For the promoter studies, all 60 R. toruloides IFO0880 ∆ku70 strains (the parent strain, 58 strains containing promoter constructs, and a ∆car2 control strain) were inoculated into 24-deep-well plates containing 2 mL of LB (Miller) and grown overnight at 30 °C with shaking at 200 rpm. These cultures were then used to inoculate (at a 1:100 dilution) four different media in 24-deep-well plates: SD 1% xylose, SD 1% glucose, SD 1% xylose plus 1% glucose, and YPD. These cultures were grown to exponential phase overnight at 30 °C with shaking at 200 rpm and were then used to inoculate 3 mL of identical media in 24-deep-well plates at a starting OD600 of 0.05. These cultures were grown for 7 days at 30 °C in a shaker at 200 rpm, with samples taken at 8, 24, 48, 96 and 168 hours.
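The inoculation to a starting OD600 of 0.05 follows the usual dilution relation C1·V1 = C2·V2. The sketch below works through that arithmetic. It assumes that 3 mL is the final culture volume and that OD600 scales linearly with cell density over the measured range. The function name and the example pre-culture OD600 of 2.0 are hypothetical.

```python
def inoculum_volumes(culture_od, target_od=0.05, final_volume_ml=3.0):
    """Return (inoculum_ml, fresh_medium_ml) needed to reach target_od.

    Uses C1 * V1 = C2 * V2, where C1 is the OD600 of the pre-culture and
    C2 is the desired starting OD600 in the final volume V2.
    """
    if culture_od <= target_od:
        raise ValueError("pre-culture OD600 must exceed the target OD600")
    inoculum_ml = target_od * final_volume_ml / culture_od
    return inoculum_ml, final_volume_ml - inoculum_ml


# Hypothetical pre-culture at OD600 = 2.0, diluted to OD600 = 0.05 in 3 mL.
inoculum_ml, medium_ml = inoculum_volumes(culture_od=2.0)
print(f"{inoculum_ml * 1000:.0f} µL inoculum + {medium_ml:.3f} mL medium")
```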

Strains and plasmid construction
The parent strain for all work described here is R. toruloides IFO0880 harboring a deletion of the non-homologous end joining (NHEJ) gene, KU70, to facilitate a higher proportion of site-specific integration events in each transformation (Figure 1A) [2]. R. toruloides strain IFO0880 ∆ku70, a haploid strain, was used for transformations to favor homologous recombination.
Strains and plasmids used in this study are available upon request through the Joint BioEnergy Institute Strain Registry (https://public-registry.jbei.org/). Gene synthesis and plasmid construction were performed by the Joint Genome Institute (JGI), and two constructs were designed for each promoter so that it was positioned between EGFP and mRuby2 coding sequences in both directions. Constructs also contain a nourseothricin acetyltransferase (nat1) cassette, conferring resistance to nourseothricin, and homology regions corresponding to the CAR2 locus.
A schematic representation of the expression cassette built for integration into R. toruloides genome can be seen in Figure 1 and selected promoters and their combined annotations can be seen in Table 1.

RNA-sequencing
RNA-sequencing was performed at the Joint Genome Institute and used to identify a set of promoters that drive high or medium expression in different media including SD medium with 2% glucose (NCBI accession SRP183741, SRP183740, SRP183739), and M9 minimal medium with added trace elements and 2% glucose (NCBI accession SRP164139, SRP164140, SRP164141) or 10 mM p-coumaric acid (NCBI accession SRP164142, SRP164143, SRP164144). Another dataset was generated at the University of California, Berkeley and included in the analysis (NCBI accession GSE128360), which consists of samples grown in YNB minimal medium with 3% glucose and different carbon to nitrogen ratios (7 mM ammonium chloride or 168 mM ammonium chloride), and a library made of pooled samples grown in several media including YNB medium with different carbon sources (glucose, xylose, glycerol, or glutamate) and YPD medium with different conditions (growth phase, osmolarity, temperature, and oxygen availability). Sequenced reads were trimmed and filtered using BBTools (https://jgi.doe.gov/data-and-tools/bbtools/) and mapped to the IFO0880 genome sequence using HISAT2 [28]. Mapped reads were assigned to genes using featureCounts [29], and read counts were used to calculate Fragments Per Kilobase of transcript per Million mapped reads (FPKM). High expression was defined as log2 of the mean FPKM values above 10, and medium expression was defined as log2 of the mean FPKM values between 8 and 10. In order to select promoters with constitutive expression, the 20 genes with the lowest variance across different growth conditions were selected from each of the high and medium expression gene groups. Subsequently, promoter regions of high and medium expression genes were selected for testing, spanning 1 kb of upstream sequence or a shorter distance in cases where an upstream CDS was identified.
In some cases, the predicted promoter region of a selected gene was cut short by an upstream CDS on the opposite strand and was annotated as a putative bidirectional promoter.
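The selection logic described above (FPKM calculation, log2 thresholds on the mean, then the lowest-variance genes per class) can be sketched as follows. This is a minimal illustration rather than the authors' pipeline. The function names and example values are hypothetical, and the code assumes the variance is taken over raw FPKM values across conditions, since the text does not specify the scale.

```python
import math
import statistics


def fpkm(count, gene_length_bp, total_mapped_fragments):
    """Fragments Per Kilobase of transcript per Million mapped fragments."""
    return count * 1e9 / (gene_length_bp * total_mapped_fragments)


def classify(mean_fpkm):
    """Apply the thresholds from the text to log2 of the mean FPKM."""
    if mean_fpkm <= 0:
        return "low"
    level = math.log2(mean_fpkm)
    if level > 10:
        return "high"
    if level >= 8:
        return "medium"
    return "low"


def select_constitutive(fpkm_by_gene, per_class=20):
    """Keep the per_class lowest-variance genes in each expression class."""
    classes = {"high": [], "medium": []}
    for gene, values in fpkm_by_gene.items():
        label = classify(statistics.mean(values))
        if label in classes:
            # Assumption: variance of raw FPKM values across conditions.
            classes[label].append((statistics.pvariance(values), gene))
    return {
        label: [gene for _, gene in sorted(items)[:per_class]]
        for label, items in classes.items()
    }


# Hypothetical FPKM values for four genes across three growth conditions.
example = {
    "geneA": [2048.0, 2048.0, 2048.0],  # high, invariant
    "geneB": [1000.0, 3000.0, 5000.0],  # high, variable
    "geneC": [500.0, 520.0, 510.0],     # medium
    "geneD": [10.0, 12.0, 11.0],        # below both thresholds
}
selected = select_constitutive(example, per_class=1)
example_fpkm = fpkm(count=1000, gene_length_bp=2000, total_mapped_fragments=1_000_000)
```

Raw variance grows with expression level, so a scale-free measure such as the coefficient of variation, or the variance of log2 values, would be a reasonable alternative.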

R. toruloides transformation
Plasmids were purified from E. coli DH10B strains using a QIAprep plasmid miniprep kit [...] °C for two days to differentiate the white from orange colonies. For each construct, at least three independent transformants were grown overnight in YPD and tested for fluorescence by flow cytometry (Figure 1B).

Flow cytometry
High-throughput flow cytometry experiments were performed using the Accuri C6 flow [...] were clustered by hierarchical clustering, using Euclidean distance as the distance metric and average linkage. The remaining graphs were created using GraphPad Prism® (GraphPad Software, San Diego, CA), which was also used to calculate linear regressions as needed.
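Hierarchical clustering with Euclidean distance and average linkage can be illustrated with a small pure-Python sketch. It is not the software used in the study, which is not recoverable from the text above. The function names, profile names and fluorescence values are hypothetical. Each profile stands for one promoter's fluorescence across conditions.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length profiles."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def average_linkage(profiles):
    """Agglomerative clustering with average linkage (UPGMA).

    Returns the merge history as (cluster_a, cluster_b, distance) tuples,
    where each cluster is a tuple of profile names.
    """
    clusters = [(name,) for name in profiles]
    merges = []

    def cluster_distance(c1, c2):
        # Average linkage: mean of all pairwise distances between members.
        total = sum(euclidean(profiles[a], profiles[b]) for a in c1 for b in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = cluster_distance(clusters[i], clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        merges.append((clusters[i], clusters[j], d))
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return merges


# Hypothetical log-scaled fluorescence for four promoters in two conditions.
profiles = {
    "strong_1": [10.0, 10.0],
    "strong_2": [10.0, 11.0],
    "medium_1": [5.0, 5.0],
    "medium_2": [5.0, 6.0],
}
merges = average_linkage(profiles)
```

The merge order gives the dendrogram: profiles joined at small distances behave alike across conditions, and the final, largest-distance merge separates the strong group from the medium group.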

High performance liquid chromatography (HPLC) analysis
Sugars were quantified on an Agilent Technologies 1200 series HPLC (Agilent Technologies, Santa Clara, CA, USA) equipped with an Aminex HPX-87H column (BioRad, Hercules, CA) as described previously [30]. Samples were filtered through 0.45 µm filters (VWR, Visalia, CA, USA) before injection of 5 µl of each sample onto the column. Sugars were monitored by a refractive index detector, and concentrations were calculated by integration of peak areas and comparison to standard curves for the compounds of interest.
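Quantification against a standard curve amounts to fitting peak area as a linear function of known concentration and then inverting that fit for unknown samples. The sketch below illustrates the calculation under the assumption of a linear detector response. The function names, standard concentrations and peak areas are hypothetical and are not taken from the study.

```python
def fit_standard_curve(concentrations, peak_areas):
    """Least-squares fit: peak_area = slope * concentration + intercept."""
    n = len(concentrations)
    if n != len(peak_areas) or n < 2:
        raise ValueError("need at least two paired standards")
    mean_c = sum(concentrations) / n
    mean_a = sum(peak_areas) / n
    scc = sum((c - mean_c) ** 2 for c in concentrations)
    if scc == 0:
        raise ValueError("standards must span more than one concentration")
    sca = sum((c - mean_c) * (a - mean_a) for c, a in zip(concentrations, peak_areas))
    slope = sca / scc
    intercept = mean_a - slope * mean_c
    return slope, intercept


def concentration_from_area(area, slope, intercept):
    """Invert the standard curve to estimate a sample concentration."""
    if slope == 0:
        raise ValueError("standard curve has zero slope")
    return (area - intercept) / slope


# Hypothetical glucose standards (g/L) and integrated peak areas (arbitrary units).
standards_g_per_l = [1.0, 2.0, 5.0, 10.0]
standard_areas = [100.0, 200.0, 500.0, 1000.0]

slope, intercept = fit_standard_curve(standards_g_per_l, standard_areas)
sample_g_per_l = concentration_from_area(350.0, slope, intercept)
```

Estimates are only reliable for peak areas inside the range covered by the standards.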