Transcriptome-enabled discovery and functional characterization of enzymes related to (2S)-pinocembrin biosynthesis from Ornithogalum caudatum and their application for metabolic engineering

Background (2S)-Pinocembrin is a chiral flavanone with versatile pharmacological and biological activities. Its health-promoting effects have spurred research efforts on the microbial production of (2S)-pinocembrin. However, an often-overlooked feature in the analysis of microbial (2S)-pinocembrin is its chirality. Results Here, we present a full characterization of the absolute configuration of microbial (2S)-pinocembrin from engineered Escherichia coli. Specifically, a transcriptome-wide search for genes related to (2S)-pinocembrin biosynthesis was first performed in Ornithogalum caudatum, a plant rich in flavonoids. A total of 104,180 unigenes were generated, with an average length of 520 bp. Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway mapping assigned 26 unigenes, representing the three enzyme families 4-coumarate:coenzyme A ligase (4CL), chalcone synthase (CHS) and chalcone isomerase (CHI), to the (2S)-pinocembrin biosynthetic pathway. Seven, three and one full-length candidates encoding 4CL, CHS and CHI, respectively, were then verified by reverse transcription polymerase chain reaction. These candidates were screened by functional expression in E. coli, either individually or in coupled multienzyme reaction systems based on metabolic engineering. Oc4CL1, OcCHS2 and OcCHI were identified as bona fide genes encoding the respective pathway enzymes of (2S)-pinocembrin biosynthesis. Oc4CL1, OcCHS2 and MsCHI from Medicago sativa, assembled as artificial gene clusters in different organizations, were then used for fermentative production of (2S)-pinocembrin in E. coli. The absolute configuration of the resulting microbial pinocembrin at C-2 was assigned as 2S by a combination of retention time, UV spectrum, LC–MS, NMR, optical rotation and circular dichroism spectroscopy.
Improvement of (2S)-pinocembrin titres was then achieved by optimizing gene organization, using codon-optimized pathway enzymes and adding cerulenin to increase the intracellular malonyl CoA pool. Overall, the optimized strain produced 36.92 ± 4.1 mg/L of (2S)-pinocembrin. Conclusions A high titre of (2S)-pinocembrin can be obtained efficiently from engineered E. coli. The fermentative production of microbial (2S)-pinocembrin in E. coli paves the way for yield improvement and further pharmacological testing. Electronic supplementary material The online version of this article (doi:10.1186/s12934-016-0424-8) contains supplementary material, which is available to authorized users.

The biosynthesis of (2S)-pinocembrin (2) begins with the phenylpropanoid pathway, in which trans-cinnamic acid (5, t-CA) is converted to trans-cinnamoyl CoA (10) by 4-coumarate:coenzyme A ligase (4CL). Chalcone synthase (CHS) catalyzes the stepwise condensation of three acetate units from malonyl CoA (17) with trans-cinnamoyl CoA (10) to yield pinocembrin chalcone (4). The latter is then converted to (2S)-pinocembrin (2) by chalcone isomerase (CHI) in vivo, or to racemic pinocembrin non-enzymatically (Fig. 2). The health-promoting effects of (2S)-pinocembrin (2) have spurred research efforts towards the development of microbial production platforms using phenylpropanoid and flavonoid biosynthetic enzymes [18, 26-30]. To date, pinocembrin has been obtained from engineered Escherichia coli [18,20,31], Saccharomyces cerevisiae [29,30] and Streptomyces venezuelae [27] by combinatorial expression of pathway enzymes from diverse genetic sources. These studies, although valuable, share a distinct shortcoming: none fully characterized the stereochemistry of the microbial (2S)-pinocembrin (2). In addition, many more structural genes from varied origins need to be tested, because cloning and characterizing diverse genes can offer new perspectives for developing recombinant microorganisms capable of high, optimized production of microbial (2S)-pinocembrin (2). With this in mind, this study describes, for the first time, the isolation and functional expression of the enzymes of a complete (2S)-pinocembrin (2) pathway from Ornithogalum caudatum. Importantly, these enzymes were then used to rebuild a biosynthetic circuit in E. coli that produces (2S)-pinocembrin (2), which broadens the genetic sources of gene parts available for microbial (2S)-pinocembrin (2) production.
Moreover, the present study fully characterized the absolute configuration of microbial (2S)-pinocembrin (2), which is uniquely valuable for yield improvement and further pharmacological testing of chiral (2S)-pinocembrin (2).

KEGG pathway analysis of O. caudatum unigenes
The transcriptome is the complete set of transcripts expressed within a cell in a particular state. Transcriptome sequencing is a high-throughput approach that yields far more sequence data per run than traditional techniques, and it can therefore greatly accelerate the isolation of full-length genes. In the present study, a total of 104,180 unigenes with an average length of 520 bp were obtained by de novo transcriptome assembly. These unigene sequences were mapped to KEGG pathways. The results showed that multiple unigenes were assigned to every step of (2S)-pinocembrin (2) biosynthesis (Additional file 1: Table S1). In total, 19, 3 and 4 unigenes showing high similarity to 4CL, CHS and CHI, respectively, were retrieved from the transcriptome sequences (Additional file 1: Table S1). These unigenes were further analyzed by BLASTX to identify their open reading frames (ORFs). Some of the unigenes were predicted to contain full-length complementary DNA (cDNA) sequences, whereas the others contained only partial coding sequences.
The predicted full-length cDNA sequences could be isolated directly from O. caudatum cDNA by nested polymerase chain reaction (PCR). The missing sequences of the tentatively partial cDNAs, however, were obtained mainly by RACE (rapid amplification of cDNA ends) [32]. Finally, a total of 11 full-length cDNAs, comprising seven 4CL-like sequences, three CHS-like cDNAs and one full-length CHI-like fragment, were isolated from O. caudatum (Additional file 1: Table S1). All of these ORFs were then inserted into the cloning vector pEASY™-T1 Simple for sequencing. The results verified that these cDNA sequences were identical to those obtained by transcriptome sequencing, indicating that they represent genuine genes in planta. These sequences were therefore deposited in the GenBank database (Table 1).

Fig. 1 Chemical structures of compounds investigated in this study

cDNA isolation and functional characterization of 4CL gene family
A 4CL gene family comprising seven full-length cDNAs, namely Oc4CL1-7, was isolated from O. caudatum by nested PCR (Table 1). These cDNAs were cloned into pEASY™-T1 to generate pEASY-Oc4CLs for sequencing. After sequence verification, the Oc4CL genes were each cloned into the E. coli vector pET-28a (+) by the In-Fusion® method, yielding the recombinant vectors pET28a-Oc4CLs for heterologous expression.
The other six Oc4CL proteins, however, showed no activity towards any of the substrates. The enzymatic properties were determined with purified Oc4CL1 carrying an N-terminal His6-tag. The final concentration of the purified protein was 0.0808 mg/mL. The optimum pH of Oc4CL1 was 7.98. The enzyme was stable at pH 6-10 and retained more than 85 % of its activity even at pH 11. The optimal temperature for Oc4CL1 activity was 30 °C, and the enzyme retained 80.80 and 77.44 % of its activity at 40 and 50 °C, respectively. The kinetic parameters of recombinant Oc4CL1 were determined in an enzyme activity assay using compounds 5-8 as substrates and are listed in Table 3. As shown in Table 3, the best substrate for Oc4CL1 is p-coumaric acid (6), with a Km value of 16.42 μM.
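According to the note accompanying Table 3, Km and Vmax were determined from a Lineweaver-Burk plot. As a minimal sketch of that kind of calculation, the Python snippet below fits 1/v against 1/[S] by ordinary least squares and recovers Km and Vmax from the slope and intercept. The function name, the substrate concentrations and the rates are illustrative assumptions: the data are synthetic, noise-free values generated from a Km of 16.42 µM and an arbitrary Vmax of 1.0, and are not measurements from this study.

```python
def lineweaver_burk(substrate, rate):
    """Estimate (Km, Vmax) from the double-reciprocal fit
    1/v = (Km/Vmax) * (1/[S]) + 1/Vmax, using ordinary least squares."""
    x = [1.0 / s for s in substrate]
    y = [1.0 / v for v in rate]
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx                    # slope = Km / Vmax
    intercept = mean_y - slope * mean_x  # intercept = 1 / Vmax
    vmax = 1.0 / intercept
    km = slope * vmax
    return km, vmax


# Synthetic Michaelis-Menten data (assumed values, for illustration only).
true_km, true_vmax = 16.42, 1.0           # µM, arbitrary rate units
conc = [5.0, 10.0, 20.0, 40.0, 80.0]      # µM, hypothetical concentrations
rates = [true_vmax * s / (true_km + s) for s in conc]

km, vmax = lineweaver_burk(conc, rates)
print(f"Km = {km:.2f} µM, Vmax = {vmax:.2f}")
```

With real, noisy measurements the double-reciprocal transform weights low-concentration points heavily, so a direct nonlinear fit of the Michaelis-Menten equation is often used as a cross-check.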

cDNA isolation and functional characterization of CHS gene family
A CHS gene family comprising three members, OcCHS1, OcCHS2 and OcCHS3, was isolated from O. caudatum (Table 1). After sequence verification, the three full-length cDNA sequences were inserted into pET-28a (+) to yield recombinant pET-28a (+) derived vectors for heterologous expression, respectively (Additional file 1: Table S2) (4) and naringenin chalcone (19), respectively. Purified products (5 mg each) were prepared by HPLC and subjected to NMR analysis. It was difficult, however, to obtain clear and complete NMR results because of the instability of the two products, pinocembrin chalcone (4) and naringenin chalcone (19). Both chalcones are thought to be rapidly isomerized by CHI into the corresponding (2S)-flavanones, (2S)-pinocembrin (2) and (2S)-naringenin (21), which are stable and can be monitored by HPLC and NMR analysis. A new approach based on metabolic engineering was therefore applied to functionally characterize the OcCHSs. Specifically, the OcCHS genes and MsCHI (M91079) from Medicago sativa L. were inserted into pCDFDuet-1 to afford pCDF-OcCHSs-MsCHI (Additional file 1: Table S2). Plasmids pET28a4CL1 and pCDF-OcCHSs-MsCHI were then co-transformed into E. coli to form an artificial pathway for (2S)-pinocembrin (2) biosynthesis. Strain 2 was constructed by introducing the genes encoding Oc4CL1, OcCHS2 and MsCHI into Transetta (DE3) (Additional file 1: Table S2). Strains 1 and 3 contained the same set of flavonoid genes as strain 2, except that OcCHS2 was replaced by OcCHS1 and OcCHS3, respectively (Additional file 1: Table S2). Strains 1-3 were cultured as described previously [19,20,37]. When 0.1 mM trans-cinnamic acid (5) was added to the medium, a new peak with the same retention time and UV spectrum as the authentic standard (2RS)-pinocembrin (1) was reproducibly detected in engineered strain 2 (Fig. 4). The ion peak [M−H]− at m/z 255 in the ESI-MS spectrum suggested that the new compound has a molecular weight of 256, consistent with that of authentic (2RS)-pinocembrin (1). The 1H NMR data are given in Table 4. As also shown in Table 4, the 13C NMR spectrum presented signals of a carbonyl at δ 196.82 (C-4), an oxygenated methine at δ 80.5 (C-2) and a methylene at δ 44.2 (C-3), in agreement with a flavanone skeleton. On the basis of these observations, the structure of 2 was identified as pinocembrin [14]. The absolute configuration of pinocembrin was further assigned by optical rotation and circular dichroism (CD) spectroscopy. Compared with the control (racemic pinocembrin (1) produced by strain 4), the CD spectrum of microbial pinocembrin exhibited a positive Cotton effect at 325 nm and a negative Cotton effect at 283 nm, consistent with the previous report [38]. The absolute configuration of the microbial pinocembrin at C-2 was therefore assigned as 2S (Fig. 5). This conclusion was further supported by the negative optical rotation ([α]D23 −22.0°, c 1.67 mg/mL, DMSO) of the microbial pinocembrin [14]. Thus, the structure of our microbial pinocembrin was determined to be (2S)-pinocembrin (2) (Figs. 4, 5; Table 4). No peak, however, was detected in engineered strains 1 and 3. These results clearly indicate that OcCHS2 is a bona fide chalcone synthase. Moreover, when the substrate p-coumaric acid (6) was added to the culture broth, strain 2 also produced a major product that was characterized as naringenin on the basis of ESI-MS, UV and NMR data (Additional file 7: Fig. S6; Additional file 1: Table S3).

Table 3 Enzyme activities of recombinant Oc4CL1
The Km and Vmax of recombinant Oc4CL1 proteins were determined from a Lineweaver-Burk plot

cDNA isolation and functional characterization of CHI gene family
A full-length OcCHI cDNA of 633 bp was isolated from O. caudatum by nested PCR (Table 1). After sequence verification, the resulting PCR fragment was inserted into pET-28a (+) to give the recombinant expression vector pET28aOcCHI. Next, pET28aOcCHI was introduced into E. coli Transetta (DE3) for heterologous expression. SDS-PAGE (Additional file 8: Fig. S7) and western blot (Additional file 9: Fig. S8) analyses indicated soluble expression of the OcCHI protein. Both pinocembrin chalcone (4) and naringenin chalcone (19) are theoretical substrates of OcCHI. Functional identification of OcCHI by in vitro enzymatic reaction was not feasible because these two substrates were unavailable. A pathway-based procedure relying on metabolic engineering was therefore applied to functionally characterize OcCHI. Specifically, an artificial gene cluster carrying Oc4CL1, OcCHS2 and OcCHI, in the form of plasmids pET28a-Oc4CL1 and pCDF-OcCHS2-OcCHI, was transferred into E. coli to yield strain 5 (Additional file 1: Table S2). OcCHI activity was reflected by the microbial production of (2S)-pinocembrin (2). As illustrated in Fig. 6, a new peak reproducibly appeared in the fermentation products of strain 5 compared with the control. The retention time of the new peak was identical to that of the authentic pinocembrin standard. The compound was then analyzed by LC-MS in negative-ion mode. The new compound gave an ion at m/z 255 [M−H]−, indicating that it was pinocembrin. However, the amount of pinocembrin in the culture supernatant was too small to be collected preparatively for further analysis. Moreover, engineered strain 5 also produced naringenin after the addition of the substrate p-coumaric acid (6) (Fig. 6).
To improve the heterologous expression of the pathway enzymes, the Oc4CL1, OcCHS2 and MsCHI genes were codon-optimized for E. coli using the JCat algorithm (http://www.jcat.de/) [39]. These codon-optimized genes were then used to construct five more engineered strains, namely strains 7-11 (Additional file 1: Table S2). These strains were grown in M9 medium supplemented with trans-cinnamic acid (5), and the yields of (2S)-pinocembrin (2) were compared by HPLC analysis. To probe potential limitations in the engineered pathway, OcCHS2 was the first enzyme chosen for high-level expression. As illustrated in Table 5, when codon-optimized OcCHS2 was introduced into E. coli, the resulting strain 7 produced 4.42 ± 0.07 mg/L (2S)-pinocembrin (2), 1.23-fold that of strain 2 (Table 5). The enhanced (2S)-pinocembrin (2) yield in strain 7 was presumed to result from overexpression of OcCHS2, which leads to greater conversion of trans-cinnamoyl CoA (10) into pinocembrin chalcone (4). To convert more pinocembrin chalcone (4) into (2S)-pinocembrin (2), overproduction of CHI is necessary. A codon-optimized MsCHI was therefore also introduced into strain 7 to generate strain 8. As expected, the yield of (2S)-pinocembrin (2) increased further, reaching 5.96 ± 0.24 mg/L (Table 5). To direct more trans-cinnamoyl CoA (10) into (2S)-pinocembrin (2) biosynthesis, Oc4CL1 was also overexpressed in strain 10. Unexpectedly, although Oc4CL1, OcCHS2 and MsCHI were all highly expressed in strain 10, the yield of (2S)-pinocembrin (2) declined to 4.77 ± 0.17 mg/L, only 80 % of that in strain 8. The decline in production was attributed to two kinds of metabolic burden placed on the cell. One is related to the synthesis of plasmid-encoded proteins. Previous studies indicated that overproduction of foreign proteins can impose a metabolic load on the host cell, with negative effects on E. coli cells [40,41].
In the present investigation, overproduction of the three heterologous proteins Oc4CL1, OcCHS2 and MsCHI in strain 10 may impose a metabolic burden on the cell, which in turn causes a decline in (2S)-pinocembrin (2) production. Moreover, surplus metabolites in the pathway may themselves impose a metabolic load. In strain 2, the supply of trans-cinnamoyl CoA (10) was in such excess that it could not be completely directed into (2S)-pinocembrin (2) biosynthesis by OcCHS2 and MsCHI, even when OcCHS2 (strain 7) and MsCHI (strain 8) were highly expressed. Therefore, trans-cinnamoyl CoA (10) accumulated in strain 10 owing to overproduction of Oc4CL1, which imposed a metabolic burden on the E. coli cells. This negative effect of metabolite accumulation in turn lowered the yield of (2S)-pinocembrin (2). This notion was further supported by the construction of strains 9 and 11. As in strain 10, the amount of trans-cinnamoyl CoA (10) was kept constant in strains 9 and 11. However, the consumption of trans-cinnamoyl CoA

Fig. 6 HPLC analysis of the fermentation products from strain 5 using trans-cinnamic acid (5, left panel) or p-coumaric acid (6, right panel) as the substrate. a, e blank control; b, f fermentation products of strain 5 using trans-cinnamic acid (5, left panel) and p-coumaric acid (6, right panel) as the substrates; c, g standard pinocembrin and naringenin; d, h fermentation products of strain 2 using trans-cinnamic acid (5, left panel) and p-coumaric acid (6, right panel) as the substrates; 1 and 2 refer to pinocembrin and naringenin, respectively

Discussion
Pinocembrin (1) is a chiral compound with a chiral center at C-2 (Fig. 1). Racemic pinocembrin (1) is a mixture of two mirror-image enantiomers, (2S)-pinocembrin (2) and (2R)-pinocembrin (3, Fig. 1). The two enantiomers share identical molecular formulas, atom-to-atom linkages and bonding distances. Because of this identical architecture, chirality analysis of microbial pinocembrin has often been overlooked [18-20, 28, 37, 44]. It has long been known that individual stereoisomers can differ in their pharmacokinetic profiles and activity, and that these differences can cause significant, sometimes harmful, effects in humans [13,45]. The thalidomide tragedy is an example [46,47]. Although it is not known whether either enantiomer of pinocembrin has unwanted side effects, the chirality of pinocembrin needs to be analyzed prior to pharmacological testing. The full characterization of the absolute configuration of microbial (2S)-pinocembrin (2) by a combination of MS, NMR, CD and optical rotation is therefore uniquely valuable in the present study (Fig. 5; Table 4), and is the first step toward yield improvement and further pharmacological testing.
At least three enzymes, namely 4CL, CHS and CHI, are responsible for (2S)-pinocembrin (2) biosynthesis from trans-cinnamic acid (5) (Fig. 2). Each of these three enzymes is encoded by a multi-gene family. Isolating and functionally characterizing all of these genes by conventional molecular biology techniques would be very time-consuming. It is therefore particularly important to develop a high-throughput method that allows much quicker and cheaper gene discovery and leads to a more comprehensive view of the (2S)-pinocembrin (2) biosynthetic pathway. Next-generation sequencing approaches such as transcriptomic analysis provide such a platform, and have proved critical in speeding up the identification of large numbers of genes related to secondary products. In the present investigation, a large amount of sequence data was generated by transcriptome sequencing of O. caudatum. Several candidate genes, including Oc4CLs, OcCHSs and OcCHIs, that encode putative enzymes of the (2S)-pinocembrin (2) biosynthetic pathway were retrieved on the basis of the transcriptome analysis (Additional file 1: Table S1). Moreover, to quickly construct the expression vectors used for heterologous expression of the genes of interest, the In-Fusion® method, based on the In-Fusion® enzyme, was applied for plasmid construction, which can greatly improve the ligation efficiency of plasmid fragments. These candidate genes were then functionally identified in our laboratory by a combination of in vitro enzymatic reactions and a multi-enzyme system based on metabolic engineering. By combining these biotechnologies, the pathway enzymes of (2S)-pinocembrin (2) biosynthesis were functionally characterized rapidly, providing a successful example of identifying gene parts for pathway reconstruction.
In the present investigation, seven full-length 4CL-like cDNAs were obtained from O. caudatum by nested PCR. The seven genes were cloned, and the corresponding recombinant proteins (each with an N-terminal His6-tag) were expressed in E. coli (Additional file 2: Fig. S1, Additional file 3: Fig. S2). In each case, the enzymatic function of the 4CL-like member of the O. caudatum gene family was examined using trans-cinnamic (5), p-coumaric (6), caffeic (7), ferulic (8), sinapic (9) and benzoic acids (15) as potential substrates. The authenticity of the products in the assay mixtures was verified unambiguously by HPLC analysis rather than by spectrophotometric assays. The data indicated that there was only one bona fide 4CL gene, Oc4CL1. This result is at odds with the previous notion that 4CL is encoded by a small multi-gene family [48-51]. The recombinant Oc4CL2-7 proteins are probably inactive because they cannot be expressed in an active form in E. coli. Alternatively, the O. caudatum genome may contain several 4CL genes, not all of which we isolated and tested for enzymatic activity. The amino acid sequences of these 4CL-like proteins were therefore carefully examined (Additional file 10: Fig. S9). Protein sequence alignment of the 4CL-like proteins revealed a conserved box I motif (SSGTTGLPKGV), a signature of the superfamily of adenylate-forming enzymes, which includes 4CLs, firefly luciferases, nonribosomal polypeptide synthetases and acyl-CoA synthetases [52,53]. Absolute conservation of a second conserved motif, box II (GEICIRG), however, appeared to be restricted to Oc4CL1 and Oc4CL6. The divergence of box II in Oc4CL2, 3, 4, 5 and 7 suggests that they are members of the superfamily of adenylate-forming enzymes without 4CL function. Oc4CL6 shares the two highly conserved peptide motifs, box I and box II, with Oc4CL1. Oc4CL6, however, differs at four amino acids (Y238F, P278A, M305L and L341I) within a signature motif that generally determines 4CL substrate specificity [52], suggesting that it lacks 4CL function (Additional file 10: Fig. S9).
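The motif comparison described above can be illustrated with a short sketch. The Python snippet below reports whether, and where, the box I (SSGTTGLPKGV) and box II (GEICIRG) motifs occur exactly in a protein sequence. The function name and the two toy sequences are hypothetical and serve only to demonstrate the check; they are not Oc4CL sequences, and an exact-match search is a simplification of the alignment-based analysis reported in Additional file 10: Fig. S9.

```python
BOX_I = "SSGTTGLPKGV"   # box I motif quoted in the text
BOX_II = "GEICIRG"      # box II motif quoted in the text


def find_motifs(protein, motifs):
    """Return {motif name: 0-based start of the first exact match, or -1}."""
    return {name: protein.find(seq) for name, seq in motifs.items()}


motifs = {"box I": BOX_I, "box II": BOX_II}

# Hypothetical toy sequences, built only to exercise the function.
toy_sequences = {
    "toy_both_boxes": "MAAK" + BOX_I + "LTHR" + BOX_II + "PQ",
    "toy_box_I_only": "MAAK" + BOX_I + "LTHRGEVCVRGPQ",
}

results = {name: find_motifs(seq, motifs) for name, seq in toy_sequences.items()}
for name, hits in results.items():
    summary = ", ".join(
        f"{motif}: {'absent' if pos < 0 else 'at ' + str(pos)}"
        for motif, pos in hits.items()
    )
    print(f"{name} -> {summary}")
```

A sequence that carries box I but lacks an exact box II, like the second toy example, would correspond to the pattern described for Oc4CL2, 3, 4, 5 and 7.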
CHS is a well-studied, ubiquitous, plant-specific type III polyketide synthase (PKS) [54-56]. A number of active-site residues, including Cys164, Phe215, Phe265, His303 and Asn336, are conserved in CHSs but vary in other type III PKSs [54-56]. These conserved amino acids play important roles in the CHS reaction mechanism. For example, Phe265 separates the coumaroyl binding site from the cyclization pocket and may function as a mobile steric gate during successive rounds of polyketide elongation [56]. Substitution of any single one of these conserved residues is expected to reduce or even abolish activity. In the present investigation, the Phe265 found in OcCHS2 is replaced by Ile in both OcCHS1 and OcCHS3. The substitution of Phe265 was therefore postulated to explain the lack of CHS activity of OcCHS1 and OcCHS3 (Additional file 11: Fig. S10).
By a combination of in vitro reactions and co-expression assays, we identified, for the first time, the enzymes related to (2S)-pinocembrin (2) biosynthesis from a single species. Importantly, as a first step towards microbial scale-up production of (2S)-pinocembrin (2), these biosynthetic genes were co-expressed in E. coli. As illustrated in Table 5, co-expression of genes originating from a single plant species resulted in low-level de novo production of (2S)-pinocembrin (2). Also, it is clear that the combined use of pathway-encoding genes from a single plant origin does not guarantee the best production of flavonoids [57,58]. To optimize (2S)-pinocembrin (2) production, several parameters have to be considered. First, to test the effect of gene organization on microbial (2S)-pinocembrin (2) production, two types of gene organization were generated in two engineered strains. The results indicated that only strain 2 produced (2S)-pinocembrin (2) (Table 5). The inactivity of strain 6 probably results from inappropriate construction of plasmid pET28a-Oc4CL1-OcCHS2. In this plasmid, Oc4CL1 and OcCHS2 were regulated by their own expression cassettes, and the distance between the two cassettes is 14 bp. This short distance was assumed to be the main cause of abnormal transcription or translation of Oc4CL1, OcCHS2 or both, and hence of the inactivity of strain 6. Moreover, expression levels can be estimated from the gene copy numbers of the pathway enzymes. The copy numbers of pCDFDuet-1 (CDF origin) and pET-28a (+) (pBR322 origin) are 20 and 40, respectively. Imbalances within the (2S)-pinocembrin (2) pathway may lead to under-production of pathway enzymes. In addition, we cannot rule out the possibility of homologous recombination. Oc4CL1, OcCHS2 and MsCHI were each under the control of a T7 promoter and RBS (ribosome-binding sequence) in the plasmids pET28a-Oc4CL1-OcCHS2 and pCDF-MsCHI.
When the two plasmids were co-transformed into E. coli, the resulting strain 6 contained three repeats of the T7 promoter and RBS. Deletion of these repeats may occur through homologous recombination. Although strain 2 produced (2S)-pinocembrin (2), its productivity was still low. We hypothesized that the low titre of (2S)-pinocembrin (2) from recombinant E. coli is partly due to the low activity of the pathway enzymes. The codon usage of Oc4CL1, OcCHS2 and MsCHI was therefore optimized for E. coli. An enhancement of the (2S)-pinocembrin (2) titre was observed in all strains containing E. coli-preferred genes, with the exception of strain 11.
Unexpectedly, when the synthetic codon-optimized Oc4CL1, OcCHS2 and MsCHI were co-expressed in strain 11, a decreased yield of (2S)-pinocembrin (2) was observed (Table 5). Codon optimization of Oc4CL1, OcCHS2 and MsCHI is likely to lead to their overexpression in strain 11. Overproduction of the three heterologous proteins, however, usually imposes a metabolic burden on the strain, which in turn negatively affects cell physiology. Hence, the lowered yield of (2S)-pinocembrin (2) in strain 11 is presumed to be caused by overproduction of heterologous Oc4CL1, OcCHS2 and MsCHI. Overall, strain 8, an engineered strain with a higher titre of 5.96 ± 0.24 mg/L (2S)-pinocembrin (2), was selected for further improvement. At this stage, insufficient levels of the precursor malonyl CoA (17) could be limiting the overall product titre. To find out whether the availability of malonyl CoA (17) was limiting, cultivations were performed in which cerulenin (18, up to 0.3 mM) was supplemented during the production phase. Supplementation with 0.2 mM cerulenin (18) alone drastically increased the product titre, by up to 6.2-fold, to 36.92 ± 4.1 mg/L, which is comparable to previous reports (Table 5) [18,19].
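The relative changes quoted in the text can be checked with simple arithmetic. The short Python sketch below recomputes two of them from the mean titres reported here: the increase obtained with 0.2 mM cerulenin relative to strain 8, and the titre of strain 10 as a fraction of strain 8. The function and variable names are illustrative, and the standard deviations are ignored for simplicity.

```python
def fold_change(new_titre, reference_titre):
    """Return new_titre expressed as a multiple of reference_titre."""
    if reference_titre <= 0:
        raise ValueError("reference titre must be positive")
    return new_titre / reference_titre


# Mean titres in mg/L, as quoted in the text.
strain8 = 5.96             # codon-optimized OcCHS2 and MsCHI
strain8_cerulenin = 36.92  # strain 8 with 0.2 mM cerulenin
strain10 = 4.77            # Oc4CL1 also overexpressed

cerulenin_fold = fold_change(strain8_cerulenin, strain8)
strain10_relative = fold_change(strain10, strain8)

print(f"cerulenin vs strain 8: {cerulenin_fold:.1f}-fold")
print(f"strain 10 vs strain 8: {strain10_relative:.0%}")
```

Both results agree with the rounded figures given in the text (6.2-fold and 80 %).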
Although the yields of (2S)-pinocembrin (2) in E. coli were increased, there is still room for improvement. Common methods used to improve production from engineered biosynthetic pathways include, but are not limited to, enhancing the production of pathway enzymes [19,20,37], enlarging the intracellular pool of precursors [19,59] and balancing multi-gene expression to optimize flux [18,57,60]. It is well recognized that optimal protein yield may be achieved either by mutagenesis experiments that create the desired attributes of an enzyme or by selecting, from public databases, variant enzymes with differing kinetic properties. Codon optimization has proved, in the present and previous studies, to be a mutagenesis technique that improves the efficiency of heterologous protein production [18,57,60]. Screening public databases for target enzymes with desired attributes can also optimize engineered pathways. Many well-characterized homologs of 4CL [48,61,62], CHS [63,64] and CHI [65] are available in public sequence databases. These variants have differing kinetic properties and may be chosen to investigate their in vivo performance in the context of the entire (2S)-pinocembrin (2) pathway. The best-performing variants would be ideal candidates for (2S)-pinocembrin (2) production.
In the expression of a multi-gene heterologous pathway, the activity of a single enzyme may be out of balance with that of the other enzymes in the pathway, leading to unbalanced carbon flux and the accumulation of an intermediate. Varied strategies, like modular metabolic strategy [18,60] and expression correlation analysis [57], may be employed to balance the overall pathway.
Besides, selection of appropriate hosts [60], alleviation of the metabolic burden [60] and optimization of fermentation conditions [60] should be taken into account, since they may lead to robust improvement of (2S)-pinocembrin (2) production. The availability of such a powerful E. coli platform paves the way for scale-up production and eventual industrialization of (2S)-pinocembrin (2) production.

Conclusions
In the present study, we presented a full characterization of absolute configuration of microbial (2S)-pinocembrin (2), a chiral molecule with versatile pharmacological and biological activities. Also, we isolated and functionally identified gene parts used for pathway reconstruction of (2S)-pinocembrin (2) biosynthesis in E. coli based on transcriptome-wide sequencing in this investigation. The resulting engineered E. coli can produce 36.92 ± 4.1 mg/L (2S)-pinocembrin (2), which paves the way for yield increase and further pharmacological testing of chiral (2S)-pinocembrin (2).
The expression vectors pET-28a (+) and pCDFDuet-1 were from Novagen (Madison, USA) and were used for heterologous expression. The plasmids and strains used in this study are listed in Additional file 1: Table S2.

Plant materials
O. caudatum plants were grown under sterile conditions on 67-V medium [68] at 22 °C under a 16 h light/8 h dark cycle. The bulbs of O. caudatum were collected and either used fresh or frozen in liquid N2 and stored at −80 °C for RNA isolation.

Transcriptome sequencing and analysis
The detailed procedure was the same as in previous reports from our laboratory [69-71]. Specifically, a cDNA sequencing library was prepared from the total RNA of O. caudatum using an mRNA-seq Sample Preparation Kit (Illumina) following the manufacturer's protocol. The resulting cDNA library was then sequenced on an Illumina HiSeq™ 2000. Short nucleotide reads obtained by Illumina sequencing were assembled with the Trinity software to produce error-free, unique contiguous sequences (contigs). These contigs were then connected to obtain non-redundant unigenes that could not be extended at either end.
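Summary statistics such as the number of unigenes and their average length can be computed directly from the assembled sequences. The following Python sketch parses FASTA-formatted text and returns the length of each record. The function name and the two toy records are hypothetical and stand in for the real assembly output, which is not reproduced here.

```python
def fasta_lengths(fasta_text):
    """Return {record name: sequence length} for FASTA-formatted text."""
    lengths = {}
    name = None
    for line in fasta_text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # The record name is the first word after '>'.
            name = line[1:].split()[0]
            lengths[name] = 0
        elif name is not None:
            lengths[name] += len(line)
    return lengths


# Hypothetical toy records, not real unigenes.
toy_fasta = """>unigene_1 toy record
ATGGCTAGCT
AGCTA
>unigene_2 toy record
ATGCC
"""

lengths = fasta_lengths(toy_fasta)
mean_length = sum(lengths.values()) / len(lengths)
print(f"{len(lengths)} sequences, mean length {mean_length:.1f} bp")
```

Applied to a full assembly, the same calculation would give the unigene count and average length of the kind reported in this study.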
After transcriptome sequencing of O. caudatum, the resulting unigenes were aligned by BLASTX to the protein databases nr, Swiss-Prot, KEGG and COG (E-value < 0.00001) and by BLASTN to the nucleotide database nt (E-value < 0.00001), retrieving the proteins with the highest sequence similarity to the given unigenes along with their functional annotations. The candidate unigenes assigned to the (2S)-pinocembrin (2) biosynthetic pathway by KEGG pathway analysis, that is, the 4CL-like (Oc4CLs), CHS-like (OcCHSs) and CHI-like (OcCHIs) homologs, were retrieved for further study.
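As a minimal sketch of how such an E-value threshold can be applied, the Python snippet below filters BLAST results in the standard 12-column tabular format (query, subject, percent identity, alignment length, mismatches, gap opens, query start, query end, subject start, subject end, E-value, bit score) and keeps one hit per query. Choosing the best hit by bit score is an assumption made for illustration, as are the function name and the example rows; the identifiers are hypothetical and do not come from this study.

```python
import csv
import io

EVALUE_CUTOFF = 1e-5  # threshold stated in the text (E-value < 0.00001)


def best_hits(tabular_text, cutoff=EVALUE_CUTOFF):
    """Return {query: (subject, evalue, bitscore)} for the highest-scoring
    hit of each query whose E-value is below the cutoff."""
    best = {}
    for row in csv.reader(io.StringIO(tabular_text), delimiter="\t"):
        if not row:
            continue
        query, subject = row[0], row[1]
        evalue, bitscore = float(row[10]), float(row[11])
        if evalue >= cutoff:
            continue  # hit is not significant enough
        if query not in best or bitscore > best[query][2]:
            best[query] = (subject, evalue, bitscore)
    return best


# Hypothetical rows in 12-column tabular format (all values invented).
rows = [
    ["unigene_A", "hit_1", "78.5", "300", "60", "2", "1", "900", "5", "304", "1e-50", "200"],
    ["unigene_A", "hit_2", "91.0", "310", "28", "0", "1", "930", "1", "310", "1e-80", "310"],
    ["unigene_B", "hit_3", "35.0", "60", "39", "1", "10", "189", "20", "79", "0.002", "35"],
]
example = "\n".join("\t".join(r) for r in rows)

hits = best_hits(example)
for query, (subject, evalue, bitscore) in hits.items():
    print(query, subject, evalue, bitscore)
```

In this toy input, the second query is dropped because its only hit has an E-value above the cutoff.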

cDNA isolation and functional characterization of 4CL gene family
Since the assembled sequences were products of de novo assembly, they were considered prone to error. To confirm that the sequences represented true gene products, experimental verification was performed by designing gene-specific primers for the full-length sequences encoding (2S)-pinocembrin (2) pathway enzymes and verifying the identity of the amplified products by sequencing of the PCR amplimers. All the oligonucleotides used for DNA manipulation are described in Additional file 1: Table S4.
Full-length cDNA, synthesized from mRNA extracted from the sterile bulb tissues of O. caudatum, was amplified by a nested PCR method. The amplified products were inserted into the pEASY™-T1 Simple vector for sequencing.
After sequence verification, these full-length cDNAs were inserted into EcoRI/HindIII-linearized pET-28a (+) using In-Fusion® technology for heterologous expression, following procedures described previously [69][70][71]. In all cases, successful gene cloning was verified by digestion checks, and the absence of undesired mutations introduced during PCR was verified by direct nucleotide sequencing.
Expression of the Oc4CL proteins was induced at 27 °C for 8 h after addition of IPTG to a final concentration of 0.4 mM. His-tagged recombinant Oc4CL proteins were subsequently purified using an immobilized metal affinity chromatography system. The activity assay and the analysis of biochemical properties of the recombinant proteins were performed differently. 4CL activity was determined in vitro by measuring the formation of the corresponding CoA thioesters from trans-cinnamic acid (5) and its derivatives. Crude protein extract for each Oc4CL (100 µL, derived from 1 mL of culture) was added to a reaction mixture containing 2.5 mM MgCl2, 2.5 mM ATP and 20 µM substrate (trans-cinnamic acid (5), p-coumaric acid (6), caffeic acid (7), ferulic acid (8), sinapic acid (9) or benzoic acid (15)) in 200 mM Tris-HCl (pH 7.9) in a total volume of 1000 μL. The reaction was started by the addition of 0. 2 (7), ferulic acid (8) and sinapic acid (9), 270 nm for reaction product of trans-cinnamic acid (5) and 259 nm for benzoic acid (15).
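As an illustration of how the 1000 μL reaction mixture described above could be assembled, the following Python sketch applies the dilution relation C1·V1 = C2·V2 to each component. The final concentrations and the 100 µL extract volume are taken from the text; the stock concentrations are assumptions chosen only for the example, and the remaining volume is assumed to be made up with Tris-HCl buffer.

```python
TOTAL_VOLUME_UL = 1000.0   # total reaction volume (from the text)
EXTRACT_VOLUME_UL = 100.0  # crude protein extract volume (from the text)

# name: (final concentration in mM from the text, ASSUMED stock concentration in mM)
COMPONENTS = {
    "MgCl2": (2.5, 250.0),
    "ATP": (2.5, 100.0),
    "substrate": (0.020, 2.0),  # 20 µM final concentration
}


def stock_volume_ul(final_mm, stock_mm, total_ul):
    """Volume of stock needed to reach final_mm in total_ul (C1*V1 = C2*V2)."""
    if stock_mm <= 0 or final_mm > stock_mm:
        raise ValueError("stock must be positive and at least as concentrated as the target")
    return final_mm * total_ul / stock_mm


def pipetting_plan(components, total_ul, extract_ul):
    """Return component volumes in µL plus the buffer volume needed to reach total_ul."""
    plan = {
        name: stock_volume_ul(final_mm, stock_mm, total_ul)
        for name, (final_mm, stock_mm) in components.items()
    }
    plan["crude extract"] = extract_ul
    remainder = total_ul - sum(plan.values())
    if remainder < 0:
        raise ValueError("component volumes exceed the total reaction volume")
    plan["buffer to volume"] = remainder
    return plan


plan = pipetting_plan(COMPONENTS, TOTAL_VOLUME_UL, EXTRACT_VOLUME_UL)

if __name__ == "__main__":
    for name, volume in plan.items():
        print(f"{name}: {volume:.1f} µL")
```

With different stock solutions, only the second value of each entry in COMPONENTS needs to change.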
LC-MS analysis was performed using an Agilent 1200 RRLC series HPLC system (Agilent Technologies, Waldbronn, Germany) coupled to a QTRAP 2000 tandem mass spectrometer (Applied Biosystems/MDS SCIEX, Concord, ON, Canada) equipped with a Turbo Ion Spray ion source and controlled by Analyst 1.5. UV spectra were recorded from 190 to 400 nm. The mass spectrometer was operated in negative ion mode and spectra were collected in the enhanced full mass scan mode from m/z 100 to 1000.
NMR spectroscopic data were obtained at 500 MHz for 1H NMR and 125 MHz for 13C NMR in CDCl3 on Bruker-500 spectrometers. Chemical shifts (δ) are given in ppm and coupling constants (J) in hertz (Hz).
To examine the biochemical properties and kinetic parameters of Oc4CL1, purified recombinant protein was used. The pH optimum was determined in 200 mM Tris-HCl buffer containing 20 μM of the various substrates, 2.5 mM ATP, 25 mM MgCl2 and 0.02 mM CoA, over the pH range from 5.90 to 9.48, using 1.616 μg of pure enzyme in a final volume of 200 μL. Samples were incubated at 30 °C for 2 min.
To determine the optimum temperature, assays were performed in 200 mM Tris-HCl buffer containing 20 μM of the various substrates, 2.5 mM ATP, 25 mM MgCl2 and 0.02 mM CoA at pH 7.9 for 2 min at temperatures ranging from 15 to 50 °C.
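Results from pH and temperature series of this kind are commonly expressed as activity relative to the maximum. The Python sketch below shows one way this could be computed; the activity values are hypothetical placeholders rather than measurements from this study, and the function names are introduced here for illustration only.

```python
def relative_activity(measurements):
    """Express each activity as a percentage of the highest activity in the series."""
    peak = max(measurements.values())
    if peak <= 0:
        raise ValueError("at least one measurement must be positive")
    return {condition: 100.0 * value / peak for condition, value in measurements.items()}


def optimum(measurements):
    """Return the condition (pH or temperature) giving the highest activity."""
    return max(measurements, key=measurements.get)


# Hypothetical activities (arbitrary units) across a temperature series in °C.
TEMPERATURE_SERIES = {15: 2.0, 20: 3.5, 25: 5.0, 30: 8.0, 37: 6.0, 45: 3.0, 50: 1.0}

if __name__ == "__main__":
    best = optimum(TEMPERATURE_SERIES)
    print(f"optimum temperature in this made-up series: {best} °C")
    for temp, pct in relative_activity(TEMPERATURE_SERIES).items():
        print(f"{temp} °C: {pct:.1f} %")
```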
Kinetic analysis of Oc4CL1 was conducted using the standard assay over a range of concentrations of the different substrates. The apparent Km (Michaelis-Menten constant) and the maximum rate (Vmax) of Oc4CL1 were determined graphically from Lineweaver-Burk plots.
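The Lineweaver-Burk plot linearizes the Michaelis-Menten equation as 1/v = (Km/Vmax)(1/[S]) + 1/Vmax, so a straight-line fit of 1/v against 1/[S] gives Vmax from the intercept and Km from the slope. The sketch below is a numerical equivalent of the graphical determination, assuming an ordinary least-squares fit; the concentrations and kinetic constants are hypothetical values used only to generate test data, not results for Oc4CL1.

```python
def linear_fit(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def lineweaver_burk(substrate_concs, rates):
    """Estimate (Km, Vmax) from a double-reciprocal fit of rate versus [S]."""
    inv_s = [1.0 / s for s in substrate_concs]
    inv_v = [1.0 / v for v in rates]
    slope, intercept = linear_fit(inv_s, inv_v)
    vmax = 1.0 / intercept  # intercept = 1 / Vmax
    km = slope * vmax       # slope = Km / Vmax
    return km, vmax


def michaelis_menten(s, km, vmax):
    """Initial rate predicted by the Michaelis-Menten equation."""
    return vmax * s / (km + s)


# Hypothetical constants and concentrations used to generate noise-free test data.
TRUE_KM, TRUE_VMAX = 20.0, 5.0
CONCS = [5.0, 10.0, 20.0, 40.0, 80.0]
RATES = [michaelis_menten(s, TRUE_KM, TRUE_VMAX) for s in CONCS]

km_est, vmax_est = lineweaver_burk(CONCS, RATES)

if __name__ == "__main__":
    print(f"Km = {km_est:.3f}, Vmax = {vmax_est:.3f}")
```

Because taking reciprocals gives the lowest substrate concentrations the greatest influence on the fitted line, estimates obtained this way can be sensitive to measurement error at low [S].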

cDNA isolation and functional characterization of CHS gene family
The full-length cDNAs of candidate CHS genes were isolated from O. caudatum by nested PCR using gene-specific primers (Additional file 1: Table S4). The resulting PCR products were cloned into the pEASY™-T1 Simple vector to generate pEASYOcCHSs and verified by sequencing (Additional file 1: Table S2). After confirming sequence fidelity, the three OcCHS genes were functionally identified either by in vitro reaction or by multienzyme-cooperative systems. In vitro enzymatic reaction is a simple and direct way to identify gene function. Specifically, the three OcCHS genes were subcloned in frame with the polyhistidine tag into the BamHI/HindIII sites of pET-28a (+), giving three constructs, pET28aOc-CHS1~3. Heterologous expression, SDS-PAGE analysis and western-blot verification of the recombinant OcCHS proteins were performed using the same procedures as for the Oc4CLs. After induction with IPTG, 1 ml of cells was harvested by centrifugation at 10,000g for 2 min at 4 °C. The resulting cell pellets were resuspended in 1 ml of 200 mM Tris-HCl (pH 7.9) and disrupted by sonication. Cell debris was removed by centrifugation at 12,000g for 5 min at 4 °C, and the resulting supernatant was used as the crude protein extract for in vitro activity assays of the recombinant OcCHS proteins. OcCHS activities were determined by measuring the formation of the corresponding chalcones from CoA thioesters. Enzyme assays were carried out at 30 °C for 30 min in 1 ml of 200 mM Tris-HCl (pH 7.9) containing 0.2 mM CoA thioesters and 20 μM malonyl-CoA. The reactions were terminated by adding 40 μl acetic acid and then extracted three times with 1.5 ml ethyl acetate. After vortexing and centrifugation (12,000g, 10 min), the top organic layer was separated and evaporated to dryness, and the remaining residue was redissolved in 250 μl methanol. The resulting methanol samples were then analyzed by HPLC and LC-MS using the same program as that for Oc4CLs.
UV detection was performed at 341 nm. The function of OcCHSs was also characterized using multienzyme-cooperative systems owing to the instability of pinocembrin chalcone (4), the product of the CHS reaction. Specifically, the candidate OcCHSs were co-expressed with Oc4CL1 and chalcone isomerase from Medicago sativa (MsCHI, GenBank accession number M91079) [27,31,72,73] in E. coli Transetta (DE3) to form a (2S)-pinocembrin (2) biosynthetic pathway. The unstable pinocembrin chalcone (4) produced by CHS was then converted by MsCHI into (2S)-pinocembrin (2), which was validated by HPLC analysis.
First, a synthetic MsCHI was inserted into the BamHI/HindIII sites of pCDFDuet-1, resulting in pCDF-MsCHI. The OcCHS genes were PCR amplified from the respective pET28a-derived plasmids and then ligated into pCDF-MsCHI between the NdeI and XhoI sites, yielding pCDF-OcCHSs-MsCHI (OcCHSs refers to OcCHS1, OcCHS2 and OcCHS3). OcCHS and MsCHI were each placed under the control of a separate T7 promoter of pCDFDuet-1.