Modified ‘one amino acid-one codon’ engineering of high GC content TaqII-coding gene from thermophilic Thermus aquaticus results in radical expression increase

Background An industrial approach to protein production demands maximization of cloned gene expression, balanced against the recombinant host’s viability. Expression of toxic genes from thermophiles poses particular difficulties due to high GC content, mRNA secondary structures, rare codon usage and impaired replication of the coding plasmid in the host. TaqII belongs to a family of bifunctional enzymes, which are a fusion of the restriction endonuclease (REase) and methyltransferase (MTase) activities in a single polypeptide. The family contains thermostable REases with distinct specificities: TspGWI, TaqII, Tth111II/TthHB27I, TspDTI and TsoI, and a few enzymes found in mesophiles. Although they are not isoschizomers, the enzymes exhibit amino acid (aa) sequence homologies, have molecular sizes of ~120 kDa, share a common modular architecture, resemble Type-I enzymes and cleave DNA 11/9 nt from their recognition sites; their activity is affected by S-adenosylmethionine (SAM).
Results We describe the taqIIRM gene design, cloning and expression of the prototype TaqII. The enzyme amount in natural hosts is extremely low. To improve expression of the taqIIRM gene in Escherichia coli (E. coli), we designed and cloned a fully synthetic, codon-optimized taqIIRM gene with low GC content and low mRNA secondary structure, placed under a bacteriophage lambda (λ) PR promoter. Codon usage based on a modified ‘one amino acid–one codon’ strategy, weighted towards low GC content codons, resulted in approximately 10-fold higher expression of the synthetic gene. Of the 1105 codons in total, 718 were changed, comprising 65% of the taqIIRM gene. We intentionally chose this less effective strategy over the ‘codon randomization’ strategy, which results in high expression yields, in order to keep TaqII in vivo production sub-optimal and thereby decrease the high ‘toxicity’ of the REase-MTase protein.
Conclusions Recombinant wt and synthetic taqIIRM genes were cloned and expressed in E. coli. The modified ‘one amino acid–one codon’ method, tuned for thermophile-coded genes, was applied to obtain overexpression of the ‘toxic’ taqIIRM gene. The method appears suited for industrial production of thermostable ‘toxic’ enzymes in E. coli. This novel variant of the method, biased toward increasing a gene’s AT content, may provide economic benefits for industrial applications.


Background
Thermophilic bacteria, which thrive at temperatures greater than 50°C, require special adaptation strategies at the genome, transcriptome and proteome levels. The pattern of synonymous codon usage within thermophilic prokaryotes is different from that within mesophilic ones [1-6]. This difference is the result of natural selection linked to thermophily [1,6]. Differences in codon usage between species adversely affect recombinant gene expression levels; thus gene optimization is often needed to obtain adequate expression levels, which is especially important for industrial enzyme production processes. Natural REase-coding genes found in wild-type (wt) organisms are often not highly expressed, due to the 'toxicity' of their protein product to their hosts, if not fully protected by cognate MTases. The subtle balance between both enzymatic activities, comprising the restriction-modification (RM) system, can be affected by environmental conditions and lead to the cell's death, caused by genome damage. Moreover, this problem is much more pronounced in a recombinant host harbouring the cloned RM system, due to the different regulatory circuits of the coding gene. Recent developments in artificial gene synthesis have enabled the construction of synthetic genes [7-10], and thus made possible the rational design of artificial genes and their functional clusters, described as a 'synthetic biology' approach. Synthetic biology can be used to overcome problems of low gene expression in heterologous hosts, which is a crucial economic aspect of industrial gene expression. Although gene expression is highly correlated with codon usage, the problem is not as simply defined or solved. A general preference for the use of codons of the highest frequency in the genome or in the highly expressed gene subset of the host is not necessarily a guarantee of improved expression [10,11].
To aid the gene design process, computational tools have been developed [12]. Typically, two strategies have been used for codon optimization. The first one, known as 'one amino acid-one codon', assigns the most abundant codon of the recombinant host or of a set of selected genes to a given amino acid (aa) in the target sequence [13]. The second, 'codon randomization', uses translation tables based on the frequency distribution of the codons in a genome or a subset of highly expressed genes. Each codon has an assigned weight or probability. As a result, a random mixture of the codons assigned to a given aa is used to assemble the synthetic gene. In this case, as codons are assigned randomly, a vast number of possible gene variants can be obtained [13]. This allows for further nt sequence fine-tuning, without altering the final aa sequence. Many of the accessible sequence design software tools focus on Individual Codon Usage (ICU) as one of the most crucial factors affecting mRNA translational efficiency [14-18].
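The two strategies can be contrasted in a minimal Python sketch. The codon weights below are hypothetical placeholders covering only three amino acids; they are not measured E. coli frequencies, and the function names are illustrative rather than taken from any published tool.

```python
import random

# Hypothetical, illustrative codon weights (NOT measured E. coli frequencies).
TOY_CODON_WEIGHTS = {
    "M": {"ATG": 1.0},
    "K": {"AAA": 0.75, "AAG": 0.25},
    "L": {"CTG": 0.70, "TTA": 0.10, "CTT": 0.10, "TTG": 0.10},
}


def one_aa_one_codon(protein, weights):
    """Use the single highest-weight codon for every occurrence of an aa."""
    best = {aa: max(codons, key=codons.get) for aa, codons in weights.items()}
    return "".join(best[aa] for aa in protein)


def codon_randomization(protein, weights, rng):
    """Draw each codon at random, with probability proportional to its weight."""
    gene = []
    for aa in protein:
        codons = list(weights[aa])
        probs = [weights[aa][c] for c in codons]
        gene.append(rng.choices(codons, weights=probs, k=1)[0])
    return "".join(gene)


if __name__ == "__main__":
    protein = "MKLLK"
    # Deterministic: always the same gene for a given table.
    print(one_aa_one_codon(protein, TOY_CODON_WEIGHTS))
    # Stochastic: a different seed may give a different synonymous gene.
    print(codon_randomization(protein, TOY_CODON_WEIGHTS, random.Random(0)))
```

The first function yields exactly one gene per protein, whereas the second can yield many synonymous variants, which is what leaves room for later sequence fine-tuning.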
In addition to ICU, a significant influence of codon pair usage, also known as Codon Context (CC), at the level of gene expression has been reported in several studies and is suggested to be a result of potential tRNA-tRNA steric interaction within the ribosome [18]. For that reason, the CC was also incorporated into current gene design tools [18,19].
It is important to note that codon usage optimization may not need to cover the whole gene to result in substantially increased gene expression. There is evidence suggesting that the initial 15-25 codons of the Open Reading Frame (ORF) deserve special consideration [11]. It was shown that the impact of rare codons on translation rate is particularly strong in these first codons for expression in both E. coli and Saccharomyces cerevisiae [11]. This phenomenon is even more profound for the initiation codon. For example, replacing the native TTG initiation codon with an ATG codon resulted in high-level expression of the previously silent bspRIR gene in E. coli, which encodes BspRI REase [20].
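A scan of the ORF start for rare codons can be sketched as follows. The set of rare codons is an assumption (examples often cited as rare in E. coli) and should be replaced by a table derived for the actual host; the function name and the default window of 25 codons are likewise illustrative.

```python
# Assumed example set of codons often cited as rare in E. coli.
# This is an illustration, not a curated table for any specific strain.
ASSUMED_RARE_CODONS = {"AGG", "AGA", "ATA", "CTA", "CGA", "CCC"}


def rare_codons_in_window(orf, window=25, rare=ASSUMED_RARE_CODONS):
    """Return (1-based codon index, codon) for rare codons among the
    first `window` codons of an ORF given as a DNA or RNA string."""
    orf = orf.upper().replace("U", "T")
    # Only complete codons inside the window are considered.
    end = min(len(orf), 3 * window) - 2
    codons = [orf[i:i + 3] for i in range(0, end, 3)]
    return [(i + 1, c) for i, c in enumerate(codons) if c in rare]


if __name__ == "__main__":
    toy_orf = "ATGAGGAAACTA"  # hypothetical ORF fragment, not taqIIRM
    print(rare_codons_in_window(toy_orf))
    print(rare_codons_in_window(toy_orf, window=3))
```

Positions reported near the start codon would be the first candidates for synonymous replacement.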
Other known strategies for the improvement of recombinant gene expression include: (i) avoiding secondary mRNA structures in gene design; (ii) displacing mRNA structure from the initiation region or improving the physical integrity of the protein by the addition of N-terminal fusion tags [11]; and (iii) targeted and global bacterial genetic/strain engineering to enhance recombinant protein production [21].
In this study we describe a successful strategy for cloning and expression of a 'toxic', fully synthetic taqIIRM gene, designed for a significant improvement of biologically active recombinant prototype TaqII REase-MTase production in E. coli. Using the 'one amino acid-one codon' strategy, we intentionally avoided excessively high expression, which would be detrimental to recombinant cells, due to the protein's high 'toxicity'. This variant of the 'one amino acid-one codon' strategy is biased towards a low GC content and is suitable for other thermostable REases. We also anticipate its usefulness for non-REase-related genes originating from thermophiles, including those coding for industrial enzymes.

Results and discussion
Design and cloning of a synthetic taqIIRM gene and comparison to wt taqIIRM gene from Thermus aquaticus (T. aquaticus)
The taqIIRM gene was sequenced de novo by combining sequencing of PCR products, obtained using a T. aquaticus genomic template and a proofreading DNA polymerase, with direct genomic dideoxy and NGS sequencing approaches. The obtained extended sequence contig contained the previously published taqIIRM gene sequence data (without expression analysis) [GenBank: AY057443, AAL23675.1] [30], with one error corrected (located outside the taqIIRM ORF), and codes for a 125.7 kDa protein. Furthermore, the gene is preceded by a sub-optimal ribosome-binding site (RBS), 5′-GGAG-3′, located 6 bp upstream of the ORF start codon [GenBank: KF92665]. Subsequently, the wt gene was converted to a novel artificial gene, which radically departs from the wt taqIIRM nucleotide sequence, while maintaining the same aa sequence (Figure 1) [GenBank: KF894945]. Here we describe the design of a synthetic 3315 bp taqIIRM gene (syn-taqIIRM), its cloning and expression, and the isolation of the recombinant enzyme. A total of 718 out of 1105 codons were changed, comprising a massive 65% portion of the ORF. For comparative purposes, we also cloned de novo and expressed the wt gene (wt-taqIIRM), PCR amplified from T. aquaticus genomic DNA. Analysis of the wt-taqIIRM gene (66.3% GC) [GenBank: KF92665] revealed that at least 56.4% of its codons are not those preferred in highly expressed E. coli genes (Table 1). Due to the previously observed low expression of the Thermus sp. family genes in E. coli [26,27], we assumed that codon optimization, coupled with mRNA secondary structure reduction and a generally decreased GC content of the taqIIRM gene, leading to relaxation of the DNA-RNA duplexes and RNA-RNA secondary structures, might result in an increase of TaqII protein synthesis.
Therefore, a synthetic variant of the taqIIRM gene (with only 76.5% nt sequence identity to the wt gene) was designed using a modified 'one amino acid-one codon' method [GenBank: KF894945] [11,13]. Figure 1 shows the wt-TaqII and syn-TaqII nt sequences as well as the functional domains and motifs that we have previously determined by bioinformatics analysis [26] and further confirmed experimentally [manuscript in preparation]. Bioinformatic prediction of secondary structures (Mfold Web Server [31,32]) of the first 200 nt of the mRNAs coding for the wt-taqIIRM and syn-taqIIRM genes (Figure 2) revealed that the ATG start codon and RBS are much more exposed in mRNA transcribed from the optimized gene (Figure 2B) than from the wt gene (Figure 2A). In wt mRNA the translation signals are hidden in a double-stranded (ds) RNA helix with substantial stability (revised free energy: dG = −84.5 kcal/mol). In contrast, the ATG and RBS of syn-taqIIRM mRNA are located in a single-stranded (ss) region and the mRNA ds structure has substantially higher flexibility, as it exhibits a revised free energy of dG = −63.33 kcal/mol.
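The sequence statistics quoted above (GC content, nt identity, number of changed codons) can be computed with a few lines of Python. This is a minimal sketch: the two toy sequences are hypothetical synonymous fragments, not the taqIIRM genes, and the function names are illustrative.

```python
def gc_content(seq):
    """Percentage of G and C bases in a nucleotide sequence."""
    seq = seq.upper()
    return 100.0 * sum(seq.count(base) for base in "GC") / len(seq)


def percent_identity(a, b):
    """Percentage of identical positions in two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    matches = sum(x == y for x, y in zip(a.upper(), b.upper()))
    return 100.0 * matches / len(a)


def changed_codons(a, b):
    """Number of codon positions that differ between two in-frame ORFs."""
    if len(a) != len(b) or len(a) % 3:
        raise ValueError("ORFs must be in frame and of equal length")
    a, b = a.upper(), b.upper()
    return sum(a[i:i + 3] != b[i:i + 3] for i in range(0, len(a), 3))


if __name__ == "__main__":
    toy_wt = "ATGGCCCTGAGC"   # hypothetical fragment encoding M-A-L-S
    toy_syn = "ATGGCGCTGTCT"  # hypothetical synonymous variant
    print(gc_content(toy_wt), gc_content(toy_syn))
    print(percent_identity(toy_wt, toy_syn), changed_codons(toy_wt, toy_syn))
    # Arithmetic on the figures quoted in the text:
    print(1105 * 3)                    # ORF length in bp
    print(round(100 * 718 / 1105, 1))  # percentage of codons changed
```

Applied to the deposited wt and synthetic sequences, the same functions should reproduce the reported GC contents and identity, provided the sequences are aligned codon for codon.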
For the 'one amino acid-one codon' approach, the most preferred codon in the highly expressed E. coli genes was selected for every aa (Table 1; Figure 1). A single exception was made in the case of the serine codon: from two nearly identically frequent codons, UCC and UCU, the latter was selected as it has a lower GC content, even though it is used at a slightly lower rate than UCC in highly expressed E. coli genes (Table 1; Figure 1). It was hypothesized that such an approach might result in a lower level of expression than the maximum obtainable with the use of the set of most frequent codons, specific for each aa, as a random, weighted mixture. It was shown experimentally that the 'codon randomization' approach leads to higher gene expression by preventing depletion of the aminoacyl-tRNA pool, which would otherwise slow down translation, stall ribosomes or prematurely terminate translation [11,13]. As codons are assigned randomly, this method allows for the generation of countless gene variants [13]. This allows for further nt sequence fine-tuning, without altering the final aa sequence. Thus, further removal of mRNA secondary structures, taking ICU and CC factors into account, is possible.
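The serine exception can be expressed as a simple selection rule: among codons whose frequency is close to that of the most frequent one, prefer the codon with the fewest G/C bases. The sketch below is an illustration only. The `tolerance` parameter is an assumption introduced here (the study made a single manual exception rather than applying a numeric threshold), and the frequencies are hypothetical placeholders, not the Table 1 values.

```python
def gc_count(codon):
    """Number of G and C bases in a codon."""
    return sum(codon.upper().count(base) for base in "GC")


def pick_low_gc_codon(freqs, tolerance=0.05):
    """Pick a codon for one aa.

    Codons whose frequency lies within `tolerance` of the top frequency
    are treated as near-equivalent; among them the lowest-GC codon wins,
    with higher frequency as the tie-breaker.
    """
    top = max(freqs.values())
    near_top = [c for c, f in freqs.items() if top - f <= tolerance]
    return min(near_top, key=lambda c: (gc_count(c), -freqs[c]))


if __name__ == "__main__":
    # Hypothetical serine frequencies: UCC slightly ahead of UCU.
    toy_ser = {"UCC": 0.27, "UCU": 0.25, "AGC": 0.20}
    print(pick_low_gc_codon(toy_ser))                 # low-GC bias applied
    print(pick_low_gc_codon(toy_ser, tolerance=0.0))  # plain most-frequent rule
```

With `tolerance=0.0` the rule reduces to the unmodified 'one amino acid-one codon' choice, so the bias toward AT-rich codons can be tuned or switched off.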
However, sub-optimal gene optimization, using the 'one amino acid-one codon' strategy rather than the 'codon randomization' strategy, may be beneficial in some cases by reducing the metabolic stress imposed on the recombinant host, which has to repair cellular damage caused by overproduction of 'toxic' heterologous proteins. Excessive expression of such proteins would result in poor recombinant host growth, inactivating mutations appearing in the cloned gene and natural selection for mutant-carrying bacteria during cultivation, cell fragility and spontaneous lysis, among others. Another, more subtle effect might be associated with co-translational folding, where the availability of isoacceptor tRNA molecules regulates folding kinetics. Thus, the expressed proteins obtained may vary in properties, depending on whether they were synthesised from constructs designed for the fastest possible translation or from moderately boosted genes. TaqII, originating from a thermophile, is very large for a prokaryotic protein (125.7 kDa) and contains functional (and perhaps physical) domains. For that reason folding kinetics may play a role in the final active state of the recombinant protein variants. As a result of the factors listed above, the final recombinant protein yield for production purposes may actually be lower and less predictable with maximum expression constructs than with moderately expression-boosted, but stable, recombinant constructs. Thus, our motivation behind using the 'one amino acid-one codon' strategy for the syn-taqIIRM gene construction was to stabilize recombinant constructs by preventing excessively high expression of the TaqII REase-coding gene, which is 'toxic' for a bacterial host.
To reduce taqIIRM gene 'toxicity', we used a strictly controlled λ PR promoter and a very low permissive cultivation temperature of 28°C, which not only kept the λ PR promoter silent, but also further decreased the activity of any thermostable TaqII molecules originating from residual expression under permissive conditions. Despite strict promoter control we still observed increased fragility of recombinant E. coli cells expressing the taqIIRM gene. This is a general phenomenon, which we have also observed in the case of other cloned Thermus sp. family REases.
The codon-optimized synthetic gene was generated by a commercial service using ss 5′-phosphorylated, overlapping complementary primers, subjected to ligation. Finally, the fully assembled gene was amplified with a proofreading DNA polymerase. The resulting synthetic gene (55.9% GC) was further extended with two DNA fragments overlapping the sequence of a modified pRZ4737 vector DNA (Table 2; sequence written in small letters). For that purpose, two oligodeoxyribonucleotides (oligos) were used (Table 2) and an additional PCR reaction with a proofreading DNA polymerase was performed (see Methods section). Finally, the gene was assembled with the complementary modified pRZ4737 vector linear backbone, with gene expression driven by a λ PR promoter, inducible by a temperature shift to 42°C. The DNA assembly was performed using a 'one-step DNA fragment assembly and circularization' method, with no DNA ligation needed [34] (Figure 3). The expression temperature of 42°C was selected to ensure adequate folding of the thermostable TaqII protein. As a control, a wt taqIIRM gene was cloned into the modified pRZ4737 using the same cloning strategy (see Methods section).
Improved expression of the thermophile-based synthetic recombinant taqIIRM gene in mesophilic E. coli
Similar to other genes from the investigated Thermus sp. family, low expression of the native taqIIRM gene in T. aquaticus results in a very small yield of active TaqII protein (lower than 0.2 mg/L culture) (Figure 4C,D). Moreover, the native TaqII protein isolation from T. aquaticus is impaired by the presence of vast amounts of non-specific nucleases and another REase, TaqI, as well as abundant amounts of pigments and other cellular components, which strongly interfere with chromatographic separations and enzymatic assays [35]. To improve the expression of the gene and to increase the protein yield, two taqIIRM gene variants (wt and synthetic) were cloned and expressed in E. coli (Figure 4A,B,D; Figure 5). Initial wt taqIIRM cloning [GenBank: AY057443, AAL23675.1] was conducted using a different strategy than the one presented in this paper and is to be published elsewhere. The amount of TaqII protein produced by the expression of each gene variant was quantified by densitometry of the stained SDS/PAGE gels and is shown in Figure 4D. Consistent with the results obtained from gel scanning quantification, the protein yields for the synthetic and wt gene were 178 mg/L and 18 mg/L, respectively, thus reaching on average an approximately 10-fold expression increase. We have obtained such expression levels in several experiments. The TaqII protein yields are relatively high, even though it is a 'toxic' protein.
However, being a thermostable enzyme, it exhibits decreased activity at the lower temperatures used for recombinant E. coli cultivation. The high TaqII yields are also attributed to the development of a rapid and efficient purification protocol as well as to the bacteria cultivation conditions, which include overnight growth with vigorous aeration after induction. As a result, high cell densities are obtained, leading to an increased bacterial mass per litre of culture. The presence of recombinant TaqII both in entire cells and in the soluble fraction was confirmed using enzymatic activity assays as well as SDS/PAGE, which showed that the enzyme is fully soluble (not shown). The high expression boost findings are in contrast to the report [13] that showed a relatively small expression increase with the use of the 'one amino acid-one codon' gene optimization method, explained by depletion of the tRNA variants assigned to single codon types. Moreover, such cell deprivation also induces translation errors, thus decreasing protein-specific activity. Here we show that the 'one amino acid-one codon' method, combined with weighting toward low GC content codons (in this case, serine codons), allows for a significant expression increase of a thermophile gene in the recombinant host. Even though no comparison was made between the two equivalent variants of the synthetic gene (using alternatively UCC or UCU serine codons), we hypothesize that the achieved high expression indicates that using less frequent codons with a lower GC content is not detrimental to the high expression of a synthetic gene. Thus, modifications of this method, namely extending the bias to other aa whose alternative codons are used at frequencies similar to those of the most frequently used codons, may be an interesting avenue for future exploration. Besides codon optimization, the GC content was significantly decreased, by 10.4 percentage points (from 66.3% to 55.9%). Any further GC content decrease was limited by the aa sequence of the TaqII protein.

Figure 1 Differences in DNA sequences of the synthetic and wt recombinant taqIIRM genes. The predicted aa sequence of the 125.7 kDa TaqII protein is indicated in capital letters. The DNA sequence of the wt-taqIIRM gene is indicated in blue italics. The DNA sequence of the syn-taqIIRM gene is shown in black bold letters and the changed bases are marked in red. The crucial amino acids of the catalytic centres are dark red, bold and underlined. The functional protein domains are marked as follows: REase domain in blue, helical domain in light green, MTase domain in dark green and the potential TRD region in brown. Numbering of nt of taqIIRM gene variants and polypeptide aa starts as '1' with the beginning (ATG) of taqIIRM ORF.
Table 1 notes. Fraction of relative occurrences of the codon in its synonymous codon family [33]. Bold underlined: codons selected for syn-TaqII construction. Bold italics underlined: Ser codon, which is the most frequently used in E. coli.
Together with the post-optimization sequence scanning for mRNA secondary structures (Figure 2), codon clusters and the local codon environment, the final synthetic gene has become 'E. coli friendly', with the preferred codon content and with the ATG start codon as well as the RBS exposed in a ss mRNA segment, allowing for a one order of magnitude increase in taqIIRM expression, as detected by the cellular protein enzymatic assays and SDS/PAGE. The method was devised for 'toxic' REase-coding genes in particular; however, it seems well suited for general industrial thermostable enzyme production, including enzymes 'toxic' to their recombinant hosts via different mechanisms than REases. As expression results reported in the literature vary greatly for the different genes being optimized, the issue is complicated and, apparently, multiple factors, not always defined, affect the final protein yield. Our results are meant to be an experimental data contribution to the discussion, which may become useful in solving thermophile gene-derived expression problems. Besides the anticipated, more general usefulness of the modified, AT-content biased gene design method, the major novelty of the presented work also lies in the optimization target chosen: the sub-Type IIS/IIC/IIG TaqII thermostable REase. The enzyme is a new tool for DNA manipulation purposes, as it exhibits a prototype DNA-cleavage specificity. We present for the first time the taqIIRM cloning method, as only the wt taqIIRM nt sequence has been previously deposited in GenBank [30]. Moreover, we have recently published [28] a new method for the generation of quasi-random genomic libraries, based on chemically-induced relaxation of TaqII REase specificity from a 6-bp to a combined 2.9-bp cognate site. This was achieved by including the enzyme's cofactor analogue in the DNA digestion reaction. Thus, we anticipate an increased interest in practical usage of the enzyme in DNA cloning technologies.

Enzymatic properties of recombinant TaqII enzyme variants
The recombinant TaqII protein (syn-TaqII), isolated from the recombinant E. coli strain harbouring the pRZ-syn-taqIIRM expression plasmid ( Figure 5, lane 7), was used for the study of the enzyme biochemical properties, reaction conditions, cofactors and their analogues that influence DNA cleavage and/or the methylation activity.
The purification scheme included mid-scale isolation from approximately 50 g of cells, which were suspended in a buffer with pH and salt concentrations stabilizing the enzyme (not shown). In addition, glycerol and non-ionic detergents were added to block hydrophobic patches on the TaqII protein surface and protect the protein from denaturation, aggregation and adhesion. After ultrasonic disruption and centrifugation of cell debris, the crude lysate was subjected to a heating step at 65°C (Figure 5, lane 2). This stage was critical to remove most mesophilic E. coli host proteins and inactivate non-specific nucleases. The thermal inactivation step was important to obtain a purified enzyme preparation free of DNA-degrading activity, and thus suitable for practical applications in molecular cloning methodology as an enzyme with a new prototype specificity. Further precipitation steps included polyethyleneimine (PEI) removal of nucleic acids and residual acidic proteins (Figure 5, lane 3), followed by fractionated precipitation with ammonium sulphate (AmS) (Figure 5, lane 4). The three precipitation steps above, each based on a different principle, were sufficient to obtain an enzyme yielding high quality DNA digests, although the protein was not homogeneous (Figure 5, lane 4). Further purification included ion exchange on the cation exchanger phosphocellulose P11 (Figure 5, lane 5), which also served as a semi-affinity medium due to the presence of phosphate groups, followed by anion exchange on DEAE-cellulose (Figure 5, lane 6). The nearly homogeneous preparation was then subjected to molecular sieving to remove any trace contaminants, taking advantage of the high molecular weight of TaqII (Figure 5, lane 7). Both recombinant TaqII protein variants were also subjected to analytical molecular sieving in a buffer with a composition close to physiological conditions, containing 3 mM MgCl2 (in the absence of DNA).
The experiment revealed that the molecular size of both variants is in the range 110-130 kDa, indicating that under physiological conditions the proteins exist as monomers, identical to the previously described native enzyme [24]. Moreover, the apparent molecular size of the recombinant protein variants under denaturing conditions was found to be slightly over 120 kDa, very similar to the molecular mass of TaqII isolated from T. aquaticus, which was analysed with the use of different molecular size markers [24].
As expected, the recombinant TaqII maintains the absolute requirement for Mg2+ for cleavage activity. The temperature activity range of the recombinant TaqII REase extends from 40°C to 85°C, with the maximum observed at 70-80°C (Figure 6). Remarkably, the upper activity limit extends beyond the T. aquaticus growth range by approximately 10°C. This indicates that different cellular components become limiting factors for cell survival at different temperatures; thus no simple 'thermostability' explanation can be given in a thermophile characterization. These findings are in contrast to our previous observations regarding another member of the Thermus sp. family, TsoI, which exhibits remarkably lower thermostability, by approximately 10-15°C, than the optimum growth temperature of the TsoI-coding Thermus scotoductus bacteria. As RM systems exhibit a tendency towards horizontal transfer between species, a higher than expected temperature maximum of TaqII and a lower than expected temperature maximum of TsoI may indicate that these enzymes were acquired in the past from more thermophilic or more mesophilic bacteria, respectively. Finally, such high TaqII thermostability may be of practical use in DNA manipulation methodologies. Incubation at 37°C resulted in no detectable REase activity under our assay conditions (data not shown). Recombinant TaqII is inactivated at temperatures above 90°C.

Conclusions
The novelty of the presented work includes:
i. Design of an entirely synthetic, long (3315 bp) taqIIRM gene with low GC content, low mRNA secondary structure and optimized codons to enhance its expression in E. coli;
ii. Cloning of the sub-Type IIS/IIC/IIG thermostable REase syn-TaqII, a new tool for DNA manipulation purposes, which includes the use of the TaqII prototype REase specificity for DNA cleavage as well as for specialized applications in the generation of quasi-random genomic libraries [28];
iii. Expression of the optimized synthetic taqIIRM gene in E. coli under the control of the λ PR promoter, which has resulted in an approximately 10-fold increase as compared to the cloned, native taqIIRM gene;
iv. Development of rapid and efficient TaqII purification protocols and characterization of the recombinant enzyme;
v. Evidence that, in contrast to other reports [13], the modified 'one amino acid-one codon' method allows for a significant increase of REase-coding gene expression in recombinant E. coli, and can be suited more generally for the industrial production of other thermostable enzymes.

Methods
Bacterial strains, plasmids, media and reagents
markers were from Thermo Fisher Scientific/Fermentas (Vilnius, Lithuania). The cloning vector pRZ4737 (CmR, P15A ori, f1 ori, PR promoter) was from Bill Resnikoff [39]. T7 DNA was from Vivantis Technologies (Shah Alam, Malaysia). The DNA sequencing and PCR primer synthesis were performed at Vivantis Technologies and Genomed (Warsaw, Poland). All other reagents were purchased from Sigma-Aldrich (St. Louis, MO, USA).
Sequencing, synthesis, amplification and cloning of wt-taqIIRM and syn-taqIIRM genes
Construction of the synthetic taqIIRM gene with low GC content
The taqIIRM gene nt sequence was obtained by combining sequencing of PCR products, prepared using a T. aquaticus genomic template and a proofreading DNA polymerase, with direct genomic dideoxy and NGS sequencing approaches. Multiple runs of both strands were performed to ensure error-free sequence determination of the high GC content T. aquaticus DNA. Sequencing was performed through commercial services (Vivantis Technologies and Genomed). The codon-optimized synthetic gene was created using single-stranded (ss), 5′-phosphorylated, overlapping complementary oligos with a length ranging from 40 to 60 nt. Both the top and bottom strands were covered with the phosphorylated ss oligos, which were subjected to ligation and PCR amplified using a proofreading DNA polymerase. The gene synthesis procedure was conducted by a commercial service at Vivantis Technologies.

Cloning of wt-taqIIRM and syn-taqIIRM genes
The approach to obtain overexpression of the TaqII bifunctional enzyme employed the vector pRZ4737, originally obtained from Bill Resnikoff [39] and further modified. The vector is a derivative of the pACYC184 plasmid [40], carrying a λ DNA section containing the PR promoter under the control of the CI repressor. The cI gene was located on the pRZ4737 backbone, allowing for host-independent expression in E. coli. For gene cloning a 'one-step DNA fragment assembly and circularization' method was used [34]. The method employs a thermostable DNA polymerase for the precise assembly of overlapping DNA fragments into circular constructs, under a low cycle number regime to minimize mutations. A linear vector backbone and the genes to be cloned were PCR amplified with a proofreading Taq DNA polymerase blend using suitable oligos. DNA sequences of the primers used are given in Table 2.

Linear vector backbone amplification
The PCR fragment, comprising the vector backbone was amplified from the modified pRZ4737 plasmid DNA [39,40], using FpRZ and RpRZ primers ( Table 2). The PCR reaction was performed in 50 μl samples in a thermocycler (Biometra) and contained: 1× Marathon PCR Buffer, 0.1 mM of each dNTP, 0.5 μM of each primer, 1 ng of circular pRZ4737, and 0.25 units of proofreading DNA polymerase (Marathon DNA Polymerase). The PCR cycling profile for the linear vector backbone amplification was as follows: 94°C for 3 minutes (min), 80°C for 20 seconds (sec) (addition of DNA polymerase), 94°C for 30 sec, 67°C for 30 sec, and 72°C for 5 min (for 35 cycles); 72°C for 4 min.
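For planning purposes, the cycling profile above can be written down as data and its programmed hold time summed. This is a sketch under one stated assumption: the '35 cycles' are read as applying to the three-step block (94°C, 67°C, 72°C), with the 94°C/80°C steps run once before and the 72°C step once after. Ramp times between temperatures are instrument-dependent and are not included; the class and function names are illustrative.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    temp_c: float
    seconds: int


def programmed_seconds(initial, cycle, cycles, final):
    """Total hold time of a PCR program, excluding temperature ramps."""
    once = sum(s.seconds for s in initial) + sum(s.seconds for s in final)
    return once + cycles * sum(s.seconds for s in cycle)


# Linear vector backbone amplification, as read from the text.
BACKBONE_PROGRAM = dict(
    initial=[Step(94, 180), Step(80, 20)],  # denaturation, polymerase addition
    cycle=[Step(94, 30), Step(67, 30), Step(72, 300)],
    cycles=35,
    final=[Step(72, 240)],
)

if __name__ == "__main__":
    total = programmed_seconds(**BACKBONE_PROGRAM)
    print(f"{total} s (~{total / 60:.0f} min) of programmed hold time")
```

The same structure can hold the gene amplification and assembly profiles by changing the annealing temperature and extension times.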

PCR amplification of the wt-taqIIRM and syn-taqIIRM genes
The wt-taqIIRM gene was amplified from the T. aquaticus genomic DNA, using a PCR primer pair FTaq and RTaq, which introduced the following restriction sites: BspHI and SalI (after the TGA stop codon), respectively ( Table 2).
The syn-taqIIRM gene was amplified from the original commercial fully synthetic gene DNA (Figure 1) using the PCR primer pair FsynTaq and RsynTaq, which introduced the restriction sites BspHI and SalI (after the TAG stop codon), respectively (Table 2). The 5' ends of all the primers were complementary to the pRZ4737 DNA sequence (Table 2; DNA sequence fragments in small letters).
The PCR reactions were performed in 50 μl samples in a thermocycler (Applied Biosystems) and contained: 1× Marathon PCR Buffer, 0.1 mM of each dNTP, 0.5 μM of each primer, either 0.5 ng syn-taqIIRM template DNA or 100 ng T. aquaticus genomic DNA, 3% DMSO and 0.2 units of DNA polymerase (Marathon DNA Polymerase). The PCR cycling profile for both the syn-taqIIRM and wt-taqIIRM gene amplification was as follows: 94°C for 3 min, 80°C for 20 sec (addition of DNA polymerase), 94°C for 30 sec, 67°C for 30 sec, and 72°C for 3.5 min (for 35 cycles); 72°C for 2 min.
Assembly of DNA fragments
DNA assembly and circularization were performed on non-purified PCR amplification products by high-fidelity PCR, in a single step. Each 50 μl sample contained 1× Marathon PCR Buffer, 0.1 mM of each dNTP, 100 ng of crude reaction product mix containing the linear vector backbone, 100 ng of crude reaction product mix including either the wt-taqIIRM or syn-taqIIRM gene, and 0.2 units of Marathon DNA Polymerase. The molar ratio of insert to vector was 1.4 : 1.
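The relation between DNA masses, fragment lengths and the insert-to-vector molar ratio can be illustrated with a short calculation. This is a sketch only: it treats each 100 ng of crude mix as pure product, which is an idealization, and the vector backbone length used below is a hypothetical value, since the text does not state it. The function name is illustrative.

```python
def insert_to_vector_molar_ratio(insert_ng, insert_bp, vector_ng, vector_bp):
    """Molar ratio of insert to vector for double-stranded DNA.

    Moles are proportional to mass / length; the average mass per base
    pair is the same for both fragments and cancels in the ratio.
    """
    return (insert_ng / insert_bp) / (vector_ng / vector_bp)


if __name__ == "__main__":
    INSERT_BP = 3315             # ORF length from the text (primer tails ignored)
    HYPOTHETICAL_VECTOR_BP = 4600  # assumption for illustration only
    ratio = insert_to_vector_molar_ratio(100, INSERT_BP, 100, HYPOTHETICAL_VECTOR_BP)
    print(f"insert : vector = {ratio:.2f} : 1")
```

With equal masses of both fragments the ratio reduces to vector length divided by insert length, so a shorter insert is always in molar excess.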
The PCR cycling profile, optimized for DNA assembly, was as follows: 95°C for 3 min, 80°C for 20 sec (addition of DNA polymerase), 94°C for 30 sec, 58.5°C for 30 sec, and 72°C for 5 min (for 35 cycles); 72°C for 4 min. As the primers included complementary directional overhangs, the corresponding head and tail sequences of the vector and gene were annealed and assembled into plasmid pRZ-taqIIRM (Figure 3). After the assembly reaction, the methylated template pRZ4737 was subjected to DpnI digestion. The final DNA construct was phenol-chloroform extracted and ethanol precipitated. The resulting DNA was used to transform E. coli DH5α competent cells. After electroporation the bacteria were plated onto 2xYT medium supplemented with chloramphenicol (40 μg/ml) and 0.2% maltose, and incubated at 28°C.

Selection of positive bacterial clones
Both SalI cleavage of plasmid DNA and direct PCR from a single bacterial colony were used for the screening of positive clones. After a preliminary analysis, plasmid DNA isolated from the selected bacterial clones was subjected to DNA sequencing. The promoter regions and the taqIIRM gene sequences (either wt or synthetic) of the recombinant plasmids were also confirmed.

Expression of the recombinant wt and synthetic taqIIRM genes under P R promoter in E. coli
The resulting positive clones were subjected to protein expression experiments. E. coli BL21(DE3) cells were electroporated with either pRZ-wt-taqIIRM or pRZ-syn-taqIIRM, and mini-scale expression was performed by cultivation in 50 ml TB medium supplemented with chloramphenicol and maltose at 28°C with vigorous aeration, followed by P R promoter induction by a temperature shift to 42°C when OD 600 reached 0.9. The immediate temperature shift was obtained by the addition of 50 ml fresh TB medium, heated previously to 65°C. The cultivation temperature of 28°C was used to minimize residual TaqII REase activity and thus its toxicity to the bacterial host. It was anticipated that the temperature shift to 42°C would promote folding of the thermostable enzyme into its biologically active form. The culture growth was continued for 19 hours (h) at 42°C. Bacterial pellets from both the control (non-induced) and induced cultures were subjected to SDS/PAGE electrophoresis. The gels were analysed for the appearance of a band of the expected size, ~120 kDa [24] to 125.7 kDa (this work), and crude lysates were assayed for TaqII REase activity. The bacterial clones efficiently expressing the taqIIRM gene variants were selected for large-scale bacterial culture in a biofermentor.

Purification of the recombinant TaqII enzyme
The recombinant TaqII purification procedure was common to both the recombinant wt and synthetic gene-derived TaqII, and employed a simplified and modified protocol, which included some stages used for the native enzyme from T. aquaticus [35]. For large-scale protein purification, expression of both taqIIRM gene variants in E. coli BL21(DE3) [pRZ-wt-taqIIRM and pRZ-syn-taqIIRM] was initiated with a bacterial inoculum washed out from a Petri dish into 1 L of rich TB medium, supplemented with chloramphenicol and 0.2% maltose, at 28°C. The culture was grown in a biofermentor Bioflo 115 (New Brunswick Scientific, Edison, NJ, USA) with vigorous aeration until OD 600 reached 0.9, and then the λ P R promoter was induced by a temperature shift to 42°C. The immediate temperature shift was obtained by the addition of fresh TB medium, heated previously to 65°C. After induction, the culture was supplemented with chloramphenicol and glucose to a final concentration of 0.2%. The induced bacteria were further cultivated at 42°C for 19 hours. Once the culture had reached an OD 600 of 4.0, it was cooled down to 4°C and the cells were recovered by centrifugation. The yield was 48 g from 10 L of bacterial culture.
The purification scheme differed from the scheme described previously for the native TaqII enzyme [35], and included the following stages (Figure 5):

REase and MTase assays
For REase assays, the reactions were performed in 50 μl of 'TaqII REase buffer' (40 mM Tris-HCl pH 8.0 at 65°C; 1 mM DTT; 10 mM MgCl2; 10 mM AmS; bovine serum albumin (BSA) 100 μg/ml), supplemented with 100 μM SIN and DNA substrates. SIN was used as it is stable and highly stimulatory to TaqII REase. Addition of SIN simplified detection of the enzyme in the column fractions, which contained enzyme-inhibitory concentrations of salts and buffers, and boosted this inherently very 'slow' enzyme to allow more precise analysis. One unit of the TaqII REase is defined for the purpose of this work as the amount of enzyme required to hydrolyse 1 μg of bacteriophage lambda DNA in 1 h at 65°C in 50 μl of TaqII REase buffer, enriched with 50 μM SIN, resulting in a stable partial DNA cleavage pattern.
The recombinant TaqII REase activity was investigated as described above over a temperature range of 40°C to 90°C. The pH of all the reaction buffers was determined at the appropriate reaction temperature.
The potential allosteric effectors were tested for stimulation of TaqII REase activity, using the TaqII REase assay described above. The incubation time was reduced to 30 min to obtain reaction conditions for partial DNA cleavage. The reactions were performed at 65°C in 50 μl of 'TaqII REase buffer' supplemented with 50 μM of SAM, SIN, SAH or ATP, respectively. A 390 bp PCR DNA fragment (containing two convergent TaqII sites, 5′-GACCGA-3′ and 5′-CACCCA-3′) [23] was used as a DNA substrate. The reaction products were resolved on a 15% polyacrylamide gel in TBE buffer and stained with SYBR Green I.
The in vitro modification activity of the TaqII enzyme was tested by the DNA protection assay. The 390 bp PCR DNA fragment (containing a single TaqII site, 5′-GACCGA-3′) [23] was used as a substrate in 50 μl of TaqII MTase buffer (10 mM Tris-HCl pH 8.5 at 65°C; 1 mM DTT; 200 μM SAM) supplemented either with 10 mM CaCl2 or with 10 mM EDTA. After the addition of the TaqII protein, the reaction mixture was incubated for 16 h at 65°C. Proteinase K was added to the solution and the incubation was continued for an additional 60 min at 55°C. Samples were purified to remove all traces of proteins and divalent cations from the methylation reaction mixture, and the resulting DNA was challenged with an excess of TaqII (2:1 molar ratio of enzyme to recognition sites) for 1 h in 50 μl of TaqII REase buffer supplemented with 10 mM MgCl2 at 65°C. The reaction products were then resolved by agarose gel electrophoresis and TaqII MTase activity was assessed.
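The 2:1 molar ratio of enzyme to recognition sites can be turned into an enzyme mass with a short calculation. The sketch below is only an illustration: the DNA input of 100 ng is a hypothetical value (the amount used is not stated here), the average mass of 650 g/mol per base pair is a common approximation for double-stranded DNA, and the enzyme mass of 125.7 kDa is the calculated size quoted in this work.

```python
AVG_BP_MASS = 650.0  # g/mol per base pair of dsDNA; a common approximation (assumption)


def dna_pmol(mass_ng, length_bp):
    """Convert a mass of double-stranded DNA (ng) into picomoles of fragment."""
    return mass_ng * 1000.0 / (length_bp * AVG_BP_MASS)


def enzyme_ng_for_ratio(dna_ng, length_bp, sites_per_fragment, ratio, enzyme_kda):
    """Mass of enzyme (ng) giving `ratio` enzyme molecules per recognition site."""
    site_pmol = dna_pmol(dna_ng, length_bp) * sites_per_fragment
    enzyme_pmol = ratio * site_pmol
    return enzyme_pmol * enzyme_kda  # 1 kDa corresponds to 1 ng per pmol


# Hypothetical input: 100 ng of the 390 bp fragment carrying a single TaqII site.
needed = enzyme_ng_for_ratio(dna_ng=100.0, length_bp=390, sites_per_fragment=1,
                             ratio=2.0, enzyme_kda=125.7)
print(f"{needed:.1f} ng of TaqII")  # about 99.2 ng
```

Because the enzyme is roughly half the mass of the 390 bp fragment, a 2:1 molar excess works out to nearly equal masses of protein and DNA under these assumptions.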

Gel electrophoresis and protein concentration determination

DNA electrophoresis
1.5% agarose gels were prepared in TBE buffer [38]. The gels were visualized after staining with ethidium bromide using a 312 nm UV transilluminator. 15% polyacrylamide gels were prepared in 1x TBE buffer [38]. The gels were visualized after staining with SYBR Green I using a 312 nm UV transilluminator and photographed with a SYBR Green gel stain photographic filter.

Protein electrophoresis
SDS-PAGE electrophoresis of the proteins was performed in 10% polyacrylamide gels [38]. For the calibration curve, SDS-PAGE electrophoresis of various BSA concentrations was performed. Quantitative comparison of the resulting protein bands was made using UN-SCAN IT GEL for Windows 6.1 data software (v. 6.1, Gel Analysing and Graph Digitizing Software, Silk Scientific Corporation, Orem, Utah, USA). The calibration curve was used to determine the concentrations of the investigated TaqII protein variants.
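The idea behind a BSA calibration curve can be illustrated with an ordinary least-squares line relating band intensity to the amount of protein loaded. The sketch below is not the algorithm used by the UN-SCAN-IT software; it assumes a linear response over the calibrated range, and all intensity values are invented for illustration.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def amount_from_intensity(intensity, slope, intercept):
    """Invert the calibration line to estimate the protein amount in a band."""
    return (intensity - intercept) / slope


# Hypothetical BSA standards: amount loaded (ug) and integrated band intensity.
bsa_ug = [0.5, 1.0, 2.0, 4.0]
bsa_intensity = [1100.0, 2100.0, 4100.0, 8100.0]

slope, intercept = fit_line(bsa_ug, bsa_intensity)

# Hypothetical TaqII band measured on the same gel.
taqii_ug = amount_from_intensity(3100.0, slope, intercept)
print(f"{taqii_ug:.2f} ug")  # 1.50 ug with these made-up values
```

Estimates are only meaningful for bands whose intensity falls within the range spanned by the standards; dividing the estimated amount by the volume loaded gives the concentration.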
protein variants. KS isolated the recombinant wt-TaqII enzyme. JJF verified and corrected the wt taqIIRM contig sequence. PMS conceived the idea of taqIIRM synthetic gene cloning, designed its sequence, co-coordinated execution of the experiments, participated in the design and interpretation of experiments and co-drafted the manuscript. All authors read and approved the final manuscript.