Instability of pCTXVP60 clones propagated in wt E. coli MG1655
In an earlier study [24], it was found that an expression construct of a fusion gene, coding for adjuvant CTX B subunit fused to VP60, could not be stably propagated in MG1655. Upon transformation by plasmid pCTXVP60, heterogeneous colonies of transformants were obtained on agar plates (Fig. 1A). While the majority of the colonies were small, slow-growing, and translucent, a few percent showed normal growth and appearance, as compared to control pCTX clones (not shown). Upon prolonged incubation, eventually all small colonies started to develop sectors of normal growth. A preliminary analysis of the cells from normal, healthy colonies showed that the plasmids recovered had altered restriction digestion patterns, primarily due to IS element insertions in the fusion gene [24]. An extended analysis of 100 colonies by PCR spanning the fusion gene confirmed the extreme instability of the plasmid. A vast majority, 92%, of the large colonies carried plasmids with IS insertions, while 8% displayed deletions or no alteration detectable by PCR. Insertion events were due to IS1, IS3, and IS5 translocations, as detected by using IS-specific PCR primers. Insertion sites were determined by sequencing 12 clones. All IS insertions occurred in the 5' third of the fusion gene, in the ctx part or near the 5' end of vp60. Results of the analysis are summarized in Fig. 2.
Similar heterogeneity of pCTXVP60 transformants was observed in other regular E. coli hosts, including DH5α, DH10B, and C600. In contrast, transformants of IS-less MDS42 [24] displayed a nearly uniform morphology (Fig. 1B) on LB plates. Colonies generally displayed moderately retarded growth, and fewer than 0.1% of the colonies were larger, showing normal growth. Generally, plasmid DNA recovered from cultures of the MDS42 host yielded an unaltered restriction digestion pattern and nucleotide sequence (data not shown).
Growth retardation effect of pCTXVP60 is due to an artificial byproduct, a Leu-rich ORF
The synthetic fusion gene ctxvp60 contains a large number of rare Arg codons (1 AGG, 21 AGA). It was assumed that translation of ctxvp60 exerts a toxic effect on the cell by exhausting the rare tRNAArg pool and thus rendering it unavailable for the synthesis of essential proteins. To test this hypothesis, two new versions of ctxvp60 were synthesized. Version ctxvp60 opt was codon-optimized for E. coli and carried no rare codons, while ctxvp60 dezopt used the rare AGG codon for all 22 arginines (Additional files 2 and 3). Both versions were cloned in pSG1144 (Additional file 1), using MG1655-T7 as the host. Surprisingly, induced expression of both constructs, optimized and deoptimized, had only a minor negative effect on the growth of the cell, comparable to that seen with the vector plasmid (Fig. 3).
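Rare-codon counts of the kind quoted above can be obtained by walking a coding sequence codon by codon. The following is a minimal sketch, not the procedure used in the study: the function name `count_codons` and the toy sequence are hypothetical, and the sequence shown is not ctxvp60.

```python
from collections import Counter

# The two rare Arg codons named in the text.
RARE_ARG_CODONS = ("AGG", "AGA")


def count_codons(cds, codons=RARE_ARG_CODONS):
    """Count the selected codons in reading frame 0 of a coding sequence."""
    seq = cds.upper().replace("U", "T")
    counts = Counter({codon: 0 for codon in codons})
    usable = len(seq) - len(seq) % 3  # ignore a trailing partial codon
    for i in range(0, usable, 3):
        codon = seq[i:i + 3]
        if codon in counts:
            counts[codon] += 1
    return counts


# Toy sequence (hypothetical): Met-Arg-Arg-Leu-Arg-stop
toy_cds = "ATGAGGAGACTGAGATAA"
rare = count_codons(toy_cds)
print(dict(rare))
```

Only in-frame codons are counted, so an AGG or AGA triplet that straddles two codons does not contribute.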
On the other hand, a frame-shift mutation of ctxvp60, introduced in a position near the 5' end of the vp60 part (Fig. 2; Additional file 1), did not relieve the growth retardation effect of the plasmid (data not shown). All these results indicate that neither the CTXVP60 protein per se, nor the supposed rare tRNA depletion phenomenon is the cause of the toxic effect of pCTXVP60.
A thorough inspection of the original ctxvp60 gene revealed the presence of a 238 aa ORF (ORF238) out of the original reading frame. ORF238 spans the joint between ctx and vp60 and extends into the 5' part of vp60 (for coordinates, see Fig. 2). Translation of ORF238 results in an extremely Leu-rich (102 Leu residues) protein (Additional file 4). The hydrophobic nature of the protein and its four putative transmembrane domains (predicted with HMMTOP [29] and DAS-TMfilter [30, 31]) strongly suggest that it integrates in the cell membranes, and might be the cause of the toxic effect (Additional file 4).
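An out-of-frame ORF of this kind can be found by translating all three forward reading frames and recording every stretch that runs from an ATG to a stop codon. The sketch below illustrates the idea on a toy sequence. It assumes the standard genetic code, and the function `forward_orfs` and the sequence are hypothetical, not the tool or data used in the study.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def forward_orfs(dna, min_aa=1):
    """Return (frame, start, protein) for ATG..stop ORFs in the 3 forward frames."""
    seq = dna.upper()
    orfs = []
    for frame in range(3):
        protein, start = None, None
        for i in range(frame, len(seq) - 2, 3):
            aa = CODON_TABLE.get(seq[i:i + 3], "X")  # X marks an unknown codon
            if protein is None:
                if aa == "M":  # open a new ORF at the first ATG
                    protein, start = "M", i
            elif aa == "*":  # a stop codon closes the ORF
                if len(protein) >= min_aa:
                    orfs.append((frame, start, protein))
                protein = None
            else:
                protein += aa
    return orfs


# Toy sequence (hypothetical): frame 0 encodes one peptide,
# frame 2 hides a short Leu-rich ORF.
toy = "ATGGCATGCTGCTGTTACTGTAAGTGA"
orfs = forward_orfs(toy)
leu_richest = max(orfs, key=lambda orf: orf[2].count("L"))
print(orfs, leu_richest)
```

Ranking the ORFs by leucine count, as in the last lines, is one simple way to flag candidates resembling ORF238. A real analysis would also scan the reverse strand and apply a minimum length.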
To test the hypothesis that ORF238 was the culprit of the growth defect, it was cloned as an inducible construct in plasmid pSG1144. Upon induction, expression of ORF238 resulted in a phenotype identical to that of the original ctxvp60 cloned in the same plasmid. Growth retardation of the culture, due to induction of ORF238, is demonstrated in Fig. 3.
Confocal laser scanning microscopy imaging of the cells expressing either CTXVP60 or ORF238 confirmed the inhibitory role of ORF238. Nucleic acid staining (ethidium bromide or DAPI) combined with differential interference contrast (DIC) imaging and membrane staining (nonyl acridine orange) revealed that cell division is impaired, resulting in slow-growing, abnormally long cells with aberrant nucleoids (Fig. 4B, E, F). Untransformed (MG1655) or uninduced ORF238-harboring cells, on the other hand, displayed a normal phenotype and size distribution (Fig. 4A, C, D).
ISes integrated in ctxvp60 block transcription of the Leu-rich ORF
All IS integrations found in pCTXVP60 occurred either upstream of or in the 5' end region of ORF238. It was assumed that ISes landing in this region relieve the cell from stress by blocking transcription of ORF238. To test this, RNA was prepared from cells harboring pCTXVP60; it contained transcripts of the toxic gene detectable by RT-PCR. However, in a mutant with an IS1 or IS5 inserted into the 5' end of ctxvp60, transcripts of downstream sequences could not be detected (Fig. 5).
Growth dynamics and evolution of strains harboring pCTXVP60 are related to the number of genomic ISes
Growth characteristics of transformants of toxic and mutant plasmids were monitored in liquid medium. Ten parallel cultures originating from 10 colonies were grown for each plasmid-host combination, and automated O.D. readings were taken in a Bioscreen C instrument (Fig. 6).
Initial growth of MG1655/pCTXVP60 stopped at ~O.D. = 0.2-0.5. However, after prolonged incubation, individual cultures resumed growth at various time points, and reached a final density comparable to the MG1655/pCTX control (Fig. 6A, B). Plasmids isolated from these outgrown cultures almost invariably contained ISes inserted in pCTXVP60. Reinoculation of the outgrown cultures into fresh medium resulted in uninterrupted, normal growth (Fig. 6C). The results were consistent with the observed growth characteristics of transformants on agar plates: i) pCTXVP60 causes growth retardation, ii) in a stochastic manner, a fraction of the cells picks up an IS insertion in ctxvp60, due to the mobility of genomic ISes, iii) cells harboring an IS-inactivated ORF238 resume normal growth and quickly become dominant in the culture.
In contrast, growth of MDS42/pCTXVP60 displayed a nearly uniform pattern. Initial growth of the cultures stopped at ~O.D. = 0.3, and only one culture of 10 parallels eventually grew to high density (Fig. 6D). The plasmid isolated from this dense culture carried a deletion in ctxvp60. These results indicate that the primary route to inactivation of the toxic gene is via IS translocation.
Next, strains representing the various stages of the genome reduction process (MG1655, MDS12, MDS30, MDS42), and thus harboring various numbers of ISes (44, 25, 5, and 0, respectively) were transformed with pCTXVP60. The median growth curves observed for the transformants seemed to support the notion that the time needed for evolution of a non-toxic plasmid variant in the culture correlated with the number of ISes in the host genome (Fig. 6E). Even a single IS1, present in the host genome, had a marked effect on growth dynamics, and accelerated the evolution of fast-growing variants (Fig. 6F), as compared to the IS-less, essentially isogenic host (MDS42) (Fig. 6D).
Bioinformatic analysis of E. coli IS sequences in shotgun sequencing data
To test whether IS-mediated instability of cloned sequences could be more widespread than anticipated, a bioinformatic analysis of raw data from early genome sequencing projects was performed. Protocols for shotgun genome sequencing by the Sanger method started with the insertion of fragments from a target genome into a plasmid vector, followed by transformation into E. coli cells. Next, transformed cells were grown first on solid medium, then in liquid cultures, prior to processing. Enriched target fragments were then sequenced, and overlapping sequence reads were assembled into contigs. The intermediate cell growth steps can be viewed as a biological test of E. coli's tolerance to the introduction of foreign DNA; consequently, read data from the thousands of clones can be mined for signs of the host's reaction to foreign DNA segments [32]. In this study, we searched the read sets for the appearance of E. coli IS elements in raw sequence reads (prior to their assembly into target genome sequences).
A typical bacterial shotgun library for Sanger sequencing would comprise 2-4 × 10⁴ clones, which are grown for about 20 generations on solid medium, followed by another 10 generations of growth in liquid cultures, prior to processing. Considering the typical spontaneous, combined transposition rate of 10⁻⁸/gene/generation of resident ISes [9], the chance of sequencing a spontaneously arisen insertion mutant is about 1.2 × 10⁻² (= 30 × 4 × 10⁴ × 10⁻⁸). This means that approximately one in 100 sequencing projects would yield a gene interrupted by an IS of the cloning host. A higher incidence of host ISes in the sequence reads would likely indicate selection events favoring the IS-inserted clones during cultivation.
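The estimate above is a simple product of three quantities and can be reproduced in a few lines. The sketch below uses only the figures given in the text. The function and variable names are illustrative, and the calculation assumes one target gene per clone and a constant transposition rate throughout growth.

```python
def expected_insertion_mutants(clones, generations, rate_per_gene_per_generation):
    """Expected number of clones carrying a spontaneous host IS insertion."""
    return clones * generations * rate_per_gene_per_generation


GENERATIONS = 20 + 10      # growth on solid medium plus liquid culture
TRANSPOSITION_RATE = 1e-8  # combined rate per gene per generation [9]

# Library size range given in the text: 2-4 x 10^4 clones.
expected_low = expected_insertion_mutants(2e4, GENERATIONS, TRANSPOSITION_RATE)
expected_high = expected_insertion_mutants(4e4, GENERATIONS, TRANSPOSITION_RATE)

# Number of projects needed, on average, to observe one such mutant.
projects_per_mutant = 1 / expected_high
print(expected_low, expected_high, projects_per_mutant)
```

With the upper library size of 4 × 10⁴ clones, the expected number is about 0.012 per project, which corresponds to roughly one insertion mutant per 80 to 100 projects. The lower library size halves that expectation.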
Data from 295 shotgun genome sequencing projects were downloaded and analyzed. The sets typically contain 50-70 thousand reads. Of the ~18 million reads, a total of 22 thousand match some E. coli IS sequence within the strict expectation, score, and run length thresholds. While 166 sets contain one or more reads with IS sequences, 129 sets are free of such reads. Of the 295 analyzed organisms, 62 have a complete (assembled) genome sequence. The number of organisms possessing both IS-containing reads and an assembled genome (the intersection of the 166 and 62 sets) is 30. In some IS-containing reads, chromosomal sequence was not detectable, while in others the IS elements seemed to have cloning vector origins (e.g., they are spliced with E. coli lacZ); these reads were discarded. To avoid the difficulties of distinguishing true cloning-related E. coli IS insertions from genomic self-rearrangements, organisms which have their own IS elements were excluded. After these filtering steps, 14 shotgun sequencing sets remain, which contain a total of 109 IS-containing reads. IS10 and IS1 are the dominant invaders (they show up 68 and 30 times, respectively), but six other IS types were also found (IS2, IS3, IS4, IS5, IS150, IS186). Examples of IS elements appearing in sequence reads are shown in Fig. 7 and Additional file 5.
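The filtering steps described above amount to three successive exclusion criteria followed by a tally of IS types. The following is a minimal sketch of that logic on toy records. The `ISRead` fields, the project names, and the function `filter_reads` are hypothetical and do not reflect the actual pipeline or data.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class ISRead:
    project: str            # shotgun project the read belongs to
    is_type: str            # matched E. coli IS element, e.g. "IS10"
    has_target_flank: bool  # chromosomal (target) sequence detectable in the read
    vector_derived: bool    # IS joined to vector sequence such as lacZ


def filter_reads(reads, projects_with_own_is):
    """Keep reads that pass all three exclusion criteria from the text."""
    return [
        r for r in reads
        if r.has_target_flank                       # drop reads lacking target sequence
        and not r.vector_derived                    # drop vector-origin IS copies
        and r.project not in projects_with_own_is   # drop organisms with own ISes
    ]


# Toy records (hypothetical)
reads = [
    ISRead("projA", "IS10", True, False),
    ISRead("projA", "IS1", True, False),
    ISRead("projA", "IS10", False, False),  # no chromosomal sequence
    ISRead("projB", "IS5", True, True),     # vector-derived
    ISRead("projC", "IS10", True, False),   # organism has its own IS elements
]
kept = filter_reads(reads, projects_with_own_is={"projC"})
tally = Counter(r.is_type for r in kept)
print(tally)
```

Applying the criteria per read and then grouping by project would give the per-set counts reported in the text. The order of the three tests does not affect the result.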