Soluble expression of recombinant proteins in the cytoplasm of Escherichia coli
© Sørensen and Mortensen. 2005
Received: 12 November 2004
Accepted: 04 January 2005
Published: 04 January 2005
Pure, soluble and functional proteins are in high demand in modern biotechnology. Natural protein sources rarely meet the requirements for quantity, ease of isolation or price and hence recombinant technology is often the method of choice. Recombinant cell factories are constantly employed for the production of protein preparations bound for downstream purification and processing. Escherichia coli is a frequently used host, since it facilitates protein expression by its relative simplicity, its inexpensive and fast high density cultivation, its well-known genetics and the large number of compatible molecular tools available. In spite of all these qualities, expression of recombinant proteins with E. coli as the host often results in insoluble and/or nonfunctional proteins. Here we review new approaches to overcome these obstacles by strategies that focus on either controlled expression of target protein in an unmodified form or by applying modifications using expressivity and solubility tags.
Microorganisms like the enterobacterium Escherichia coli are outstanding factories for recombinant expression of proteins. An expression system for the production of recombinant proteins in E. coli usually involves a combination of a plasmid and a strain of E. coli . The main purpose of recombinant protein expression is often to obtain a high degree of accumulation of soluble product in the bacterial cell. This strategy is not always accepted by the metabolic system of the host and in some situations a cellular stress response is encountered. Another response encountered in recombinant systems is the accumulation of target proteins into insoluble aggregates known as inclusion bodies. These aggregated proteins are in general misfolded and thus biologically inactive .
Under normal cellular conditions a subset of cytoplasmic proteins is able to fold spontaneously, while aggregation prone proteins require a number of molecular chaperones that interact reversibly with nascent polypeptide chains to prevent aggregation during the folding process. Aggregation of recombinant proteins overexpressed in bacterial cells could therefore result either from accumulation of high concentrations of folding intermediates or from inefficient processing by molecular chaperones. No universal approach has been established for the efficient folding of aggregation prone recombinant proteins.
Some proteins directly influence the cellular metabolism of the host by their catalytic properties, but in general expression of recombinant proteins induces a "metabolic burden". The metabolic burden is defined as the amount of resources (raw material and energy), which are withdrawn from the host metabolism for maintenance and expression of the foreign DNA . The formation of inclusion bodies occurs as a response to the accumulation of denatured protein. The metabolic burden and inclusion body formation are not directly linked but are both among the main factors to determine the ability of cells to produce soluble recombinant protein. Since the accumulation of denatured protein and the metabolic burden can be controlled by a number of environmental factors, we are partially able to control the formation of soluble protein in vivo.
A well-known technique to limit the in vivo aggregation of recombinant proteins consists of cultivation at reduced temperatures. This strategy has proven effective in improving the solubility of a number of difficult proteins including human interferon α-2, subtilisin E, ricin A chain, bacterial luciferase, Fab fragments, β-lactamase, rice lipoxygenase L-2, soybean lipoxygenase L-1, kanamycin nucleotidyltransferase and rabbit muscle glycogen phosphorylase (see  and references cited therein).
The aggregation reaction is in general favored at higher temperatures due to the strong temperature dependence of hydrophobic interactions that determine the aggregation reaction . A direct consequence of temperature reduction is the partial elimination of heat shock proteases that are induced under overexpression conditions . Furthermore, the activity and expression of a number of E. coli chaperones are increased at temperatures around 30°C [11, 12]. The increased stability and potential for correct folding at low temperatures are partially explained by these factors.
However, a sudden decrease in cultivation temperature inhibits replication, transcription and translation . Traditional promoters used in vectors for recombinant protein expression are also strongly affected in terms of efficiency . A similar transcriptional effect is achieved when a moderately strong or weak promoter is used or when a strong promoter is partially induced. Low induction levels have been found to result in higher amounts of soluble protein . This is a result of the reduction in cellular protein concentration which favors folding. However, bacterial growth is decreased, thus resulting in a decreased amount of biomass.
Different strategies aimed at optimizing the expression of recombinant proteins at low temperature are as follows.
A system based on the cspA promoter was developed for the expression of proteins at low temperature . The cspA promoter is highly induced at low temperature and is well repressed at and above 37°C. A sequence encoding the TolAI-β-lactamase fusion protein which is toxic to E. coli and rapidly degraded at 37°C was placed under the control of the cspA promoter. Temperature downshift to 15 or 23°C abolished degradation of the fusion protein and the toxic phenotype associated with expression at 37°C was suppressed. It was suggested that this system is a valuable tool for the production of proteins containing membrane-spanning domains or otherwise unstable gene products in E. coli.
A principle that allows for protein expression and folding at 4°C was presented recently . This principle is based on co-expression of the target protein with chaperones from a psychrophilic bacterium. The two chaperones (Cpn60 and Cpn10 from Oleispira antarctica RB8T) allow E. coli to grow at high rates at 4°C . An esterase from O. antarctica RB8T was co-expressed with Cpn60 and Cpn10 in E. coli at 4°C. This procedure increased the specific activity of the purified esterase 180 fold as compared to enzyme prepared from cultivations at 37°C. It was concluded that the low temperature was beneficial to folding and the system was suggested as a tool for expression and correct folding of recombinant proteins in the cytoplasm of E. coli.
Numerous specialized host strains have been developed to overcome the metabolic burden related to high level protein expression.
Two E. coli mutant strains have contributed significantly to the soluble expression of difficult recombinant proteins. C41(DE3) and C43(DE3) are mutants that allow over-expression of some globular and membrane proteins that cannot be expressed at high levels in the parent strain BL21(DE3). Expression of the F1Fo ATP synthase subunit b membrane protein in these strains, in particular C43(DE3), is accompanied by the proliferation of intracellular membranes, and inclusion bodies are absent. These strains are now commercialized by Avidis (http://www.avidis.fr) and a large number of reports on their use in expression of difficult proteins have been published [20–23]. A recent study reports that the stability of plasmids encoding toxic proteins is increased in C41(DE3) and especially in C43(DE3).
Cysteines in the E. coli cytoplasm are actively kept reduced by pathways involving thioredoxin reductase and glutaredoxin. The disulfide bond dependent folding of heterologous proteins is improved in the Origami strains from Novagen. Disruption of the trxB and gor genes, which encode the two reductases, allows the formation of disulfide bonds in the E. coli cytoplasm. The trxB (Novagen AD494) and trxB/gor (Novagen Origami) negative strains of E. coli have been selected in several expression situations [25–27]. Folding and disulfide bond formation in the target protein are enhanced by fusion to thioredoxin in strains lacking thioredoxin reductase (trxB). Overexpression of the periplasmic foldase DsbC in the cytoplasm stimulates disulfide bond formation further.
The simplest way to produce a recombinant protein is by batch cultivation. Here all nutrients required for growth are supplied from the beginning and there is a limited control of the growth during the process. This limitation often leads to changes in the growth medium such as changes in pH and concentration of dissolved oxygen as well as substrate depletion. Furthermore inhibitory products of various metabolic pathways accumulate. Cell densities and production levels are only moderate in batch cultivations.
In fed batch cultivations, the concentration of energy sources can be adjusted according to the rate of consumption. Several other factors can also be regulated in order to obtain the maximal production level in terms of target protein per biomass. The formation of inclusion bodies can be followed in fed batch cultivations by monitoring changes in intrinsic light scattering by flow cytometry . This allows for real time optimization of growth conditions as soon as inclusion bodies are detected even at low levels and inclusion body formation can potentially be avoided .
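Adjusting the feed to the rate of consumption is often implemented as an exponential feeding profile. The following is a minimal sketch of that general idea, not a method taken from the studies cited here. It assumes that substrate demand equals the target growth rate times the biomass divided by a constant biomass yield, and it ignores maintenance requirements. The function name and every numerical value are hypothetical.

```python
import math


def exponential_feed_rate(t_h, mu_set, yield_xs, x0_g_per_l, v0_l, s_feed_g_per_l):
    """Feed rate (L/h) at time t_h that sustains growth at mu_set (1/h).

    Assumptions (illustrative only): biomass grows exponentially from
    x0_g_per_l * v0_l, substrate demand is mu * biomass / yield, and
    maintenance demand is neglected.
    """
    if min(mu_set, yield_xs, x0_g_per_l, v0_l, s_feed_g_per_l) <= 0:
        raise ValueError("all process parameters must be positive")
    # Total biomass (g) expected at time t_h under exponential growth.
    biomass_g = x0_g_per_l * v0_l * math.exp(mu_set * t_h)
    # Substrate (g/h) needed to sustain that growth rate.
    substrate_demand_g_per_h = mu_set * biomass_g / yield_xs
    # Volume of feed solution (L/h) that delivers this amount of substrate.
    return substrate_demand_g_per_h / s_feed_g_per_l


if __name__ == "__main__":
    # Hypothetical process: 10 g/L biomass in 1 L, 500 g/L glucose feed.
    for hour in range(0, 9, 2):
        rate = exponential_feed_rate(hour, 0.2, 0.5, 10.0, 1.0, 500.0)
        print(f"t = {hour} h: feed {rate * 1000:.1f} mL/h")
```

In practice the set growth rate would be lowered as soon as the light scattering signal indicates inclusion body formation, which is the kind of real time adjustment described above.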
Folding of some proteins requires a specific cofactor. Addition of such cofactors or binding partners to the cultivation media may increase the yield of soluble protein dramatically. This was demonstrated for a recombinant mutant of hemoglobin for which the accumulation of soluble product was improved when heme was in excess. Similarly, a 50% increase in solubility was observed for gloshedobin when E. coli recombinants were cultivated in the presence of 0.1 mM Mg2+. Media composition and its optimization are therefore important factors in the soluble expression of recombinant proteins. Although optimization proceeds mostly by trial and error, it may nevertheless be beneficial.
A possible strategy for the prevention of inclusion body formation is the co-overexpression of molecular chaperones. This strategy is attractive but there is no guarantee that chaperones improve recombinant protein solubility. E. coli encodes chaperones, some of which drive folding attempts, whereas others prevent protein aggregation [4, 11, 33]. As soon as newly synthesized proteins leave the exit tunnel of the E. coli ribosome they associate with the trigger factor chaperone. Association with trigger factor protects exposed hydrophobic patches on newly synthesized proteins from unintended inter- or intramolecular interactions, thus preventing premature folding. Proteins can start or continue their folding into the native state after release from trigger factor. Proteins trapped in non-native and aggregation prone conformations are substrates for DnaK and GroEL. DnaK (Hsp70 chaperone family) prevents the formation of inclusion bodies by reducing aggregation and promoting proteolysis of misfolded proteins. A bi-chaperone system involving DnaK and ClpB (Hsp100 chaperone family) mediates the solubilization or disaggregation of proteins. GroEL (Hsp60 chaperone family) operates the protein transit between soluble and insoluble protein fractions and participates positively in disaggregation and inclusion body formation. The small heat shock proteins IbpA and IbpB protect heat denatured proteins from irreversible aggregation and have been found associated with inclusion bodies [36, 37].
Simultaneous over-expression of chaperone encoding genes and recombinant target proteins proved effective in several instances. Co-overexpression of trigger factor in recombinants prevented the aggregation of mouse endostatin, human oxygen-regulated protein ORP150, human lysozyme and guinea pig liver transglutaminase [38, 39]. Soluble expression was further stimulated by the co-overexpression of the GroEL-GroES and DnaK-DnaJ-GrpE chaperone systems along with trigger factor . The chaperone systems are cooperative and the most favorable strategies involve co-expression of combinations of chaperones belonging to the GroEL, DnaK, ClpB and ribosome associated trigger factor families of chaperones [40–42].
Protein insolubility in the E. coli cytoplasm is partially related to the distribution of hydrophobic residues on the surface of the protein. The soluble expression of subunits of hetero multimeric proteins therefore sometimes suffers from inclusion body formation in the absence of an appropriate binding partner.
Soluble expression in E. coli of the bacteriophage T4 gene 23 product (major capsid protein) required the co-expression of gene product 31 (phage co-chaperonin gp31) . Expression of the correct interaction partner enabled gp23 to fold correctly and form long regular structures in the cytoplasm of E. coli.
Another study reports the purification of a heterodimeric complex by expression of each subunit (pheromaxein A and C) as a fusion to thioredoxin . Each subunit remained soluble in solution, when thioredoxin was proteolytically removed, only in the presence of the other.
In conclusion, interaction partners can favour the in vivo solubility of target proteins. New systems for co-expression of multiple proteins involved in complex structures enable such strategies.
Target proteins are not always expressed in a soluble form by the strategies described above. The last part of this review discusses how misfolded proteins can be engineered or pushed to evolve and selected to gain soluble expression.
The use of affinity tags in recombinant protein purification has a long tradition. Not only have they been exploited for the development of generic purification strategies; affinity tags have also been observed to improve protein yield, to prevent proteolysis and to increase solubility in vivo [1, 45].
Among the most potent solubility enhancing proteins characterized to date are the E. coli maltose binding protein (MBP) and the E. coli N-utilizing substance A (NusA). MBP (40 kDa) and NusA (54.8 kDa) act as solubility enhancing partners and are especially suited for the expression of proteins prone to form inclusion bodies. Although many proteins are highly soluble, they are not all effective as solubility enhancers. E. coli MBP proved to be a much more effective solubility partner than the highly soluble GST and thioredoxin proteins in a comparison of solubility enhancing properties . Solubility enhancement is a common trait of maltodextrin-binding proteins (MBPs) from a number of organisms and some of them are even more effective than E. coli MBP . A precise mechanism for the solubility enhancement of MBP has not been found. However, MBP might act as a chaperone by interactions through a solvent exposed "hot spot" on its surface which stabilizes the otherwise insoluble passenger protein [48, 49]. The ability of MBP to promote the solubility of fusion partners can be improved by addition of supplemental tags. Different configurations for MBP fusion proteins have been suggested for high-throughput protein expression and purification .
Wilkinson and Harrison proposed a model for the theoretical calculation of solubility percentages of recombinant proteins expressed in the E. coli cytoplasm . A webserver for the calculation of this index is found at http://www.biotech.ou.edu. The Wilkinson-Harrison model along with experimental data identified NusA as a highly favorable solubility partner . The major advantage of NusA, in addition to the good solubility characteristics, is its high expressivity. Both MBP and NusA have been used for the solubilization of highly insoluble ScFv antibodies in the cytoplasm of E. coli [48, 53]. Numerous examples of MBP and NusA as functional solubility enhancers are found in the literature [54–57].
Fusion partners such as MBP and NusA are relatively large proteins. We recently suggested the use of a highly soluble N-terminal fragment of translation initiation factor IF2 (17.4 kDa) as a solubility partner. The use of a small partner reduces the amount of energy required to obtain a certain number of molecules, diminishes steric hindrance and simplifies downstream applications such as NMR. Another relatively small protein, barnase, was suggested to exert chaperone-like functions both in vivo and in vitro when fused to the C-terminus of the light chain variable domain of an IgG.
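The size argument can be made concrete with a rough calculation. The sketch below uses the tag masses quoted in the text and treats the mass fraction of the tag in the fusion as a crude proxy for the share of synthesis resources spent on the partner rather than on the target. That proxy, the function names and the 30 kDa target are assumptions made for illustration only.

```python
# Tag masses in kDa as quoted in the text.
TAG_MASS_KDA = {
    "NusA": 54.8,
    "MBP": 40.0,
    "IF2 N-terminal fragment": 17.4,
}


def tag_mass_fraction(tag_kda, target_kda):
    """Fraction of the fusion protein mass contributed by the tag."""
    if tag_kda < 0 or target_kda <= 0:
        raise ValueError("masses must be positive")
    return tag_kda / (tag_kda + target_kda)


def compare_tags(target_kda):
    """Tag mass fraction for each listed partner fused to one target."""
    return {
        name: tag_mass_fraction(mass, target_kda)
        for name, mass in TAG_MASS_KDA.items()
    }


if __name__ == "__main__":
    # Hypothetical 30 kDa target protein.
    for name, fraction in compare_tags(30.0).items():
        print(f"{name}: {fraction:.0%} of the fusion mass is tag")
```

For a small target the difference is substantial: with a large partner most of the synthesized polypeptide is tag, whereas a small partner leaves the majority of the mass in the protein of interest.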
In a recent study it was shown that a 17 residue C-terminal extension of Pfg27 resulted in a several fold enhancement of soluble expression. Several studies have shown that the nature of terminal residues in proteins can play a role in recognition and subsequent action by proteases [64, 65]. The terminal extension of proteins might therefore indirectly protect them from the denaturation/misfolding associated with partial proteolytic degradation. It has also been suggested that large net charges of peptide extensions increase electrostatic repulsion between nascent polypeptides and therefore enhance their correct folding.
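The net charge of a candidate peptide extension can be estimated by simple residue counting. The sketch below is a deliberately crude approximation near neutral pH: it counts lysine and arginine as +1 and aspartate and glutamate as -1, and ignores histidine, the termini and any pKa shifts. The function name and the example sequences are hypothetical and are not taken from the cited studies.

```python
POSITIVE = frozenset("KR")  # lysine, arginine
NEGATIVE = frozenset("DE")  # aspartate, glutamate


def net_charge(peptide):
    """Approximate net charge of a peptide near neutral pH.

    Counts K/R as +1 and D/E as -1. Histidine, the N- and C-termini
    and local pKa effects are ignored (simplifying assumption).
    """
    seq = peptide.upper()
    positives = sum(1 for residue in seq if residue in POSITIVE)
    negatives = sum(1 for residue in seq if residue in NEGATIVE)
    return positives - negatives


if __name__ == "__main__":
    # Hypothetical extensions, not sequences from the cited work.
    for extension in ("DDDEEDDEE", "GSGSGS", "KKRKK"):
        print(extension, net_charge(extension))
```

An extension with a large absolute value in this estimate would be expected, on the electrostatic repulsion argument above, to keep nascent chains apart more effectively than a neutral linker.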
Screening strategies have been employed to select for favorable fusion partners in a high throughput manner. In such a system more than 80% of the proteins tested showed high levels of expression of soluble products with at least one of eight fusion partners including NusA, intein, thioredoxin, His-tag, MBP, calmodulin binding protein and glutathione-S-transferase . These results were supported by another similar study .
Structural and functional genomics and proteomics are important elements in the evaluation of gene function. The expression and purification of properly folded proteins in a high throughput manner are key elements in these studies. A number of different approaches to the high throughput screening of soluble expression products have been described recently.
The intrinsic folding yield, stability and solubility of target proteins can be improved by engineering the target protein. When structural information is available, the solubility of the expressed protein has been improved by rational site directed mutagenesis . A more general approach is to find more soluble variants by directed evolution. Libraries generated in this context include random point mutants, deletions and fragments . The generated mutants are screened for solubility either by the function of the protein of interest or by more general screens. A screen based on biological activity implies that a new assay has to be developed for every new protein studied. Moreover, in many cases the protein or protein domain studied does not display any known activity at all. The general screens include fusion reporter methods, stress reporter methods and direct methods and are therefore usually preferred for high-throughput approaches.
Fluorescence of E. coli cells expressing target genes fused to the GFP-gene is related to the solubility of the target gene expressed alone . Hence, protein folding in E. coli can be improved by directed evolution approaches for a certain target protein by screening for fluorescing mutants. This approach evolved three insoluble proteins including Pyrobaculum aerophilum methyl transferase, tartrate dehydratase β-subunit and nucleoside diphosphate kinase to be 50%, 95% and 90% soluble respectively . The GFP reporter system was further used to screen for solubilizing interaction partners to insoluble targets. Fusion of integration host factor β upstream to GFP resulted in aggregation, whereas co-expression of the binding partner (integration host factor α) increased fluorescence dramatically .
A similar approach is the use of selective pressure. By fusing target proteins with chloramphenicol acetyl transferase (CAT) more soluble fusion protein mutants were selected on media containing progressively higher levels of chloramphenicol . Furthermore, selective pressure (fusion to kanamycin phosphotransferase) was used in a system aiming at the obtainment of soluble proteins encoded by cDNA fragments in a high throughput approach .
Another fusion reporter method uses the β-galactosidase α peptide as fusion partner in a screen for lacZα complementation in a system where inactive lacZΩ is supplied in trans. Active β-galactosidase can be detected when the α peptide becomes soluble and restores enzyme activity by binding to lacZΩ.
An innate host cell response is induced when recombinantly expressed proteins are misfolded. This response can be monitored by the transcription from E. coli promoters that are up-regulated when misfolded proteins are expressed. It was found that the promoter for the small heat shock protein ibpA could be fused to lacZ and used as a reporter for misfolded protein. This reporter could discriminate soluble, partially soluble and insoluble recombinant proteins. Genetic screens and directed evolution are further reviewed elsewhere.
Soluble fusion proteins are not necessarily biologically active and properly folded. Several reports have demonstrated that soluble preparations of fusion proteins have low biological activity as compared to the non-fused protein . It was shown that a fusion of HPV oncoprotein E6 to MBP formed soluble multimeric aggregates composed of folded MBP and misfolded E6. These "soluble inclusion bodies" could be avoided by optimization of the expression conditions by screening for monodispersity .
A few strategies that are radically different from the conventional fusion partner and selection approaches have been developed for the potential rescuing of recombinant proteins from misfolding in the E. coli cytoplasm.
A system based on artificial oil bodies was developed and illustrated by a fusion protein composed of oleosin and GFP . The expressed fusion protein was found in the insoluble cellular fraction but could be reconstituted as oil-bodies by addition of triacylglycerol and phospholipids to the purified inclusion bodies. GFP could subsequently be separated from the oil bodies using an engineered factor Xa cleavage site and centrifugation.
An in vivo rescuing system based on the E. coli ribosome was recently presented . Target proteins are rescued from in vivo aggregation by fusing them to ribosomal protein L23. The fusion protein is expressed in a strain of E. coli deficient in the essential L23 ribosomal protein. This allows for the covalent coupling of target proteins to the highly soluble ribosomal particles. Ribosomes with coupled target protein can subsequently be isolated by centrifugation methods and the target protein released in a highly enriched form by site specific protease cleavage.
We have reviewed the most recent improvements in obtaining soluble and functional protein preparations from E. coli recombinants. A subset of the methods focuses on relieving the cellular stress that is a response to the extreme metabolic situation experienced by the host cell during hyperexpression of a single or a few proteins. A second subset of methods focuses on improving the solubility and structural stability of the expressed protein by combining the target protein with specific peptide tags. A common trait in modern expression strategies is the skillful combination of the tools in the genetic toolbox, together with a constant reconsideration of the accepted paradigms in the trade of protein expression.
K.K.M is funded by grants from the Danish Natural Science Research Council and Carlsberg (grants no. 21-03-0592, 21-04-0149 ANS-0987/40 and ANS-1649/40).
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.