Expression of lignocellulolytic enzymes in Pichia pastoris

Background Sustainable utilization of plant biomass as renewable source for fuels and chemical building blocks requires a complex mixture of diverse enzymes, including hydrolases which comprise the largest class of lignocellulolytic enzymes. These enzymes need to be available in large amounts at a low price to allow sustainable and economic biotechnological processes. Over the past years Pichia pastoris has become an attractive host for the cost-efficient production and engineering of heterologous (eukaryotic) proteins due to several advantages. Results In this paper codon optimized genes and synthetic alcohol oxidase 1 promoter variants were used to generate Pichia pastoris strains which individually expressed cellobiohydrolase 1, cellobiohydrolase 2 and beta-mannanase from Trichoderma reesei and xylanase A from Thermomyces lanuginosus. For three of these enzymes we could develop strains capable of secreting gram quantities of enzyme per liter in fed-batch cultivations. Additionally, we compared our achieved yields of secreted enzymes and the corresponding activities to literature data. Conclusion In our experiments we could clearly show the importance of gene optimization and strain characterization for successfully improving secretion levels. We also present a basic guideline how to correctly interpret the interplay of promoter strength and gene dosage for a successful improvement of the secretory production of lignocellulolytic enzymes in Pichia pastoris.


Background
Although Pichia pastoris is a relatively simple eukaryotic organism it can perform many posttranslational modifications such as glycosylation, disulfide bond formation, and proteolytic processing [1]. Therefore, Pichia serves as an interesting alternative to other (more difficult to handle) fungal secretory expression systems that are used to produce lignocellulolytic enzymes and other eukaryotic proteins which typically require post-translational modifications for correct folding, stability and activity. The recalcitrant and complex nature of lignocellulosics [2] affords the application of complex enzyme mixtures for efficient hydrolysis of these renewable sources. Consequently, for a sustainable production of fuels, chemical building blocks, and functional macromolecules from plant biomass a multitude of different enzymes is needed. To produce all these enzymes and variants thereof, production strains which can be handled and engineered in a simple way need to be generated. Therefore, being a well-described and widely applied expression host [3] P. pastoris was the first choice for the heterologous expression of the selected target proteins. Furthermore, in contrast to many other eukaryotic expression systems P. pastoris secretes no endogenous lignocellulolytic enzymes in significant amounts [4]. Therefore, recombinant Pichia strains can provide almost pure heterologous enzyme preparations without the need of extensive and costly downstream processing. In addition, simple media requirements and relative easy handling in bioreactors enable inexpensive large-scale cultivations of Pichia [5]. All these characteristic features of Pichia contribute to its high potential for cost reduction during the production of lignocellulolytic enzymes, particularly for application studies when only low-and medium-scale enzyme productions are required.
However, even though Pichia pastoris is a good host for the expression of heterologous proteins [3] there is still space for improvements on transcriptional [6,7] and (post-) translational level [8,9]. In this work we exemplify the impact of gene optimization on the overall expression level of lignocellulolytic enzymes in Pichia pastoris. Most genomes are heterogeneous in codon usage [10] and, accordingly, the codon bias of small subsets of genes may differ clearly from the average codon usage of the genome. To optimize protein coding sequences for enhanced protein expression in Pichia pastoris we use an in-house developed biased codon usage table [11]. This codon usage is biased towards the codons of selected, highly expressed [12,13] endogenous and heterologous genes when the AOX1 promoter and methanol were used for induction in Pichia pastoris. In addition to gene optimization, enzyme expression can be influenced on a transcriptional level by varying copy numbers of the integrated expression cassettes and by the choice of the promoter. So far the wildtype AOX1 promoter (P(AOX1)) and, to a certain extent, the GAP promoter (P(GAP)) were mostly used for heterologous protein production in Pichia pastoris [3]. However, since the P (GAP) is strong and constitutive it is not a good choice for production of physiologically problematic or cytotoxic proteins [14]. In contrast, the P(AOX1) is even stronger but also tightly regulated. Nonetheless, for some heterologous proteins the high transcript level generated by P(AOX1) can overload the cellular post-translational machinery, resulting in misfolded, unprocessed, or mislocalized proteins that can trigger a complex cellular response known as the unfolded protein response [15,16]. To overcome these disadvantages of the wild-type GAP and AOX1 promoter a library of promoters based on the wild-type P (AOX1) was previously generated [6]. The distinct properties of these novel promoters regulate the transcript level of target mRNA in response to the available carbon source level and type and concomitantly achieve a fine-tuned protein expression in Pichia pastoris.
The aim of the present study was to show the functional expression of lignocellulolytic enzymes in Pichia pastoris at high quantities and investigating the effect of gene optimization and of alternate promoters on the expression level of these enzymes. Our expression studies highlight basic principles for designing suitable expression constructs and for successful strain development for different cellulolytic enzymes. For this study Trichoderma reesei cellobiohydrolase 1 and 2 (TrCBH1 and TrCBH2) and beta-mannanase (TrbMan), and Thermomyces lanuginosus xylanase A (TlXynA) were chosen as target enzymes.

Results and discussion
The goal of this study was to evaluate the potential of Pichia pastoris to express lignocellulolytic enzymes. In particular, we improved the expression of selected (hemi-) cellulases by codon optimization of the target genes, investigated the effect of promoter choice, and characterized the performance of selected producer strains in small-scale bioreactors. This characterization also included the effects of multi-copy integration on the productivity for the selected target enzymes.
To investigate the effect of different methods for codon optimization three different gene variants of Trichoderma reesei cellobiohydrolase 2 (TrCBH2) were employed; the native gene variant (TrCBH2-wt), a gene variant with optimized codon pairs by a commercial supplier (TrCBH2-CP) and an in-house-optimized variant (TrCBH2-HM). For the in-house design a codon usage table [17] derived from genes which are highly expressed in Pichia pastoris in methanol containing media was used.
The effects of gene optimization and promoter type were characterized by comparing activity landscapes of different strains (Figure 1). For this purpose P. pastoris strains were cultivated in 96 deep-well plates according to [18] and subsequently screened for lignocellulolytic activities using a reducing sugar assay that was recently adapted to high-throughput [19]. Owing to the low standard deviation of this assay, the detected changes in the activity landscapes mainly reflect actual changes in the expression level [19]. These differences can either be due to the number of integrated expression cassettes or caused by specific effects of the individual gene variants. Figure 1 shows enzyme activity landscapes of TrCBH2wt and the two differently optimized TrCBH2 gene variants which have all been separately incorporated into the same expression vector and host. Stable integration of expression cassettes into the Pichia pastoris genome is generally based on homologous recombination but can also be an effect of non-homologous end-joining (NHEJ). Depending on the length, type and structure of the homologous flanking regions, untargeted (random) genome integration mediated by NHEJ becomes prevalent over locus-specific targeting (own observation for our vector system). Therefore, expression levels may be influenced not only by the number of integrated gene copies [20] but also by the integration locus which influences the transcript levels of the integrated genes. Our results demonstrate a clear effect of gene optimization on expression level. This is corroborated by the fact that our interpretation of expression level does not rely on a single observation but is averaged over a whole activity landscape of many individual transformants ( Figure 1). This could be substantiated by reliably proving low copy numbers among differently optimized genes, in order to get a decent comparability of the influence. The 2-fold increase in expression level of TrCBH2-HM compared to TrCBH2-wt suggests a more efficient transcription and/or translation of this variant in P. pastoris. Contrary to this, the gene optimized by the commercial service using codon pair optimization, TrCBH2-CP, showed a 2fold lower expression level than TrCBH2-wt. Being originally designed to assist co-translational protein folding [21] of multi-domain proteins we expected the optimization based on codon pair signaling to show improved expression for the two-domain enzyme TrCBH2. However, as we observed the opposite effect for TrCBH2-CP we speculate that the bottleneck of TrCBH2 expression is rather on transcriptional level than on the posttranslational level of protein folding. Summarizing, the optimized gene variant TrCBH2-HM was superior to all other variants under the tested methanol-inducing conditions. This suggests that preferring codons with a high codon adaptation index (CAI) for highly expressed proteins under methanol inducing conditions is a good choice for TrCBH2. Especially for secreted proteins the level of expression strongly depends on the number of integrated expression cassettes. Therefore, often the production efficiency of a strain can be predicted by quantifying the number of genome-integrated expression cassettes (copy number, CN) [15,20,22,23]. In P. pastoris an initial (linear) positive correlation between copy number and productivity that stagnates at a defined upper limit can be observed [20,22]. Furthermore, in some cases also a loss of productivity above a certain number of integrated copies has been described [15,23]. In fact, high mRNAlevels caused by strong promoters or by high numbers of the expression cassettes can overload the folding and secretion machinery of the host. Depending on the protein this can entail an accumulation of unfolded proteins which triggers dedicated signaling pathways, commonly known as the unfolded protein response [24]. For comparative studies it is, therefore, essential to characterize the strains with regard to their copy numbers. Doing so will also allow the separation of promoter and/or copy number related effects on expression levels.
To determine the individual expression levels of TrCBH2 expressing P. pastoris strains under bioreactor conditions we selected suitable strains based on initial micro-scale screenings in 96 deep-well plates and by quantitative gene copy number determination using qRT-PCR. Gene expression was driven either by the wild-type promoters P(AOX1) and P(GAP) or by synthetic promoter variants. These synthetic promoter variants are part of a newly generated promoter library based on P(AOX1) which was designed to fine-tune protein expression in Pichia pastoris [6]. With regard to their particular regulatory features two of these synthetic promoters were chosen to be tested in this study, namely P(En) and P(De). In the original publication by Hartner et al.  Codon pair optimized sequence in black (CP), wt sequence in grey, high CAI codons for methanol induced gene expression in white (HM) [11]. Released cellobiose concentration is represented in bars. The untransformed strain P. pastoris CBS7435 Mut S was used as negative control. methanol induction, respectively. P(De) showed more than 4-fold higher GFP fluorescence intensity under derepressed conditions but decreased expression down to 55%, if compared to the wild-type promoter P(AOX1) on single copy level after 0 h and 72 h of methanol induction, respectively. Even though, GFP expression driven by P(De) resulted in decreased protein production it was shown that this promoter was favorable for difficult to secrete proteins such as horseradish peroxidase (HRP). The overall productivity in fed-batch cultivations of HRP expression driven by P(De) was significantly higher than compared to the overall productivity of HRP expression driven by P(AOX1) [6]. In Figure 2A the time-courses of protein concentrations in the supernatants during fed-batch cultivations of TrCBH2 are compared. Figure 2A shows that the strain P(De)-TrCBH2-CP-CN25 ± 7 which harbors about 25 expression cassettes achieved around 4 g/l of TrCBH2. This is comparable to the expression of P(De)-TrCBH2-HM-CN7 ± 1 which is optimized using our in-house HM method. It can also be seen from Figure 2B that our inhouse gene optimization method HM outperforms that of the commercial supplier (compare also Figures 1 and  3B). Although different promoters were used the expression of P(De)-TrCBH2-CP-CN25 ± 7 and P(AOX1)-TrCBH2-CP-CN7 ± 1 normalized to the same level suggesting that a linear correlation of expression independent of promoter type up to a CN of 25 exists for the CP optimization. Based on these data and on data from literature [20,22] we observed two properties for TrCBH2 expression. Firstly, using the AOX1 promoter variant P(De) we observed a positive initial (linear) correlation up to at least 7 copies between copy number and productivity. Secondly, gene optimization with our in-house method results in higher expression level at low copy numbers.
As observed in the micro-scale screening ( Figure 1) TrCBH2-HM led to a higher expression level than TrCBH2-CP ( Figure 3) also in fed-batch cultivations. This effect was even more pronounced for the expression regulated by P(AOX1) with a 5-fold improvement of TrCBH2-HM over TrCBH2-CP ( Figure 3B). In contrast, for the P(De)-TrCBH2 variants we only observed a 3-fold improvement ( Figure 3B) which can be explained by the lower promoter strength of P(De) as described previously [6]. In addition, the relative ratios of the different gene optimization variants ( Figure 3B) and the relative ratios of the different gene promoter variants ( Figure 3C) allow also a better comparison between the methanol inducible promoters P(AOX1) and P(De). ( Figure 3C) shows that the strong methanol-inducible P(AOX1) increases expression of TrCBH2-HM around 1.7-fold compared to expression under the control of P(De). For TrCBH2-CP expression under the control of different promoters only a 1.2-fold improvement can be seen ( Figure 3C). These results clearly indicate for TrCBH2 expression that the codon optimization which is based on the codon bias of highly transcribed genes under methanol-inducing conditions gives even higher expression when a strong methanol-inducible promoter is employed.
After 90 h of induction the single copy expression level of P(De)-TrCBH2-HM-CN1 is approximately 0.43 g/l ( Figure 2B) whereas a higher single copy expression level of about 0.930 g/l (normalized to CN) can be calculated for P(AOX1)-TrCBH2-HM-CN3. Based on these data, P(AOX1) gives an around 2-fold higher expression level than P(De). Although the strain P(De)-TrCBH2-HM-CN7 ± 1 performed best under the tested MeOH-inducing conditions our results, based on the normalized data, indicate that strong methanol-inducible promoters such as P(AOX1) or the even stronger methanol-inducible P(En) [6] can further increase the expression of TrCBH2. To verify this hypothesis on fermenter scale we decided to screen for higher copy number strains expressing TrCBH2 under the control of P(AOX1) and P(En). As seen in Figure 4/A the selected strains with increased copy numbers P(AOX1)-TrCBH2-HM-CN5 ± 1 and P(En)-TrCBH2-HM-CN6 ± 1 indeed produced significantly more protein over the whole induction period than the best strain of the first fermentation P(De)-TrCBH2-HM-CN7 ± 1. Within the first 70 h of induction the productivity of P(En)-TrCBH2-HM-CN6 ± 1 was higher than the productivity of P(AOX1)-TrCBH2-HM-CN5 ± 1. This confirms the results of the previously reported GFP expression experiments [6] using an improved synthetic AOX1 promoter variant. The final protein yield of both strains was comparable at around 6 g/l. Summarizing, using strong methanol inducible promoters in combination with high copy numbers of genes that are optimized to a high CAI for highly expressed proteins under methanol induction can further increase the yield of TrCBH2. Moreover, we showed that pre-selection of strains using micro-scale screenings and further strain characterization using qRT-PCR for copy number determination is a useful tool to reduce bioreactor cultivations to a reasonable number.
Although Trichoderma reesei typically can produce more than 100 g/l of cellulases [25], individual enzymes such as TrCBH2 are expressed in much lower quantities (10-15%) [26]. Table 1 gives an overview of published expression yields and activities of the different lignocellulolytic enzymes in different host systems. So far, Miettinen-Oinonen et al. achieved the highest protein concentration of 0.7 g/l TrCBH2 in T. reesei strains cultivated in shake flasks [27] which is around 9-fold less than compared to our highest cellobiohydrolase concentration in Pichia pastoris bioreactor cultures. For other heterologous host systems such as S. cerevisiae [28] and S. pombe [29] even lower TrCBH2 concentrations in the range of 0.1 g/l have been reported. Regarding the specific activities of TrCBH2, we obtained 3.04 U/mg on Avicel, 5.30 U/mg on PASC and 1.51 U/mg on CMC whereas 2.52 U/mg on PASC and 0.09 U/mg on CMC have been reported for the S. pombe system [29].
To evaluate P. pastoris's capability for expressing various other lignocellulolytic enzymes we also expressed xylanase A from Thermomyces lanuginosus (TlXynA), beta-mannase from Trichoderma reesei (TrbMan) and cellobiohydrolase 1 from Trichoderma reesei (TrCBH1). All genes were optimized using the in-house codon usage table and subsequently cloned downstream of the synthetic promoter P(En) [6]. TrbMan was also cloned downstream of the constitutive P(GAP) promoter. For TlXynA only the strong promoter P(En) was selected to push its already proven high expression in Pichia pastoris with the native AOX1 promoter [34]. Similar to the experiments for TrCBH2 high-throughput deep-well plate screenings were performed with the adapted pHBAHassay and the CNs were determined by qRT-PCR. Five Figure 2 Time-course of protein concentration during fed-batch cultivations. Gene sequences were optimized either by codon pair optimization (CP) [21] or by applying the high CAI codons for methanol induced genes (HM) [11]. Copy numbers (CN) are specified in the legend.  [6]. For virtual gels of the protein yields during the fermentation runs please refer to Additional file 1 and the raw data can be found in Additional file 2.
TrbMan P(En) strains harboring 1, 4, 6 ± 1, 16 ± 4 and 39 ± 9 copies, one single copy strain under the control of P(GAP) and three TlXynA P(En) strains with 6 ± 1, 10 ± 3, and 18 ± 4 copies were fed-batch cultivated. Although we successfully expressed TrCBH1 in the micro-scale screening bioreactor fermentations yielded similar low protein concentrations for heterologously protein expression in the range of a few mg per liter as previously reported in literature [28,35,36]. Therefore, those strains were not characterized in more detail for this paper.  Although the initial micro-scale screening revealed expression of TrbMan under the control of P(AOX1) and P(GAP) over a broad range of CNs only the single copy strain of TrbMan P(En) successfully produced TrbMan in the bioreactor ( Figure 2). All other P(En) regulated strains with more than one copy had major growth problems shortly after induction resulting in attenuated growth (data not shown) when grown under our standard cultivation conditions. In contrast, the P(En)-TrbMan-HM-CN1 strain showed normal growth after recovering from an initial cessation of growth post methanol-induction (data not shown). Under constitutive expression of TrbMan using P(GAP) the growth rate was slowed down and no TrbMan was produced even though just a single expression cassette was integrated into the Pichia genome (data not shown). This could be a further example of the potentially cytotoxic effects [14] of constitutive heterologous protein expression with P(GAP) in P. pastoris. In addition, TrbMan seems to be generally difficult to express in yeasts under constitutive promoters. As an example, TrbMan was only produced at a level of 0.150 mg/l in Saccharomyces cerevisiae [30] under the control of the constitutive phosphoglycerate kinase (PGK) promoter. Our experiments revealed that the molecular weight ratio of glycosylated and deglycosylated TrbMan was about 4 (determined by capillary electrophoresis, data not shown). Therefore, hyper-glycosylation of TrbMan in P. pastoris might be another problem for expression.
To test if a weaker methanol-inducible promoter can increase the productivity we also tested the synthetic promoter P(De) for TrbMan. As seen in Figure 4B, over the whole induction period the strain P(De)-TrbMan-HM-CN5 ± 1 achieved significantly more protein than compared to P(En)-TrbMan-HM-CN1 ( Figure 4B). As previously seen in deep well experiments [6] P(De) has a weak onset of expression during the glucose depletionderepression phase which could also be presumed for bioreactor cultivations. Based on that, we further assume that a weaker onset leads to a better adaptation of the Pichia system for the production of TrbMan. In addition, it was recently shown that expression under the control of P(De) can result in positive effects on cell physiology compared to expression under the control of P (AOX1) [37]. Consequently, P(De)-TrbMan-HM-CN5 ± 1 was capable of producing 1.142 g/l of protein ( Figure 4B) devoid of any directly observable growth problems during fermentation (data not shown). The obtained yield of 1.142 g/l of TrbMan is~7600-fold higher than the so far highest reported heterologous yield of 0.150 mg/l expressed in Saccharomyces cerevisiae [30] ( Table 1). The activity of TrbMan expressed in our study was 109 U/ml  [31] showed an activity of 1.8 U/ml. This value is about 60 times less than compared to our results. Comparing heterologously expressed TrbMan our obtained activity of 109 U/ml is approximately 11000-fold higher than the 0.01 U/ml expressed in S. cerevisiae [30]. Regarding the specific activities for TrbMan, we achieved 95.4 U/mg compared to 66.7 U/mg for TrbMan expressed in S. cerevisiae [30] (see Table 1) which usually shows an even higher tendency for hyper mannosylations that could limit the activity of TrbMan.
For the fourth target, TlXynA, fed-batch bioreactor cultivation of Pichia strains regulated by P(En) and harboring 6 or 10 integrated expression cassettes produced around 1 g/l whereas one strain with 18 integrated expression cassettes showed a reduced protein concentration of 0.25 g/l ( Figure 2). As mentioned before, such negative correlation at higher copy numbers and productivity had already been described in literature for other proteins expressed in Pichia [15,23].
Our yield of 1.2 g/l TlXynA represents a 5 to 8-fold increase in yield compared to earlier expression studies in Pichia pastoris by Damaso et. al. [32] and Gaffney et. al. [33], respectively. Compared to T. lanuginosus shake flask cultures we achieved about 4 times more protein than reported before in [32] (Table 1). Our obtained specific activity of 115.00 U/mg was similar to the specific activities of 113.56 U/mg and 271.62 U/mg that were obtained by Pichia pastoris, [33] and [32] respectively. The specific activity of homologously expressed TlXynA of 327.8 U/mg [32] was approximately 3-fold higher than compared to our obtained values. The comparison to homologously expressed TlXynA indicates that the enzyme produced in P. pastoris showed lower specific activity although total volumetric yields were higher.
Generally, we speculate that the variation in specific activities of all enzymes could predominantly be attributed to the different glycosylation pattern that is produced by P. pastoris [38]. This phenomenon has already been described in literature e.g. by Macauly-Patrick et.al. [14].
Unfortunately, there are only limited bioreactor cultivations reported for TrCBH2, TrbMan, and TlXynA, therefore, above made direct comparison of bioreactor results to published shake flask expression experiments are biased. However, we can still conclude that homologous expression yielded the highest specific activities but not necessarily the highest total protein yields. Although P. pastoris is an excellent host for achieving high protein concentrations heterologous expression can also influence the activity of the expressed enzymes.
Nevertheless, comparing the calculated specific activities from Table 1 there is a general trend that the specific activities of the enzymes produced by P. pastoris are in the range or even higher than the specific activities of the same enzymes expressed in other heterologous hosts. This makes Pichia a good compromise for the expression of high quantities of enzymes with relatively high specific activities. Furthermore, it also shows the possible relevance of host strain glyco-engineering for industrial enzyme production as it already has for the production of biologically active pharmaceutical proteins.

Conclusions
We have successfully constructed P. pastoris strains capable of producing maximum protein concentrations of 1.142 g/l TrbMan, 6.55 g/l TrCBH2, and 1.2 g/l TlXynA in fed-batch bioreactor cultivations. Moreover, we showed that suitable codon optimization of the target genes helps to increase heterologous protein production by P. pastoris, thus providing a simple way of increasing heterologous protein production for individual enzymes.
Furthermore, we emphasize the importance of transcript level optimization by alternative promoters and gene dosage (numbers of integrated gene copies) for expression optimization. This was particularly evident for the functional expression of TrbMan. The strong constitutive and methanol inducible promoters P(GAP) and, P(AOX1) respectively, secreted no or less protein than the weaker synthetic promoter P(De).
Basically there are three classes of genes (A,B,C) with varying dependence of yields of active proteins in relation to copy numbers: For class A genes an increase in copy number to more than 10 copies has a positive effect on protein expression, as seen in the case of TrCBH2. For class B genes the yield of active protein increases within the number of integrated copies up to a copy number of 2-10 and decreases with higher copy numbers, as seen in the case of TlXynA. Finally, class C genes where yields of active protein get worse with increasing copy numbers, as seen in the case of TrbMan. However, these effects definitely depend on the strength of the employed promoter as well as the gene encoding the respective target protein.
Our conclusions are based on a better understanding of promoter and/or copy number-related effects. Codonoptimized genes together with optimized promoters and numbers of integrated expression cassettes allowed us to develop P. pastoris strains producing high levels of lignocellulolytic enzymes. In combination with the high specific activities compared to the same enzymes expressed in other hosts, Pichia seems to be a good choice for the heterologous expression of individual lignocellulolytic enzymes.

Chemicals and Materials
Oligonucleotide primers were obtained from Integrated DNA Technologies (Leuven, Belgium). For plasmid isolation the GeneJET™ Plasmid Miniprep Kit of Fermentas (Burlington, Ontario, Canada) was used. All DNAmodifying enzymes were obtained from Fermentas GmbH

Construction of P. pastoris strains
The coding sequences of xylanase A from Thermomyces lanuginosus (TlXynA) [UniProtKB/Swiss-Prot: O43097], beta-mannanase (TrbMan) [UniProtKB/TrEMBL: Q99036], cellobiohydrolase 1 (TrCBH1) [UniProtKB/Swiss-Prot: P62694] and cellobiohydrolase 2 (TrCBH2) [UniProtKB/ Swiss-Prot: P07987] from Trichoderma reesei inclusive of their natural secretion leaders were codon optimized for P. pastoris expression applying the Gene Designer software (DNA2.0, Menlo Park, CA, USA) based on an in-house developed codon bias [11]. The GC content was set to be between 40 and 60% without local peaks and restriction sites for cloning were avoided. In addition, one other variant of TrCBH2 was ordered from a commercial supplier (CODA Genomics, Laguna Hills, CA) which optimized the genes based on the method of codon pair signaling [21]. The native DNA sequence was kindly provided by Frances H. Arnold. To further optimize translation all genes were cloned after a defined Kozak consensus sequence (gaaacg) [39]. The synthetic genes were cloned into the multiple cloning site of the E. coli/P. pastoris shuttle vector pPpB1 [11] via EcoRI/ NotI restriction sites. The TrCBH2 variants were cloned downstream of the wild type promoters P(GAP) and P(AOX1) and synthetic promoter variants with distinctly different regulation patterns were also included, namely P(En) and P(De). P(En) can be induced by methanol and showed increased GFP expression up to 166%, if compared to the wild-type promoter P(AOX1). P(De) can either be induced by methanol or under derepressed conditions as described by Hartner et al. [6]. TrbMan and TlXynA were cloned downstream of the synthetic promoter P(En) [6] and in addition TrbMan was cloned downstream of P(GAP) and P(De). Plasmids were linearized with BglII, subsequently purified and concentrated using the Wizard_ SV Gel and PCR Cleanup System (Promega Corp.). Electro-competent P. pastoris CBS 7435 mut S cells were prepared and transformed with 1-to 2 μg of the BglII-linearized pPpB1 vector construct according to Lin-Cereghino [40]. Transformants were plated on YPD-Zeocin (100 μg/ml Zeocin) agar plates and grown at 28°C for 48 h.
Micro-scale cultivation and high-throughput screening P. pastoris strains expressing TlXynA, TrbMan, and TrCBH2 were cultivated in 96-deep well plates as described by Weis et al [18]. Incubation was done in shakers (INFORS Multitron, Bottmingen, Switzerland) at 28°C, 320 rpm, and 80% relative humidity. After an initial batch phase for 60 h on 1% glucose the cultures were induced with 0.5% of methanol for a total of 72 h (additional supplementations to 0.5% methanol were added after 12, 36 and 60 h of the first induction with methanol). After induction the cells were pelleted at 4000 rpm and enzymatic activities were determined in the supernatants using the pHBAH-assay as previously described by Mellitzer et al. [19]. Substrate conversions were performed in 50 mM citrate buffer containing appropriate substrate for each enzyme (either suspensions of 1% Avicel W , 0.25% PASC or solutions of 0.25% CMC, 0.5% xylan or 0.2% locust bean gum) at 50°C (TrCBH2, TrCBH1 and TrbMan) or at 59°C (TlXynA). The incubation time was 2 h for the cellobiohydrolases and 20 min for TlXynA and TrbMan. For the subsequent reducing sugar assay 50 μL of the substrate reaction (or, in the case of the standard sugars, appropriate dilutions of the reducing sugars) were pipetted into 150 μl of the pHBAH working solution in a 96-well PCR plate. The plate was sealed and incubated at 95°C for 5 min and then cooled to 4°C. 150 μl of the assay samples were transferred to a new micro-titer-plate and the absorption measured at 410 nm in a SPECTRA MAX Plus384 plate reader (Molecular Devices Corp., Sunnyvale, CA, USA). For exact quantification of reducing sugars a standard curve of the respective reducing sugar (0-1 mg/ml) was included on each plate. Activity units for the expressed enzymes refer to the amount of released reducing sugar over time and correspond to the standard IUPAC definition μM/min.

Copy number determination by quantitative real-time PCR
Copy numbers of integrated expression cassettes in the Pichia genome were determined using quantitative realtime PCR (qRT-PCR) as described by Abad et al. [17].

Fed-batch cultivations of Pichia pastoris strains
Pre-cultures of individual strains were grown in 50 and 200 ml YPhyD in wide-necked, baffled shake flasks at 120 rpm at 28°C. Each fermenter (Infors Multifors system (Infors AG, Bottmingen, Switzerland)) containing 450 ml BSM-media (pH 5.0) was inoculated from the pre-culture to an OD600 of 2.0. During the batch phase P. pastoris was grown on glycerol (4%) at 28°C. At the beginning of the glycerol feeding phase the temperature was decreased to 24°C. For methanol-fed cultures, the fed-batch phase was started upon depletion of initial glycerol with 16 g/(l*h) glycerol feed solution followed by methanol induction. In the early stages, the methanolfeed was set to 2 g/(l*h) and was gradually increased within the next 70 h to 6 g/(l*h). Likewise, the glycerolfeed was phased down during the first hour of methanol induction to 0 g/(l*h). Dissolved oxygen was set to 30% throughout the whole process. After 91.5 h of methanol induction the fermentations were stopped. For glycerolfed strains, the batch phase was directly followed by a constant glycerol-feed with 6 g/(l*h). Protein concentrations were determined by micro-fluidic capillary electrophoresis (CE) using fluorescence detection (Caliper GXII System, Hopkinton, USA). Standard deviations of this robust system are usually below 10%, even at high protein loads (exemplified in Additional file 3). Therefore, just single measurements of every sample were performed. More specifically, proteins were quantified by calibrating the integrated areas of the protein-specific peaks in the electropherograms to an external reference protein standard (BSA) of known concentration. For glycosylated proteins, peak areas of diluted deglycosylated samples were compared to those of untreated samples to compensate for glycosylation-related differences in quantification (a comparison of glycosylated and nonglycosylated enzyme samples is exemplified for TrbMan in Additional file 4). The dilutions of samples were in a range to give peak areas of the samples that were comparable to those of the reference protein standard. Importantly, the absence of comparable protein peaks in the vector-only control strains further validates the quantification of the secreted enzymes (see Additional file 1).