Versatile modeling and optimization of fed batch processes for the production of secreted heterologous proteins with Pichia pastoris

Background Secretion of heterologous proteins depends both on biomass concentration and on the specific product secretion rate, which in turn is not constant at varying specific growth rates. As fed batch processes usually do not maintain a steady state throughout the feed phase, it is not trivial to model and optimize such a process by mathematical means. Results We have developed a model for product accumulation in fed batch based on iterative calculation in Microsoft Excel spreadsheets, and used the Solver software to optimize the time course of the media feed in order to maximize the volumetric productivity. The optimum feed phase consisted of an exponential feed at maximum specific growth rate, followed by a phase with linearly increasing feed rate and consequently steadily decreasing specific growth rate. The latter phase could be modeled also by exact mathematical treatment by the calculus of variations, yielding the explicit shape of the growth function, however, with certain indeterminate parameters. To evaluate the latter, one needs a numerical optimum search algorithm. The explicit shape of the growth function provides additional evidence that the Excel model results in correct data. Experimental evaluation in two independent fed batch cultures resulted in a good correlation to the optimized model data, and a 2.2 fold improvement of the volumetric productivity. Conclusion The advantages of the procedure we describe here are the ease of use and the flexibility, applying software familiar to every scientist and engineer, and rapid calculation which makes predictions extremely easy, so that many options can be tested in silico quickly. Additional options like further biological and technological constraints or different functions for specific productivity and biomass yield can easily be integrated.


Background
Modeling of bioprocesses has been pursued since the 1970s, with the aim to rationally optimize processes. While the mathematical description of processes like growth and product formation have been fairly well achieved, it is still not routine practice to design biotechnological production processes based on model prediction. An especially difficult case in this respect is fed-batch as a dynamic system usually not reaching steady state. General attempts to model fed-batch processes have been described (for an overview see [1]). Based on these modeling approaches, optimization of fed batch processes has been attempted using Pontryagin's Maximum Principle [2,3], Green's Theorem [4], or Dynamic Programming [5]. These approaches are rather complex, and they did not find their way in routine application.
A typical case of fed-batch process is the production of recombinant proteins with microorganisms or mammalian cells. While the description of product concentration in the cell mass is rather straight forward (in the case of an intracellular product), it is more complex to predict the kinetics of a secreted product. A typical case for secretion systems are recombinant yeasts [6]. As the production of many proteins in yeasts is quite cost sensitive, it will be highly desirable to have a tool available that allows a simple yet reliable prediction of productivity, process time and product titers. Approaches to optimize fed batch processes for the methylotrophic yeast Pichia pastoris have been described [3,5]. The latter employ dynamic programming by dividing the total process time into a discrete number of intervals, and assigning a value of the specific growth rate µ selected from a discrete set of values. The major drawback of this approach is that the process time is fixed and not an issue of optimization. The algorithm used by this group is complex, and not readily available to others. Zhang and coworkers [3] present an approach based on Pontryagin's Maximum Principle. A general applicability seems hampered by the complex calculation, complicating a simple recalculation with modified data or calculation procedures.
With this work we aimed at the development of an optimization tool for fed batch processes using calculation tools available for every PC. MS Excel allows the approximation of a model by numerically solving equations describing the system, and the optimization of an objective function by modifying defined fields (the decision variables), while different constraints can be defined which have to be complied with. The calculation is based on the generalized reduced gradient method described in [7]. While the general concept of calculation is similar to the approaches above, the definition of the optimization objective -while obviously a crucial step -is not consistently resolved in the existing literature. The variable costs of a bioprocess correlate with the volumetric capacity of the required fermentation unit, and the process time this unit is required to produce a defined amount of the product [8]. Thus the volumetric productivity Q P is the most plausible target for optimization. At a given process time point t, Q P is defined as: Expanding this concept to total manufacturing costs is feasible but depends on a profound and reliable cost calculation. As outlined below, Q P can be calculated from the specific growth rate µ and the specific production rate q P . µ should be one of the decision variables of the optimization (defining the feed rate profile to be developed), while q P depends on µ. The exact function, q P = f(µ), of this dependence for secreted recombinant proteins has been subject to discussion [9]. These authors provide some evidence that secreted protein productivity is saturated at high µ, but a clear experimental solution of this function and its biological basis has not been achieved yet. Zhang et al. approximate this relation by an empirical 3 rd order function [3], while Ohya and coworkers model it by a two step linear function [10]. Both groups base their model functions on rather few experimental data. To improve the accuracy of the model q P = f(µ), we examined the entire space of µ of a P. pastoris strain in chemostat cultures for the respective values of q P , as well as the observed biomass yield coefficient Y' XS in order to calculate the substrate needed for each increment of biomass increase. It has been discussed whether data derived from steady state are applicable to model transient situations as they usually occur in fed batch. The parameters determining the accuracy of a steady state model are the relaxation time constants of environmental changes and of biological processes based on the change in environment, which become critical at highly transient situations like a shift to growth limiting conditions at the end of batch [11]. However, substrate limited fed batch cannot be considered as highly transient, so that the steady state model should be applicable.
A P. pastoris strain expressing the Fab fragment of the anti-HIV antibody 2F5 [12] was employed as a model. As expression is based on the glyceraldehyde phosphate dehydrogenase (GAP) promoter, glucose is used as a substrate for growth. Modeling of the fed batch process and optimization of Q P was used to predict an optimal feed protocol, which was then evaluated experimentally. The model optimization was also solved analytically in order to prove the accuracy of the Excel approximation.

Chemostat
The 2F5 expression strain was cultivated in chemostat at dilution rates D between 0.0086 h -1 and 0.2 h -1 . Steady Q P V t P = ⋅ ( ) 1 state samples were taken after 5 volume changes each. The setpoints were passed through once from high to low dilution rates, and once from low to high dilution rates. Specific production rates q P and observed biomass yield coefficients Y' XS are plotted against D = µ in figure 1. The constants of eq (22), describing q P , were derived by the method of least squares as q Pmax = 0.0735 mg g -1 h -1 and k q = 0.116 h -1 . Y XS = 0.559 and m S = 0.0161 h -1 were derived according to eq (26). The estimated standard deviation of q P is s q = 0.0048 mg g -1 h -1 , and that of Y' XS is S Y' = 0.023.

Standard fed batch
Two independent fed batch cultures using a standard protocol with constant feed rate [12] were performed. The final Fab titer was p = 46 mg L -1 , and the final biomass concentration x = 96 g L -1 , both at a total process time t = 117 h (92 h feed). Fig. 2A shows the development of these parameters over time, while Q P and q P are plotted in Fig.  2B. Apparently Q P has a maximum of 0.31 mg L -1 h -1 at t = 94 h (69 h feed).

Optimized fed batch -model and experimental
Using the optimization algorithm described in Materials and Methods, a feed protocol leading to maximum volumetric productivity was determined. As the maximum final biomass concentration was set to 100 g L -1 , and the feed medium was identical to the standard fed batch, almost the same biomass concentration and total feed volume was to be expected for both processes. Optimal µ and feed rate is plotted over time in Fig. 3. The feed starts with an exponential phase of 3.6 h, followed by a 16 h phase with more slowly increasing feed, which was approximated by a linearly increasing feed rate following the function F L = 0.012 g·h -2 ·t L + 30.672 g·h -1 (2) All values were calculated for a unit batch volume of 1 L and needed to be adjusted to the respective batch volume of 1.2 L. The modeled process was verified experimentally in two independent fed batch cultures. The resultant plots of biomass and product concentrations, as well as the predicted values, are displayed in Fig. 4A, while Q P and q P and their model prediction are shown in Fig. 4B. A final product concentration of 45 mg L -1 was reached after 21 h feed at a biomass concentration of 94 g L -1 . The maximum volumetric productivity Q P was 0.67 mg L -1 h -1 (slightly below the predicted value of 0.77 mg L -1 h -1 ), which is a 2.2 fold improvement over the standard fed batch.

Modeling of the standard fed batch
Using the same equations as for optimization we also attempted to model the standard fed batch process. However, the predicted values of biomass and product concentrations deviated significantly from the experimental data. Therefore we reconsidered the data source used to obtain the functions of Y' XS and q P based on µ. The obvious difference between the standard and optimized fed batch protocol is that the standard culture is performed at very low µ from 0.05 h -1 decreasing to 0.005 h -1 , while the optimized culture starts at µ = 0.2 h -1 , decreasing to 0.05 h -1 .
As saturation functions like the Monod function are more susceptible to low values of the x-coordinate, and the majority of the data from chemostat were naturally obtained at higher µ values, we remodeled the function q P = f(µ) for low µ based on data derived from previous fed batch cultures. The best approximation in the range of µ ≤ 0.05 h -1 was a linear function q P = 0.2051·µ + 0.002 (3) Similarly, Y XS and m S were remodeled in this range of µ ≤ 0.05 h -1 .
Based on these refined approximations, the predictions of biomass and product concentration, volumetric productivity and specific product formation rate fit well to the experimental values (Fig. 5).
Apparently the model based on chemostat data fits well at higher µ, while it needed adjustment at values below µ = 0.05 h -1 . Most importantly, it was valid for the optimized feed protocol derived from the model, which led to a 2.2 fold increase of volumetric productivity. The sensitivity of the model to the accuracy of the function q P = f(µ) stresses the importance of an accurate experimental determination of q P both at low and high µ. Importantly, this function can be refined in future utilizing additional data from fed batch (and chemostat) cultures, so that the model acquires features of a self learning model. q P and Y' XS in chemostat cultures Figure 1 q P and Y' XS in chemostat cultures. Specific product formation rate q P (blue diamonds) and observed biomass yield coefficient Y' XS (red triangles), as well as the respective approximations.

Analytic approach
The calculus of variations yields a method to derive an analytic formula (containing indeterminate parameters) for our optimization problem. Let X = X(t) be the amount of biomass, P = P(t) the amount of product and µ = µ(t) the specific growth rate of biomass at the point of time t. The process to be controlled is described by the following equations: The growth of biomass is modeled by the equation The yield of product is modeled by the equation with a Monod like formula and the initial value P(0) = P 0 .
In a first step we maximize the cumulative yield of product in a fixed time interval [0, T]. Therefore, P(T) → Max is equivalent to P(T) → Max. We therefore consider by formulas (5) and (6). Inserting X'(t)/X(t) for µ(t) results in maximizing the integral This integral has an extremum only if the Euler-Lagrange differential equation Evaluating the Euler Lagrange equation (9) yields Calculating the differential and reducing to a common denominator gives 2·k q ·µ'(t) + µ 3 (t) + k q ·µ 2 (t) = 0 (13) We solve this differential equation and get the following equation for µ(t):

Kinetics of standard fed batch cultures
Eq (14) defines only a necessary condition for the optimum trajectory of µ over time, with indetermined parameter c and indetermined optimal value for the total feed time T. Here the parameter c and the optimal value for the total feed period T depend on the constraints µ min , µ max and q Pmax . Since we know a numerical optimal solution for the growth function, we calculate the value c by fitting the analytic curve (eq 14) to the optimal solution calculated by Excel with the method of least squares. We get as numeric value c = -1.49967812 h. Figure 6 shows the excellent correspondence of the analytic solution and the solution calculated by the Excel Solver. This proves that the Solver solution obeys the necessary condition of the maximization problem, as defined by the Euler-Lagrange equation.

Conclusion
We  consist of an exponential phase followed by a phase of steadily decreasing µ, the analytic approach could only serve as evidence of the correct solution of the optimization problem obtained with Excel Solver. Both the Euler-Lagrange approach used here and Pontryagin's Maximum Principle depend on data fitting to obtain a numeric solution. Given the perfect match of the two approaches presented here, we consider it much more straight forward to apply a numeric data fitting approach directly to the equations of growth and product formation.
The advantages of the procedure we describe here are the ease of use, applying software familiar to every scientist and engineer, and rapid calculation which makes predictions extremely easy, so that many options can be tested in silico quickly. Additional options like further biological and technological constraints or different functions for specific productivity and biomass yield can easily be integrated.
We could prove that the experimental data basis for the functions behind the algorithm is very important. Different to previous work this was taken into account, and especially the sensitivity at very low specific growth rates needs to be highlighted.
The Excel file containing the model and optimization procedure is provided as accompanying file [see additional file 1].

Materials and methods
Unless stated otherwise, all chemicals were purchased from Merck Eurolab and all antisera were from Sigma.

Strain
A P. pastoris strain X33 (wild type strain) expressing extracellularly the Fab fragment of the anti-HIV antibody 2F5 under control of the GAP promoter was used in this study. The development of this strain has been described elsewhere [12]. A cell bank of the strain was prepared, divided in 1.8 mL aliquots and stored at -80°C.
Optimized time course of specific growth rate Figure 6 Optimized time course of specific growth rate. Overlay of the solution obtained by Solver (blue circles) and analytic solution (red line).

Fermentation
A shake flask containing 100 mL of YPG medium (per liter: 10 g yeast extract, 10 g peptone, 10 g glycerol) was inoculated with one cryovial from the P. pastoris cell bank, and incubated at 28°C for approximately 24 hours and agitated at 180 rpm.
This culture was used to inoculate the starting volume in the bioreactor to a starting optical density (OD 600 ) of 1.0. Depending on the operation mode the starting volume was either 1.2 L for fed batch or 1.4 L for chemostat process.
Fermentations were carried out in a 2.0 L working volume bioreactor (MBR; Wetzikon, Switzerland) with a computer based process control (ISE; Vienna, Austria). Fermentation temperature was controlled at 25°C, pH was controlled at 5.0 with addition of 25% ammonium hydroxide and the dissolved oxygen concentration was maintained above 20% saturation by controlling the stirrer speed between 600 and 1200 rpm, whereas the airflow was kept constant at 100 L h -1 .
The media were as follows: Batch medium contained per liter: After approximately 24 hours the batch was finished and -depending on the fermentation strategy -the feed and if required the harvest was started.
The continuous fermentation was initiated at a D = 0.15 h -1 and performed at least for 5 resident times τ to reach steady state conditions.
Then the dilution rate was decreased stepwise, always achieving steady state conditions before the next change of the dilution rate. At D = 0.0086 h -1 the procedure was reversed and the dilution rate was increased stepwise up to the critical dilution rate D crit = 0.2 h -1 . Samples were taken after 3 and 5 τ and analyzed as described below.
The standard fermentation strategy was a fed batch with a constant feed, this means that the batch phase was followed by the glucose fed batch with a feed rate F = 8.925 g h -1 . The fermentations were terminated at appr. t = 120 h. Samples were taken frequently and processed as described below.
The optimized fermentation strategy consists of different phases to perform the calculated growth kinetic. The batch phase was followed by an exponential feed phase with a growth rate of 0.2 for 3.6 hours, followed by a linearly increasing feed rate calculated by equation (16), where k = 0.0144 g h -2 and d = 36.8064 g h -1 for 16.0 hours.

Optical density
The samples were diluted in ddH 2 O up to 1:500 to measure the OD at 600 nm.
Biomass determination 2 × 5 ml culture were centrifuged and the supernatants frozen for further analysis. The pellets were resuspended in ddH 2 O, recentrifuged, and the pellets again resuspended in ddH 2 O, transferred to a weighed beaker, dried at 105°C until constant weight.

Method of calculation 1. Setup of calculations
We divide the total feed period in equal intervals [t n , t n+1 ] (1 ≤ n ≤ N) of length dt. Therefore, t n+1 = t n + dt (17) We start with an initial value dt = 1 [h]. The best value for dt is determined within the optimization process.
At every point of time t n we denote by X n = X(t n ) the amount of biomass and by P n = P(t n ) the amount of product in the bioreactor. At the beginning of the fed-batch process the initial values are X(0) = X 0 and P(0) = P 0 , as achieved at the end of the batch phase.
First we have to describe the growth of the biomass. We use the simplest model, the exponential growth model, Since the specific growth rate µ of the biomass depends on time, we calculate (eq. 18) in discrete time steps where µ n is the specific growth rate during the interval [t n , t n+1 ]. The initial values for µ n are chosen arbitrarily, for instance µ n ≡ µ max . The optimal values for all of the µ n 's are determined within the optimization process.
Second we have to describe the accumulation of the product. We simply calculate the total product yield during the interval [t n , t n+1 ] by the following formula P n+1 = P n + dP n (20) with dP n = q Pn · X n ·dt (21) The relationship between the specific rate q P of product formation and the specific growth rate µ was experimentally determined in chemostat cultures. The dependence of q P on µ was described analogous to Monod equation: The values for q Pmax and k q are derived from the experimental data by the method of least squares, i.e. the parameters q Pmax and k q are chosen that the sum of the deviations from the experimental data squared is minimal.
Next we have to calculate the amount of substrate dS which we must feed in the time interval [t n , t n+1 ]. To do this, let S n be the amount of substrate added to the bioreactor until the time point t n . Then the substrate consumption rate depends on the amount and on the increase of biomass, i.e.
where m S is the maintenance coefficient and Y XS is the true yield coefficient of biomass from substrate. Inserting formula (18) in (23) the amount of substrate feed in the interval [t n , t n+1 ] calculates as To calculate the parameters Y XS and m S from experimental data of chemostat cultures by the method of least squares, we use the observed biomass yield coefficient Y' XS depending on the specific growth rate µ. This is done by dX = - 27 ρ the culture supernatant. In order to calculate the product concentration, the available liquid volume V l is calculated at each time interval with the specific volume of wet biomass, which is derived from dry biomass as the specific volume per dry biomass ν YDM = 0.0033 L g -1 .
V ln = V n -X n ·ν YDM (28) Finally, we calculate the biomass and the product concentrations. The product concentration p at the time point t n is calculated as and the biomass concentration x at the same time point is The medium feed rate F n at each time point is These values are used to determine the feed rate profile of the optimized fed batch process.

Optimization
The goal of our optimization problem is to find the best values for the specific growth rates µ n and the best value for dt (which implies that the total feed period undergoes the optimization process too) such that the volumetric productivity Q P calculated at the point of time t N+1 as is maximized under the following constraints: µ min ≤ µ n ≤ µ max for (1 ≤ n ≤ N) (33) and X(t N+1 ) = X max (34) Here µ max = 0.2 h -1 is the maximum specific growth rate at just below washout in chemostat cultures. Since below µ = 0.02 h -1 significant product degradation appeared, the lower boundary was set at µ min = 0.03 h -1 . Also the biomass concentration needs to be limited. The upper limit is mainly defined by the cell separation step, which is practically limited with approximately 100 gL -1 dry mass.

Remark
Additional constraints may be entered, e.g. the final product concentration may be set at a minimum level.
In the Excel sheet we set N = 150. The values t n , µ n , X n , ... are organized in columns, with each time point t n ... a row. The values of X n , P n , V n , ... are calculated from the respective previous row using the equations provided above. The optimization process is performed by the Excel Solver as a black box. It maximizes the final Q P field by varying the µ fields within the boundaries and the dt field.
The Excel file used for this work is provided as an additional file.

Analytic approach
To verify the Excel Solver solution, the exact solution of the optimization problem was determined with calculus of variation.

List of symbols
The symbols, their definitions and units are summarized in table 1.