OpenFLUX2: 13C-MFA modeling software package adjusted for the comprehensive analysis of single and parallel labeling experiments

Background Steady-state 13C-based metabolic flux analysis (13C-MFA) is the most powerful method available for the quantification of intracellular fluxes. These analyses include concertedly linked experimental and computational stages: (i) assuming the metabolic model and optimizing the experimental design; (ii) feeding the investigated organism using a chosen 13C-labeled substrate (tracer); (iii) measuring the extracellular effluxes and detecting the 13C-patterns of intracellular metabolites; and (iv) computing flux parameters that minimize the differences between observed and simulated measurements, followed by evaluating flux statistics. In its early stages, 13C-MFA was performed on the basis of data obtained in a single labeling experiment (SLE) followed by exploiting the developed high-performance computational software. Recently, the advantages of parallel labeling experiments (PLEs), where several LEs are conducted under the conditions differing only by the tracer(s) choice, were demonstrated, particularly with regard to improving flux precision due to the synergy of complementary information. The availability of an open-source software adjusted for PLE-based 13C-MFA is an important factor for PLE implementation. Results The open-source software OpenFLUX, initially developed for the analysis of SLEs, was extended for the computation of PLE data. Using the OpenFLUX2, in silico simulation confirmed that flux precision is improved when 13C-MFA is implemented by fitting PLE data to the common model compared with SLE-based analysis. Efficient flux resolution could be achieved in the PLE-mediated analysis when the choice of tracer was based on an experimental design computed to minimize the flux variances from different parts of the metabolic network. The analysis provided by OpenFLUX2 mainly includes (i) the optimization of the experimental design, (ii) the computation of the flux parameters from LEs data, (iii) goodness-of-fit testing of the model’s adequacy, (iv) drawing conclusions concerning the identifiability of fluxes and construction of a contribution matrix reflecting the relative contribution of the measurement variances to the flux variances, and (v) precise determination of flux confidence intervals using a fine-tunable and convergence-controlled Monte Carlo-based method. Conclusions The developed open-source OpenFLUX2 provides a friendly software environment that facilitates beginners and existing OpenFLUX users to implement LEs for steady-state 13C-MFA including experimental design, quantitative evaluation of flux parameters and statistics. Electronic supplementary material The online version of this article (doi:10.1186/s12934-014-0152-x) contains supplementary material, which is available to authorized users.

 -the Jacobian matrix that equals the Hessian matrix of the scalar function SSR with respect to variables θ , evaluated at θ θ   -the same value as previous, but at a confidence level of  , where     1 .

SF-1.1. Stoichiometric model and assignment of free fluxes
In 13 C-MFA, a stoichiometric model is based on a mass balance equation for metabolic quasi-steady state conditions: is a flux column-vector, The stoichiometric model is automatically generated by OpenFLUX (2)  , which is obtained from the corresponding part of S after elementary row operations and after column rearrangements with the subsequent reordering of the variables in v . Finally, the u flux column-vector, , is obtained: where the upper elements dep v , , and the lower elements θ , , are assigned as "dependent" and "free" fluxes, respectively. The following equation is derived from Eqs.   , according to their assignment, the "dependent" fluxes could be expressed as a linear combination of the "free" fluxes as follows: , the equation for the flux calculation is finally obtained as follows: Thus, the general solution to the stationary Eq.  The OpenFLUX(2) software essentially follows the above-described procedure of N and θ calculation, with the following differences, which improve computational stability: First, matrix N of the orthonormal basis vectors of the null space of S is obtained using singular value decomposition.
Because matrix N is composed of vectors of a specific basis of the null space of S , this matrix can be acquired from the N matrix using the following linear transforms. First, the N   matrix is obtained by swapping rows in N , according to the reordering of the initial flux column-vector. Finally, N is acquired from the row-reduced echelon form of N   using elementary row and column operations.
In OpenFLUX (2) Furthermore, some of the rest of the model-based fluxes can also be automatically included in the set of free fluxes. Indeed, let i v and j v , , be the elements of column-vector v , which correspond to the forward and reverse fluxes, respectively, of one of the t bi-directional reactions.
Then, the following equality of the stoichiometric coefficients must be accomplished for the arbitrary l -th metabolite, m l   1 : Thus, the i -th and j -th columns of the S matrix are linearly dependent, indicating that all t columns corresponding to  v , for example, could be    -9] to improve the calculation efficiency [S-3, S-8]. In this situation, , where the k i e , entry of each k e is the following: Then, two linear programming problems must be solved to provide the dependent flux variability analysis, i.e., to establish the available lower and upper bounds ( distributed points has not been implemented in OpenFLUX2 yet (but already realized in high-performance 13CFLUX2 software [S-11] inequality.

SF-1.3. Flux calculations using regression analysis
Usually (see e.g., [S-11]), it is assumed that the noise vector, mea δ , is normally distributed with an expectation vector , where E is the expected value operator, with a measurement variance-covariance matrix, x  , which is expressed as follows: Notably, the correct bias estimation and the removal of the background signals from "raw" MS-mediated data, in particular, are crucial for the evaluation of mea MID x , which is consistent with the assumed normality of the mea MID δ distribution [S-12].
OpenFLUX(2) assumes individual measurements to be pair-wise, independent, and uncorrelated. Under these assumptions, x  can be considered diagonal, with squared measurement-variances as diagonal elements as follows: Let input x be a column-vector of the labeling data of input substrate(s) and , ..., , 2 , 1 ), , ( is parameterized by the input x function, which describes the relation between the free fluxes θ and the i -th MID fraction of the intracellular metabolite (e.g., proteinogenic amino acids). Then, the following isotopomer balance model could be generated as a nonlinear f vectorfunction, which maps input x and θ to a column-vector of simulated measurements data calc x , Then, the measurement-variance-weighted sum of squared residual errors is assigned as the following objective scalar-function of the θ variables and is parameterized by mea input x x , and mea σ : , which are the vector parameters, could be presented in the following form: and the following constrained NLLSP must be solved to calculate the optimized ) (θ u fluxes, which are the variable parameters of the common metabolic model: If individual LEs introduce significant grouping factors or correlation effects in the flux estimation, then hierarchical regression models, i.e., multilevel models, must be used for flux calculations, with the corresponding application of the modified least-squared approaches [S-19]; however, these strategies were not implemented in the presented software.
One of the possible solutions of the NLLSP expressed by Eqs.   can be obtained using a gradient-based local optimization algorithm [S-20, S-21], i.e., a variation of Newton's method, which is explained in SF-1 Appendix 2.
In contrast, termination can occur not only if the minimization criteria are satisfied but also, for example, if the amount of the iterations, g , which are valid during one trial, is exceeded. The result of such a trial cannot be recognized as an intrinsic solution of the raised minimization problems. In OpenFLUX (2), the reasons for each termination are included in the Optimization report; therefore, only the successfully terminated trials can be considered. The further statistical evaluation of the resulting flux pseudo- , can indicate the existence of feasible unique or multiple optimization solution(s).

SF-1.4. Statistical tests of the model adequacy
The computation of a set of NLLSP solutions, When the flux model adequately represents measurement data, then the arbitrary i -th residuals in the Therefore, according to the 2  -statistics, the null hypothesis, 0  as an indication of model overfitting using the specified set of parameters.
Usually, the effect of overfitting occurs in complex models due to an imbalance between the number of measurements and the number of fitted parameters [S-24].
Initially, to analyze the set of NLLSP solutions obtained in K trials,

SF-1.5. Local linearized statistical approximations
Several important statistical properties of the NLLSP solution could be determined using local linearized approximations of the nonlinear ) , ( θ x f input vector-function in the neighborhood of the convergence point [S-4, S-10, S-12].
The first of these properties is the flux identifiability analysis, i.e., whether the NLLSP has unique or multiple solutions that can be estimated from the available experimental data.
Assuming that the obtained estimations θ  are close to the true free flux values, the nonlinear ) , ( can be linearly approximated using the Taylor's expansion in the neighborhood of θ  : , then the initial nonlinear regression model in Eq.

 
can be approximated using the following linear regression model: , which is assigned as follows: as a solution of the following generalized least squares problem (GLSP): The analytical solution of this GLSP can be found as a solution of the following equation: can be transformed into the following form: which results in the linear estimation of the θ parameters in the neighborhood of θ  as a particular solution of Eq.   with the θ   covariance matrix of the estimated free fluxes as follows: Simultaneously, it is known (e.g., see [S-3, S-12]) that a general solution of a linear nonhomogeneous system (such as in Eq.   , where β is a vector containing arbitrary non-zero entries. Specifically, the solution set to Eq.   , the following equation has to be true: This relation is usually considered one of the convenient ways to compute the approximate covariance matrix of the estimated parameters [S-23].
From the covariance matrix properties and from the relation between the θ and u fluxes expressed by Eq.   , the covariance matrix for the estimated optimal fluxes, u   , can be presented as follows: , u   can be computed as follows: These intervals must be additionally constrained according to Eq.   Another field where the flux parameters evaluated by local linearized statistics play an important role in 13 C-MFA is in computing and applying the so called "contribution matrix", which, in fact, is the squared value of the where the symmetry of the Hessian matrix, i.e.,

SF-1.6. Nonlinear search of flux confidence intervals
The nonlinear-based approach developed by Antoniewicz et al. [S-4] was implemented in OpenFLUX(2) to search for an estimation of the flux confidence intervals, ) ( i lin n u CI   , which is more accurate than the linear approximation, . This approach is based on the assumption that provided measurement's errors data for the flux model used in SLEs (or in PLEs (Exp. (S-1.4.1))) follows normal distribution and correctly weights model's residuals.
Therefore, at a convergence point,  (2) where  -a user-provided constraint tolerance, and the optimization procedure is started from the convergence point,    (2)  It was detected, that in some cases significant bias was introduced in lin n CI   estimation, which made it impossible to compare connfidence intervals for different  (e.g., for certain models it could be observed, that estimated One of the main advantages of the lin n CI   evaluation is its computation speed that is significantly higher than for more precise, but rather slow Monte Carlo-based approach (see, the item 2.4 of Results, and SF-1.7 below). It means, that even the current, not perfect non-linear search algorithm implemented in OpenFLUX(2) could be efficiently used for the fast preliminary evaluation of the flux confidence intervals according to Eq.
, which is assigned as "the normalized flux precision at a determined confidence level of  " or "normalized flux precision", which must be computed in the following manner:  -1.7). The corresponding superscript, 1 , , in the definition of the flux precision function, e.g., , directly indicates the method used for computing the flux confidence intervals.

SF-1.7. Determination of flux confidence intervals using the Monte Carlobased approach
Usually in the present study, the flux confidence intervals were determined using the Monte Carlo-based approach included previously in OpenFLUX [S-6], but substantively modified in the final design of OpenFLUX2 to allow finetuning of procedures parameters and add convergence control.
According to [S-16] . Two different methods were implemented in OpenFLUX (2)   determination with the help of OpenFLUX2. Notably, runs that encounter any problems during optimization step (e.g., the optimization procedure stops before reaching a prescribed convergence threshold) are completely ignored.
Accordingly, the reiterative run solutions are accumulated. The procedure terminates when the size of the accumulated set exceeds a certain threshold value N AS (it is a tunable parameter, and by default N AS =3 in OpenFLUX2).
Additionally, it is possible to constrain the number of runs in one trial. In this case, optimization procedure in each trial will terminate after performing the prescribed number of runs K NR (it is a tunable parameter, and by default K NR =50 in OpenFLUX2), when the set of accumulated run solutions is not filled completely, and the parameters corresponding to the best stored run would be considered as optimized for this trial. This mandatory termination leads to faster performance but can potentially demonstrate poorer results in comparison with the case when the number of run per trial is not restricted.
Certainly, optimization procedure that was performed according to the second strategy, "multiple runs per trial", results in slower performance, but increases the chances of finding the global minimum for each out of L trials and, therefore, better estimates the flux confidence intervals. Namely, this strategy is used as a default approach for Monte Carlo procedure in OpenFLUX2, but the alternative, "one run per trial", method could be used according to the user's choice. Furthermore, in OpenFLUX2, it is possible to fine-tune the TT,  , N AS and K NR parameters of the above-described procedure Therefore, one of the above-described strategies is used iteratively to generate a set of independently obtained flux distributions of size L (Eq. ). To determine the flux confidence intervals on the basis of these distributions, two different approaches are implemented in OpenFLUX2.
According to the first approach, "discarding" strategy, for each , ,..., 1 ,  Figure 4 in the main text for clarity).
Alternatively [S-6, S-16, S-17], the "mean-varianced" strategy could be used, where i -th flux mean, , and the unbiased estimator of the variance, Then an absolute spread in the estimated lower (and upper) bounds of flux confidence intervals can be finally calculated as: Consequently, values of a relative spread for all estimated bounds can be determined as: If the obtained relative spread does not exceed the predetermined threshold for the targeted flux, it could be considered that the corresponding bound of   i MC u CI  converged (see, Figure 5 for clarity). OpenFLUX2 allows to fine-tune sliding control "window" to allow the user to balance between computation time and precision of   i MC u CI  determination. Indeed, a user can tune different parameters of "window" (e.g., size, deepness, frequency), modify the value of predetermined threshold for MC CI  bounds. To save computation time, the user can preset the frequency of a sliding control tests (e.g., once per 40 performed trials). Moreover, OpenFLUX2 allows to analyze using the described sliding control "window" strategy  are also retained in OpenFLUX2.

OpenFLUX(2)
Suppose that the metabolic system consisted of t bi-directional (reversible) reactions and t n s 2   unidirectional (irreversible) reactions. It is possible to reorder the variables in the initial flux vector v for generating the following vector: be a new net forward flux vector that is determined as follows:   A new matrix, S   , could be obtained due to reordering the linearly dependent columns in the right part of the modified matrix and to the Gaussian elimination-mediated conversion of S to the following form: