Research  Open  Published:
OpenFLUX: efficient modelling software for ^{13}Cbased metabolic flux analysis
Microbial Cell Factoriesvolume 8, Article number: 25 (2009)
Abstract
Background
The quantitative analysis of metabolic fluxes, i.e., in vivo activities of intracellular enzymes and pathways, provides key information on biological systems in systems biology and metabolic engineering. It is based on a comprehensive approach combining (i) tracer cultivation on ^{13}C substrates, (ii) ^{13}C labelling analysis by mass spectrometry and (iii) mathematical modelling for experimental design, data processing, flux calculation and statistics. Whereas the cultivation and the analytical part is fairly advanced, a lack of appropriate modelling software solutions for all modelling aspects in flux studies is limiting the application of metabolic flux analysis.
Results
We have developed OpenFLUX as a user friendly, yet flexible software application for small and large scale ^{13}C metabolic flux analysis. The application is based on the new Elementary Metabolite Unit (EMU) framework, significantly enhancing computation speed for flux calculation. From simple notation of metabolic reaction networks defined in a spreadsheet, the OpenFLUX parser automatically generates MATLABreadable metabolite and isotopomer balances, thus strongly facilitating model creation. The model can be used to perform experimental design, parameter estimation and sensitivity analysis either using the builtin gradientbased search or Monte Carlo algorithms or in userdefined algorithms. Exemplified for a microbial flux study with 71 reactions, 8 free flux parameters and mass isotopomer distribution of 10 metabolites, OpenFLUX allowed to automatically compile the EMUbased model from an Excel file containing metabolic reactions and carbon transfer mechanisms, showing it's userfriendliness. It reliably reproduced the published data and optimum flux distributions for the network under study were found quickly (<20 sec).
Conclusion
We have developed a fast, accurate application to perform steadystate ^{13}C metabolic flux analysis. OpenFLUX will strongly facilitate and enhance the design, calculation and interpretation of metabolic flux studies. By providing the software open source, we hope it will evolve with the rapidly growing field of fluxomics.
Background
Metabolic flux analysis (MFA) plays a central role in metabolic engineering and systems biology [1]. Metabolic fluxes most closely reflect the underlying metabolic phenotype, whereas other 'omics approaches only yield a sense of metabolic capacities (transcriptomics/proteomics) or thermodynamic driving forces (metabolomics). Metabolic flux analysis is particular important in rational strain engineering, where we specifically seek to manipulate the metabolic phenotype.
Due to the high complexity of the examined metabolic network, flux analysis typically involves the use of a stoichiometric model, in which the metabolic reactions available to the cell are parameterized before the fluxes are estimated from experimental data [2]. Stateofart flux analysis today includes the use of stable isotopes to overcome problems such as incomplete resolution of important cellular pathways or the need to rely on stoichiometric parameters with high uncertainty such as ATP yield (Y_{x/ATP}) or P/O ratio which are inherently linked to the purely stoichiometric approaches [3]. ^{13}Cbased MFA therefore is a powerful extension of MFA [3]. In such studies, after feeding ^{13}Clabelled substrate(s), one measures the ^{13}C tracer enrichment patterns of metabolites that are rich in flux information, using instruments such as nuclear magnetic resonance spectroscopy (NMR) [4, 5] or mass spectrometry (MS) [6]. There are mainly two different approaches to extract flux information from the labelling patterns: by modelbased flux fitting [3], and by analytical interpretation of flux ratios [7] (both approaches briefly reviewed in [8]). Redundant pathways that contribute differently to tracer distribution can thus be resolved. Flux analysis is carried out independently from energy and redox balancing, because the balancing equations only involve the carbon backbone. Conversely, the flux results can be used to check the consistency of energy and redox balances [9].
There has been significant development especially concerning the experimental framework for ^{13}C MFA [10]. ^{13}C MFA has been applied to various prokaryotic and eukaryotic systems [11–13] involving miniaturized screening studies in small scale [14, 15]. There is an increasing trend towards largescale networkbased stationary ^{13}C MFA [16, 17], as well as nonstationary (i.e., dynamic) ^{13}C MFA [18–20]. Largescale metabolic models are preferred in order to capture, as many reactions as possible, bearing effects on carbon labelling, and to maintain global consistency of flux estimates. Considering metabolism in isolated parts or using overly summarized metabolic models can lead to biased results [17]. However, specifying large sets of isotopomer balances and subsequently performing parameter estimation can be very cumbersome.
Several software packages have been developed to facilitate flux analysis, the most popular being FiatFlux [21] and 13CFLUX [22]. FiatFlux implements the flux ratio approach to ^{13}C MFA [7] and comes preconfigured to derive flux ratios and net fluxes for [1^{13}C] and [U^{13}C]glucose experiments and GCMS analysis of proteinogenic amino acids for several microorganisms. Recent developments allow to generate equation systems automatically [23], which facilitates the extension of the flux ratio approach to various metabolic models, input substrates and labelling data.
In contrast, 13CFLUX is a general purpose package for modelling, simulation, design, evaluation, and statistical analysis of ^{13}Clabelling experiments [22]. Unfortunately, 13CFLUX is relatively cumbersome to use in terms of requiring the user to specify free fluxes, to set up the initial solution, and to manually initialize and terminate each optimization. It is not possible to perform multiple rounds of optimization unsupervised, which is frequently used to check convergence of the optimization results. There is a general lack of support in aspects of experimental design, i.e., explore change in labelling patterns for a different flux distribution and/or various combinations of input substrates. For expert users, there is limited opportunity to modify source code for implementation of new algorithms and workflows for different labelling problems.
There is a need for ^{13}C MFA tool that is simple, flexible and transparent. Fast computation is crucial. For a nonexpert user, the software must enable a smooth reproducible workflow covering the whole process from metabolic model definition to flux estimation. A flexible approach necessarily supports userdefined metabolic systems, while a transparent computational model offers expert users the opportunity to tailor make downstream algorithms for parameter estimation and statistical analysis.
To meet this challenge, we have developed OpenFLUX, a simple yet flexible application to perform steadystate ^{13}C MFA using mass isotopomer distribution data. OpenFLUX provides the user a versatile and intuitive spreadsheetbased interface to control the underlying metabolite and isotopomer balance models used for flux analysis and allowing for the implementation of largescale metabolic networks. The user then has the option of using the accompanying algorithm package for flux estimation and sensitivity analysis, or applying alternative numerical approaches for flux analysis (e.g., [24]). OpenFLUX generates isotopomer balance model based on the EMU decomposition algorithm [25]. Using EMU variables is computationally more efficient because the number of necessary isotopomer balances is significantly reduced [25] compared to alternative representations of labelling distribution of metabolites, such as AAV (atom activity vector) [26], IDV (isotopomer distribution vector) [27], cumomer [28] and bondomer [29].
The present work describes the implementation and validation of OpenFLUX. Specifically, we explain the tasks performed by OpenFLUX, provide an illustration of the model definition setup of a hypothetical metabolic model, and also describe the structure and contents of the resulting metabolic models. The software is then validated by reproducing published ^{13}C MFA results.
Methods
Implementation of OpenFLUX
OpenFLUX consists of two parts for (i) automated model setup or modification from user entered reaction data and (ii) the application to flux analysis by calculating fluxes from experimental data as well as statistical evaluation (Figure 1). The first part consists of a parser that automatically generates the metabolite and isotopomer balance models from a textbased model definition. The second part is a structured workflow that implements a series of numerical optimization algorithms for flux parameter estimation and sensitivity analysis.
Automated generation of metabolic models
OpenFLUX parses the metabolic model from a structured tabdelimited text file, created using a spreadsheet program, into MATLAB readable metabolite and isotopomer balance models. The former is the stoichiometric matrix (S) in the mass balance
while the latter is a nonlinear function (F) that maps input substrate(s) mass isotopomer distribution(s) (MID) () to the simulated MIDs () using the flux distribution () as a parameter
The generation of the isotopomer balance model closely follows the recently developed EMU framework [25], whereby the model generation algorithm progressively constructs EMU balances by identifying all reactant EMUs that contribute to a given product EMU. The algorithm progresses until each of the simulated EMU products can be traced to the input substrate EMUs via a series of balancing equations. These equations are subsequently organized into matrix equations (see [25] for details).
Assignment of free fluxes
The solution to (1) can be expressed as a linear combination of free fluxes () [28]
where NS is the null space of S and the dimension of equals the nullity of S, i.e., degreesoffreedom in the system. It is these free fluxes that the optimization program will manipulate in order to achieve the best fit to the measured data.
The choice of free fluxes in (3) is not unique. In OpenFLUX, the flux vector is first decomposed into four different reaction types: bidirectional forward and reverse reactions ( and ), and irreversible reactions that are either free fluxes () or dependent fluxes (), and rearranged to the form
Using this form, the reduced column echelon form of the nullspace matrix (NS)
explicitly maps the free fluxes defined as
to the full flux vector, .
This transformation confers two main advantages. Primarily, the assignment of free fluxes is automated to include all reverse fluxes and a subset of irreversible fluxes (6). The reverse fluxes are determined from the spreadsheet, where the user has identified the bidirectional reactions in the network, and, for each bidirectional reaction pair, the forward and reverse counterpart. Additional information may be specified, such as known reaction rates derived from experimental data (e.g., biomass precursor drain or extracellular rates), or the user's preference for an irreversible free flux assignment. OpenFLUX will then prioritize the assignment of free fluxes, but ultimately, the assignment is determined by the stoichiometric matrix.
A secondary advantage is that is calculated explicitly from in a single matrix operation (5). This circumvents the use of the stoichiometric matrix as an implicit constraint during optimization [17]. The formulation of the equation system used for calculation of absolute fluxes is greatly simplified compared to the convention proposed by Wiechert and de Graaf (1997) [28]. Furthermore, analytical derivation of the gradient matrix is straightforward (i.e., ).
We note that the flux coordinates for bidirectional fluxes, v^{←} and v^{→}, while representing a natural choice, differ from the conventional flux coordinates, net flux (v^{net}) and exchange flux (v^{xch}), used in flux analysis [28]. Both flux coordinates can be compactified to the same effect whose main purpose is to improve the numerical performance. The rationale for the change in coordinates is detailed in the Appendix. It is straightforward to interchange between the coordinates using:
Flux calculation via numerical optimization
OpenFLUX uses FMINCON, a gradientbased minimization search function contained in MATLAB's Optimization Toolbox, to perform both flux parameter estimation and sensitivity analysis. FMINCON utilizes a quasiNewton sequential quadratic programming (SQP) method for constrained, nonlinear optimization. The algorithm is wellsuited for metabolic flux analysis where physiologically meaningful boundaries exists for the free fluxes [22, 30].
For parameter estimation (Figure 2), the program searches for values of the free fluxes within the domains of the flux constraints in order to minimize the weighted sum of squared residual errors () between experimental data () and calculated values (), where the weight is the inverse of the variancecovariance matrix (D)
The program assumes measurements to be uncorrelated, i.e., D contains as a diagonal the individual measurement variances (). The constraints applied to the optimization are based on the definition that all fluxes must be positive, since reversible reactions are separated into the forward and reverse components
The optimization is also bounded to improve numerical stability. The irreversible fluxes are nonnegative and upward bounded by the net flux in the system. The upper bound (UB) is bydefault set to 20 times the maximum substrate uptake rate, but can be varied
The reverse free fluxes are nonnegative, but have no natural upper limit. Hence, a compactification operation is used to transform from a physical [0, infinity] scale to a numerical [0, 1] scale [28].
A scaling factor (P) is used, whose value is typically of the same orderofmagnitude as the largest input substrate flux [31]. Alternatively, P can be adjusted during optimization to obtain a better conditioned Jacobian matrix [24].
Statistical evaluation of flux data
Flux statistics, e.g. confidence intervals of single flux parameters, are important to evaluate flux data and determine if observed differences between mutants or culture conditions reflect physiological differences. Confidence intervals for flux parameters (Para_{ i }) are determined using the nonlinear approach developed by Antoniewicz et al. (2006) [30]. This approach provides a more accurate estimate of flux uncertainty than local estimates of the standard deviations. Briefly, the approach employs that the minimized varianceweighted sum of squared residual is approximately χ^{2} distributed and therefore the difference (Δε;) between the objective function evaluated at the optimal solution and the objective function when one flux is fixed follows a χ^{2}distribution with one degree of freedom. Accordingly, an approximate (1α) confidence interval for Para_{ i }can be defined by the two solutions for para_{io} to
FMINCON is used to search for the solution pair, and each search is initialized from the optimum solution found during parameter estimation. The constraints applied in parameter estimation, (10)(12), are also applied here ensuring that the boundaries to confidence interval represent feasible solutions. Analysis can be performed on different forms of Para_{i}, such as free or dependent flux, flux ratio, or reversibility ratio. Alternatively, flux standard deviations can be estimated using the Monte Carlo approach included in OpenFLUX [32].
Flexible use for different types of experimental labelling data
OpenFLUX automatically indicates to the user the input substrate EMUs that must be specified prior to label simulation. Specifying these EMUs is done either by directly defining the vector elements of the input substrate EMUs, or indirectly using one of OpenFLUX's builtin functions. These functions calculate the corresponding vector elements using either the atom activity vector (AAV) or the isotopomer distribution vector (IDV) of the input substrates, both of which are well known notations in flux modelling [26, 27].
The simulated output vector () can be modified by the user to suit the experimental data type. The mathematical manipulations that can be performed on include (i) integrating mass interference from nonbackbone stable isotopes using Cauchyproduct [33], (ii) normalizing and truncating EMU variables to match the length of a shorter MID vector (typically 2 to 3 elements per metabolite), (iii) converting EMU variables into summed fractional labelling (SFL), and (iv) include or exclude specific element in an EMU variable (e.g. to represent positional enrichment data). These modifications are not automated by the software because they can be very diverse.
An EMU variable is essentially a MID vector. One can immediately simulate mass spectroscopy data using EMU variables. It is, however, possible to generate NMR fine spectra from EMU variables (detailed in [25]). OpenFLUX currently does not support automatic generation of NMR fine spectra, thus the user needs to list all essential EMU variables required for the transformation process during model set up and enter the transformations manually.
Design of experiments
Not all intracellular fluxes can be resolved in a single labelling experiment, but careful experimental planning can ensure that key unknown fluxes are determinable [34, 35]. Determining the optimal labelled substrates to use and optimal metabolites to measure is a nontrivial task, often requiring sophisticated design strategies [31, 36–38]. A common theme emerging from these design approaches is that one must be able to visualize the relationship between flux distribution, input substrates used and labelling response measured. OpenFLUX partially supports experimental design by allowing user to perform forward label simulation, that is to predict the labelling state of specified EMUs based on fluxes and input labels. Using MonteCarlo simulation to explore fluxes in the expected range, it is possible to establish – for a given input label – what EMUs are most responsive to changes in flux, and hence what EMUs should preferably be measured.
Excluded metabolites and reactions
The spreadsheet based model specification allows the user to specify input substrate(s), simulated EMU variables as well as metabolites excluded from the stoichiometric matrix.
Five reaction types are supported. In addition to irreversible ("F") and reversible forward ("FR") and reversible reverse ("R") fluxes, it is possible to specify reactions used only for metabolite balancing ("B") and reactions used only for isotopomer balances ("S"). The atom transition equation for "B" type reaction is not required so that only relevant information has to be generated prior to the calculation. Type "S" reaction is a convenient approach to map a product's MID to their respective precursor(s) without incurring additional degreesoffreedom in the metabolite balance model. Metabolites unrelated to the isotopomer balances, such as ATP and NADH, are marked with "X" in the atom transition equation. This allows user to set up a metabolic model that includes cofactor balances.
Test case flux analysis TCA cycle
OpenFLUX' semantics and algorithms were tested using a small network. The metabolic network is a representation of a condensed TCA cycle (Figure 3, 4), which also includes gluconeogenic (R9) and anaplerotic (R10) fluxes. The network has 2 input substrates (pyruvate (R1), glutamate (R2)), 3 biomass precursor drain fluxes (oxaloacetate (R18), αketoglutarate (R17), pyruvate (R16)), and 1 reversible reaction (R7 – forward, R8 – reverse) between succinate and oxaloacetate. Oxidative phosphorylation reactions are also included (R11, R12). It is assumed that the CO_{2} efflux (R14) is unidirectional [see Additional file 1]. The synthetic experimental data assumed available consists of the production rates of valine and lysine (R19 and R20), the precursor withdrawal rates from pyruvate, αketoglutarate and oxaloacetate towards biomass formation, and values on the relative fraction of the three mass isotopomers M_{0}, M_{1} and M_{2} for valine, lysine, aspartate and succinate. The labelling data were corrected for mass interference from noncarbon backbone atoms. Fluxes were normalized to the pyruvate input flux of 1 mmol g^{1} h^{1}. A detailed instruction of how to formulate, set up and implement the model using OpenFLUX is provided in the Appendix.
Metabolic model for lysine producing Corynebacterium glutamicum
We reconstructed the metabolic network of lysine producing C. glutamicum based on published information [39] and additional modelling details kindly provided by the authors [see Additional file 2]. The input substrate used was [1^{13}C]glucose (with 99% enrichment purity), and all fluxes were normalized with respect to the glucose uptake rate (i.e., fluxes are expressed in percentage of glucose uptake rate). As the published MIDs are uncorrected, all the simulated EMU variables were modified for mass interference from noncarbon backbone isotopes using the molecular formula of the amino acids fragments (i.e., parent ion cluster). The first n+1 signal elements were normalized (n indicates number of backbone carbon), and then truncated to the correct vector length (equivalent to the measured MIDs) before performing weighted leastsquare analysis. The inferred metabolic model consisted of a total of 71 reactions and 42 balanceable metabolites. The metabolite model yielded a total of 26 degreesoffreedom and 18 fluxes were determined experimentally: anabolic precursor yields (11), biomass yield (1), secreted product yields (5), and glucose uptake rate (1). To reduce the number of unknown parameters, these 18 fluxes were chosen as free fluxes, and the associated flux values were used deterministically as no redundant data exist in the measurement set. Note that if one suspects gross measurement errors in the flux measurement set, then these fluxes should be set free and the flux values subjected to the leastsquare analysis together with the MIDs. Five (5) of the remaining 8 free fluxes are associated with the reversibility of nonoxidative pentosephosphate pathway enzymes (3), glucose6P isomerase (1) and intercellular CO_{2} exchange (1). The other 3 free fluxes were assigned (by the software) to the irreversible fluxes of glucose6P dehydrogenase, pyruvate carboxylase, and glycine synthesis via the serine route.
The MID of 9 amino acids and trehalose were reported. Three "S" type reactions were included in the metabolite network to directly map label distribution of alanine, aspartate and glutamate to pyruvate, oxaloacetate and αketoglutarate, respectively. This was not necessary for all other amino acids and trehalose because these metabolites were already described in the isotopomer balances.
Computational requirements
All computational work was performed on a Pentium D 3.00 GHz computer. OpenFLUX was implemented in MATLAB 7.3.0.267 (R2006b) (The MathWorks, Natick, MA, USA). Numerical optimization was carried out using FMINCON function from MATLAB's Optimization Toolbox. Since we used numerical gradient, FMINCON automatically used the activeset algorithm (also known as "medium scale"). Termination tolerance on the function value was set to 1 × 10^{4}, and the optimizations were terminated when the magnitude of directional derivative in search direction is less than 2 × 10^{4}. 13CFLUX was implemented in Ubuntu Linux 7.1 operating system.
Results and discussion
Test case flux analysis TCA cycle
The specified network (Figure 3) has 9 degreesoffreedom. Of the required 9 free fluxes, 7 were assigned by default from the definition: R1 (specified input flux), R8 (reverse flux), R16, R17 and R18 (specified biomass precursor drain fluxes), and R19 and R20 (specified amino acid production flux). R2 was specified as a preferred free parameter in the model definition, and – since this was an acceptable choice – was assigned by the parser. OpenFLUX automatically assigned the remaining free parameter to R9. Note that the correct NS structure (Eq. 3) is automatically produced (Figure 5).
The parsergenerated isotopomer balance model consists of 50 EMU variables organized into 7 matrix equations, of which 14 are known input substrate EMUs. With the EMU approach, the total number of unknown scalar MID variables (116 variables) involved in the optimization is significantly less than a complete isotopomer model (262 variables).
The model created by OpenFLUX was validated for the TCA cycle test case by comparing the labelling data simulated for a given set of fluxes with cumomer based simulation (i.e., 13CFLUX) [22]. Forward simulation based on hypothetical fluxes yielded identical labelling data results for all cases (data not shown). This means that the semantics used to specify the reaction network and the underlying isotopomer balance model constructed by OpenFLUX via the EMU decomposition algorithm is consistent.
The labelling patterns obtained for the test case were next corrupted with 1% Gaussian noise, and subjected to leastsquare analysis using 13CFLUX [22] and OpenFLUX. Randomized initial solutions were used to start each optimization. All approaches yielded identical optimum solutions, consistent with the hypothetical values (Figure 3). This is an essential performance benchmark, since new semantics were introduced to define reversible and scrambling reactions.
The introduction of "X" type metabolites and "S" type reactions in OpenFLUX' flagging system for metabolites and reactions has several advantages. In this example, OpenFLUX automatically calculated the cofactor balances and the required oxygen uptake rate based on a given P/O ratio of 3 (Figure 5). OpenFLUX could also use the labelling input from aspartate, without introducing this amino acid as a balanceable metabolite. With 13CFLUX, we had to artificially create a new reaction withdrawing a fraction of oxaloacetate to constitute the flux of aspartate synthesis and allow the consideration of its labelling pattern for flux estimation.
Real case flux analysis – lysine producing C. glutamicum
The performance of OpenFLUX was tested for a real case flux scenario, comparing a low and a high lysine producing mutant of the soil bacterium C. glutamicum [39]. For these mutants, fluxes were previously quantified combining mass balancing with tracer experiments on [1^{13}C]glucose, GCMS ^{13}C enrichment analysis of proteinogenic amino acids [10, 40], and isotopomer mapping matrices based simulation.
The isotopomer model for C. glutamicum is comprised of 108 unknown EMU variables, with 15 additional known [1^{13}C]Glucose and CO_{2} EMUs. A total of 20 balancing matrix equation sets were used to calculate the unknown EMUs, 9 of which are single equation balances that are easily calculated. The 108 unknown EMU variables correspond to 360 scalar MID variables, which is a substantial reduction from the 8380 unknown scalar variables expected in the full isotopomer model.
A total of 50 cycles of optimization were performed for both wild type and engineered strain datasets. The total computation time for each cycle was about 16 seconds. The calculated MIDs shown in Table 1 were derived from one of the optimization cycles that showed the smallest weighted sum of square residuals. All of the MIDs calculated by OpenFLUX were consistent with the published data (Table 1). The optimized free fluxes were also reproduced (Table 2). Comparing the 90% confidence intervals reported by Becker et al. [39] with the gradientbased search intervals generated by OpenFLUX, we could infer that there is no significant difference between the estimated free fluxes. In addition to the gradientbased search, we used the MonteCarlo approach to generate the 90% CI [32, 41] (Table 2). Briefly, the analysis was carried out by corrupting both the MIDs measurements with the prescribed relative error and the reported experimental yields for biomass and byproducts with the corresponding deviations. The new data set was then used for parameter estimation. Optimization was performed iteratively until a sample size of 150 independent flux distributions was obtained. The 90% CI for a single parameter was then calculated using the free parameter's mean and variance estimated from the samples [42]. The 90% CIs generated through MonteCarlo simulation were consistent with the original work, as well as with the intervals generated by gradientbased search. Notably, the confidence interval generated by Monte Carlo approach is generally wider than the gradientbased search because the variations in the experimental yields were included in the analysis. The wider 90% CI managed to capture all of the published optimum free fluxes. Overall, this validates OpenFLUX's accuracy with respect to the isotopomer balance model and the numerical approach.
Conclusion
OpenFLUX is a generic and efficient novel software tool for ^{13}C MFA. We have shown that the frontend model setup is relatively simple and intuitive, and the backend optimization is streamlined and is significantly faster than 13CFLUX. Overall, this means that studying largescale metabolic models becomes more tractable, especially for problems that have large number of unknown free fluxes and/or demand more stringent statistical evaluation. The underlying metabolic models are transparent to the user, and could be adapted for other purposes. In addition, the definition of the simulated output vector is flexible, thus various experimental data types can be applied. Finally, the inclusion of two different but complementary confidence interval determination algorithms (i.e., nonlinear and Monte Carlo) enables a more robust evaluation of flux solutions.
OpenFLUX performed well on real metabolic problems. We have applied the C. glutamicum metabolic model as an example problem, and were able to reproduce the calculated MIDs, the optimized flux parameters and the corresponding confidence intervals. We also showed that OpenFLUX is computationally fast.
OpenFLUX is useful for exploring different network topology, flux distribution and modelling assumptions. The application grants the user the ability to control the underlying metabolic models and data inputs via a simple textual interface. Experimental design is critical to justify choice of substrates and analytical techniques. One can troubleshoot potential observability and sensitivity issues by simulating hypothetical MIDs data on a typical range of flux distributions, and subsequently explore the resolution of different flux parameters. OpenFLUX supports testing of various labelled substrates. For incomplete metabolic models, ^{13}C MFA could be used to validate soft assumptions regarding P/O ratio and cofactor balances (ATP, NAD(P)H).
Overall, the purpose of OpenFLUX is to encourage the adoption of ^{13}C tracer studies for flux analysis in newly arising systems approaches. Providing it open source aims on further development of the software as a general platform for ^{13}C fluxomics. ^{13}C MFA requires significant upfront investment to construct the isotopomer balance model and to establish the numerical optimization for flux analysis. Users who are already familiar with MFA will find that ^{13}C MFA is readily implemented once the atom transitions are included into the metabolic model. Lastly, the textbased spreadsheet interface is an effective means of disseminating the metabolic model because the network topology and modelling assumptions are readily found in a single model definition file.
Availability and requirements
OpenFLUX is optimized for MATLAB 7.3 (R2006b) (The MathWorks, Natick, MA, USA) and Java 6 (Sun Microsystems, Santa Clara, CA, USA) on a Microsoft Windows XP platform. It requires the MATLAB Optimization Toolbox. The software is also available for MacOS 10.4.11 (and later) and Ubuntu linux 7.1 operating systems. Microsoft Excel 2003 was used to generate the model input for the Java parser, alternative spreadsheet programs are also useable.
OpenFLUX is released as open source and is available upon request, both compiled and as source code. Supplementary information is available online http://web.aibn.uq.edu.au/cssb/Resources.html.
Appendix
Handling bidirectional reaction using a new flux coordinate
The conventional coordinate system used to describe bidirectional reactions prevents the direct implementation of the nullspace matrix [28]. In this system the net flux (v^{net}) and exchange flux (v^{xch}) are used as flux coordinates, where v^{xch}is mapped to reverse flux (v^{←}) if v^{net}is positive, otherwise v^{xch}is mapped to the forward flux (v^{→}) (Figure 6).
The v^{net}and v^{xch}coordinate system was created for compactification purposes. As v^{xch}can range from 0 to infinity, which represents either irreversible or fully reversible reactions, v^{xch}is compactified using (12). This has been shown to reduce the response curvature of the ^{13}C label with respect to the reaction's reversibility and improve linearization of the model [31]. However, such coordinate system invokes a complicated equation system to calculate from [28].
Here, we introduce a straightforward linear coordinate system to describe bidirectional flux: v^{→} is expressed as v^{←} plus a net forward flux (v^{→}, ^{net}), which can be positive or negative (Figure 6). This linear transformation is directly implemented within NS with a specific structure (5), which is readily created using GaussJordan elimination. According to (5), we find that and , i.e., v^{→}, ^{net}of a given reversible reaction is a linear combination of . This saves us from creating additional equation systems, as seen in the transformation of v^{xch}and v^{net}into v^{→} and v^{←}. NS with the required structure can be reproduced for all metabolic network configurations.
The new v^{→}, ^{net}and v^{←} coordinate system still allows the compactification of v^{←} to the same effect as compactification of v^{xch}; since v^{←}, like v^{xch}, is mapped equally to the forward and reverse fluxes (Figure 6). However, there is one caveat with the coordinate system adopted by OpenFLUX. v^{→} is forced to be a free flux after GaussJordan elimination if all reactions in a branching pathway are reversible. To meet the conditions required for compactification, the nullvector associated with v^{←} is added to the nullvector associated with v^{→}. As a result, v^{→} becomes v^{→}, ^{net}, but v^{→}, ^{net}'s feasible range span both positive and negative domains. OpenFLUX has an algorithm to detect such occurrences and performs the appropriate modification to the free flux boundary value (i.e., UB ≤ v^{→}, ^{net}≤ UB).
How to set up a model in OpenFLUX – test case central metabolism
Model setup in spreadsheet mode
In OpenFLUX, the metabolic model is organized into a table consisting of seven columns, three metabolite lists and two experimental data columns (Figure 4) [see Additional file 1]. The metabolic model configuration and experimental data input are gathered within a single spreadsheet interface, which contains all the information required to perform flux analysis. The parser only reads key active zones of the spreadsheet, thus other annotation information and user comments can be included in the spreadsheet. The first column contains the reaction ID. Arbitrary values can be assigned. It is important to note that the flux vector in MATLAB is organized in the same order as presented in the table (i.e., the v(1) variable represents first reaction flux in the table). The second and third columns contain the reaction equations and corresponding atom transitions. Beforehand, it is important to distinguish reactions significant to the isotopomer balance from ones that are not. The definition of the latter reactions is flexible, and can accommodate floatingpoint coefficients, alphanumeric metabolite names and an unlimited number of metabolite species.
Notation of metabolic reactions
The setup of the reactions relevant to isotopomer balance must obey the following rules: (1) reactant stoichiometry is always one; (2) same stoichiometric coefficients in both the reaction and atom transition equation; (3) only uni or bimolecular reactions. Within these rules, both normal and scrambling reactions can be adequately described. These rules imply that scrambling occurs on the product side of the reaction equation (i.e., R6 and R8). The limitation of reaction order to bimolecular reaction is because the parser is not capable of generating a nested Cauchyproduct for higher order condensation reactions. Nonetheless, higher order reactions can be decomposed into bimolecular reactions. The Cauchyproduct is the convolution of 2 sequences A and B, where the n^{th} element of the product is obtained from
In the ^{13}C tracer context, these sequences relate to the MID of 2 EMUs involved in a condensation reaction (see [25] for an example). A Cauchyproduct function has been built into OpenFLUX.
The semantics for the atom transitions in the third column is based on a common convention [22, 25], alphabetic notations are used to identify a series of carbon atoms in a metabolite, and are used to describe the transfer of the carbon atoms from the reactant to the product [26]. As long as the convention is consistently applied, different conventions could be used to describe the order of carbon atoms, typically seen in different automated carbon fate mapping algorithms [17]. A table of atom transitions in central carbon metabolism is provided [see Additional file 3]. The fourth column is used to display the flux distribution, and is only used for forward simulation of MIDs from the flux distribution (hypothetical or known).
Specification of reaction types of reaction reversibility
The fifth column contains the definition of each reaction type. There are 5 reaction types, namely "F", "FR", "R", "B" and "S". Reversible reactions are decoupled into the forward ("FR") and reverse ("R") flux. This scheme ensures that "R" type reactions (R8) are always assigned as free fluxes. "B", "S" and "F" are used to identify the relevance of a reaction to metabolite balance, isotopomer balance or both respectively. For "F" type reactions, metabolites that are to be excluded from the isotopomer balance are marked with "X" (case insensitive) in the atom transition equation. This is typically used for cofactors, such as NAD(P)H, FADH_{2} and ATP. Biomass drain fluxes (R16, R17, R18) and reactions that do not contribute to the tracer distribution (R11, R12, R13, R15) are classified as type "B" reactions. The atom transition equation for this reaction type is not required. Type "S" reaction is a convenient approach to map a product's MID to their respective precursor(s) without incurring additional degreesoffreedom in the metabolite balance model, such as R21 for production of aspartate from oxaloacetate. This is especially useful when the labelling for a metabolite can be measured, but its production rate is unknown. Aspartate production rate is typically incorporated into the oxaloacetate biomass drain flux (R18).
Allocation of free fluxes
The sixth column contains the allocation of free fluxes for the metabolite balance model. It allows the user to specify an invariant value for a known free flux, or to allocate the preference for a reaction to be used as a free flux. For example, the drains of pyruvate, αketoglutarate and oxaloacetate to biomass were given the values of 0.07, 0.23 and 0.12 respectively. Additional drains for valine and lysine were added to enable subsequent model validation in 13CFLUX. Also, the activity of pyruvate uptake flux is 1 because all fluxes are normalized to this reaction. "X" is assigned to R2 as a preferred free parameter as the reaction represents the glutamate uptake flux. By convention, "X" is assigned to any "R" type reaction.
Consideration of experimental noise for flux calculation
The seventh column carries the experimental standard deviations associated with each of the known fluxes. This allows a known flux value to be used either deterministically, where the flux value is fixed, or as an experimental measurement, where the flux is set free and the flux value is included in the leastsquare analysis. Using flux values deterministically may lead to grossmeasurement error, but can reduce computation time.
Listing unbalanced metabolites, simulated EMU variables and input substrates
Metabolites that are considered external to the system must be identified in order to generate a balanceable stoichiometric matrix. The list comprises the reactants of the system inputs and products of the system outputs. The EMU variables to be simulated were chosen to be the full carbon backbone of valine, lysine, aspartate and succinate. They are written in the form "metabolite name#binary number". "1" and "0" are used to indicate whether the carbon atom at a given position is to be included or omitted respectively. For example, the simulated valine EMU variable is written as "VALX#11111". The experimental measurements and the associated errors are listed in the same order used to list the simulated EMU variables in the model definition file. The input substrates are pyruvate and glutamate, and are placed into another list. Generically, an input substrate is any exometabolite that enters the system boundary and can contribute to ^{13}C isotopomer distribution in metabolites. The ^{13}C enrichment of these metabolites must be known.
Error checking
Before the metabolite and isotopomer models are generated, OpenFLUX inspects the model definition file for potential inconsistencies. Mainly, OpenFLUX checks for consistency in metabolite naming, stoichiometry, carbon atom length for a given metabolite, and for presence of higher order reactions or unbalanced atom transition equation. The inconsistencies are reported to the user, and must be resolved before the metabolic models are successfully generated.
Parameter estimation and sensitivity analysis
Parameter estimation and sensitivity analysis is executed from MATLAB's command line. OpenFLUX is initialized by typing "start13OF". The user then chooses the various optimization tasks to be performed. After choosing the task, a series of command line prompts are given to the user to specify the parameters required by the optimization program. Once completed, the optimization begins.
Abbreviations
 AAV:

atom activity vector
 EMU:

elementary metabolite unit
 GC:

gas chromatography
 IDV:

isotopomer distribution vector
 MFA:

metabolic flux analysis
 MID:

mass isotopomer distribution
 MS:

mass spectrometry
 NMR:

nuclear magnetic resonance
 TCA:

tricarboxylic acid
 :

weighted sum of squared residual errors
 I :

Identity matrix
 NS :

null space matrix
 P:

compactification scaling factor
 S :

stoichiometric matrix
 WSSR:

weighted sum of squared residuals
 :

measurement variances
 :

flux vector
 :

free flux vector
 v ^{net} :

net flux
 v ^{xch} :

exchange flux
 v^{←}, :

reverse flux (scalar, vector)
 v^{→}, :

forward flux (scalar, vector)
 :

irreversible free flux vector
 :

irreversible dependent flux vector
 UB :

flux upper boundary
 :

simulated MID
 :

input substrate MID
 :

measurement MID.
References
 1.
Lee SY, Lee DY, Kim TY: Systems biotechnology for strain improvement. Trends Biotechnol. 2005, 23 (7): 349358. 10.1016/j.tibtech.2005.05.003.
 2.
Stephanopoulos G, Aristidou AA, Nielsen JH: Metabolic engineering: principles and methodologies. 1998, San Diego: Academic Press
 3.
Wiechert W: ^{13}C metabolic flux analysis. Metab Eng. 2001, 3 (3): 195206. 10.1006/mben.2001.0187.
 4.
Szyperski T: Biosynthetically directed fractional ^{13}Clabeling of proteinogenic amino acids. An efficient analytical tool to investigate intermediary metabolism. Eur J Biochem. 1995, 232 (2): 433448. 10.1111/j.14321033.1995.tb20829.x.
 5.
Szyperski T, Glaser RW, Hochuli M, Fiaux J, Sauer U, Bailey JE, Wuthrich K: Bioreaction network topology and metabolic flux ratio analysis by biosynthetic fractional ^{13}C labeling and twodimensional NMR spectroscopy. Metab Eng. 1999, 1 (3): 189197. 10.1006/mben.1999.0116.
 6.
Christensen B, Nielsen J: Isotopomer analysis using GCMS. Metab Eng. 1999, 1 (4): 282290. 10.1006/mben.1999.0117.
 7.
Fischer E, Sauer U: Metabolic flux profiling of Escherichia coli mutants in central carbon metabolism using GCMS. Eur J Biochem. 2003, 270 (5): 880891. 10.1046/j.14321033.2003.03448.x.
 8.
Sauer U: Metabolic networks in motion: 13Cbased flux analysis. Mol Syst Biol. 2006, 2: 62 10.1038/msb4100109.
 9.
Wittmann C: Metabolic flux analysis using mass spectrometry. Adv Biochem Eng Biotechnol. 2002, 74: 3964.
 10.
Wittmann C: Fluxome analysis using GCMS. Microb Cell Fact. 2007, 6: 6 10.1186/1475285966.
 11.
Khairallah M, Labarthe F, Bouchard B, Danialou G, Petrof BJ, Des Rosiers C: Profiling substrate fluxes in the isolated working mouse heart using 13Clabeled substrates: focusing on the origin and fate of pyruvate and citrate carbons. Am J Physiol Heart Circ Physiol. 2004, 286 (4): H14611470. 10.1152/ajpheart.00942.2003.
 12.
Krömer JO, Sorgenfrei O, Klopprogge K, Heinzle E, Wittmann C: Indepth profiling of lysineproducing Corynebacterium glutamicum by combined analysis of the transcriptome, metabolome, and fluxome. J Bacteriol. 2004, 186 (6): 17691784. 10.1128/JB.186.6.17691784.2004.
 13.
Schwender J, Ohlrogge J, ShacharHill Y: Understanding flux in plant metabolic networks. Curr Opin Plant Biol. 2004, 7 (3): 309317. 10.1016/j.pbi.2004.03.016.
 14.
Wittmann C, Kim HM, Heinzle E: Metabolic network analysis of lysine producing Corynebacterium glutamicum at a miniaturized scale. Biotechnol Bioeng. 2004, 87 (1): 16. 10.1002/bit.20103.
 15.
Fischer E, Zamboni N, Sauer U: Highthroughput metabolic flux analysis based on gas chromatographymass spectrometry derived 13C constraints. Anal Biochem. 2004, 325 (2): 308316. 10.1016/j.ab.2003.10.036.
 16.
Vo TD, Lim SK, Paul Lee WN, Palsson BO: Isotopomer analysis of cellular metabolism in tissue culture: A comparative study between the pathway and networkbased methods. Metabolomics. 2006, 2 (4): 243256. 10.1007/s1130600600333. 10.1007/s1130600600333.
 17.
Suthers PF, Burgard AP, Dasika MS, Nowroozi F, Van Dien S, Keasling JD, Maranas CD: Metabolic flux elucidation for largescale models using ^{13}C labeled isotopes. Metab Eng. 2007, 9 (5–6): 387405. 10.1016/j.ymben.2007.05.005.
 18.
Antoniewicz MR, Kraynie DF, Laffend LA, GonzalezLergier J, Kelleher JK, Stephanopoulos G: Metabolic flux analysis in a nonstationary system: fedbatch fermentation of a high yielding strain of E. coli producing 1, 3propanediol. Metab Eng. 2007, 9 (3): 277292. 10.1016/j.ymben.2007.01.003.
 19.
Nöh K, Gronke K, Luo B, Takors R, Oldiges M, Wiechert W: Metabolic flux analysis at ultra short time scale: isotopically nonstationary ^{13}C labeling experiments. J Biotechnol. 2007, 129 (2): 249267. 10.1016/j.jbiotec.2006.11.015.
 20.
Young JD, Walther JL, Antoniewicz MR, Yoo H, Stephanopoulos G: An elementary metabolite unit (EMU) based method of isotopically nonstationary flux analysis. Biotechnol Bioeng. 2008, 99 (3): 686699. 10.1002/bit.21632.
 21.
Zamboni N, Fischer E, Sauer U: FiatFlux – a software for metabolic flux analysis from ^{13}Cglucose experiments. BMC bioinformatics. 2005, 6: 209 10.1186/147121056209.
 22.
Wiechert W, Mollney M, Petersen S, de Graaf AA: A universal framework for ^{13}C metabolic flux analysis. Metab Eng. 2001, 3 (3): 265283. 10.1006/mben.2001.0188.
 23.
Rantanen A, Rousu J, Jouhten P, Zamboni N, Maaheimo H, Ukkonen E: An analytic and systematic framework for estimating metabolic flux ratios from 13C tracer experiments. BMC bioinformatics. 2008, 9: 266 10.1186/147121059266.
 24.
Yang TH, Frick O, Heinzle E: Hybrid optimization for 13C metabolic flux analysis using systems parametrized by compactification. BMC Syst Biol. 2008, 2: 29 10.1186/17520509229.
 25.
Antoniewicz MR, Kelleher JK, Stephanopoulos G: Elementary metabolite units (EMU): a novel framework for modeling isotopic distributions. Metab Eng. 2007, 9 (1): 6886. 10.1016/j.ymben.2006.09.001.
 26.
Zupke C, Stephanopoulos G: Modeling of Isotope Distributions and Intracellular Fluxes in Metabolic Networks Using Atom Mapping Matrixes. Biotechnol Prog. 1994, 10 (5): 489498. 10.1021/bp00029a006. 10.1021/bp00029a006.
 27.
Schmidt K, Carlsen M, Nielsen J, Villadsen J: Modeling isotopomer distributions in biochemical networks using isotopomer mapping matrices. Biotechnol Bioeng. 1997, 55 (6): 831840. 10.1002/(SICI)10970290(19970920)55:6<831::AIDBIT2>3.0.CO;2H.
 28.
Wiechert W, de Graaf AA: Bidirectional reaction steps in metabolic networks: I. Modeling and simulation of carbon isotope labeling experiments. Biotechnol Bioeng. 1997, 55 (1): 101117. 10.1002/(SICI)10970290(19970705)55:1<101::AIDBIT12>3.0.CO;2P.
 29.
van Winden WA, van Gulik WM, Schipper D, Verheijen PJ, Krabben P, Vinke JL, Heijnen JJ: Metabolic flux and metabolic network analysis of Penicillium chrysogenum using 2D [13C, 1H] COSY NMR measurements and cumulative bondomer simulation. Biotechnol Bioeng. 2003, 83 (1): 7592. 10.1002/bit.10648.
 30.
Antoniewicz MR, Kelleher JK, Stephanopoulos G: Determination of confidence intervals of metabolic fluxes estimated from stable isotope measurements. Metab Eng. 2006, 8 (4): 324337. 10.1016/j.ymben.2006.01.004.
 31.
Möllney M, Wiechert W, Kownatzki D, de Graaf AA: Bidirectional reaction steps in metabolic networks: IV. Optimal design of isotopomer labeling experiments. Biotechnol Bioeng. 1999, 66 (2): 86103. 10.1002/(SICI)10970290(1999)66:2<86::AIDBIT2>3.0.CO;2A.
 32.
Wittmann C, Heinzle E: Genealogy profiling through strain improvement by using metabolic network analysis: metabolic flux genealogy of several generations of lysineproducing corynebacteria. Appl Environ Microbiol. 2002, 68 (12): 58435859. 10.1128/AEM.68.12.58435859.2002.
 33.
van Winden WA, Wittmann C, Heinzle E, Heijnen JJ: Correcting mass isotopomer distributions for naturally occurring isotopes. Biotechnol Bioeng. 2002, 80 (4): 477479. 10.1002/bit.10393.
 34.
Isermann N, Wiechert W: Metabolic isotopomer labeling sysmtes. Part II. structural flux identifiability analysis. Math Biosci. 2003, 183 (2): 175214. 10.1016/S00255564(02)002225.
 35.
van Winden WA, Heijnen JJ, Verheijen PJ, Grievink J: A priori analysis of metabolic flux identifiability from (13)Clabeling data. Biotechnol Bioeng. 2001, 74 (6): 505516. 10.1002/bit.1142.
 36.
Chang Y, Suthers PF, Maranas CD: Identification of optimal measurement sets for complete flux elucidation in metabolic flux analysis experiments. Biotechnol Bioeng. 2008, 100 (6): 10391049. 10.1002/bit.21926.
 37.
Wittmann C, Heinzle E: Modeling and experimental design for metabolic flux analysis of lysineproducing Corynebacteria by mass spectrometry. Metab Eng. 2001, 3 (2): 173191. 10.1006/mben.2000.0178.
 38.
Rantanen A, Mielikainen T, Rousu J, Maaheimo H, Ukkonen E: Planning optimal measurements of isotopomer distributions for estimation of metabolic fluxes. Bioinformatics. 2006, 22 (10): 11981206. 10.1093/bioinformatics/btl069.
 39.
Becker J, Klopprogge C, Zelder O, Heinzle E, Wittmann C: Amplified expression of fructose 1, 6bisphosphatase in Corynebacterium glutamicum increases in vivo flux through the pentose phosphate pathway and lysine production on different carbon sources. Appl Environ Microbiol. 2005, 71 (12): 85878596. 10.1128/AEM.71.12.85878596.2005.
 40.
Dauner M, Sauer U: GCMS analysis of amino acids rapidly provides rich information for isotopomer balancing. Biotechnol Prog. 2000, 16 (4): 642649. 10.1021/bp000058h.
 41.
Kadirkamanathan V, Yang J, Billings SA, Wright PC: Markov Chain Monte Carlo Algorithm based metabolic flux distribution analysis on Corynebacterium glutamicum. Bioinformatics. 2006, 22 (21): 26812687. 10.1093/bioinformatics/btl445.
 42.
Wiechert W, Siefke C, Graaf AAd, Marx A: Bidirectional reaction steps in metabolic networks: II. Flux estimation and statistical analysis. Biotechnol Bioeng. 1997, 55 (1): 118135. 10.1002/(SICI)10970290(19970705)55:1<118::AIDBIT13>3.0.CO;2I.
Acknowledgements
We thank Oliver Frick and Esteban Marcellin for reviewing the manuscript. We also thank the University of Queensland and the Australian Institute for Bioengineering and Nanotechnology for financial support.
Author information
Additional information
Competing interests
The authors declare that they have no competing interests.
Authors' contributions
LEQ programmed the JAVA parser, implemented OpenFLUX in MATLAB and performed the EMU based simulations. He also performed 13CFLUX simulations and contributed to writing the paper. CWI performed the simulations based on isotopomer mapping, provided data and modelling details on the real test case scenario, contributed expertise for development of OpenFLUX and contributed to writing of the paper. LKN provided his expertise in mathematical modelling of fluxes and contributed to writing the paper. JOK designed and supervised the study. He contributed expertise in ^{13}C fluxomics and coordinated the writing of the paper. All authors have approved the final manuscript
Electronic supplementary material
Additional file 2: C. glutamicum metabolic model. Contains the C. glutamicum metabolic network for both wildtype and mutant strains. Model and data are reproduced from Becker et al. [39]. File includes brief instruction on how to set up the simulated output EMU vector. (XLS 80 KB)
Rights and permissions
About this article
Received
Accepted
Published
DOI
Keywords
 Flux Distribution
 Metabolic Model
 Flux Analysis
 Metabolic Flux Analysis
 Glucose Uptake Rate