Construction of an enzyme-constrained metabolic network model for Myceliophthora thermophila using machine learning-based k cat data

Background Genome-scale metabolic models (GEMs) serve as effective tools for understanding cellular phenotypes and predicting engineering targets in the development of industrial strain. Enzyme-constrained genome-scale metabolic models (ecGEMs) have emerged as a valuable advancement, providing more accurate predictions and unveiling new engineering targets compared to models lacking enzyme constraints. In 2022, a stoichiometric GEM, iDL1450, was reconstructed for the industrially

Genome-scale Metabolic Models (GEMs) are effective tools for elucidating cellular phenotypes and predicting potential engineering targets to guide development of industrial strains [11][12][13].In 2022, a manually curated genome-scale metabolic model, iDL1450, was constructed for M. thermophila [14].With the help of iDL1450, an optimized rational design for important bulk chemicals was simulated and metabolic differences under different temperature conditions were analyzed.However, GEMs only consider stoichiometric constraints, which may not accurately capture the intracellular conditions.To enhance the performance of GEMs, the integration of enzyme constraints into the models has been explored.Several methods have been developed to incorporate the enzyme constraints, considering enzyme concentration, enzyme catalytic efficiency, and enzyme molecular weight [15][16][17][18][19][20][21].In 2007, the first mathematical framework, Flux Balance Analysis with Molecular Crowding (FBAwMC), was established by Beg et al. taking into account enzyme constraints based on macromolecular crowding [22].FBAwMC imposes constraints on enzyme concentrations at a physical level by introducing crowding coefficients, thereby achieving an overall constraint on enzyme activity.Subsequently, several methods, such as MOMENT [23], GECKO [24], AutoPACMEN [25], and ECMpy [17], has been developed to integrate enzyme constraints into GEMs.
In 2017, based on FBAwMC, Sánchez et al. developed the GECKO toolbox, which extends GEMs by adding new rows to the S-matrix that represent the enzymes and new columns representing each enzyme's usage [24].The enzyme-constrained GEM (ecGEM) ecYeast7 revealed improved predictive performance compared with the Yeast7 and identified enzyme limitation as a major driving force behind enzymatic protein reallocation.Bekiaris et al.Combined the MOMENT and GECKO methods to introduce AutoPACMEN, a method capable of automatically retrieving enzyme data from the BRENDA [26] and SABIO-RK [27] databases, marking a significant step in automating ecGEM construction [25].Additionally, in 2021, Mao et al. presented ECMpy, an automated method for ecGEM construction, which simplified the workflow without modifying the S-matrix [17].ECMpy was employed to construct an ecGEM for Escherichia coli, demonstrating improved prediction accuracy for various cellular phenotypes [17].Subsequently, ECMpy was utilized to develop ecGEMs for Bacillus subtilis [20] and Corynebacterium glutamicum [19], providing more precise predictions and guiding the rational design of microbial cell factories.
In this study, we first updated and modified iDL1450 to a new version called iYW1475.Based on this GEM, an enzyme-constrained model for M. thermophila was constructed using the ECMpy workflow.During the construction, enzyme turnover numbers (k cat ) were gathered in three distinct methods (AutoPACMEN [25], DLKcat [28] and TurNuP [29]), resulting in three versions of ecGEMs (eciYW1475_AP, eciYW1475_DL and eciYW1475_TN).After comparison, eciYW1475_TN was selected as the final ecGEM version for M. thermophila (ecMTM).Simulation of substrate hierarchy utilization and prediction of metabolic engineering targets were performed using ecMTM.

Measurements of RNA and DNA content in M. thermophila
In order to determine RNA and DNA content in M. thermophila, the wild-type strain ATCC 42464 was grown on Vogel's minimal medium supplemented with 2% glucose (GMM) containing 1.5% agar at 35 ℃ for 7 days to obtain mature conidia.Conidia was inoculated into 50 mL GMM to a final concentration of 1×10 6 conidia/ mL in a 250-mL Erlenmeyer flask.Then, liquid cultures were incubated under 45 ℃ at 150 rpm in a rotary shaker for 20 h.RNA content was quantified using the method described in report [30,31].A 2 mL sample was collected and certificated at 10,000×g for 5 min.The resulting pellet was washed three times with 3 mL of cold 0.7 M HClO 4 .Subsequently, the mycelium was resuspended in 3 mL of 0.3 M KOH and incubated at 37 ℃ for 60 min with occasional shaking.Following cooling to room temperature, the samples were neutralized by adding 1.0 mL 3 M HClO 4 , followed by centrifugation.The supernatant was collected, and the pellet underwent two washes with 4 mL of cold 0.5 M HC1O 4 .The supernatants were combined, and the volume was adjusted to 15 mL with 0.5 M HClO 4 .Finally, the samples were clarified by centrifugation, and the absorbance A 260nm was measured using a UV spectrometry (Nanodrop 2000c spectrophotometer).For the determination of biomass dry weight, mycelium was collected using vacuum filtration, washed three times with distilled water, and rapidly frozen in liquid nitrogen.Then the biomass samples were lyophilized under − 40 ℃ until a constant weight was achieved.
To measure DNA content, approximately 0.01 g of lyophilized mycelium was ground into powder in liquid nitrogen in a mortar.Following this, 1 mL extraction buffer (200 mM Tris•HCI pH 8.5, 250 mM NaCl, 25 mM EDTA, 0.5% SDS) [32] was added and incubated at 60 ℃ for 30 min.The sample was subjected to two extractions by adding an equal volume of phenol:chloroform:isoamy lalcohol (25:24:1, v/v/v) with certification at 10,000×g for 10 min.The supernatant was mixed with 1/10 volume of 3 M sodium acetate (pH 5.3) and 2.5 volumes ethanol to precipitate DNA at − 20 ℃ for 1 h.After precipitation, the DNA was resuspended in TNE (1 M NaCl, 10 mM EDTA, 0.1 M Tris•HCl, pH 7.4) and treated with RNase, followed by a second precipitation and wash to remove degraded RNA.Finally, the DNA content was determined using a Nanodrop spectrophotometer.

Model update and format modification
Prior to establishing ecGEM, the initial model iDL1450 underwent comprehensive updates and format modifications to align with ECMpy requirements.In detail, biomass components were adjusted based on experimental data (Additional file 1: Table S1).Precise corrections were applied to RNA, DNA, protein and amino acid content [33], according to measured data.Subsequently, the content of lipids and cell wall was adjusted according to literatures [34] and [30], respectively.Some redundant metabolites were identified by their corresponding IDs and names and were manually consolidated into singular entities (Additional file 2: Table S2).Besides, gene-protein-reaction (GPR) relationships were checked and updated based on experiment data from previous reports [4,7,35] alongside KEGG annotation [36], mainly including glycolysis, tricarboxylic acid cycle (TCA) and oxidative phosphorylation pathways (Additional file 3: Table S3).After the revision, gene number was increased from 1450 to 1475 and the initial model was renamed iYW1475.
To meet requirement of ECMpy, several changes had been made.Primarily, metabolite names were mapped to those in BiGG database [37] using KEGG [36] identifier, CHEBI IDs [38] and metabolite names (Additional file 4: Table S4).Then the format of modified GEM was converted from XML to JavaScript Object Notation (JSON) format.Furthermore, protein IDs in UniProt database [39] were collected and integrated into JSON format model as a part of gene annotation.

Construction of ecGEM and calibration of enzyme kinetic parameters
The ecGEM was constructed based on iYW1475 using the ECMpy workflow (see Fig. 1).Initially, reversible reactions were transformed into pairs of irreversible reactions, and reactions catalyzed by multiple isoenzymes were segregated into distinct reactions.The enzyme mass fraction (f) was determined as 0.55, calculated using Eq.(1) based on unpublished proteomic data measured in our lab, where A i and A j represent the abundances (mole ratio) of the i-th protein (p_num represents proteins expressed in the model) and j-th protein (g_num represents proteins expressed in the whole proteome of M. thermophila), respectively; MW i is molecular weight of an enzyme catalyzing reaction i.Additionally, enzyme subunit information was sourced from the 'Interaction information' section in the UniProt database (see Additional file 5: Table S5).
For enzyme kinetic parameter data, three methods were employed: AutoPACMEN was utilized to collect enzyme kinetic parameter data based on EC numbers, and machine learning-based tools (DLKcat [28] and TurNuP [29]) were used to predict enzyme kinetic parameter information.Finally, the enzyme-constrained model was developed by incorporating enzyme constraints (see Eq. 2) into GEM, where v i represents flux of i-th reaction in the model; σ i denotes the saturation coefficient for i-th enzyme, with an average value of 0.5 assigned to all enzymes; ptot represents the total protein fraction (0.4653) in M. thermophila and f represents the mass fraction of enzymes; k cat,i represents the turnover number of the enzyme catalyzing reaction i.
To enhance the alignment between model predictions and experimental data, additional fine-tuning of the original k cat values was necessary for the enzyme-constrained model.The iterative calibration for k cat was carried out utilizing two methods outlined in ECMpy [17]: the enzyme usage method and the 13 C flux consistency method, relying on measured 13 C flux data [33].

Phenotype phase plane (PhPP) analysis
Cells deploy distinct metabolic strategies contingent upon varying oxygen and glucose levels.To simulate this dynamic cellular responses, PhPP analysis was conducted to explore metabolic behavior [40].During the analysis, uptake flux ranges for oxygen and glucose were set to 0-50 mmol/gDW/h and 0-10 mmol/gDW/h, (1) respectively.Parsimonious Flux Balance Analysis (pFBA) was used for PhPP analysis with the objective of maximum biomass [41].

Flux variability analysis (FVA)
Flux balance analysis (FBA) using linear programming with GEMs might generate diverse optimal solutions for the same objective value [42,43].To assess the changes in solution space after introducing enzyme constraint, flux variability ranges between iYW1475 and ecGEM were compared using FVA method [43] with some modifications [19].Specifically, for reactions involving isozymes, the maximal flux variability range within the isozymes was used (Eq.( 3)).For reversible reactions in ecGEMs, the corresponding flux variability ranges were solved using Eq. ( 4) [19], where v max i,isoj and v min i,isoj represent the maximum and minimum fluxes, respectively, of the j-th reaction among the reactions associated with isozyme i; v max i and v min i represent the maximum and minimum fluxes of reaction i; v max i,isoj and v min i,REV represent the maximum and minimum fluxes of the reversible reaction i.

Metabolic adjustment simulation
GEMs or ecGEMs were always employed to investigate and explicate metabolic adaptations [15][16][17][18][19][20][21].In this study, metabolic adjustment was simulated using ecMTM solving by pFBA to maximize biomass production with substrate uptake rates ranging from 0 to 5 mmol/ gDW/h.To further elucidate the metabolic adaptation to varying glucose uptake rates, detailed explorations were performed, encompassing alterations in biomass yield (Eq. 5), where v biomass represents the flux of biomass and v glucose denotes the glucose uptake rate, MW glucose is the molecular weight of glucose; enzyme efficiency for biomass synthesis ( ε biomass , Eq. 6), where E min, biomass was determined using the minimum enzyme amount algorithm of ECMpy for biomass synthesis; energy synthesis enzyme cost (Eq.7), where E reaction for ATP production, i rep- resents the enzyme level of the i-th ATP production reaction, and V net_generated_ATP, i represents the flux of the i-th ATP production reaction; and oxidative phosphorylation ratio (proportion of ATP produced by oxidative phosphorylation to the total ATP production) [20].

Substrate hierarchy utilization measured in vivo
During cellular growth on various carbon sources, a hierarchy exists in their utilization.Utilization pattern of five carbon sources in M. thermophila was measured in this study.For measurement consumption of carbon sources, inoculation of M. thermophila ATCC42464 conidia into 100 mL 1 × Vogel's minimal medium supplemented with five carbon sources (glucose, xylose, galactose, arabinose, cellobiose, 0.5% (w/v) for each substrate) to a final concentration of 1 × 10 6 conidia/mL in a 250-mL Erlenmeyer flask.Then the liquid cultures were incubated at 45 ℃ and 150 rpm in a rotary shaker.Sample was centrifuged at 10,000×g for 5 min and the supernatant was filtered with 0.22-μm cellulose acetate syringe filters before analysis.The concentration of cellobiose was determined (3)

Substrate hierarchy utilization simulated in silico
In a recent study, Wang et  where v biomass precursor and Emin, precursor respectively rep- resent the flux of biomass precursor and the minimum enzyme cost for biomass precursor synthesis.

Prediction of metabolic engineering targets
A distinctive feature of ecGEM lies in its capability to calculate the enzyme cost of reactions, facilitating the identification of key enzymes within pathways [19].In this study, potential modification targets were identified using two methods: one is enzyme cost-based sorting method (Method 1) in which the first 15 proteins were classified as top-demanded proteins and selected as potential targets for metabolic engineering; The second is the enzyme (8) ε biomass precursor = v biomass precursor E min, precursor cost differences in different conditions (HGLP (high growth low product generation)/LGHP (low growth high product generation)) method (Method 2, reactions with a flux greater than 1 mmol/gDW/h are selected as potential targets for engineering modifications (HGLP involves the flux at biomass = 10%, while LGHP involves the flux at biomass = 100%) [19].

Statistical analysis
All experiments were carried out in three independent repeated assays.

Model update and format modification
The biomass components in iDL1450 were adjusted based on measured data for protein, RNA, and DNA content.Subsequent modifications of additional components were based on literature data, all of which are detailed in the supplementary file (Additional file 1: Table S1).
After that, 96 duplicate metabolites were identified and consolidated into singular entities with identical identifiers and names (Additional file 2: Table S2).Additionally, gene-protein-reaction (GPR) rules, particularly reactions with 'and' relationships, were modified using information from literature and KEGG data.A total of 29 reactions with 'and' relationships in their GPR associations were revised (Additional file 3: Table S3).Through iterative corrections, a refined model, named iYW1475, was obtained containing 1475 genes, 2591 reactions, and 1700 unique metabolites.Additionally, format adjustments were executed to comply with ECMpy criteria, and leveraging multiple identifiers mapping, 55.2% of metabolite IDs were substituted with BiGG IDs (Additional file 4: Table S4).After these updates and format modifications, iYW1475 was transitioned from XML format to JSON format (all workflows are showed in Fig. 1).

Construction of ecGEM for M. thermophila
Before integrating enzyme kinetic parameters, reactions in iYW1475 were transformed based on ECMpy workflow.After the transformation, the model was expanded to 6690 reactions (Table 1).Then, enzyme data were collected using AutoPACMEN method, accumulating a total of 4481 k cat data ranging from 10 -3 to 10 6 , leaving 1059 reactions without enzyme data.The missing values were filled with median value of all AutoPACMEN-collected k cat data (Additional file 6: Table S6).Additionally, the molecular weights of the enzymes spanned from 8 to 1800 kDa (Fig. 2B).The constructed ecGEM from this workflow was denoted as eciYW1475_AP.
However, 19.1% of k cat data cannot be collected through the AutoPACMEN method.Furthermore, only one k cat value originated from the native species, while the remaining values were obtained from other species, indicating a low availability of data for enzyme kinetic parameters.Therefore, machine learning-based methods DLKcat and TurNuP were employed to predict k cat data.DLKcat captured 67.7% k cat data while TurNuP successfully predicted all k cat values for reactions in iYW1475 (Additional file 6: Table S6).Consequently, a second ecGEM was constructed using DLKcat predicted data with missing values filled in using the median of DLKcat predicted k cat , resulting in eciYW1475_DL.Meanwhile, TurNuPpredicted data was chosen to construct a third ecGEM, named eciYW1475_TN.The distribution of k cat values (Fig. 2A) indicated a more concentrated k cat distribution in eciYW1475_TN and eciYW1475_DL, ranging from 10 −2 to 10 3 compared to those in eciYW1475_AP.
Next, k cat values were calibrated using enzyme usage method until the growth rate approximated the measured value (Additional file 7: Table S7).After that, the 13 C flux consistency method was employed to ensure flux consistency between ecGEM and measured fluxes (Additional file 7: Table S7).After the calibration of the k cat values, growth and flux predictions were compared.Results showed that (Fig. 2C) the predicted biomass fluxes in eciYW1475_AP and eciYW1475_ TN were both similar to experimental value.However, for eciYW1475_DL, the predicted biomass was much smaller than the experimental value due to missing values being filled with the median value of 7.8391 s −1 leading to an over-constrained model.Compared with eciYW1475_DL and eciYW1475_AP, eciYW1475_ TN indicated a higher consistency with the 13 C data (Fig. 2C, Additional file 8: Table S8).Discrepancies were observed between the predicted fluxes and the experimental data for eciYW1475_DL and eciYW1475_AP (Fig. 2C, Additional file 8: Table S8).
It was noted that some predicted fluxes for the TCA pathway were zero when the glucose uptake rate was 3.0676 mmol/gDW/h.Considering k cat coverage and performance of these three ecGEMs, the revised eciYW1475_TN was then used as the final version of ecGEM for M. thermophila, named ecMTM.All subsequent simulations were carried out with ecMTM.

Introducing enzyme constraints reduced the solution space and improved prediction accuracy
When a GEM is given a specific objective, solving it through FBA often results in alternate optimal solutions, which constitutes a limitation of GEMs [43].To address this challenge and narrow the range of flux variability, various strategies, including the incorporation of enzyme constraints, have been explored in previous studies [17,[19][20][21].In this study, the cumulative flux variability ranges between iYW1475 and ecMTM were compared using Flux Variability Analysis (FVA) with setting glucose uptake rate at 10 mmol/gDW/h.Results demonstrated that a substantial two-order reduction in the median flux variability range after introducing enzyme constraints.Furthermore, cumulative distribution indicated that there were 5% with totally variable flux (reactions that can carry any flux between − 1000 and 1000 mmol/ gDW/h) in iYW1475, while no such extreme variability range was observed in ecMTM (Fig. 3A, Additional file 9: Table S9).
In experimental condition, growth rate of M. thermophila is 0.2573 h −1 with maximum glucose uptake rate of 3.0676 mmol/gDW/h when abundant oxygen and glucose are supplied [33].The PhPP analysis of iYW1475 exhibited linear growth as a function of carbon source and oxygen uptake rates (Fig. 3B), which was inconsistent with experimental observations.In contrast, ecMTM constrained the maximal growth rate with increasing carbon source availability (Fig. 3C), indicating a significant reduction in the solution space.All these results suggest that incorporating enzyme constraints into GEMs make the simulation results closer to realistic cellular phenotypes.

Simulating metabolic strategy adjustment
Enzyme-constrained models provide valuable insights into how cells adjust their metabolic pathways based on enzyme resources in response to increasing carbon source uptake.Previous studies have employed ecGEMs to simulate the metabolic adjustments in Yeast [24] and E. coli [17], elucidating overflow metabolism phenomena in these organisms.Simulation of metabolic strategy adjustment were conducted with a glucose uptake rate ranging from 0 to 6 mmol/gDW/h.The results illustrated that metabolic pathways were dynamically adjusted with an increase in glucose uptake rate.These adjustments can be divided into substrate-limited, metabolic adjustment stage, and metabolic overflow stage (Fig. 4A).In the first stage (glucose uptake rate less than 2.5 mmol/gDW/h), the growth rate exhibited a linear relationship with glucose uptake, aligning with the behavior observed in iYW1475 (Fig. 4A, B).In the second stage (glucose uptake rate 2.5 to 5 mmol/gDW/h), due to the enzyme resource constraint, growth rate was decreased compared with iYW1475 along with the gradual increase of substrate supply (Fig. 4B).Meanwhile, metabolic pathways were adjusted to these with higher enzyme efficiency and more carbon loss resulting in lower biomass yield (Fig. 4B, C).Moving on to the third stage, as substrate supply continued to increase, a shift towards a more enzymatically efficient ethanol production pathway was observed.This shift resulted in an increase in ethanol production flux from 0 to 2.17 mmol/gDW/h.Overall, the results depicted a trade-off between enzyme efficiency and growth rate.
To investigate energy pathway regulation strategies during metabolic adjustments, energy synthesis enzyme costs and oxidative phosphorylation rates were calculated [20].Cells showed a preference for energyefficient respiratory oxidative phosphorylation at low growth rates, particularly in the substrate limited stage (Fig. 4D, Additional file 10: Table S10).However, as the glucose uptake rate increased during the metabolic adjustment stage, the ratio of oxidative phosphorylation with high enzyme consumption decreased, resulting in a decline of enzyme cost for energy synthesis (Fig. 4D, Additional file 10: Table S10).Due to enzyme limitation, more energy was produced through the glycolysis pathway.The regulation of the energy pathway indicated that enzyme limitation is a major driving

Simulation of substrate hierarchy utilization
As a highly efficient cellulose and hemicellulose degrader, M. thermophila exhibits proficiency in utilizing various oligosaccharides and monosaccharides derived from plant biomass hydrolysis.The primary degradation products include cellobiose, glucose, xylose, arabinose and galactose [45].Investigating the preferential utilization of five substrates in M. thermophila, essential for environmental adaptation, was performed by simulating hierarchical substrate utilization using ecMTM from the perspective of enzyme efficiency (substrate carbon atom uptake was scaled to 15 mmol/gDW/h).According to the assessment principles [44], if the enzyme efficiency for synthesizing biomass precursors from one substrate is higher than that from another substrate, the former is sequentially consumed before the latter; otherwise, they are deemed to be simultaneously consumed (co-utilization).
There are mainly two pathways for cleaving cellobiose in M. thermophila, namely, hydrolytic pathway and phosphorolytic pathway [35] (Fig. 5A).In the hydrolytic pathway, β-glucosidase catalyzes one cellobiose molecule into two molecules glucose, while in the phosphorolytic pathway, cellobiose phosphorylase converts one cellobiose molecule into one glucose molecule and one glucose-1-phosphate (G1P) molecule.Glucose generated in these pathways is further catalyzed into G6P with consuming ATP whereas G1P, produced in the phosphorolytic pathway, is catalyzed into G6P by phosphoglucomutase without ATP consumption.In comparison to the hydrolytic pathway, 11 biomass precursors, excluding PYR, are synthesized from one cellobiose molecule through the phosphorolytic pathway, which consumes one less molecule of ATP.This reduction in ATP usage saves enzyme resources by 12.5% (in the case of G6P), as ATP production involves large amounts of enzyme resource through oxidative phosphorylation (Additional file 11: Table S11).As a result, there is a higher enzyme efficiency in synthesizing 11 biomass precursors from cellobiose compared with glucose (Fig. 5B).However, the hydrolytic pathway is chosen for pyruvate synthesis from cellobiose since ATP production and consumption are balanced through this pathway.The cellobiose hydrolytic pathway requires an additional β-glucosidase, accounting for 4.5% of the total enzyme resources, leading to lower enzyme efficiency in PYR synthesis compared with glucose.Meanwhile, enzyme efficiencies for synthesizing all the 12 biomass precursors with cellobiose and glucose are higher than those with xylose, galactose and arabinose.Therefore, glucose and cellobiose are first co-utilized showing a preferential utilization for the other three sugars.
Concerning galactose and xylose, the synthesis enzyme efficiency with the former is higher than that of the latter for 10 biomass precursors and is lower for 2, indicating a co-utilization relationship between the two sugars (Fig. 5B).Additionally, all enzyme efficiencies for the 12 precursors with xylose and galactose are higher than that of arabinose, resulting in arabinose being utilized last.Compared with xylose, there are two additional enzyme reactions for biomass precursor synthesis using arabinose which constitute 4.7% (in the case of G6P) of the total enzyme resources, resulting in lower enzyme efficiency (Additional file 11: Table S11).Overall, these simulation results are consistent with experimental measurements illustrated in Fig. 5C.The quantitative simulation elucidates the hierarchical utilization of different carbon sources derived from plant biomass in this fungus from the perspective of protein resource allocation.

Exploration of metabolic engineering targets based on enzyme cost
GEMs function as valuable predictive tools for identifying potential targets in metabolic engineering.In the case of B. subtilis, an ecGEM was applied to identify novel gene deletion targets, resulting in an increased yield of poly-γ-glutamic acid [18].Based on enzyme costs, two methods (outlined in Method section) were employed to predict potential targets for three products: ethanol, malate and fumarate.The biosynthetic pathways for these three products were illustrated in Fig. 6A.In total, 12 out of 32 previously overexpressing targets were predicted by ecMTM using the first method as depicted in Fig. 6B.Modifying these 12 targets showed notable improvements in the production of target chemicals [4,5,8,9].
Target prediction method 2 presents a modification strategy that includes both enhancing and weakening targets (Additional file 12: Table S12).Six enhancing targets for ethanol were successfully predicted and verified by overexpression target genes [8,9].Among these predicted targets, both pdc (pyruvate decarboxylase encoding gene) and adh (alcohol dehydrogenase encoding gene) were identified by these two methods, representing the final crucial steps in ethanol synthesis.Overexpressing PDC, which decarboxylates pyruvate to acetaldehyde, and ADH, which reduces acetaldehyde to ethanol, can directly enhance the ethanol synthesis capability.This strategy has been supported by previous studies [8,9].Additionally, two new enhancing targets, eno and gapdh, were predicted but have not been modified yet.
For malate, four enhancing targets were identified, including pyc (pyruvate carboxylase encoding gene), mdh (malate dehydrogenase encoding gene), and hxt (hexose transporter encoding gene), predicted by both methods.In microbes, PYC catalyzes the conversion of pyruvic acid to oxaloacetic acid with the fixation of CO 2 ; the oxaloacetic acid is then converted into malate by MDH.PYC and MDH are two key steps in produce of malate through the reductive tricarboxylic acid (rTCA) pathway in cytoplasm [5].Due to limited experiments on fumarate, only two predicted targets by these two methods were verified, leaving the remaining targets serve as potential objectives for fumarate modification.
Notably, among the predicted targets, hxt and eno were shared by these three chemicals.HXT is responsible for transporting glucose, a key factor in increasing glucose uptake.This has been confirmed by studies showing that overexpressing HXT can elevate ethanol and malate yield [5,8,9].As a vital step of glycolysis, ENO is regarded as a key enzyme due to its low k cat .Therefore, eno is a potential target to increase precursor metabolite flux from glycolysis for enhancing product synthesis, which is worth trying in future.
In the results predicted by method 2, putative oxaloacetate transporter was identified as a weakening target which contributes to the enhancement of malate and fumarate production.Knocking out this target in Aspergillus carbonarius has been shown to boost malate production [46], making it a prospective modification target for improving malate synthesis.In addition, most of predicted weakening targets for these three products were focused on the respiratory chain, suggesting that modifying these reactions may allocate more enzyme resources for product synthesis.6 Metabolic engineering targets prediction for ethanol, malate and fumarate in M. thermophila.A Depicts the biosynthetic pathways for ethanol, malate and fumarate in M. thermophila.B Metabolic engineering targets for three products predicted using ecMTM.The targets reported in the literature are highlighted in orange.The reactions uccr (Ubiquinol-cytochrome c reductase complex), rs (RNA synthesis complex), and rnd (Respiratory-chain NADH dehydrogenase complex) are involved multiple genes

Discussion
In this research, an enzyme-constrained model ecMTM was constructed based on iDL1450 under the ECMpy framework.Enzyme turnover number k cat is a fundamental parameter in the quantitative study of enzyme activity and is crucial for understanding cellular metabolism, physiology, and resource allocation [15][16][17][18][19][20][21].Several databases such as BRENDA, UniProt, and SABIO-RK offer enzyme parameter queries, while automated pipelines like AutoPACMEN facilitate the automated acquisition of k cat data for ecGEMs.Besides, k cat values can be predicted through methods based on machine learning algorithms, such as DLKcat and TurNuP.During construction of ecMTM, three versions of ecGEMs were developed using k cat values obtained through AutoPAC-MEN, DLKcat and TurNuP.After comparison, ecGEM constructed using TurNuP-predicted data performed better in several aspects and was selected as the final version ecGEM for M. thermophila.Nowadays, the machine learning based methods has gradually used in ecGEMs with improved performance [28,47].
The constructed ecMTM demonstrated improved accuracy in simulating strain behavior compared to iYW1475.Incorporating enzyme constraints into iYW1475 notably reduced the solution space, yielding predictions closer to experimental observations.However, there are still some disagreements that could not explain by ecMTM.In particular, metabolic strategy adjustment simulations within the model revealed an ethanol overflow, as was observed in yeast ecGEM [24].Although ethanol overflow is important for biofuel production from plant biomass, it is not observed in M. thermophila wild type strain.These discrepancies suggest that factors beyond enzyme constraints influence cell phenotype.Nonetheless, metabolic adjustment simulation provided insight into potential strategies to increase ethanol flux.
Due to enzyme resource limitations and quantified enzyme efficiencies, ecMTM could predict hierarchy substrate utilization in a quantitative perspective.M. thermophila typically found in low-carbon-content soil environments requires lignocellulolytic enzymes for lignocellulose degradation [48].In this niche, the utilization of carbon sources with low cost of enzyme is important for the survival of M. thermophila.Hierarchy substrate utilization was determined in liquid shake flask and simulated with ecMTM.The simulation unveiled the enzyme-cost-driven hierarchy in substrate utilization by M. thermophila.In previously research, Ramkrishna et al. [49] developed a cybernetic model to simulate mixed-substrate growth dynamics of E. coli.This model utilizes dynamic equations to describe the rates of individual reactions in metabolic pathways, taking into account substrate concentration, enzyme concentration and enzyme activity.It requires a large amount of experimental data to fit model parameters.On the other hand, Enzyme efficiency method determines the hierarchical utilization of carbon sources by calculating the enzyme efficiency for biomass precursors without relying on complex dynamic equations or experimental data.Nevertheless, it ignores the dynamic regulation processes in metabolic pathways.Therefore, the cybernetic model is better suitable for studying the dynamic regulation processes of metabolic pathways in biological systems, while the enzyme efficiency method is more suitable for predicting substrate hierarchy utilization.
Based on enzyme costs, metabolic engineering targets were predicted using ecMTM.Notably, the first prediction method revealed several shared targets across the three products, including hxt and eno.Similarly, the second prediction method predominantly highlighted targets within the glycolysis pathway to enhance these products.These predictions underscore the pivotal role of glucose transport and the rapid glycolysis flux in driving the production of these specific products.
After integration of enzyme constraints, the predictive capabilities of GEM were significantly enhanced.As we all know, biological systems are extremely complex with various constraints and hierarchical regulations.Enzyme constraints, though influential, might not comprehensively capture the intricacies of these systems.Furthermore, M. thermophila exhibits special phenotypes, such as thermophily and excellent ability of cellulose degradation.To elucidate these characteristics, it may be necessary to introduce additional constraints, such as temperature constraints, or development of comprehensive metabolic network models like ME models [50] and etcGEM models [51].

Conclusion
In this study, an enzyme-constrained genome-scale metabolic model ecMTM was constructed for M. thermophila within ECMpy framework using machine learning-based k cat data predicted by TurNuP.Simulation results demonstrated that enzyme constraints could reduce the model's solution space, leading to more realistic pathway predictions.The ecMTM successfully predicted and elucidated the hierarchical utilization of five carbon sources derived from lignocellulose.Moreover, key enzymes were identified for three chemicals, providing rational metabolic engineering modifications to increase the yield of valuable compounds.

Fig. 2
Fig. 2 Basic information of eciYW1475_AP, eciYW1475_DL and eciYW1475_TN.A Cumulative distribution of k cat values for three ecGEMs.B Cumulative distribution of molecular weights.C Flux comparison of three ecGEMs

Fig. 3
Fig.3Simulation of solution space and growth rate.A Cumulative distribution of flux variability at high growth rates for iYW1475 and ecMTM.Simulation of growth rates at different glucose and oxygen uptake rates using iYW1475 (B) and ecMTM (C)

Fig. 4
Fig. 4 Simulation of metabolic adjustments.A Metabolic flux map of the three stages.B Comparison of metabolic adjustments between iYW1475 and ecMTM.C Trade-off phenomenon simulated by ecMTM.D Energy synthesis enzyme cost and energy production ratio of oxidative phosphorylation simulated by ecMTM

Fig. 5
Fig. 5 Substrate hierarchy utilization simulation.A Substrate utilization pathways for five carbon resources.Metabolites with background represent 12 biomass precursors.B Simulation of substrate hierarchy utilization.Numbers represent enzyme efficiency.C Measured data of substrate utilization

Fig.
Fig.6 Metabolic engineering targets prediction for ethanol, malate and fumarate in M. thermophila.A Depicts the biosynthetic pathways for ethanol, malate and fumarate in M. thermophila.B Metabolic engineering targets for three products predicted using ecMTM.The targets reported in the literature are highlighted in orange.The reactions uccr (Ubiquinol-cytochrome c reductase complex), rs (RNA synthesis complex), and rnd (Respiratory-chain NADH dehydrogenase complex) are involved multiple genes )

Table 1
Basic information of four models