Non-ribosomal peptide synthetase (NRPS)-encoding products and their biosynthetic logics in Fusarium

Fungal non-ribosomal peptide synthetase (NRPS)-encoding products play a paramount role in new drug discovery. Fusarium, one of the most common filamentous fungi, is well known for its potential to biosynthesize NRPS-type compounds with diverse structural motifs and various biological properties. With the continuous improvement and extensive application of bioinformatic tools (e.g., antiSMASH, NCBI, UniProt), more and more biosynthetic gene clusters (BGCs) of secondary metabolites (SMs) have been identified in Fusarium strains. However, the biosynthetic logics of these SMs have not yet been well investigated. To increase our knowledge of the biosynthetic logics of NRPS-encoding products in Fusarium, this review provides a first overview of research advances in elucidating their biosynthetic pathways.
Supplementary Information: The online version contains supplementary material available at 10.1186/s12934-024-02378-1.


Introduction
Fungal non-ribosomal peptide synthetases (NRPSs) are large modular multifunctional enzymes that generate compounds by sequential condensation of amino acids and hydroxycarboxylic acid units [1]. Fungal NRPS-encoding products are a prolific source of bioactive compounds, some of which have been used commercially as therapeutic agents, such as cyclosporin A, echinocandins and emodepsides [2,3]. As one of the most common filamentous fungi in nature, Fusarium is well known for its potential to produce NRPS products with a wide array of biological properties [4][5][6]. With a substantial increase in fungal genome sequences and the incremental optimization of software tools (e.g., antiSMASH, NCBI, UniProt), bioinformatic analysis of the link between secondary metabolites (SMs) and their biosynthetic gene clusters (BGCs) has become simple and efficient [7][8][9]. A growing number of Fusarium-derived NRPS products and their BGCs have been isolated and characterized [6,10,11]. However, the biosynthetic pathways of these SMs have not yet been well elucidated. Based on an extensive literature search and analysis, this review comprehensively summarizes 15 biosynthetic pathways of NRPS-type compounds from Fusarium spp., highlighting the key enzymatic domains involved. Additionally, the supporting information summarizes some of the common methods, which can provide valid references for further research.

Canonical NRPS-encoding compounds
One fungal NRPS module usually consists of at least three essential domains: the adenylation (A), thiolation (T) and condensation (C) domains [12][13][14][15]. Other members of the C domain family, including the epimerization (E) domain, the heterocyclization (Cy) domain and the CT domain (a subset of the C domain), can replace the C domain or work together with it, enabling diverse and novel functions [16,17]. The released products are subsequently modified by additional enzymes, which are encoded by genes located near the NRPS gene, to form the final product [18,19].

Fusahexin
Fusahexin (1), originally derived from strain F. graminearum PH-1, is a cyclic hexapeptide containing an uncommon ether bond between the C-δ of proline and the C-β of threonine [20,21]. Phytopathological investigation showed that this substance plays a key role in hyphal growth, attachment, water-air interface penetration and plant infection through regulation of surface hydrophobicity of conidia and the cell wall as well as hydrophobin rodlet formation in Aspergillus nidulans [22][23][24][25].
Knockout and overexpression experiments revealed that the NRPS4 cluster in F. graminearum is responsible for the production of compound 1 [22,26]. This cluster contains four genes that respectively encode a glucoside hydrolase, an NRPS (gene NRPS4), an ABC transporter and a major facilitator superfamily (MFS) transporter (Fig. 1A). The NRPS4 enzyme consists of five modules, in which modules 1-4 are respectively responsible for linking D-alanine, L-leucine, D-allo-threonine, and L-proline, and module 5 is used twice in succession to incorporate D-leucine and L-leucine (Fig. 1B) [20]. However, the functions of the other three proteins encoded by the NRPS4 cluster have not yet been characterized.
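The module usage described above can be illustrated with a short sketch. It is only a bookkeeping model, assuming the module-to-residue assignments stated in the text; the `Module` class and the `assemble` function are hypothetical names introduced here for illustration and do not correspond to any published tool.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Module:
    """One NRPS module and the residues it incorporates, in order."""
    name: str
    residues: tuple  # more than one entry means the module is reused


# Assignments as stated in the text for NRPS4 (hypothetical data layout).
NRPS4_MODULES = (
    Module("module 1", ("D-Ala",)),
    Module("module 2", ("L-Leu",)),
    Module("module 3", ("D-allo-Thr",)),
    Module("module 4", ("L-Pro",)),
    Module("module 5", ("D-Leu", "L-Leu")),  # used twice in succession
)


def assemble(modules):
    """Return the linear residue order obtained by walking the modules once."""
    chain = []
    for module in modules:
        chain.extend(module.residues)
    return chain


peptide = assemble(NRPS4_MODULES)
print(len(peptide), "-".join(peptide))
```

Under these assumptions, five modules yield six residues, which is consistent with the hexapeptide nature of compound 1; cyclization and ether-bond formation are not modelled.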
Overexpression of genes fgm1, fgm2 and fgm3, alone and in various combinations, in Pichia pastoris GS115 showed that these genes are responsible for the formation of GAA (Fig. 2C), a guanidino residue that serves as the initiating unit for the biosynthesis of compound 3. Fgm1, Fgm2 and Fgm3 respectively encode cytochrome P450, metallo-dependent pyridoxal-5′-phosphate (PLP)-dependent lyase. Fgm1 oxidizes L-Arg to 4(R)-hydroxy-L-Arg (4); this hydroxylation selectively activates the inert C4 atom for subsequent C3-C4 cleavage [34]. Fgm3 catalyzes the cleavage of the Cβ-Cγ bond in 4 to produce 5 and L-Ala [35]. Fgm2 hydrolyzes glycociamidine (6) to produce linearized GAA. The pathway for GAA formation in F. graminearum differs significantly from the well-known pathway that uses L-Arg:L-Gly amidinotransferase (AGAT) to transfer an amidino group from L-Arg to L-Gly. Instead, it relies on L-Arg as a precursor and proceeds through a series of reactions including inert C−H bond activation, selective C−C bond cleavage, cyclization-based alcohol dehydrogenation, and amidohydrolysis-associated linearization [36].

Gramillin
Gramillins A (8) and B (9) are two host-specific virulence factors initially isolated from several F. graminearum strains [37]. They possess a fused bicyclic structure in which the main peptide ring is cyclized through the carboxylic group of glutamic acid and the side chain of 2-amino adipic acid [38][39][40]. This was the first reported occurrence of an anhydride bond being involved in the cyclization of a cyclic peptide [37,41].
The functions of the NRPS8 gene cluster were determined through targeted gene disruption [42]. Gene GRA1 encodes a multi-modular NRPS synthase that contains seven A and C domains [43]. GRA2 encodes a transcription factor (TF) and is responsible for the regulation of cyclic peptide production (Fig. 3A) [44,45]. By combining the Stachelhaus model with an analysis of the conservation of the two adjacent A domains, the probable pathway for gramillin biosynthesis was identified. The biosynthetic pathway begins with Glu or 2-amino adipic acid and sequentially connects to Leu, Ser, HO-glutamine (HO-Gln), 2-amino decanoic acid, cysteine B (Cys B), and Cys A via the other modules (Fig. 3B) [46,47]. However, the functions of the other genes still need to be confirmed through additional specific experiments.
The small two-gene cluster for BEA biosynthesis in strain LF061 consists of an NRPS gene and a KIVR-encoding gene [72]. D-Hiv is recognized by the A1 domain in module 1 of NRPS22 and attached to the T1 domain as a thioester. L-Phe is specifically activated by the A2 domain and is loaded onto the twin T2 domain in module 2. An integrated N-methyltransferase domain is also present in NRPS22, which is responsible for the methylation of the L-Phe residue (Fig. 5) [67,71]. This serves as a classic example of biosynthesis acting through the core NRPS synthase and provides valuable insights for subsequent studies [60].

Sansalvamide A
Sansalvamide A (20) is a cyclic pentadepsipeptide composed of an α-hydroxyisocaproic acid (α-HICA) unit and four proteinogenic amino acids (L-Val, L-Leu, L-Phe, L-Leu). It was originally discovered in the crude extract of an unidentified Fusarium strain, which was collected from the surface of the seagrass Halodule wrightii [73][74][75]. Bioassays indicated that compound 20 is an effective cytotoxin in the colon cancer cell lines COLO 205 and HCT116 and the melanoma cell line SK-MEL-2 [75,76].
Fig. 5 The scheme of BEA (19) biosynthesis and the bea cluster in F. proliferatum LF061
The BGC NRPS30, which is responsible for the formation of compound 20 in F. solani FGSC 9596, was characterized through a gene knockout experiment using the ATMT approach [77,78]. This cluster contains at least four genes that encode NRPS30 synthetase (gene NRPS30), an oxidoreductase, a short-chain dehydrogenase/reductase, and an MFS transporter (Fig. 6A). Among the five modules of the NRPS30 enzyme, only the A3 domain has glycine as its first signature residue, whereas the remaining four A domains have aspartic acid at this position [46,79]. This suggests that α-HICA is loaded as the third building block during the biosynthesis of compound 20, as the lack of an acidic residue in the first position is only observed for A domains with non-amino acid substrates [80]. NRPS30 utilizes L-Phe as the starting unit and extends the chain with additional units, namely L-Leu, α-HICA, L-Val, and L-Leu (Fig. 6B).
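The first-position argument above can be written as a simple rule of thumb. The sketch below is a deliberately simplified heuristic, assuming only the residues reported in the text (glycine for A3, aspartic acid for the other four); the names `FIRST_POSITION`, `SUBSTRATE_ORDER` and `non_amino_acid_candidates` are hypothetical and are not part of any Stachelhaus-code prediction software.

```python
# One-letter codes of the first signature residue of each A domain of NRPS30,
# as described in the text (hypothetical data layout).
FIRST_POSITION = {"A1": "D", "A2": "D", "A3": "G", "A4": "D", "A5": "D"}

# Substrate order proposed for NRPS30 in the text (Fig. 6B).
SUBSTRATE_ORDER = ["L-Phe", "L-Leu", "α-HICA", "L-Val", "L-Leu"]

ACIDIC = {"D", "E"}  # aspartic acid and glutamic acid


def non_amino_acid_candidates(first_position):
    """Return A domains lacking an acidic residue in the first position.

    Such domains are candidates for activating non-amino acid substrates
    (for example hydroxy acids); this is a heuristic, not a proof.
    """
    return [name for name, residue in first_position.items()
            if residue not in ACIDIC]


candidates = non_amino_acid_candidates(FIRST_POSITION)
for name in candidates:
    index = int(name[1:]) - 1  # "A3" -> position 2 in the substrate list
    print(name, "->", SUBSTRATE_ORDER[index])
```

With the values given in the text, only A3 is flagged, which matches the proposal that α-HICA is the third unit incorporated.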
Fig. 6 The proposed biosynthetic pathway for sansalvamide A (20). A The NRPS30 cluster in Fusarium solani FGSC 9596; B the compound 20 biosynthesis logic
Comparison of the metabolite profiles of the knockout mutants revealed that only six genes (APF1, APF3, APF4, APF5, APF6, APF7/APF8/APF9) directly participate in the biosynthesis of APF [85]. Apf3 reduces L-lysine to L-piperidinic acid (22), which is subsequently converted to 23 by Apf1. L-tryptophan is initially oxidized to N-hydroxyl-L-tryptophan (24) by one of the two P450 enzymes (Apf7/Apf8), followed by conversion to 25 by Apf6. Apf5 is responsible for the condensation of three malonyl-CoA units and an acetyl-CoA into the octanoic acid backbone, which is then oxidized to form 28 by a P450 oxygenase. Apf4 catalyzes the exchange of the keto group of 28 with an amino group to form 27. Apf7/Apf9 may be involved in the conversion of 27 to 26. Ultimately, APF is generated by combining the four precursors in the presence of Apf1 (Fig. 7B). This represents a unique case of NRPS synthase function, where the NRPS enzyme is not fully functional until the final step.

Fusarochromene (NRPS-like)
Fusarochromene (29), first isolated from F. sacchari, is structurally similar to fusarochromanone (30), which is a lead compound for cancer treatment [91,92]. Compound 30 demonstrates a wide range of biological activities, such as angiogenesis inhibition, prevention of cell reproduction, and induction of apoptosis in numerous cancer cells, especially COS7 and HEK293 cells [93,94].
A biosynthetic pathway for 29 and 30 is proposed in Fig. 8B. L-tryptophan is converted to D-tryptophan (36) in the presence of FscC, and subsequently hydroxylated by FscE to yield 6-hydroxytryptophan (35) [97]. The pyrrole ring is cleaved by FscD, and the product is finally converted to 4-hydroxykynurenine (34). FscA reduces the carboxyl group to a primary alcohol (33), and FscG, a DMATS-type prenyltransferase, performs prenylation to give 32 with the formation of a chromene ring. Compound 32 is converted by FscJ to desacetyl-fusarochromene (31). Epoxidation (FscF) and rearrangement of the chromene double bond convert compound 31 to 30. Although specific acetyltransferases were not found near the fsc BGC, several predicted enzymes containing the N-acetyltransferase superfamily domain were discovered in the genome of F. equiseti. These predicted enzymes may have the potential to convert compound 31 to 29 [98].
As shown in Fig. 9A, the FGSG cluster in F. graminearum consists of at least five genes: PKS6, NRPS7, FGSG-A, FGSG-B, and FGSG-C. Deletion of NRPS7/PKS6 resulted in the absence of 37, confirming that PKS6 and NRPS7 are the two key enzymes jointly responsible for its production. Additionally, FGSG-C is predicted to encode a cytochrome P450 monooxygenase, FGSG-A encodes an aminotransferase, and FGSG-B encodes a putative protein containing a stress response A/B barrel domain [108]. The biosynthesis of product 37 is mainly accomplished by PKS6 and NRPS7. As the FGSG cluster lacks acyltransferases, the polyketide synthesized by PKS6 is directly transferred to NRPS7. Modules 1-3 of NRPS7 then sequentially add Ala, Gln, and β-aminoisobutyric acid, and the product is finally released through cyclization (Fig. 9B). Although the β-aminoisobutyric acid units are most likely not freely available to NRPS7, the FGSG cluster harbors a cytochrome P450 and an aminotransferase, which could potentially obtain it from thymidine.
Fig. 9 Proposed biosynthetic pathway of fusaristatin A (37). A The FGSG gene cluster in F. graminearum; B The PKS6 and NRPS7 collaborative model of the biosynthetic logic of 37
The FPSE cluster, consisting of at least four genes (PKS40, NRPS32, FPSE-A, FPSE-B), was identified in F. pseudograminearum through the analysis of the conserved genes [108]. These genes were respectively predicted to encode a PKS enzyme (PKS40), an NRPS enzyme (NRPS32), an acyl-CoA ligase and a thioesterase (Fig. 10A). The biosynthetic pathway of W493 B is primarily catalyzed by PKS40 and NRPS32, which respectively play important roles in the formation of 4-methyltridecanoic acid thioester and a hexapeptide (Fig. 10B). The T1 domain of NRPS32 accepts threonine that has been adenylated by the A1 domain, and the E1 domain then converts it to D-allo-threonine. Five consecutive modules bind Ala, Ala, Gln, Tyr, and Val/Ile to form the final product, which is released through the cyclization domain [108]. The biosynthetic pathways of compounds 37 and 38 provide a comprehensive overview of lipopeptide biosynthesis.
The FA biosynthetic pathway is proposed in Fig. 11B. Through the combined action of FUB3 and FUB5, L-aspartate is converted to O-acetyl-homoserine (42). FUB1 generates the triketide trans-2-hexenal (41), which is potentially released by FUB4 and linked to the NRPS-bound amino acid precursor by FUB6. After further modification by FUB7, the NRPS-bound amino acid precursor is released by FUB8 to form 40, which is finally oxidized by FUB9 to form FA.

Hybrid PKS/NRPS products
The compounds generated by PKS/NRPS hybrid megaenzymes are especially intriguing due to their structural complexity [124,125]. These hybrid megaenzymes consist of a PKS module and an NRPS module. The PKS module synthesizes the linear polyketide backbone, which is released after ligation with an amino acid through the action of the NRPS module [126][127][128][129]. The product is then further converted to more complex metabolites by oxidases or other enzymes.
The intermediates of compound 43 were only identified in the Δfus2, Δfus8, Δfus9, and Δfus2-9 mutants, suggesting that the genes fus3, fus4, fus5, fus6, and fus7 are largely uninvolved in the production of fusarin C. The proposed fusarin C biosynthetic pathway is as follows: Fus1 is responsible for the condensation of one acetyl-CoA with six malonyl-CoA and homoserine to form prefusarin (47). Fus8 then oxidizes 47 to form 46, an essential step before Fus2 catalyzes the formation of 20-hydroxy-prefusarin (45). Compound 45 is further oxidized by Fus8 to produce 44. The final step involves the methylation of the hydroxyl group of C-21 by Fus9, resulting in the production of fusarin C (Fig. 12B). The co-cultivation of different mutants and the analysis of intermediates further confirm that Fus1, Fus2, Fus8, and Fus9 are sufficient for the biosynthesis (see Additional file 1).
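The order of steps proposed above can be captured in a minimal linear model that predicts where the pathway would stop when one enzyme is missing. This is a sketch under strong assumptions: the pathway is treated as strictly linear with no shunt products or spontaneous conversions, and the names `PATHWAY` and `end_point` are hypothetical, introduced here only for illustration.

```python
# (enzyme, substrate, product) in the order proposed in the text.
# Compound numbers follow the review: 47 prefusarin ... 43 fusarin C.
PATHWAY = [
    ("Fus1", "precursors", "47"),
    ("Fus8", "47", "46"),
    ("Fus2", "46", "45"),
    ("Fus8", "45", "44"),
    ("Fus9", "44", "43"),
]


def end_point(deleted):
    """Return the last compound reached when the enzymes in `deleted` are absent.

    Naive assumption: the pathway halts at the first missing enzyme.
    """
    current = "precursors"
    for enzyme, _substrate, product in PATHWAY:
        if enzyme in deleted:
            break
        current = product
    return current


for knockout in (set(), {"Fus2"}, {"Fus8"}, {"Fus9"}):
    label = ",".join(sorted(knockout)) or "wild type"
    print(label, "->", end_point(knockout))
```

In this simplified model, loss of Fus2, Fus8 or Fus9 stops the pathway at an intermediate rather than at compound 43, which agrees qualitatively with the observation that intermediates were found only in the corresponding mutants.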

Fusaridione A
Fusaridione A (55) is an unstable tyrosine-derived 2,4-pyrrolidinedione produced by F. heterosporum [150][151][152][153]. Genomic analysis has revealed a silent gene, fsdS, which encodes a hybrid enzyme consisting of a PKS module and an NRPS module. The putative biosynthetic pathway of fusaridione A was unveiled by fsdS gene knockout experiments [154]. The polyketide chain is first synthesized by the addition of seven acetyl-CoA units. Each extension requires the involvement of the KS, AT, KR, DH and ACP domains. The tyrosine is then activated and attached to the polyketide chain by the C, A and T domains. Compound 55 is finally released through the Dieckmann cyclase R* domain [16,155]. The unstable pyrrolidinedione ring is opened by a reverse Dieckmann reaction, resulting in the formation of product 56 (Fig. 14) [156]. Further exploration is required to identify the genes that are closely related to fsdS.
The proposed biosynthetic scheme for compound 58 and its derivatives involves the utilization of one acetyl-CoA, seven malonyl-CoA, two S-adenosyl-L-methionine (SAM) and L-serine to form the backbone [164]. The PKS module of EqxS acts together with the enoyl reductase (EqxC) to produce a polyketide unit, which is then conjugated with an L-serine (in red) through condensation by the NRPS module. The activity of the Dieckmann cyclase domain (R*) leads to the release of 57. Compound 57 is then N-methylated by EqxD to form 58, which is further converted to fusarisetin A (59) in Fusarium sp. FN080326 (Fig. 15B).
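The precursor counts quoted for hybrid PKS/NRPS products can be turned into a quick carbon tally for the polyketide chain. The sketch below assumes the standard polyketide logic in which the acetyl-CoA starter contributes two carbons and each malonyl-CoA extender contributes two carbons after decarboxylative condensation; SAM-derived methyl groups and the amino acid are deliberately left out, and the function name is hypothetical.

```python
def polyketide_backbone_carbons(n_acetyl_starters, n_malonyl_extenders):
    """Carbon count of the linear polyketide chain.

    Assumes two carbons per acetyl-CoA starter and two carbons per
    malonyl-CoA extender (one carbon of malonyl is lost as CO2).
    Methyl groups from SAM and the amino acid unit are not counted.
    """
    return 2 * n_acetyl_starters + 2 * n_malonyl_extenders


# Precursor counts as quoted in the text.
equisetin_chain = polyketide_backbone_carbons(1, 7)  # one acetyl, seven malonyl
fusarin_chain = polyketide_backbone_carbons(1, 6)    # one acetyl, six malonyl

print("equisetin polyketide chain carbons:", equisetin_chain)
print("fusarin C polyketide chain carbons:", fusarin_chain)
```

Such a tally is only a consistency check between the stated precursor counts and a proposed chain length; it does not replace labelling experiments.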

Conclusions
Fusarium is one of the excellent producers of NRPS products with a wide range of biological properties. To the best of our knowledge, over 800 SMs produced by Fusarium strains have been recorded in the Dictionary of Natural Products (DNP) database, of which nearly 300 are related to NRPS pathways [165]. This review highlights only fifteen biosynthetic pathways that link NRPS products with their corresponding BGCs identified in Fusarium. Therefore, the links between most of these NRPS compounds and their BGCs remain to be investigated. More efforts should be made to apply genetic engineering approaches to elucidate the biosynthetic pathways of other Fusarium NRPS-encoding compounds and to characterize their key genes and functions.

Fig. 2 Biosynthetic pathway of fusaoctaxin A (2) and B (3).A The fg3_54 cluster in F. graminearum PH-1; B Model of the assembly line for 2 and 3. C Enzymatic biosynthesis for the formation of GAA

Fig. 7 Proposed biosynthetic pathway of APF (21). A The APF gene cluster in F. fujikuroi IMI58289; B The biosynthesis logic of APF (21)

Fig. 13 Proposed biosynthetic pathway of oxysporidinone (48). A The osd gene cluster in F. oxysporum ACCC 36465; B the scheme of the assembly line for 48

Table 1 The comparison of gene functions for the biosynthesis of equisetin