Combining functional metagenomics and glycoanalytics to identify enzymes that facilitate structural characterization of sulfated N-glycans

Background Sulfate modification of N-glycans is important for several biological functions such as clearance of pituitary hormones or immunoregulation. Yet, the prevalence of this N-glycan modification and its functions remain largely unexplored. Characterization of N-glycans bearing sulfate modifications is hampered in part by a lack of enzymes that enable site-specific detection of N-glycan sulfation. In this study, we used functional metagenomic screening to identify enzymes that act upon sulfated N-acetylglucosamine (GlcNAc). Using multiplexed capillary gel electrophoresis with laser-induced fluorescence detection (xCGE-LIF) -based glycoanalysis we proved their ability to act upon GlcNAc-6-SO4 on N-glycans. Results Our screen identified a sugar-specific sulfatase that specifically removes sulfate from GlcNAc-6-SO4 when it is in a terminal position on an N-glycan. Additionally, in the absence of calcium, this sulfatase binds to the sulfated glycan but does not remove the sulfate group, suggesting it could be used for selective isolation of sulfated N-glycans. Further, we describe isolation of a sulfate-dependent hexosaminidase that removes intact GlcNAc-6-SO4 (but not asulfated GlcNAc) from a terminal position on N-glycans. Finally, the use of these enzymes to detect the presence of sulfated N-glycans by xCGE-LIF is demonstrated. Conclusion The present study demonstrates the feasibility of using functional metagenomic screening combined with glycoanalytics to discover enzymes that act upon chemical modifications of glycans. The discovered enzymes represent new specificities that can help resolve the presence of GlcNAc-6-SO4 in N-glycan structural analyses. Supplementary Information The online version contains supplementary material available at 10.1186/s12934-021-01652-w.

considered an intrinsic part of a secretory protein's overall structure. Glycan structure itself has wide-ranging importance in both basic biology and pharmaceutical science [2][3][4][5]. Aberrant glycosylation has been implicated in numerous diseases [6,7] and changes in N-glycans of serum glycoproteins has yielded promising disease biomarkers [7][8][9]. Additionally, N-glycan composition can affect the function and efficacy of many glycoprotein drugs like certain monoclonal antibodies, growth factors and fusion proteins [10,11], and is a critical quality attribute that is monitored during drug manufacturing.
Protein glycosylation is inherently complex due to the manner in which glycans are assembled and processed by cells. Glycan synthesis does not occur through copying or interpreting a biological template (like DNA, RNA, or protein synthesis). Instead glycans are synthesized via complex biosynthetic pathways involving numerous glycosyltransferases that assemble different monosaccharides into specific structures. There are over 200 glycosyltransferases encoded in the human genome and expression of many varies by cell type, tissue type or in response to environmental factors. This results in substantial glycan structural heterogeneity. For example, over 140 different N-glycan structures have been identified on glycoproteins present in human serum [12]. Adding to this complexity, glycans can also be modified with sulfate, phosphate, methyl or acetyl groups on certain sugars. These groups are collectively termed 'post-glycosylation modifications' (PGMs) and each imparts a variety of biological functions (See [13] for a review).
The study of sulfated N-and O-glycans can be technically challenging and better tools are needed to aid their analysis. Structural characterization of sulfated glycans has been accomplished using mass spectrometry (e.g., LC-MS/MS or MALDI-TOF) of released glycans or glycopeptides [14,[22][23][24][25]. These methods can be compromised by poor ionization of sulfated species and the natural low abundance of sulfated N-and O-glycans. Methods to enrich samples for sulfated glycans could aid their analysis. Glycan-specific binding proteins have recently been used to isolate glycans possessing certain structural features from complex samples [26][27][28][29][30]. However, no sugar sulfate-specific binding proteins have yet been applied to enrich sulfated glycans. Additional analytical methods involving multiplexed capillary gel electrophoresis with laser-induced fluorescence detection (xCGE-LIF) [31][32][33] and high/ultra-performance liquid chromatography coupled to fluorescence detector (H/ UPLC-FLR) methods [34][35][36][37] are also routinely used to separate glycans and match their migration-and retention-times, respectively, to those of known glycans in reference databases. These methods of glycan analysis often use highly specific exoglycosidases to enzymatically confirm the presence or absence of certain sugars [38][39][40]. However, enzymes that remove site-specific sugar sulfation or sulfated sugars are currently missing from the glycoanalysis toolbox.
In this study, we sought to improve the range of analytical enzymes available for analysis of sulfated glycans. We successfully applied high-throughput functional metagenomic screening to identify enzymes that manipulate GlcNAc-6-SO 4 , a sulfated sugar found in some O-glycans and certain mammalian N-glycan outer arms. We devised a novel coupled screening assay using a fluorescent GlcNAc-6-SO 4 analog and an exogenous hexosaminidase. In this assay the coupling enzyme can act upon the substrate and release the fluorophore only if the sulfate group was first hydrolyzed by an enzyme expressed from a metagenomic clone. Our screen identified a sulfatase and hexosaminidase that each act exclusively upon GlcNAc-6-SO 4 . We further employed xCGE-LIF-based glycoanalysis to evaluate these enzymes activity on N-glycan substrates and showed their suitability for structural characterization and enrichment of N-glycans bearing terminal GlcNAc-6-SO 4 .

Functional screening for sulfatases
A fosmid library containing large DNA inserts (~ 40 kb) from human gut microbiota was created in E. coli. The library diversity was assessed by analyzing cloned fragments from 24 randomly picked transformants. Sequencing of fosmid inserts revealed DNA originating from various enteric bacteria species such as Bifidobacterium longum (3 clones), Bifidobacterium adolescentis (4 clones), Bacteroides sp. (2 clones), Phocaeicola dorei (2 clones), Faecalibacterium prausnitzii (1 clone), Adlercreutzia sp. (2 clones) and Collinsella aerofaciens (1 clone). All identified species belonged to the phyla Bacteroides, Actinobacteria or Firmicutes, consistent with the composition of a healthy individual's gut microbiota [41]. Interestingly, 4 clones (> 15% of those analyzed) contained insert DNA from unknown or un-sequenced species. For 5 clones, the quality of the Sanger sequencing data did not permit a conclusion about the origin of the cloned DNA. In a second investigation, 12 clones were subjected to restriction analysis using the endonuclease FseI. Distinct restriction patterns were observed, indicating that each clone possessed a unique cloned DNA fragment (Additional file 1: Fig. S1). The average insert size was estimated to be 30-40 kb. No empty fosmid vector was observed. The library quality both in terms of genetic diversity and insert size was deemed suitable for function-based enzyme screening. As such, a collection of 23,040 clones were individually archived for use in various screens.
In this study, we sought to identify enzymes proficient at removing sulfate from the 6-carbon (C6) of GlcNAc. A coupled assay using the substrate 4-methylumbelliferyl N-acetyl-β-d-glucosaminide-6-sulfate (4MU-Glc-NAc-6-SO 4 ) and an exogenous GlcNAc-6-SO 4 -resistant hexosaminidase was devised ( Fig. 2A). In this assay, the activity of a sulfatase alone is not sufficient to release the 4MU fluorophore. Thus, the assay contains an exogenous hexosaminidase that is blocked by the presence of sulfate at the C6 position of GlcNAc. This enzyme enables fluorescence generation only if the C6-sulfate has been removed by a cloned environmental sulfatase. It is important to note that this screening strategy will also identify hexosaminidases that are able to directly hydrolyze C6-sulfated GlcNAc from 4MU.
A primary screen was conducted using the 4MU-Glc-NAc-6-SO 4 /hexosaminidase coupled assay with 11,520 cell lysates (half the library) comprising the human gut  [15][16][17][18], and sulfated N-glycan structures from Influenza virus (4), human peripheral blood cells (5), human Tamm-Horsfall glycoprotein (7), human pituitary hormones (8) [14] and Panulirus interruptus hemocyanin (9) [19] are shown. Glycans are represented using SNFG nomenclature [69] metagenomic DNA clone collection (see "Methods"). A total of 81 "hits" were identified by measuring an increase in fluorescence over time. A hit was defined as a fluorescence reading 3 standard deviations over the mean background fluorescence value in at least one measured time point. This definition was intentionally liberal to capture all potential hits. The 81 hits were then re-screened using the same assay, but with a more stringent hit definition consisting of a fluorescence reading over 6 standard deviations from the mean control background fluorescence in at least two timepoints. This secondary analysis yielded 24 hits (Fig. 2B) indicating an overall screen hit rate of 0.2%. Enzyme activity in lysates from these 24 clones was then assessed using different substrates: i) 4MU-SO 4 to detect general sulfatase activity, ii) 4MU-GlcNAc (no sulfate) to detect hexosaminidase activity, and iii) 4MU-GlcNAc-6-SO 4 (without exogenous hexosaminidase) to detect the activity of hexosaminidases that are not inhibited by the sulfate moiety. Twenty out of the 24 hits showed activity on 4MU-GlcNAc-6-SO 4 in absence of exogenous hexosaminidase while 11 of these also retained activity on non-sulfated GlcNAc. These data suggest that most fosmids likely encoded a combination of both hexosaminidases and sulfatases. Only 6 hits were active on the general sulfatase substrate 4MU-SO 4 , suggesting the presence of a sulfatase that does not exclusively recognize GlcNAc-6-SO 4 , and implying that the observed sulfatases from other clones may be sugar specific-sulfatases that strictly hydrolyze sulfate located on a sugar ring. These . The GlcNAc molecule is modified at carbon 6 with a sulfate group (SNFG notation, GlcNAc, blue square). In a first reaction, a sulfatase expressed from a metagenomic clone removes the C6 sulfate. In a second reaction, an exogenous hexosaminidase (that is inactive on the sulfated substrate) liberates the fluorophore from GlcNAc, generating a fluorescence signal. B Re-screening human gut metagenomic clones for sulfatase activity. Represented are fluorescence values for the 30 h timepoint. Values above the mean + 6σ (black line) were considered 'hits' . Control lysates from clones carrying an empty pCC1 fosmid (grey circles) were assayed along with the metagenomic clones (blue circles) from the primary screen biochemical observations were further reconciled with nucleotide sequencing of fosmid inserts.

Analysis of fosmid DNA sequences
Two multiplexed Pacific Bioscience (PacBio) libraries were constructed to sequence fosmids isolated from all 24 clones. Nineteen were successfully assembled in single contigs with insert sizes of 30-45 kb. These nucleotide sequences termed F1-F19 were deposited to GenBank under the accession numbers MW677166-MW677184. Five fosmids could not be sequenced properly with attempted assemblies resulting in multiple contigs or failing due to a lack of sequence coverage. Contamination of the fosmid preparation with E. coli genomic DNA, the presence of long repeats, or a co-culture of two distinct clones in the same well of the 384 well plate may be responsible for the failed assemblies [42]. As such, these 5 clones were not further analyzed.
Open reading frames (ORFs) encoded by each of the remaining 19 clones were predicted using MetaGen-eMark. For each ORF, a MegaBLAST search against the NCBI protein repository was performed, and ORF maps for each of the 19 inserts were drawn (Additional file 1: Fig. S2). For two clones (F13 and F15), genes constituting the riboflavin biosynthesis pathway (ribD, ribE, ribB/A and ribH) were observed (Additional file 1: Fig. S2N and S2P) [43]. Multiple metabolites produced by this pathway, including riboflavin and luminazine, are fluorescent compounds having excitation/emission spectra that partially overlap with that of 4MU [44,45]. These clones were also fluorescent in our assay in the absence of the 4MU-GlcNAc-6-SO 4 substrate (Additional file 1: Fig. S3), indicating they were false positives. It is noteworthy that the riboflavin synthesis pathway has been consistently identified in other screens conducted in our laboratory where 4MU substrates have been used. Sequence analysis also showed that 5 clones (F14, F16, F17, F18, and F19) did not encode obvious sulfatase, hexosaminidase, or fluorescent metabolite producing pathways (Additional file 1: Fig. S2O, S2Q, S2R, S2S, S2T). However, these five clones had very low fluorescence signals in secondary screening. As such, 7 fosmids (F13-F19) were dismissed from this initial study and the remaining 12 fosmids (F1-F12) were further analyzed.
The sequences of fosmids F1-F12 were assessed for the presence of sulfatases and hexosaminidases from known protein families (Fig. 3A). In total, 16 sulfatase genes encoded by 9 of the 12 clones were identified by homology, with 7 clones each encoding two putative sulfatases. Interestingly, of the 9 clones having putative sulfatase genes, 7 also encoded a putative hexosaminidase. Overall, 10 hexosaminidases were identified (Fig. 3A). The fosmids F1-F12 were also compared to each other using Circos plots generated by Circoletto (Fig. 3B) [46]. This revealed that clones F3 and F4 both harbored the same region of the Phocaeicola dorei genome and were closely A B Fosmids F1-F12 were tested using four distinct assays: using 4MU-GlcNAc-6-SO 4 with exogenous hexosaminidase (GlcNAc-SO 4 + Hex), 4MU-GlcNAc-6-SO 4 in the absence of exogenous enzyme (GlcNAc-SO 4 ), asulfated 4MU-GlcNAc (GlcNAc), and 4MU-SO 4 (SO 4 ). For each fosmid, shown is its activity in each assay, the predicted number of sulfatase and hexosaminidase genes found within its DNA sequence, the closest species of origin as determined by BLAST against the entire NCBI nucleotide collection, and the size of the cloned insert. B Sequence alignment of fosmids F1-F12. A Circos plot made with Circoletto software illustrates relatedness in the DNA sequences of the 12 clones. Ribbons are colored using BLAST bitscores within Circoletto related to portions of F9. These three clones lacked predicted sulfatase genes but contained the same hexosaminidase gene, suggesting they might encode a protein with the ability to directly release intact GlcNAc-6-SO 4 from 4MU-GlcNAc-6-SO 4 . Clones F8 and F12 were also highly similar fosmids containing the same genome locus of a species related to Bacteroides cellulosilyticus with ~ 60% sequence identity for the analyzed fragments. Finally, a group of 4 highly related sequences was comprised of fosmids F6, F7, F11 and F2 (with F6 and F7 being nearly identical). Considered together, the Circos analysis showed that 8 sulfatase genes out of the 16 initially identified were distinct. The redundancy in fosmids F1-F12 suggests that the number of clones we screened was sufficient to explore the full diversity of the library. Thus, the 8 distinct sulfatase genes and the hexosaminidase gene (common to fosmids F3, F4 and F9) were selected for further biochemical exploration.

Identifying genes encoding active sulfatases using in vitro protein expression
To rapidly test candidate sulfatase genes for activity, each was expressed in vitro using the PURExpress ® in vitro transcription/translation system and assayed for activity ( Fig. 4). Each of the 8 sulfatase genes expressed protein of the expected size (Fig. 4A). Of these, three proteins (F1-ORF13, F6-ORF7 and F8-ORF16) also showed activity on 4MU-GlcNAc-6-SO 4 supplemented with hexosaminidase , but showed no activity in control reactions lacking hexosaminidase (Fig. 4B). Interestingly, F10 encoded two putative sulfatases (F10-ORF22 and F10-ORF23) but neither showed statistically significant activity (Fig. 4B).

Protein sequence analysis of active sulfatases
The deduced protein sequences of the 3 active sulfatases (F1-ORF13, F6-ORF7 and F8-ORF16) were compared to proteins present in GenBank. All three enzymes had strong homology to enzymes from Phocaeicola and Bacteroides species (Additional file 1: Fig. S4A). During secondary screening, clone F1 was the only clone that retained activity solely on 4MU-GlcNAc-6-SO 4 in presence of an exogenous hexosaminidase (Additional file 1: Fig. 3A). As such, F1-ORF13 was chosen for further biochemical characterization.

Determination of F1-ORF13 sulfatase specificity using sulfated monosaccharides
To determine the sulfate specificity of F1-ORF13, its activity on the sulfated monosaccharides GlcNAc

F1-ORF13 sulfatase activity on GlcNAc-6-SO 4 in intact N-glycans
While our screen was designed to isolate enzymes that act upon a sulfated monosaccharide analog 4MU-GlcNAc-6-SO 4 , a broader aim of the study was to find enzymes that act on sulfated GlcNAc in the context of an intact N-glycan. As such, the activity of the F1-ORF13 sulfatase was evaluated on a sulfated N-glycan substrate isolated from a mammalian glycoprotein via analysis by electrophoretic separation using xCGE-LIF. To enable this experiment, the F1-ORF13 sulfatase was cloned and expressed in vivo in E. coli (Additional file 1: Fig. S6). Partially purified enzyme was incubated with N-glycans enzymatically released from human immunoglobin A (hIgA) and reductively labelled with the fluorophore aminopyrene trisulfonic acid (APTS). HIgA contains 2 or 5 N-glycans per heavy chain for the subclasses A1 or A2, respectively [4].  [47,48]. Finally, F1-ORF13 was tested on a mixture of total N-glycans released from hIgA and subsequently desialylated and degalactosylated. In this context, F1-ORF13 was also able to hydrolyze SO 4 from terminal GlcNAc-SO 4 (Fig. S9). These experiments support the conclusion that the F1-ORF13 sulfatase can remove sulfate from GlcNAc-6-SO 4 in the context of an intact N-glycan, but only if it is the terminal residue in an outer-arm branch. The FA2G0-SO 4 substrate was also used to determine pH and ion requirements of F1-ORF13 via xCGE-LIF. Reactions were performed in triplicate at each pH and yield was measured by integration of the peaks observed by xCGE-LIF (Additional file 1: Fig. S10). The enzyme is active over pH 5-10 with a peak around pH 8-9 (Fig. 6B). Consistent with other proteins of this enzyme family, F1-ORF13 is calcium-dependent (Fig. 6C) [49,50]. No hydrolysis of sulfate from FA2G0-SO 4 was detected unless Ca 2+ ions were present (Fig. 6C). Nevertheless, in absence of calcium, the height of the peak corresponding to FA2G0-SO 4 decreased upon addition of F1-ORF13 sulfatase (Additional file 1: Fig. S11A) suggesting that F1-ORF13 was binding to the substrate.  . Signal Intensities were normalized to the total peak height, resulting in relative peak height proportions (%). Peaks marked with (*) correspond to the internal standard used for migration time normalization in xCGE-LIF. Glycans are represented using the SNFG nomenclature [69] in the absence of calcium. After incubation, samples were either directly cleaned-up with HILIC-SPE or first digested with proteinase K and then, cleaned with HILIC-SPE (Additional file 1: Fig. S11). Proteinase K digestion was used to destroy F1-ORF13 and thus disrupt a possible F1-ORF13-FA2G0-SO 4 complex. Samples directly cleaned with HILIC-SPE, showed a clear decrease in the amount of FA2G0-SO 4 (Additional file 1: Fig. S11A). Reduction of the amount of substrate was F1-ORF13 concentration-dependent. In the samples treated with proteinase K followed by HILIC-SPE clean-up, no significant change in the relative substrate peak height was detected (Additional file 1: Fig. S11B). This confirmed the inability of F1-ORF13 to hydrolyze SO 4 from GlcNAc in the absence of calcium as well as its capacity to recognize and bind to GlcNAc-SO 4 -containing glycans. This characteristic of F1-ORF13 expands its potential as a tool to study N-glycan sulfation. For example, in the absence of calcium it can be used as a terminal GlcNAc-SO 4 N-glycan-binding protein while in the presence of calcium it is a GlcNAc-SO 4 sulfatase. The use of an enzyme in its apo form is an approach already employed for some nonconsuming substrate sensors [51].

Identifying genes encoding active hexosaminidases using in vitro protein expression
Two hexosaminidase candidates were investigated: F3-ORF26 (a gene common to F3, F4 and F9, described above) and F10-ORF19. Neither of the two putative sulfatases encoded by clone F10 showed statistically significant activity on 4MU-GlcNAc-6-SO 4 (Fig. 4B). Yet, clone F10 demonstrated activity on 4MU-GlcNAc-6-SO 4 in the presence and in the absence of exogenous hexosaminidase (Fig. 3A). We hypothesized that the putative hexosaminidase gene also encoded by this clone (F10-ORF19) might be solely responsible for the fluorescent signal detected during screening. To test candidate hexosaminidase genes for activity, both genes (F3-ORF26 and F10-ORF19) were expressed in vitro using the PURExpress ® system and the produced proteins were assayed (Fig. 7).

Protein sequence analysis of active hexosaminidases
The protein sequences of F10-ORF19 and F3-ORF26 hexosaminidases were compared and were 62.5% similar (Additional file 1: Fig. S12A). F3-ORF26 contained a C-terminal region of ~ 230 amino acids absent in F10-ORF19. Both proteins belong to glycoside hydrolase family 20 (GH20) and perfectly matched annotated N-acetyl-hexosaminidases protein sequences from Bacteroides caccae (F10-ORF19) and Phocaeicola dorei (F3-ORF26). To the best of our knowledge, neither the Bacteroides caccae nor Phocaeicola dorei hexosaminidases have been previously biochemically characterized. However, hexosaminidases from other organisms that are able to hydrolyze sulfated GlcNAc residues have been reported under the names 'sulfoglycosidase' or 'mucindesulfating glycosidase' [52,53]. Originating from Prevotella or Bifidobacterium bifidum these enzymes were tested on synthetic p-nitrophenyl substrates or mucin type O-glycans but not on more complex structures like sulfated N-glycans. A phylogenetic tree showing the evolutionary relationship between F10-ORF19, F3-ORF26 and other GH20 members is shown in Additional file 1: Fig. S12B. The F10-ORF19 hexosaminidase from Bacteroides caccae was selected for further biochemical characterization on sulfated N-glycans. Hexosaminidases were assayed on 4MU-GlcNAc-6-SO 4 and its asulfated counterpart 4MU-GlcNAc investigated by xCGE-LIF on the pool of hIgA APTSlabelled glycans previously digested with both sialidase A and β(1-4,6)-galactosidase to expose GlcNAc-SO 4 at the terminal end ( Fig. 8A black electropherogram). F10-ORF19 was capable of hydrolyzing GlcNAc-SO 4 while showing no activity on asulfated GlcNAc (Fig. 8A blue electropherogram). In the presence of a large excess of enzyme, trace activity on asulfated GlcNAc is observed (data not shown) but in the conditions reported here, the enzyme showed selectivity for sulfated GlcNAc residues. This specificity contrasts with that of a wellcharacterized hexosaminidase from Streptococcus pneumoniae that is commonly used in glycoanalytics (β-N-acetylglucosaminidase S) and hydrolyzed both sulfated and non-sulfated GlcNAc as well as bisecting Glc-NAc ( Fig. 8A shaded pink box). F10-ORF19 was active from pH 4 to 7 with an optimal activity around pH 5 (Fig. 8B). F10-ORF19 also showed no activity on urokinase APTS-labelled FA2FGalNAc-(SO 4 ) 2 (Additional file 1: Fig. S8B) indicating it does not act upon sulfated GalNAc and is specific for GlcNAc-6-SO 4 .

Discussion
We created a human enteric metagenomic library and explored it for clones encoding sugar-specific sulfatases using function-based enzyme screening with a coupled assay strategy. We sought enzymes that could address site-specific sulfation in N-glycans and potentially be used as analytical tools. Our screen identified both sulfatases and hexosaminidases that act specifically upon GlcNAc-6-SO 4 . Two enzymes isolated from our screen, a sulfatase (F1-ORF13) and a hexosaminidase (F10-ORF19), were cloned, purified, and further characterized. We showed that F1-ORF13 is a calcium-dependent sulfatase that exclusively removes sulfate from carbon-6 of GlcNAc. The enzyme will catalyze sulfate removal from both a GlcNAc-6-SO 4 monosaccharide or GlcNAc-6-SO 4 when it is in the terminal position of an N-glycan outer-arm. Furthermore, in the absence of Ca 2+ ions, the apo-enzyme will act as a GlcNAc-6-SO 4 glycan binding protein (GBP). Lectins and GBPs have been employed in glycoanalytic workflows to enrich or deplete glycans/ glycopeptides bearing specific epitopes [27,29,54] from a complex mixture but none specifically address sulfated glycosylation to date. This feature is of particular . B F10-ORF19 optimal pH. Optimal pH for F10-ORF19 was determined using the hIgA-APTS labelled glycan pool as substrate and following product formation by xCGE-LIF. The x-axis (time) of the electropherograms was normalized to two internal standards by glyXtool CE , resulting in double normalized migration time units (MTU″). Signal Intensities were normalized to the total peak height, resulting in relative peak height proportions (%). Glycans are represented using SNFG nomenclature [69] interest considering the low abundance of sulfated glycans in most biological samples. Traces of GlcNAc-6-SO 4 -containing N-glycans could be captured prior to analysis, enabling their detection otherwise concealed by non-sulfated structures. An additional study further evaluating F1-ORF13 as a GlcNAc-6-SO 4 GBP for use in glycoanalytics is ongoing. Finally, we demonstrated that the F10-ORF19 hexosaminidase is capable of hydrolyzing GlcNAc-6-SO 4 from the terminal position of N-glycans while showing negligible activity on asulfated GlcNAc. Considered together, the narrow specificity of both enzymes for terminal GlcNAc-6-SO 4 in N-glycans and their ability to be used with other exoglycosidases highlights their utility as novel tools to confirm the presence or absence of GlcNAc-6-SO 4 in glycoanalytics.
The F1-ORF13 sulfatase belongs to the S1 family of sulfatases in which members possess an intriguing modification of a critical serine or cysteine to formylglycine (FGly) forming two subtypes Ser-type and Cys-type S1 sulfatases [49,55]. Discovered by von Figura's group in 1995, the FGly modification is unique to S1 sulfatases and thought to be essential for their activity [56]. To date, two main enzymatic systems that enable formation of FGly have been identified: (1) formylglycine generating enzymes (FGEs) found in eukaryotes and some prokaryotes, and (2) anaerobic sulfatase maturating enzymes (anSMEs) exclusively found in prokaryotes [57,58]. Surprisingly, in our work, expression of active F1-ORF13 was obtained both in vitro and in vivo in E. coli in the absence of an obvious FGE or anSME. Two hypotheses may explain these observations. First, E. coli might have an intrinsic ability to convert F1-ORF13 Ser94 to FGly. Similar observations have been reported for an active Klebsiella Ser-type sulfatase that was successfully produced in E. coli [59]. Sulfatase maturation systems in E. coli are still not completely understood, yet two putative anSME proteins (aslB and ydeM) have been described, and a third, still unidentified FGly-forming system is suspected to exist [57]. We also investigated the potential presence of a putative sulfatase maturation system within fosmid F1. While none of the 29 ORFs predicted on F1 shared homology with known FGEs or anSMEs, F1-ORF2 encoded a hypothetical protein containing a predicted 4Fe-4S cluster, a known hallmark of sulfatase maturating enzymes [60]. However, our initial attempts to co-express F1-ORF2 and F1-ORF13 did not increase the activity of F1-ORF13 (data not shown). A second hypothesis is that the F1-ORF13 sulfatase might possess some activity even in the absence of the FGly modification. This notion is supported by the activity of F1-ORF13 we detect after in vitro expression.
The assay system we used in this study proved successful at identifying enzymes that act on or require a highly specific chemical modification of GlcNAc in highthroughput screening. While functional metagenomics identified enzyme candidates, xCGE-LIF was used to evaluate their capability to act on real N-glycan substrates. Extending the activity testing of enzyme candidates to natural carbohydrates was essential to prove their utility in glycoanalytics. Our screening assay permitted identification of enzymes that address sulfation of the 6-carbon of GlcNAc. However, it could be easily adapted to screen for enzymes that address different chemical modifications, a different position of modification, or modification of other sugars. For example, it would be of interest to identify sulfatases that can specifically act on GalNAc-4-SO 4 or Gal-3-SO 4 , both modifications typically found on N-glycans, a class of glycoconjugates for which the impact of sulfate groups has been poorly studied [14]. In addition, the assay could be extended to identify enzymes that act upon sugars having other PGMs (e.g., methyl, acetyl, phosphate, etc.) that are found in a wide range of eukaryotic glycans [61]. Furthermore, little is known about enzymes that hydrolyze zwitterionic glycan modifications (e.g., phosphoethanolamine, phosphocholine) that are commonly found on glycolipid anchor glycans [62], bacterial biofilm cellulose [63] and invertebrate N-glycans [61]. The biological importance of PGM modifications appears vast [13,64] and expansion of the analytical enzyme toolbox to enable their characterization will help the field gain further insight into the roles of PGMs in glycobiology.

Conclusions
We used functional metagenomics screening to identify enzymes that act upon GlcNAc-6-SO 4 , a glycan modification observed in both N-and O-glycans. Using glycoanalytics, we showed that two identified enzymes, a sulfatase (F1-ORF13) and a hexosaminidase (F10-ORF19), were able to remove sulfate from carbon-6 of GlcNAc and GlcNAc-6-SO 4 from the terminal position of N-glycans, respectively. Additionally, the F1-ORF13 sulfatase binds to GlcNAc-6-SO 4 in absence of its calcium cofactor, a promising feature for the enrichment of GlcNAc-6-SO 4 -containing glycans or glycopeptides. The precise specificities of these enzymes and their ability to act on GlcNAc-6-SO 4 in N-glycans highlights their potential as tools to enhance N-glycan structural analysis.

Human gut microbiome DNA isolation
Human microbiome DNA was isolated from a fecal sample (Donor T3806, Lee Biosolutions, Maryland Heights, MO, USA) of a healthy 29-year-old man. Microbial cells were extracted from 100 mg of fecal material using PBS as detailed in the supplementary information. To proceed with cell lysis, the cell pellet was resuspended in 500 µL of light TE buffer (10 mM Tris-HCl pH 8.0, 0.1 mM EDTA). To this cell suspension, 100 µL of lysozyme (10 mg/mL) and 1 µL of RNAse A (100 mg/mL) were added. The sample was incubated at 37 °C for 30 min, after which, 20% SDS was added to a 0.5% final concentration. Proteinase K in a ratio of 40 µL per milliliter of sample was added and incubated for 20 min at 37 °C. Lysate was centrifuged at 10,000×g for 10 min at 4 °C to pellet cell debris. The supernatant was transferred to a clean tube with a wide bore pipette tip to avoid shearing the DNA. DNA was isolated by addition of an equal volume of phenol:chloroform:IAA (25:24:1). For cleaner DNA, phenol:chloroform:IAA extraction was performed twice followed by isopropanol precipitation. DNA of 30-70 kb was size selected from a 1% low melting point agarose gel run overnight at 4 °C at 35 V. DNA was recovered from the gel using 1 U of β-agarase for each 100 µL of melted agarose.

Metagenomic library construction
The metagenomic library was constructed using the CopyControl ™ Fosmid Library Production kit as recommended and described [65]. Generated clones were then packaged into λ phages for transfection of E. coli following the manufacturer's instructions. A small portion of the packaging reaction was used to transfect EPI300-T1 R cells to determine the phage titer. The remainder was stored as a 20% glycerol stock at − 80 °C until large scale transfection of EPI300-T1 R was performed. From the large-scale transfection, 23,040 clones were picked and archived in 60 × 384 well plates as micro-culture glycerol stocks by BioS&T (Montreal, Canada).
To assess the quality of the library, 24 clones were picked and their fosmids isolated using the FosmidMAX DNA purification kit. One microgram of fosmid DNA was digested with 2U of FseI restriction enzyme and then separated on a 1% agarose gel. For Sanger sequencing 100 ng of fosmid DNA were mixed with the T7 universal primer or the pCC1 reverse primer (Lucigen corporation) (Additional file 1: Table S1) in 10 µL reactions.

Screening for sulfated glycan using a coupled assay
Clones from the metagenomic library were cultured in 384 well plates with 50 µL LB (10 g tryptone, 5 g yeast extract, 10 g NaCl, 1 g dextrose, 1 g MgCl 2 -6H 2 O, 2 mL of 2 M NaOH per liter) containing 12.5 μg/mL chloramphenicol and 1X inducing solution. After overnight incubation at 37 °C, 50 µL of Y-PER ™ lysis buffer supplemented with 40 μg/mL of 4MU-GlcNAc-6-SO 4 and 1 U/mL of β-N-Acetylhexosaminidase f were added to each well. The 384 well plates were incubated at 37 °C in a static incubator for 48 h. The fluorescence signal emitted from each plate was monitored about six times (typically after 1, 3, 6, 24, 30 and 48 h) over the course of the experiment using a SpectraMax Plus 384 Microplate reader (Molecular Devices, San Rose, CA, USA).

PacBio sequencing
Fosmid DNA was isolated using the FosmidMAX DNA purification kit following the manufacturer's instructions. Two 10-kb multiplexed SMRT-bell libraries were prepared, each enabling sequencing of 12 fosmids using a single SMRT cell. Fosmid DNA was first sheared into 10-kb pieces using g-tubes (Covaris, Woburn, MA, USA) as described [54]. Multiplexed libraries were prepared as recommended by PacBio. Briefly, each fosmid was individually end-repaired and ligated to PacBio barcoded adapters. After ligation, all twelve reactions were pooled together. Damaged DNA was repaired and fragments not ligated to SMRT bell adaptors degraded using exonucleases. To remove SMRT-bell templates smaller than 8 kb from the SMRT-bell libraries, a Blue-Pippin size selection (Sage Science, MA, USA) was performed following the manufacturer's instructions using 0.75% agarose cassette.
Size-selected libraries were sequenced on a Pacific Bioscience RSII instrument with the P6 chemistry. One single-molecule real-time (SMRT) cell was used for each library and data was collected for 360 min. For each library, reads were first demultiplexed into 12 sets of reads, one set per barcode, each containing data for a single fosmid. After demultiplexing, the HGAP.3 protocol was used to de novo assemble the reads. Contigs were circularized with the Circlator program [66]. Finally, ORFs were predicted using MetaGeneMark [67]. Annotation of predicted ORFs was performed by BLASTp against the entire NCBI protein database; ORFs with a significant match to proteins with known function were annotated with the same function; ORFs with no significant match to known proteins were annotated as hypothetical. An ORF map was thus created for each properly assembled fosmid using the Geneious software (https:// www. genei ous. com) after the sequence of the pCC1 backbone was trimmed.

In vitro expression of ORFs of interest
ORFs of interest were expressed in vitro using the PUR-Express ® kit. ORFs were amplified from the fosmid using primers designed following the manufacturer's instructions (Additional file 1: Table S1). After PCR, linear templates were purified with the Monarch PCR & DNA clean-up kit. A 50 µL PURExpress ® reaction was then prepared by mixing 500 ng of template, 20 µL of solution A, 15 µL of solution B and 1 µL of RNase inhibitor murine. In vitro expression was performed for 2 h at 37 °C. In vitro expressed proteins were then assayed in triplicate using 10 µL of PURExpress ® mixture, 2 µL of 4MU-GlcNAc-6-SO 4 or 4MU-GlcNAc (100 µg/mL) for hexosaminidase activity. For sulfatase activity 5U of β-N-Acetylhexosaminidase f were also added to the reactions. Incubation was at 37 °C and activity determined by measuring fluorescence at 365/445 nm using a SpectraMax Plus 384 microplate reader (Molecular Devices, San Rose, CA, USA).

F1-ORF13 sulfatase and F10-ORF19 hexosaminidase in vivo expression and purification
F1-ORF13 sulfatase and F10-ORF19 hexosaminidase were cloned for in vivo expression. F1-ORF13 was synthesized with codon optimization for expression in E. coli and cloned with a C-terminal 6xHis-tag into a pET21a( +) vector by GenScript (Piscataway, NJ, USA). F10-ORF19 hexosaminidase was cloned without its predicted signal sequence (residues 1-22) and with a C-terminal 6-His-tag using the NEBuilder ® HiFi DNA Assembly Cloning kit. Primers were designed using the NEBuilder ® online tool (https:// nebui lder. neb. com/) (Additional file 1: Table S1). The pET28c( +) vector was modified to contain a transcription terminator sequence upstream of the promoter region for tighter regulation of protein expression.
F1-ORF13 and F10-ORF19 were expressed in NEB T7 Express E. coli cells. For each ORF, a 1L culture of LB containing 100 µg/mL ampicillin was inoculated with a single colony and induced for protein expression by addition of IPTG to 0.4 mM final concentration once the culture OD 600 reached 0.4-0.6. Induction was performed overnight at 16 °C. Cells were harvested and resuspended in loading buffer (20 mM sodium phosphate, 500 mM NaCl, 20 mM imidazole, pH 7.4). Lysis was performed using a HPL6 High Pressure Homogenizer (Maximator GmbH, Nordhausen, Germany) for 3 passes at 10 kPsi. Proteins were purified on a HisTrap Fast Flow column. Bound proteins were eluted from the column with a linear gradient of imidazole concentrations from 20 to 500 mM. Fractions containing the protein of interest were pooled and dialyzed against imidazole-free buffer (20 mM sodium phosphate, 500 mM NaCl, 1 mM EDTA, pH 7.4) using a 3.5 kDa MWCO Slide-A-Lyzer dialysis cassette.

F1-ORF13 activity on sulfated monosaccharides
Specificity of F1-ORF13 was tested on the following sulfated monosaccharides: GlcNAc-6-SO 4 , GlcN-6-SO 4 , GalNAc-4-SO 4 , GalNAc-6-SO 4 and Gal-4-SO 4 . Reactions were performed in 20 µL volume with 10 µL of in vitro PURExpressed F1-ORF13 or PURExpress ® control (no DNA template) mixtures in 25 mM HEPES pH 8.3, 10 mM CaCl 2 with 160 µM sulfated monosaccharide. Incubation was at 37 °C overnight. Proteins were precipitated by addition of 80 µL of cold acetone to each 20 µL reaction and incubation at − 20 °C for 1 h. Samples were centrifuged at 14,000×g for 10 min at 4 °C and supernatant recovered. An additional 500 µL of acetone was added to the pellet, samples were quickly vortexed and centrifuged a second time. The supernatant was recovered and pooled with the previous one. Pooled supernatants were dried in a speed vacuum concentrator (Martin Christ Gefriertrocknungsanlagen GmbH, Osterode am Harz, Germany) and resuspended in 125 µL of water. Samples were analyzed on a Dionex ISC-5000 + a high-performance anion exchange chromatography system with pulsed amperometric detection (HPAEC-PAD) (see Additional file 1 for further details).

Enzyme activities on N-glycans N-glycan release
N-glycans were released from hIgA or human urokinase with PNGase F. To that end, four aliquots each containing 420 µg of urokinase or 210 µg of hIgA were mixed with 2% SDS for a final SDS concentration of ~ 0.3%. Aliquots were incubated for 10 min at 60 °C. SDS was then neutralized by addition of 8% IGEPAL ® in PBS to a final concentration of 1.8%. Enzymatic glycan release was achieved with 7 U of PNGase F and aliquots incubated for 3 h at 37 °C.

APTS-labelling
Released N-glycans were dried and labelled via reductive amination with 1-aminopyrene-3,6,8-trisulfonic acid APTS as previously described [68]. Briefly, 28 µL of MQ water, 28 µL of 20 mM APTS in 3.6 M citric acid and 28 µL of reducing agent were added to each of the four aliquots. Labelling was performed for 3 h at 37 °C. A volume of 1.4 mL of 80% acetonitrile (in water) was added to each of the four aliquots prior to pooling them together. Labelled N-glycans were cleaned-up using HILIC-SPE as previously described [68].

N-glycan fractionation
Labelled glycans were fractionated on a Dionex Ultimate 3000 HPLC equipped with a Jasco FP-2020 Plus fluorescence detector (see details in Additional file 1). Fractions containing N-glycans of interest were dried and resuspended with MQ water for use as enzyme substrates.

Substrate preparation
Substrates used for testing F1-ORF13 activity were the following APTS-labelled N-glycans: desialylated and degalactosylated APTS-labelled hIgA N-glycan pool, FA2G2S2-SO 4 , FA2G2-SO 4 , FA2G0-SO 4 and an N-glycan composed of the core FA2 with one branched fucose and two terminal sulfated GalNAcs (termed here FA2FGal-NAc-(SO 4 ) 2 ). FA2G2S2-SO 4 and FA2FGalNAc-(SO 4 ) 2 were isolated from hIgA and urokinase respectively by N-glycan fractionation as described in the supplementary information. FA2G2-SO 4 was prepared from FA2G2S2-SO 4 by digestion with sialidase A. Purified FA2G2S2-SO 4 resulting from fractionation was dried, treated with sialidase A as recommended by the supplier and digest performed overnight at 37 °C. Digested N-glycans were cleaned-up by HILIC-SPE as described previously [68] with the exception that the 5 washes were omitted. FA2G0-SO 4 was subsequently prepared by digesting FA2G2-SO 4 with β(1-4,6)-galactosidase, following the supplier instructions. Reaction was incubated overnight at 37 °C and cleaned-up by HILIC-SPE as described for the sialidase A digest. Total N-glycans from hIgA were desialylated and degalactosylated using sialidase A and β(1-4,6)-galactosidase as described above.

F1-ORF13 sulfatase activity on N-glycans released from hIgA and human urokinase
To test F1-ORF13's metal ion requirement, 10 µl reactions were performed as described above using HEPES pH 8.3 as buffer and 1 µL of 100 mM solution of CaCl 2 , MgCl 2 , MnSO 4 or ZnSO 4 . 1 µL of MQ water was used for the control sample. Incubation was performed for 3 h at 37 °C prior to HILIC-SPE clean-up and xCGE-LIF analysis.
To investigate the ability of F1-ORF13 to bind the APTS-labelled FA2G0-SO 4 N-glycan substrate in absence of calcium, 10 µL solutions containing 2 µL of MES buffer pH 5.5 and 0.6 or 6 µg F1-ORF13 were incubated at 37 °C for 3 h. Reactions were performed in duplicate. One set of reactions was directly subjected to HILIC-SPE, the other was digested with 1 µL of 20 mg/mL proteinase K (AppliChem GmbH, Darmstadt, Germany). Proteinase K treated samples were incubated for 2 h at 42 °C, cleaned with HILIC-SPE, and analyzed by xCGE-LIF.