Discovery of novel enzymes with industrial potential from a cold and alkaline environment by a combination of functional metagenomics and culturing

Background The use of cold-active enzymes has many advantages, including reduced energy consumption and easy inactivation. The ikaite columns of SW Greenland are permanently cold (4-6°C) and alkaline (above pH 10), and the microorganisms living there and their enzymes are adapted to these conditions. Since only a small fraction of the total microbial diversity can be cultured in the laboratory, a combined approach involving functional screening of a strain collection and a metagenomic library was undertaken for discovery of novel enzymes from the ikaite columns. Results A strain collection with 322 cultured isolates was screened for enzymatic activities identifying a large number of enzyme producers, with a high re-discovery rate to previously characterized strains. A functional expression library established in Escherichia coli identified a number of novel cold-active enzymes. Both α-amylases and β-galactosidases were characterized in more detail with respect to temperature and pH profiles and one of the β-galactosidases, BGalI17E2, was able to hydrolyze lactose at 5°C. A metagenome sequence of the expression library indicated that the majority of enzymatic activities were not detected by functional expression. Phylogenetic analysis showed that different bacterial communities were targeted with the culture dependent and independent approaches and revealed the bias of multiple displacement amplification (MDA) of DNA isolated from complex microbial communities. Conclusions Many cold- and/or alkaline-active enzymes of industrial relevance were identified in the culture based approach and the majority of the enzyme-producing isolates were closely related to previously characterized strains. The function-based metagenomic approach, on the other hand, identified several enzymes (β-galactosidases, α-amylases and a phosphatase) with low homology to known sequences that were easily expressed in the production host E. coli. 
The β-galactosidase BGalI17E2 was able to hydrolyze lactose at low temperature, suggesting a possible use for this enzyme in the dairy industry. The two approaches complemented each other by targeting different microbial communities, highlighting the usefulness of combining methods for bioprospecting. Finally, we document here that ikaite columns constitute an important source of cold- and/or alkaline-active enzymes with industrial application potential.


Background
Many industrial and biotechnological applications make use of cold-active enzymes or could benefit from the use of such enzymes as they enable these processes to run at low temperature. Such processes may save energy and production costs, improve hygiene, maintain taste and other organoleptic properties, and reduce the risk of contamination. In addition, cold-active enzymes are heat labile and can easily and selectively be inactivated by moderately elevated temperatures. Cold-active enzymes may be used in fine chemical synthesis, environmental biotechnology, production of biofuels and energy, and in the food and feed, detergent, pharmaceutical, medical and textile industries [1]. Approximately 75% of the Earth's biosphere is cold (less than 5°C) [2] and consequently, bacteria producing cold-active enzymes can be found in numerous habitats. Bioprospecting for cold-active enzymes has been conducted in many environments including Antarctic soil [3] and sediments [4], Arctic and Subarctic glaciers [5][6][7][8], the deep sea [9] and permafrost soils [10]. Enzymes that are active at high pH are also of industrial interest, since these are used in the food and feed, textile, waste management, medical and detergent industries [11]. The detergent industry is of particular commercial interest, and alkaline proteases, amylases, cellulases and lipases are all used in detergents [12]. Thus, enzymes that are both cold- and alkaline-active could be used in detergents for environment-friendly, low temperature washing.
The ikaite columns of SW Greenland are submarine tufa columns formed over alkaline springs by precipitation of the metastable hexahydrate of calcium carbonate, called ikaite [13]. The columns represent a permanently cold (4-6°C) and alkaline (above pH 10) ecological niche of moderate salinity (ca. 10‰) and together with the ice-covered Lake Untersee [14] and a series of small ponds [15] in Antarctica, the ikaite columns constitute one of the very few permanently cold and alkaline environments on Earth. They harbor a rich microbial community adapted to these conditions and a significant part of the bacteria isolated from the ikaite columns represent previously uncharacterized species and genera [16,17]. So far, four new bacterial strains with cold-active enzymes have been described in detail: the α-amylase- and protease-producing Arsukibacterium ikkense [18], the α-amylase-, α-galactosidase-, β-galactosidase- and β-glucuronidase-producing Alkalilactibacillus ikkensis [19], the phosphatase-, esterase-, protease- and β-galactosidase-producing Rhodonellum psychrophilum [20] as well as a lipase-producing γ-Proteobacterium [21]. Work conducted during the last decade on the ecology and microbial enzymes from the ikaite columns has recently been reviewed, highlighting the ikaite columns as a unique biological environment with good prospects for finding novel bacterial species and enzymes for industrial applications [22].
It is well established that only a fraction of the total bacterial diversity can be cultured in the laboratory [23] and that most bacterial phyla have no cultured representatives [24]. Previous work on the ikaite columns has focused on cultured isolates and it was recently reported that various attempts to optimize culturing conditions for ikaite column bacteria only marginally improved the total diversity covered [25]. In order to circumvent this problem, a combined culture dependent and independent approach was taken in a search for cold-active enzymes from the cold and alkaline ikaite columns. In this report, traditional screening of a strain collection was coupled with screening of a functional expression library. Furthermore, a metagenomic sequence of the expression library was established and searched for putative enzyme-encoding sequences, allowing comparisons between the discovery rate and value of the different approaches. Several cold- and alkaline-active enzymes were identified and a few selected α-amylases and β-galactosidases were characterized in more detail.

Culture dependent approach
Ikaite column material collected during expeditions over the last decade was used to establish a strain collection of 322 cultured isolates. The strain collection was screened for nine different enzymatic activities at pH 10, and 203 enzyme producing strains were identified (Table 1). The dominating activities were phosphatase, α-galactosidase, protease and α-amylase followed by β-galactosidase and β-glucanase. Only one positive was found for both cellulase and β-xylanase. Many of the activities were identified in strains showing more than one activity, with the combinations phosphatase/protease (32 isolates), and α-galactosidase/β-galactosidase (20 isolates) being dominant. β-Galactosidase, β-glucanase, cellulase and β-xylanase were not found as single activities. Most of the activities were produced at all temperatures tested (10°C, 20°C and 28°C) (data not shown), indicating that the majority of strains are psychrotolerant and not true psychrophiles. More importantly, it also demonstrates that the ikaite column isolates produce industrially relevant extracellular enzymes active at both high pH and low temperature. Phylogeny determined by 16S rRNA gene analyses on 65 randomly picked strains representing all enzyme groups showed that most strains were either α- (15%) or γ- (69%) Proteobacteria. A previous study of randomly picked cultured isolates from the ikaite columns identified Firmicutes, CFB-group, α-Proteobacteria and γ-Proteobacteria [16]. The observation that γ-Proteobacteria dominate the cultured species with enzymatic activities is in agreement with two of the already characterized enzyme producing species, A. ikkense [18] and a lipase-producing strain [21], both being γ-Proteobacteria. In a similar study of a strain collection from Arctic sea ice, 70% of the isolates were γ-Proteobacteria [26], highlighting the easy cultivation of this class of bacteria. Almost half of the selected enzyme producing isolates were closely related to A. ikkense (32 of 65 sequenced isolates) and six showed sequence similarity to R. psychrophilum, indicating that the re-discovery rate was high as most strains were closely related to previously characterized strains.
Table 1 legend: A total of 322 cultured isolates were screened for 9 different enzyme activities at pH 10 and at 10, 20 and 28°C. The closest relative was determined for selected isolates by partial sequencing of the 16S rRNA gene.
The one strain with activity on cellulose was related to Demequina aestuarii, a mesophilic Actinobacterium isolated from a tidal sediment in South Korea [27] and to the cellulolytic Cellulomonas fermentans [28]. The isolate could be involved in degradation of algae in ikaite columns, since it showed activity on cellulose and xylan, which are known components of algal cell walls.

Functional expression approach

Diversity analysis
Although a large number of enzyme-producing strains were identified in the culture based approach, their phylogenetic affiliations were similar, highlighting the bias introduced by cultivation. To obtain cold-active enzymes from other groups of bacteria, a culture independent approach based on functional screening of an expression library was included. This would also allow for a more direct route to recombinant expression of enzymes in a relevant host. Intact bacterial cells were extracted from fresh ikaite material from 10 different columns. DNA from these cells was isolated and, because the amount of extracted DNA was too low for direct library construction, the DNA was amplified using multiple displacement amplification (MDA) prior to generation of the functional expression library. Both the cell extraction and the MDA were expected to introduce bias in the final DNA pool. In order to gain insight into this bias, pyrosequencing of the 16S rRNA gene was performed on (i) DNA extracted directly from the columns, (ii) DNA extracted after separation of intact cells, and (iii) DNA after MDA treatment (Figure 1). The initial diversity was dominated by Proteobacteria, Firmicutes, Cyanobacteria and Bacteroidetes, in agreement with a previous analysis of ikaite columns [22], and as expected, significant bias was introduced in each step together with a concomitant loss of diversity. The resulting DNA used for the expression library consisted mainly of Firmicutes followed by candidate division GN02 and Proteobacteria. Cyanobacteria, Bacteroidetes and Actinobacteria were lost in the MDA step. At the class level, BD1-5 from the phylum GN02 and β-Proteobacteria were selected during the cell extraction. The MDA step is known to be heavily biased on complex communities [29] and in this case the MDA reaction favored BD1-5 and Clostridia. Diversity down to the genus level of the different DNA samples is given in Table 2.
The most abundant operational taxonomic units (OTUs) in the total DNA were related to the genera Rhodobaca and Thioalkalivibrio, while OTUs related to Fusibacter, Proteiniclasticum, Tindallia/Anoxynatronum and Alkaliphilus dominated the MDA DNA used for the expression library. Proteiniclasticum is a proteolytic genus [30], the genera Alkaliphilus [31] and Tindallia/Anoxynatronum [32] are alkaliphiles and the presence of OTUs related to these is most likely a result of the alkaline conditions inside the ikaite columns. The four genera dominating the MDA DNA (including Fusibacter [33]) are anaerobic and are consequently not expected to be found in the strain collection. Members of the genus Rhodobaca, to which the OTUs dominating the total DNA extraction were related, are known to be capable of both phototrophic and chemotrophic growth [34]. These OTUs were still present after cell extraction, but were almost completely lost at the MDA step (data not shown). The γ-Proteobacteria dominating the enzyme producers in the strain collection were not identified in the MDA DNA used for the library (data not shown), and the anaerobic genera found in the library were not found in the strain collection, confirming that the culture independent approach is likely to identify a different set of enzymes than the culture dependent approach. Even though the DNA extraction applied in this study is biased, functional diversity is not necessarily severely affected due to functional redundancy in a community, as was demonstrated by Delmont et al. [35].
Figure 1 Changes in microbial diversity during library preparation. Diversity shifts at phylum (left) and class (right) level of samples introduced during preparation of DNA for the expression library. Total DNA represents DNA extracted directly before manipulation, Intact Cells represents DNA from extracted cells, and MDA DNA represents DNA after MDA treatment.

Functional metagenomics
The MDA DNA was partially digested, inserted into the bacterial artificial chromosome (BAC) shuttle vector, mod.pGNS-BAC, and transformed into E. coli to yield a functional expression library of 2,843 clones. The average insert size was around 15 kb and 14% had no insert, giving a total size of the cloned metagenome of approximately 36 Mbp (data not shown). The library was screened for various enzymatic activities as presented in Table 3. Three α-amylase, two β-galactosidase and one phosphatase producing clones were identified. The α-amylase clones and one of the β-galactosidase clones showed activity at 15°C, but not at 37°C. This indicated that these enzymes were only active at low temperature, although it cannot be ruled out that the lack of activity was an effect of decreased enzyme production in E. coli at higher temperatures. One of the main advantages of functional expression is that once positive clones are identified, a suitable host for expression of the enzymes has already been established. Several factors however, have to work in concert for enzymatic activities to be picked up in a functional expression screening and there are many limiting factors including codon usage, promoter recognition, presence of chaperones and successful secretion [36]. Gabor et al. [37] calculated that an estimated 40% of genes from 32 selected different genomes had expression signals that would be recognized in E. coli. Firmicutes were best with approximately 70% and Actinobacteria worst with around 10%, suggesting that E. coli could be a suitable host for the ikaite library given the high fraction of Firmicutes, although the study did not consider other limiting factors. 
The library constructed had relatively few clones, but the hit rate was comparable to those of similar studies: one α-amylase was found when screening a cosmid library of 35,000 clones [38], six esterases were identified in a screening of 60,000 fosmid clones [8], 11 cellulases were found when screening a small insert BAC library of 10,000 clones [39], three β-galactosidases in a screening of 2,100 plasmid clones [40], one amylolytic enzyme in a screening of 30,000 plasmid clones [41], one cellulase in a plasmid library of 8,500 clones [42], and 38 amylases, 13 phosphatases but no proteases in a screening of 32,000 plasmid clones [43].
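The size estimate of the cloned metagenome and the hit-rate comparison above follow from simple arithmetic, which can be sketched as below. This is a minimal illustration and not the authors' calculation: the function names are hypothetical, and counting all six positive clones as hits (rather than the four distinct genes, since the three α-amylase clones carried the same gene) is an assumption made here for the comparison.

```python
def cloned_metagenome_mbp(n_clones, empty_fraction, mean_insert_kb):
    """Estimate the total cloned DNA in Mbp.

    n_clones       -- number of clones in the library
    empty_fraction -- fraction of clones without an insert (0..1)
    mean_insert_kb -- average insert size in kb
    """
    clones_with_insert = n_clones * (1.0 - empty_fraction)
    return clones_with_insert * mean_insert_kb / 1000.0


def hits_per_1000_clones(n_hits, n_clones):
    """Normalize a screening result to hits per 1,000 clones screened."""
    return 1000.0 * n_hits / n_clones


if __name__ == "__main__":
    # Values reported in the text: 2,843 clones, 14% empty, ~15 kb inserts.
    size = cloned_metagenome_mbp(2843, 0.14, 15)
    print(f"Estimated cloned metagenome: {size:.1f} Mbp")

    # Assumption: all six positive clones are counted as hits.
    print(f"This library: {hits_per_1000_clones(6, 2843):.2f} hits per 1,000 clones")

    # Two of the literature screens cited in the text, for comparison.
    print(f"Ref. [40]: {hits_per_1000_clones(3, 2100):.2f} hits per 1,000 clones")
    print(f"Ref. [39]: {hits_per_1000_clones(11, 10000):.2f} hits per 1,000 clones")
```

Normalizing to hits per 1,000 clones ignores differences in insert size and in the number of activities screened, so the values are only a rough basis for comparing libraries.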

Metagenomic sequencing
In order to get an overview of the enzyme potential in the functional expression library, high throughput Illumina sequencing was carried out on a pool of all BAC clones (see Table 4 for sequence statistics). The final assembly consisted of 4,621 contigs with an average size of 2,215 bp giving a total metagenome size of 10.2 Mbp. This is considerably lower than the estimated size of 36 Mbp for the cloned metagenome (see above), suggesting either an unequal coverage in the sequencing or a significant redundancy among the BAC clone inserts. The latter is consistent with the results on the microbial diversity in the starting material, which was dominated by a few related phylogenetic groups (Figure 1).
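The comparison between the assembled and the cloned metagenome size can be made explicit with a short calculation. The sketch below uses only the figures stated in the text; the function names are hypothetical, and the ratio is merely an upper bound on average insert redundancy, valid only under the assumption that the whole difference is due to overlapping inserts rather than unequal sequencing coverage.

```python
def assembly_size_mbp(n_contigs, mean_contig_bp):
    """Total assembly size in Mbp from contig count and mean contig length."""
    return n_contigs * mean_contig_bp / 1e6


def size_ratio(cloned_mbp, assembled_mbp):
    """Ratio of estimated cloned DNA to assembled, non-redundant sequence."""
    if assembled_mbp <= 0:
        raise ValueError("assembled size must be positive")
    return cloned_mbp / assembled_mbp


if __name__ == "__main__":
    assembled = assembly_size_mbp(4621, 2215)   # contig statistics from the text
    ratio = size_ratio(36.0, assembled)         # 36 Mbp estimated for the library
    print(f"Assembled metagenome: {assembled:.1f} Mbp")
    print(f"Cloned/assembled ratio: {ratio:.1f}-fold")
```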
The predicted full-length and partial coding sequences in the resulting metagenome contigs were annotated by comparison to the Pfam protein family database, searched for relevant enzyme domains, and compared to the data obtained from the functional expression library (Table 5). A total of 60 domains with similarity to GH families known to contain α-amylase, α- and β-galactosidase, cellulase and β-xylanase activities were identified in 47 unique coding sequences (Additional file 1: Table S1). Since there is considerable functional overlap between, and variety within, GH families, this is likely to be an overestimate of the abundance of these specific enzyme activities. Some GH domains could be assigned to more than one activity and the data given in Table 5 represent the number of non-redundant domain matches for each activity. Putative proteases were identified from protease domain-containing Pfam families, as specified by the MEROPS peptidase database, resulting in 313 protease domains in 289 unique coding sequences (Additional file 1: Table S1). The majority of these are likely to be intracellular housekeeping proteases involved in normal cellular metabolism. A non-exhaustive search for phosphatases identified at least 68 phosphatase domains in 17 Pfam families (data not shown).
It is clear that the number of relevant enzymes identified in the metagenome is considerably higher than the number obtained from the functional screening. Apart from the limiting factors discussed above, the AZCL-linked substrates used for detecting α-amylase, cellulase, β-glucanase, β-xylanase and protease activity are extracellular and require secretion of the active enzymes from E. coli for optimal detection. The subcellular localization of the identified enzyme-encoding sequences was predicted using PSORTb. Seven GH sequences were predicted to be extracellular in either gram-positive or gram-negative bacteria and six of these were from family GH13, which includes the majority of α-amylases (Additional file 1: Table S1). The predicted proteases included 15 potentially extracellular enzymes and another 13 possibly localized to the outer membrane/periplasm or cell wall, indicating that the absence of protease-producing clones in the functional library is not caused by a lack of protease-encoding sequences (Additional file 1: Table S1).
Table 3 footnotes: No protease, cellulase, β-glucanase, β-xylanase, β-mannanase or lipase producing isolates were identified. *: The three α-amylase positive clones carried inserts covering the same gene.

α-Amylases (IKA3C6, IKA16D10 and IKA28E6)
The three α-amylases and two β-galactosidases identified in the expression library were analyzed in more detail.
End-sequencing of the three E. coli clones with α-amylase activity (IKA3C6, IKA28E6 and IKA16D10) and comparison to the obtained metagenomic sequence of the library showed that the three inserts were overlapping and covering the same genomic region, indicating that they were most likely expressing the same gene. Of the three clones, IKA3C6 was the most active, indicating that the location of the gene in the insert in this clone was more favorable for expression (data not shown).
The α-amylase encoded by IKA3C6 was successfully expressed in an optimized E. coli expression system and the activity of crude extract was characterized with regard to temperature and pH (Figure 2). The IKA3C6 α-amylase showed a temperature optimum around 15°C and a pH optimum around pH 8-9, and retained more than 60% activity at 10°C. These profiles are somewhat similar to the well characterized α-amylase from the Antarctic bacterium Pseudoalteromonas haloplanctis, which has a temperature optimum at 30°C and pH 7 [44], as well as the α-amylase from an Arctic sea-ice isolate related to Brachybacterium faecium, with an optimum at 30°C and pH 7 [26]. The α-amylase activity of the E. coli clone was primarily intracellular, which together with the pH profile showing optimal activity at near neutral pH could indicate that the IKA3C6 α-amylase is an intracellular enzyme in the natural host, although it could also be an effect of the heterologous expression in E. coli. The α-amylase from the IKA3C6 clone, AmyI3C6, is currently the focus of further studies.

β-galactosidases (IKA3H5 and IKA17E2)
Sequence information was obtained for the two β-galactosidases identified in the expression library (IKA3H5 and IKA17E2). The two β-galactosidase protein sequences from IKA3H5 (BGalI3H5) and IKA17E2 (BGalI17E2) were compared to the LacZ sequence from mesophilic E. coli as well as to the cold-active β-galactosidase from A. ikkensis previously isolated from the ikaite columns, which retains more than 60% activity at 0°C [45]. Both sequences were shorter than LacZ and the A. ikkensis enzyme, especially BGalI17E2 with only 455 amino acids (Table 6). BGalI3H5 belonged to glycosyl hydrolase (GH) family 2 like LacZ and the β-galactosidase from A. ikkensis, whereas BGalI17E2 represented GH family 1. Both BGalI17E2 and BGalI3H5 displayed typical adaptations to low temperature compared to LacZ (lower arginine content, lower proline content and a lower arginine/(arginine + lysine) ratio) [46], and for BGalI17E2 this was even more pronounced than for the A. ikkensis β-galactosidase, indicating that the enzymes are indeed cold-adapted. The active site glutamic acid (E) was conserved in all four enzymes as was the adjacent methionine (M) (Table 6). The neighboring tyrosine (Y) was conserved in all except A. ikkensis. The closest homologues of BGalI17E2 and BGalI3H5 were β-galactosidases from Clostridium hathewayi (53% identity) and Roseburia hominis (51% identity), respectively. These are both obligate anaerobic Firmicutes belonging to the class Clostridia [47], indicating that the natural hosts of the enzymes are associated with Clostridia. This is in agreement with Clostridia being the dominating class in the DNA used for the library (Figure 1), highlighting the complementary nature of the culture dependent and independent approaches used in this study. The two enzymes were successfully expressed in an optimized E. coli expression system and the activities of crude extracts were characterized with regard to temperature and pH (Figure 3).
Both β-galactosidases showed similar profiles with a temperature optimum around 37°C and a pH optimum around pH 6. Both enzymes retained 20-30% activity at 10°C. These profiles are similar to a β-galactosidase from the Antarctic soil bacterium Paracoccus sp. 32d [49], a β-galactosidase from an Arctic sea-ice relative of Psychromonas antarctica [26], as well as a β-galactosidase obtained from a metagenome from the Baltic Sea [50].
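The compositional indicators of cold adaptation referred to above (arginine content, proline content and the arginine/(arginine + lysine) ratio) can be computed directly from a protein sequence. The following is a minimal sketch under the assumption that the sequence is given in one-letter amino acid code; the function name and the short example sequence are hypothetical and are not taken from this study.

```python
from collections import Counter


def cold_adaptation_indicators(protein_seq):
    """Return Arg %, Pro % and the Arg/(Arg + Lys) ratio of a protein.

    protein_seq -- amino acid sequence in one-letter code; whitespace and a
                   trailing stop symbol ('*') are ignored.
    """
    seq = "".join(protein_seq.split()).upper().replace("*", "")
    if not seq:
        raise ValueError("empty sequence")

    counts = Counter(seq)
    n = len(seq)
    arg, lys, pro = counts["R"], counts["K"], counts["P"]
    arg_lys = arg + lys

    return {
        "arg_percent": 100.0 * arg / n,
        "pro_percent": 100.0 * pro / n,
        # Undefined when the protein contains neither Arg nor Lys.
        "arg_over_arg_plus_lys": arg / arg_lys if arg_lys else float("nan"),
    }


if __name__ == "__main__":
    # Hypothetical toy sequence, for illustration only.
    print(cold_adaptation_indicators("MRRKPAAAAL"))
```

Lower values of all three indicators relative to a mesophilic homologue such as LacZ would be consistent with cold adaptation, although sequence composition alone does not demonstrate it.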
One possible application for β-galactosidases is the hydrolysis of lactose to galactose and glucose in order to generate lactose-free milk for lactose-intolerant people. An optimal enzyme for this purpose would be active at pH 6.7-6.8 and at 4-8°C [51], which correlates with the pH profiles of the two identified β-galactosidases. Several cold-active β-galactosidases with the ability to hydrolyze lactose have been reported [45,49,50,[52][53][54] and interestingly, recombinantly produced β-galactosidase from the Antarctic bacterium Pseudoalteromonas haloplanctis was able to outperform a commercial yeast enzyme at 4°C, where it retained around 20% of its activity even though the pH optimum was 8.5 [54]. The two enzymes from this study also show approximately 20% activity at 5°C and their pH optimum is around the pH of milk, suggesting that they could be good candidates for enzymatic hydrolysis of lactose. Therefore, their ability to hydrolyze lactose at 37°C and 5°C was analyzed (Figure 4). BGalI3H5 was unable to hydrolyze lactose, whereas BGalI17E2 showed activity at both 37°C and 5°C. Further studies on e.g. inhibitory effects of glucose and galactose on the hydrolytic activity, performance in milk and potential transglycosylating activity are needed in order to determine the potential use of BGalI17E2 for generation of lactose-free milk.

Conclusions
Using a combination of culture dependent and independent approaches we have demonstrated that the ikaite columns are a rich source of cold- and/or alkaline-active enzymes. The two approaches complemented each other by targeting different phylogenetic groups of bacteria. The strength of the cultured strain collection was the high hit rate, although there was a significant degree of phylogenetic redundancy in the collection and most of the native bacterial isolates were closely related to previously characterized strains. Functional expression had a very low hit rate, but the identified sequences were immediately available in a relevant production organism. The identified α-amylase and β-galactosidases showed characteristics of cold-adapted enzymes, and BGalI17E2 was able to hydrolyze lactose at low temperature. Metagenomic sequencing of the library revealed a much higher degree of novelty among the phylogenetic groups covered, but also that most of the potential activities were not expressed. Interesting candidate genes could therefore be chosen for heterologous expression in future studies. A useful combination of culture dependent and independent approaches is to use genomic DNA from natural isolates for functional expression, in order to pick up easily produced enzymes in a relevant host. Such an approach has been applied to identify α-amylases from Paenibacillus sp. [55] and Halothermothrix orenii [56] and could be similarly applied to isolates from the ikaite strain collection. This study shows that the combination of approaches for bioprospecting can be highly beneficial in the discovery of novel enzyme activities, and in addition to the immediate relevance of cold- and alkaline-active enzymes in industrial applications, it also underlines the significance of the ikaite columns as a unique biological resource for bioprospecting.

Bacterial strains and vectors
The bacterial strains and vectors used in this study are presented in Table 7. Media compositions are reported for the specific experiments.

Establishment and screening of a strain collection
Material from ikaite columns collected from 2001 to 2011 and stored at −18°C was spread on R2 plates buffered to pH 10.4 [16] supplemented with NaCl (10 g/L) and incubated at 10°C. Single colonies were picked, streaked to purity, grown in liquid R2 in 96 well format, supplemented with glycerol (15%) and stored at −80°C.

Phylogenetic analysis of the strain collection
Template for 16S rRNA gene PCR was obtained by transferring colony material to 50 μl demineralized water, boiling for 10 min followed by cooling and centrifugation, and using 2 μl of the supernatant as template. The PCR program consisted of an initial denaturation at 98°C for 3 min, followed by 30 cycles of 98°C for 15 s, 55°C for 45 s, 72°C for 60 s, and a final elongation at 72°C for 10 min. Sequencing was performed using the primer BAC338F (ACTCCTACGGGAGGCAG), covering the V3 and V4 regions of the 16S rRNA gene. Sequences were end-trimmed with a phred quality score limit of 20 and manually inspected to remove low quality sequences. Phylogenetic affiliation was determined by BLAST analysis against the GenBank 16S Microbial database in CLC Main Workbench (www.clcbio.com) with default settings.

Extraction and preparation of DNA and establishment of functional expression library
Material from the interior of 10 different ikaite columns was collected on site directly after harvest of the columns in August 2011. The material was homogenized and kept at 5°C. To determine the total diversity, DNA was extracted from the pooled material with the PowerLyzer  [57]: In brief, a slurry was made by mixing 130 g ikaite material with 450 ml 0.9% NaCl in a Waring blender at low speed. The slurry was centrifuged at 3,000 × g for 5 min and pellets were resuspended in 100 ml 0.9% NaCl and harvested at 3,000 × g for 5 min; this was done twice. Pellets were resuspended in 100 ml 0.9% NaCl with 0.1% NaN3, 17 ml methanol and 17 ml detergent mix (100 mM EDTA-Na2, 100 mM Na4P2O7, 1% (v/v) Tween 80) were added, and cells were detached by vortexing at 1,400 rpm for 60 min. Cells were separated from particles by centrifugation at 500 × g for 2 min and collected from the supernatant by centrifugation at 10,000 × g for 10 min. All steps were performed at 4°C. DNA from extracted cells was obtained by resuspending pellets in 1.5 ml STET-buffer (8% (w/v) sucrose, 5% (v/v) Triton X-100, 50 mM EDTA, 50 mM Tris-HCl) with lysozyme (2 mg/ml) and incubating at 37°C for 30 min before adding SDS (2% final concentration) and continuing incubation for 30 min at 37°C followed by 30 min at 65°C. Finally, DNA was extracted using traditional phenol/chloroform extraction [58]. High molecular weight (HMW) DNA (>8 kb) was gel-purified using a QIAquick Gel Extraction Kit (Qiagen) without dyes. Purified DNA was used as template for MDA with Repli-g Mini Kit (Qiagen) following the standard protocol. A modified version of the pGNS-BAC vector [59], mod.pGNS-BAC, was produced by introducing a new multiple cloning site with four unique restriction sites (ApaLI, BspDI, BbvCI and NsiI) into the HindIII site of the original vector to reduce the self-ligation rate.
The vector was digested with ApaLI and NsiI (New England Biolabs, Ipswich, MA, USA), gel-purified with a QIAquick Gel Extraction Kit and phosphatase treated with Shrimp Alkaline Phosphatase (New England Biolabs, Ipswich, MA, USA). MDA DNA was partially digested with ApaLI and NsiI and HMW DNA (>8 kb) from the digestion was gel-purified using GELase (Epicentre, Chicago, IL, USA). The purified DNA was ligated into the mod.pGNS-BAC vector using T4 DNA ligase (New England Biolabs, Ipswich, MA, USA), transformed into MegaX DH10B T1R electrocompetent E. coli cells (Life Technologies), and spread onto LB library plates supplemented with 12.5 μg/ml chloramphenicol. Colonies were picked into 96-well format in LB with 10% glycerol and 12.5 μg/ml chloramphenicol using a QPix colony picker (Genetix - Molecular Devices, Wokingham, UK), and grown overnight at 37°C with shaking before being stored at −80°C. Randomly picked clones (29) were analyzed for insert size by purification of the BAC-vector and digestion with ApaLI and NsiI, followed by gel electrophoresis.

Pyrosequencing analysis of bacterial diversity
A fragment covering the V3 and V4 hypervariable regions of the 16S rRNA gene from bacteria and archaea was amplified from DNA extracted after sampling, cell extraction, and MDA using the primers 341F (CCTAYGGGRBGCASCAG) and 806R (GGACTACNNGGGTATCTAAT). Amplification, pyrosequencing and phylogenetic analysis were performed as previously described [60]. Briefly, pyrosequencing was performed at The National High-Throughput Sequencing Centre at University of Copenhagen on a Genome Sequencer FLX pyrosequencing system (454 Life Sciences, Roche, Branford, CT, USA). Trimming and quality-filtering of the resulting sequences were performed using Biopieces (www.biopieces.org) with a minimum average phred quality score of 25. Sequences shorter than 250 bases and sequences containing more than one ambiguous nucleotide were discarded. Phylogenetic analysis was performed using the QIIME pipeline (www.qiime.org) [61]. Operational taxonomic units (OTUs) were clustered at 97% identity using the USEARCH [62] quality filter pipeline in QIIME, which included reference-based detection of chimeric sequences and removal of clusters containing only one sequence (singletons). The taxonomy of the resulting cleaned set of OTUs was determined using the RDP classifier at a confidence threshold of 50% and the Greengenes taxonomy database version 13_05 (http://greengenes.lbl.gov/) [63].
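The read-filtering criteria stated above (minimum average phred score of 25, minimum length of 250 bases, at most one ambiguous nucleotide) can be illustrated with a short sketch. This is not the Biopieces implementation used in the study; the function names, the order of the checks and the treatment of any non-ACGT symbol as ambiguous are assumptions made for illustration.

```python
def mean_phred(quals):
    """Arithmetic mean of per-base phred quality scores."""
    return sum(quals) / len(quals)


def passes_filter(seq, quals, min_mean_q=25, min_len=250, max_ambiguous=1):
    """Return True if a read satisfies the three criteria stated in the text.

    seq   -- nucleotide sequence (string)
    quals -- per-base phred scores (list of ints, same length as seq)
    """
    if len(seq) != len(quals):
        raise ValueError("sequence and quality lengths differ")

    # Discard reads shorter than the minimum length.
    if len(seq) < min_len:
        return False

    # Assumption: every symbol other than A, C, G or T counts as ambiguous.
    ambiguous = sum(1 for base in seq.upper() if base not in "ACGT")
    if ambiguous > max_ambiguous:
        return False

    # Discard reads whose average quality is below the threshold.
    return mean_phred(quals) >= min_mean_q


if __name__ == "__main__":
    read = "ACGT" * 63                      # hypothetical 252-base read
    print(passes_filter(read, [30] * len(read)))
```

Checking length and ambiguity before quality is a design choice of this sketch; it avoids computing a mean for empty or very short reads and does not change which reads are retained.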

Screening of functional expression library
The functional expression library was screened on LB plates supplemented with 12.5 μg/ml chloramphenicol, 0.01% (w/v) arabinose and the appropriate substrates as described above for the strain collection. In addition, lipolytic activity was screened on plates containing 1% tributyrin. Clones were transferred to plates using a hand-held 96-well pin replicator, grown overnight at 37°C, and then transferred to 20°C for two days followed by transfer to 15°C. Enzyme activities were scored continuously.

Characterization of α-amylase
Open reading frames (ORFs) in the three BAC library clones encoding α-amylase activity (IKA3C6, IKA16D10 and IKA28E6) were identified by BlastX analysis on a combination of BAC and metagenome sequences. The ORF encoding α-amylase in IKA3C6 (GenBank: KJ790257) was cloned into the expression vector pET21b with a C-terminal 6x His-tag and transformed into E. coli Tuner cells (Merck Millipore, Darmstadt, Germany). Positive clones were identified on LB plates supplemented with 100 μg/ml ampicillin, 1 mM IPTG and AZCL-amylose (0.05% w/v). The α-amylase enzyme was produced from a 50 ml liquid culture grown at 37°C at 150 rpm. Expression was induced at an OD600 of 0.8 by addition of 1 mM IPTG, and incubation was continued at 20°C for 16 h before harvesting at 10,000 × g for 10 min. Cell pellets were resuspended in 2 ml 100 mM phosphate buffer pH 7.6, and intracellular proteins were extracted by bead beating in a FastPrep homogenizer (Thermo Scientific) for 3 × 25 s at setting 5.5 with cooling on ice in between. The lysed cells were centrifuged at 10,000 × g for 5 min at 4°C and the supernatant was collected. Temperature and pH profiles were produced on the crude extract using an assay for reducing-end sugars [64] after incubation in 100 mM buffer (Tris-HCl buffer pH 8.6 for the temperature profile and pH 6-9, and glycine-NaOH buffer for pH 8-10) with 5 mg/ml amylopectin as substrate. Assays for the pH profile were performed at 20°C.

Characterization of β-galactosidases
ORFs in the two BAC library clones encoding β-galactosidase activity, IKA17E2 (GenBank: KJ790256) and IKA3H5 (GenBank: KJ790255), were identified by BlastX analysis on a combination of BAC and metagenome sequences. Genes were cloned into the expression vector pET21b with a C-terminal 6x His-tag and transformed into E. coli Tuner cells (Merck Millipore, Darmstadt, Germany). Positive clones were identified on LB plates supplemented with 100 μg/ml ampicillin, 1 mM IPTG and 10 mg/ml X-gal. The β-galactosidase enzymes were produced from 50 ml liquid cultures grown overnight at 37°C at 150 rpm, and intracellular extracts were obtained as described above. Temperature and pH profiles were produced on the crude extracts using an ONPG assay [65] in 100 mM buffer (phosphate buffer pH 7 for the temperature profile, citrate-phosphate buffer for pH 3-7, phosphate buffer for pH 6-8, and glycine-NaOH buffer for pH 8-10). Assays for the pH profile were performed at 20°C. Lactose assays were performed by adding 5 μl crude enzyme extract to 1 ml lactose (5 mg/ml) and incubating at 5°C or 37°C for 1 h or overnight before analysis on TLC aluminum sheets (Merck Millipore, Darmstadt, Germany) run in 1-butanol:2-propanol:water (3:12:4).

Metagenomic sequencing and analysis
DNA for sequencing of the functional expression library was extracted from a pool of liquid cultures of all E. coli BAC clones using the BACMAX DNA Purification Kit (Epicentre). The metagenome sequence was obtained by 2 × 250 bp paired-end sequencing of a short-insert library on an Illumina MiSeq system at The National High-Throughput DNA Sequencing Centre at University of Copenhagen. The resulting sequences were cleaned using Biopieces (http://www.biopieces.org) by trimming adaptors and poor-quality sequence from sequence ends and removing sequences that contained ambiguous nucleotides or had an average quality score of less than 30. Before assembly, all sequences showing at least 95% identity to the mod.pGNS-BAC vector or to the E. coli K12 genome were discarded. Assembly was performed in CLC Assembly Cell (http://www.clcbio.com) (see Table 4 for sequence statistics). Initial analysis and gene-calling of the assembled contigs were carried out on the MG-RAST server (http://metagenomics.anl.gov/). The predicted full-length and partial coding sequences were annotated by a batch search against the Pfam protein family database (http://pfam.sanger.ac.uk/) with an e-value cut-off of 1e-5. The identified protein domains were searched for relevant enzyme targets using information from the Pfam database, the carbohydrate-active enzymes database (CAZy; www.cazy.org) and the MEROPS peptidase database (http://merops.sanger.ac.uk/). Protein localization was predicted using PSORTb version 3.0 for both gram-positive and gram-negative bacteria (http://www.psort.org/psortb/).
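The domain-level selection described above (retain Pfam hits at or below the e-value cut-off of 1e-5, then keep only families of interest) can be sketched in Python as follows. This is an illustration under stated assumptions, not the procedure actually scripted in the study: the `DomainHit` record, the `select_target_hits` function and the example protein identifiers are hypothetical, the cut-off is assumed to be inclusive, and the target set contains a single illustrative family (PF00128, the Pfam alpha-amylase catalytic domain) rather than the full list compiled from Pfam, CAZy and MEROPS.

```python
from typing import Iterable, List, NamedTuple, Set


class DomainHit(NamedTuple):
    """One predicted protein-to-Pfam match (hypothetical record layout)."""
    protein_id: str
    pfam_id: str
    evalue: float


def select_target_hits(
    hits: Iterable[DomainHit],
    targets: Set[str],
    max_evalue: float = 1e-5,  # e-value cut-off stated in the text
) -> List[DomainHit]:
    """Keep hits that pass the e-value cut-off and belong to a target family."""
    return [
        hit for hit in hits
        if hit.evalue <= max_evalue and hit.pfam_id in targets
    ]


if __name__ == "__main__":
    # Illustrative target set only; the real list would be curated from
    # the Pfam, CAZy and MEROPS databases.
    target_families = {"PF00128"}
    example_hits = [
        DomainHit("contig1_orf3", "PF00128", 2e-40),  # kept
        DomainHit("contig1_orf7", "PF00128", 3e-2),   # fails e-value cut-off
        DomainHit("contig2_orf1", "PF00005", 1e-30),  # not a target family
    ]
    for hit in select_target_hits(example_hits, target_families):
        print(hit.protein_id, hit.pfam_id, hit.evalue)
```

Keeping the target families in a set separate from the filter function lets the same cut-off be reused for glycosyl hydrolases and proteases with different family lists.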

Additional file
Additional file 1: Table S1. Predicted proteases and glycosyl hydrolases from sequencing of the functional expression library. Predicted Pfam domains from the metagenomic sequence of the functional expression library with relevant glycosyl hydrolase (GH) or protease function.