Genome sequences and comparative genomics of two Lactobacillus ruminis strains from the bovine and human intestinal tracts
© Forde et al; licensee BioMed Central Ltd. 2011
Published: 30 August 2011
Skip to main content
© Forde et al; licensee BioMed Central Ltd. 2011
Published: 30 August 2011
The genus Lactobacillus is characterized by an extraordinary degree of phenotypic and genotypic diversity, which recent genomic analyses have further highlighted. However, the choice of species for sequencing has been non-random and unequal in distribution, with only a single representative genome from the L. salivarius clade available to date. Furthermore, there is no data to facilitate a functional genomic analysis of motility in the lactobacilli, a trait that is restricted to the L. salivarius clade.
The 2.06 Mb genome of the bovine isolate Lactobacillus ruminis ATCC 27782 comprises a single circular chromosome, and has a G+C content of 44.4%. In silico analysis identified 1901 coding sequences, including genes for a pediocin-like bacteriocin, a single large exopolysaccharide-related cluster, two sortase enzymes, two CRISPR loci and numerous IS elements and pseudogenes. A cluster of genes related to a putative pilin was identified, and shown to be transcribed in vitro. A high quality draft assembly of the genome of a second L. ruminis strain, ATCC 25644 isolated from humans, suggested a slightly larger genome of 2.138 Mb, that exhibited a high degree of synteny with the ATCC 27782 genome. In contrast, comparative analysis of L. ruminis and L. salivarius identified a lack of long-range synteny between these closely related species. Comparison of the L. salivarius clade core proteins with those of nine other Lactobacillus species distributed across 4 major phylogenetic groups identified the set of shared proteins, and proteins unique to each group.
The genome of L. ruminis provides a comparative tool for directing functional analyses of other members of the L. salivarius clade, and it increases understanding of the divergence of this distinct Lactobacillus lineage from other commensal lactobacilli. The genome sequence provides a definitive resource to facilitate investigation of the genetics, biochemistry and host interactions of these motile intestinal lactobacilli.
The lactic acid bacteria (LAB) are low G+C, Gram-positive bacteria that produce lactic acid through the fermentation of hexose sugars . The LAB are not a monophyletic group, but rather a pragmatic phenotypic division encompassing 13 genera. The largest of these is the genus Lactobacillus, with over 171 currently recognized species . The lactobacilli are considered a subdominant element in the human gastrointestinal tract (GIT) and have been extensively studied for both their industrial application and health benefits . The genus Lactobacillus is highly diverse . On the basis of phylogenetic markers such as the 16S rRNA  or the groEL gene , clades or clusters of species have been defined within the genus Lactobacillus. In the most recent comprehensive description of this genus, twelve Lactobacillus and two Pediococcus clades were proposed . The process of assigning species to clades within a larger genus is not novel, and cladistics has formed an integral part of many Lactobacillus phylogenetic analyses [4, 5, 7–10]. As more species are identified, a clearer resolution of the clades emerges. For example, the L. plantarum group originally included twelve species , but has since undergone significant reclassification, and now contains only three species, namely L. plantarum, L. paraplantarum and L. pentosus. Furthermore, the L. buchneri group that was a major clade in early Lactobacillus phylogenies  has since been revised, and robust divisions within the group are evident 
The L. acidophilus group , formerly known as the L. delbrueckii group , is one of the largest Lactobacillus clades. It harbours the “L. acidophilus complex”, a cluster of several species including L. acidophilus, L. amylovorus, L. crispatus, L. gallinarum, L. gasseri, L. helveticus and L. johnsonii[12–14] that were mistakenly identified as L. acidophilus strains upon their original isolation [13, 15]. Members of this clade have been isolated from humans and environmental sources, and represent some of the best characterised lactobacilli. Similarly, the L. salivarius and L. reuteri clades were named after the best characterised of their member species and may be considered as major phylogenetic units within the genus Lactobacillus. The L. reuteri clade includes member species that were isolated either from humans (L. antri; L. coleohominis; L. gastricus; L. oris; L. vaginalis), animals (L. reuteri) or birds (L. ingluviei) or from foods such as rye-bran fermentations (L. frumenti) and sourdough (L. panis; L. pontis and L. secaliphilus) . Likewise, the species comprising the L. salivarius clade have been isolated from vertebrate intestine/faeces, soil, water and plants or food . This clade includes L. ruminis which is phylogenetically close to L. salivarius and which shares the same ecological niche [17–19].
Application of genomic technologies has been very beneficial for understanding the biology of commensal lactobacilli . The full genomes of 14 Lactobacillus species have been sequenced and published [18, 21–31] and 140 Lactobacillus sequencing projects are on-going . There is a bias towards the analysis of species that are phylogenetically close to L. acidophilus: of the 14 Lactobacillus genomes currently available, 6 are from the L. acidophilus complex. Until recently, only one genome from a member of the L. salivarius clade had been fully sequenced . Additionally, while the development of next generation sequencing technologies has led to a near exponential increase in the number of sequenced bacterial genomes, the majority of these genomes remain at low quality level, have been assembled and scaffolded without human intervention, contain numerous sequence gaps and are poorly annotated. As a consequence these draft genome sequences are often unsuitable for whole genome comparative analysis, particularly where the emphasis is on synteny, operon structure, or plasmid configuration.
Lactobacillus ruminis was first isolated from the faeces of humans in 1960  and subsequently from the bovine rumen . L. ruminis has been identified as one of 17 species of lactobacilli which are routinely isolated from the faeces of humans , cattle  and pigs  and is considered to be a member of the autochthonous microbiota in the gastrointestinal tract (GIT) [18, 19]. L. ruminis is unusual among the lactobacilli as it is one of only 14 members of this genus to be characterised as being motile . As well as being motile, L. ruminis is of interest because the immunomodulatory characteristics of this species, specifically its ability to stimulate tumour necrosis factor (TNF) and nuclear-factor κB (NF- κB) production in monocytes , has identified L. ruminis as a candidate probiotic. In this study, we determined the genome sequence of Lactobacillus ruminis ATCC 27782 (a motile strain isolated from cows), representing the first genome sequence of a motile Lactobacillus and the second completely finished  genome from a member of the L. salivarius clade.
Comparison of the major genomic features of L. ruminis ATCC 27782, L. ruminis ATCC 25644, and L. salivarius UCC118. Figures for ATCC 25644 are estimates based on the draft assembly and automated annotation, and pseudogenes were not predicted due to low quality regions and sequence gaps. Numbers in parentheses for L. salivarius UCC118 refer to contributions from the megaplasmid pMP118.
L. ruminisATCC 27782
L. ruminisATCC 25644
L. salivariusUCC 118
G+C Content (%)
Coding density (%)
In addition to the 1901 protein-coding regions, the genome of L. ruminis contains 85 predicted pseudogenes (4.3% of all coding sequences; Figure 1), characterized by the presence of in-sequence frame-shifts, deletions, stop codons, or interruption by insertion sequences (IS). A large proportion (29.4%), of the pseudogenes themselves were identified as being IS element related. Inactivation of IS elements in this manner is a common feature of bacterial genomes, and is considered a mechanism for transposition regulation . The remaining 60 pseudogenes are catalogued in Additional File 1: Table 1. IS elements are a common feature of bacterial genomes. We identified eighty-three transposases (4.2% of coding sequences) representing 9 families of IS elements in the genome of L. ruminis ATCC 27782, with 25 characterized as pseudogenes (Additional File 2: Table 2). Seven of the nine families are present in multiple copies, with IS256, IS66, IS3, IS200/IS605 having the largest numbers of replicates, 10, 16, 19, and 25 copies respectively.
Six rRNA operons, consisting of 16S, 23S and 5S rRNA genes, were identified distributed throughout the genome. All rRNA operons were orientated in the same direction as DNA replication. Sixty seven tRNA genes, representing all 20 amino acids, were identified in the genome. Only 26 of the 67 tRNAs were located on the lagging strand, with the majority clustered at, or close to, the first of the two rRNA operons on this strand. The remaining 41 were distributed throughout the leading strand with the majority clustered around the four rRNA operons. Redundant tRNA genes were present for 18 of the 20 tRNA species, with the exceptions being those for cysteine and tryptophan.
L. ruminis is one of 12 species in the L. salivarius clade which have been identified as being motile (only 14 species of the genus Lactobacillus are known to be motile). Annotation of the L. ruminis ATCC 27782 genome identified all the motility and motility-associated proteins required to produce a fully functional flagellar apparatus. The genomics of L. ruminis motility and flagellar assembly are described in detail elsewhere . To summarize, the motility-encoding regions of the ATCC25644 and ATCC27782 genomes span 45,687 bp and 48,062 bp respectively, constituting a single contiguous gene block. L. ruminis motility is conferred by a total of forty-five predicted proteins involved in flagellum regulation, synthesis, export and chemotaxis, and which conform to the expectations for flagellum production in Gram positive bacteria . The motility locus of ATCC 27782 is larger because it includes a second copy of the gene for flagellin, fliC, and a glycosyltransferase pseudogene, the relevance of which for motility is unclear. The closest homolog of most of the L. ruminis motility genes was in Enterococcus casseliflavus or Enterococcus gallinarum, which is consistent with phylogenetic relatedness of the enterococci to the lactobacilli , and distribution of the motility phenotype in the phylum Firmicutes.
The in silico analysis of the L. ruminis genome suggests that it is unable to synthesize the vitamins and cofactors riboflavin, vitamin B6, folate, nicotinamide and nicotinate. Partial pathways for both purine and pyrimidine biosynthesis were annotated (Additional File 3: Figure 1 and Additional File 4: Figure 2, respectively). However, while L. ruminis appears to lack the ability to synthesise adenosine and guanosine, it is predicted to synthesize the nucleotides adenine and guanine from adenosine monophospate (AMP) and guanine monophosphate (GMP) respectively.
In contrast to other Lactobacillus species such as L. helveticus and L. sakei, which convert pyruvate to acetyl-CoA through the intermediate acetyl phosphate, L. ruminis cannot produce acetyl-CoA in this manner. Instead L. ruminis appears to produce Acetyl-CoA through the action of the enzyme pyruvate formate-lyase (Additional File 5: Figure 3). Pyruvate formate-lyase catalyses the non-oxidative cleavage of pyruvate to acetyl-CoA and formate. An anaerobically induced pyruvate formate-lyase system has been fully characterised in E. coli.
Through de-novo synthesis and inter-conversions, L. ruminis can synthesize 8 of the 20 amino acids. Present in the genome is a gene predicted to encode the enzyme L-serine dehydratase (EC. 184.108.40.206) which catalyses the conversion of pyruvate into serine. Serine in turn can be converted by tryptophan synthase into tryptophan (Additional File 6: Figure 4). Tryptophan can also be synthesised de novo through the Shikimate pathway. L. ruminis is also predicted to be capable of de novo synthesis of histidine. While the L. ruminis ATCC 27782 genome apparently encodes complete pathways for the production of threonine and aspartate, it lacks the enzymes threonine aldolase (EC: 220.127.116.11) and glycine hydromethyltransferase (EC: 18.104.22.168). Consequently this strain cannot synthesis glycine. L. ruminis is also predicted to lack the ability to synthesize glutamate. However, if extracellular glutamate is imported (two glutamate ABC transport systems are present in the genome of L. ruminis, LRC_13790-13800 and LRC_18670-18680), L. ruminis could subsequently synthesize glutamine, arginine and proline. In summary, L. ruminis is potentially capable of synthesizing 8 amino acids and being auxotrophic for 12. This level of auxotrophy is greater than that exhibited by its nearest sequenced neighbour Lactobacillus salivarius UCC118  which is auxotrophic for only 8 amino acids. This highlights the dependence this autochthonous bacterium has on extracellular sources of amino acids that are likely to be present in the intestinal milieu. However, L. ruminis is considerably less auxotrophic than more distantly related Lactobacillus species such as L. acidophilus NCFM (auxotrophic for 14 amino acids) and L. sakei (auxotrophic for 18 amino acids).
Apart from carbohydrate metabolism (see below), preliminary analysis of the genome of L. ruminis ATCC 25644 revealed a near identical predicted metabolic profile to that described for L. ruminis ATCC 27782. However, some subtle differences were noted; for example ATCC 25644 appears to lack the enzyme asparatate aminotransferase (EC:22.214.171.124) but possesses the enzymes 3-isopropylmalate dehydrogenase (EC:126.96.36.199), succinyl-diaminopimelate desuccinylase (EC:188.8.131.52) and aryl-alcohol dehydrogenase (EC:184.108.40.206). The two L. ruminis strains are predicted to be auxotrophic for the same 12 amino acids and to have identical pyruvate metabolism systems. Similar to ATCC 27782 and most other lactobacilli, L. ruminis ATCC 25644 cannot synthesize the majority of vitamins and co-factors.
The ability of intestinal bacteria to utilize carbohydrates is an important factor for determining competitiveness and diet interaction in the host intestine, and we describe this topic in detail elsewhere in this volume . Sixteen carbohydrate utilization pathways were predicted in genomes of ATCC 27782 and ATCC 25644, including those for utilization of glucose, fructose, mannose, galactose, starch and sucrose . The ATCC 25644 encodes six putative operons for the transport and utilisation of the prebiotics fructo-oligosaccharides (FOS), galacto-oligosaccharides (GOS), soya-bean oligosaccharides (SOS), and 1,3:1,4-β-D-Gluco-oligosaccharides . Only three of these operons were identified in the ATCC 27782 genome, which were putatively linked to the utilisation of SOS and 1,3:1,4-β-D-Gluco-oligosaccharides. Lack of an operon for FOS utilization in the bovine isolate ATCC 27782 is consistent with the inability of this strain to use FOS as a sole carbon source. A predicted cellobiose utilization operon in the L. ruminis 25644 genome is likely to be responsible for the transport and hydrolysis of both cellobiose and 1,3:1,4-β-D-Glucan hydrolysates .
Bacteriocins are small antimicrobial peptides produced by many lactic acid bacteria, that may exhibit either a narrow spectrum (affecting only closely related species) or broad spectrum (affecting species in different genera) . The genome of L. ruminis ATCC 27782 includes a 6.1 kb region encoding seven bacteriocin-related and two hypothetical genes (Additional File 7: Figure 5). In silico analysis identified the bacteriocin (59 aa protein; LRC_02417) as a Class II pediocin-like bacteriocin . The bacteriocin shows significant residue identity to Class II bacteriocins from Bacillus coagulans, Pedicococcus acidilacti, L. plantarum, and other LAB (Additional File 8: Figure 6), and possesses a conserved N terminal pediocin box region and the YGNGVXCXXXXCXV motif . In addition to the bacteriocin structural gene, the locus also encodes two putative bacteriocin immunity proteins (LRC_17030 and LRC_17110), a sensor histidine kinase and response regulator (LRC_17060-17070) and transport apparatus comprising an accessory protein and ATP-binding cassette (ABC) transporter (LRC_17040 and LRC_17080). A preliminary analysis has so far failed to show bacteriocin activity associated with L. ruminis strain ATCC 27782, and it is not yet known if this locus is active. Analysis of the genome of ATCC 25644 also identified a region containing genes associated with bacteriocin production. However, the fragmented assembly means that it is presently unknown if the genetic complement of this locus is complete. Sequences associated with bacteriocin production were distributed across three contigs, with the genes for two sensor histidine kinases and a response regulator being truncated by sequencing gaps. Although a gene for a potential bacteriocin immunity protein (similar to PedB from Lactobacillus gasseri) was identified, no genes encoding bacteriocin peptides or transport apparatus were identified.
CRISPR loci (clustered regularly interspaced short palindromic repeats) are a family of DNA repeats that function like an adaptive immune response system, and are found in only 40% of bacteria. This system provides acquired immunity to exogenous DNA from viruses and plasmids , and thus represent a barrier to attack or genetic transformation. Two CRISPR/CRISPR-associated sequence (cas) systems were identified in the genome of L. ruminis ATCC 27782. The systems, CRISPR1 and CRISPR2, are located 12.9kb apart and consist of 8 and 7 cas genes respectively. CRISPR1 consists of 8 cas genes and is preceded by a 1059 bp CRISPR region composed of a 36bp direct repeat and 14 spacers. The CRISPR region is separated from the cas genes by a small hypothetical protein and a transposase fragment. CRIPSR2 consists of 7 cas genes and is proceeded by a much longer CRISPR region composed of a 30 bp direct repeat and 36 spacers. Analysis of both CRISPR regions revealed no significant hits to any known plasmid or phage sequences, emphasizing the phylogenetic distance of the L. ruminis genetic milieu from previously well characterized systems.
We identified one CRISPR system in the draft genome of L. ruminis ATCC 25644. CRISPR1 consists of 4 cas genes proceeded by a CRISPR region containing a 36 bp direct repeat (DR) and 16 spacers. The region is disrupted by a sequencing gap of 887 bp (inferred from mate-pair information) dividing the region into direct repeats with 11 and 5 spacers respectively. Given that each DR and spacer is 65 bp, the sequencing gap could contain another 13 spacers. The presence of a CRISPR system in a second L. ruminis genome confirms the importance of resistance to exogenous DNA in this species.
Intestinal commensal bacteria must also be able to endure a range of physiological stresses. Indeed, the ability of bacteria to respond to stresses such as those encountered during gastric and intestinal transit is key to their survival. The L. ruminis ATCC 27782 genome encodes a number of stress resistance proteins including those predicted to confer resistance to heat, cold, alkaline and phage shock proteins (Additional File 9: Table 3). The genome also includes the conserved SOS regulon genes. Specifically, L. ruminis ATCC 27782 encodes four heat shock proteins, the cold shock proteins CspA and CspE, a single alkaline shock protein, and there are two copies of pspC whose product is predicted to be involved in phage shock/resistance. The genome of L. ruminis ATCC 27782 also harbours genes for a number of Clp proteases, (clpB, clpX, and clpP), which are involved in the degradation of mis-folded proteins 
ATCC 27782 is moderately oxygen tolerant, though less so than other members of the L. salivarius clade . Consequently, the ability of this bacterium to respond to and eliminate reactive oxygen species is extremely important. The L. ruminis genome encodes a number of thioredoxins, a class of protein which act as antioxidants through the reduction of other proteins by cysteine thiol-disulfide exchange .
The Lactobacillus cell surface has an important role in governing interaction with host animals, at the level of initial colonization, long-term persistence, and potentially also modulatory roles on both the innate and adaptive immune responses, and the rest of the microbiota by surface exclusion . Sortase enzymes function as an important mechanism which anchors surface proteins, and they are found in all Gram-positive bacteria where they act as both proteases and transpeptidases . The Sortase type A enzymes (SrtA) function by anchoring proteins containing the characteristic substrate LPxTG motif to the peptidoglycan of the cell wall. Genes for two sortase-like proteins were annotated in the L. ruminis genome (SrtA, LRC_16570 and SrtC, LRC_00630), as well as 10 predicted sortase-anchored proteins (Additional File 10: Table 4), that were identified by searching for LPxTG motifs. The presence of multiple sortase-like proteins in the genome is not unusual in Gram-positive bacteria , and the NCBI protein databases currently contain 173 SrtA sequences from eight Lactobacillus species, plus an additional 48 SrtC sequences. The sortase-like protein encoded by LRC_00630 contains a SrtC Conserved Domain. It shows 42% BLAST identity to SrtC of L. rhamnosus LGG. The LRC_00630 gene is preceded by three genes predicted to encode sortase dependant proteins (LRC_00600, LRC_00610 and LRC_00620). This genetic arrangement suggest that both the genes for the sortase enzyme and its substrates may have been acquired as a unit by horizontal gene transfer, and their arrangement also suggests they may be co-transcribed or co-regulated. Both SrtA and SrtC recognize similar motifs, but the conservation of amino acids in these motifs differs i.e. LPxTGc for SrtA and lPxTGG for SrtC, where uppercase letters are absolutely conserved . On this basis alone, the target proteins for the SrtA and SrtC enzymes of L. ruminis ATCC 27782 cannot be distinguished, and will require experimental investigation.
LRC_00600 (annotated as Sortase-anchored surface protein) is a predicted 1,140 residue protein with homology to hypothetical proteins or presumptive (but unproven) collagen adhesins. LRC_00610 (annotated as Sortase-anchored surface protein) shows 28% BLAST identity to SpaE, a minor backbone protein of the adhesive pili produced by L. rhamnosus LGG . However, it also displays higher levels of residue identity to many putative/hypothetical sortase-dependant proteins from LAB or Firmicutes. LRC_00620 (505 amino acid residues) shows significant residue identity to homologues primarily in the Enterococcus spp,. including pilin subunits from E. faecalis and E. faecium. It is therefore possible that this locus encodes a sortase-dependent pilus organelle. Genetic evidence for possible production of such structures has been noted in L. johnsonii and other lactobacilli , but their visualization and characterization has only been described for L. rhamnosus LGG (as noted above). When transcription of the LRC_00600-00630 locus in ATCC 27782 and ATCC 25644 was examined by microarray analysis, we observed that these genes were significantly up-regulated in the human isolate ATCC 25644 compared to the bovine isolate ATCC 27782, by factors of 15.2, 14.3, 7.1 and 23.8 respectively. While highly suggestive of a surface role in this strain, these presumptive pili are not visible under the conditions routinely used for negative staining (see below), and direct experimental verification by another method is now required.
There is no clustering of genes for sortase dependant proteins around the gene for the second sortase-like enzyme (LRC_16570) which we annotated as SrtA. The genes for the remaining sortase-dependant proteins are distributed throughout the genome, with another three-gene cluster in (LRC_16760, LRC_16780, LRC_16790) in the latter half of the genome. The biological function of these proteins is not known (Additional File 10), and their characterization will require a functional genomics approach as deployed for the closely related L salivarius, and L. acidophilus.
In addition to sortase anchored proteins the L. ruminis ATCC 27782 genome also encodes a predicted fibronectin binding protein (LRC_09530) and a number of proteins expected to be involved in the export and synthesis of teichoic acids (LRC_01020, LRC_01380, LRC_03490, LRC_17520, LRC_06890, LRC_06900). Additionally, the ATCC27782 genome includes the dlt operon (dltA to dltD; LRC_17120 to LRC_17150) involved in the esterification of lipoteichoic acid (LTA) by D-alanine, which suggests the presence of lipoteichoic acids in the L. ruminis cell wall.
Since this study provided the first complete genome sequence information for a member of the L. salivarius clade other than L. salivarius itself, we initially compared the L. ruminis ATCC 27782 genome to that of L. salivarius UCC118. L. ruminis is robustly positioned in the L. salivarius clade by independent analyses [5, 42]. At summary statistic level (Table 1), the genomes of L. ruminis and L. salivarius are very similar, reflecting the close phylogenetic relationship of these two species. However, one major difference is the abundance of extra-chromosomal elements in L. salivarius. While L. ruminis has a single circular genome of 2.06 Mb, the L. salivarius UCC118 genome comprises a 1.8 Mb chromosome and possesses 3 plasmids, one of which is 242kb in size . Multiple plasmids including megaplasmids are present in all L. salivarius strains tested to date . Notwithstanding this difference in architecture, the genomes of L. ruminis and L. salivarius share a similar number of coding sequences, rRNA operons and tRNA genes (Table 1). Notably, the L. ruminis ATCC 27782 genome harbours a larger number of pseudogenes (85 compared to 69) and more IS elements (83 compared to 43). The greater number of pseudogenes and smaller genome size may indicate that the L. ruminis genome is at a more advanced stage of decay than L. salivarius, relative to their last common ancestor which was presumably free-living and had a larger genome.
To further examine this phenomenon, we investigated core proteins which we determined using METAPHORE  (see Methods), first within the L. salivarius clade (L. salivarius and L. ruminis genomes). A protein was considered an ortholog if it shared 30% amino acid identity over 80% of the sequence length. Only 59% of the protein coding regions (ie excluding IS elements and pseudogenes) in the L. ruminis genome have an ortholog in the L. salivarius UCC 118 genome. Including the L. salivarius megaplasmid in the analysis, the genomes of L. ruminis and L. salivarius contained 309 and 358 genes, respectively, which were absent in the other genome at the cut-off value for orthology imposed for their proteins (Additional File 13: Table 6 for L. ruminis–specific proteins, and Additional File 14: Table 7 for L. salivarius-specific proteins). However, a large proportion of these unique proteins in each genome corresponded to hypothetical genes (97 in L. ruminis and 115 in L. salivarius). A further 58 unique L. salivarius proteins were associated with prophages compared to only 11 in the L. ruminis genome. The L. ruminis SrtC homolog (LRC_00630) and two of its sortase dependant proteins (LRC_00600, LRC_00610) are absent from the L. salivarius genome, as are 9 of the CRISPR associated proteins. The presence of only 1 small CRISPR region in the genome of L. salivarius may account for the greater abundance of phage associated genes within its genome. The L. ruminis-specific proteins include those for motility , ability to utilize certain carbohydrates such as cellobiose , and a large number of predicted membrane proteins of unknown function (Additional File 13: Table 6). The previously discussed pediocin-like bacteriocin was also identified by this analysis. The complement of L. salivarius-specific proteins is striking for how many of them are encoded by discrete tracts of the genome, even outside of phage-related sequences, exemplified by LSL_0330 to LSL_0365 and LSL_0410 to LSL_0476 (many predicted membrane proteins); LSL_0921 to LSL_0963 (a cluster of hypothetical proteins); and the two EPS clusters . Some of these regions are also evident from the ACT comparison (Figure 4), as discrete regions where homology is lacking between the genomes. This suggests that regions were differentially retained from the last common ancestor of the L. salivarius clade – or differentially acquired. The average GC% of unique genes for the genomes of L. ruminis ATCC 27782 and L. salivarius UCC118 was 42.7% and 31.9% respectively. However the GC% ranges were from 26.2% to 57.3% for L. ruminis and from 21.5% to 45% for L. salivarius, indicating that a number of genes unique to each genome may have been acquired by horizontal gene transfer.
Due to the lack of any other sequenced species from this subgroup, the 1,100 proteins conserved in both genomes were considered the core proteins of the L. salivarius clade. The majority of the core proteins have a defined function with only 166 hypothetical proteins (35% of the total number of hypothetical proteins) and 189 hypothetical proteins (32 % of the total number of hypothetical proteins) in L. ruminis and L. salivarius respectively. More comprehensive manual comparative analysis (data not shown) revealed that the core protein set of the L. salivarius clade was predominated by genes present in operon-like clusters, an organization which has previously been noted in another study of core genes in the Lactobacilli , suggesting conserved function, organization and control of such core genes. In addition to housekeeping genes and clusters of ribosomal and ATPase proteins, L. ruminis and L. salivarius share a clusters of genes involved in EPS production and purine metabolism. Five two-component regulatory systems were shared between both genomes and while their function is currently unknown, they may form the basis of environmental response systems shared by members of this clade.
Comparative analysis of orthologues shared between the L. salivarius clade and selected lactobacillus groups.
L. acidophilus, L. johnsonii;
L. reuteri, L. fermentum
L. brevis, L. buchneri
L. casei, L. sakei
Unique proteins in selected lactobacillus groups.
L. acidophilus, L. johnsonii;
L. reuteri, L. fermentum
L. brevis, L. buchneri
L. salivarius , L. ruminis
L. casei, L. sakei
The genome sequences of these two L. ruminis strains provide a platform for functional genomic analysis of this species, an overlooked autochthonous member of the intestinal microbiota of many animals including humans. Similar to other commensal lactobacilli, the in silico analysis of the L. ruminis genome suggested it may be undergoing genome decay. The comparative analysis of L. ruminis ATCC 27782 and L. salivarius UCC118 revealed a lack of genome synteny between these two members of the L. salivarius clade which reflects the high degree of diversity evident across the whole genus. Adaptations to a competitive environment in the intestine include a large locus devoted to EPS production by L. ruminis, a pediocin-like bacteriocin locus, and a putative sortase-dependent pilus locus that is expressed at higher levels in the strain isolated from humans.
The genomes of both L. ruminis ATCC 25644 and L. ruminis ATCC 27782 were sequenced by generating approximately 200,000 reads of average read length 125-150 nt, from a half plate on a 454 FLX instrument , using a 3 kb mate pair library, generating approximately 21-fold and 28-fold coverage (Agincourt Biosciences, Beverly, MA), respectively. In addition to the 454 data for the ATCC 27782 genome, an additional half lane of Illumina sequencing (22.5 Mb total sequence data) was obtained. The Illumina data consisted of a 3 kb mate-pair library and a 400 bp paired-end library (Fasteris, Geneva, Switzerland). Each Illumina library provided an average of 217-fold coverage. Initial de novo genome assembly of the 454 sequences was performed using the Roche/454 Life Sciences Newbler (Gs) assembler , producing an initial assembly of 72 contigs distributed over 8 scaffolds for the genome of ATCC 27782. The resulting 454 assembly was then used as a reference for the mapping assembly of the Illumina data. This mapping assembly was performed using Mira and undertaken to extend contigs, close gaps and for error correction of the draft genome.
A PCR-based strategy was adopted for gap closure. Contig-contig gaps were closed using primers designed at the end of contigs and amplified using Dreamtaq DNA polymerase (Fermentas, Ontario, Canada). Scaffolds were ordered and oriented by PCR. Primers were designed at the ends of the scaffolds and the inter-scaffold region was amplified using Extensor long PCR enzyme mix (Abgene, Epsom, UK). PCR products for both the sequencing gaps and the inter-scaffold gaps were sequenced by Eurofins MWG Operon (Ebersberg, Germany) and the sequences were intergrated into the assembly using PHRAP . Correct placement of the gap sequences was confirmed by observation using Tablet, a next generation sequencing graphical viewer .
Initial automated gene calling was performed using Glimmer 3  and Genemark . Intergenic regions were examined for missed gene calls using BlastXtract . tRNAs were identified using tRNA-scan  and ribosomal binding sites using RBSfinder . Preceding the manual annotation of the L. ruminis ATCC 27782 genome, the protein sequences of each gene product were searched against a variety of databases with the aim of assigning a functional annotation. All predicted proteins were searched (BLASTP) against the NCBI-non-redundant protein database (nr) and, through Interproscan , against the pFAM, TigrFAM, PIR, HAMAP, PROSITE, PRINTS, PRODOM, PANTHER, SUPERFAMILY, GENE3D databases. In addition, transmembrane domains were identified with TMHMM  and Signal peptides with SignalP . The automated annotation was then manually curated in Artemis .
Accession numbers: The finished genome of ATCC 27782 is available under accession number XXYYZZ123. The draft genome of ATCC 25644 is available under accession number CCGGHHIIUU.
Whole genome nucleotide alignments were generated using the Big Blast software (available from the Welcome Trust Sanger Institute  and alignments were visualized with the Artemis Comparison Tool (ACT) . Protein alignments were performed using the MUMmer package . Identification of orthologs, unique genes and core genes was performed using the custom in-house software METAPHORE . METAPHORE performs a bi-directional blastp comparison of two or more genomes and proteins are only considered orthologs if they share a minimium 30% amino acid identity over 80% of their sequence length. For an ortholog to be considered a core gene, it must be present in all possible pairwise genome combinations.
Microarray production, scanning and data analysis followed an established protocol . In summary, L. ruminis cells were grown anaerobically for 15 hrs in 20 ml de Man-Rogosa-Sharpe (MRS) broth aliquots until the OD600 was in the range of 0.5-0.8. The cells were harvested by centrifugation at room temperature and the pellets were immediately washed and resuspended in 500 µl RNAprotect Bacteria Reagent (Qiagen). Total RNA was extracted using an RNeasy mini kit (Qiagen), according to the manufacturer’s protocol for difficult to lyse cells with modifications including an extended incubation with proteinase K (40 mins). RNA was treated with DNase using the Turbo DNA-free kit (Ambion) according to the routine DNase treatment protocol. Then, 10 ug of total RNA was reverse transcribed with random nonomers (MWG-Biotech, Germany) and the ULS cDNA synthesis and labelling kit (Kreatech, Amsterdam, Netherlands). Labelling took place at 85°C for one hour.
Custom oligonucleotide microarrays that were designed to include the annotated open reading frames of the L. ruminis ATCC 25644 and ATCC 27782 genomes were commissioned and produced by Agilent Ltd. (Santa Clara, California). Four 44 K microarrays were present on each slide. Every 1000 nt of coding sequence was represented on the arrays by at least six features. Where the sequence of a given probe was identical for a gene common to ATCC 25644 and ATCC 27782, the probe was represented on the array six, rather than twelve times. A total of fourteen user defined control probes were represented ten times on each array in addition to the 1417 Agilent controls.
An Oligo aCGH/ChIP-on chip hybridization kit (Agilent) was used for hybridisation of the labelled cDNA to the microarrays. Probe hybridization took place at 65°C for 20 hrs with constant rotation (10 rpm). Microarrays were scanned using the Agilent Microarray Scanner System (G2505B) and the scanned files were converted to data files with Feature Extraction software (Aglient, version 9.1). Outliers were identified and removed using the Grubbs test  and the mean of replicate probes was calculated. The Cyber-T test  was employed to calculate p-values. Significance was apportioned to genes with an expression ratio ≥5 and a p-value of ≤1.0x10-4. Final expression ratios presented are the average of three biological replicates.
Artemis comparison tool
Basic Local Alignment Search Tool
Clustered Regularly Interspaced Short Palindromic Repeats
Lactic Acid Bacteria
National Center for Biotechnology Information
polymerase chain reaction
Nonredundant protein database
tumour necrosis factor
This work was supported by a Principal Investigator award (07/IN.1/B1780) from Science Foundation Ireland to PWOT. BAN was the recipient of an Embark studentship from the Irish Research Council for Science Engineering and Technology.
This article has been published as part of Microbial Cell Factories Volume 10 Supplement 1, 2011: Proceedings of the 10th Symposium on Lactic Acid Bacterium. The full contents of the supplement are available online at http://www.microbialcellfactories.com/supplements/10/S1.
This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.