Gleaning evolutionary insights from the genome sequence of a probiotic yeast Saccharomyces boulardii

Background: The yeast Saccharomyces boulardii is used worldwide as a probiotic to alleviate the effects of several gastrointestinal diseases and to control antibiotic-associated diarrhea. While many studies report the probiotic effects of S. boulardii, no genome information for this yeast is currently available in the public domain. Results: We report the 11.4 Mbp draft genome of this probiotic yeast. The draft genome was obtained by assembling Roche 454 FLX+ shotgun data into 194 contigs with an N50 of 251 Kbp. We compare our draft genome with the other available Saccharomyces cerevisiae genomes. Conclusions: Our analysis confirms the close similarity of S. boulardii to S. cerevisiae strains and provides a framework to understand the probiotic effects of this yeast, which exhibits unique physiological and metabolic properties.


Background
Probiotics are live microbes that assist in restoring the symbiotic intestinal gut flora balance and thus bestow health benefits to the host [1,2]. The most commonly used human probiotics are members of the genera Lactobacillus and Bifidobacterium [3]. Besides these bacteria, Saccharomyces boulardii (Sb), a yeast strain, is also widely used as a probiotic to treat a variety of conditions [4], including antibiotic-associated diarrhea and recurrent Clostridium difficile infection. A primary advantage of Sb as a probiotic is that it can be used by patients undergoing an antibiotic regimen because of its natural resistance to antibiotics [5]. The genetic transfer of antibiotic resistance genes, a frequent event between pathogenic and gastrointestinal tract (GIT) bacteria, is not as frequent between yeast and bacteria [6,7]. Furthermore, Sb is tolerant to various local stresses, such as the presence of gastrointestinal (GI) enzymes, bile salts and organic acids, and can withstand considerable variations in pH and temperature while transiting through the human GIT [8].
Sb is a tropical strain of yeast that is thermophilic and mostly non-pathogenic to humans [5,9-12]. It was first isolated from the skin of lychee and mangosteen fruits in 1923 by the French scientist Henri Boulard in the Indo-China region, and has since been shown to be effective as a preventive and therapeutic agent for diarrhea and other GI disorders caused by the administration of antimicrobial agents [13]. The detailed and precise mechanisms by which Sb confers protection against several diseases remain unexplored, although some specific proteins have been suggested to play key roles in its probiotic function. For example, a 54 kDa serine protease from Sb was reported to provide protection against C. difficile infections by cleaving toxins A and B [14,15]. Similarly, a 120 kDa protein has been suggested to play a role in neutralizing the secretion induced by the cholera toxin by reducing cyclic adenosine monophosphate (cAMP) levels [16]. Sb can also inhibit the Escherichia coli endotoxin by dephosphorylation mediated by a 63 kDa protein phosphatase [17]. Likewise, Sb can decrease IL-8 proinflammatory cytokine secretion in enterohemorrhagic E. coli (EHEC) infections by inhibiting the NF-κB and mitogen-activated protein kinase (MAPK) signaling pathways [18]. Sb has also been shown to inhibit Candida albicans translocation from the GIT of mice [19], and it can inhibit in vitro adhesion of Entamoeba histolytica trophozoites to human erythrocytes [20] and C. difficile adhesion to Vero cells [21]. These enteropathogens adhere to the host tissue surface as the initial event in infecting the host. The outer membrane of Sb, which is particularly rich in mannose compared to other yeasts, adheres strongly to the enteropathogens and inhibits their binding to the mucous membrane of the GIT [22].
The taxonomic position of Sb has been an issue of intense debate [8]. Sb was initially suggested to be a new species of the hemiascomycota genus Saccharomyces [23]. Based on comparative electrophoretic karyotyping and multivariate analysis of the polymorphism observed in pulsed-field gel electrophoresis, Cardinali and Martini [24] classified Sb outside of the S. cerevisiae group. However, molecular phylogenetics and typing using molecular techniques, viz. species-specific polymerase chain reaction (PCR), randomly amplified polymorphic DNA-PCR, restriction fragment length polymorphism (RFLP) analysis of the rDNA spacer region and pulsed-field gel electrophoresis (PFGE), helped identify Sb as a strain of S. cerevisiae [25]. Moreover, comparative genomic hybridization also established that S. cerevisiae and Sb are members of the same species [25]. Nevertheless, Sb differs genetically from other S. cerevisiae strains, as comparative genome hybridization using oligonucleotide-based microarrays reveals trisomy of chromosome IX and altered copy number of individual genes [26]. When compared to S. cerevisiae strains S288c and 1171 T, eight Sb strains show 100% similarity in the D1/D2 domain sequence of the 26S rDNA and more than 95% similarity in mitochondrial cytochrome-c oxidase II gene (COX2) sequences [27]. Another differentiating criterion reported in the literature is that Sb is incapable of metabolizing galactose as a source of carbon [23,28]. However, McCullogh et al. (1998) [29] have shown that galactose could be metabolized by some Sb strains. Therefore, we determined the genome sequence of Sb in order to gain insights into its evolutionary history and taxonomic position and to better understand the various probiotic effects of this yeast, which exhibits unique physiological and metabolic properties.

Isolation and purification of S. boulardii genomic DNA
A sachet of Dr. Reddy's Laboratories Econorm 250 mg (B. No. 1500, Mfg. Date: 05/12, Expiry Date: 04/14) containing lyophilized cells of Sb was used as the source of the probiotic yeast. Yeast cells were suspended in Milli-Q water, serially diluted, and plated on Yeast Mold (YM) agar (Difco) plates. The plates were incubated at 37°C for 48 hours. An isolated colony was picked from the plate and cultured in Yeast Extract-Peptone-Dextrose (YEPD) broth (HIMEDIA) for 24 hours at 37°C in a rotary shaker (180 RPM). The cells were centrifuged at 5000 × g for 10 minutes and washed with distilled water. DNA isolation was performed using the ZR Fungal/Bacterial DNA mini prep kit (Zymogen) as per the instructions in its user manual. After isolation, the genomic DNA was treated with RNase A (1 μl of a 10 μg/mL stock solution per 100 μl of solution containing DNA) and incubated at 37°C for 30 minutes. Then, 1/10 volume of 3 M sodium acetate (pH 5.2) and 2.5 volumes of absolute ethanol were added, followed by incubation at −20°C overnight and centrifugation at 14,000 rpm for 30 minutes at 4°C. The supernatant was carefully discarded; the pellet was rinsed with 70% ethanol and centrifuged again at 14,000 rpm for 15 minutes at 4°C. The ratio of OD at 260/280 nm was ~1.8, as measured on a NanoDrop ND-1000 spectrophotometer.

Internal transcribed spacer-polymerase chain reaction (ITS-PCR)
To amplify the ITS regions, primers ITS1 (TCCGTAGGTGAACCTGCGG) and ITS4 (TCCTCCGCTTATTGATATG) were used. Amplification was done using a mixture containing 1× Standard Taq Reaction Buffer, dNTPs (200 μM), BSA (0.3 μg/μL), template DNA (500 ng), 1.25 units/50 μl PCR Taq DNA Polymerase (Taq DNA Polymerase with Standard Taq Buffer; New England BioLabs) and forward and reverse primers (0.2 μM each). The cycling parameters used for amplification were: initial denaturation for 5 minutes at 95°C, followed by 30 cycles of 30 seconds at 95°C, 30 seconds at 50°C and 90 seconds at 72°C, with a final extension for 10 minutes at 72°C and cooling to 4°C. The amplified products were separated on 1.2% agarose gels, visualized and photographed on an Alpha Image Analyzer (Alpha Innotech Corporation, CA). The DNA from the amplified bands was eluted with the QIAquick Gel Extraction Kit (Qiagen N.V.). The eluted DNA was further amplified for sequencing using Terminator Ready Reaction Mix (1 μl), Sequencing Buffer (1 μl; 5×; 200 mM Tris-Cl, 5 mM MgCl2, pH 9.0), PCR-amplified DNA (35 ng), primer (3.2 pmol) and Milli-Q water to make up the volume to 10 μl. PCR cycling conditions were: initial denaturation for 1 minute at 96°C, followed by 24 cycles of 10 seconds at 96°C, 5 seconds at 50°C and 4 minutes at 60°C, and cooling to 4°C. The final PCR product was sequenced on a Sanger sequencer and the resulting ITS sequence was compared against all available ITS sequences in the NCBI database.
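The two cycling programs described above can be written down as data, which makes the minimum run time of each easy to check. The following is a minimal sketch; the class and function names are illustrative assumptions, and the computed time counts only the programmed holds (ramping between temperatures and the final 4°C hold are ignored).

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Step:
    temp_c: float   # block temperature in degrees Celsius
    seconds: int    # programmed hold time


@dataclass(frozen=True)
class Program:
    initial: Step                 # initial denaturation
    cycle: Tuple[Step, ...]       # steps repeated in every cycle
    n_cycles: int
    final: Optional[Step] = None  # final extension, if any


def hold_time_seconds(program: Program) -> int:
    """Sum of programmed hold times, excluding ramping and the 4 C hold."""
    total = program.initial.seconds
    total += program.n_cycles * sum(step.seconds for step in program.cycle)
    if program.final is not None:
        total += program.final.seconds
    return total


# ITS amplification: 5 min at 95 C; 30 x (30 s 95 C, 30 s 50 C, 90 s 72 C); 10 min at 72 C.
ITS_PCR = Program(
    initial=Step(95, 300),
    cycle=(Step(95, 30), Step(50, 30), Step(72, 90)),
    n_cycles=30,
    final=Step(72, 600),
)

# Cycle sequencing: 1 min at 96 C; 24 x (10 s 96 C, 5 s 50 C, 4 min 60 C).
CYCLE_SEQ = Program(
    initial=Step(96, 60),
    cycle=(Step(96, 10), Step(50, 5), Step(60, 240)),
    n_cycles=24,
)

for name, prog in (("ITS PCR", ITS_PCR), ("Cycle sequencing", CYCLE_SEQ)):
    print(f"{name}: {hold_time_seconds(prog) / 60:.0f} min of programmed holds")
```

Under these assumptions the amplification program holds for 90 minutes and the sequencing program for 103 minutes; actual instrument time will be longer because of ramping.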

Genome sequencing
Sb EDRL (Econorm - Dr. Reddy's Laboratories) was sequenced using the 454/Roche GS FLX Titanium system. Library preparation was carried out according to the GS FLX Titanium Rapid Library Preparation Kit (Roche Applied Sciences) protocol at the Centre for Cellular and Molecular Platforms (C-CAMP), Bangalore, India. The genomic DNA was first sheared into fragments of 400-1000 base pairs and its quality was assessed using a BioAnalyzer DNA 7500 LabChip. Blunt ends were generated, and adaptor ligation was followed by removal of small fragments. After library immobilization, the library was quantified using the RiboGreen method and the final yield was calculated. Half a PicoTiter Plate of 454 shotgun sequencing was performed, resulting in a total of 733,390 reads for Sb EDRL with ~50× coverage.
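As a rough consistency check on these figures, average coverage follows the usual relation coverage = (number of reads × mean read length) / genome size. The sketch below rearranges that relation. The mean read length is not stated in the text, so the value derived here is only what the reported read count, the ~50× coverage and the 11.4 Mbp draft assembly size jointly imply; the function names are illustrative.

```python
def expected_coverage(n_reads: int, mean_read_len: float, genome_size: int) -> float:
    """Average depth of coverage: total sequenced bases divided by genome size."""
    return n_reads * mean_read_len / genome_size


def implied_mean_read_length(n_reads: int, coverage: float, genome_size: int) -> float:
    """Mean read length implied by a read count, a coverage and a genome size."""
    return coverage * genome_size / n_reads


# Figures reported for Sb EDRL; the genome size is the draft assembly size.
N_READS = 733_390
GENOME_SIZE = 11_400_000
COVERAGE = 50.0

mean_len = implied_mean_read_length(N_READS, COVERAGE, GENOME_SIZE)
print(f"Implied mean read length: {mean_len:.0f} bp")
```

Because the coverage is only quoted approximately, the implied read length (a little under 800 bp) should be read as an order-of-magnitude check rather than a measured value.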

Assembly, mapping and annotation
The 454 shotgun reads were assembled de novo using Newbler v2 [31]. Multiple assemblies were obtained by varying the parameters for the minimum overlap length (ml) and minimum overlap identity (mi). The assembly that contained the lowest number of contigs and the best N50 score (parameters: minlen 45; mi 96; ml 100) was chosen for further analysis. The quality of the assembly was further checked by mapping the reads back onto the draft genome and visually checking for errors. Feature annotation was carried out with the MAKER pipeline [32] and tRNA genes were predicted with tRNAscan-SE 1.23 [33]. The features thus annotated were subjected to BLASTp [34] for functional characterization of the proteins, with an E-value cutoff of 1e-5. In addition to de novo assembly, the 454 shotgun reads were mapped onto 34 draft and complete genomes of S. cerevisiae available at NCBI (Additional file 1) using the mapping algorithm of CLCbio Genomics Workbench 6 (www.clcbio.com).
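For readers unfamiliar with the N50 statistic used to rank the candidate assemblies, a minimal sketch of its computation follows. The function name and the toy contig lengths are illustrative and are not taken from the Sb EDRL assembly.

```python
def n50(contig_lengths):
    """Return the N50: the largest length L such that contigs of length >= L
    together account for at least half of the total assembly size."""
    if not contig_lengths:
        raise ValueError("no contigs supplied")
    total = sum(contig_lengths)
    running = 0
    # Walk from the longest contig downwards until half the assembly is covered.
    for length in sorted(contig_lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length


# Toy contig lengths (hypothetical values for illustration only).
toy_assembly = [80, 70, 50, 40, 30, 20, 10]
print(n50(toy_assembly))
```

In the toy example the total is 300, and the two longest contigs (80 + 70) already reach half of it, so the N50 is 70. Comparing assemblies on contig count and N50 together, as done above, favours fewer and longer contigs.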

Comparative genomics
The draft genome annotated by the MAKER [32] pipeline and the reference-mapped genomes were searched for the presence of proteins in the molecular weight range of 54 kDa and 120 kDa using an in-house Practical Extraction and Report Language (PERL) script. The 63 kDa protein was retrieved by BLASTp using as query a 38-amino acid sequence stretch of this protein previously reported in the literature [17], and the molecular weight was further confirmed using our PERL script. In order to narrow down the candidates for the 54 kDa protease, proteins in the molecular weight range of 50-60 kDa, retrieved as mentioned above, were subjected to BLASTp against the MEROPS database (http://merops.sanger.ac.uk/) and to an independent hmmscan [35] run against the protein family (PFAM) database [36] with an E-value cutoff of 1e-5. These proteins were further used as queries in the Fold and Function Assignment System (FFAS) [37] program to retrieve annotated homologs and subjected to BLASTp against the Gene Ontology (GO) database [38] with an E-value cutoff of 1e-5. Likewise, the sequences of 119-121 kDa proteins obtained using our PERL script were subjected to BLASTp against the GO database with an E-value cutoff of 1e-5, and Search Tool for the Retrieval of Interacting Genes/Proteins (STRING) [39] analysis was performed to find interacting partners. Galactose-metabolizing enzymes of the Leloir pathway [40] were found in the annotated genome by initiating standalone BLASTp searches using sequences of these enzymes from other S. cerevisiae strains.
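The in-house PERL script is not reproduced here. As an illustration of the kind of molecular weight filter described, the following Python sketch estimates protein mass from sequence using standard average amino acid residue masses and keeps proteins within a given kDa window. All names are hypothetical, the mass table is hard-coded for illustration and should be checked against a reference source, and the estimate ignores post-translational modifications such as glycosylation.

```python
# Average residue masses in daltons (amino acid mass minus one water).
AVG_RESIDUE_MASS = {
    "A": 71.0788, "R": 156.1875, "N": 114.1038, "D": 115.0886, "C": 103.1388,
    "E": 129.1155, "Q": 128.1307, "G": 57.0519, "H": 137.1411, "I": 113.1594,
    "L": 113.1594, "K": 128.1741, "M": 131.1926, "F": 147.1766, "P": 97.1167,
    "S": 87.0782, "T": 101.1051, "W": 186.2132, "Y": 163.1760, "V": 99.1326,
}
WATER_MASS = 18.01528  # one water per chain for the free N- and C-termini


def molecular_weight_kda(sequence: str) -> float:
    """Estimate the average molecular weight of a protein sequence in kDa."""
    seq = sequence.upper().rstrip("*")  # drop a trailing stop symbol if present
    try:
        mass = sum(AVG_RESIDUE_MASS[aa] for aa in seq) + WATER_MASS
    except KeyError as exc:
        raise ValueError(f"unsupported residue {exc.args[0]!r}") from None
    return mass / 1000.0


def filter_by_weight(proteins: dict, low_kda: float, high_kda: float) -> dict:
    """Return {name: weight_kda} for proteins with low_kda <= weight <= high_kda."""
    selected = {}
    for name, seq in proteins.items():
        weight = molecular_weight_kda(seq)
        if low_kda <= weight <= high_kda:
            selected[name] = round(weight, 2)
    return selected


# Toy sequences (hypothetical; homopolymers chosen only to give known masses).
toy_proteins = {"p1": "A" * 700, "p2": "A" * 750, "p3": "W" * 400}
print(filter_by_weight(toy_proteins, 50, 60))
```

With the toy input only `p2` (about 53.3 kDa) falls in the 50-60 kDa window; the same call with bounds of 119 and 121 would correspond to the search for the 120 kDa candidates.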

Quality assurance
The genomic DNA was purified from a commercially available lyophilized Sb (Econorm sachet; Dr. Reddy's Laboratories) and its identity was further confirmed by ITS sequencing. The ITS sequence was >99% identical to that of the S. boulardii culture collection (KC2540481.1) strain. PMS1, PMS2 and PMS3 markers were used to perform microsatellite fingerprinting in order to verify the similarity of Sb EDRL to other commercially available Sb strains (Sb Uni-Sankyo Ltd. [now known as Sanzyme Ltd.] and Sb Kirkman) (Figure 1).

Genome characteristics
Next-generation sequencing of Sb EDRL on the Roche 454 GS-FLX Titanium platform resulted in a total of 733,390 shotgun reads of length 40-1773 bp. High-quality reads with ~50× coverage were assembled using Newbler v2.8 to obtain a draft genome of 11.4 Mbp in 194 contigs (N50: 251,807 bp). The GC content was 38% and 285 tRNA genes were predicted in Sb EDRL. Feature annotation was performed using the MAKER pipeline with Augustus [41] as the gene predictor. Of the 5803 coding sequence (CDS) regions predicted, 4635 (79%) had hits to S. cerevisiae proteins when subjected to a BLAST analysis against the non-redundant (nr) database. Reference-based mapping of Sb data onto other S. cerevisiae genomes revealed that the maximum number of reads mapped to S. cerevisiae strain Lalvin QA23, followed by EC1118, RM11-1a and S288c, suggesting high similarity to these genomes (Additional file 2).
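The GC content quoted above is a simple summary statistic over all contigs. A minimal sketch of how such a value can be computed is given below; the function name and the toy sequences are illustrative, and ambiguous bases such as N are assumed to be excluded from the denominator.

```python
def gc_content(sequences) -> float:
    """Percentage of G and C among unambiguous bases (A, C, G, T) in the sequences."""
    gc = 0
    at = 0
    for seq in sequences:
        seq = seq.upper()
        gc += seq.count("G") + seq.count("C")
        at += seq.count("A") + seq.count("T")
    if gc + at == 0:
        raise ValueError("no unambiguous bases found")
    return 100.0 * gc / (gc + at)


# Toy contigs (hypothetical, for illustration only).
toy_contigs = ["GGCCAATT", "ATATATGC", "NNNNAT"]
print(f"GC content: {gc_content(toy_contigs):.1f}%")
```

Pooling counts across contigs, rather than averaging per-contig percentages, weights each contig by its length, which is the usual convention for a genome-wide figure.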

54 kDa serine protease and 120 kDa protein
Sb is used in the treatment of C. difficile-induced diarrhea and colitis [14]. Sb can inhibit toxins A and B of C. difficile by producing a 54 kDa serine protease that cleaves these toxins [14,15]. Approximately 600 proteins of the Sb genome in the molecular weight range of 50-60 kDa were subjected to BLASTp against the MEROPS database, and 221 hits were obtained. These proteins were further subjected to BLASTp runs against the GO database [38]. Four proteins were found to be putative serine proteases based on their GO annotation and the presence of conserved serine protease signature motifs. These four proteins belong to the carboxypeptidase and subtilisin-like sub-classes of serine proteases (Additional file 2 and Additional file 3). Independently, all the annotated proteins from the MAKER pipeline were subjected to hmmscan against the PFAM database. Twenty-two serine proteases spanning 10 subclasses of the serine protease family were obtained. Of these 22 proteins, 4 were in the molecular weight range of 50-60 kDa and were the same proteins identified as putative serine proteases by our previous search against the MEROPS database. It is therefore tempting to speculate that one or more of these four proteins could play a role in cleaving C. difficile toxins A and B.
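The convergence of the two independent searches can be expressed as a set intersection. The sketch below assumes BLAST+ tabular output (`-outfmt 6`) with the default column order, in which the query identifier is the first column and the E-value the eleventh; the protein identifiers, hit names and the Pfam-derived candidate set are hypothetical placeholders, not results from this study.

```python
def queries_passing_evalue(tabular_lines, max_evalue=1e-5):
    """Return query IDs with at least one hit at or below the E-value cutoff.

    Assumes default BLAST+ tabular columns:
    qseqid sseqid pident length mismatch gapopen qstart qend sstart send evalue bitscore
    """
    passing = set()
    for line in tabular_lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.split("\t")
        query_id, evalue = fields[0], float(fields[10])
        if evalue <= max_evalue:
            passing.add(query_id)
    return passing


# Hypothetical BLASTp hits of 50-60 kDa proteins against a protease database.
blast_lines = [
    "prot_A\tdb_hit_1\t45.2\t300\t150\t5\t10\t310\t20\t320\t1e-40\t160",
    "prot_B\tdb_hit_2\t30.1\t120\t80\t3\t5\t125\t7\t127\t0.002\t35.0",
    "prot_C\tdb_hit_2\t52.0\t410\t190\t4\t1\t411\t3\t413\t3e-90\t280",
]

# Hypothetical candidates from an independent domain search (e.g. hmmscan).
domain_candidates = {"prot_A", "prot_C", "prot_D"}

blast_candidates = queries_passing_evalue(blast_lines)
shared = blast_candidates & domain_candidates
print(sorted(shared))
```

Candidates supported by both searches (here `prot_A` and `prot_C`) are the analogue of the four proteins recovered by both the MEROPS and PFAM routes above.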
A 120 kDa protein has been suggested to neutralize the effects of cholera toxin by reducing cAMP levels in mice [16]. Fifteen proteins in the molecular weight range of 119-121 kDa were retrieved (Additional file 3). These proteins were subjected to BLASTp against the GO database [38] with an E-value cutoff of 1e-5, and an FFAS search was performed. These 15 proteins mostly belong to the families of kinases and transporters (Additional file 2). The interacting partners of these proteins were identified by STRING analysis (Additional file 2). It is possible that any of these proteins is involved in the cAMP pathway and thereby neutralizes the effects of the cholera toxin.

Galactose metabolism
One of the distinguishing features of Sb is that, unlike S. cerevisiae, it is incapable of utilizing galactose as a source of carbon [23,28,29,40,43]. However, there are reports of some Sb strains that can utilize galactose [29]. The enzymes galactose-mutarotase, galactokinase, galactose-1-phosphate uridyltransferase, UDP-galactose-4-epimerase and phosphoglucomutase are constituents of the Leloir pathway, which catalyzes the conversion of galactose to glucose-6-phosphate [29]. Glucose-6-phosphate can then be further utilized via the glycolysis pathway. The genes for galactokinase, galactose-1-phosphate uridyltransferase and UDP-galactose-4-epimerase are in synteny in most S. cerevisiae strains, such as RM11-1a, S288c, AWRI796, BY4741, BY4742, Fosters B, FostersO, JAY291, LalvinQA23, Sigma1278b, T7, UC5, Vin13, VL3, YJM789, AWRI1631, CEN.PK113-7D, EC1118, Kyokai no. 7 and ZTW1, and a similar synteny was observed for our Sb (Figure 3). S. cerevisiae strains CLIB382 and Y10 were not included in this analysis as their genomes are not completely annotated on the SGD. Surprisingly, our Sb genome has all the genes responsible for galactose uptake and fermentation, although it has been suggested that Sb is not able to utilize galactose as a carbon source [28,44]. We have experimentally confirmed that Sb EDRL, whose genome is reported here, can assimilate galactose but cannot ferment it (unpublished results). Given that all genes responsible for galactose assimilation and fermentation are present in our genome, the inability of our Sb to ferment galactose may be explained by an effect similar to that observed by van den Brink et al. (2009) [45] for S. cerevisiae CEN.PK113-7D, wherein the yeast is unable to switch suddenly from glucose to galactose in the absence of oxygen (fermentation) because of energetic requirements.

Future directions
The genome of Sb offers an important starting point to glean insights into the probiotic effects of this yeast and help differentiate it from other strains of S. cerevisiae. It is possible that the various probiotic effects that have been attributed to Sb do not necessarily correspond to the same strain of this organism. Therefore it will be of interest to obtain genomes of different commercially available strains of Sb in order to get a complete genome-level understanding of this important probiotic yeast.

Conclusions
The first draft genome of Sb provides a framework to understand, at a molecular level, some of the properties of this novel probiotic yeast. While it has been shown that Sb strains do not utilize galactose [28], our genome surprisingly reveals the conservation of all enzymes involved in the Leloir pathway. In addition, we were able to locate a 63 kDa phosphatase that has been suggested to inhibit the toxicity of E. coli surface endotoxins [17]. Furthermore, we could shortlist putative candidates for the 54 kDa serine protease and the 120 kDa protein that confer additional probiotic functions on Sb. It is of interest to note, however, that none of these proteins is unique to Sb; all have close homologues in other strains of S. cerevisiae.