Comparative genomic analysis of Clostridium difficile ribotype 027 strains including the newly sequenced strain NCKUH-21 isolated from a patient in Taiwan

Background Clostridium difficile is a Gram-positive anaerobe and the leading cause of antibiotic-associated diarrhea worldwide. The emergence of ribotype 027 (RT027) strains is associated with increased incidence of infection and mortality. To further understand the relationship between C. difficile NCKUH-21, a RT027 strain isolated from a patient in Taiwan, and other RT027 strains, we performed whole-genome shotgun sequencing on NCKUH-21 and comparative genomic analyses. Results The genome size, G+C content, and gene number for the NCKUH-21 strain were determined to be similar to those for other C. difficile strains. The core genome phylogeny indicated that the five RT027 strains R20291, CD196, NCKUH-21, BI1, and 2007855 formed a clade. A pathogenicity locus, tcdR-tcdB-tcdE-orf-tcdA-tcdC, was conserved in the genome. A genomic region highly similar to the Clostridium phage φCD38-2 was present in the NCKUH-21 strain but absent in the other RT027 strains and designated as the prophage φNCKUH-21. The prophage φNCKUH-21 genes were significantly higher in G+C content than the other genes in the NCKUH-21 genome, indicating that the prophage does not match the base composition of the host genome.
Conclusions This is the first whole-genome analysis of a RT027 C. difficile strain isolated from Taiwan. Due to the high identity with φCD38-2, the prophage identified in the NCKUH-21 genome has the potential to regulate toxin production. These results provide important information for understanding the pathogenicity of RT027 C. difficile in Taiwan. Electronic supplementary material The online version of this article (10.1186/s13099-017-0219-4) contains supplementary material, which is available to authorized users.


Background
Clostridium difficile is a Gram-positive, endospore-forming obligate anaerobe and the current leading cause of antibiotic-associated diarrhea (AAD) within hospital settings worldwide [1]. C. difficile infections (CDI) are estimated to be responsible for 15-25% of all AAD cases [2]. Onset of CDI can be triggered by disruption of the host's gut microbiota by broad-spectrum antibiotic treatments. Aging, prolonged stay in health care settings, and proton-pump inhibitor use all contribute to increased risk of CDI [3]. Although C. difficile has been characterized for decades, it first gained prominence in 2003 when an outbreak in North America was found to be caused by a strain with toxin hyperproduction capabilities [4]. The rapid spread of the C. difficile NAP1/BI/027 strain (PCR ribotype 027, or RT027; the designations refer to the same strain as characterized by different typing methods) has resulted in outbreaks worldwide, although cases in Asia and Latin America have been reported less frequently than in Europe and North America.

Open Access
Gut Pathogens *Correspondence: ihsiuhuang@mail.ncku.edu.tw; jc923@mail.ncku.edu.tw 7 Department of Microbiology and Immunology, College of Medicine, National Cheng Kung University, 1 University Road, Tainan 70101, Taiwan Full list of author information is available at the end of the article

According to a previous case report, NCKUH-21 is the strain isolated from the first severe RT027 CDI in Taiwan, and it contains a deletion of 18 base pairs and a truncated mutation (D117A) in tcdC [5]. To further understand the relationship between NCKUH-21 and other RT027 strains, including historic strains and hypervirulent strains, we determined the genome sequence of the C. difficile strain NCKUH-21 (accession numbers: BDSN01000001-BDSN01000094) and compared it with other sequenced RT027 strains. We assessed the presence of virulence and antibiotic resistance genes in the NCKUH-21 genome. We also compared the genome sequence of the NCKUH-21 strain with those of its close relatives to investigate the genome synteny, reconstruct the phylogenetic tree, and identify NCKUH-21 strain-specific genes.

Methods
Genome sequencing, assembly, and annotation for the strain NCKUH-21, as well as comparative genomics of nine C. difficile strains (Table 1), were performed as described in Additional file 1: Materials and methods.

Quality assurance
Genomic DNA was purified from a pure culture of a single bacterial isolate of NCKUH-21. A BLAST search against a nonredundant database revealed no potential contamination of the genomic libraries.

Genomic features
Illumina MiSeq sequencing was performed to determine the genome sequence of the C. difficile strain NCKUH-21. The de novo assembly contained 94 contigs with a total length of 4,217,149 bp, a G+C content of 28.4%, and a sequencing coverage of 1611×. Genome annotation yielded a total of 3810 protein-coding sequences (CDSs).
Among the C. difficile strains analyzed in this paper, the genome size (Mb) ranged from 4.05 to 4.46, G+C content ranged from 28.4 to 29.2%, and CDS number ranged from 3485 to 4128 (Table 1). The general genomic features for the NCKUH-21 strain were thus similar to those of the other C. difficile strains.
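The summary statistics quoted above (contig count, total assembly length, and G+C content) can be illustrated with a minimal Python sketch. It is not the pipeline used in this study, which is described in Additional file 1; the function name `assembly_stats` and the toy contigs are hypothetical, and the sketch assumes the contigs are already loaded as plain strings.

```python
def assembly_stats(contigs):
    """Return (number of contigs, total length in bp, G+C percentage).

    Ambiguous bases such as N count toward the length but not toward G+C.
    """
    total = 0
    gc = 0
    for seq in contigs:
        seq = seq.upper()
        total += len(seq)
        gc += seq.count("G") + seq.count("C")
    gc_percent = 100.0 * gc / total if total else 0.0
    return len(contigs), total, gc_percent


# Toy contigs for illustration only; these are not NCKUH-21 sequences.
toy_contigs = ["ATATGCAT", "AATTAATTGC", "TTTTAAAACG"]
n_contigs, total_bp, gc_percent = assembly_stats(toy_contigs)
print(n_contigs, total_bp, round(gc_percent, 1))
```

Applied to the 94 NCKUH-21 contigs, the same calculation would be expected to reproduce the reported total length and G+C content, provided that G+C is counted over all bases in the assembly.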

Phylogeny
Clostridium difficile strains with the same PCR ribotype were reported to cluster together in the phylogenetic trees for the conserved genes [6]. The Roary pipeline produced a total of 8775 homologous groups of genes ("pangenome"), of which 69 were shared by all the strains used in this study ("core-genome"). The core genome phylogeny indicated that the RT027 strains (R20291, CD196, NCKUH-21, BI1, and 2007855) formed a monophyletic group or clade, joined by the Z31 and 630 strains, followed by the M68 strain, and finally the M120 strain (Fig. 1).
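The distinction between the pangenome and the core genome can be sketched in a few lines of Python. This is only an illustration of the definition, not a reimplementation of Roary: the input is assumed to be a mapping from each homologous group to the set of strains carrying it, and the group names (`grpA` and so on) and the function `split_pangenome` are hypothetical.

```python
def split_pangenome(groups, strains):
    """Split homologous groups into core (present in every strain) and accessory.

    groups  -- dict mapping a group name to an iterable of strain names
    strains -- iterable of all strain names in the comparison
    """
    all_strains = set(strains)
    core = {name for name, present in groups.items()
            if all_strains <= set(present)}
    accessory = set(groups) - core
    return core, accessory


# Toy presence/absence data for three of the strains named in the text.
toy_strains = ["NCKUH-21", "R20291", "CD196"]
toy_groups = {
    "grpA": {"NCKUH-21", "R20291", "CD196"},  # shared by all: core
    "grpB": {"NCKUH-21", "R20291"},           # accessory
    "grpC": {"NCKUH-21"},                     # accessory, strain-specific
}
core, accessory = split_pangenome(toy_groups, toy_strains)
print(sorted(core), sorted(accessory))
```

In this toy example the pangenome has three groups, of which one is core. In the study itself, the corresponding counts are 8775 groups in total and 69 shared by all nine strains.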

Synteny
The Mauve Contig Mover (http://darlinglab.org/mauve/user-guide/reordering.html) was used to reorder the contigs of NCKUH-21 relative to the complete genome of C. difficile CD196. The genomes of the nine C. difficile strains were aligned using progressiveMauve, and this alignment was visualized using genoPlotR to investigate genomic rearrangement (Fig. 2). The genome synteny was determined to be conserved among all but one of the strains. The exception was the Z31 strain, which showed large-scale genomic rearrangement that had not been previously reported [7].
A total of 140 protein-coding genes were present in the NCKUH-21 strain but absent in the other strains (Additional file 2: Table S1). The NCKUH-21 strain-specific genes could have been gained on the branch leading to the NCKUH-21 strain, and they could thus be linked to its specific phenotype (e.g., virulence and pathogenicity). Of the 140 NCKUH-21 strain-specific genes, 50 were encoded on the 40,525-bp-long contig of the NCKUH-21 genome (accession number: BDSN01000034), which showed 99% identity to the Clostridium phage ϕCD38-2 (GenBank accession: HM568888). The genomic region highly similar to the Clostridium phage ϕCD38-2 was designated as the prophage ϕNCKUH-21.
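The identification of strain-specific genes and their tally per contig can be sketched as follows. This is a minimal illustration under stated assumptions rather than the procedure used in the study: the presence/absence mapping, the gene-to-contig mapping, the group and contig names, and both function names are hypothetical.

```python
from collections import Counter


def strain_specific(groups, target):
    """Return the sorted groups found in the target strain and in no other strain."""
    return sorted(name for name, present in groups.items()
                  if set(present) == {target})


def count_by_contig(genes, gene_to_contig):
    """Count how many of the given genes lie on each contig."""
    return Counter(gene_to_contig[g] for g in genes)


# Toy data: group names and contig labels are placeholders.
toy_presence = {
    "grp1": {"NCKUH-21"},
    "grp2": {"NCKUH-21"},
    "grp3": {"NCKUH-21", "CD196"},
    "grp4": {"R20291"},
    "grp5": {"NCKUH-21"},
}
toy_contig_of = {"grp1": "contig_A", "grp2": "contig_A", "grp5": "contig_B"}

specific = strain_specific(toy_presence, "NCKUH-21")
per_contig = count_by_contig(specific, toy_contig_of)
print(specific, per_contig.most_common(1))
```

A contig that carries a disproportionate share of the strain-specific genes, as the 40,525-bp contig does with 50 of 140, is a natural candidate for a mobile element such as a prophage.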

Prophage ϕNCKUH-21
The prophage ϕNCKUH-21 detected in the draft genome for the C. difficile strain NCKUH-21 was further confirmed by phage induction examination and electron microscope imaging (data not shown). A previous study suggested that lysogenic ϕCD38-2 replicates as a circular plasmid and boosts toxin production in C. difficile [10]. The high sequence identity between ϕNCKUH-21 and ϕCD38-2 suggests that these prophages have a similar role in C. difficile.
Reports have revealed that bacterial phages tend to be lower in G+C content than their hosts and that viruses match the G+C content of their hosts [11,12], including the C. difficile bacteriophage ϕCD119 [13]. Base composition statistics for the NCKUH-21 genes were calculated as the relative frequency of G+C at third codon positions (GC3). The median GC3 value for the prophage ϕNCKUH-21 genes (0.21) was higher than that for the other genes (0.14) in the NCKUH-21 genome. A Wilcoxon rank sum test, which compared the GC3 values between the two groups of genes, was highly significant (P < 2.2e−16). This suggests that the prophage ϕNCKUH-21 does not match the base composition of the host genome and may thus have been acquired by horizontal transfer based on the hypothesis of genome amelioration [14].
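The GC3 statistic and the rank sum comparison described above can be sketched in Python using only the standard library. The source does not state which software was used, so this is an assumption-laden illustration: the function names and toy values are hypothetical, and the test uses a normal approximation with tie correction and no continuity correction, so P values may differ slightly from those of other implementations.

```python
import math


def gc3(cds):
    """Relative frequency of G or C at third codon positions of a coding sequence."""
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3   # ignore a trailing partial codon
    thirds = cds[2:usable:3]
    if not thirds:
        raise ValueError("sequence shorter than one codon")
    return sum(base in "GC" for base in thirds) / len(thirds)


def rank_sum_test(x, y):
    """Two-sided Wilcoxon rank sum test; returns (U statistic for x, P value)."""
    pooled = sorted([(v, 0) for v in x] + [(v, 1) for v in y])
    n1, n2 = len(x), len(y)
    n = n1 + n2
    rank_sum_x = 0.0
    tie_term = 0.0
    i = 0
    while i < n:
        j = i
        while j < n and pooled[j][0] == pooled[i][0]:
            j += 1
        mid_rank = (i + 1 + j) / 2.0   # mean of ranks i+1 .. j for tied values
        rank_sum_x += mid_rank * sum(1 for k in range(i, j) if pooled[k][1] == 0)
        t = j - i
        tie_term += t ** 3 - t
        i = j
    u = rank_sum_x - n1 * (n1 + 1) / 2.0
    var_u = n1 * n2 / 12.0 * ((n + 1) - tie_term / (n * (n - 1)))
    if var_u == 0:
        return u, 1.0
    z = (u - n1 * n2 / 2.0) / math.sqrt(var_u)
    return u, math.erfc(abs(z) / math.sqrt(2))


# Toy GC3 values for illustration only; not the values from the NCKUH-21 genome.
toy_prophage = [0.30, 0.25, 0.28]
toy_other = [0.10, 0.12, 0.15, 0.11]
u_stat, p_value = rank_sum_test(toy_prophage, toy_other)
print(gc3("ATGGCCAAATAA"), u_stat, p_value)
```

GC3 is used rather than overall G+C because third codon positions are mostly synonymous and therefore reflect mutational bias more closely than amino acid constraints do, which makes a compositional mismatch between a prophage and its host easier to detect.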

Concluding remarks
From 2013 to 2014, three RT027 C. difficile strains were isolated from patients in Taiwan [5,15,16]. Among them, NCKUH-21 is the first strain to have a whole-genome sequence for genome comparison. Whether the other two RT027 isolates also carry a complete prophage, what their phylogenetic relation with NCKUH-21 is, and what the relative toxin production level is between the three isolates are all topics for further research.