Complete genome sequence of Peptoclostridium difficile strain Z31

Background: Peptoclostridium (Clostridium) difficile is a spore-forming bacterium responsible for nosocomial infections in humans. It is recognized as an important agent of diarrhea and colitis in several animal species and a possible zoonotic agent. Despite the known importance of P. difficile infection in humans and animals, no vaccine or other effective measure to control the disease is commercially available. A possible alternative treatment for P. difficile infection is the use of a nontoxigenic strain of P. difficile as a competitive exclusion agent. However, a thorough knowledge of this strain is necessary for this purpose. We selected P. difficile Z31, a nontoxigenic strain (PCR ribotype 009), for investigation because it prevents P. difficile infection in a hamster model.
Results: The genome sequence of P. difficile Z31 is a circular chromosome of 4,298,263 bp, with a 29.21 % GC content, encoding 4128 proteins, and containing 78 pseudogenes. This strain belongs to ST 3, clade 1, and has five phage regions in its genome. Genes responsible for resistance to tetracycline and erythromycin were detected and, more importantly, Z31 also contains genes that promote spore production and stability, cell attachment, intestinal adherence, and biofilm formation.
Conclusion: In this study, we present the first complete genome sequence of nontoxigenic P. difficile strain Z31. When the Z31 genome was compared with those of other isolates available in GenBank, including a draft genome of a nontoxigenic strain, several unique regions were evident. Z31 contains no toxin genes, but encodes several non-toxin virulence factors, which may favor host colonization.
Electronic supplementary material: The online version of this article (doi:10.1186/s13099-016-0095-3) contains supplementary material, which is available to authorized users.

Until the late 1970s, P. difficile was not recognized as a pathogenic bacterium. During that decade, however, P. difficile and its toxins were reported in the fecal contents of human patients with pseudomembranous colitis [6] and the disease was reproduced in hamsters [7], confirming the importance of this microorganism as an enteropathogen. Today, this bacterium is known to be the cause of P. difficile infection (PDI), the main cause of nosocomial diarrhea in humans worldwide and a possible cause of diarrhea in the general community [8,9].
In veterinary medicine, P. difficile is the most important uncontrolled cause of neonatal diarrhea in piglets in the USA and Europe, and also occurs in other domestic animals and some wild species [10,11]. In piglets, PDI affects animals at 1-7 days of life, and 68-100 % of animals have been shown to be infected by the microorganism by 1 day of life [12,13]. The disease is largely subclinical, and only a few animals show diarrhea; however, the infection can impair the animals' development, causing economic losses to the farmer [14].
The pathogenesis of PDI involves colonization of the colon by a toxigenic strain of P. difficile and production of its toxins, toxin A (an enterotoxin) and toxin B (a cytotoxin), which act synergistically to cause cytoskeletal damage, cell rounding, disruption of tight junctions, and cell death [15]. The genes responsible for toxin production, the main difference between toxigenic and nontoxigenic strains, are located in a 19 kb pathogenicity locus called PaLoc [16].
Despite the known importance of P. difficile in humans and animals, no vaccine is yet commercially available. Studies have shown that recombinant and classical immunogens expressing toxins A and B can prevent the occurrence of diarrhea or reduce the severity of PDI in a rodent model [17]. These vaccines might limit, but cannot prevent, the fecal shedding of the microorganism; preventing shedding is essential because P. difficile is a nosocomial pathogen. Because this bacterium is also a potential zoonotic agent, preventing its colonization of domestic animals should be a priority [10]. Among the alternative preventive strategies examined, the use of nontoxigenic P. difficile strains has been shown to reduce the occurrence of PDI in humans and piglets by preventing their colonization by toxigenic strains [18][19][20][21].
There has been no report of the complete genome sequence of a nontoxigenic P. difficile strain, a necessary step in understanding this candidate live vaccine. Therefore, in this study, we determined the complete genome sequence of P. difficile nontoxigenic strain Z31.

Growth conditions and DNA isolation
Peptoclostridium difficile Z31, ribotype 009, a nontoxigenic strain isolated from a healthy dog on February 1, 2009, in the city of Belo Horizonte (state of Minas Gerais, Brazil), was selected for sequencing because it prevented PDI in a hamster model [22], similar to some strains previously reported [23]. This strain was grown on Mueller-Hinton agar supplemented with 5 % blood and 0.1 % taurocholate at 37 °C under anaerobic conditions for 48-72 h. Its genomic DNA was extracted with the Maxwell 16® Research Instrument (Promega, USA) combined with lysozyme (10 mg/mL) and proteinase K (20 mg/mL). Briefly, cells were incubated overnight in lysozyme solution (10 mg/mL) at 37 °C. Proteinase K was then added and the mixture was incubated at 56 °C for 30 min. According to the kit instructions: (i) the samples were lysed in the presence of a chaotropic agent and a detergent; (ii) the nucleic acids were bound to silica magnetic particles; (iii) the bound particles were washed to isolate them from other cell components; and (iv) the nucleic acids were eluted into a formulation for sequencing. The extracted DNA was stored at −80 °C until analysis.

Genome sequencing and assembly
The genome was sequenced on the Ion Torrent PGM™ using a mate-pair sequencing kit with an insert size of 3 kbp (~144-fold coverage) and a 400 bp fragment sequencing kit (~318-fold coverage). The quality of the raw data was analyzed with FastQC [24]. The fragment library was assembled with the Mira 4.9.1 software [25] and Newbler 2.9 (Roche, USA), and the mate-pair library with SPAdes 3.5.0 [26] (the parameters for all the assembly software are shown in Additional file 1). This ab initio strategy was applied to all libraries. The largest contigs obtained with Newbler and Mira were used as trusted-contigs input in SPAdes. We obtained 20 scaffolds with an N50 value of 698,574 bp; the largest scaffold was 1,691,449 bp long. After a super scaffold had been constructed with the CONTIGuator 2.0 software [27], using the default parameters and P. difficile strain CD196 (GenBank: NC_013315.1) as the reference, gap filling was conducted with CLC Genomics Workbench 7 (Qiagen, USA). The gaps in the rRNA operon regions were filled by consensus mapping to the reference. The remaining gaps were closed by recursively mapping the raw data onto the gap flanks, repeating the process until an overlap was found.
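The N50 value quoted above is a standard contiguity statistic: the scaffold length at which the scaffolds, sorted from longest to shortest, first cover at least half of the total assembly. The following is a minimal sketch of that calculation, not the routine used by any of the assemblers named here; the function name and the example lengths are illustrative assumptions and are not the Z31 scaffolds.

```python
def n50(lengths):
    """Return the N50 of a list of scaffold lengths (in bp).

    Scaffolds are sorted from longest to shortest and accumulated;
    the N50 is the length of the scaffold at which the running sum
    first reaches at least half of the total assembly size.
    """
    if not lengths:
        raise ValueError("no scaffold lengths given")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length


# Hypothetical scaffold lengths (bp), for illustration only.
example_lengths = [500, 400, 300, 200, 100]
print(n50(example_lengths))
```

In practice the lengths would be read from the scaffold FASTA file produced by the assembler.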

Genome annotation
The genome was annotated automatically with the Prokka 1.10 software (Rapid Bacterial Genome Annotation) [28], with the default parameters and nested databases in the following order: UniProt TrEMBL containing only (Pepto)Clostridium spp. proteins, then the RefSeq database. All putative frameshifts were also curated manually with the Artemis software [29], based on the coverage visualized with the CLC Genomics Workbench 7 software, to correct indel assembly bias. Genes encoding signal peptides were identified with a local installation of the SignalP 4.0 software [30], followed by the identification of transmembrane helices with TMHMM 2.0 [31] and a Pfam domain search with PfamScan [32]. These three tools were used with their default parameters.

Multilocus sequence typing (MLST) and in silico PCR
MLST was performed with PubMLST (available at http://pubmlst.org/cdifficile/) using the complete genome sequence. An in silico PCR search for genes related to virulence factors and antimicrobial resistance was performed with the jPCR software [33], with the default parameters and the primer sets shown in Additional file 2.
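The idea behind an in silico PCR search can be illustrated with a short sketch: locate the forward primer on the template, locate the reverse complement of the reverse primer downstream of it, and report the sequence between them as the predicted amplicon. This is a simplified illustration, not the jPCR algorithm. It assumes exact primer matches, searches only one strand orientation, and ignores the circular topology of the chromosome; the function names, the `max_len` limit, and the toy sequences are hypothetical.

```python
# Translation table used to complement DNA bases.
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def in_silico_pcr(template, forward, reverse, max_len=3000):
    """Return predicted amplicons for exact primer matches.

    The forward primer is searched on the given strand; the reverse
    primer is searched as its reverse complement downstream of each
    forward match. Products longer than max_len (an assumed limit)
    are discarded.
    """
    template = template.upper()
    forward = forward.upper()
    rev_site = reverse_complement(reverse.upper())
    products = []
    start = template.find(forward)
    while start != -1:
        # Nearest reverse-primer site downstream of this forward match.
        end = template.find(rev_site, start + len(forward))
        if end != -1:
            amplicon = template[start:end + len(rev_site)]
            if len(amplicon) <= max_len:
                products.append(amplicon)
        start = template.find(forward, start + 1)
    return products


# Toy template and primers, for illustration only.
toy_template = "TTTACGTACCCGGGAAATTTGGCCAATT"
print(in_silico_pcr(toy_template, "ACGTAC", "TTGGCC"))
```

An empty result for a primer pair corresponds to a negative PCR, which is how the absence of a gene would be reported by this kind of search. Dedicated tools additionally tolerate mismatches and check both orientations.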

Quality assurance
Genomic DNA was isolated from a pure bacterial isolate and confirmed with 16S rRNA gene sequencing. All the raw sequencing data were mapped onto the final genome and the lack of contamination with other genomes was confirmed by the coverage and the low number of unmapped reads.
An alignment was constructed with the 16S rRNA sequence regions on the assembled scaffolds, predicted with the Barrnap software (available at https://github.com/tseemann/barrnap), and the 16S rRNA genes of genomes available in GenBank. A phylogenetic tree was constructed from this alignment with the neighbor-joining method based on 1000 randomly selected bootstrap replicates, using the CLC Genomics Workbench 7.0 software. On the tree, strain Z31 was positioned among other P. difficile strains (Fig. 2).

Results and discussion
After genome assembly, gap filling, and annotation, an in silico PCR search was performed for genes related to virulence factors, antibiotic resistance, and other known toxins. Given the prospect of using the nontoxigenic strain Z31 to prevent PDI by competitive exclusion, some non-toxin virulence factors are desirable, predominantly those responsible for spore production and stability and those that promote cell attachment and host colonization. Z31 is positive for Cwp84 and surface-layer protein A (SlpA). SlpA is considered the major factor responsible for bacterial intestinal adherence, and Cwp84 is essential for the formation of that protein [34,35]. GroEL, Cwp66, and a fibronectin-binding protein (Fbp68), which are also important in host-cell adherence, were also found [34][35][36][37][38]. Strain Z31 was also positive for genes encoding the flagellar proteins FliC and FliD, which play roles in the colonization and adherence of Z31 in vivo and are essential in later stages of biofilm formation [39][40][41]. These cell-attachment factors found in Z31 are extremely important, because nontoxigenic strains must be able to compete with toxigenic strains for colonization sites to prevent the disease [23].
Fig. 2 Phylogenetic tree of Peptoclostridium difficile strain Z31 showing its relative position in the genus Peptoclostridium based on 16S sequences. The statistical method used was maximum likelihood and the bootstrap number was 1000. The values next to the nodes represent the percentage of the 1000 repetitions in which that clade was formed.
The gene encoding the major regulator of sporulation in P. difficile, Spo0A, was detected in this strain. An absence or deficiency of Spo0A can cripple or impair the sporulation process [35,42]. Genes encoding five spore coat proteins (cotA, cotB, cotC, cotD, and sodA) were also detected. The CotA protein is the most important protein in stabilizing the spore coat and ensures the integrity of this structure [43]. The formation of stable spores is also important for a nontoxigenic strain that is a candidate for preventing the disease, because the bacteria need to pass through the stomach and be able to colonize the colon [23]. Vegetative cells are sensitive to low pH, whereas spores resist these conditions, allowing a large number of viable particles to reach the colon [44]. Genes responsible for resistance to tetracycline (tetM) and erythromycin (ermG) were also detected with previously described PCR primers [45,46]. In contrast, none of the genes encoding proteins directly linked to toxin production (tcdA, tcdB, tcdC, cdtA, or cdtB) were detected [47], confirming the absence of the pathogenicity locus (PaLoc), which is essential for P. difficile infection [48].
Furthermore, the complete genomes of this species available in GenBank were selected for a similarity analysis with the Gegenees software [49], using a sequence fragmentation length of 500 nucleotides and a threshold of 30 %. Two complete genomes of species of the genus Clostridium were also included as an outgroup. The similarity matrix was used to generate a heatplot and a ".nexus" file for phylogenomic analysis (Additional file 3). Although Z31 is a nontoxigenic strain, Additional file 3 shows that the clade of this strain is paraphyletic with the type strain ATCC9689, a known toxigenic strain, suggesting that both derive from the same ancestral organism. The nontoxigenic phenotype of strain Z31 therefore seems to result from the loss of the toxin genes.

Initial findings
The P. difficile Z31 genome is composed of a circular chromosome of 4,298,263 bp. The GC content is 29.21 % and the genome contains 78 pseudogenes. Briefly, the genome has 4206 CDSs and encodes 29 rRNAs, one transfer-messenger RNA (tmRNA), and 58 tRNAs. Table 1 summarizes the subset of the 3809 genes with predicted functions that are associated with each COG functional category. In summary, 3324 genes were predicted to have Pfam domains, 166 to have signal peptides, and 1011 to have transmembrane helices. No CRISPR repeats were found. Figure 3 shows the disposition of RNA genes and CDSs on the forward and reverse strands, the GC content, and the GC skew.
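GC content and GC skew, both reported above, are simple base-composition statistics. GC content is the percentage of G and C among the called bases, and GC skew is conventionally computed per window as (G − C)/(G + C). The sketch below illustrates both under stated assumptions: the function names and the window size are hypothetical, and the text does not specify how the tracks in Fig. 3 were computed.

```python
def gc_content(seq):
    """Return the GC percentage of a sequence, ignoring non-ACGT symbols."""
    seq = seq.upper()
    g, c = seq.count("G"), seq.count("C")
    called = g + c + seq.count("A") + seq.count("T")
    return 100.0 * (g + c) / called if called else 0.0


def gc_skew(seq, window=10000):
    """Return (G - C) / (G + C) for consecutive non-overlapping windows.

    The window size is an assumed default; windows without G or C
    are assigned a skew of 0.0.
    """
    seq = seq.upper()
    values = []
    for start in range(0, len(seq), window):
        chunk = seq[start:start + window]
        g, c = chunk.count("G"), chunk.count("C")
        values.append((g - c) / (g + c) if g + c else 0.0)
    return values


# Toy sequence, for illustration only.
print(gc_content("GGCCAATT"))
print(gc_skew("GGGCAAAA", window=4))
```

Plotted along a circular chromosome, the sign of the skew typically changes near the replication origin and terminus, which is why the track is usually shown alongside the CDS strands.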
When the genome of Z31 was compared with those of other P. difficile strains deposited in GenBank [50], it showed high similarity to them, with the exception of some genomic islands (Fig. 3), four of which were predicted with PHAST [51] to be phage regions. A brief description of these phages is given in Table 2.
Table 1 Number of genes associated with general COG functional categories [55]. (a) The percentage is based on the total number of protein-coding genes in the annotated genome. (b) The total does not correspond to the final number of CDSs for each genome, because some genes are associated with more than one COG functional category.
A robust high-throughput MLST scheme for P. difficile has been developed and validated [52], allowing this species to be genotyped directly. Z31 was typed with MLST at loci adk 1, atpA 1, dxr 2, glyA 1, recA 1, sodA 1, and tpi 1, which classified this strain as ST3 in MLST clade 1. This result corroborates previous work, which reported that strains of PCR ribotype 009 are commonly classified as ST3 [53]. Strain Z31, P. difficile ATCC9689/DSM1296, and P. difficile BI9 were the only three ST3 strains identified among the strains whose complete or near-complete genomes (e.g., one scaffold) are deposited in GenBank. However, Z31 contains some unique regions, as shown in Fig. 3 (U1-U12). In contrast, P. difficile 5.3, described as nontoxigenic by Darling et al. [54], belongs to ST15, clade 1, a common classification for strains of PCR ribotype 010.
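Assigning a sequence type amounts to looking up the seven-locus allelic profile in a table of known profiles. The sketch below shows that lookup under stated assumptions: only the ST3 row is taken from the profile reported above, the function and variable names are hypothetical, and a real analysis would load the full profile table from the PubMLST scheme rather than a hand-written dictionary.

```python
# Locus order of the seven-gene scheme, as listed in the text.
LOCI = ("adk", "atpA", "dxr", "glyA", "recA", "sodA", "tpi")

# Profile table keyed by allele numbers in LOCI order.
# Only the ST3 entry is taken from the text; the table is otherwise
# a placeholder for the full scheme definitions.
PROFILES = {
    (1, 1, 2, 1, 1, 1, 1): 3,
}


def sequence_type(alleles, profiles=PROFILES):
    """Return the sequence type for a dict of locus -> allele number.

    Returns None when the profile is not present in the table.
    """
    key = tuple(alleles[locus] for locus in LOCI)
    return profiles.get(key)


# Allelic profile reported for strain Z31.
z31 = {"adk": 1, "atpA": 1, "dxr": 2, "glyA": 1, "recA": 1, "sodA": 1, "tpi": 1}
print(sequence_type(z31))
```

A profile that is absent from the table returns `None`, which in practice would indicate either a novel sequence type or an incomplete profile table.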

Future directions
Further analysis of the P. difficile Z31 genome will provide new information about the adaptation of this strain to the gastrointestinal tract, and new insights into its inhibition of toxigenic P. difficile strains.