Genomic characterization of nine Clostridioides difficile strains isolated from Korean patients with Clostridioides difficile infection

Background Clostridioides difficile infection (CDI) is an infectious nosocomial disease caused by Clostridioides difficile, an opportunistic pathogen that occurs in the intestine after extensive antibiotic regimens. Results Nine C. difficile strains (CBA7201–CBA7209) were isolated from nine patients diagnosed with CDI at the national university hospital in Korea, and the whole genomes of these strains were sequenced to identify their genomic characteristics. Comparative genomic analysis was performed using 51 reference strains and the nine isolated herein. Phylogenetic analysis based on 16S rRNA gene sequences confirmed that all 60 C. difficile strains belong to the genus Clostridioides, while core-genome tree indicated that they were divided into five groups, which was consistent with the results of MLST clade analysis. All strains were confirmed to have a clindamycin antibiotic resistance gene, but the other antibiotic resistance genes differ depending on the MLST clade. Interestingly, the six strains belonging to the sequence type 17 among the nine C. difficile strains isolated here exhibited unique genomic characteristics for PaLoc and CdtLoc, the two toxin gene loci identified in this study, and harbored similar antibiotic resistance genes. Conclusion In this study, we identified the specific genomic characteristics of Korean C. difficile strains, which could serve as basic information for CDI prevention and treatment in Korea. Supplementary Information The online version contains supplementary material available at 10.1186/s13099-021-00451-3.

seen in Europe and North America [4]. In the Republic of Korea, the number of CDI patients per 100,000 people in a population increased from 1.43 in 2008 to 5.06 in 2011, while CDI-associated mortality increased from 0.14 to 0.35 [5]. These numbers are increasing yearly, especially in patients over 65 years old [6]. CDI is an opportunistic infectious disease caused by C. difficile, which grows and secretes toxins in the intestinal tract of the patient, resulting in a variety of symptoms including diarrhea and pseudomembranous colitis; it can be life-threatening [7]. CDI is treated with antibiotics such as cephalosporin, clindamycin, quinolone, metronidazole, and vancomycin [8,9]. However, in some cases, diarrhea may recur or death may occur because these antibiotics can trigger infection [10].
C. difficile is a gram-positive, spore-forming, anaerobic, intestinal bacterium [2]. Recently, 16S rRNA gene sequence analysis of the Clostridium genus indicated that the similarity between C. difficile and Clostridium hiranonis is less than 97%; this has led to the reclassification of Clostridium difficile as Clostridioides difficile [11]. Secretion of toxin A (enterotoxin) and toxin B (cytotoxin) by C. difficile is mainly responsible for the intestinal inflammation resulting from CDI [12]. These two toxins inactivate host cell GTP-binding proteins and destroy the cytoskeleton, inducing apoptosis, severe inflammation, and intestinal cell damage [13][14][15]. Moreover, some hypervirulent strains (e.g. NAP1/ ribotype 027) synthesize the actin-ADP-ribosylating toxin known as binary toxin C. difficile transferase (CDT), which leads to the depolymerization of actin filaments, disrupting the actin cytoskeleton in the cytosol [16][17][18][19]. Therefore, the toxins A, B, and CDT can cause severe CDI symptoms [20].
The first complete genome sequence of C. difficile reported was strain 630 [21]. Subsequently, the genomic information of a variety of C. difficile strains has been reported and deposited (https:// www. ncbi. nlm. nih. gov/ genome/ genom es/ 535). In addition to uncovering the genetic and evolutionary diversity of C. difficile strains [22,23], virulence factors of these strains, such as toxins, antibiotic resistance, mobility, and adhesion have been also investigated through a comparative genomic analysis [24]. Although there have been many studies on C. difficile strains isolated from various patients with CDI worldwide, there have been few genomic studies conducted on C. difficile strains in the Republic of Korea. Thus, this study aimed to investigate the unique genomic characteristics of nine C. difficile strains isolated from South Korean patients through a comparative genomic analysis with previously characterized strains.

Ethical statement and sample collection
Stool samples were collected from nine patients diagnosed with CDI who visited the Department of infectious disease, Chonnam national university hospital in Gwangju or Hwasun, Republic of Korea. The study protocol was approved by the institutional review boards of the Republic of Korea centers for disease control and Prevention [IRB file no. CNUH-2017-161 and CNUHH-2017-076]. Written informed consent was obtained from all participants. The characteristics of all patients who agreed to fecal sampling after confirmation of CDI and the list of strains isolated from each fecal sample, as well as the prescribed antibiotics for each patient before fecal sampling, are summarized in Table 1.

Culture conditions and identification of C. difficile isolated from CDI participants
Collected stool samples were treated with chloroform for efficient isolation of C. difficile [25] as chloroform selectively isolates C. difficile by removing non-spore forming bacteria in the stool samples. Chloroform (60 µL; concentration, 3%) was added to filtered PBS (1740 µL), after which 200 µL of the stool sample was added. The mixed samples were suspended in a shaking incubator for 1 h at 37 °C, after which chloroform was evaporated with N 2 gas (Automated gas distribution workstation; Raontech, Gwangju, Republic of Korea), followed by culturing in Cycloserine-Cefoxitin Fructose Agar (CCFA) medium, which is an enriched selective and differential medium for the isolation and presumptive identification of C. difficile. CCFA medium consists of 40.0 g proteose peptone, 5.0 g sodium phosphate dibasic, 1.0 g potassium phosphate monobasic, 2.0 g sodium chloride, 6.0 g fructose, 15.0 g agar, 9.0 mg neutral red, 500.0 mg cycloserine (10.0% solution), and 15.6 mg cefoxitin (1.56% solution). Cycloserine inhibits the growth of Gram-negative bacteria, while cefoxitin inhibits the growth of both Gram-positive and -negative organisms [26]. C. difficile can be resistant to cefoxitin, and CCFA with cefoxitin is an initial formulation that can be used to isolate C. difficile strains [26][27][28]. The samples were cultured under anaerobic conditions in an anaerobic chamber (BAC-TRON anaerobic chamber; Sheldon Manufacturing, Inc., Cornelius, OR) containing an atmosphere of 90% N 2 , 5% H 2 , and 5% CO 2 , at 37 °C. After incubation for more than 48 h, single colonies were obtained and transferred at least three times until considered pure. The 16S rRNA gene of the pure cultures was amplified using colony PCR with the bacterial universal primers 27F (5ʹ-GTT TGA TCC TGG CTCAG-3ʹ) and 1492R (5ʹ-TAC GGY TAC CTT GTT ACG ACTT-3ʹ) [29]; identification of the cultures was performed based on the 16S rRNA gene sequences obtained from Sanger sequencing, with the EzBioCloud Database [30]. For comparative genomic analysis, nine C. difficile strains (designated CBA7201-CBA7209) were selected from each patient with CDI.

Genomic DNA extraction and whole-genome sequencing analysis
For genome sequencing of the selected nine C. difficile strains, cells were cultivated to the stationary phase in brain heart infusion (BD Biosciences, Franklin Lakes, NJ) broth medium at 37 °C and harvested by centrifugation. Genomic DNA was extracted and purified using MagAttract HMW DNA Kit (Qiagen, Hilden, Germany) and MG Genomic DNA Purification Kit (MGmed, Seoul, Republic of Korea), followed by quantification with Pico-Green (Invitrogen, Carlsbad, CA). The genomes were then sequenced with the PacBio RS II System using single-molecule real-time (SMRT) sequencing technology based on a 20 kb library (Pacific Biosciences, Menlo Park, CA). Assembly was performed using the hierarchical genome assembly process 2 protocol in PacBio SMRT analysis v2.3.0 [31]. The whole-genome sequences of strains CBA7201-CBA7209 were deposited in GenBank (accession numbers QKRF00000000, QLNX00000000, QKRE00000000, CP029566, QLNY00000000, QLNZ00000000, QLOA00000000, QKRD00000000, and QLOB00000000, respectively) and automatically annotated by the NCBI prokaryotic genome annotation pipeline [32]. A total of 60 C. difficile strains including the nine strains isolated herein and 51 strains from the NCBI GenBank, were used for comparative genomic analysis.

Phylogenetic analyses of C. difficile genomes based on 16S rRNA gene and whole genome sequences
A phylogenetic tree based on the 16S rRNA gene sequences was constructed to infer the phylogenetic relationships among the 60 strains. The 16S rRNA gene sequences were aligned using the fast secondary-structure aware infernal aligner in the ribosomal database project [34]. For pan-genome and core-genome analysis between the 60 strains, the bacterial pan-genome analysis pipeline ver. 1.3 was used [35]. The core-genome of all C. difficile strains was extracted through all-againstall comparisons using the USEARCH (ver. 9.0) with a 50% sequence identity cut-off and their concatenated nucleotide sequences were aligned using the MAFFT program (ver. 7.407) available in the Roary pipeline [36]. The phylogenetic trees based on the aligned 16S rRNA gene sequences and the concatenated common gene sequences were constructed using the neighbor-joining (NJ) algorithm in the MEGA7 software [37]. Average nucleotide identity (ANI) and in silico DNA-DNA hybridization (DDH) analysis were used to assess the relatedness among the 60 C. difficile strains and two reference strains (Clostridioides mangenotii DSM 1289 T and Clostridium hiranonis TO-931). The pair-wise ANI value among the genomes was calculated using a stand-alone OrthoANI software [38]. Pair-wise in silico DDH was calculated using the genome-to-genome distance calculator    [39]. In silico DDH values among the C. difficile strains were calculated and visualized using the GENE-E software (https:// softw are. broad insti tute. org/ GENE-E/).

Functional and pathogen-associated gene analysis of C. difficile strains
The amino acid sequences of 60 C. difficile strains were analyzed using GhostKOALA based on the Kyoto Encyclopedia of Genes and Genomes (KEGG) database to obtain predicted protein annotation information [40]. The resulting KEGG Orthology (KO) numbers were summarized and visualized on the KEGG pathway using iPath2.0 [41]. Flagella assembly, pathogenicity locus (PaLoc) (tcdRBEAC), and binary toxin (cdtAB) genes in C. difficile strains were confirmed through BLASTP analyses using the reference protein sequences available in closely related C. difficile strains. Antibiotic resistance genes were identified using the Comprehensive Antibiotic Resistance Database (CARD) [42]. Nucleotide sequence similarity was calculated using EMBOSS Water, the pairwise sequence alignment tool provided by EMBL-EBI (https:// www. ebi. ac. uk/ Tools/ psa/), against the nucleotide sequence of C. difficile 630 strain [43].

Quality assurance
Before genomic DNA extraction, the single colonies of each of strain CBA7201-CBA7209 were transferred three times in CCFA medium to obtain pure single colony. After obtaining the whole genome sequence of strain CBA7201-CBA7209, the sequences of the 16S rRNA genes, extracted using RNAmmer 1.21 server, were confirmed using the EzBioCloud database.

Isolation and phylogenetic relatedness of C. difficile strains
A total of nine C. difficile strains (CBA7201-CBA7209) were isolated and selected for genomic analysis, as considering different isolation source and 16S rRNA gene sequence homology (Table1). After incubation for 24 h under anaerobic conditions on CCFA medium at 37 °C, the colony morphology of C. difficile strains appeared white or grayish-white and had an irregular radial shape. Although C. difficile is an anaerobic bacterium, strains CBA7201-CBA7209 were viable when exposed to aerobic conditions for 24 h. C. difficile can resist environmental stressors, such as exposure to oxygen, through spore formation. This stress-resistant feature may aid the spread of C. difficile in various environments [44,45].
To assess the phylogenetic relationship between the C. difficile strains, a phylogenetic tree based on 16S rRNA gene sequences was constructed using the nine C. difficile isolates (CBA7201-CBA7209), 51 strains from GenBank, and two other closely related species, C. mangenotii DSM 1289 T and Clostridium hiranonis TO-931 (Fig. 1). For strains CBA7201-CBA7209, none were sufficiently different to be classified as strains from other species, as all 60 strains of C. difficile tested were grouped into one lineage that was distinct from the two outgroup strains [46,47]. The 60 strains showed 99.9% 16S rRNA gene sequence similarity with the type strain C. difficile DSM 1296 T , and were thus, classified as C. difficile. However, the C. difficile strains and Clostridium hiranonis TO-931 were distinct, supporting the reclassification of C. difficile under the genus Clostridioides [11]. C. difficile was first classified as Clostridium because its characteristics (anaerobic, Gram-positive, and spore-forming) were similar to those of other Clostridium species. However, further studies using molecular methods indicated a diversity of organisms in the genus Clostridium, and 16S rRNA phylogenetic analysis confirmed that C. difficile had less than 97% similarity with other species from the genus Clostridium. Currently, the genus Clostridioides includes two species, C. difficile and C. mangenotii [11,48].
Phylogenetic analysis based on the 16S rRNA gene, a molecular ecological marker, showed no differences among the 60 C. difficile strains, suggesting a limitation to the information this marker gene can provide [49]. To overcome this, comparative genome analysis with their entire genomes is conducted to identify the characteristics of various C. difficile strains [50,51].

General features of the C. difficile genomes
The complete genomes of the nine C. difficile strains isolated in this study were obtained by performing wholegenome sequencing using the PacBio RS II System. The C. difficile CBA7201-CBA7209 genomes and 51 additional C. difficile genomes available in GenBank were compared and their general characteristics described in Table 2. The average genome size and gene numbers were 4.18 ± 0.1 Mb and 3927 ± 131, respectively. The genome of C. difficile CBA7204 was the smallest (4.04 Mb), whereas that of C. difficile CD161, which was isolated in China and is known as a hypervirulent strain, was the largest (4.47 Mb) [52]. The G + C content of the C. difficile genomes ranged from 28.5% to 29.2%. The MLST scheme for C. difficile is based on the following seven highly conserved housekeeping genes: adenylate kinase (adk), ATP synthase subunit alpha (atpA), 1-deoxy-D-xylulose 5-phosphate reductoisomerase (dxr), serine hydroxymethyltransferase (glyA), recombinase A (recA), superoxide dismutase (sodA), and triosephosphate isomerase (tpi). STs are determined according to a combination of these seven housekeeping genes and are classified into five MLST clades (clade 1-5) [53]. To investigate genomic diversity, MLST clades and STs of the 60 strains of C. difficile were assigned using PubMLST and are listed in Table 2. A number of C. difficile strains, including CBA7201-CBA7209 and 19 reference strains, were assigned to MLST clade 1, which is consistent with the most frequently identified C. difficile strains worldwide [54]. Here, the MLST clade 1 belonging to the 28 strains, included 15 kinds of STs (ST2, ST3, ST4, ST8, ST15, ST17, ST21, ST24, ST42, ST48, ST54, ST83, ST103, ST110, and ST203) and had the most types of STs, making it the most diverse in terms of PCR ribotype (RT), which consists of a combination of STs and toxic genes encoding toxins A, B, and CDT. Among them, ST17 (C. difficile CBA7201, CBA7202, CBA7203, CBA7205, CBA7207, CBA7209) is associated with RT018 [55] which is the most prominent ribotype reported in hospitals in the Republic of Korea [56] and Japan [57][58][59]. RT018 is highly contagious and has been found to account for more than 95% CDI relapse cases. A study also revealed that patients with the RT018 were older than those with other RTs, and there was an association between the infectious RT018 and age [60]. Here, the most identified ST was ST1, which was identified in 14 out of the 60 strains and belongs to MLST clade 2; ST1 has been reported to be associated with an increased mortality rate of communicable diseases in North America and Europe [61][62][63].

Phylogenetic relatedness of C. difficile strains based on pan-and core-genome analysis
Pan-genome analysis is a useful tool for effectively analyzing and expressing the genomic characteristics of bacteria. Through this analysis, we found 5,814 genes in pan-genome and 1,660 genes in core-genome across the 60 strains (Additional file 1: Figure S1), with the number of total unique genes being 643. The curve analysis based on the Heaps' law regression model showed that the pan-genome was open (B pan = 0.14), indicating that more sequenced strains are needed to capture the complete gene complement [64]. The number of accessory and unique genes in the 60 strains are listed in Additional file 2: Table S1. OrthoANI values showed the pairwise relatedness of the nine C. difficile strains isolated in this study with the reference strains (Additional file 2: Table S2). This result suggests that the nine Korean strains are not part of a common clone. Moreover, the unique genes and different OrthoANI values reflect the existence of evolutionary differences and ecological niches for each strain that help it adapt to varying environmental stressors. To further investigate the phylogenic relationships among the C. difficile strains, we constructed a phylogenetic tree based on the amino acid sequences of 1,660 core genes. Despite being the same species, the core gene-based phylogenetic tree indicated that the strains were divided into five groups (Fig. 2). Unlike the 16S rRNA-based phylogenetic tree, the difference between the strains in the core gene-based phylogenetic tree was more clearly distinguishable. This classification is consistent with the hierarchical classification based on the in silico DDH (Additional file 1: Fig. S2), as well as clustering based on the MLST clade ( Fig. 2; vertical bar on the right side). These results indicate that MLST clade-based classification using the seven conserved housekeeping genes is a very efficient method for distinguishing C. difficile strains. According to the core gene-based phylogenetic tree analysis, a 28-genomes group containing C. difficile strains CBA7201-CBA7209, and strain 630 was consistent with MLST clade 1. Among them, C. difficile CBA7201, CBA7202, CBA7203, CBA7205, CBA7207, and CBA7209 formed a distinct group, indicating similar genomic features. The other groups were clustered into MLST clades 2, 3, 4, and 5. Strains belonging to clade 5 in the core gene phylogenetic tree were significantly divergent from the other clade 1-4 strains, indicating that clade 5 strains, which were all ST11 (Table 2), may have undergone different evolutionary processes [50,65].

Functional category analysis of C. difficile genome
To identify the general metabolic diversity, functional features, and virulence factors of the C. difficile strains, KEGG analysis was carried out using the core and accessory genes (Fig. 3). All core and accessory genes were most frequently classified under amino acid metabolism, carbohydrate metabolism, and membrane transport. The terms amino acid metabolism and carbohydrate metabolism were more abundant in the core genes than in the accessory genes, which is a common feature of C. difficile. In contrast, membrane transport (phosphor transferase system, ABC transporter, and bacterial secretion system) was relatively abundant in the accessory genes, indicating that the bacteria can absorb or secrete various substances depending on the strain. Among the genes assigned to the human disease category, drug resistance genes were relatively abundant in the accessory genes, suggesting that C. difficile strains have been exposed to various antibiotics or antibiotic-resistant genes and that their antibiotic resistance was obtained differently depending on the strains [66].
Using CARD, antibiotic resistance-related genes of the 60 C. difficile strains are summarized in Table 3 and Additional file 2: Table S3. The 60 C. difficile strains contained at least one resistance gene among the 20 Antibiotic Resistance Ontology (ARO) terms; this gene is associated with resistance to erythromycin and clindamycin, which belong to the macrolide and lincosamide class of antibiotics, respectively (ARO term: C. difficile 23S rRNA with mutation conferring resistance to erythromycin and clindamycin). Most cases of resistance to these antibiotics can be associated with alterations in nucleotides of 23S rRNA, a component of the large ribosomal subunit [67]. The resistance of C. difficile strains to erythromycin and clindamycin has been confirmed in previous CDI-related studies [27]. Moreover, clindamycin has been reported to pose a risk as it promotes CDI; thus, care must be taken when prescribing it [66].
Interestingly, C. difficile has different antibiotic resistance genes depending on the MLST clade. Most strains in MLST clade 1-3 possessed resistance genes against vancomycin (vanR G , vanXY G ), a class of glycopeptide antibiotics. The vanR G is a vanR variant and vanXY G is a variant of vanXY found in the vanG gene cluster. Resistance of enterococci to glycopeptides was reported first [68], after which nine genotypes associated with this resistance were identified. The vanG is one of the nine genotypes (vanA, vanB, vanC, vanD, vanE, vanG, vanL, vanM, and vanN) that are involved in glycopeptide antibiotics resistance. It has been reported that similar vanG gene clusters also exist in C. difficile. The vanG phenotype is known to correspond to low-level resistance to vancomycin, which results from the acquisition of two vanG operons, vanG1 and vanG2. The vanR G gene is one of three regulatory genes of vanG1, while the vanXY G gene is one of five effector genes of vanG1 [69]. However, the similarity of the vanR G and vanXY G genes was 77.45%-77.87% and 58.82%-59.22% in CARD, respectively (Additional file 2: Table S3), suggesting that glycopeptide resistance by these two genes is not expected to function properly. Some of the strains belonging to the MLST clade 4 and 5 did not possess resistance genes of glycopeptide antibiotics, but had other resistance genes [ARO term: AAC(6')-Ie-APH(2'')-Ia, aad(6), ANT(6)-Ia, ANT(6)-Ib, APH(3')-IIIa, catI, SAT-4, tet (40), tet (44), tet(W/N/W), tetB(P), tetM, tetO)] specific for antibiotics that inhibit protein synthesis by 30S ribosomal subunits, such as aminoglycoside and tetracycline antibiotics [70,71]. These findings indicate that different antibiotics can be prescribed depending on the MLST clade of the strain causing the infection. Compared with other strains, the nine C. difficile strains isolated from Korean patients with CDI possessed a higher number of antibiotic-resistance genes. In some cases, up to 11 antibiotics were prescribed for a patient, Gwangju02 (Table 1), and C. difficile strains with the genes resistant to the prescribed antibiotics were isolated from a total of seven patients. Therefore, more The cell motility category genes were relatively abundant among the accessory genes, indicating that genes associated with cell motility are different depending on the C. difficile strain. Most genes assigned to the flagellar assembly in the cell motility category were classified into the core, accessory genes (Additional file 1: Figure  S3). The core, soft-core and accessory genomes refer to the set of genes for all 60 strains, 57-59 strains and the remaining 2-56 strains, respectively. There was no softcore genome identified in the data set. In the flagellar assembly pathway, the genes encoding flagellar motor rotation proteins (MotA and MotB), flagellin filament structural proteins (FliC), flagellar cap proteins (FliD), flagellar hook-associated proteins (FlgL and FlgK), and flagellar secretion chaperone proteins (FliS), were found in the core genome of C. difficile strains. On the other hand, genes encoding flagellar hook protein (FlgE), flagellar basal-body rod modification protein (FlgD), flagellar basal-body rod protein (FlgB, FlaC, and FlaG), flagellar hook-basal body complex protein (FliE), flagellar M-ring protein (FliF), flagellar motor switch protein (FliG, FliM and FliN), flagella biosynthesis protein (FlhA and FlhB), and flagella assembly protein (FliH) and flagellar biosynthetic protein (FliQ), were found in the accessory genome of C. difficile strains. These results indicate that these genes are well conserved among strains and that flagellar construction as well as attachment and invasion of intestinal epithelial cells, are essential for C. difficile infection [72,73]. Given that flagella motility can affect adhesion and colonization of intestinal epithelial cells, C. difficile flagella contribute to pathogenicity and result in mucosal damage and inflammatory responses in the host [74].

PaLoc and CDT locus (CdtLoc) of C. difficile
Toxins A and B are encoded by the tcdA (enterotoxin) and tcdB (cytotoxin) genes located on a chromosomal region called the PaLoc (19.6 kb) [75]. The PaLoc structure consists of the tcdA and tcdB genes sandwiched between the tcdR (positive regulator) and tcdC (negative regulator) genes, with the tcdE gene (toxin secretion) located between the two toxin genes (Fig. 4). Among the 60 strains, all possessed these toxin genes except for the seven non-toxigenic strains (CBA7204, DSM 29688, DSM 28666, DSM 29637, DSM 28670, DSM 28669, and DSM 29629). However, the sequence similarity and   structural differences were strain-dependent. All Korean C. difficile strains except for strain CBA7204 were found to possess the structure including all of the genes of PaLoc region with two hypothetical protein-coding genes between tcdE and tcdA genes, as shown in Fig. 4a; 30 strains including 16 strains belonging to MLST clade 1 and 14 strains belonging to clade 2, had the same PaLoc region structure. The remaining strains of clade 1 including strain 630, had only one hypothetical protein-coding gene (Fig. 4b). The insertion of the mobile genetic elements (MGEs) Tn6218, which contains an macrolide, lincosamide and streptogramin-associated antibiotic resistance gene [51,66] between the truncated tcdE and tcdA genes, is a common characteristic of clade 3 C. difficile strains (Fig. 4c) [76]. Some clade 4 strains face issues with enterotoxin expression due to truncated tcdA genes (Fig. 4d) [1]. Meanwhile, clade 5 C. difficile strains have a truncated tcdC gene, indicating difficulties in suppressing toxin production; thus these strains may become hypervirulent C. difficile strains (Fig. 4e) [24]. Four strains including strain CBA7204 belonging to clade 1 and three strains belonging to clade 4 were identified not to have the PaLoc genes (Fig. 4f ). Sequence similarities of PaLoc genes between strain 630 that is the most used reference strain in C. difficile genomic analysis and other strains, were compared [21]. Compared with C. difficile 630, strains CBA7201, CBA7202, CBA7203, CBA7205, CBA7207, and CBA7209 exhibited similar PaLoc sequence similarities and formed a unique group (Fig. 5). Among the strains belonging to clade 1, except for the non-toxigenic strains and strain QCD-63q42, each PaLoc gene showed a similarity of 99.3% or more. Strain QCD-63q42 had 84.7% similarity of tcdA gene with strain 630. As shown in Fig. 4, the strains in clade 2 had the same gene size as the strains in clade 1, but the similarity values of tcdB (93.5%), tcdA (98.5%), and tcdC (95.7%) genes were slightly lower than those in clade 1. In the case of the strains belonging to clade 3, the sequence similarities of tcdR (97.7-97.8%), tcdB (98.6-98.7%), tcdA (98.5-98.8%), and tcdC (90.7-95.7%) genes were found. In the case of tcdE gene of the strains in clade 3, the gene size was 424 bp (Fig. 4c), which was Fig. 4 Schematic representation of the PaLoc region and lateral genes. The PaLoc consisting of the tcdR, tcdB, tcdE, tcdA, and tcdC genes in this order. a-f PaLoc with two hypothetical protein-coding genes between the tcdE and tcdA genes (a); one hypothetical protein-coding gene between tcdE and tcdA genes (b); one or two hypothetical protein-coding genes and the mobile genetic elements (MGEs) Tn6218 between tcdE and tcdA gene (c); five truncated tcdA genes (d); two truncated tcdC genes (e); and non-toxigenic strains (f). Strains CBA7201-CBA7209 isolated from Korean patients with CDI are highlighted in bold shorter than the gene (501 bp) of strain 630, but the similarity of the gene was 98.8%, which was relatively high compared to the rest of the PaLoc genes. In the case of clade 4, 3 of the 7 strains were non-toxigenic (Fig. 4f ) and the remaining strains showed low similarity value of tcdR (97.8%) and tcdB (94.9%) genes. The truncated tcdA gene of clade 4 had a similarity value of 88.6-89.6% (Fig. 4d).
In the case of clade 5, it was confirmed that the truncated tcdC gene had a particularly low similarity with a value of 82.4% (Fig. 4e).
The actin-ADP-ribosylating toxin is located on a chromosomal region CdtLoc. C. difficile strains carrying the CDT gene may exhibit stronger virulence due to interaction with the existing toxins A and B [16][17][18][19]. The structure of the CdtLoc region consists of a binary toxin regulatory gene (cdtR) and binary toxin genes (cdtA and cdtB). These three genes differ in sequence similarity and structure depending on the C. difficile strains (Fig. 6). Some clade 1 strains and all clade 4 strains were not found to possess a CdtLoc region (Fig. 6a), while other clade 1 strains may show difficulty in producing the toxin due to gene truncation (Fig. 6b-f ); for instance, strain DSM 27639 has an inserted gene (approximately 30 kb) between the cdtA and cdtB genes (Fig. 6d) [77]. However, all strains belonging to clade 2, 3, and 5 possessed an intact CdtLoc region (6.2 kb), indicating the normal expression of toxin-producing genes (Fig. 6g).

Common features of the six C. difficile strains isolated from Korean patients with CDI
Finally, we identified several common features among C. difficile CBA7201, CBA7202, CBA7203, CBA7205, CBA7207, and CBA7209 isolated from Korean patients with CDI. These strains were not single clones, as evidence by differences between the strains confirmed via phylogenetic analysis; core, accessory, and unique gene analysis; and OrthoANI values ( Fig. 3 and Additional file 2: Table S1 and S2). Nevertheless, they share several common features. All the six strains belong to clade 1 and ST17 (Table 2) and possess similar antibiotic-resistance genes (Table 3). In addition, they exhibited similar toxin gene expression in terms of PaLoc and CdtLoc structure (Figs. 4, 5, 6). Interestingly, the cephalosporins such as Zenocef, Cetrazol, Pacetin, and Cefazolin, were commonly prescribed to the six patients with CDI, from whom these six strains were isolated (Table 1). However, the cephalosporin resistance gene was not detected in all six strains isolated from cephalosporin-containing medium ( Table 3). In a previous case study, the cephalosporins presented a risk factor to patients with CDI, and the decrease in cephalosporin prescription rate was related to a decrease in diarrhea cases associated with C. difficile [9,[78][79][80][81]. Therefore, further studies are needed to elucidate the association Fig. 6 Schematic representation of the CDT region and lateral genes. Gene organization of the region without the cdt gene a and of binary toxin-negative strains b-f and binary toxin gene-positive strains (g). Strains CBA7201-CBA7209 isolated from Korean patients with CDI are highlighted in bold among antibiotics, C. difficile strains, and patients with CDI.

Conclusion
In this study, we investigated the genomic, phylogenetic, functional, and pathogenic features of nine C. difficile strains isolated from Korean patients and performed a comparative genomic analysis with other strains isolated from various countries. Along with the identified genomic features of Korean C. difficile isolates, accumulation of more whole-genome sequence information of diverse C. difficile strains could serve as basic information for CDI prevention and treatment in Korea.