Genome sequences of the Shiga-like toxin-producing Escherichia coli NCCP15655 and NCCP15656

Background: Virulence genes can spread among commensal bacteria through horizontal gene transfer. Bacteria that acquire novel virulence factors may pose a severe threat to public health because, unlike known pathogens, they are not covered by an existing management system. In particular, a pathogenic bacterium that acquires a new kind of virulence gene tends to exhibit stronger virulence. In this study, we analyzed the genomes of two Shiga-like toxin-producing Escherichia coli strains isolated from the feces of patients with diarrhea. Results: Phylogenetic analysis of conserved genes and average nucleotide identity values of the draft genome sequences indicate that strains NCCP15655 and NCCP15656 belong to the B1 group of E. coli and form a sister clade with strain E24377A. However, the proportions of genes belonging to the subsystem categories “phages, prophages, transposable elements, plasmids” and “virulence, disease and defense” are higher than in E24377A. Indeed, genes encoding Shiga toxin type 1, Shiga toxin type 2, and type 1 fimbriae were detected in their genomes. Moreover, a plasmid encoding hemolysin and enteropathogenic E. coli secreted protein C was identified in both genomes. Conclusions: Through the genome analysis of NCCP15655 and NCCP15656, we identified two types of Shiga-like toxin genes that could be responsible for the diarrhea symptoms. However, the LEE island, one of the major virulence factors of enterohemorrhagic E. coli, was not detected, and at the genomic level the strains are most similar to non-Shiga-like toxin-producing E. coli. NCCP15655 and NCCP15656 are good examples of Shiga-like toxin-producing E. coli whose genomes are more similar to those of non-Shiga-like toxin-producing E. coli than to those of typical enterohemorrhagic E. coli.


Background
Shiga-like toxin-producing Escherichia coli (STEC), also called verotoxin-producing E. coli, is a major pathogenic group of E. coli that causes bloody diarrhea and hemolytic uremic syndrome (HUS); enterohemorrhagic E. coli (EHEC) is one such group of STEC [1]. The genes encoding Shiga-like toxin (Stx) are carried by lambdoid phages, and the most frequently isolated serotypes of Shiga-like toxin-producing EHEC are O157, O104, O26, O111, and O145 [1][2][3]. E. coli is a common member of the normal flora of the large intestine, but strains sometimes acquire pathogenic genes from other bacteria or from bacteriophages. Indeed, in several cases, non-pathogenic strains or STEC of unknown serotypes have caused disease with symptoms similar to those caused by typical STEC strains [4][5][6]. The causative organism of the 2011 outbreak in Germany, the largest STEC outbreak reported [7,8], was E. coli O104:H4, an enteroaggregative E. coli (EAEC) strain harboring an Stx prophage [9]. The major virulence factor of EHEC is Shiga-like toxin, an exotoxin that causes cellular toxicity. Another feature of EHEC is intimin, an outer membrane adhesin encoded by the locus of enterocyte effacement (LEE) island [10]. The major virulence factor of EAEC is the aggregative adherence fimbriae, which mediate bacterial adherence and produce a 'stacked brick wall' pattern on host cells [11]. The O104:H4 EHEC/EAEC hybrid strain also acquired plasmid-encoded antibiotic resistance genes and exhibited strong virulence [12]. In South Korea, case reports from 2002 and 2006 described HUS caused by E. coli strains of serotypes O8 and O104:H4 in a 16-year-old man [13] and a 29-year-old woman [14], respectively. Moreover, in 2012, we reported the genome sequences and virulence gene analyses of EHEC strains isolated in Korea [2,3,15]. To reveal the genomic features of STEC in Korea, we sequenced a dozen E. coli strains isolated from patients with diarrhea in Korea between 2001 and 2011. 
Among them, two Shiga-like toxin-producing E. coli strains belonging to the same group were selected for genome analysis. In this study, we report the genomes of these two E. coli strains, designated NCCP15655 and NCCP15656, which were isolated from the feces of a female patient and a male patient with diarrhea in South Korea in 2003. Shiga-like toxin genes were detected in both strains, but their serotypes were not determined experimentally. Through the genome analysis of these two isolates, we report a case of pathogenic E. coli strains that carry two types of Shiga-like toxin genes in a single genome whose overall structure is most similar to that of non-EHEC strains.

Bacteria and DNA isolation
In 2003, two E. coli strains were isolated from stool samples of a female patient and a male patient with diarrhea in Korea. To test for the presence of the Shiga-like toxin genes (stx1 and stx2), the two strains were subjected to PCR with primers specific to stx1 (forward, CGTACGGGGATGCAGATAAATCGC; reverse, CAGTCATTACATAAGAACGCCCAC) and stx2 (forward, GTTCTGCGTTTTGTCACTGTCAC; reverse, GTCGCCAGTTATCTGACATTCTGG). The two strains were deposited in the National Culture Collection for Pathogens (NCCP) of the Korea National Institute of Health (KNIH) under accession numbers NCCP15655 (female patient) and NCCP15656 (male patient). Genomic DNA was extracted using chemical and enzymatic methods as described in Molecular Cloning: A Laboratory Manual [16].
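The PCR screen above amounts to asking whether each primer pair can anneal to a template in the correct orientation. For readers who want to repeat that check on assembled contigs, a minimal exact-match in silico PCR is sketched below in Python. It is an illustration under simplifying assumptions (perfect primer matches only, no mismatch tolerance, and a hypothetical 5,000-bp cap on product size); the function names are ours, and it is not the laboratory protocol used in this study.

```python
# Minimal exact-match in silico PCR (illustrative sketch, not the study's method).
# Assumptions: primers must match perfectly; products longer than max_len are ignored.

COMPLEMENT = str.maketrans("ACGTN", "TGCAN")

# stx primer pairs as given in the text (forward, reverse), both written 5'->3'.
STX_PRIMERS = {
    "stx1": ("CGTACGGGGATGCAGATAAATCGC", "CAGTCATTACATAAGAACGCCCAC"),
    "stx2": ("GTTCTGCGTTTTGTCACTGTCAC", "GTCGCCAGTTATCTGACATTCTGG"),
}


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def in_silico_pcr(template: str, forward: str, reverse: str,
                  max_len: int = 5000) -> list[int]:
    """Return predicted product lengths for exact primer matches on both strands."""
    products = []
    # The reverse primer binds where its reverse complement appears on the strand.
    rc_reverse = reverse_complement(reverse)
    for strand in (template.upper(), reverse_complement(template.upper())):
        start = strand.find(forward)
        while start != -1:
            end = strand.find(rc_reverse, start + len(forward))
            if end != -1 and end + len(rc_reverse) - start <= max_len:
                products.append(end + len(rc_reverse) - start)
            start = strand.find(forward, start + 1)
    return products


if __name__ == "__main__":
    fwd, rev = STX_PRIMERS["stx1"]
    # Synthetic template: forward site + 50-bp spacer + reverse primer site.
    toy = fwd + "A" * 50 + reverse_complement(rev)
    for gene, (f, r) in STX_PRIMERS.items():
        print(gene, in_silico_pcr(toy, f, r))
```

On the synthetic template, only the stx1 pair yields a product (98 bp); a real search would run over each contig and would usually tolerate a few primer mismatches.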
Figure 1 Phylogenetic relationship among genome-sequenced E. coli and Shigella strains. The phylogenetic tree was generated by PhyML from the amino-acid sequences of 1,273 core genes of completely sequenced E. coli and Shigella strains. Colors indicate the phylogenetic groups of E. coli (red, A; yellow, B1; black, Shigella; blue, E; purple, D; green, B2). Bootstrap values (percentages of 1,000 replicates) greater than 50% are shown at each node. Escherichia fergusonii ATCC 35469 was used as the outgroup. The scale bar represents 0.001 substitutions per site.

high-quality reads with 235-fold coverage for NCCP15656 were generated from 500-bp paired-end libraries. Sequence trimming and de novo assembly were performed using CLC Genomics Workbench version 5.1 (CLC bio, Inc.), and scaffolding was carried out with SSPACE [17]. Gaps were closed automatically using IMAGE [18] and manually using CLC Genomics Workbench. Structural genes were predicted using Glimmer 3 [19], and functional annotation was performed using blastp against the E. coli and Shigella entries of the MicroScope database [20]. We then annotated the genomes automatically using the RAST server [21] and compared the result with the MicroScope-based annotation to obtain more accurate functional assignments. Genes were categorized by an additional blastp search against the subsystem database of the RAST server.
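Fold coverage, as quoted above for NCCP15656, is the total number of sequenced bases divided by the genome size. The short Python sketch below makes that arithmetic explicit. It uses the draft assembly length reported later in this paper as a proxy for genome size; the 100-bp read length in the last line is a hypothetical value, since the text does not state the read length.

```python
import math


def fold_coverage(total_bases: int, genome_size: int) -> float:
    """Average sequencing depth: total sequenced bases / genome length."""
    return total_bases / genome_size


def bases_for_coverage(coverage: float, genome_size: int) -> int:
    """Total bases needed to reach a target average depth."""
    return math.ceil(coverage * genome_size)


def reads_for_coverage(coverage: float, genome_size: int, read_length: int) -> int:
    """Reads needed for a target depth, given a (hypothetical) read length."""
    return math.ceil(coverage * genome_size / read_length)


if __name__ == "__main__":
    genome_size = 4_925_312  # NCCP15656 draft assembly length, used as a proxy
    needed = bases_for_coverage(235, genome_size)
    print(needed)  # 1157448320
    print(fold_coverage(needed, genome_size))  # 235.0
    print(reads_for_coverage(235, genome_size, read_length=100))  # hypothetical 100-bp reads
```

In other words, 235-fold coverage of a roughly 4.9-Mb genome corresponds to about 1.16 Gb of high-quality sequence.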

Gene clustering and phylogenetic tree construction
The core gene set of 71 genomes (60 E. coli strains, 10 Shigella strains, and 1 Escherichia fergusonii strain) was identified using OrthoMCL (version 2.0.3) [22] with cut-offs of e-value ≤ 1E-5, identity ≥ 85%, and coverage ≥ 80% [23]. Duplicated genes were excluded from the core gene set, and the remaining 1,273 core genes were used for phylogenetic tree construction. The amino-acid sequences of each core gene were aligned with MUSCLE (version 3.6) [24], and the alignments of all core genes were concatenated and converted to PHYLIP format. A maximum likelihood tree was constructed using PhyML (version 2.4.5) [25] with the JTT model of amino-acid substitution [26].
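The filtering and concatenation steps described above can be sketched in Python as follows. This is a hedged illustration, not the authors' pipeline: it assumes pairwise hit statistics and ortholog groups have already been parsed from BLAST/OrthoMCL output into plain Python structures (the `Hit` class and all function names are hypothetical), and it writes a relaxed, sequential PHYLIP layout (name, space, sequence); check which name format your PhyML version expects.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    """One pairwise protein hit (e.g. a parsed BLAST tabular line)."""
    query: str
    subject: str
    evalue: float
    identity: float   # percent identity
    coverage: float   # percent of the query length covered by the alignment


def keep_hit(hit: Hit, max_evalue: float = 1e-5, min_identity: float = 85.0,
             min_coverage: float = 80.0) -> bool:
    """Apply the e-value, identity and coverage cut-offs quoted in the text."""
    return (hit.evalue <= max_evalue and hit.identity >= min_identity
            and hit.coverage >= min_coverage)


def single_copy_core(groups: dict[str, list[tuple[str, str]]],
                     genomes: list[str]) -> dict[str, dict[str, str]]:
    """Keep ortholog groups containing exactly one gene from every genome."""
    core = {}
    for group_id, members in groups.items():
        present = [genome for genome, _ in members]
        # Same size and same genome set => each genome appears exactly once.
        if len(present) == len(genomes) and set(present) == set(genomes):
            core[group_id] = dict(members)
    return core


def concatenate_to_phylip(alignments: dict[str, dict[str, str]],
                          genomes: list[str]) -> str:
    """Concatenate per-gene alignments (in sorted group order) into relaxed PHYLIP."""
    supermatrix = {genome: [] for genome in genomes}
    for group_id in sorted(alignments):
        aligned = alignments[group_id]
        if len({len(aligned[g]) for g in genomes}) != 1:
            raise ValueError(f"sequences in {group_id} are not aligned to equal length")
        for genome in genomes:
            supermatrix[genome].append(aligned[genome])
    rows = {genome: "".join(parts) for genome, parts in supermatrix.items()}
    n_sites = len(rows[genomes[0]])
    lines = [f"{len(genomes)} {n_sites}"]
    lines += [f"{genome} {rows[genome]}" for genome in genomes]
    return "\n".join(lines)
```

Groups with a missing genome or with a paralog are dropped by `single_copy_core`, which mirrors the exclusion of duplicated genes mentioned above.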

Quality assurance
Genome sequencing was conducted using a single bacterial isolate, and possible contamination was checked using CLC Genomics Workbench during de novo assembly by mapping reads back to the contigs and generating a detailed mapping report. Contamination by other genomes can be detected by inspecting the distribution of coverage levels in the detailed mapping report and by checking that read pairs align at the expected paired distance.
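The two checks described above can be illustrated with a short Python sketch: contigs whose mean coverage departs strongly from the assembly-wide median, or a low fraction of read pairs mapping at the expected distance, hint at foreign DNA. The 0.5×/2× cut-offs and the ±20% distance tolerance are hypothetical values chosen for illustration, not parameters reported here, and CLC Genomics Workbench performs its own analysis. A high-coverage contig is not necessarily a contaminant: multicopy plasmids and repeated elements such as rRNA operons also map at elevated depth.

```python
from statistics import median


def flag_coverage_outliers(contig_coverage: dict[str, float],
                           low: float = 0.5, high: float = 2.0) -> list[str]:
    """Names of contigs whose mean coverage is below low*median or above high*median."""
    med = median(contig_coverage.values())
    return sorted(name for name, cov in contig_coverage.items()
                  if cov < low * med or cov > high * med)


def fraction_at_expected_distance(insert_sizes: list[int], expected: int = 500,
                                  tolerance: float = 0.2) -> float:
    """Fraction of read pairs whose mapped distance is within +/- tolerance of expected."""
    lo, hi = expected * (1 - tolerance), expected * (1 + tolerance)
    within = sum(1 for size in insert_sizes if lo <= size <= hi)
    return within / len(insert_sizes)


if __name__ == "__main__":
    # Hypothetical per-contig mean coverage values.
    coverage = {"contig1": 230.0, "contig2": 240.0, "contig3": 235.0,
                "contig4": 20.0, "contig5": 900.0}
    print(flag_coverage_outliers(coverage))
    print(fraction_at_expected_distance([500, 480, 610, 300]))
```

The expected distance of 500 bp follows the paired-end library size given in this paper; the flagged contigs would then be inspected rather than discarded automatically.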

Initial findings
Genome structure
The draft genomes of Escherichia coli NCCP15655 and NCCP15656 consist of 5 and 15 contigs, respectively. The five contigs of NCCP15655 total 4,965,708 bp (50.86% G + C content), in which 4,970 coding sequences (CDSs), seven ribosomal RNA operons, and 97 tRNAs were predicted. The 15 contigs of NCCP15656 total 4,925,312 bp (50.93% G + C content), in which 4,919 CDSs, seven ribosomal RNA operons, and 92 tRNAs were detected. Each genome contains two CRISPR loci composed of direct repeat sequences and seven spacer sequences. Spacers 5 and 6 in CRISPR 1 and spacer 7 in CRISPR 2 showed no similarity to sequences in the GenBank database.
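Summary figures such as total assembly length and G + C content are simple functions of the contig sequences. The Python sketch below computes them, plus N50 (a common contiguity measure that is not reported in the text). Taking G + C content over unambiguous A/C/G/T bases only is an assumption; tools differ in how they treat gap characters (N).

```python
def assembly_stats(contigs: list[str]) -> dict:
    """Contig count, total length, G + C percent (A/C/G/T only) and N50."""
    lengths = sorted((len(seq) for seq in contigs), reverse=True)
    total = sum(lengths)
    joined = "".join(contigs).upper()
    gc = joined.count("G") + joined.count("C")
    acgt = gc + joined.count("A") + joined.count("T")
    # N50: length of the contig at which the running sum first reaches half the total.
    n50, running = 0, 0
    for length in lengths:
        running += length
        if running * 2 >= total:
            n50 = length
            break
    return {
        "contigs": len(lengths),
        "total_bp": total,
        "gc_percent": 100.0 * gc / acgt if acgt else 0.0,
        "n50": n50,
    }


if __name__ == "__main__":
    # Toy contigs; a real run would parse the five or fifteen FASTA records.
    print(assembly_stats(["GGCC", "AT", "ATGCNN"]))
```

For a real assembly, the contig sequences would be read from the FASTA files deposited under the accession numbers given at the end of this paper.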

Phylogenetic relationship and comparison with closely related strains
A phylogenomic tree was constructed using 1,273 core genes of NCCP15655, NCCP15656, and the completely sequenced strains of the Escherichia/Shigella group. The tree showed that NCCP15655 and NCCP15656 belong to group B1 and form a sister clade with strain E24377A, an enterotoxigenic E. coli (ETEC) strain (Figure 1). ANIb values between NCCP15655/NCCP15656 and other strains of the B1 group ranged from 98.27% to 99.08% (Table 1). Although NCCP15655 and NCCP15656 are Shiga-like toxin-producing E. coli, they form a sister clade with the ETEC strain E24377A, even though their highest ANI values are with non-pathogenic strains. We therefore compared the genomic features of NCCP15655/NCCP15656 and E24377A using subsystem classification. Despite the high genome similarity and phylogenetic proximity, the proportions of subsystem-assigned genes differ distinctly between NCCP15655/NCCP15656 and E24377A. The proportions of genes belonging to the subsystem categories "phages, prophages, transposable elements, plasmids" and "virulence, disease and defense" are higher in NCCP15655 and NCCP15656 than in E24377A (Figure 2 and Table 2). Specifically, NCCP15655 and NCCP15656 have more genes than E24377A in the subcategories 'phages, prophages' and 'bacteriophage structural proteins' of "phages, prophages, transposable elements, plasmids" and in the subcategories 'resistance to antibiotics and toxic compounds', 'adhesion', and 'type III, type IV, type VI, ESAT secretion systems' of "virulence, disease and defense". In the genomes of NCCP15655 and NCCP15656, the genes in the subcategories 'phages, prophages' and 'bacteriophage structural proteins' include those of the Stx phages, and the genes in the subcategory 'type III, type IV, type VI, ESAT secretion systems' encode conjugative plasmid-related proteins. A conjugative plasmid in NCCP15655 and NCCP15656 harbors the hlyABCD genes, which encode a hemolysin. 
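The comparison of subsystem proportions can be made concrete with a small Python sketch. It assumes each gene has already been assigned to at most one subsystem category (e.g. from the RAST/blastp step) and, following the text's phrase "proportion of subsystem-assigned genes", uses only assigned genes as the denominator. The category labels and helper names are hypothetical, and real subsystem assignments can place a gene in more than one category.

```python
from collections import Counter
from typing import Optional


def category_proportions(assignments: list[Optional[str]]) -> dict[str, float]:
    """Percentage of subsystem-assigned genes falling in each category.

    assignments holds one category name per gene, or None for unassigned genes.
    """
    assigned = [cat for cat in assignments if cat is not None]
    if not assigned:
        return {}
    counts = Counter(assigned)
    return {cat: 100.0 * n / len(assigned) for cat, n in counts.items()}


def proportion_difference(a: dict[str, float], b: dict[str, float]) -> dict[str, float]:
    """Per-category difference (a - b) in percentage points; missing categories count as 0."""
    return {cat: a.get(cat, 0.0) - b.get(cat, 0.0) for cat in sorted(set(a) | set(b))}


if __name__ == "__main__":
    # Toy assignments for two hypothetical genomes.
    strain_x = category_proportions(["phage", "phage", "virulence", None, "other"])
    strain_y = category_proportions(["phage", "other", "other", "virulence"])
    print(proportion_difference(strain_x, strain_y))
```

A positive difference for a category means it makes up a larger share of the first genome's subsystem-assigned genes, which is the kind of contrast summarized in Table 2.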
Interestingly, although the two strains were isolated independently from different individuals, they are remarkably similar. Indeed, serotyping based on the wzt and wzm genes (O antigen) and the fliC gene (H antigen) indicated that both NCCP15655 and NCCP15656 are O8:H49. Moreover, at the genomic level, the two strains are highly similar, with ANIb values between them of 99.98% to 99.99% (Table 1). Based on these relationships, we postulate that they share a very recent common ancestor, if they are not in fact clonal.
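ANIb, as usually defined (Goris et al., 2007), cuts the query genome into consecutive 1,020-nt fragments, searches each fragment against the reference genome with BLASTN, and averages the identity of best hits that show more than 30% identity over at least 70% of the fragment length. The Python sketch below implements only that bookkeeping step. It assumes the best BLASTN hit per fragment has already been obtained (the `FragmentHit` record is hypothetical) and keeps the final, shorter fragment, a choice that varies between tools. Because ANI is computed from one genome's fragments against the other genome, values are often reported for both directions.

```python
from dataclasses import dataclass


@dataclass
class FragmentHit:
    """Best BLASTN hit of one query fragment against the reference genome."""
    identity: float       # percent identity of the hit
    aligned_length: int   # length of the aligned region (nt)
    fragment_length: int  # length of the query fragment (nt)


def fragment_genome(sequence: str, size: int = 1020) -> list[str]:
    """Cut a genome into consecutive fragments; the last one may be shorter."""
    return [sequence[i:i + size] for i in range(0, len(sequence), size)]


def anib(hits: list[FragmentHit], min_identity: float = 30.0,
         min_aligned_fraction: float = 0.70) -> float:
    """Mean identity of fragment hits passing the ANIb inclusion criteria."""
    kept = [h.identity for h in hits
            if h.identity > min_identity
            and h.aligned_length >= min_aligned_fraction * h.fragment_length]
    if not kept:
        raise ValueError("no fragment hits pass the ANIb thresholds")
    return sum(kept) / len(kept)


if __name__ == "__main__":
    # Hypothetical best hits for four fragments.
    hits = [FragmentHit(99.9, 1020, 1020), FragmentHit(99.5, 1000, 1020),
            FragmentHit(98.0, 500, 1020), FragmentHit(25.0, 1020, 1020)]
    print(anib(hits))  # only the first two hits pass both thresholds
```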

Shiga-like toxin and virulence genes
Genes encoding Shiga toxin type 1 (Stx1) and Shiga toxin type 2 (Stx2) were detected in the NCCP15655 and NCCP15656 genomes. Stx1 subunit A is composed of 315 amino acids and subunit B of 89 amino acids. In the NCCP15655 genome, the stx1 genes were detected in a prophage region, and the encoded toxin has 100% amino-acid identity with the Shiga toxin of Shigella dysenteriae Sd197. Stx2 subunit A is composed of 319 amino acids and subunit B of 89 amino acids. The stx2 genes were detected in another prophage region, located at the end of a contig. The stx2 gene is very similar to that of E. coli strain 11128, which also carries stx1 genes (Figure 3). Subtype analysis of the stx genes indicated that stx1 is stx1a and stx2 is stx2a in both strains. Unlike in typical EHEC strains, the LEE island was not detected in the genomes of NCCP15655 and NCCP15656, but genes encoding type 1 fimbriae biosynthesis proteins, the adhesin AidA, the fimbria-like adhesin SfmA/H, and the CFA/I fimbrial minor adhesin were detected. In both strains, genes encoding type IV pilus biosynthesis proteins, enteropathogenic E. coli secreted protein C (a serine protease that causes epithelial damage [30]), and hemolysin were detected in the final contigs designated as plasmid, whereas the type 1 fimbriae operon was identified in the chromosome.

Future directions
The Stx phage carrying the Shiga toxin genes and the LEE island harboring the type III secretion system are the major features of EHEC strains [31]. The genomes of NCCP15655 and NCCP15656 encode Shiga-like toxin but not genes related to the LEE island. However, they acquired a plasmid encoding hemolysin and enteropathogenic E. coli secreted protein C. NCCP15655 and NCCP15656 thus acquired their virulence genes through horizontal gene transfer and caused diarrhea in humans. In E24377A, the gene encoding heat-labile toxin, a major virulence factor of ETEC, is located on a plasmid, but this gene was not detected in NCCP15655 and NCCP15656. These observations suggest that, in certain environments, bacterial strains can obtain virulence factors by acquiring a virulence gene-harboring plasmid or phage and thereby cause disease. This report is yet another example of pathogenic E. coli strains that have acquired virulence genes through the acquisition of plasmids and phages. These genomes will be useful for further studies of the acquisition and dissemination of virulence genes in E. coli.

Availability of supporting data
The Whole Genome Shotgun projects of NCCP15655 and NCCP15656 have been deposited in GenBank under accession numbers ATLW00000000 and ATLX00000000, respectively.
Figure 3 Clustering analysis of subunit A of Shiga toxin type 1 and type 2. Unrooted trees based on the nucleotide sequences of Shiga toxin subunit A were constructed using the neighbor-joining method with the Jukes-Cantor model. Bootstrap values (percentages of 1,000 replicates) greater than 50% are shown at each node. The scale bar represents 0.005 nucleotide substitutions per site. Yellow, E. coli B1 group; sky blue, E. coli E group; black, unknown. (A) Shiga toxin type 1; (B) Shiga toxin type 2.
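The Jukes-Cantor model named in the Figure 3 caption corrects the observed proportion of differing sites p for multiple substitutions at the same site: d = -(3/4) ln(1 - 4p/3). A minimal Python sketch of that distance for two aligned nucleotide sequences follows. Skipping gap and ambiguous positions pairwise is our assumption, and the neighbor-joining step that would consume these distances is not shown.

```python
import math


def p_distance(seq1: str, seq2: str) -> float:
    """Proportion of differing sites among positions where both bases are A/C/G/T."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    sites = [(a, b) for a, b in zip(seq1.upper(), seq2.upper())
             if a in "ACGT" and b in "ACGT"]
    if not sites:
        raise ValueError("no comparable sites")
    return sum(a != b for a, b in sites) / len(sites)


def jukes_cantor(seq1: str, seq2: str) -> float:
    """Jukes-Cantor distance d = -3/4 * ln(1 - 4p/3); infinite when p >= 0.75."""
    p = p_distance(seq1, seq2)
    if p >= 0.75:
        return math.inf  # saturation: the correction is undefined
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


if __name__ == "__main__":
    # One difference in four sites: p = 0.25, so d = -0.75 * ln(2/3).
    print(jukes_cantor("AAAA", "AAAT"))
```

For closely related stx alleles, p is small and d is only slightly larger than p, which is consistent with the short branch lengths implied by the 0.005 scale bar.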