Genome sequence of Shigella sonnei 4303

Background Shigella spp. are Gram-negative intracellular pathogenic bacteria belonging to the family Enterobacteriaceae and can cause bacterial dysentery, a severe diarrheal disease. The pathophysiological impact of the Gram-negative bacteria is highly related to the composition and structural variability of lipopolysaccharides, the major lipoid components of the outer membrane. Out of the 114 genes involved in the lipopolysaccharide biosynthesis pathway, 47 genes are specific to Shigella spp. Changes in the specific genes can lead to loss of the O polysaccharide side chain, resulting in rough (R) type bacteria with increased sensitivity to temperature, or hydrophobic antibiotics. The formation of various different lipopolysaccharides or lipooligosaccharides has been observed previously in a mutant line showing altered biological properties, but the genetic background has not been investigated in detail. Results The parental strain of the mutant line, Shigella sonnei 4303, was subjected to whole genome sequencing to gain a better insight into the structure and biosynthesis of lipopolysaccharides. The sequencing revealed a 4,546,505 bp long genome including chromosomal and plasmid DNA, and the lipopolysaccharide biosynthesis genes were also identified. A comparison of the genome was performed with the phylogenetically closely related, wild type, well characterized, highly virulent strain, S. sonnei 53G. Conclusion Analysis of the lipopolysaccharide biosynthetic genes helped us to get more insight into the pathogenicity and virulence of the bacteria. The genome revealed high similarities with S. sonnei 53G, which can be used as a standard in characterizing the S. sonnei 4303’s R-type isogenic derivatives. Electronic supplementary material The online version of this article (10.1186/s13099-018-0274-5) contains supplementary material, which is available to authorized users.


Background
Lipopolysaccharides (LPSs) are of importance in bacterial physiology, and also in host-bacteria crosstalk [1]. The pathogenicity of Gram-negative bacteria is influenced by the molecular variability (structures and lengths) of LPSs, e.g., serum sensitivity and biofilm forming ability of Gram-negative bacteria are correlated with the lengths of O sidechains. Previous studies have described that R-type bacteria with truncated LPS molecules (so-called lipooligosaccharides-LOSs) are more sensitive to hydrophobic antibiotics [2].
Recent studies suggested that Shigella sonnei have become more dominant in developed countries [3]. The history of S. sonnei 4303 dates back more than 60 years, when the phenomenon of phase variation in S. sonnei was examined [4]. This non-pathogenic strain was formed by plasmid loss from a pathogenic S. sonnei phase I. strain, due to the instable nature of the virulence plasmid [5]. Later, intensive studies were carried out on the strain and its R-type isogenic derivatives, and the chemical structures and structural variabilities of their lipopolysaccharides and lipooligosaccharides (LOSs) have been described. Several interesting R mutants were characterized, including an absolute R-type strain (S. sonnei 4350) and a strain having truncated LPSs with a d-glycero-d-mannoheptose component incorporated in the structure (S. sonnei 4351) [6][7][8][9][10][11][12]. The lack of appropriate genome-scale information of the investigated strains, including structurally different LPSs, however, hinders our ability to answer fundamental biosynthetic questions. In order to gain more insight into the mechanism of the LPS/LOS biosynthesis S. sonnei 4303 was subjected to whole genome sequencing.

Methods
The genomic library was made by enzymatic shearing with the Ion Xpress Plus fragment library kit, followed by size selection on a 2% agarose E-Gel SizeSelect Gel (Thermo Fisher Scientific Inc., Waltham, MA USA). The template was prepared with 100 pM of the library on an Ion One Touch 2 system (Thermo Fisher Scientific Inc., Waltham, MA, USA). Samples were loaded on an Ion 316v2 Chip and sequenced on an Ion Torrent PGM sequencer, with the Ion PGM Sequencing 200 Kit v2 (Thermo Fisher Scientific Inc., Waltham, MA, USA) in compliance with the manufacturer's recommendations. De novo assembly was performed using the SPAdes v3.1 Genome Assembler software [13]. For whole-genome alignment, scaffolds in the draft assemblies were reordered to the S. sonnei 53G as reference sequence in Mauve software with default parameters [14]. Sequence annotation was performed using Prokka v. 1.9 [15]. MeDuSa (Multi-Draft based Scaffolder) web server was used for genome scaffolding [16]. The genome sequence of S. sonnei 4303 has been deposited in the GenBank under the accession number PRJNA361576.
Phylogenetic analysis was performed with the closest relatives selected by 16S rRNA sequences through NCBI (BLASTn). Phylogenetic analysis was performed by Clustal Omega with default settings [17]. Multiple sequence alignment was completed with adk, fumC, gyrB, mdh, purA housekeeping genes and rRNA genes. The resulted phylogenetic tree represents 12 S. sonnei strains including S. sonnei 4303 and an outgroupped strain, Klebsiella oxytocal FDAARGOS 355.
Nomenclature of the LPS genes were used according to KEGG database [18]. Detailed methodological strategy is described in Additional file 1.

Quality assurance
Morphological and biochemical characterization identified the strain as S. sonnei. The genomic DNA used for sequencing was isolated from a single colony of the bacteria. The 16S rDNA gene was extracted from the genome using RNAmmer 1.2 server [19]. The identity of the strain was confirmed through BLAST annotation against NCBI microbial 16S database.

Results and discussion
In total 4,262,518 high quality reads were generated and used to create the genome of S. sonnei 4303, which yielded a 100-fold coverage. The genome is 4.5 Mbps in size, and contains 4554 predicted genes, 10 rRNA genes, 60 tRNA genes as well as a CRISPR region. In our comparative studies, the genome of a well characterized, highly pathogenic and phylogenetically highly related strain, S. sonnei 53G was used as standard (Fig. 1). S. sonnei 53G was isolated in Japan [20] and was used in different serological studies [21].
Since the primary aim of this study was to create a solid and strain specific information about the genetic background with regards to LPS modifications, the genes involved in the LPS biosynthetic pathway have been further analyzed in silico. According to the Kyoto Encyclopedia of Genes and Genomes (KEGG) pathways database, 114 genes participate in these complex biological processes. Screening for the presence/absence of these genes in the S. sonnei 4303 and in the S. sonnei 53G strains revealed 47 genes specific to S. sonnei. Comparative DNA analysis on this common subset of S. sonnei genes revealed six sequence polymorphisms (summarized in Table 1).
Our former study on the LPS structure of S. sonnei 4303 indicated that the lipid A molecules contain only 1 phosphate group at position 1 [6]. Modification of lipid A with an additional phosphate group at position 1, forming a 1-diphosphate species, is mediated by the undecaprenyl phosphotransferase, LpxT. The mutation of lpxT/yeiU encoding gene may explain the monophosphorylated position 1 in S. sonnei 4303.
Taken together, the whole-genome sequencing strategy revealed the mutation of the lpxT, and the presence of new variants of the pagP, lpxP, kdsA and arnA genes. The sequenced genome can be used as a reference for characterizing R-type isogenic derivatives of S. sonnei 4303, to reveal the genetic background of mutants with the truncated lypopolysaccharides [6], e.g., having a d-glycerod-mannoheptose in the core part [7,8].

Additional file
Additional file 1. Complete methodological strategy to the "Genome sequence of Shigella sonnei 4303". Experimental design, Sampling protocol and storage, Nucleic acid isolation, Library preparation and sequencing, Read quality assessment, Comparative genomics.