Pathogen genomes contain genomic islands that are absent from non-pathogens, and at least some of these islands contribute to virulence. Genomic islands may have been acquired by the common ancestor of a pathogenic lineage, in which case they can serve as markers for that lineage irrespective of their present contribution to virulence. Although some genomic islands have been described, much less is known about chromosomal EAEC virulence loci than about plasmid-borne genes. The recent ordering of EAEC lineages by multilocus sequence typing (MLST) has allowed us to search for unique DNA both within and between lineages. The objective of this study was to identify conserved genetic loci among the principal EAEC lineages. We hypothesised that EAEC strains, or subgroups of them, would harbour conserved chromosomal loci, and that identifying these loci would improve our understanding of these pathogens, facilitate their identification for research and clinical purposes, and potentially reveal vaccine candidates.
Identifying factors that are common to pathogenic bacteria but absent from non-pathogens has shown promise as an approach for finding virulence loci and candidate antimicrobial targets. For example, in silico methods have been used to mine sequenced genomes for pathogen-specific factors. As only one EAEC genome has been completed and just three others are in progress, we elected to use lower-resolution, PCR-based genetic profiling to compare 22 genomes. Because a number of genomic islands contain, or are proximal to, IS3 elements, we hypothesised that IS3-based profiling would identify lineage-specific loci that might contribute to virulence. Using this approach, we identified two diagnostic candidates, aggR and orf1600. The former encodes a transcriptional activator that has been characterised functionally and used to detect EAEC in epidemiological surveys [17, 19]. The second target lies within an island present in EHEC O157 strains (as orfz2240) and in EAEC strains belonging to the ECOR D lineage (as orf1600). Our approach yielded fewer hits than in silico methods. However, because the z2240/orf1600 island and the aggR gene are small, the loci identified by IS3 profiling could be overlooked by other approaches.
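The screening logic behind this kind of comparative profiling, keeping loci shared by a group of strains and absent from every comparison strain, can be sketched in Python. This is a minimal illustration, not the procedure used in this study: it assumes each strain's PCR profile has already been scored as a set of loci judged present, and the function name, strain names, locus labels and the `min_target_fraction` threshold are all hypothetical.

```python
from typing import Dict, List, Set


def lineage_specific_loci(
    profiles: Dict[str, Set[str]],
    target: Set[str],
    min_target_fraction: float = 1.0,
) -> List[str]:
    """Return loci carried by at least `min_target_fraction` of the target
    strains and by none of the non-target strains (hypothetical helper)."""
    if not target:
        raise ValueError("target group is empty")
    # Every strain not in the target group acts as a comparison strain.
    others = [strain for strain in profiles if strain not in target]
    # Union of all loci scored in any strain.
    all_loci = set().union(*profiles.values())
    hits = []
    for locus in sorted(all_loci):
        n_target = sum(locus in profiles[strain] for strain in target)
        seen_elsewhere = any(locus in profiles[strain] for strain in others)
        if not seen_elsewhere and n_target / len(target) >= min_target_fraction:
            hits.append(locus)
    return hits


# Made-up presence/absence profiles, for illustration only.
profiles = {
    "strain_1": {"locus_A", "locus_B", "locus_C"},
    "strain_2": {"locus_A", "locus_C"},
    "strain_3": {"locus_A", "locus_B"},
    "commensal_1": {"locus_B"},
    "commensal_2": {"locus_D"},
}
target = {"strain_1", "strain_2", "strain_3"}

print(lineage_specific_loci(profiles, target))       # ['locus_A']
print(lineage_specific_loci(profiles, target, 0.6))  # ['locus_A', 'locus_C']
```

Requiring presence in every target strain (the default of 1.0) favours markers for a whole lineage; lowering the threshold admits loci carried by only part of a group, which may help when looking for subgroup markers but also admits more loci of uncertain significance.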
The functionally characterised protein most similar to the predicted product of EHEC orfz2240/EAEC orf1600 is the invasion plasmid antigen H (IpaH) of Shigella. Amino acid residues 4–60 of Z2240 (and of EAEC Orf1600) are 35.8% identical to residues 3–119 of the 532-amino-acid IpaH variant (accession number gi152747). Each Shigella strain carries multiple IpaH variants, which vary in length and are more similar to each other than to Z2240. IpaH is an E3 ubiquitin ligase and is temporally associated with Shigella pathogenicity [30–32]. Z2241 is predicted to be a leucine-rich protein of unknown function. If it is expressed, the EAEC hybrid Orf1600 could represent a bifunctional protein. However, EAEC strains appear to be mucosal pathogens, so it is not clear whether a ubiquitin ligase, which might target intracellular proteins to the proteasome, would contribute to pathogenicity in this pathotype. Multiple attempts to over-express EAEC orf1600 for purification were unsuccessful (data not shown), most likely because of toxicity. This observation, together with comparative analysis of E. coli genomes, suggests that the 042 version of the island, and orf1600 in particular, may be under negative selection.
It is not known whether any version of this island encodes functional proteins, although future expression or functional studies may resolve this question. Nevertheless, the identification of two targets, one of them previously unreported, provides proof of principle for our method of identifying general and lineage-specific EAEC loci.
Now that the EAEC category is recognised to comprise multiple pathotypes, convenient markers for important lineages are needed to help determine their epidemiological significance. One such lineage is ECOR phylogenetic group D EAEC, which is globally disseminated and includes the prototypical EAEC strain 042; this strain produced diarrhoea in three of five volunteers in a human challenge experiment. The EAEC ECOR group D lineage contains strains belonging to the ST31, ST394 and ST38 complexes. ST394-complex EAEC were isolated much more frequently from Nigerian children with diarrhoea than from controls and, after ST10, this complex was the most common in that population [12, 25]. All ST394-complex isolates in the E. coli MLST database appear to be EAEC, so this common ST complex is very likely EAEC-specific. ST38 was isolated much less frequently from Nigerian children, but it was the only complex detected more than three times that was not recovered from controls, suggesting that it may represent a truly virulent lineage. The island reported here could serve as a marker for the EAEC ECOR D lineage, and combining the 2240 probe with commonly employed diagnostic probes that detect the plasmid marked by CVD432 could help to determine the specific contribution of these EAEC pathotypes to the burden of diarrhoeal disease.