Positive selection on a bacterial oncoprotein associated with gastric cancer
© Delgado-Rosado et al; licensee BioMed Central Ltd. 2011
Received: 27 September 2011
Accepted: 11 November 2011
Published: 11 November 2011
Helicobacter pylori is a vertically inherited gut commensal that is carcinogenic if it possesses the cag pathogenicity island (cag PaI); infection with H.pylori is the major risk factor for gastric cancer, the second leading cause of death from cancer worldwide (WHO). The cag PaI locus encodes the cagA gene, whose protein product is injected into stomach epithelial cells via a Type IV secretion system, also encoded by the cag PaI. Once there, the cagA protein binds to various cellular proteins, resulting in dysregulation of cell division and carcinogenesis. For this reason, cagA may be described as an oncoprotein. A clear understanding of the mechanism of action of cagA and its benefit to the bacteria is lacking.
Here, we reveal, using the Ka/Ks ratio, that the cagA gene displays strong signatures of positive selection in bacteria isolated from Amerindian populations. Weaker signatures are also detected in the gene from bacteria isolated from Asian populations, using the Ka/Ks ratio and the more sensitive branches-sites model of the PAML package. When the cagA gene isolated from Amerindian populations was examined in more detail, the region under positive selection was found to contain the EPIYA domains, which are known to modulate the carcinogenicity of the gene. This means that the carcinogenicity-modulating region of the gene is undergoing adaptation. The results are discussed in relation to the high incidences of stomach cancer in some Latin American and Asian populations.
Positive selection on cagA indicates antagonistic coevolution between host and bacteria, which appears paradoxical given that cagA is detrimental to the human host upon which the bacteria depend. This suggests several non-exclusive possibilities: that gastric cancer has not been a major selective pressure on human populations, that cagA has an undetermined benefit to the human host, or that horizontal transmission of H.pylori between hosts has been more important in the evolution of H.pylori than previously recognized, reducing the selective pressure to lower the pathogenicity of the bacteria. The different patterns of adaptation of the gene in different human populations indicate that there are population-specific differences in the human gut environment, due either to differences in host genetics or to diet and other lifestyle features.
Helicobacter pylori is a Gram-negative bacterium that lives in the human stomach as part of the normal gastric microbiome, and is generally present in the majority of the adult population. The bacterium has co-evolved with human populations and is well adapted and largely specific to the human host. The ancestor of H.pylori was intestinal and during its evolution migrated to the stomach, facilitated by the evolution of a urease that combats the stomach's acid conditions [4, 5]. H.pylori strains may possess a cag pathogenicity island (cag PaI) that contains a cagA gene encoding a 128 kDa protein [6, 7]. The cag PaI seems to have entered the H.pylori genome by lateral gene transfer, after H.pylori differentiated from parental species [2, 8]. Many of the genes of the cag PaI are involved in translocation of the cagA protein into epithelial cells lining the stomach. However, the function of the cagA protein itself is unknown. Infection with cagA+ H.pylori is strongly associated with gastric carcinoma [9–11]; gastric carcinoma is the second leading cause of death from cancer worldwide. In addition, cagA+ H.pylori is associated with chronic gastritis and peptic ulcers.
The mechanism of pathogenicity of cagA+ H.pylori is as follows. The bacterium attaches to the stomach wall and the cagA protein is injected into an epithelial cell by a bacterial Type IV secretion system, also encoded by the cag PaI locus. Once inside the cell, cagA is phosphorylated on tyrosine residues located within EPIYA domains by Src family kinases such as c-Src, Fyn, Yes and Lyn, and by c-Abl. The cagA protein is membrane associated and interacts with numerous additional cellular proteins, including the oncoprotein Src homology 2 domain containing tyrosine phosphatase (SHP-2), microtubule affinity-regulating kinase (MARK2), growth factor receptor-bound protein 2 (Grb-2), hepatocyte growth factor receptor (c-Met), C-terminal Src kinase (Csk) and p38 (Crk). Tyrosine-phosphorylated cagA recruits and activates SHP-2, apparently mimicking the action of Gab1. Consistent with the mimicry hypothesis, cagA is able to rescue Gab1-deficient Drosophila mutants, which is interesting given that cagA has no sequence similarity with Gab1; indeed, it has no known homologs. The interaction with SHP-2 causes inhibition of its tumor suppressing activity. Epithelial cells that have been dysregulated adopt the elongated hummingbird phenotype. In addition, cagA activates the transcription factor NF-κB, leading to the induction of interleukin 8 (IL-8) and subsequent inflammation. The activation of NF-κB occurs via SHP-2.
Variation in the EPIYA domains of cagA results in variation in the virulence of different cagA+ H.pylori strains. The EPIYA motifs are located in the C-terminal half of the cagA protein and are of types A-D. They are the major sites of tyrosine phosphorylation within the cagA protein. The eastern EPIYA-D motif, found in Asian populations, is associated with stronger binding to SHP-2 than the western EPIYA-C motif. The presence of the EPIYA-D motif in Asian cagA sequences may be responsible for the high rates of H.pylori associated disease in Asian populations.
The study reported here investigates the evolutionary dynamics of the cagA gene from different human populations, and shows that the gene displays varying amounts of positive selection, implying host population genetic differences in the response to H.pylori infection, and indicating the benefit of the gene to H.pylori. The region of the cagA gene under selection contains the EPIYA domains. These observations are an apparent paradox, given the detrimental effects of the oncoprotein on the human host; various scenarios are discussed that may explain the data.
cagA sequences used in the study
H. pylori strain
Partial rRNA sequences for various Helicobacter species were obtained from GenBank; these were H.fennelliae (GenBank: AF348747), H.acinonychis (GenBank: NR_025940), H.pylori (GenBank: DQ202383), H.nemestrinae (GenBank: AF363064), H.heilmannii (GenBank: AF506794), H.cetorum (GenBank: FN565164), Helicobacter sp. 'solnick 9A1-T71' (GenBank: AF292381), H.bizzozeronii (GenBank: NR026372), H.salomonis (GenBank: NR026065) and H.felis (GenBank: NR025935). The sequences were aligned using the MAFFT program, and phylogenetic relationships were determined using MrBayes with an HKY model, selected using the jModelTest program. The simulation was run for 10000 generations, sampling every 100 generations. The first 25% of the sampled generations were discarded as burn-in and the consensus tree was constructed from the remaining 75%.
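As an illustration of what the HKY model assumes, the following minimal Python sketch builds an unscaled HKY rate matrix, in which the rate of change to a base is proportional to that base's frequency and transitions are multiplied by a transition/transversion ratio kappa. The base frequencies and kappa used here are illustrative values only; they are not the values estimated in this study, and the function names are hypothetical.

```python
BASES = "ACGT"
PURINES = {"A", "G"}


def is_transition(a, b):
    """True for purine<->purine or pyrimidine<->pyrimidine changes."""
    return (a in PURINES) == (b in PURINES)


def hky_rate_matrix(freqs, kappa):
    """Return an unscaled HKY rate matrix as a nested dict q[from][to]."""
    q = {}
    for a in BASES:
        row = {}
        for b in BASES:
            if a != b:
                weight = kappa if is_transition(a, b) else 1.0
                row[b] = weight * freqs[b]
        # Diagonal entry makes each row sum to zero.
        row[a] = -sum(row.values())
        q[a] = row
    return q


# Illustrative inputs (assumptions, not estimates from the cagA data).
example_freqs = {"A": 0.3, "C": 0.2, "G": 0.2, "T": 0.3}
example_q = hky_rate_matrix(example_freqs, kappa=2.0)
```

The matrix is left unscaled for simplicity; phylogenetic programs normally rescale it so that branch lengths are measured in expected substitutions per site.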
The cagA gene sequences were analyzed for the presence of positive selection by likelihood ratio testing, comparing nested null and alternative models, using the PAML program. Three tests were performed: the branches test [35, 36], the sites test and the branches-sites test. An unrooted tree without branch lengths, generated by the phylogenetic analysis, was used, and the codon frequency table option was utilized in all analyses. Likelihood ratio testing was conducted to determine the significance of 2Δl, twice the difference between the log likelihoods of the two models (where l is the log likelihood), using a χ² distribution with 12 degrees of freedom for the branches model, 2 degrees of freedom for the sites model and 1 degree of freedom for the branches-sites model. The null model used for the branches test was a one-ratio model where Ka/Ks (ω) was the same for all branches, while the alternative model was the free-ratio model where ω was allowed to vary between branches. The null model for the sites test was model 1a (neutral; model = 0, NSsites = 1, fix_omega = 0), and the alternative model was model 2a (selection; model = 0, NSsites = 2, fix_omega = 0). The null model for the branches-sites test was modified according to Yang et al. (neutral; model = 2, NSsites = 2, fix_omega = 1, omega = 1). The alternative model was model A (selection; model = 2, NSsites = 2, fix_omega = 0).
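The model settings listed above can be collected in one place, as in the following minimal Python sketch that writes codeml-style control-file text for each nested model. The sites and branches-sites values are those quoted in the text. The branches-test values (model = 0 for the one-ratio model, model = 1 for the free-ratio model, NSsites = 0), the seqtype and CodonFreq values, and the file names are assumptions based on common codeml usage, not settings reported by the authors.

```python
# Settings for each nested model pair. Sites and branches-sites values come
# from the text; branches-test values are assumed from standard codeml usage.
MODEL_SETTINGS = {
    "branches_null": {"model": 0, "NSsites": 0, "fix_omega": 0},  # one-ratio
    "branches_alt": {"model": 1, "NSsites": 0, "fix_omega": 0},   # free-ratio
    "sites_null": {"model": 0, "NSsites": 1, "fix_omega": 0},     # model 1a
    "sites_alt": {"model": 0, "NSsites": 2, "fix_omega": 0},      # model 2a
    "branch_site_null": {"model": 2, "NSsites": 2, "fix_omega": 1, "omega": 1},
    "branch_site_alt": {"model": 2, "NSsites": 2, "fix_omega": 0},  # model A
}


def control_file(name, seqfile="cagA_alignment.phy", treefile="cagA_tree.nwk"):
    """Return control-file text for one named model (file names are hypothetical)."""
    settings = {
        "seqfile": seqfile,
        "treefile": treefile,
        "outfile": name + ".out",
        "seqtype": 1,    # codon sequences
        "CodonFreq": 3,  # codon frequency table (assumed mapping of the option in the text)
    }
    settings.update(MODEL_SETTINGS[name])
    return "".join(f"{key} = {value}\n" for key, value in settings.items())


print(control_file("branch_site_null"))
```

Keeping the six settings in a single table makes it easy to confirm that each null model differs from its alternative only in the parameters being tested.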
2Δl was calculated as 73.6 for the branches test, which was statistically significant. Ka/Ks values greater than 1 were observed for 5 branches (Figure 1): those leading to the Venezuela (1.56), Peru1 (1.04) and Peru2 (3.10) sequences, to the common ancestor of the Amerindian sequences (1.03) and to the lineage leading from the common ancestor of the Asian sequences (1.29). These branches show signatures of positive selection, although the ratio for the Amerindian common ancestor (1.03) is close to 1, consistent with near-neutral evolution when averaged over the length of the gene.
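The significance of a likelihood ratio statistic can be checked with a short calculation. The following minimal Python sketch computes the upper-tail probability of a χ² distribution using only the standard library; it handles 1 degree of freedom and even degrees of freedom, which covers the three tests described in the methods. The function name is hypothetical, and the 12 degrees of freedom for the branches test are taken from the methods text.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability P(X >= x) for a chi-square variable.

    Only df = 1 and even df are handled, using closed-form expressions.
    """
    if x <= 0:
        return 1.0
    if df == 1:
        return math.erfc(math.sqrt(x / 2.0))
    if df % 2 == 0:
        half = x / 2.0
        term, total = 1.0, 1.0
        # Sum of (x/2)^i / i! for i = 0 .. df/2 - 1.
        for i in range(1, df // 2):
            term *= half / i
            total += term
        return math.exp(-half) * total
    raise ValueError("only df = 1 or even df are handled in this sketch")


# Branches test: 2*delta(l) = 73.6 with 12 degrees of freedom.
p_branches = chi2_sf(73.6, 12)
print(f"branches test p-value: {p_branches:.2e}")
```

The resulting p-value is far below the conventional 0.05 threshold, in agreement with the statement that the branches test is statistically significant.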
For the sites test, 2Δl between the null and alternative models was calculated as 161, which was statistically significant. Estimates of parameters were as follows: p0 = 0.51, p1 = 0.49, ω0 = 0.03, ω1 = 1 (neutral model); p0 = 0.47, p1 = 0.38, p2 = 0.14, ω0 = 0.03, ω1 = 1, ω2 = 3.74 (selection model). Sites identified as being under positive selection, with statistical significance according to the Bayes Empirical Bayes test, were: 101, 206, 306, 378, 532, 542, 548, 604, 651, 774, 793, 815, 831, 834, 869, 876, 886, 892, 901, 998, 1004. The numbering was based on the Peru1 sequence.
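The parameter estimates above describe a mixture of site classes, each with its own proportion and ω. As a worked illustration, the following Python sketch computes the proportion-weighted mean ω under each model from the reported values. The weighted mean is a summary added here for interpretation and is not a statistic reported in the study; proportions are renormalized because the published values are rounded.

```python
# (proportion, omega) pairs for each site class, as reported in the text.
neutral_classes = [(0.51, 0.03), (0.49, 1.0)]
selection_classes = [(0.47, 0.03), (0.38, 1.0), (0.14, 3.74)]


def mean_omega(classes):
    """Proportion-weighted mean omega, renormalizing rounded proportions."""
    total = sum(p for p, _ in classes)
    return sum(p * w for p, w in classes) / total


mean_neutral = mean_omega(neutral_classes)
mean_selection = mean_omega(selection_classes)
print(f"neutral model mean omega:   {mean_neutral:.3f}")
print(f"selection model mean omega: {mean_selection:.3f}")
```

Under the selection model, roughly one site in seven falls in the class with ω well above 1, which raises the gene-wide average even though nearly half of the sites remain under strong purifying selection.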
Statistics of the branches-sites positive selection analysis
Lineage on tree
Residues predicted to be under positive selection (p < 0.05)
794, 834, 837
202, 274, 275, 277, 278, 279, 281, 282, 283, 287, 461, 834, 895, 896, 899, 900, 901, 903, 905, 908, 910, 911, 912, 913, 914, 915, 916, 917, 918, 919, 920, 921, 922
665, 799, 803
186, 198, 667, 808
Ancestral lineage of Amerindian strains
Ancestral lineage of Asian strains
Positive selection on cagA is likely to be due either to avoidance of the adaptive immune response (IgG), or to enhanced binding to cellular receptors that are antagonistically co-evolving. There is a strong immune response against the cagA protein (cagA is immunodominant); this may have led to an 'arms race' between host and bacteria, and hence the signature of positive selection. This is often the case with extracellular proteins of pathogens, either located on the cell surface or secreted. There is a precedent in bacteria, with the porB porin gene of Neisseria gonorrhoeae and N. meningitidis, and a variety of extracellular proteins from Escherichia coli. Secreted slr proteins from H.pylori also show signatures of positive selection. This scenario would imply that the regions of cagA under positive selection are immunogenic.
Codon usage analysis of the cagA genes
H. pylori Strain
Polymorphisms in the IL-1 gene cluster modify gastric cancer risk. The induction of IL-8 secretion by the cag PaI is a major stimulus of the immune response. Thus, differences in host interleukin genotypes may lead to differences in outcome for disease progression and differences in selective pressure on the cagA genes in different populations. Amerindians underwent a population bottleneck during the migration of their ancestors from Asia. Phenotypic evidence of this is the universality of the O blood group amongst Amerindians; this may have led to a homogeneity of immune response. It may also have affected the strains' capacity to bind non-O human blood antigens: most H.pylori strains are able to bind the A, B and O antigens via the babA adhesin, while Amerindian strains from South America bind best to O antigens. It is interesting to note that the East Asian population is also relatively genetically homogeneous.
Both commensal and pathogenic bacteria possess mechanisms for the avoidance of the host immune system. Several mechanisms have been shown to be involved in avoidance of the immune system by H.pylori. However, cagA+ strains elicit a strengthened immune response and increased inflammation [56–58]. Inflammation may be a mechanism to obtain nutrients; however, if cagA is evolving to avoid the immune system while at the same time stimulating it, then this seems contradictory.
Mortality figures from gastric cancer for populations examined in this study
Incidence of gastric cancer (per 100000)
Incidence of esophageal cancer (per 100000)
Given that the Amerindian and the ancestral Asian cagA sequences show stronger signs of positive selection, and that Asian and Latin American populations can exhibit high incidences of gastric cancer, this might imply a link between the strength of positive selection on the cagA gene and the oncogenicity of the gene. The results of the sliding window analysis, where the cagA region containing the EPIYA domains is under positive selection, are consistent with this hypothesis. Further work is required. If verified, this form of sequence analysis may help identify at-risk populations.
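The regional clustering of selected sites can be illustrated with a simple window count. The following Python sketch is not the sliding window Ka/Ks analysis referred to above; it is a simpler illustration that counts, in overlapping windows, the sites identified by the Bayes Empirical Bayes test in the sites analysis (Peru1 numbering). The window width, step size and the use of the last listed site as the sequence length are arbitrary assumptions made for this example.

```python
# Sites reported as positively selected in the sites test (Peru1 numbering).
BEB_SITES = [101, 206, 306, 378, 532, 542, 548, 604, 651, 774, 793,
             815, 831, 834, 869, 876, 886, 892, 901, 998, 1004]


def window_counts(sites, length, width=150, step=50):
    """Count sites falling in each window [start, end], using 1-based positions."""
    counts = []
    for start in range(1, length - width + 2, step):
        end = start + width - 1
        counts.append((start, end, sum(start <= s <= end for s in sites)))
    return counts


# Sequence length is assumed to be at least the last reported site.
counts = window_counts(BEB_SITES, length=max(BEB_SITES))
densest = max(counts, key=lambda c: c[2])
print(f"densest window: residues {densest[0]}-{densest[1]}, {densest[2]} sites")
```

With these settings the densest window lies towards the C-terminal end of the protein. Whether a given window overlaps the EPIYA motifs depends on the motif coordinates, which are not listed here.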
The signature of positive selection observed on the cagA gene indicates that the cagA protein is undergoing adaptive evolution in some strains, and is beneficial to the bacteria. Differences in rates of adaptation imply host-specific differences. The benefit to the bacteria is mediated via the role of cagA within the pathogenicity island; the specific role of cagA, and that of the PaI, remain to be determined. In general, PaIs have a role in promoting survival of bacterial pathogens. The positive selection observed on the cagA oncogene is unusual, as it is the first case observed of positive selection on an oncogene in a vertically transmitted pathogen. Positive selection is a feature of antagonistic coevolution, which implies harmful effects on the host, but also of mutualistic coevolution, which implies benefits. Positive selection has been observed on the Epstein-Barr virus-encoded oncogene LMP1 and the human papillomavirus type 16 oncogene [65, 66]; however, these are horizontally transmitted pathogens, where a balance is expected between virulence and transmissibility. This may imply that H.pylori has been horizontally transmitted to a greater extent than previously recognized.
In addition, potential beneficial effects of cagA at the population level via elimination of the elderly have been suggested (this explanation relies on the theory of inclusive fitness). This essentially views cagA as a gene that enhances intrinsic mortality in old individuals; however, it is unclear whether intrinsic mortality in a subgroup of the population has ever been selected for. While H.pylori has largely been considered a pathogen, there is increasing evidence of its benefits to human health. For instance, H.pylori has a beneficial role in preventing esophageal cancer, by reducing acid reflux [71, 72]. However, in the past this is unlikely to have provided much evolutionary benefit to the human population, given that over 90% of patients are over 55, while before the 20th century the average life expectancy of human populations was less than 40. The strongest inverse correlation between esophageal cancer occurrence and infection with H.pylori is in East Asia, attributed to the highly interactive (eastern) form of cagA, which causes pan- and corpus-predominant gastritis and reduces acid production. There is also an inverse relationship between H.pylori and asthma and allergies [74–76], obesity and infant diarrhea. Asthma and obesity are modern illnesses, so are unlikely to have played a role in the evolutionary dynamics of the bacteria.
Ulcers are a modern disease, while gastric cancer has been recorded since ancient times. However, gastric cancer is most prevalent in those aged 55 and over, which indicates that historically it is unlikely to have exerted a strong selective pressure, given that before the 20th century the average life expectancy was considerably lower. These considerations lead to the conclusion that the cagA gene is insufficiently deleterious to the human host, that the cagA protein has a beneficial component to the host, or that horizontal transmission has been an important feature of H.pylori in the recent past. There is increasing evidence that in developing countries, horizontal transmission of H.pylori occurs due to poor sanitary conditions [80, 81]. If there is (or has been) significant horizontal transmission, then there may be population-specific differences in the amount of horizontal transmission, which may have led to differences in selective pressures on the pathogen.
H.pylori has been utilized as a model for infective carcinogenesis, and is a model of pathogen evolution. The results of this work suggest that the cagA gene is insufficiently deleterious to the human host, that the cagA protein has a benefit to the host, or that horizontal inheritance has affected the evolutionary dynamics of the bacteria more than previously recognized. The results reported here offer an insight into important aspects of microbe-host coevolution.
This work was supported by the Biology Department, University of Puerto Rico - Rio Piedras.
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.