
The endemic Helicobacter pylori population in Southern Vietnam has both South East Asian and European origins



Abstract

Background

The burden of Helicobacter pylori-induced gastric cancer varies with the predominant H. pylori population in different geographical regions. Vietnam is a high H. pylori burden country with the highest age-standardized incidence rate of gastric cancer (16.3 cases/100,000 for both sexes) in Southeast Asia, yet data on its H. pylori population are scarce. We examined the global context of the endemic H. pylori population in Vietnam and present a contextual and comparative genomic analysis of 83 H. pylori isolates from patients in Vietnam.


Results

At least two major H. pylori populations are circulating in symptomatic Vietnamese patients. The majority of the isolates (~ 80%, 66/83) belong to the hspEastAsia population and the remainder (~ 20%, 17/83) to the hpEurope population. In total, 66 of the 83 isolates were cagA positive: 64 hspEastAsia isolates and two hpEurope isolates. Examination of the second repeat region revealed that most of the cagA genes were of the ABD type (63/66; 61 hspEastAsia and two hpEurope isolates). The remaining three isolates (all hspEastAsia) were of the ABC or ABCC type. We also detected EPIYA-like sequences (ESIYA) in the EPIYA-B segments of 4.5% (3/66) of the cagA genes from hspEastAsia isolates. Analysis of the vacA allelic type revealed that 98.8% (82/83) and 41% (34/83) of the strains harboured the s1 and m1 allelic variants, respectively; 34/83 carried both the s1 and m1 alleles. The most frequent genotypes among the cagA-positive isolates were vacA s1m1/cagA+ and vacA s1m2/cagA+, accounting for 51.5% (34/66) and 48.5% (32/66) of the isolates, respectively.


Conclusions

Two predominant lineages of H. pylori are circulating in Vietnam, and most of the isolates belong to the hspEastAsia population. The hpEurope isolates are further divided into two smaller clusters.


Background

Helicobacter pylori is an important human pathogen that is likely to be present in the gastric mucosa of over half of the world’s population. The prevalence of H. pylori infection appears to be higher in low- and middle-income countries than in developed countries, and prevalence often varies between ethnic groups within a country [1, 2]. Such localised differences might be attributable to socioeconomic factors [4,5,6], although factors related to H. pylori itself may also contribute. The prevalence of infection in Asia and Africa is 54.7% and 79.1%, respectively. In North and South America the prevalence is 37.1% and 63.4%, respectively, and in Europe the prevalence is on average 47.0% [3]. Prevalence differences between racial and ethnic groups have been described in various parts of the world, but the extent to which such differences can be attributed to socioeconomic and other possible risk factors is unclear [4,5,6]. Vietnam is the easternmost mainland country in Southeast Asia, with an estimated population of 96 million (2019, UNFPA-VN) comprising more than 50 ethnic groups of different cultures; ~ 65% of these groups are located exclusively in remote or rural areas (2019, UNFPA-VN) [7, 8]. Earlier studies in both hospital and community settings showed a high prevalence of H. pylori infection in Vietnam [9,10,11]. Because socioeconomic status and lifestyle vary considerably across a rapidly changing Vietnam, this study, building on previous studies [9,10,11,12,13], investigates the risk associated with H. pylori infection in a major urban community in southern Vietnam. Importantly, this study examines the international context of the H. pylori present in Vietnam in relation to the major H. pylori populations.

H. pylori has undergone localized co-evolution with humans for more than 60,000 years [14]. The distribution of H. pylori populations is strongly associated with human migration, and the populations are named after the geographic regions historically associated with particular human populations [15, 16]. This distribution reflects the epidemiology of the organism, which is exclusively associated with humans and is transmitted in a highly localized, almost vertical, manner. Importantly, the incidence and severity of gastric disease associated with H. pylori infection are linked to particular H. pylori genetic types in particular regions of the world. For instance, the incidence of gastric cancer is higher in East Asian countries such as Japan and Korea than in European and North American countries [17].

The cytotoxin-associated gene pathogenicity island (CagPAI) is one of the major virulence determinants of H. pylori. Several virulence genes in the CagPAI trigger abnormal cellular signals in the host, and this abnormal cell signalling is likely to contribute to H. pylori-associated disease, including gastric cancer (GC). The cagA gene, located in the CagPAI, is an important virulence factor and plays a key role in pathogenesis. cagA is not present in all H. pylori strains: more than 90% of H. pylori isolates from East Asian countries carry cagA, compared with 50–70% of isolates from Western countries [18, 19]. Moreover, studies of H. pylori isolates from East Asia showed that individuals carrying cagA-positive strains have an increased risk of peptic ulcer disease (PUD) and/or GC compared with individuals in Western countries carrying cagA-positive strains [20,21,22]. Functionally, the CagA protein activates several signal transduction pathways and binds to and disrupts epithelial junctions, leading to aberrations in tight junction function, cell polarity and cell differentiation in the host [23].

The H. pylori vacuolating cytotoxin A, encoded by the vacA gene, is endocytosed by host cells and causes changes including membrane channel formation, which results in cytochrome c release that initiates apoptosis and a pro-inflammatory response [24]. Particular allelic variants of vacA and cagA are associated with H. pylori-associated disease sequelae. Allelic types are associated with H. pylori populations and probably represent host-specific adaptive changes [25]. The typing scheme used for vacA is based on the signal (s) and middle (m) regions of the gene, with two types defined for each region: alleles s1 and s2, and m1 and m2, respectively. In vitro experiments showed that s1m1 strains induce cell vacuolation more frequently than s1m2 or s2m2 strains, from which it was inferred that s1m1 strains are more cytotoxic [26].
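
The vacA typing scheme combines one signal-region allele with one middle-region allele, giving four possible genotypes. The Python sketch below illustrates only that bookkeeping; the function names are hypothetical, and the example allele calls are simply a list that reproduces the genotype totals reported in the Results (34 s1m1, 48 s1m2, 1 s2m2), not the study's actual genotyping procedure.

```python
from collections import Counter
from itertools import product

SIGNAL_ALLELES = ("s1", "s2")  # signal-region alleles
MIDDLE_ALLELES = ("m1", "m2")  # middle-region alleles


def vaca_genotype(signal: str, middle: str) -> str:
    """Combine a signal-region and a middle-region allele into a label such as 's1m1'."""
    if signal not in SIGNAL_ALLELES or middle not in MIDDLE_ALLELES:
        raise ValueError(f"unrecognised vacA alleles: {signal!r}, {middle!r}")
    return signal + middle


def tabulate_genotypes(calls) -> dict[str, int]:
    """Count genotypes from (signal, middle) pairs, listing all four combinations."""
    counts = Counter(vaca_genotype(s, m) for s, m in calls)
    # Counter returns 0 for genotypes that were never observed.
    return {s + m: counts[s + m] for s, m in product(SIGNAL_ALLELES, MIDDLE_ALLELES)}


# Hypothetical allele calls matching the reported totals.
calls = [("s1", "m1")] * 34 + [("s1", "m2")] * 48 + [("s2", "m2")]
print(tabulate_genotypes(calls))
# {'s1m1': 34, 's1m2': 48, 's2m1': 0, 's2m2': 1}
```

Listing all four combinations, including the unobserved s2m1, makes absent genotypes explicit in a summary table rather than silently omitted.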

Vietnam has the highest age-standardized incidence rate (ASR) of GC in Southeast Asia (16.3 cases/100,000 for both sexes; GLOBOCAN 2012). Previous studies have also reported a high prevalence of H. pylori infection in Vietnam and its association with peptic ulcer disease, active gastritis, atrophy and intestinal metaplasia [27]. As part of this prospective cross-sectional study, we used isolate genome sequencing to investigate the H. pylori population types circulating in symptomatic Vietnamese patients. The genomic relationships between isolates and the typing of the cagA and vacA genes (derived from each isolate's genome sequence) provide key baseline information for identifying bacterial risk factors for H. pylori-associated disease in Vietnam and for comparing these risk factors with those for H. pylori-associated disease in other parts of the world.

Materials and methods

Patient and specimen collection

We conducted a prospective cross-sectional study among patients attending the Gastroenterology Department of Gia Dinh Hospital, Ho Chi Minh City, Vietnam, from August 2016 to February 2017. Patients were not randomly selected; only patients with symptoms of upper gastrointestinal discomfort, heartburn, or gastric or duodenal ulcer were eligible for enrolment. Candidate patients were informed about the study procedure, and written informed consent was obtained for participation. Sociodemographic and clinical information was collected for each patient using a structured questionnaire at the time of clinical presentation. An endoscopic examination was performed by a trained clinician, and two biopsy specimens (one from the gastric antrum and one from the corpus) were collected from each patient using well-washed and disinfected fibre-optic endoscopes (model GIF XQ 30; Olympus, Japan). The biopsy specimens were transported to the laboratory in Stuart transport medium at 4 °C.

Isolation of H. pylori

Biopsy samples were vortexed vigorously for 5 min and plated on Brain Heart Infusion (BHI) agar (Oxoid Ltd, Hampshire, United Kingdom) supplemented with 7.5% sheep blood, 0.4% Isovitalex and H. pylori Dent supplement (Oxoid, United Kingdom). Plates were incubated at 37 °C in an atmosphere of 5% O2, 15% CO2 and 80% N2 for 3 to 7 days. H. pylori colonies were identified based on their typical morphology, characteristic appearance on Gram staining and a positive urease test, and were subsequently confirmed by MALDI-TOF (Bruker, Germany). Isolates were stored at −80 °C in 0.5 ml of BHI broth with 20% glycerol.

Genomic DNA extraction and genome sequencing

Revived isolates were subcultured on selective BHI solid medium containing 7.5% sheep blood and 0.4% Isovitalex under microaerophilic conditions (5% O2, 15% CO2, 80% N2) at 37 °C for 3–5 days [28]. Genomic DNA was prepared from confluent growth using a commercial DNA extraction kit (Qiagen DNA Mini kit, Germany). Genomic libraries were prepared using the Nextera DNA sample preparation kit (Illumina, San Diego, USA). Libraries were sequenced on the Illumina MiSeq instrument using the V3 600-cycle paired-end kit (Illumina, CA, USA). Readsets for isolates sequenced as part of this study are available at the National Center for Biotechnology Information (NCBI) under BioProject PRJNA689207.

Bacterial genome assembly and annotation

Sequences were analysed using the Nullarbor pipeline. In brief, low-quality bases and adaptor contamination were trimmed with Trimmomatic [29], and readsets with at least 35× read depth of coverage were retained for analysis. Isolate purity was evaluated with Kraken (v0.10.5) [30]. SPAdes (v3.9.0) [31] and Prokka (v1.12) [32] were used for de novo assembly and genome annotation, respectively. We used tRNAScan and RNAmmer to identify tRNA and rRNA genes in the draft genomes, respectively [33, 34]. Phage-related regions were identified using the PHASTER tool [35].
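
The 35× retention rule can be illustrated with a common approximation of mean read depth: total sequenced bases divided by genome size. The Python sketch below assumes that approximation and a genome size of 1.6 Mb (the average draft genome size reported in the Results); Nullarbor's own depth estimate may be computed differently, and the isolate names and base counts are hypothetical.

```python
GENOME_SIZE = 1_600_000  # bp; assumption based on the average draft genome size
MIN_DEPTH = 35.0         # retention threshold stated in the text


def estimated_depth(total_bases: int, genome_size: int = GENOME_SIZE) -> float:
    """Approximate mean read depth as sequenced bases divided by genome size."""
    return total_bases / genome_size


def retained_readsets(readsets: dict[str, int], min_depth: float = MIN_DEPTH) -> list[str]:
    """Return the IDs of readsets whose estimated depth meets the threshold."""
    return [rid for rid, bases in readsets.items() if estimated_depth(bases) >= min_depth]


# Hypothetical trimmed base counts per isolate.
readsets = {"iso1": 80_000_000, "iso2": 40_000_000, "iso3": 56_000_000}
for rid, bases in readsets.items():
    print(rid, round(estimated_depth(bases), 1))  # 50.0, 25.0 and 35.0
print(retained_readsets(readsets))  # ['iso1', 'iso3']
```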

Phylogenetic analysis

Forty-two reference H. pylori genome sequences representing selected H. pylori populations were downloaded from NCBI; details are shown in Additional file 1: Table S1. Reads from the reference strains and the study isolates were aligned to the H. pylori strain 26695 (Accession: NC_000915) reference genome sequence using the Burrows-Wheeler Aligner MEM algorithm (v0.7.15-r1140) [36] as implemented in Snippy, and the core genome alignment was used to construct an SNP-based phylogenetic tree with FastTree [37]. SNPs were identified using Freebayes (v1.0.2) under a haploid model, with a minimum depth of coverage of 10× and a minimum allele frequency of 0.9 required to confidently call an SNP [38]. The phylogenetic tree was visualized using MEGA-X [39].
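
The SNP-calling thresholds above (haploid model, at least 10× depth, allele frequency of at least 0.9) can be expressed as a simple per-site filter. The Python sketch below is an illustration only: it applies the two thresholds to hypothetical read counts and does not reproduce Freebayes' Bayesian genotype model; the `Site` record and the `call_haploid` function are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional

MIN_DEPTH = 10       # minimum read depth at a site
MIN_FRACTION = 0.9   # minimum fraction of reads supporting the called base


@dataclass
class Site:
    """Hypothetical per-site read counts from an alignment to the reference."""
    pos: int
    ref: str
    alt: str
    depth: int      # total reads covering the site
    alt_reads: int  # reads supporting the alternative base


def call_haploid(site: Site) -> Optional[str]:
    """Return the called base for a haploid genome, or None if the site fails the filters."""
    if site.depth < MIN_DEPTH:
        return None  # insufficient coverage
    alt_fraction = site.alt_reads / site.depth
    if alt_fraction >= MIN_FRACTION:
        return site.alt  # confident SNP
    if 1 - alt_fraction >= MIN_FRACTION:
        return site.ref  # confident reference base
    return None  # mixed support: not called under a haploid model


sites = [
    Site(1001, "A", "G", depth=8, alt_reads=8),    # too shallow
    Site(2002, "C", "T", depth=20, alt_reads=19),  # SNP
    Site(3003, "G", "A", depth=20, alt_reads=1),   # reference base
    Site(4004, "T", "C", depth=20, alt_reads=10),  # mixed support
]
for s in sites:
    print(s.pos, call_haploid(s))
```

Because H. pylori is haploid, a site with roughly equal support for two bases is treated as uncallable rather than heterozygous.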

Core genome and pan-genome analysis

OrthoMCL was used to identify orthologous clusters from the predicted protein sequences of each study isolate (minimum protein length of 50 amino acids; identity and e-value thresholds of 70% and 0.00001, respectively) [40]. The identified clusters were aligned against the EggNOG database to predict a functional category. Clusters containing proteins with more than one domain of distinct categories were assigned multiple categories, and proteins that could not be classified were assigned to category S (hypothetical). The functional categories, and the categorized strain-specific genes, were represented graphically using R.
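
The clustering thresholds and category-assignment rules above can be sketched as two small Python functions. This is an assumption-laden illustration, not the OrthoMCL or EggNOG implementation: `Hit` and its fields are hypothetical, and each protein's functional annotation is assumed to be a string of one-letter category codes (empty when unclassified), so a multi-domain protein annotated "KT" contributes both K and T.

```python
from dataclasses import dataclass

# Thresholds stated in the text.
MIN_LENGTH_AA = 50    # minimum protein length (amino acids)
MIN_IDENTITY = 70.0   # minimum percent identity
MAX_EVALUE = 1e-5     # maximum e-value


@dataclass
class Hit:
    """A hypothetical pairwise similarity hit between proteins of two isolates."""
    query: str
    subject: str
    query_length: int  # length of the query protein in amino acids
    identity: float    # percent identity of the alignment
    evalue: float


def keep_hit(hit: Hit) -> bool:
    """True if the hit passes the length, identity and e-value thresholds."""
    return (
        hit.query_length >= MIN_LENGTH_AA
        and hit.identity >= MIN_IDENTITY
        and hit.evalue <= MAX_EVALUE
    )


def cluster_categories(protein_codes: dict[str, str]) -> set[str]:
    """Union of one-letter categories over a cluster's proteins; unclassified clusters get 'S'."""
    categories: set[str] = set()
    for code in protein_codes.values():
        categories.update(code)  # each letter of a multi-letter code is its own category
    return categories or {"S"}


hits = [
    Hit("isoA_0001", "isoB_0001", 320, 98.4, 1e-150),
    Hit("isoA_0002", "isoB_0107", 42, 95.0, 1e-20),   # protein too short
    Hit("isoA_0003", "isoB_0450", 210, 55.0, 1e-30),  # identity too low
]
print([h.query for h in hits if keep_hit(h)])
print(cluster_categories({"p1": "J", "p2": "J"}))
print(cluster_categories({"p1": "KT"}))
print(cluster_categories({"p1": "", "p2": ""}))
```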

Identification of virulence-associated genes and cag pathogenicity island

H. pylori virulence genes were obtained from VFDB [41] and detected using Abricate with a minimum of 80% sequence identity and 90% gene coverage [42]. Virulence gene distribution across isolates was visualised using Phandango. A visual overview of differences in gene content was obtained using the BLAST Ring Image Generator (BRIG) [43], with isolate genome sequences aligned against the cagPAI of H. pylori strain 26695 (typical hpEurope) or strain F57 (typical hspEastAsia).
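
The detection thresholds above can be sketched as a per-gene classification that also mirrors the three colour classes of the virulence-gene heatmap (not detected, detected below 90% coverage, detected at or above 90% coverage). In Abricate itself these thresholds are set with the `--minid` and `--mincov` options; the Python below only post-processes hypothetical hit records (percent identity and percent coverage per gene), and its function names and example values are assumptions.

```python
from typing import Optional

MIN_IDENTITY = 80.0  # percent sequence identity (from the text)
MIN_COVERAGE = 90.0  # percent gene coverage (from the text)


def gene_status(hit: Optional[dict]) -> str:
    """Classify one VFDB gene in one genome from its best hit (None if there is no hit)."""
    if hit is None or hit["identity"] < MIN_IDENTITY:
        return "not detected"
    if hit["coverage"] < MIN_COVERAGE:
        return "partial"  # identity passes but coverage is below the threshold
    return "detected"


def presence_matrix(hits_by_isolate: dict, genes: list[str]) -> dict:
    """Build {isolate: {gene: status}} from {isolate: {gene: hit}}."""
    return {
        isolate: {gene: gene_status(hits.get(gene)) for gene in genes}
        for isolate, hits in hits_by_isolate.items()
    }


genes = ["cagA", "vacA", "ureA"]
hits_by_isolate = {  # hypothetical values
    "iso1": {"cagA": {"identity": 93.1, "coverage": 100.0},
             "vacA": {"identity": 91.0, "coverage": 99.5},
             "ureA": {"identity": 97.2, "coverage": 100.0}},
    "iso2": {"vacA": {"identity": 90.4, "coverage": 72.0},
             "ureA": {"identity": 96.8, "coverage": 100.0}},
}
matrix = presence_matrix(hits_by_isolate, genes)
for isolate, row in matrix.items():
    print(isolate, row)
```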

Statistical analysis

Data analysis was performed using the Statistical Package for Social Science (SPSS) software (IBM SPSS Statistics 23, NY, USA). Baseline descriptive statistics were summarized for the variables of interest. Comparisons between groups were performed using the chi-squared or Fisher’s exact test for categorical variables, and the t-test or Mann–Whitney U-test for continuous variables. A two-sided P value of < 0.05 was considered statistically significant.
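
For small categorical counts, Fisher's exact test computes a P value from the hypergeometric distribution of 2 × 2 tables with fixed margins. The dependency-free Python sketch below implements the common two-sided definition (summing the probabilities of all tables no more likely than the observed one); it is an illustration, not the SPSS implementation. The example compares CagPAI carriage between hspEastAsia (64 positive, 2 negative) and hpEurope (2 positive, 15 negative) isolates using counts from the Results, and is shown for illustration only.

```python
from fractions import Fraction
from math import comb


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]]."""
    row1, row2 = a + b, c + d
    col1 = a + c
    total = row1 + row2
    denom = comb(total, col1)

    def table_prob(x: int) -> Fraction:
        # Hypergeometric probability that the top-left cell equals x, given fixed margins.
        return Fraction(comb(row1, x) * comb(row2, col1 - x), denom)

    observed = table_prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Exact rational arithmetic avoids floating-point ties when comparing probabilities.
    p = sum((pr for pr in (table_prob(x) for x in range(lo, hi + 1)) if pr <= observed), Fraction(0))
    return float(p)


# Rows: population; columns: CagPAI positive, CagPAI negative.
p_value = fisher_exact_two_sided(64, 2, 2, 15)
print(f"P = {p_value:.2e}; significant at two-sided P < 0.05: {p_value < 0.05}")
```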

Ethics statement

The ethical review committee of the National University Ho Chi Minh City, Vietnam approved the study (Approval No: 702/DHQG-KHCN). Written informed consent was mandatory for patient enrolment in the study. For patients < 18 years, written informed consent was obtained from a parent or guardian.


Results

Patient population

One hundred sixty-one patients were enrolled in the study from August 2016 to February 2017, of whom 44.7% (72/161) were male. The median age was 39.4 years (interquartile range (IQR) 32–48 years). Among the patients, 51.6% (83/161) presented with epigastralgia, 31.7% (51/161) with abdominal fullness and 23.0% (37/161) with indigestion. On endoscopic examination, 95.7% of patients had gastric inflammation, including congestion (74.5%, 120/161), erosion (37.9%, 61/161) and oedema (26.1%, 42/161) (Additional file 2: Table S2). Overall, 57.1% (92/161) of patients had a primary infection (diagnosed with H. pylori infection for the first time) and 42.9% (69/161) had a secondary infection (i.e. a previous history of H. pylori infection). There was no difference in age, sex, smoking, alcohol consumption, clinical symptoms or endoscopic findings between patients with primary and secondary infection, although patients with secondary infection reported a higher number of symptoms. Of the 161 H. pylori-positive biopsy samples, 156 tested positive by rapid urease test and five by H. pylori antigen test. H. pylori was initially cultured from 59% (95/161) of patients, although only 87.4% (83/95) of these isolates could be revived and analysed.

Genome characteristics

Summaries of the read data sets and draft genomes for each of the 83 H. pylori isolates are presented in Table 1. The read depth of coverage for each isolate ranged from 38× to 456×. The draft genome sequences comprised between 16 and 83 contigs. The average genome size was 1.6 Mb, with a G + C content of 38.94%. The annotated genome sequence of each isolate contained between 1,451 and 1,589 protein-coding sequences (CDS), with ~ 92% of the genome used for protein coding.
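
The G + C content and coding-density figures above follow from simple counts. The Python sketch below shows one way to compute them: G + C content over unambiguous bases only, and coding density as the percentage of genome positions covered by at least one CDS after merging overlapping CDS intervals. The input sequence and CDS coordinates are toy values; for a multi-contig draft genome the same counts would be summed across contigs.

```python
def gc_content(seq: str) -> float:
    """Percent G + C among unambiguous bases (A, C, G, T); other symbols such as N are ignored."""
    seq = seq.upper()
    acgt = sum(seq.count(base) for base in "ACGT")
    if acgt == 0:
        raise ValueError("sequence contains no A, C, G or T")
    return 100.0 * (seq.count("G") + seq.count("C")) / acgt


def coding_density(genome_length: int, cds: list[tuple[int, int]]) -> float:
    """Percent of genome positions covered by at least one CDS.

    CDS intervals are 1-based and inclusive (start, end); overlapping or
    adjacent intervals are merged so shared positions are counted once.
    """
    covered = 0
    cur_start = cur_end = None
    for start, end in sorted(cds):
        if cur_end is not None and start <= cur_end + 1:
            cur_end = max(cur_end, end)  # extend the current merged interval
            continue
        if cur_end is not None:
            covered += cur_end - cur_start + 1
        cur_start, cur_end = start, end
    if cur_end is not None:
        covered += cur_end - cur_start + 1
    return 100.0 * covered / genome_length


print(gc_content("GGCCAATTNN"))  # 50.0
print(coding_density(400, [(1, 100), (90, 150), (201, 300)]))  # 62.5
```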

Table 1 Genome statistics of the whole-genome sequences of the 83 H. pylori isolates in this study

A single, incomplete phage-associated region (8.1–13.5 kb) was detected in 17% (14/83) of the draft genome sequences. Each phage region contains between nine and 14 CDSs, which encode phage-related proteins together with a putative restriction-modification protein, TMP kinase, PcrA helicase, putative transposase or other hypothetical proteins (Additional file 3: Table S3).

Core and pan-genome analysis

The core- and pan-genome analysis by OrthoMCL identified 1,194 orthologous clusters (core genome) from the 119,366 annotated proteins in the 83 isolates. Of these, 1,070 orthologous clusters were assigned functional categories using the EggNOG database (Fig. 1a). A high proportion of the classified clusters belonged to functional categories J (translation, ribosomal structure and biogenesis; 12.7%, 136/1,070) and M (cell wall/membrane/envelope biogenesis; 7.7%, 83/1,070). Proteins with no orthologues were detected in a small number of isolates: 26% (31/83) of isolates contained one or two proteins of this type. Most of these unique proteins belonged to functional categories V (defence mechanisms) or S (hypothetical) (Fig. 1b).
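
The split between core orthologous clusters and isolate-specific proteins can be illustrated with a presence/absence table. The Python sketch below assumes a strict definition (a core cluster is present in every isolate, a unique cluster in exactly one); the cluster and isolate names are hypothetical, and OrthoMCL output would first need to be parsed into this form.

```python
def partition_clusters(presence: dict[str, set[str]], isolates: set[str]):
    """Split clusters into core (all isolates), unique (one isolate) and accessory (the rest)."""
    core = {c for c, members in presence.items() if members >= isolates}
    unique = {c for c, members in presence.items() if len(members) == 1} - core
    accessory = set(presence) - core - unique
    return core, accessory, unique


isolates = {"isoA", "isoB", "isoC"}
presence = {  # hypothetical cluster -> isolates in which it occurs
    "cluster1": {"isoA", "isoB", "isoC"},
    "cluster2": {"isoA", "isoB"},
    "cluster3": {"isoC"},
}
core, accessory, unique = partition_clusters(presence, isolates)
print(sorted(core), sorted(accessory), sorted(unique))
# ['cluster1'] ['cluster2'] ['cluster3']
```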

Fig. 1

A Functional classification of the 1,194 core orthologous clusters produced by OrthoMCL from the set of predicted proteins encoded by the genome sequence of each of the 83 H. pylori study isolates. B Functional classification of the 28 isolate-specific genes identified by comparing the protein-coding capacity of the 83 study isolates. The X-axis shows the number of genes in each functional class listed on the Y-axis.

Phylogenetic analysis

The genomic relationship between the 83 study isolates and 42 reference genome sequences of known H. pylori population type was inferred from the core genome, using H. pylori strain 26695 (Accession: NC_000915) as the reference genome sequence for read mapping. The tree in Fig. 2 summarises the relationships between isolates. Based on their core genome relationship with the 42 classified isolates, 80% (66/83) of the isolates belonged to the H. pylori hspEastAsia population and the remaining 20% (17/83) to the hpEurope population (Fig. 2).
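
The population assignments above come from each isolate's position in the core-genome tree relative to classified references. A much cruder proxy, shown only to illustrate the idea, is to label each isolate with the population of the reference genome at the smallest pairwise SNP distance in the core alignment. This nearest-reference rule is a swapped-in simplification, not the method used in the study, and the reference names and short aligned sequences are hypothetical.

```python
UNAMBIGUOUS = set("ACGT")


def snp_distance(a: str, b: str) -> int:
    """Count positions where both aligned sequences have an unambiguous base and they differ."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(1 for x, y in zip(a, b) if x in UNAMBIGUOUS and y in UNAMBIGUOUS and x != y)


def assign_population(query: str, references: list[tuple[str, str, str]]) -> tuple[str, str, int]:
    """Return (reference name, population, distance) of the closest reference; ties go to the first listed."""
    best = min(references, key=lambda ref: snp_distance(query, ref[2]))
    return best[0], best[1], snp_distance(query, best[2])


references = [  # hypothetical (name, population, aligned core sequence)
    ("refEA", "hspEastAsia", "ACGTACGT"),
    ("refEU", "hpEurope", "TCGAACTT"),
]
print(assign_population("ACGTACGA", references))
# ('refEA', 'hspEastAsia', 1)
```

Unlike tree-based placement, this rule ignores the structure of the whole phylogeny and can mislabel recombinant isolates, which is why it is only an illustration.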

Fig. 2

A tree showing the core genome relationship between the 83 Vietnamese H. pylori isolates and 42 H. pylori reference genomes. The 83 Vietnamese isolates are indicated by black terminal branches, while classified isolates are shown with coloured terminal branches as follows: hspEastAsia (blue), hpEurope (brown), hspWAfrica (pink), hpNEAfrica (purple), hspAmerind (orange) and hpAsia2 (green). The tree was inferred using the core genome comparison method as implemented in Nullarbor, with H. pylori strain 26695 (Accession: NC_000915) used as the reference genome sequence for read mapping. The tree was modified using tools available in FigTree and MEGA-X.

Virulence factors

Virulence factor detection using VFDB showed that 80% (66/83) of the Vietnamese isolates harboured between 110 and 113 virulence genes, including all CagPAI genes and vacA, whereas the remaining 20% (17/83) of isolates contained 83 to 92 virulence genes. The latter group usually lacked the cag1 to cag3 and cagA to cagZ genes of the CagPAI. Genes encoding urease, most of the flagella-associated proteins, some endotoxins and most of the Lewis antigen biosynthesis enzymes, such as FutB, FutC and NeuA/FlmD, were detected in all isolates (Fig. 3).

Fig. 3

At left, a tree shows the core genome relationship between the 83 Vietnamese isolates; the virulence gene content of each isolate is colour coded at right. Virulence genes are those present in VFDB and were detected using Abricate. Green indicates genes detected with less than 90% gene coverage, orange indicates genes detected with greater than 90% gene coverage, and purple indicates that the gene was not detected.

The virulence properties of the isolates are presented in Table 2. A complete CagPAI was present in 80% (66/83) of the genomes; of these CagPAI-positive isolates, 97% (64/66) belonged to the hspEastAsia population and the remaining 3% (2/66) to the hpEurope population (Table 2). Of the 17 hpEurope isolates, 15 were CagPAI negative. Most of the CagPAI-positive hspEastAsia and hpEurope isolates lacked an orthologue of the DNA helicase (HP0548) present in the Western-type CagPAI of H. pylori strain 26695 (Fig. 4).

Table 2 H. pylori virulence factors (cagA and vacA) in study isolates

Fig. 4

Comparison of the genetic organization of the cagPAI of Vietnamese H. pylori isolates with a Western-type cagPAI (H. pylori strain 26695). The innermost blue ring shows the strain 26695 sequence, with the hpEurope-classified Vietnamese isolates shown as yellow rings and the hspEastAsia Vietnamese isolates shown as pink rings. The figure was constructed using BRIG.

Sequence analysis of the second repeat region of the cagA gene revealed that 95% (63/66) of cagA genes, including those of the two hpEurope isolates, were of the ABD type, while the remaining three (all hspEastAsia) were of the EPIYA-ABC or EPIYA-ABCC type (Table 2). The ABD-type second repeat region in the two hpEurope isolates is atypical for hpEurope strains. We also found that 5% (3/63) of isolates carrying an East Asian-type cagA contained an EPIYA-like sequence, ESIYA, in the EPIYA-B segment. Three vacA types were detected among the Vietnamese isolates: 34 isolates were s1m1, 48 were s1m2 and one was s2m2. The most frequent genotypes among the cagA-positive isolates were vacA s1m1/cagA+ and vacA s1m2/cagA+, accounting for 51.5% (34/66) and 48.5% (32/66) of isolates, respectively.
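
EPIYA motifs and the EPIYA-like ESIYA variant described above can be located in a CagA amino-acid sequence with a regular expression. The Python sketch below is a simplified illustration: it only finds the EPIYA and ESIYA pentapeptides, assumes that segment labels (A, B, C, D), which depend on the flanking sequence, have been assigned separately, and uses a short toy sequence rather than a real CagA protein. The classification rule follows the text: a D segment marks East Asian-type cagA, while C segments without D mark Western-type cagA.

```python
import re

# EPIYA and the EPIYA-like ESIYA variant mentioned in the text.
EPIYA_LIKE = re.compile(r"E[PS]IYA")


def find_epiya(protein: str) -> list[tuple[int, str]]:
    """Return (1-based position, motif) for each EPIYA/ESIYA occurrence."""
    return [(m.start() + 1, m.group()) for m in EPIYA_LIKE.finditer(protein.upper())]


def classify_pattern(segments: str) -> str:
    """Classify an EPIYA segment pattern such as 'ABD' or 'ABCC'."""
    if "D" in segments:
        return "East Asian-type"
    if "C" in segments:
        return "Western-type"
    return "unclassified"


toy = "MKEPIYAKVNKKESIYAQVAKKEPIYATID"  # toy sequence, not a real CagA
print(find_epiya(toy))  # [(3, 'EPIYA'), (13, 'ESIYA'), (23, 'EPIYA')]
for pattern in ("ABD", "ABC", "ABCC"):
    print(pattern, classify_pattern(pattern))
```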


Discussion

H. pylori infection is associated with the development of gastric disease in the host. The frequency of infection and of disease varies across the world, and particular H. pylori genetic types in particular geographic regions are associated with disease. Developing effective strategies to manage H. pylori-associated disease relies on understanding local H. pylori populations; together with the significant H. pylori-associated disease burden in Vietnam, this highlights the important knowledge gap addressed by this study. Herein, we present genomic and epidemiological data for 83 Vietnamese H. pylori isolates. H. pylori was isolated from 59% (95/161) of biopsies from symptomatic patients, which is similar to the results of an earlier study of 270 randomly selected patients who underwent esophagogastroduodenoscopy at the endoscopy centres of two major hospitals in Hanoi and Ho Chi Minh City (the largest cities in Northern and Southern Vietnam, respectively) [27]. Our phylogenetic data show that most H. pylori isolates from symptomatic Vietnamese patients belong to the hspEastAsia population (80% of isolates). The dominance of the hspEastAsia population is consistent with the strong association between H. pylori populations and human migration [16], as historical and migration evidence suggests that the Vietnamese are more closely related to people from North Asia than to people from South Asia [44]. Moreover, migration between Vietnam and North Asia would have been influenced by the fact that Vietnam was under Chinese occupation for over a thousand years. Notably, a group of the Vietnamese isolates forms an exclusive clade within the hspEastAsia population, perhaps indicating that the Vietnamese were isolated from other South East Asian populations for an extended period; this is consistent with a study by Breurec et al. showing Khmer and Vietnamese isolates as deep-branching members of the hspEastAsia H. pylori population [45]. More extensive sampling of H. pylori in the region would be required to confirm a Vietnamese H. pylori subpopulation. The Vietnamese H. pylori isolates that belong to the hpEurope population are likely to have arisen through the French colonial occupation of Vietnam and other parts of South East Asia during the 19th and early 20th centuries. We also observe a small number of isolates that appear to be related to the representative hpNEAfrica or hspWAfrica isolates used in our comparative analysis (Fig. 2); alternatively, these isolates may be recombinant hybrids of the hspEastAsia and hpEurope strains now endemic in Vietnam [45].

The prevalence of H. pylori infection in adults in Vietnam has been reported as 50–80% in several studies, which is similar to Japan, Korea, China and other South Asian nations [9,10,11, 46, 47]. The genetic characteristics and diversity of Vietnamese H. pylori strains could contribute to the high incidence of gastric cancer in Vietnam. Evidence indicates that vacA isoforms and the type and number of EPIYA motifs in the cagA gene strongly influence the type and magnitude of histological damage to the gastric mucosa. For example, the vacA s1m1 genotype has been associated with intestinal metaplasia, severe inflammation and a high risk of gastric cancer [20, 48, 49]. In this study, the vacA s1m1 allelic combination was detected in 41% of isolates. In addition, East Asian-type cagA, which predominates in Vietnamese isolates, is more frequently associated with disease than Western-type cagA [20, 50, 51]. This study revealed a lower frequency of cagA than previous reports on Vietnamese H. pylori [52,53,54,55], which may contribute to the lower rates of gastric ulcer and gastric cancer observed in Vietnam. In dyspeptic patients from central Vietnam, the frequency of cagA+ strains was 84% [54]. Among H. pylori strains from Southern Vietnamese patients with gastric cancer and peptic ulcer, all strains were cagA positive [52]. In this study, cagA was frequently found with the vacA s1m1 allelic type (51.5%, 34/66), which is consistent with previous reports on isolates from South and North Vietnam [27, 55]. The most frequent EPIYA motif pattern in our isolates was ABD (95.5%; 63/66), which is similar to previous reports from Vietnamese patients with gastric disease [52, 55]. However, these frequencies differed in isolates from central Vietnam, where the vacA s1m1/cagA+ genotype was detected in 64.86% (48/74) of isolates and the cagA ABD motif was found in a lower proportion (91%) [54].

We observed that 88.2% (15/17) of hpEurope isolates lacked cagA, possibly having lost it during the course of evolution, while the two cagA-positive hpEurope isolates carried the ABD type EPIYA motif. The ABD type EPIYA motif pattern is atypical for hpEurope strains, in which the ABC type is more prevalent. The gene content and gene organization of the cagPAI are highly conserved. The phylogeny of most cagPAI genes, including cagA, was found to be similar to that of housekeeping genes, indicating that the cagPAI was probably acquired only once by H. pylori [56]. Recombination events during mixed infection have been identified as a major driving force behind allelic diversity in H. pylori; diversity in the cagPAI largely reflects that of H. pylori’s housekeeping genes, with cagPAI genes under diversifying or positive selection due to host polymorphisms, which could even result in modified host protein interactions [56]. Accordingly, hpEurope and hspEastAsia strains are expected to carry a Western and an East Asian cagA, respectively. A prominent example of amino acid diversity noted previously is the EPIYA motifs in the C-terminal half of CagA, which differ between Asian (hpAsia2; hspEastAsia) populations (type D) and all other populations [57]. The D type EPIYA repeat binds SHP-2 phosphatase more avidly than the other types [22]. Furthermore, Furuta et al. clarified the recombination-mediated routes of cagA evolution, providing a solid basis for a deeper understanding of its function in pathogenesis [58]. These observations suggest that the predominant host population may be applying selective pressure on Vietnamese hpEurope strains for the ABD type cagA that is normally observed in hspEastAsia lineage strains.


Conclusions

Our study confirmed the high prevalence of H. pylori infection and of the most virulent genotype combination, vacA s1m1/cagA+, in H. pylori isolates recovered from symptomatic Vietnamese patients, which may help explain the high incidence rate of gastric cancer in Vietnam. The genetic architecture of H. pylori strains isolated from symptomatic Vietnamese patients showed two predominant lineages, with the majority of isolates belonging to the hspEastAsia population and another group belonging to the hpEurope population. Interestingly, the hpEurope isolates are divided into two subclusters. Although phylogenetic resolution improves as the number of genes analysed increases, analyses of a limited number of genes cannot uncover more complex evolutionary events. Our study also has the limitation that almost all enrolled patients were in the early stages of gastric disease, so we could not explore the relationship between H. pylori genotypes and clinical outcomes.

Availability of data and materials

All sequence data are available at the National Center for Biotechnology Information (NCBI) under BioProject PRJNA689207. All other data and materials used for this publication are available under the OUCRU data sharing policy and can be requested at



Abbreviations

ASR: Age-standardized incidence rate

BHI: Brain heart infusion

CagPAI: Cytotoxin associated gene pathogenicity island

CDS: Coding sequences

GC: Gastric cancer

NCBI: National Center for Biotechnology Information

PUD: Peptic ulcer disease

SNP: Single nucleotide polymorphism

SPSS: Statistical Package for Social Science

VacA: Vacuolating cytotoxin A

UNFPA-VN: United Nations Population Fund-Vietnam

VFDB: Virulence Factor Database


  1. 1.

    Fox JG, Yan LL, Dewhirst FE, Paster BJ, Shames B, Murphy JC, Hayward A, Belcher JC, Mendes EN. Helicobacter bilis sp. nov., a novel helicobacter species isolated from bile, livers, and intestines of aged, inbred mice. J Clin Microbiol. 1995;33(2):445–54.

    CAS  PubMed  PubMed Central  Article  Google Scholar 

  2. 2.

    Eusebi LH, Zagari RM, Bazzoli F. Epidemiology of Helicobacter pylori infection. Helicobacter. 2014;19(Suppl 1):1–5.

    PubMed  Article  PubMed Central  Google Scholar 

  3. 3.

    Hooi JKY, Lai WY, Ng WK, Suen MMY, Underwood FE, Tanyingoh D, Malfertheiner P, Graham DY, Wong VWS, Wu JCY, et al. Global prevalence of helicobacter pylori infection: systematic review and meta-analysis. Gastroenterology. 2017;153(2):420–9.

    PubMed  Article  PubMed Central  Google Scholar 

  4. 4.

    Malaty HM, El-Kasabany A, Graham DY, Miller CC, Reddy SG, Srinivasan SR, Yamaoka Y, Berenson GS. Age at acquisition of Helicobacter pylori infection: a follow-up study from infancy to adulthood. Lancet. 2002;359(9310):931–5.

    PubMed  Article  PubMed Central  Google Scholar 

  5. 5.

    Kivi M, Johansson AL, Reilly M, Tindberg Y. Helicobacter pylori status in family members as risk factors for infection in children. Epidemiol Infect. 2005;133(4):645–52.

    CAS  PubMed  PubMed Central  Article  Google Scholar 

  6. 6.

    Epplein M, Signorello LB, Zheng W, Peek RM Jr, Michel A, Williams SM, Pawlita M, Correa P, Cai Q, Blot WJ. Race, African ancestry, and Helicobacter pylori infection in a low-income United States population. Cancer Epidemiol Biomarkers Prev. 2011;20(5):826–34.

    PubMed  PubMed Central  Article  Google Scholar 

  7. 7.

    Rheinlander T, Samuelsen H, Dalsgaard A, Konradsen F. Hygiene and sanitation among ethnic minorities in Northern Vietnam: does government promotion match community priorities? Soc Sci Med. 2010;71(5):994–1001.

    PubMed  Article  PubMed Central  Google Scholar 

  8. 8.

    Vietnam. Ban chỉ đạo Tỏ̂ng điè̂u tra dân só̂ và nhà ở trung ương. The 2009 Vietnam population and housing census. Central Population and Housing Census, Steering Committee: Hanoi; 2010.

    Google Scholar 

  9. 9.

    Hoang TT, Bengtsson C, Phung DC, Sorberg M, Granstrom M. Seroprevalence of Helicobacter pylori infection in urban and rural Vietnam. Clin Diagn Lab Immunol. 2005;12(1):81–5.

    CAS  PubMed  PubMed Central  Google Scholar 

  10. 10.

    Nguyen VB, Nguyen GK, Phung DC, Okrainec K, Raymond J, Dupond C, Kremp O, Kalach N, Vidal-Trecan G. Intra-familial transmission of Helicobacter pylori infection in children of households with multiple generations in Vietnam. Eur J Epidemiol. 2006;21(6):459–63.

    PubMed  Article  PubMed Central  Google Scholar 

  11. Nguyen BV, Nguyen KG, Phung CD, Kremp O, Kalach N, Dupont C, Raymond J, Vidal-Trecan G. Prevalence of and factors associated with Helicobacter pylori infection in children in the north of Vietnam. Am J Trop Med Hyg. 2006;74(4):536–9.

  12. Nguyen TV, Nguyen VB, et al. Prevalence and risk factors of Helicobacter pylori infection in Muong children in Vietnam. Ann Clin Lab Res. 2017;5:1.

  13. Nguyen LX. Epidemiological features of Helicobacter pylori infection in children of five different ethnics in mountainous village. J Med Res. 2007;55(6):146–53.

  14. Linz B, Balloux F, Moodley Y, Manica A, Liu H, Roumagnac P, Falush D, Stamer C, Prugnolle F, van der Merwe SW, et al. An African origin for the intimate association between humans and Helicobacter pylori. Nature. 2007;445(7130):915–8.

  15. Yamaoka Y. Helicobacter pylori typing as a tool for tracking human migration. Clin Microbiol Infect. 2009;15(9):829–34.

  16. Falush D, Wirth T, Linz B, Pritchard JK, Stephens M, Kidd M, Blaser MJ, Graham DY, Vacher S, Perez-Perez GI, et al. Traces of human migrations in Helicobacter pylori populations. Science. 2003;299(5612):1582–5.

  17. Bickenbach K, Strong VE. Comparisons of gastric cancer treatments: East vs West. J Gastric Cancer. 2012;12(2):55–62.

  18. Yamaoka Y, Kodama T, Kashima K, Graham DY, Sepulveda AR. Variants of the 3′ region of the cagA gene in Helicobacter pylori isolates from patients with different H. pylori-associated diseases. J Clin Microbiol. 1998;36(8):2258–63.

  19. Yamaoka Y, Orito E, Mizokami M, Gutierrez O, Saitou N, Kodama T, Osato MS, Kim JG, Ramirez FC, Mahachai V, et al. Helicobacter pylori in North and South America before Columbus. FEBS Lett. 2002;517(1–3):180–4.

  20. Jones KR, Joo YM, Jang S, Yoo YJ, Lee HS, Chung IS, Olsen CH, Whitmire JM, Merrell DS, Cha JH. Polymorphism in the CagA EPIYA motif impacts development of gastric cancer. J Clin Microbiol. 2009;47(4):959–68.

  21. Vilaichone RK, Mahachai V, Tumwasorn S, Wu JY, Graham DY, Yamaoka Y. Molecular epidemiology and outcome of Helicobacter pylori infection in Thailand: a cultural cross roads. Helicobacter. 2004;9(5):453–9.

  22. Azuma T, Yamakawa A, Yamazaki S, Ohtani M, Ito Y, Muramatsu A, Suto H, Yamazaki Y, Keida Y, Higashi H, et al. Distinct diversity of the cag pathogenicity island among Helicobacter pylori strains in Japan. J Clin Microbiol. 2004;42(6):2508–17.

  23. Backert S, Tegtmeyer N, Selbach M. The versatility of Helicobacter pylori CagA effector protein functions: the master key hypothesis. Helicobacter. 2010;15(3):163–76.

  24. Roesler BM, Rabelo-Goncalves EM, Zeitune JM. Virulence factors of Helicobacter pylori: a review. Clin Med Insights Gastroenterol. 2014;7:9–17.

  25. Kusters JG, van Vliet AH, Kuipers EJ. Pathogenesis of Helicobacter pylori infection. Clin Microbiol Rev. 2006;19(3):449–90.

  26. Yamaoka Y. Mechanisms of disease: Helicobacter pylori virulence factors. Nat Rev Gastroenterol Hepatol. 2010;7(11):629–41.

  27. Nguyen TL, Uchida T, Tsukamoto Y, Trinh DT, Ta L, Mai BH, Le SH, Thai KD, Ho DD, Hoang HH, et al. Helicobacter pylori infection and gastroduodenal diseases in Vietnam: a cross-sectional, hospital-based study. BMC Gastroenterol. 2010;10:114.

  28. Nahar S, Mukhopadhyay AK, Khan R, Ahmad MM, Datta S, Chattopadhyay S, Dhar SC, Sarker SA, Engstrand L, Berg DE, et al. Antimicrobial susceptibility of Helicobacter pylori strains isolated in Bangladesh. J Clin Microbiol. 2004;42(10):4856–8.

  29. Bolger AM, Lohse M, Usadel B. Trimmomatic: a flexible trimmer for Illumina sequence data. Bioinformatics. 2014;30(15):2114–20.

  30. Wood DE, Salzberg SL. Kraken: ultrafast metagenomic sequence classification using exact alignments. Genome Biol. 2014;15(3):R46.

  31. Bankevich A, Nurk S, Antipov D, Gurevich AA, Dvorkin M, Kulikov AS, Lesin VM, Nikolenko SI, Pham S, Prjibelski AD, et al. SPAdes: a new genome assembly algorithm and its applications to single-cell sequencing. J Comput Biol. 2012;19(5):455–77.

  32. Seemann T. Prokka: rapid prokaryotic genome annotation. Bioinformatics. 2014;30(14):2068–9.

  33. Lowe TM, Eddy SR. tRNAscan-SE: a program for improved detection of transfer RNA genes in genomic sequence. Nucleic Acids Res. 1997;25(5):955–64.

  34. Lagesen K, Hallin P, Rodland EA, Staerfeldt HH, Rognes T, Ussery DW. RNAmmer: consistent and rapid annotation of ribosomal RNA genes. Nucleic Acids Res. 2007;35(9):3100–8.

  35. Arndt D, Grant JR, Marcu A, Sajed T, Pon A, Liang Y, Wishart DS. PHASTER: a better, faster version of the PHAST phage search tool. Nucleic Acids Res. 2016;44(W1):W16-21.

  36. Li H, Durbin R. Fast and accurate short read alignment with Burrows-Wheeler transform. Bioinformatics. 2009;25(14):1754–60.

  37. Price MN, Dehal PS, Arkin AP. FastTree 2–approximately maximum-likelihood trees for large alignments. PLoS ONE. 2010;5(3):e9490.

  38. Garrison E, Marth G. Haplotype-based variant detection from short-read sequencing. arXiv preprint. 2012.

  39. Kumar S, Stecher G, Li M, Knyaz C, Tamura K. MEGA X: molecular evolutionary genetics analysis across computing platforms. Mol Biol Evol. 2018;35(6):1547–9.

  40. Li L, Stoeckert CJ Jr, Roos DS. OrthoMCL: identification of ortholog groups for eukaryotic genomes. Genome Res. 2003;13(9):2178–89.

  41. Chen L, Yang J, Yu J, Yao Z, Sun L, Shen Y, Jin Q. VFDB: a reference database for bacterial virulence factors. Nucleic Acids Res. 2005;33(Database issue):D325-328.

  42. Qumar S, Nguyen TH, Nahar S, Sarker N, Baker S, Bulach D, Ahmed N, Rahman M. A comparative whole genome analysis of Helicobacter pylori from a human dense South Asian setting. Helicobacter. 2020.

  43. Alikhan NF, Petty NK, Ben Zakour NL, Beatson SA. BLAST Ring Image Generator (BRIG): simple prokaryote genome comparisons. BMC Genomics. 2011;12:402.

  44. Pischedda S, Barral-Arca R, Gomez-Carballa A, Pardo-Seco J, Catelli ML, Alvarez-Iglesias V, Cardenas JM, Nguyen ND, Ha HH, Le AT, et al. Phylogeographic and genome-wide investigations of Vietnam ethnic groups reveal signatures of complex historical demographic movements. Sci Rep. 2017;7(1):12630.

  45. Breurec S, Guillard B, Hem S, Brisse S, Dieye FB, Huerre M, Oung C, Raymond J, Tan TS, Thiberge JM, et al. Evolutionary history of Helicobacter pylori sequences reflect past human migrations in Southeast Asia. PLoS ONE. 2011;6(7):e22058.

  46. Quach DT, Vilaichone RK, Vu KV, Yamaoka Y, Sugano K, Mahachai V. Helicobacter pylori infection and related gastrointestinal diseases in Southeast Asian Countries: an expert opinion survey. Asian Pac J Cancer Prev. 2018;19(12):3565–9.

  47. Asaka M, Kimura T, Kudo M, Takeda H, Mitani S, Miyazaki T, Miki K, Graham DY. Relationship of Helicobacter pylori to serum pepsinogens in an asymptomatic Japanese population. Gastroenterology. 1992;102(3):760–6.

  48. Zhou W, Yamazaki S, Yamakawa A, Ohtani M, Ito Y, Keida Y, Higashi H, Hatakeyama M, Si J, Azuma T. The diversity of vacA and cagA genes of Helicobacter pylori in East Asia. FEMS Immunol Med Microbiol. 2004;40(1):81–7.

  49. Sahara S, Sugimoto M, Vilaichone RK, Mahachai V, Miyajima H, Furuta T, Yamaoka Y. Role of Helicobacter pylori cagA EPIYA motif and vacA genotypes for the development of gastrointestinal diseases in Southeast Asian countries: a meta-analysis. BMC Infect Dis. 2012;12:223.

  50. Singh K, Ghoshal UC. Causal role of Helicobacter pylori infection in gastric cancer: an Asian enigma. World J Gastroenterol. 2006;12(9):1346–51.

  51. Chattopadhyay S, Patra R, Chatterjee R, De R, Alam J, Ramamurthy T, Chowdhury A, Nair GB, Berg DE, Mukhopadhyay AK. Distinct repeat motifs at the C-terminal region of CagA of Helicobacter pylori strains isolated from diseased patients and asymptomatic individuals in West Bengal, India. Gut Pathog. 2012;4(1):4.

  52. Truong BX, Mai VT, Tanaka H, le Ly T, Thong TM, Hai HH, Van Long D, Furumatsu K, Yoshida M, Kutsumi H, et al. Diverse characteristics of the CagA gene of Helicobacter pylori strains collected from patients from southern vietnam with gastric cancer and peptic ulcer. J Clin Microbiol. 2009;47(12):4021–8.

  53. Nguyen LT, Uchida T, Tsukamoto Y, Trinh TD, Ta L, Mai HB, Le HS, Ho DQ, Hoang HH, Matsuhisa T, et al. Clinical relevance of cagPAI intactness in Helicobacter pylori isolates from Vietnam. Eur J Clin Microbiol Infect Dis. 2010;29(6):651–60.

  54. Phan TN, Santona A, Tran VH, Tran TNH, Le VA, Cappuccinelli P, Rubino S, Paglietti B. Genotyping of Helicobacter pylori shows high diversity of strains circulating in central Vietnam. Infect Genet Evol. 2017;52:19–25.

  55. Uchida T, Nguyen LT, Takayama A, Okimoto T, Kodama M, Murakami K, Matsuhisa T, Trinh TD, Ta L, Ho DQ, et al. Analysis of virulence factors of Helicobacter pylori isolated from a Vietnamese population. BMC Microbiol. 2009;9:175.

  56. Olbermann P, Josenhans C, Moodley Y, Uhr M, Stamer C, Vauterin M, Suerbaum S, Achtman M, Linz B. A global overview of the genetic and functional diversity in the Helicobacter pylori cag pathogenicity island. PLoS Genet. 2010;6(8):e1001069.

  57. Suerbaum S, Josenhans C. Helicobacter pylori evolution and phenotypic diversification in a changing host. Nat Rev Microbiol. 2007;5(6):441–52.

  58. Furuta Y, Yahara K, Hatakeyama M, Kobayashi I. Evolution of cagA oncogene of Helicobacter pylori through recombination. PLoS ONE. 2011;6(8):e23499.


We would like to acknowledge Professor Niyaz Ahmed and the Pathogen Biology Laboratory at the University of Hyderabad for the invaluable assistance and support for this study.


No external funding was received for this project.

Author information




THN: Data collection, sequencing, sequence analysis; TMTH: Data collection, data analysis; PTHN: Data analysis, manuscript drafting; TDTN: Study design, data collection, manuscript review; BNQ: Patient enrolment, data collection, manuscript drafting; SQ: Data analysis, manuscript drafting; DB: Bioinformatics analysis, manuscript review; VTN: Study design, data collection supervision, study funding, data analysis supervision, manuscript writing and review; MR: Study design, data collection supervision, study funding, data analysis supervision, manuscript writing and review. All authors have read and approved the final manuscript.

Corresponding author

Correspondence to Motiur Rahman.

Ethics declarations

Ethics approval and consent to participate

The ethical review committee of the National University of Ho Chi Minh City, Vietnam, approved the study (Approval No: 702/ĐHQG-KHCN). Written informed consent was mandatory for entry into the study; for participants aged < 18 years, consent was requested from a parent or guardian.

Consent for publication

Not applicable.

Competing interests

The authors declare that they have no competing interests.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

Additional file 1:

Table S1. H. pylori reference strains used in this study.

Additional file 2:

Table S2. Sociodemographic, behavioral, and clinical information of the 161 patients included in the study.

Additional file 3:

Table S3. Putative phage regions identified in isolates using PHASTER tool and major genes encoded within these regions.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit the Creative Commons website. The Creative Commons Public Domain Dedication waiver applies to the data made available in this article, unless otherwise stated in a credit line to the data.

About this article

Cite this article

Nguyen, T.H., Ho, T.T., Nguyen-Hoang, TP. et al. The endemic Helicobacter pylori population in Southern Vietnam has both South East Asian and European origins. Gut Pathog 13, 57 (2021).

Keywords


  • H. pylori
  • Vietnam
  • Molecular epidemiology