The endemic Helicobacter pylori population in Southern Vietnam has both South East Asian and European origins

Background: The burden of Helicobacter pylori-induced gastric cancer varies with the predominant H. pylori population in each geographical region. Vietnam is a high H. pylori burden country with the highest age-standardized incidence rate of gastric cancer (16.3 cases/100,000 for both sexes) in Southeast Asia; despite this, data on the H. pylori population are scanty. We examined the global context of the endemic H. pylori population in Vietnam and present a contextual and comparative genomics analysis of 83 H. pylori isolates from patients in Vietnam.
Results: At least two major H. pylori populations are circulating in symptomatic Vietnamese patients. The majority of the isolates (~ 80%, 66/83) belong to the hspEastAsia population and the remainder (~ 20%, 17/83) belong to the hpEurope population. In total, 66 isolates (66/83) were cagA positive: 64 were hspEastAsia isolates and two were hpEurope isolates. Examination of the second repeat region revealed that most of the cagA genes were ABD type (63/66; 61 were hspEastAsia isolates and two were hpEurope isolates). The remaining three isolates (all hspEastAsia) were ABC or ABCC types. We also detected that 4.5% (3/66) of cagA genes, all from hspEastAsia isolates, contained the EPIYA-like sequence ESIYA at the EPIYA-B segment. Analysis of the vacA allelic type revealed that 98.8% (82/83) and 41% (34/83) of the strains harboured the s1 and m1 allelic variants, respectively; 34/83 carried both the s1 and m1 alleles. The most frequent genotypes among the cagA positive isolates were vacA s1m1/cagA + and vacA s1m2/cagA + , accounting for 51.5% (34/66) and 48.5% (32/66) of the isolates, respectively.
Conclusions: There are two predominant lineages of H. pylori circulating in Vietnam; most of the isolates belong to the hspEastAsia population. The hpEurope population is further divided into two smaller clusters.
Supplementary Information: The online version contains supplementary material available at 10.1186/s13099-021-00452-2.


Introduction
Helicobacter pylori is an important human pathogen that is likely to be present in the gastric mucosa of over half of the world's population. The prevalence of H. pylori infection appears to be higher in low- and middle-income countries than in developed countries, and often varies between ethnic groups within a country [1,2]. Such localised differences might be attributable to socioeconomic factors [4][5][6], although H. pylori-related factors may also contribute. The prevalence of infection in Asia and Africa is 54.7% and 79.1%, respectively. In North and South America the prevalence is 37.1% and 63.4%, respectively, and in Europe the prevalence is on average 47.0% [3]. Prevalence differences between racial and ethnic groups have been described in various parts of the world, but the extent to which such differences can be attributed to socioeconomic and other possible risk factors is unclear [4][5][6]. Vietnam is the easternmost mainland country in Southeast Asia, with an estimated population of 96 million (2019, UNFPA-VN) comprising more than 50 ethnic groups of different cultures; ~ 65% of these groups are located exclusively in remote or rural areas (2019, UNFPA-VN) [7,8]. Earlier studies in both hospital and community settings showed a high prevalence of H. pylori infection in Vietnam [9][10][11]. There is considerable variation in socioeconomic status and lifestyle across a rapidly changing Vietnam. Building on previous studies [9][10][11][12][13], this study investigates the risk associated with H. pylori infection in a major urban community in southern Vietnam. Importantly, this study examines the international context of the H. pylori present in Vietnam in relation to the major H. pylori populations.

Gut Pathogens. Open Access. *Correspondence: mrahman@oucru.org. 1 Oxford University Clinical Research Unit, 764 Vo Van Kiet Street, Ward 1, District 5, Ho Chi Minh City, Vietnam. Full list of author information is available at the end of the article.

H. pylori has undergone localized co-evolution with humans for more than 60,000 years [14]. The distribution of H. pylori populations is strongly associated with human migration, and the populations are named after the geographic regions historically associated with particular human populations [15] [16]. This distribution pattern reflects the epidemiology of the organism, which is exclusively associated with humans and has very localized, almost vertical, transmission. Importantly, the incidence and severity of gastric disease associated with H. pylori infection is linked to particular H. pylori genetic types in particular regions of the world. For instance, in East Asian countries such as Japan and Korea the incidence of gastric cancer is higher than in European and North American countries [17].
The cytotoxin associated gene pathogenicity island (CagPAI) is one of the major virulence determinants of H. pylori. Several virulence genes in the CagPAI trigger abnormal cellular signals in the host. This abnormal cell signalling is likely to contribute to H. pylori-infection associated disease, including gastric cancer (GC). The cagA gene, present in the CagPAI, is known to be an important virulence factor and plays a key role in pathogenesis. The cagA gene is not present in all H. pylori strains: more than 90% of H. pylori isolates from East Asian countries carry cagA, compared to 50-70% of isolates from Western countries [18,19]. In addition, studies of H. pylori isolates from East Asia showed that individuals carrying cagA positive strains have an increased risk of peptic ulcer disease (PUD) and/or GC compared to individuals from Western countries carrying cagA positive strains [20][21][22]. Functionally, the protein encoded by cagA activates several signal transduction pathways and binds and disrupts the function of epithelial junctions, leading to aberrations in tight junction function, cell polarity and cell differentiation in the host [23].
The H. pylori vacuolating cytotoxin A, encoded by the vacA gene, is endocytosed by host cells and causes changes including membrane channel formation, resulting in cytochrome c release, which initiates apoptosis and a pro-inflammatory response [24]. Particular allelic variants of vacA and cagA are associated with H. pylori-associated disease sequelae. Allelic types are associated with H. pylori populations and are probably host-specific adaptive changes [25]. The typing scheme used for vacA is based on the middle (m) and signal (s) regions of the gene, with two types defined for each region: alleles m1 and m2, and s1 and s2, respectively. In vitro experiments showed that s1m1 strains induce cell vacuolation more frequently than s1m2 or s2m2 strains, from which it was inferred that the s1m1 type was more cytotoxic [26].
Vietnam has emerged as the country with the highest age-standardized incidence rate (ASR) of GC (16.3 cases/100,000 for both sexes) in Southeast Asia (GLOBOCAN 2012; http://globocan.iarc.fr). Previous studies have also reported the high prevalence of H. pylori infection in Vietnam and its association with peptic ulcer diseases, active gastritis, atrophy, and intestinal metaplasia [27]. As part of this prospective cross-sectional study, we have used isolate genome sequencing to enable the investigation of the H. pylori population types circulating in symptomatic Vietnamese patients. The genomic relationship between isolates and gene typing for the cagA and vacA genes (derived from the genome sequence for each isolate) provide key baseline information for identifying bacterial associated risk factors for H. pylori-associated disease in Vietnam and how these risk factors compare with H. pylori-associated disease in other parts of the world.

Patient and specimen collection
We conducted a prospective cross-sectional study among patients attending the Gastroenterology Department of Gia Dinh Hospital, Ho Chi Minh City, Vietnam from August 2016 to February 2017. Patients were not selected at random; only patients with symptoms of upper gastrointestinal discomfort, heartburn, or gastric or duodenal ulcer were eligible for enrolment. Candidate patients were informed about the study procedure and written informed consent was obtained for participation. Sociodemographic and clinical information was collected for each patient using a structured questionnaire at the time of clinical presentation. An endoscopic examination was performed by a trained clinician and two biopsy specimens (one from the gastric antrum and one from the corpus) were collected from each patient using well-washed and disinfected fibre optic endoscopes (model GIF XQ 30; Olympus, Japan). The biopsy specimens were transported to the laboratory in Stuart transport medium at 4 °C.

Isolation of H. pylori
Biopsy samples were vortexed vigorously for 5 min and plated on Brain Heart Infusion (BHI) agar (Oxoid Ltd, Hampshire, United Kingdom) supplemented with 7.5% sheep blood, 0.4% Isovitalex, and H. pylori Dent supplement (Oxoid, United Kingdom). Plates were incubated at 37 °C in an atmosphere of 5% O2, 15% CO2, and 80% N2 for 3 to 7 days. H. pylori colonies were identified based on their typical morphology, characteristic appearance on Gram staining, and a positive urease test, and were subsequently confirmed by MALDI-TOF (Bruker, Germany). Isolates were stored at -80 °C in 0.5 ml of BHI broth with 20% glycerol.

Genomic DNA extraction and genome sequencing
Revived isolates were subcultured on selective BHI solid medium containing 7.5% sheep blood and 0.4% Isovitalex under microaerophilic conditions (5% O2, 15% CO2, 80% N2) at 37 °C for 3-5 days [28]. Genomic DNA was prepared from confluent growth using a commercial DNA extraction kit (Qiagen DNA Mini kit, Germany). Genomic libraries were prepared using the Nextera DNA sample preparation kit (Illumina, San Diego, USA). Library sequencing was performed on the Illumina MiSeq instrument using the V3-600 cycle, paired-end kit (Illumina, CA, USA). Readsets for isolates sequenced as part of this study are available at the National Center for Biotechnology Information (NCBI) under BioProject PRJNA689207 (https://www.ncbi.nlm.nih.gov/bioproject/PRJNA689207).

Phylogenetic analysis
Forty-two [42] reference H. pylori genome sequences representing selected H. pylori populations were downloaded from NCBI; details are shown in Additional file 1: Table S1. Reads from the reference strains and the isolates in this study were aligned to the H. pylori strain 26695 (Accession: NC_000915) reference genome sequence using the Burrows-Wheeler Aligner MEM (v 0.7.15-r1140) algorithm [36] as implemented in Snippy; the core genome alignment was used to construct an SNP-based phylogenetic tree using FastTree [37]. SNPs were identified using Freebayes (v1.0.2) under a haploid model, with a minimum depth of coverage of 10× and a minimum allelic frequency of 0.9 required to confidently call an SNP [38]. The phylogenetic tree was visualized using MEGA-X [39].
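The two SNP-calling thresholds described above (minimum depth of 10× and allelic frequency of 0.9) can be illustrated with a minimal sketch. This is not the Freebayes or Snippy implementation; the `CandidateSite` record, the function name, the example positions and read counts are all hypothetical, and treating both thresholds as inclusive is an assumption.

```python
from dataclasses import dataclass

MIN_DEPTH = 10           # minimum depth of coverage stated in the text
MIN_ALT_FRACTION = 0.9   # minimum allelic frequency stated in the text


@dataclass
class CandidateSite:
    """Hypothetical record for one candidate variant site."""
    position: int    # 1-based position on the reference genome
    depth: int       # total reads covering the site
    alt_reads: int   # reads supporting the alternate allele


def is_confident_snp(site: CandidateSite) -> bool:
    """Return True if the site meets both thresholds (assumed inclusive)."""
    if site.depth < MIN_DEPTH:
        return False
    return site.alt_reads / site.depth >= MIN_ALT_FRACTION


# Illustrative, made-up sites
sites = [
    CandidateSite(position=1200, depth=42, alt_reads=41),  # passes both thresholds
    CandidateSite(position=5310, depth=8, alt_reads=8),    # fails: depth below 10
    CandidateSite(position=9077, depth=60, alt_reads=45),  # fails: fraction 0.75
]
confident = [s.position for s in sites if is_confident_snp(s)]
print(confident)
```

Under a haploid model a true SNP is expected to be supported by nearly all reads, which is why a high allelic-frequency cut-off is a natural choice for excluding mixed or low-quality sites.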

Core genome and pan-genome analysis
OrthoMCL was used to identify orthologous clusters using predicted protein sequences from each of the studied isolates (minimum length threshold of 50 amino acids, with identity and e-value parameters set at 70% and 0.00001, respectively) [40]. The identified clusters were aligned against the EggNOG database to predict a functional category. Clusters that contained proteins with more than one domain with distinct categories were assigned multiple categories. The functional categories were graphically represented using R (http://www.R-project.org). Proteins that could not be classified were assigned to category S (hypothetical). Graphical overviews of categorized strain-specific genes were produced using R.
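As a rough illustration of how the three stated thresholds (50 amino acids, 70% identity, e-value of 0.00001) act together, the sketch below filters pairwise protein hits before any clustering step. It is not OrthoMCL itself, which additionally performs graph clustering; the `Hit` record, its field names and the example hits are hypothetical, and the inclusive comparisons are an assumption.

```python
from typing import NamedTuple

MIN_LENGTH_AA = 50       # minimum protein length stated in the text
MIN_IDENTITY_PCT = 70.0  # identity parameter stated in the text
MAX_EVALUE = 1e-5        # e-value parameter stated in the text


class Hit(NamedTuple):
    """Hypothetical pairwise protein comparison result."""
    query: str
    subject: str
    identity_pct: float
    evalue: float
    query_length_aa: int


def passes_thresholds(hit: Hit) -> bool:
    """Keep a hit only if it satisfies all three thresholds."""
    return (
        hit.query_length_aa >= MIN_LENGTH_AA
        and hit.identity_pct >= MIN_IDENTITY_PCT
        and hit.evalue <= MAX_EVALUE
    )


# Illustrative, made-up hits
hits = [
    Hit("VN01_0001", "VN02_0001", 98.5, 1e-120, 310),  # kept
    Hit("VN01_0002", "VN02_0417", 64.0, 1e-30, 205),   # dropped: identity below 70%
    Hit("VN01_0003", "VN02_0090", 91.0, 1e-8, 42),     # dropped: shorter than 50 aa
    Hit("VN01_0004", "VN02_1100", 75.0, 1e-3, 150),    # dropped: e-value too high
]
kept = [h.query for h in hits if passes_thresholds(h)]
print(kept)
```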

Statistical analysis
Data analysis was performed using Statistical Package for Social Science (SPSS) software (IBM SPSS Statistics 23, NY, USA). Baseline descriptive statistics were summarized for the variables of interest. Comparisons between groups were performed using either the chi-squared or Fisher's exact test for categorical variables; t-tests and the Mann-Whitney U-test were used for continuous variables. A two-sided P value of < 0.05 was considered statistically significant.
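For readers unfamiliar with Fisher's exact test, the following is a minimal standard-library sketch of the two-sided test on a 2×2 table. It is not the SPSS procedure used in the study. The example table uses the CagPAI counts by population reported in the Results (64 positive and 2 negative hspEastAsia isolates; 2 positive and 15 negative hpEurope isolates), but this particular comparison is only an illustration and is not an analysis reported by the authors.

```python
from math import comb


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact test p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2 = a + b, c + d
    col1 = a + c
    denom = comb(row1 + row2, col1)

    def prob(x: int) -> float:
        # Hypergeometric probability of x in the top-left cell, margins fixed
        return comb(row1, x) * comb(row2, col1 - x) / denom

    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    p_observed = prob(a)
    # Sum over all tables that are no more probable than the observed one
    total = sum(
        p for p in (prob(x) for x in range(lowest, highest + 1))
        if p <= p_observed * (1 + 1e-9)
    )
    return min(1.0, total)


# Rows: hspEastAsia, hpEurope; columns: CagPAI positive, CagPAI negative
p_value = fisher_exact_two_sided(64, 2, 2, 15)
print(p_value < 0.05)
```

The small relative tolerance in the comparison guards against floating-point ties between equally probable tables.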

Ethics statement
The ethical review committee of the National University Ho Chi Minh City, Vietnam approved the study (Approval No: 702/DHQG-KHCN). Written informed consent was mandatory for patient enrolment in the study. For patients < 18 years, written informed consent was obtained from a parent or guardian.

Patient population
One

Genome characteristics
Summaries of the read data set and draft genome for each of the 83 H. pylori isolates are presented in Table 1.
The read depth coverage of the isolate read sets ranged from 38× to 456×. The draft genome sequences comprised between 16 and 83 contigs. Overall, the average genome size was 1.6 Mb with 38.94% G + C content. For each isolate, the annotated genome sequence comprised between 1451 and 1589 protein coding regions (CDS), with ~ 92% of the genome used for protein coding. A single, incomplete phage-associated region (8.1-13.5 kb) was detected in 17% (14/83) of the draft genome sequences. The phage sequences consist of between nine and 14 CDSs that encode either putative restriction-modification protein, TMP kinase, PcrA helicase, putative transposase, or other hypothetical proteins in addition to phage related genes (Additional file 3: Table S3).

Core and pan-genome analysis
The core- and pan-genome analysis by OrthoMCL identified 1,194 orthologous clusters (core genome) from the 119,366 annotated proteins in the 83 isolates. Among these, 1,070 orthologous clusters were assigned functional categories using the EggNOG database (Fig. 1a). A high proportion of the classified clusters belonged to the J (translation, ribosomal structure, and biogenesis) and M (cell membrane/envelope biogenesis) functional categories (12.7%, 136/1,070 and 7.7%, 83/1,070, respectively). Proteins with no orthologues were detected in a small number of isolates; 26% (31/83) of isolates contained either one or two proteins of this type. Most of these unique proteins belonged to the V (defence mechanism) or S (hypothetical) functional categories (Fig. 1b).
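The distinction between core clusters and isolate-specific proteins can be sketched as a simple set operation. The sketch assumes that "core" means a cluster with at least one member in every isolate and that a unique protein is a cluster found in a single isolate; the text does not spell out these definitions, and the cluster and isolate names below are hypothetical.

```python
def split_clusters(clusters: dict, isolates: set) -> tuple:
    """Return (core, unique) cluster names.

    core   : clusters containing every isolate (assumed definition)
    unique : clusters found in exactly one isolate
    """
    core = sorted(name for name, members in clusters.items() if isolates <= members)
    unique = sorted(name for name, members in clusters.items() if len(members) == 1)
    return core, unique


# Illustrative, made-up data: cluster name -> isolates carrying a member protein
isolates = {"VN01", "VN02", "VN03"}
clusters = {
    "cluster_0001": {"VN01", "VN02", "VN03"},  # present in all isolates
    "cluster_0002": {"VN01", "VN03"},          # accessory
    "cluster_0003": {"VN02"},                  # isolate-specific
}
core, unique = split_clusters(clusters, isolates)
print(core, unique)
```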

Phylogenetic analysis
The genomic relationship between the 83 study isolates and 42 reference genome sequences for which the H. pylori population type was known was inferred from the core genome using the H. pylori strain 26695 (Accession: NC_000915) as the reference genome sequence for read mapping. The tree shown in Fig. 2 provides a visual summary of the relationship between isolates. Based on the core genome relationship with the 42 classified isolates, 80% (66/83) of the isolates were part of the H. pylori hspEastAsia population and the remaining 20% (17/83) were part of the H. pylori hpEurope population (Fig. 2).

Virulence factors
Virulence factor detection using the VFDB showed that 80% (66/83) of Vietnamese isolates harboured between 110

Fig. 2 Vietnamese isolates are indicated by black terminal branches, while classified isolates are shown with coloured terminal branches as follows: hspEastAsia (blue), hpEurope (brown), hspWAfrica (pink), hpNEAfrica (purple), hspAmerind (orange) and hpAsia2 (green). The tree was inferred using the core genome comparison method as implemented in Nullarbor with H. pylori strain 26695 (Accession: NC_000915) used as the reference genome sequence for read mapping. The tree was modified using tools available in FigTree and MEGA-X.

Fig. 3 At the left is a tree showing the core genome relationship between the 83 Vietnamese isolates. The virulence gene content for each of the isolates is colour coded at the right. Virulence genes shown are those present in the VFDB; they were detected using Abricate. Green shows genes detected with less than 90% gene coverage, orange shows genes detected with greater than 90% gene coverage, and purple shows that the gene was not detected.

The virulence properties of the isolates are presented in Table 2. A complete CagPAI was present in 80% (66/83) of the genomes; of these, 97% (64/66) of CagPAI positive isolates belonged to the hspEastAsia population and the remaining 3% (2/66) belonged to the hpEurope population (Table 2). Among the 17 hpEurope isolates, 15 were CagPAI negative. Most of the CagPAI positive hspEastAsia and hpEurope isolates lacked an orthologue to the DNA helicase (HP0548) present in the Western-type CagPAI sequence found in H. pylori strain 26695 (Fig. 4).
Sequence analyses of the second repeat region of the cagA gene revealed that 95% (63/66) of the cagA genes, including those of two hpEurope isolates, were of the ABD type, while the remaining three isolates (all hspEastAsia) were EPIYA-ABC or EPIYA-ABCC types (Table 2). An ABD type second repeat region, as found in the two hpEurope isolates, is an atypical characteristic of hpEurope strains. We also found that 5% (3/63) of isolates with an East Asian type cagA contained the EPIYA-like sequence ESIYA at the EPIYA-B segment. Three vacA types were detected among the Vietnamese isolates: 34 isolates were s1m1 type, 48 isolates were s1m2 type and one isolate was s2m2 type. The most frequent genotypes among the cagA positive isolates were vacA s1m1/cagA + and vacA s1m2/cagA + , accounting for 51.5% (34/66) and 48.5% (32/66) of isolates, respectively.
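The genotype percentages quoted above follow directly from the reported counts. The short sketch below recomputes them; the counts are taken from the text, while the variable and function names are illustrative only.

```python
from collections import Counter

# vacA types among all 83 isolates, as reported in the text
vaca_counts = Counter({"s1m1": 34, "s1m2": 48, "s2m2": 1})
# vacA types among the 66 cagA positive isolates, as reported in the text
caga_positive_counts = Counter({"s1m1": 34, "s1m2": 32})


def percent(part: int, whole: int) -> float:
    """Percentage rounded to one decimal place."""
    return round(100 * part / whole, 1)


total_isolates = sum(vaca_counts.values())
caga_positive_total = sum(caga_positive_counts.values())
s1_total = sum(n for genotype, n in vaca_counts.items() if genotype.startswith("s1"))
m1_total = sum(n for genotype, n in vaca_counts.items() if genotype.endswith("m1"))

print(percent(s1_total, total_isolates))   # share of isolates carrying s1
print(percent(m1_total, total_isolates))   # share of isolates carrying m1
for genotype, n in caga_positive_counts.items():
    print(genotype, percent(n, caga_positive_total))
```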

Discussion
H. pylori infection is associated with the development of gastric disease in the host. The frequency of infection and of disease varies across the world, but disease is associated with particular H. pylori genetic types in particular geographic regions. Developing effective strategies to manage H. pylori-associated disease relies on understanding the local H. pylori populations. This, in conjunction with the significant H. pylori-associated disease burden in Vietnam, highlights the important knowledge gap addressed by this study. Herein, we present genomic and epidemiological data for 83 Vietnamese H. pylori isolates. The frequency of H. pylori isolation from the biopsies of symptomatic patients was 59% (95/161). This is similar to the result of an earlier study of 270 randomly selected patients who underwent esophagogastroduodenoscopy at endoscopy centres in two major hospitals in Hanoi and Ho Chi Minh City (the largest cities in Northern and Southern Vietnam, respectively) [27].
Our phylogenetic data show that most H. pylori isolates from symptomatic Vietnamese patients are from the hspEastAsia population (80% of isolates). The dominance of the hspEastAsia population is consistent with the H. pylori population being strongly associated with human migration [16], where historical and emigrational evidence suggests the Vietnamese are more closely related to people from North Asia than to people from South Asia [44]. Moreover, migratory patterns with North Asia would have been influenced by the fact that Vietnam was under Chinese occupation for over a thousand years. Notably, a group of the Vietnamese isolates forms an exclusive clade within the hspEastAsia population, perhaps indicating that the Vietnamese were isolated from other South East Asian populations for an extended period; this may be supported by a study by Breurec et al. showing Khmer and Vietnamese isolates as deep branching members of the hspEastAsia H. pylori population [45]. More extensive sampling of H. pylori in the region would be required to confirm an H. pylori subpopulation for Vietnam. The Vietnamese H. pylori isolates that are part of the hpEurope population are likely to have arisen through the French colonial occupation of Vietnam and other parts of South East Asia during the 19th and early 20th centuries. We observe a small number of isolates that appear to be related to the representative isolates from the hpNEAfrica or hspWAfrica population used in our comparative analysis (Fig. 2). Another possibility is that these isolates are recombinant hybrids arising from the endemic hspEastAsia and hpEurope population strains now present in Vietnam [45].
The prevalence of H. pylori infection has been reported at between 50% and 80% in several studies conducted in adults in Vietnam; this is similar to Japan, Korea or China, and other South Asian nations [9-11, 46, 47]. The genetic characteristics and diversity of Vietnamese H. pylori strains could be a factor contributing to the high incidence of gastric cancer in Vietnam. Evidence indicates that the isoforms of vacA and the type and number of the EPIYA motifs in the cagA gene strongly influence the type and magnitude of the histological damage of the gastric mucosa. For example, the vacA s1m1 genotype has been associated with intestinal metaplasia, severe inflammation and a high risk of gastric cancer [20,48,49]. In this study, the s1m1 vacA allelic combination was detected in 41% of isolates. In addition, East Asian cagA, which is more prevalent in Vietnamese isolates, is more frequently associated with disease than Western cagA [20,50,51]. This study revealed a lower frequency of cagA than previous reports on Vietnamese H. pylori [52][53][54][55], which may contribute to the lower rates of gastric ulcer and gastric cancer observed in Vietnam. In dyspeptic patients from central Vietnam, the frequency of cagA + strains was 84% [54]. In H. pylori strains from patients in Southern Vietnam with gastric cancer and peptic ulcer, all strains were cagA positive [52]. In this study, cagA was found frequently with the vacA s1m1 allelic type (51.5%, 34/66), which is consistent with previous reports on isolates from South or North Vietnam [27,55]. The most frequent EPIYA motif found in our isolates was ABD (95.5%; 63/66), which is similar to previous reports from Vietnamese patients with gastric disease [52,55]. However, these frequencies were different in central Vietnam isolates, where the vacA s1m1/cagA + genotype was detected in 64.86% (48/74) of isolates and the cagA-ABD motif was found in a lower proportion (91%) [54].
We observed that 88.2% (15/17) of hpEurope isolates were cagA negative, possibly having lost cagA during the course of evolution; the two cagA positive hpEurope isolates carried the ABD type EPIYA motif. The ABD type EPIYA motif pattern is an atypical characteristic of hpEurope strains, in which the ABC type EPIYA motif is more prevalent. The gene content and organization of the cagPAI are highly conserved. The phylogeny of most cagPAI genes, including cagA, was found to be similar to that of housekeeping genes, indicating that the cagPAI was probably acquired only once by H. pylori [56]. Recombination events during mixed infection have been identified as a major driving force behind allelic diversity in H. pylori. Allelic diversity in the cagPAI largely reflects that of H. pylori's housekeeping genes, with diversifying or positive selection possibly driven by host polymorphisms, which could even result in modified host protein interactions [56]. Accordingly, hpEurope and hspEastAsia strains are expected to carry a Western and an East Asian cagA, respectively. A prominent example of amino acid diversity noted previously is the EPIYA motifs in the C-terminal half of cagA, which differ between Asian populations (hpAsia2; hspEastAsia) (type D) and all other populations [57]. The D type EPIYA repeat binds SHP-2 phosphatase more avidly than other types [22]. Furthermore, Furuta et al. clarified the recombination-mediated routes of cagA evolution and provided a solid basis for a deeper understanding of its function in pathogenesis [58]. Based on this observation, the predominant host may be applying a selective pressure on Vietnamese hpEurope strains for the ABD type cagA that is normally observed in hspEastAsia lineage strains.
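The relationship between EPIYA segment patterns and cagA type discussed above can be summarised in a small sketch. It assumes a simplified rule drawn from the text: a pattern containing a D segment is treated as East Asian type and a pattern containing a C segment (such as ABC or ABCC) as Western type. The function name and the handling of other patterns are hypothetical, and real typing is done on the sequence itself rather than on a pattern string.

```python
def classify_epiya(pattern: str) -> str:
    """Classify a cagA EPIYA segment pattern (for example 'ABD') by type.

    Simplified rule assumed here: D segment -> East Asian, C segment -> Western.
    """
    segments = pattern.strip().upper()
    if not segments or any(ch not in "ABCD" for ch in segments):
        raise ValueError(f"unexpected EPIYA pattern: {pattern!r}")
    if "D" in segments:
        return "East Asian"
    if "C" in segments:
        return "Western"
    return "unclassified"


# Patterns mentioned in the text
for pattern in ("ABD", "ABC", "ABCC"):
    print(pattern, classify_epiya(pattern))
```

Applied to the patterns reported in this study, the rule places the 63 ABD isolates in the East Asian type and the three ABC or ABCC isolates in the Western type.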

Conclusions
Our study confirmed the high prevalence of H. pylori infection and of the most virulent genotype combination, vacA s1m1/cagA + , in H. pylori isolates recovered from symptomatic Vietnamese patients, which may explain the higher incidence rate of gastric cancer in Vietnam. Our data on the genetic architecture of H. pylori strains isolated from symptomatic Vietnamese patients showed two predominant lineages, with the majority of isolates belonging to the hspEastAsia population. However, there is another group of Vietnamese isolates that is part of the hpEurope population. Interestingly, the hpEurope population isolates are divided into two subclusters. Although phylogeny has been improved by increasing the number of genes analyzed, analyses of a limited number of genes cannot uncover more complex evolutionary events. A further limitation of our study is that almost all enrolled patients were in the early stage of gastric disease, so we could not explore the relationship between H. pylori genotypes and clinical outcomes.