Evolution of the GII.3[P12] Norovirus from 2010 to 2019 in Jiangsu, China

Objectives Norovirus genotype GII.3[P12] strains have been an important pathogen for sporadic gastroenteritis infection. In previous studies of GII.3[P12], the number of specimens and time span are relatively small, which is difficult to truly reflect the infection and evolution of this type of norovirus. Here we report a molecular epidemiological study of the NoVs prevalent in Jiangsu between 2010 and 2019 to investigate the evolution of the GII.3[P12] strains in China. Methods In this study 60 GII.3[P12] norovirus strains were sequenced and analyzed for evolution, recombination, and selection pressure using bioanalysis software. Results The GII.3[P12] strains were continuously detected during the study period, which showed a high constituent ratio in males, in winter and among children aged 0–11 months, respectively. A time-scaled evolutionary tree showed that both GII.P12 RdRp and GII.3 VP1 sequences were grouped into three major clusters (Cluster I–III). Most GII.3[P12] strains were mainly located in sub-cluster (SC) II of Cluster III. A SimPlot analysis identified GII.3[P12] strain to be as an ORF1-intragenic recombinant of GII.4[P12] and GII.3[P21]. The RdRp genes of the GII.3[P12] showed a higher mean substitution rate than those of all GII.P12, while the VP1 genes of the GII.3[P12] showed a lower mean substitution rate than those of all GII.3. Alignment of the GII.3 capsid sequences revealed that three HBGA binding sites of all known GII.3 strains remained conserved, while several amino acid mutations in the predicted antibody binding sites were detected. The mutation at 385 was within predicted antibody binding regions, close to host attachment factor binding sites. Positive and negative selection sites were estimated. Two common positively selected sites (sites 385 and 406) were located on the surface of the protruding domain. Moreover, an amino acid substitution (aa204) was estimated to be near the active site of the RdRp protein. Conclusions We conducted a comprehensive analysis on the epidemic and evolution of GII.3[P12] noroviruses and the results suggested that evolution was possibly driven by intergenic recombination and mutations in some key amino acid sites. Supplementary Information The online version contains supplementary material available at 10.1186/s13099-021-00430-8.

proteins VP1 and VP2, respectively. VP1 is composed of shell (S) and protruding (P) domains with the latter being further divided into P1 and P2 domains [3]. Residues in the P2 domain are involved in the interaction with antibodies and histo-blood group antigens (HBGAs), which are supposed to be receptors or co-receptors of norovirus [4].
Up to now, noroviruses are phylogenetically classified into 10 genogroups (GI-GX) and, among them, at least five genogroups (GI, GII, GIV, GVIII and GIX) could infect humans [5]. GII.3 is a common cause for sporadic infection in infants and children as the second major genotype after GII.4 in China [6,7]. To date, more reports focus on norovirus genotypes, such as GII.2[P16], GII.4, and GII.17 [P17], that cause outbreaks. However, studies on the evolution of GII. 3[P12] in detail are scarce [8][9][10][11]. In these studies, the number of specimens and time span are relatively small, making it difficult to truly reflect the infection and evolution of this type of norovirus. In this report, we performed a molecular evolutionary study on the GII.P12 RdRp and GII.3 VP1 regions of the noroviruses from fecal samples collected from pediatric patients from 2010 to 2019.

Stool specimens
Stool specimens were collected from hospitalized children with acute gastroenteritis in Suzhou Children's Hospital during the period between January 2010 and December 2019 in Jiangsu, China. Acute gastroenteritis was defined to have at least three diarrheic stools and/ or vomiting per day, caused by bacteria, viruses, fungi or parasites, but did not include cholera, dysentery, typhoid, or paratyphoid. The age of the participants was no old than 59 months, and the number of monthly samples was not less than 25. The specimen would be tested for norovirus, rotavirus, astrovirus and sapovirus but not for bacteria. All stool specimens were stored at −80 °C until use.

Amplification of the RdRp and VP1 genes
Stool suspensions were prepared as 10% (w/v) in saline solution. Viral nucleic acid was extracted and tested for infection of NoVs by using MagMAXTM-96 Viral RNA Isolation Kit (Applied Biosystems, Foster City, CA) and the Qiagen Probe RT-PCR Kit (Qiagen, Hilden, Germany) on a 7500 real-time PCR platform (Applied Biosystems, Foster City, CA) with primers as described previously [12]. Norovirus-positive samples were detected with a region of 1095 bp targeted in the ORF1/ ORF2 junction of the viral genome by one-step reverse transcription polymerase chain reaction (RT-PCR) for norovirus genotyping [12]. The norovirus genotyping tools used for analysis included the following resources online: http:// www. rivm. nl/ mpf/ norov irus/ typin gtool and https:// norov irus. ng. philab. cdc. gov.

Evolutionary analyses
Nucleotide and amino acid (aa) sequence alignment was performed using BioEdit software. Nucleotide substitution models were identified using the Akaike information corrected criterion (AICc) implemented in JModelTest v2.1.7 [13]. The best of three clock models (strict clock, relaxed clock exponential and relaxed clock log normal) and three tree prior models (coalescent constant population, coalescent bayesian skyline and coalescent exponential population) were estimated using the pathsampling/stepping stone-sampling marginal likelihood estimation method. The optimal dataset was estimated as the relaxed clock exponential and exponential tree prior models for RdRp, relaxed clock log normal and bayesian skyline tree prior models for VP1, respectively (Additional file 1: Table S1). Convergence was evaluated by the effective sample size by Tracer v1.7, and values more than 200 were acceptable. Evolutionary analyses, including substitution rate, the time to most recent common ancestor (tMRCA), Bayesian skyline plot analysis (BSP) and phylogenetic analyses, were estimated using the relaxed clock exponential and the uncorrelated lognormal relaxed molecular clock, HKY (RdRp) and GTR + G (VP1) substitution, and Bayesian skyline coalescent models in BEAST v1.8.0 [14]. Markov chain Monte Carlo (MCMC) sample chains were run for 1.5 × 10 8 steps for the RdRp and VP1 genes. The convergence of parameters was evaluated by Tracer v1.7.1, and FigTree v1.4.3 was used to view phylogenetic trees. The reliability of branches was supported by 95% HPDs. The relative frequency of amino acid occurrence (bits), grouped by amino acid chemistry (polar, neutral, basic, acidic, and hydrophobic) at informative sites were visualized using sequence logos (WebLogo v3; http:// weblo go. three pluso ne. com/) for each protein.

Recombination detection analyses
Recombination detection was performed with the software package RDP v4.97 and Simplot 3.5.1 [15,16]. RDP analysis was set up with an automated option including the following seven methods: RDP, GENE-CONV, BootScan, MaxChi, Chimaera, SiScan and 3Seq. For this study, only those sequences with events that were statistically significant (p ≤ 0.05) by at least five methods were reported as recombinants. The SimPlot analysis was performed by setting the window width and the step size to 200 bp and 20 bp, respectively.

Selective pressure analysis
Datamonkey was used to determine specific positive and negative selected sites by rates of nonsynonymous and synonymous change (dN/dS) at every codon within the GII.3 VP1 protein [17]. The data set contained 163 aligned norovirus GII.3 VP1 sequences with 548 codons. Four methods, including single likelihood ancestor counting (SLAC), fixed effects likelihood (FEL), Fast, Unconstrained Bayesian AppRoximation (FUBAR), and mixed effects model of evolution (MEME), were chosen to detect selected sites. Sites under positive selection (dN > dS) were determined by a p value of < 0.05 (SLAC, FEL, and MEME) and posterior probability of > 0.9 (FUBAR). Negative selection sites (dN < dS) were estimated using SLAC, FEL, and FUBAR methods.

Mapping of amino acid substitutions of the RdRp and P protein
A structural model of the capsid protein sequence in GII. P12 and GII.3 genotype was viewed and edited using PyMol v1.8. The RdRp and P protein crystal structures [Protein Data Bank (PDB) accession number 1SH0 and 6IR5] of the GII.P4 and GII.3 noroviruses (GenBank code: AJ583672 and U02030) were used to construct the dimer of the RdRp and P domain (including the P1 and P2 domains), respectively. Amino acid substitutions, conserved motif, active sites and RNA binding sites were mapped on the RdRp structure [18]. Positive selection sites and HBGA binding sites were mapped on the P domain structure.

Statistical analyses
Data were type-entered into a database and analyzed using SPSS software version 18 (SPSS, Chicago, IL). Categorical variables were compared by Pearson χ 2 test, Fisher's Exact Test or Linear-by-Linear. Comparison between the two groups pairwisely was calculated by Bonferroni correction method. p < 0.05 were considered statistically significant.

Temporal changes in the detection frequencies of the GII.3[P12] NoV
During the study period, a total of 3667 specimens submitted for the detection of NoV and other diarrhea viruses were included for analysis. Annual specimens ranged from 300 in 2017 to 431 in 2012. We analyzed the temporal changes of the genotypes of the NoV strains circulated from 2010 to 2019 (Fig. 1)

Epidemiological and clinical features
All of the samples were tested for norovirus, rotavirus, astrovirus and sapovirus but not for bacteria. Of the 600 genotyped NoV-positive samples, 380 were isolated from males and 220 from females, resulting in a male-to-female ratio of approximately 1.73:1 and considerably higher (2.25:1) with respect to GII.3[P12] ( Table 1). GII.4 and other genotypes showed high constituent ratios among children aged 0-11 months and 12-23 months. However, GII.3[P12] showed a higher constituent ratio in age group 0-11 months (69.9%) than in 12-23 months (18.6%). NoVs could be detected throughout the year. The detection rate of GII. 3[P12] was the highest in winter (43.8%, 42/96), followed by spring (34.2%, 26/76), and there was significant difference between the two seasons. GII.4 was the highest in autumn (71.1%, 185/258), followed by summer (65.3%, 111/170), and there was significant difference between the seasons as well. There was, however, no significant difference between seasons among the GII.Other strains. The clinical features of vomiting and fever were analyzed in diarrhea patients. In each genotype, the constituent ratio of these two clinical symptoms was similar, about 47% and 24% for vomiting and fever, respectively.
Based on the relaxed clock exponential-RdRp sequences could be grouped into four major clusters (I, II, III and IV). Although all strains in this analysis had a GII.P12 RdRp gene except for an undefined ancestral RdRp genotype, six different VP1 genotypes were identified including GII.2, GII.3, GII.4, GII.10, GII.12 and GII.13. Cluster     We also used 60 full VP1 gene sequences of the GII. 3[P12] noroviruses collected in our study and 103 GII.3 VP1 sequences available in the GenBank to generate a time-scale evolutionary tree. As Fig. 2b shows, the GII.3 VP1 gene phylogeny contained three clusters (Cluster I-III). Although all strains in this analysis had a GII.3 VP1 gene, six different RdRp genotypes were identified, which included GII.P3, GII.P12, GII.P16, GII.P21, GII. P29, and an undefined ancestral RdRp genotype. Strains in Cluster I included the undefined ancestral RdRp genotype and Cluster II had the genotype GII.3 RdRp. Strains collected from 1975 to 1983 were defined as ancestral. The accession number of them was KY442319, KY442320, HM072042, HM072045, HM072046 and PQ379713. GII.P29 genotype strains were not clustered because only a few of them were available with a complete VP1 gene, and strains, however, were closer to the ones in Cluster III in the evolutionary tree. Cluster III contained three RdRp genotypes (GII.P12, GII.P16, and GII.P21.) detected during the years from 2000 to 2019, which were further split into two subclades (Sub C I and II). Strains in Sub C I and II were associated with the GII. P12 and GII.P21 RdRp. Specifically, the GII.P12 genotype strains were mainly in Sub C II and the GII.P21 genotype strains in Sub C I. The earliest strain of the GII.3[P12] strain (EU187437) was also in Sub C II. The GII.P16 genotype strains belonged to Sub C II and have been recently detected in different VP1 genotypes, including GII.2, GII.4 and GII.13. Overall, of the 103 strains from GenBank typed based on the RdRp gene sequence, the majority (58, 56.31%) possessed a GII.P12 RdRp, which were followed by the strains in GII.P21 (23,22.33%), GII. P3 (11,10.68%), and others (11,10.68%) RdRp.

Molecular phylogenetic characteristics of recombinant noroviruses
To characterize potential recombination events of the GII.3[P12] strains, a region of 1428 bp in the ORF1/ ORF2 junction of the viral genome was analyzed by Sim-Plot v3.5.1 (Fig. 3) . The recombination breakpoint was identified at position nucleotide 801, corresponding to the nucleotide positioned at 5033 in the whole viral genome, which was localized in the ORF1 region for the strain. The same recombination event was also detected with the RDP v4.97 software using seven different methods (RDP, GENECONV, Chimaera, MaxChi, 3Seq, BootScan, and SiScan), which was confirmed to be statistically significant (p values < 0.01) with a Bootscan analysis.

Relative effective population size over time
A Bayesian skyline plot (BSP) coalescent model was used to measure the demographic history of the GII.P12 and GII.3 strains, which essentially plot Ne τ as a function of time. This model predicts the relative effective population size over time, when changes in effective population size reflect a change in genetic diversity. Notably, the effective population sizes of the GII.P12 strains (Fig. 4a) appeared to have two peaks, a large one around the year of 2003 and a small one around 2011, with two bottlenecks evident around 2002 and 2013. Considering the associated MCC tree (Fig. 2a), five different VP1 genotypes that comprised of the GII.2, GII.3, GII.4, GII.10 and GII.13 were observed in 2003, leading to a sudden gain of viral genetic diversity, which resulted in a steep increase in the skyline plot. The GII.P12 with the GII.3 VP1 genotype values (Fig. 4b) reached a peak around 2011, followed by a bottleneck around 2013, and then increased slightly, which coincided with the GII.P12 genotype as shown in Fig. 2a.
Population diversity was predicted to be better measured using the RdRp gene instead of the VP1 gene. The BSP obtained from the GII.3 capsid dataset (Fig. 4c) showed a pattern that was difficult to reconcile with the viral genetic diversity, which occurred to have five peaks

Gene evolution rates
The mean rate of nucleotide substitutions per site per year and the date of the most recent common ancestor were estimated separately for the GII.P12 RdRp and GII.3 VP1 data set based on the strict molecular clock model, which were listed in Table 2

Amino acid alignment
Informative amino acid sequences were aligned, including 122 complete RdRp sequences of the GII. P12 strains from 1996 to 2019 (Fig. 5a) and 163 VP1 sequences of the GII.3 strains from 1972 to 2019 (Fig. 5b, c). A conserved substitution was defined as   Sites 385, 389, and 406 may be important evolutionary sites since these sites were highly variable with three or four different conserved substitutions. In particular, site 385 was found peripheral to the HBGA-binding site II, also located in the predicted antibody binding sites, which may have an important effect on the viral antigenicity. Several unique amino acid mutations for SC-II-P12 of Cluster III (site 6, 175, 179 and 209) were found in NS region. Interestingly, more unique amino acid mutations for SC-I of Cluster III were found in NS region (site 24), P1 region (site 228, 476 and 510) and P2 region (site 364, 412 and 417), respectively.

Estimation of positive and negative selection sites in VP1
Selection pressure on each site in the capsid gene was analyzed for the GII. 3

Mapping of amino acid substitutions of the RdRp and P protein
In order to evaluate the possible effect of substitutions on protein function, we mapped the amino acid substitutions of GII.P12 RdRp genotypes onto three-dimensional structures of the RdRp protein (Fig. 6a). A total of 11 amino acid substitutions were identified among the GII.P12 strains. Of these, the substitution at aa204 was located close to the active site and no substitution was located close to RNA binding site (Fig. 6a). Furthermore, we mapped the HBGA binding sites in Fig. 5B, C (green color) and two common positively selected sites in Table 3 onto the GII.3 capsid dimer three-dimensional structure (Fig. 6b). The sites under positive selection were located on the surface of the protruding domain. Site 385, a predicted antibody binding site, was adjacent to site 386, an HBGA binding site. Of significance, the substitutions occurring at these sites may result in the emergence of the current epidemic GII.3 variant strains. These results suggest that selective pressure from hosts caused amino acid substitution of the virus.

Discussion
During our study period, the GII.3[P12] strains had never caused a large-scale outbreak. They had been involved in sporadic epidemics all the time. Many strains had disappeared or been replaced by newer ones over time and even GII.4 strains were replaced by new subtypes.
Although the detection rate of the GII.3[P12] strains was secondary to GII.4, they had never been replaced and disappeared. The GII.4 genotype had been dominant globally for two decades with the most recent one, Sydney2012, emerging in 2012 [19]. As the first non-GII.4 epidemic variant, GII.17 caused an increasing number of outbreaks in 2014/15 in China [11,12].
In 2016/17, another non-GII.4 epidemic variant, the GII.2[P16] strain, emerged and rapidly became the predominant genotype throughout the mainland China [9,   GII.4 subtypes somehow did not tip the balance. A recent report suggested that genotype specific herd immunity in infants and young children lasted for at least a few years, thereby influencing the endemic NoV genotype in the next season [20]. This observation is consistent with our results in this report. The detection rate of GII. 3[P12] increased every other year, which rarely led to an explosive epidemic. Based on RdRp and VP1 gene sequences, the branching pattern was shown to be associated with intergenic recombination within the ORF1/ORF2 region. Both GII. P12 and GII. 3 [23], which had been the predominant strains that caused local norovirus outbreaks. The recombination of GII.3 with the RdRp gene of the current popular strain may be a beginning of a new trend for NoV evolution. However, no more recombinant strains of this genotype have been found up to date.
The fitness of a virus is influenced by its genetic diversity [24]. GII.4 noroviruses have predominated for more than 20 years [1], which suggests that GII.4 noroviruses may possess greater viral fitness in human populations as compared to other norovirus genotypes [25]. GII.4 norovirus has a high VP1 gene evolutionary rate (5.33 × 10 -3 ), which may increase its ability to evade immune responses [26]. The VP1 gene of some rare genotypes, such as GII.17 and GII.2, showed considerable genetic diversity and evolved at a high rate of substitutions/site/year due to its possible acquisition of a novel polymerase [11,22]. We compared the substitution rate of GII.3[P12] associated with the GII.P12 RdRp and GII.3 VP1. Over time the GII.P12 and GII.3 genes had evolved under the influence of several recombination events, thus it was unlikely that the gene had evolved at a steady rate. As shown in Table 2, after the GII.3[P12] recombination, mean substitution rates of the GII.P12 RdRp increased while those of the GII.3 VP1 decreased. Because the 95% HPD scores were overlapping, p > 0.05, there was no statistical difference between the rates. However, we could still make some suppositions from the trend. The result may suggest that the recombination was more beneficial to the evolution of GII.P12 RdRp than to GII.3 VP1. The increased substitution rate conferred by the recombinant GII.3 VP1 probably produced a fitter virus, thus the GII.3[P12] emerged as a prevalent pediatric strain.
The substitution rate is a measurement of the underlying mutation rate combined with selective and transmission pressures but not a direct measurement of the mutation rate conferred by the RdRp [27]. An analysis of the substitution rate in the RdRp gene (rather than VP1) may better serve as an indication of the underlying RdRp mutation rate, since the RdRp protein should be less exposed to strong selective pressures [26]. Unlike GII.3 VP1, partial GII.P12 RdRp gene sequence was analyzed as a measure of the substitution rate, due to a low sample size with complete gene sequences. A previous report showed that densely sampled short RdRp sequences provided more reliable evolutionary data than fewer longer sequences [26]. In the present study, the VP1 genes of the GII.3 showed a lower mean substitution rate (4.132 × 10 -3 substitutions per site per year) than in a previous report (4.82 × 10 -3 substitutions per site per year) [21]. Several factors, such as sequence numbers, methodology, and collection dates, may affect the differences in the evolutionary rates between the previous and present data.
The effective population size may reflect virus genome populations in the host, which is predicted to be better measured using the RdRp gene instead of the VP1 gene [28]. Our results were in line with this view. Compared with GII.3 VP1 dataset, the BSP obtained from the GII. P12 RdRp dataset was easier to understand. The greatest peak in the population size was observed following a large bottleneck just prior to 2003, which was likely due to the gain of five different VP1 sequences. The population size of the polymerase type increased to the year when the VP1 genotypes of the strains were obtained, which was consistent with a previous report [25]. The second bottleneck around 2013, after a small peak around 2011, occurred probably because the number of the strains in both SCI and SCII increased significantly around 2011, while the strains in the SCI were rarely detected after 2013.
The driving force for the GII. 3[P12] norovirus evolution was not only intergenic recombination but also amino acid substitutions [29]. A previous study showed that the amino acids around the active sites regulate viral genome replication [30]. GII.P12 contains 11 amino acid substitutions, one (aa204) of which is adjacent to the RdRp active sites. The amino acid substitutions in the capsid protein primarily occurred, which was responsible for changed antigenicity and infectivity of the strains [31]. Three of 27 substitutions in the VP1 protein, sites 385, 389, and 406, were more prominent which were highly variable with three or four different conserved substitutions. These three sites had also been identified under positive selection.
Host defense mechanisms may lead to virus immune escape with mutations which are thought to the result of the positive selection [28]. We used four methods (FEL, IFEL, SLAC, and FUBAR) to make a candidate list of positively selected amino acid sites. The SLAC method is appropriate for detecting non-neutral evolution and may be a stricter algorithmic model for estimating positive selection sites [28]. Positive selection was estimated to occur at fifteen sites with amino acid substitutions even though the SLAC method only identified three of them. These results indicate that they were likely to be major sites of selection at a population level. Based on these four methods, two sites at 385 and 406 under positive selection were estimated in all GII.3 strains and GII.3[P12] strains, positioned on the outer surface loops of the protruding P2 domain in the VP1. The site 389 was not considered as a positive selection site in the GII.3[P12] strains. As shown in Table 3, there were no amino acid substitutions at site 389 in the GII.3[P12] strains, which was in accordance with the result of the positive selection. Site 385 under the positive selection was peripheral to the HBGA-binding site II, which was also located in the predicted antibody binding sites. A single amino acid change may be sufficient to cause immune escape from immunity [32]. A high variability of this site may be due to the pressure under immune selection.

Conclusion
In conclusion, we report an evolutionary analysis of the epidemic spread of the norovirus GII. 3[P12] strains in samples collected from the epidemic spread over a period of 10 years in Jiangsu province. Multiple genetic signatures linked to substitution rate, selective pressure, HBGA binding, and antigenic epitopes were identified. Our data indicate that the evolution of the GII.3[P12] strains was driven by intergenic recombination and amino acid substitutions. Faster evolutionary rates of the GII.P12 than GII.3 VP1 after the GII.3[P12] recombination revealed that the recombination was more beneficial to the evolution of the GII.P12 RdRp than the GII.3 VP1. Considering norovirus GII.3[P12] is still rapidly evolving after current epidemic spread and has been sporadically detected outside Asia, close attention to the mechanisms driving the evolution of norovirus strains is important for future noroviral control and prevention.
Additional file 1: Table S1. Clock and prior models test using pass sampling or stepping-stone sampling method.