Insights about the epidemiology of Salmonella Typhimurium isolates from different sources in Brazil using comparative genomics

Background: Salmonella enterica subsp. enterica serovar Typhimurium (S. Typhimurium) is an important zoonotic agent worldwide. The aim of this work was to genetically compare 117 S. Typhimurium strains isolated from different sources over 30 years in Brazil using different genomics strategies.
Results: The majority of the 117 S. Typhimurium strains studied were grouped into a single cluster by core genome multilocus sequence typing (≅ 90%) and by single copy marker genes (≅ 77%). The phylogenetic analysis based on single nucleotide polymorphisms (SNPs) grouped most strains from humans into a single cluster (≅ 93%), while the strains isolated from food and swine were allocated into three clusters. The different orthologous protein clusters found for some S. Typhimurium strains isolated from humans and food are involved in metabolic and regulatory processes. For the 26 isolates from swine, sequence types (ST) 19 and ST1921 were the most prevalent, and ST14, ST64, ST516 and ST639 were also detected. Previous results typed the 91 S. Typhimurium isolates from humans and foods as ST19, ST313, ST1921, ST3343 and ST1649. The main prophages detected were Gifsy-2 in 79 (67.5%) and Gifsy-1 in 63 (54%) strains. All of the S. Typhimurium isolates contained the acrA, acrB, macA, macB, mdtK, emrA, emrB, emrR and tolC efflux pump genes.
Conclusions: The phylogenetic trees grouped the majority of the S. Typhimurium isolates from humans into a single cluster, suggesting that there is one prevalent subtype in Brazil. Regarding strains isolated from food and swine, the SNP results suggested the circulation of more than one subtype over 30 years in this country. The orthologous protein clusters analysis revealed unique genes in the strains studied, mainly related to bacterial metabolism. S. Typhimurium strains from swine showed greater diversity of STs and prophages in comparison to strains isolated from humans and foods. The pathogenic potential of S. Typhimurium strains was corroborated by the presence of exclusive prophages of this serovar involved in its virulence. The high number of resistance genes related to efflux pumps is worrying and may lead to therapeutic failures when clinical treatment is needed.

Background
Nontyphoidal Salmonella (NTS) strains have been an important enteric agent transmitted mainly by contaminated foods worldwide [1]. Kirk and collaborators [2] estimated that 153 million infections and 56,969 deaths occurred around the globe due to salmonellosis in 2010. Moreover, the Centers for Disease Control and Prevention (CDC) estimated that 1.35 million infections, 26,500 hospitalizations and 420 deaths occur in the United States every year due to Salmonella [3].
In Brazil, Salmonella has been the first or second most common foodborne pathogen isolated from outbreaks in recent years [4]. However, until now there are few published studies that have characterized the possible differences between Brazilian Salmonella enterica subsp. enterica serovar Typhimurium (S. Typhimurium) strains isolated from human, food and animal sources by whole genome sequencing (WGS).
Salmonella Typhimurium is one of the main generalist Salmonella serovars and has been isolated from pork in Europe, Oceania, Asia and North America, from poultry in North America and Oceania, from beef in Africa, Latin America and Europe, and from seafood in Europe [5]. Therefore, this serovar has been transmitted from animals to humans in different parts of the world and is characterized as a zoonotic agent causing losses of millions of dollars for the pork, poultry and beef industries [1,6].
According to the CDC, S. Typhimurium can also infect domestic pets and recently was responsible for an outbreak linked to contact with small pet turtles that affected 35 people from nine states and generated 11 hospitalizations [7].
WGS has become more accessible in the last few years and is used for molecular characterization studies [8]. Furthermore, different phylogenetic strategies can be applied after sequencing, such as the construction of phylogenetic trees based on core genome multilocus sequence typing (cgMLST), on single copy marker genes and on single nucleotide polymorphisms (SNPs), besides the comparison and analysis of orthologous protein clusters (OrthoVenn) and the determination of the sequence type (ST) through multilocus sequence typing (MLST) [9][10][11]. In addition, it has been possible to characterize the different prophages that contribute to Salmonella pathogenicity, including the identification of genes known to have functions such as virulence, metabolism and signaling [12].
It is important to emphasize that the monitoring of resistant NTS strains has been of great importance due to their continued emergence worldwide [13,14]. According to Jajere, multidrug resistant (MDR) Salmonella has been a serious public health problem because it may lead to treatment failure when antimicrobial drugs are needed [14]. In the United States, it was estimated that 212,500 infections and 70 deaths occur due to drug resistant NTS every year [13].
It is known that hundreds of genes can confer resistance to antibiotics in NTS and some were previously described for the S. Typhimurium strains isolated from humans and different foods in Brazil including genes related to resistance to aminoglycosides, tetracyclines, sulfonamides, trimethoprim, beta lactams, fluoroquinolones, phenicol and macrolides [15]. However, antibiotic resistance is multifactorial and little is known about resistance genes related to efflux pumps, which can be an important factor that confers resistance to some antibiotics, such as fluoroquinolones, beta lactams, macrolides and aminoglycosides [15,16].
The aim of this work was to genetically compare S. Typhimurium isolates from humans, food and swine collected in Brazil over 30 years using different genomics strategies, such as phylogenetic trees, orthologous protein clusters analysis, MLST, and the detection of prophages and of resistance genes related to efflux pumps.

cgMLST
The cgMLST grouped the 120 S. Typhimurium genomes studied, which included the three references analysed, into two main groups designated A and B (Fig. 1). Cluster A comprised 12 genomes of ST19 isolated from humans. Cluster B comprised a total of 108 genomes from strains isolated from humans, different foods and swine. In this cluster, the strains isolated from humans and food belonged to ST19, ST1649, ST3343, ST1921 and ST313, and the strains isolated from swine belonged to ST19, ST639, ST14, ST516, ST64 and ST1921. All three references were allocated in Cluster B. The CFSAN033848 and CFSAN033855 genomes isolated from humans were genetically distinct and did not group closely with any other isolates.

Phylogenetic tree (ggTree) and orthologous protein clusters analysis
The ggTree grouped the 120 S. Typhimurium genomes studied, which included the three references analysed, into three groups designated A, B and C, with cluster A subdivided into A.1 and A.2 and cluster B subdivided into B.1 and B.2 (Fig. 2). Cluster A.1 comprised 84 genomes of ST19, ST1649, ST14, ST516, ST639, ST64, ST313, ST3343 and ST1921 isolated from humans, diverse foods and swine and the reference genomes. Cluster A.2 comprised nine genomes of ST19 isolated from humans, food and swine. Cluster B.1 comprised 20 genomes of ST19 from food and swine. Cluster B.2 comprised four genomes of ST19 isolated from human and food. Cluster C comprised three genomes of ST19 isolated from food and swine.
Seribelli et al. Gut Pathog (2021) 13:27
Fig. 1 Phylogenetic analysis with cgMLST profiles based on soft core of 3002 genes selected for 117 Salmonella Typhimurium genomes isolated from humans (n = 43), foods (n = 48) and swine (n = 26) in Brazil
The orthologous protein clusters analysis was performed for the genomes that were more related to the LT2, 14028S and D23580 references (Fig. 2). The comparisons indicated the orthologous protein clusters present in the genomes of the strains of this study and absent in the references. The different unique orthologous protein clusters found are involved in metabolic and regulatory processes and are shown in detail in Table 1.
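The comparison described above, clusters present in a study genome and absent from its reference, amounts to a set difference. The sketch below is only an illustration of that idea: the cluster names are hypothetical placeholders and are not the entries of Table 1, and the function is not part of OrthoVenn.

```python
# Hypothetical cluster annotations for illustration only; these are not
# the clusters reported in Table 1.
isolate_clusters = {"transposition", "trehalose transport", "DNA replication"}
reference_clusters = {"DNA replication", "cell adhesion"}


def unique_to_isolate(isolate: set, reference: set) -> set:
    """Return clusters present in the isolate genome and absent from the reference."""
    return isolate - reference


unique = unique_to_isolate(isolate_clusters, reference_clusters)
print(sorted(unique))
```

In practice the inputs would be the cluster identifiers exported by the orthology tool for each genome, with one such comparison per isolate and reference pair.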

snpTree
The snpTree grouped the 120 S. Typhimurium genomes studied, which included the three references analysed, into three groups designated A, B and C (Fig. 3). Cluster A comprised 81 genomes of ST19, ST14, ST516, ST639, ST64, ST1649, ST313, ST3343 and ST1921 isolated from humans, food and swine, plus all three references. Cluster B comprised 28 genomes of ST19, including one strain isolated from a human and 27 strains isolated from different foods and swine. Cluster C comprised seven genomes of ST19, including one strain isolated from a human and six strains isolated from food. The CFSAN033890 genome isolated from a human was genetically distinct and did not group closely with any other isolates.

Discussion
In this study, 117 S. Typhimurium isolates from humans (n = 43), food (n = 48) and swine (n = 26) in Brazil were compared using genomic analyses, such as phylogenetic trees, orthologous protein clusters detection, MLST analysis, and blast identification of prophages and resistance genes related to efflux pumps.
The majority of the 117 S. Typhimurium strains studied were grouped into a single cluster (≅ 90%) by the core genome multilocus sequence typing and (≅ 77%) by single copy marker genes (Figs. 1 and 2). The phylogenetic analysis based on single nucleotide polymorphism (SNPs) grouped most strains from humans into a single cluster (≅ 93%), while the strains isolated from food and swine were grouped into three clusters (Fig. 3). Therefore, snpTree was more efficient at discriminating S. Typhimurium isolates from swine and different foods in Brazil.
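The first two percentages can be checked against the cluster sizes given in the Results. The sketch below assumes that the denominator is the 120 genomes in each tree (117 isolates plus the three references); this is one reading of the reported counts and is not stated explicitly in the text.

```python
# Cluster sizes as reported in the Results. The denominator of 120
# (117 isolates + 3 references) is an assumption made for illustration.
TOTAL_GENOMES = 120


def share(count: int, total: int = TOTAL_GENOMES) -> float:
    """Return count as a percentage of total."""
    return 100.0 * count / total


cgmlst_cluster_b = share(108)     # cgMLST tree, cluster B
ggtree_cluster_a = share(84 + 9)  # single copy marker gene tree, clusters A.1 + A.2

print(f"cgMLST cluster B: {cgmlst_cluster_b:.1f}%")
print(f"ggTree cluster A: {ggtree_cluster_a:.1f}%")
```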
It is important to mention that the present study provided additional information about S. Typhimurium strains isolated from humans, food and swine in Brazil, because such strains have rarely been studied from a One Health perspective combining all available data [15,17,18,19].
Previous studies performed by our research group using different molecular typing techniques (PFGE, MLVA and CRISPR-MVLST) and a SNP-based tree built with the CFSAN pipeline corroborate the snpTree findings obtained with CSI Phylogeny 1.4, indicating the possible presence of a prevalent subtype among S. Typhimurium strains isolated from humans and of more than one circulating subtype among strains isolated from food [15,17,18,19].
According to Jensen, homologous genes can be divided into orthologous and paralogous genes [20]. Orthologous genes originated from a common ancestor during speciation events and keep the same function, while paralogous genes originated from duplication events and do not maintain the same function [21]. OrthoVenn2 is a web server capable of annotating and comparing orthologous protein clusters from the whole genomes of different species [21]. In the present study, S. Typhimurium genomes were compared to the LT2, 14028S and D23580 references and had their unique orthologous protein clusters determined (Fig. 2). All S. Typhimurium isolates compared to LT2 (Comparisons 1, 2, 3 and 4) were ST19 strains isolated from humans in São Paulo State before the 1990s. Some unique orthologous protein clusters, including transposition (DNA-mediated), transposition, viral genome integration into host DNA and trehalose transport, were commonly present in these strains but absent in the corresponding LT2 reference strain. The S. Typhimurium isolates compared to 14028S were ST19 strains isolated from food in the Rio Grande do Sul, Santa Catarina and Bahia States between 2006 and 2012 (Comparison 5). The S. Typhimurium isolates compared to D23580 were ST313 and ST19 strains isolated from humans and food in the São Paulo and Paraná States between 1995 and 2010 (Comparisons 6 and 7).
The different orthologous protein clusters found are involved in metabolic and regulatory processes, such as transposition, DNA replication, cell adhesion, formate oxidation, trehalose transport, lyase activity and response to mercury ion. These results showed that despite being of the same serovar there are unique orthologous protein clusters in the strains studied in comparison to the reference strains which were maintained in these S. Typhimurium strains during natural selection and adaptation (Table 1).
In this study, MLST was performed only for swine isolates, because the STs of the human and food isolates were previously described by Almeida et al. [22]. Of the 26 S. Typhimurium strains isolated from swine, 16 (61.5%) belonged to ST19, three (11.5%) to ST1921, two (7.7%) to ST14, two (7.7%) to ST64, one (3.8%) to ST516, one (3.8%) to ST639 and one did not have its ST detected. Previous works showed that ST19 was the most common ST found among strains of human and food origins, with ST313 being the second most prevalent, and that ST1921, ST3343 and ST1649 were also detected among these strains [22]. S. Typhimurium isolates from swine showed greater diversity in the seven housekeeping genes studied despite having a lower number of strains (n = 26) in comparison to the number of S. Typhimurium strains isolated from humans (n = 43) and food (n = 48). ST19 was the most commonly observed ST in swine, followed by ST1921, with ST14, ST64, ST516 and ST639 also observed.
According to Enterobase (accessed 12/15/2020), 29,572 Salmonella isolates of ST19 have been reported from human, reptile, ovine, swine, poultry, food and bovine sources in France, Mexico, China, Germany, Scotland, Portugal, Qatar, Korea, Ireland, United States (US), United Kingdom (UK) and Denmark. ST313 has been linked to 3049 samples isolated predominantly from humans in Kenya, Ethiopia, Zimbabwe, Malawi, Mali and Nigeria [10].
It is important to emphasize that the classic MLST scheme uses only seven housekeeping genes to determine a sequence type (ST) from the nucleotide differences found in the sequences of all alleles [23]. In turn, cgMLST focuses on the nucleotide differences among a set of 3002 conserved genes of the Salmonella genus [10]. ST19 is known to predominate among S. Typhimurium strains, which cause mainly gastroenteritis worldwide. The fact that ST19 strains isolated from humans were found in both clusters A and B of the cgMLST tree suggests that there is greater diversity in the 3002 conserved genes than in the seven MLST genes (Fig. 1).
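The distribution of STs among the swine isolates can be tabulated directly from the counts given in this paragraph. The short sketch below only restates those counts and computes each proportion of the 26 isolates.

```python
from collections import Counter

# ST counts for the 26 swine isolates, taken from the paragraph above.
swine_st_counts = Counter({
    "ST19": 16,
    "ST1921": 3,
    "ST14": 2,
    "ST64": 2,
    "ST516": 1,
    "ST639": 1,
    "not detected": 1,
})

total = sum(swine_st_counts.values())
for st, n in swine_st_counts.most_common():
    # Percentage of swine isolates assigned to this ST.
    print(f"{st}: {n}/{total} = {100 * n / total:.1f}%")
```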
According to Brussow et al. [27], the Fels-1 prophage encodes the sodCIII and nanH genes related to the production of superoxide dismutase and neuraminidase in S. Typhimurium, respectively. Furthermore, the Fels-2 prophage carries genes that are apparently related to regulation and adhesion of S. Typhimurium to host cells [12].
The Gifsy and Fels prophages have already been described in S. Typhimurium isolated in various parts of the world, including Australia, Europe and China [28][29][30]. It is important to emphasize that other prophages were also found in the S. Typhimurium strains studied, including Salmon 118970_sal3 and Haemop-HP1 (Table 2). Moreover, two dozen other prophages were detected in the S. Typhimurium strains studied, but there is less information about their relation to the pathogenicity and/or virulence of this serovar (Table 2).
In addition, S. Typhimurium isolates from swine showed 6 (23.1%) unique prophages despite having a lower number of strains analysed (n = 26) in comparison to S. Typhimurium strains isolated from humans (n = 43) and food (n = 48), which presented 7 (16.3%) and 3 (6.25%) unique prophages, respectively, suggesting a greater diversity of these mobile genetic elements in S. Typhimurium strains isolated from swine in Brazil (Table 2).
Resistance to multiple drugs in bacteria has been a serious public health problem worldwide [31]. Four main mechanisms are known to cause this resistance, namely target alteration, drug inactivation, decreased permeability and drug expulsion through the production of efflux pumps [32].
In the present study, the acrA, acrB, macA, macB, mdtK, emrA, emrB, emrR, tolC, mdsA, mdsB, mdfA and cmlA1 genes were detected among the S. Typhimurium strains isolated from humans, food and swine. All of the isolates contained the acrA, acrB, macA, macB, mdtK, emrA, emrB, emrR and tolC genes (Table 3). Other genes related to efflux pump production, such as oqxAB and floR, were previously reported [15]. The AcrAB efflux system has been described as responsible for the intrinsic resistance to many antibiotics that can be used in medical practice for the treatment of S. Typhimurium infections, such as fluoroquinolones and beta-lactams [16]. According to the World Health Organization (WHO), fluoroquinolone-resistant Salmonella spp. were classified as a high priority pathogen in the Global Priority Pathogens List [31].

Conclusions
The phylogenetic trees grouped the majority of the S. Typhimurium isolates from humans into a single cluster, suggesting that there is one prevalent subtype in Brazil. Regarding strains isolated from food and swine, the SNP analysis suggested the circulation of more than one subtype over 30 years in this country. The orthologous protein clusters analysis revealed unique genes in the strains studied, mainly related to bacterial metabolism. S. Typhimurium isolates from swine showed greater diversity of STs and prophages in comparison to S. Typhimurium strains isolated from humans and food. The pathogenic potential of S. Typhimurium strains was corroborated by the presence of exclusive prophages of this serovar involved in their virulence. The high number of resistance genes related to efflux pumps is worrying and may cause therapeutic failures when clinical treatment is needed. Altogether, this study provided relevant data on the genomic characterization of S. Typhimurium strains isolated from different sources in Brazil using WGS.

Bacterial strains
A total of 117 S. Typhimurium strains isolated from humans (43), food (48) and swine (26) between 1983 and 2013 in Brazil were studied (Table 4). These strains were selected from the collections of the Adolfo Lutz Institute of Ribeirão Preto (IAL-RP), of the Oswaldo Cruz Foundation from Rio de Janeiro (FIOCRUZ-RJ) and of the Brazilian Agricultural Research Corporation (EMBRAPA).

Whole genome sequencing
The DNA of the 117 S. Typhimurium strains was extracted according to Campioni and Falcão using the phenol-chloroform-isoamyl alcohol method [36]. Libraries were prepared using 1 ng of genomic DNA with the Nextera XT DNA library preparation kit (Illumina, San Diego, CA, USA) and the genomes were sequenced using the NextSeq 500 desktop sequencer with the NextSeq 500/550 high-output version 2 kit (Illumina) for 2 × 151 cycles according to the manufacturer's instructions at the U.S. Food and Drug Administration (FDA), College Park, Maryland, USA. The genomes were assembled using the software SPAdes and CLC Genomics Workbench version 10.0.1 [37] and the quality of the assemblies was evaluated using the software QUAST [38]. The genomes ranged from 4.6 to 5.1 Mb in size, as described for other Salmonella strains [39]. The average G+C content of the sequenced genomes was 52.04%, which is similar to that reported previously for other Salmonella isolates [40]. The number of contigs per assembly ranged between 47 and 827. Finally, the coverage ranged from 13× to 753×. Detailed information on the sequencing of the 117 S. Typhimurium genomes can be found in Almeida et al. and Seribelli et al. [41,42].

cgMLST
The cgMLST analysis was performed with cgMLSTFinder 1.1 from the sets of reads of all 117 S. Typhimurium genomes and of three reference strains of this serovar (LT2, 14028S and D23580), using the Salmonella (Enterobase) scheme of the Center for Genomic Epidemiology service available at https://cge.cbs.dtu.dk/services/cgMLSTFinder/ [10].
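The assembly figures quoted above (number of contigs, genome size and G+C content) can be derived from the contig sequences themselves. The following is a minimal sketch of those three calculations; it is not QUAST, and the toy contigs are hypothetical and far shorter than real assemblies.

```python
def assembly_summary(contigs: list) -> dict:
    """Summarize an assembly given as a list of contig sequences (strings)."""
    size_bp = sum(len(c) for c in contigs)
    if size_bp == 0:
        raise ValueError("assembly contains no bases")
    # Count G and C bases regardless of case.
    gc = sum(c.upper().count("G") + c.upper().count("C") for c in contigs)
    return {
        "contigs": len(contigs),
        "size_bp": size_bp,
        "gc_percent": 100.0 * gc / size_bp,
    }


# Toy contigs (hypothetical), used only to show the calculation.
toy_contigs = ["GGCCAT", "ATGC", "AT"]
summary = assembly_summary(toy_contigs)
print(summary)
```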
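cgMLST compares isolates through the allele number assigned at each core genome locus. A common way to turn such profiles into distances is to count the loci at which two profiles differ while ignoring loci that are missing in either one. The sketch below illustrates only that general idea with hypothetical five-locus profiles; it is not the procedure implemented by cgMLSTFinder 1.1.

```python
from typing import Optional, Sequence


def allelic_distance(a: Sequence[Optional[int]], b: Sequence[Optional[int]]) -> int:
    """Count loci with different allele numbers, skipping loci missing (None) in either profile."""
    if len(a) != len(b):
        raise ValueError("profiles must cover the same loci")
    return sum(
        1
        for x, y in zip(a, b)
        if x is not None and y is not None and x != y
    )


# Hypothetical profiles over five loci; None marks a locus that was not called.
profile_1 = [1, 4, 7, None, 2]
profile_2 = [1, 5, 7, 3, 9]
distance = allelic_distance(profile_1, profile_2)
print(distance)
```

With real data each profile would hold one allele number for each of the 3002 loci of the scheme.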

SNP tree
The phylogenetic tree based on whole genome SNPs was built with CSI Phylogeny 1.4 (Call SNPs & Infer Phylogeny) of the Center for Genomic Epidemiology at https://cge.cbs.dtu.dk/services/CSIPhylogeny/ using the following parameters: min. depth at SNP positions 10×, min. relative depth at SNP positions 10%, minimum distance between SNPs (prune) 10 bp, min. SNP quality 30, min. read mapping quality 25 and min. Z-score 1.96 [45]. The resulting SNP matrix included a maximum of 30,873 SNPs among all S. Typhimurium strains studied.
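To make the role of each threshold concrete, the sketch below applies the same six cut-offs to a handful of hypothetical SNP calls. It rests on several assumptions that the text does not confirm: relative depth is taken as depth relative to the mean depth of the mapping, the Z-score is computed as (X - Y) / sqrt(X + Y) with X the reads supporting the called base and Y all other reads, and pruning drops every SNP that lies closer than 10 bp to a neighbouring SNP. It is an illustration, not the CSI Phylogeny implementation.

```python
import math
from dataclasses import dataclass

# Thresholds taken from the parameters listed above.
MIN_DEPTH = 10
MIN_REL_DEPTH = 0.10
MIN_DISTANCE = 10
MIN_SNP_QUALITY = 30
MIN_MAPPING_QUALITY = 25
MIN_Z_SCORE = 1.96


@dataclass
class SnpCall:
    position: int           # position on the reference (bp)
    depth: int              # total reads covering the position
    alt_reads: int          # reads supporting the called base
    snp_quality: float
    mapping_quality: float


def passes_site_filters(call: SnpCall, mean_depth: float) -> bool:
    """Apply the depth, quality and Z-score thresholds (assumed definitions)."""
    if call.depth == 0:
        return False
    other_reads = call.depth - call.alt_reads
    z = (call.alt_reads - other_reads) / math.sqrt(call.depth)
    return (
        call.depth >= MIN_DEPTH
        and call.depth >= MIN_REL_DEPTH * mean_depth
        and call.snp_quality >= MIN_SNP_QUALITY
        and call.mapping_quality >= MIN_MAPPING_QUALITY
        and z >= MIN_Z_SCORE
    )


def prune(calls: list) -> list:
    """Drop SNPs closer than MIN_DISTANCE bp to a neighbouring SNP (assumed rule)."""
    ordered = sorted(calls, key=lambda c: c.position)
    kept = []
    for i, call in enumerate(ordered):
        left_ok = i == 0 or call.position - ordered[i - 1].position >= MIN_DISTANCE
        right_ok = (
            i == len(ordered) - 1
            or ordered[i + 1].position - call.position >= MIN_DISTANCE
        )
        if left_ok and right_ok:
            kept.append(call)
    return kept


def filter_snps(calls: list, mean_depth: float) -> list:
    """Site filters first, then pruning (the order is an assumption)."""
    return prune([c for c in calls if passes_site_filters(c, mean_depth)])


# Hypothetical calls for illustration.
calls = [
    SnpCall(100, 40, 38, 60, 50),   # passes site filters, but 4 bp from the next SNP
    SnpCall(104, 35, 35, 55, 45),   # passes site filters, but 4 bp from the previous SNP
    SnpCall(500, 8, 8, 60, 50),     # depth below 10
    SnpCall(900, 30, 18, 60, 50),   # Z-score below 1.96
    SnpCall(1500, 50, 49, 60, 50),  # passes everything
]
kept = filter_snps(calls, mean_depth=40.0)
print([c.position for c in kept])
```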

Multilocus sequence typing (MLST)
MLST was performed in the present study for the 26 S. Typhimurium isolates from swine using MLST 2.0 of the Center for Genomic Epidemiology with the Salmonella enterica scheme, available at https://cge.cbs.dtu.dk/services/MLST/ [46]. The seven housekeeping genes included were aroC, dnaN, hemD, hisD, purE, sucA and thrA [23,46]. The STs of the S. Typhimurium isolates from humans and different foods were previously described in Almeida et al. [22] and were determined in the same way as described above.
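An ST is assigned by looking up the combination of allele numbers found at the seven loci in a table of known profiles. The sketch below shows that lookup in its simplest form; the allele numbers and ST labels in the table are hypothetical placeholders, not entries of the real Salmonella MLST database, and the function is not part of MLST 2.0.

```python
from typing import Dict, Optional

# The seven housekeeping genes of the scheme, in a fixed order.
LOCI = ("aroC", "dnaN", "hemD", "hisD", "purE", "sucA", "thrA")

# Hypothetical profile table: allele numbers and ST labels are placeholders.
PROFILE_TABLE = {
    (1, 1, 1, 1, 1, 1, 1): "ST-A",
    (1, 1, 2, 1, 1, 1, 1): "ST-B",
}


def assign_st(alleles: Dict[str, int]) -> Optional[str]:
    """Return the ST matching the seven-locus allelic profile, or None if it is unknown."""
    profile = tuple(alleles[locus] for locus in LOCI)
    return PROFILE_TABLE.get(profile)


known = assign_st(dict(zip(LOCI, (1, 1, 2, 1, 1, 1, 1))))
unknown = assign_st(dict(zip(LOCI, (9, 9, 9, 9, 9, 9, 9))))
print(known, unknown)
```

A profile that is absent from the table returns None, which corresponds to an isolate whose ST is not detected.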

Prophages detection
The genomes of all 117 S. Typhimurium strains were searched for prophages with the PHAge Search Tool Enhanced Release (PHASTER), an online platform for the rapid identification and annotation of prophage sequences in bacterial genomes and plasmids, available at http://phaster.ca/ [47].

Efflux pumps
The genomes of all 117 S. Typhimurium strains were searched for resistance genes related to efflux pumps. The search was performed with the Resistance Gene Identifier (RGI), which is part of the Comprehensive Antibiotic Resistance Database (CARD), using the high quality/coverage setting (includes contigs > 20,000 bp and excludes prediction of partial genes). The software is available at https://card.mcmaster.ca/analyze/rgi [48].
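The contig length criterion mentioned above can be illustrated with a small filter over FASTA records. This sketch is not RGI code, since RGI applies its own criteria internally; it only shows what keeping contigs longer than 20,000 bp means. The FASTA text and the reduced threshold in the example are hypothetical.

```python
def parse_fasta(text: str) -> dict:
    """Parse FASTA-formatted text into a {contig_name: sequence} dictionary."""
    parts = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Use the first word of the header as the contig name.
            name = line[1:].split()[0]
            parts[name] = []
        elif name is not None:
            parts[name].append(line)
    return {n: "".join(chunks) for n, chunks in parts.items()}


def long_contigs(records: dict, min_length: int = 20000) -> dict:
    """Keep contigs strictly longer than min_length bp."""
    return {n: s for n, s in records.items() if len(s) > min_length}


# Hypothetical toy input; a threshold of 5 bp is used so the example stays small.
toy_fasta = ">contig_1 example\nACGTAC\nGT\n>contig_2\nACG\n"
records = parse_fasta(toy_fasta)
kept_contigs = long_contigs(records, min_length=5)
print(list(kept_contigs))
```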