
Transcriptome-wide association study identified candidate genes associated with gut microbiota



Gut microbiota is closely associated with host health and disease occurrence. Host genetic factors play an important role in shaping gut microbial communities, but the specific mechanism by which host-regulated gene expression affects the gut microbiota has not yet been elucidated. Here we conducted a transcriptome-wide association study (TWAS) for gut microbiota by leveraging expression imputation from large-scale GWAS data sets.


TWAS detected multiple tissue-specific candidate genes for gut microbiota, such as FUT2 for genus Bifidobacterium in transverse colon (PPERM.ANL = 1.68 × 10–3) and SFTPD for an unclassified genus of Proteobacteria in transverse colon (PPERM.ANL = 5.69 × 10–3). Fine mapping replicated 3 candidate genes in TWAS, such as HELLS for Streptococcus (PIP = 0.685) in sigmoid colon, ANO7 for Erysipelotrichaceae (PIP = 0.449) in sigmoid colon. Functional analyses detected 94 significant GO terms and 11 pathways for various taxa in total, such as GO_NUCLEOSIDE_DIPHOSPHATASE_ACTIVITY for Butyrivibrio (FDR P = 1.30 × 10–4), KEGG_RENIN_ANGIOTENSIN_SYSTEM for Anaerostipes (FDR P = 3.16 × 10–2). Literature search results showed 12 genes prioritized by TWAS were associated with 12 diseases. For instance, SFTPD for an unclassified genus of Proteobacteria was related to atherosclerosis, and FUT2 for Bifidobacterium was associated with Crohn’s disease.


Our results provide novel insights into the genetic mechanisms shaping the gut microbiota and offer clues to how genetic factors acting on the gut microbiota influence the occurrence and development of diseases.


Gut microbiota is an enormous and complex ecosystem, which is closely associated with the host by affecting metabolism, immunity and other physiological functions [1, 2]. Numerous studies have suggested that the gut microbiota is correlated with the incidence of complex diseases. A case–control study showed that the microbial pattern of women with breast cancer differs from that of healthy women in terms of bacterial type, relative abundance and function [3]. A cohort study of Indian children found that the proportion of Firmicutes was higher in children with autism spectrum disorder (ASD) than in healthy children [4]. In addition, the gut microbiota might be involved in the modulation of body mass index and blood lipid levels, according to the LifeLines-DEEP population cohort study of 893 subjects [5]. However, the mechanisms by which the gut microbiota contributes to most of these diseases remain unclear and need further research.

The composition of the gut microbiota is shaped by multiple factors, including environment, diet and medication as well as internal parameters [6]. In recent decades, a great deal of evidence has indicated that host genetic factors play an indispensable role in shaping gut microbial communities. Lim et al. found that monozygotic twin pairs had more similar gut microbial communities than other family members; of the 85 taxa identified, 50 (58.8%) showed significant heritability, with estimates ranging from 13.1% to 45.7% [7]. Additionally, in a large (n = 645) mouse advanced intercross line, microbial quantitative trait loci (mbQTLs) significantly affected gut microbial taxa [8]. Moreover, microbial genome-wide association studies (mGWAS) have been conducted in recent years to reveal loci related to the gut microbiota. According to a previous study, the abundance of Lactococcus could be affected by the single nucleotide polymorphism (SNP) rs2294239 in the ZNRF3 gene, which is associated with body fat distribution [9].

The gut microbiota can be regarded as a trait affected by genetic factors [8]. Although GWAS has contributed a great number of genetic clues related to complex diseases and traits, GWAS alone is limited in explaining how genetic variation regulates gene expression, because the SNPs identified are mainly located in non-coding regions [10]. In recent years, expression quantitative trait loci (eQTLs) have been widely used to elucidate the influence of genetic variants at the gene expression level [11]. Subsequently, integrated analysis of GWAS and eQTLs became practical for exploring the effect of gene expression on complex traits [12]. One such family of methods is the transcriptome-wide association study (TWAS), which imputes expression from genetic data, shows great power to prioritize candidate genes for complex traits of interest, and has been used to identify associations between many diseases and genes [13]. For example, Liao et al. identified KAT2B and TMEM161B as causal genes for attention deficit hyperactivity disorder by TWAS [14]. Another TWAS detected 25 genes, including CELA3B, whose predicted expression was statistically significantly associated with pancreatic cancer risk [15]. To the best of our knowledge, no TWAS has yet been applied to the gut microbiota.

In this study, we performed TWAS analysis and fine mapping of gut microbiota for multiple tissues by leveraging expression imputation from large-scale GWAS data sets. Subsequently, functional analysis was conducted to explore the biological functions and pathways of significant gene sets. Furthermore, we compiled the diseases associated with gut microbiota candidate genes by manually reviewing the literature.


mGWAS of gut microbiota

The human microbiota GWAS summary data were obtained from a study published by Hughes et al. [16]. The study included 2223 individuals from the Flemish Gut Flora Project (FGFP) cohort. DNA was extracted from frozen fecal samples and subsequently used for 16S ribosomal RNA gene sequencing. Among 499 taxon-derived abundances in FGFP, 92 taxa met the analysis criteria and were identified as independent phenotypes. A presence/absence (P/A) phenotype (binary) and a zero-truncated (all zero values set as missing) abundance (AB) phenotype (continuous) were generated for taxa where > 5% of individuals in FGFP had an abundance measurement of zero. Genome-wide genotyping of FGFP was conducted using either the Human Core Exome v.1.0 array or the Human Core Exome v.1.1 array. Snptest.2.5.0 was used for association analysis. In brief, 157 microbial traits, comprising 62 presence/absence (P/A-HB) and 95 abundance (AB-RNT) microbial phenotypes, were included in the subsequent analysis. Detailed information on subjects, study design, statistical analysis and quality control can be found in the publication [16].
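The derivation of the two phenotype types described above can be illustrated with a short sketch. This is not the code used by Hughes et al.; the function name `derive_phenotypes`, its parameter names and the toy counts are assumptions made for illustration, and any later transformation of the abundance values is left out.

```python
import numpy as np


def derive_phenotypes(abundance, zero_fraction_threshold=0.05):
    """Build the P/A and zero-truncated AB phenotypes for one taxon.

    Returns (presence_absence, zero_truncated). Both are None when the
    share of individuals with zero abundance does not exceed the threshold,
    because the split is only described for taxa with > 5% zeros.
    """
    abundance = np.asarray(abundance, dtype=float)
    zero_mask = abundance == 0
    if zero_mask.mean() <= zero_fraction_threshold:
        return None, None
    # Binary phenotype: 1 if the taxon was observed, 0 otherwise.
    presence_absence = (~zero_mask).astype(int)
    # Continuous phenotype: zeros are set to missing (NaN), the rest kept.
    zero_truncated = np.where(zero_mask, np.nan, abundance)
    return presence_absence, zero_truncated


# Toy abundance counts for five individuals (hypothetical values).
pa, zt = derive_phenotypes([0, 3, 0, 7, 1])
```

Here `pa` is the binary trait and `zt` keeps only the non-zero measurements, so the two phenotypes carry complementary information about the same taxon.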

TWAS of gut microbiota

TWAS of gut microbiota was performed with the FUSION software, which uses gene expression weights for various tissues precomputed from a small set of individuals with both gene expression and genotype data. The cis-genetic component of expression was then imputed into much larger sets of phenotyped individuals according to SNP genotype data. In this study, we used the Bayesian Sparse Linear Mixed Model (BSLMM) to calculate the SNP expression weights within the 1-Mb cis locus of each gene [17]. Let w denote the weights, Z the mGWAS Z-scores of the SNPs for a gut microbiota trait, and L the SNP-correlation (LD) matrix. The association test statistic between predicted gene expression and each taxon was calculated as \(Z_{TWAS} = w'Z / \left(w'Lw\right)^{1/2}\). The imputed expression can be regarded as a linear model of genotypes, with weights based on the correlation between gene expression and SNPs in the training data and with linkage disequilibrium (LD) among SNPs taken into account [13]. Finally, the association between target traits and gene expression levels was estimated by integrating the mGWAS summary data with the gene expression weights. The precomputed expression weights of tissues derived from the Genotype-Tissue Expression (GTEx) project were downloaded from the FUSION website. Specifically, in this study, we used the sigmoid colon and transverse colon as reference panels. Following the recommendation of the FUSION software [13], we generated cleaned mGWAS summary statistics by leveraging the LD reference panel for further analyses; the mGWAS summary statistics had not been trimmed or thresholded beforehand. The percentage of SNPs in the LD reference that were available in the FGFP mGWAS data was approximately 13.8% for each microbial trait. We ran 2000 permutation tests for each FUSION analysis to reduce inflation from by-chance QTL co-localization. In this study, an analytical permutation P value (PPERM.ANL) < 0.05 was considered significant.
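The TWAS test statistic described above, the weighted sum of SNP Z-scores divided by the square root of w'Lw, can be sketched numerically. The sketch below is an illustration only and is not FUSION code (FUSION is distributed as R scripts); the function names `twas_z` and `two_sided_p` and all numeric inputs are hypothetical, and the P value shown is the plain normal-theory value rather than the permutation-based PPERM.ANL.

```python
import math

import numpy as np


def twas_z(weights, gwas_z, ld):
    """Compute Z_TWAS = w'Z / sqrt(w'Lw) for one gene and one trait."""
    w = np.asarray(weights, dtype=float)
    z = np.asarray(gwas_z, dtype=float)
    ld_matrix = np.asarray(ld, dtype=float)
    # Variance of the imputed expression under the null, given LD.
    variance = float(w @ ld_matrix @ w)
    if variance <= 0:
        raise ValueError("w'Lw must be positive")
    return float(w @ z) / math.sqrt(variance)


def two_sided_p(z):
    """Two-sided P value of a standard normal statistic."""
    return math.erfc(abs(z) / math.sqrt(2))


# Hypothetical two-SNP cis locus: expression weights, mGWAS Z-scores, LD.
example_z = twas_z(
    weights=[0.5, -0.2],
    gwas_z=[2.0, -1.0],
    ld=[[1.0, 0.3], [0.3, 1.0]],
)
example_p = two_sided_p(example_z)
```

With a single SNP of weight 1 and an LD matrix of `[[1.0]]`, the statistic reduces to the SNP's own Z-score, which is a quick sanity check on the formula.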

TWAS fine mapping

The fine-mapping of causal gene sets (FOCUS) approach was used to prioritize genes with strong evidence for causality in the TWAS analyses [18]. FOCUS integrates GWAS summary data with expression prediction weights estimated from the eQTL reference panel, accounts for LD among all SNPs in the risk region, and estimates for each gene a posterior inclusion probability (PIP), that is, the probability that the gene is causal for the TWAS signal [18]. Genes included in the 90%-credible set are more likely to be causal than other genes in the region. Consistent with the TWAS analyses, the transverse colon and sigmoid colon were used as the reference panels in the FOCUS analysis. The threshold for screening of mGWAS summary data was 1 × 10–5 [16].
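How a 90%-credible set can be read off from per-gene probabilities is sketched below. This is a simplified illustration, not the FOCUS implementation: it assumes the set is built by ranking genes by PIP and adding them until the normalized cumulative PIP reaches the chosen level, and it ignores details such as the null model that FOCUS includes in a region. The function name `credible_set` and the gene labels are hypothetical.

```python
def credible_set(pips, level=0.9):
    """Return the genes whose normalized cumulative PIP first reaches `level`.

    `pips` maps gene name to posterior inclusion probability for one region.
    """
    total = sum(pips.values())
    if total <= 0:
        return []
    # Rank genes from the highest to the lowest PIP.
    ranked = sorted(pips.items(), key=lambda item: item[1], reverse=True)
    selected = []
    cumulative = 0.0
    for gene, pip in ranked:
        selected.append(gene)
        cumulative += pip / total
        if cumulative >= level:
            break
    return selected


# Hypothetical PIPs for four genes in one risk region.
region_pips = {"GENE_A": 0.685, "GENE_B": 0.20, "GENE_C": 0.08, "GENE_D": 0.035}
genes_in_set = credible_set(region_pips, level=0.9)
```

In this toy region the first two genes cover 88.5% of the probability mass, so a third gene is needed before the 90% level is reached.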

Functional analyses

The gut microbiota-related genes identified by TWAS (PPERM.ANL < 0.05) were used for functional analyses on the Functional Mapping and Annotation (FUMA) online platform [19]. P values were calculated by FUMA for each Gene Ontology (GO) term and pathway. An FDR-adjusted P value < 0.05 was considered significant.
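The FDR adjustment applied to enrichment P values can be illustrated with a small sketch. The text states only that FDR-adjusted P values below 0.05 were considered significant; the Benjamini-Hochberg procedure used here is an assumption about the adjustment method, and the function name `benjamini_hochberg` and the example P values are hypothetical.

```python
import numpy as np


def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted P values in the input order."""
    p = np.asarray(p_values, dtype=float)
    n = p.size
    order = np.argsort(p)
    # Scale each sorted P value by n / rank.
    scaled = p[order] * n / np.arange(1, n + 1)
    # Enforce monotonicity, working down from the largest P value.
    adjusted_sorted = np.minimum.accumulate(scaled[::-1])[::-1]
    adjusted = np.empty(n)
    adjusted[order] = np.clip(adjusted_sorted, 0.0, 1.0)
    return adjusted


# Hypothetical raw enrichment P values for four gene sets.
raw_p = [0.01, 0.04, 0.03, 0.005]
fdr_p = benjamini_hochberg(raw_p)
significant = fdr_p < 0.05
```

Because the adjustment depends on how many gene sets are tested together, the same raw P value can be significant in one analysis and not in another.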

Verification of gene and disease association

Literature mining was performed to list the diseases related to the candidate genes. PubMed was searched to determine whether the significant genes identified by TWAS for each taxon had been reported as causal genes for the target diseases.


TWAS results

In total, TWAS of 157 microbial traits was performed with FUSION. For the presence/absence (P/A-HB) phenotype, 1693 genes were identified by TWAS across the 62 microbial traits (Additional file 1: Table S1, Additional file 2: Table S2, Additional file 3: Table S3), such as TOB2P1 for Enterococcaceae in sigmoid colon (PPERM.ANL = 1.94 × 10–50), KCNIP3 for Veillonellaceae in transverse colon (PPERM.ANL = 8.35 × 10–33) and WDR6 for Coprococcus in sigmoid colon (PPERM.ANL = 1.1 × 10–16). Similarly, 2247 genes were detected for the 95 microbial traits of the abundance (AB-RNT) phenotype, such as WDR6 for Butyrivibrio in sigmoid colon (PPERM.ANL = 1.24 × 10–64), FBXO41 for Clostridium XlVa in transverse colon (PPERM.ANL = 1.47 × 10–21) and CENPE for Veillonellaceae in sigmoid colon (PPERM.ANL = 2.30 × 10–17). Table 1 summarizes the top 20 significant genes associated with microbiota for each of the two phenotypes.

Table 1 Top 20 candidate genes detected by TWAS in P/A and AB models

We summarized the candidate genes shared by different microbial traits (Fig. 1, Additional file 4: Table S4), such as NDUFV3 for Lentisphaerae (HB), Bacteroidales (HB), Prevotella (HB), an unclassified genus of order Clostridiales (RNT), an unclassified genus of family Ruminococcaceae (RNT), Victivallis (HB), Bacteroides (RNT), Sporobacter (RNT), an unclassified genus of phylum Bacteroidetes (HB), Chao diversity (RNT) and the number of genera observed (RNT); and SFTPD for Rhodospirillaceae (HB), Alphaproteobacteria (HB), an unclassified genus of phylum Proteobacteria (HB), Rhodospirillales (HB) and an unclassified genus of family Rhodospirillaceae (HB). Table 2 shows the 6 genes shared by the largest number of microbial traits.
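Counting how many microbial traits share each candidate gene is a simple tally, sketched below. The function name `rank_shared_genes` is hypothetical, and the input is only a small excerpt of the gene-trait pairs named in the paragraph above, not the full result set in the additional files.

```python
from collections import Counter


def rank_shared_genes(genes_by_trait, top_n=6):
    """Rank genes by the number of microbial traits they are associated with."""
    counts = Counter(
        gene
        for genes in genes_by_trait.values()
        for gene in set(genes)  # count each gene at most once per trait
    )
    return counts.most_common(top_n)


# Excerpt of gene-trait pairs taken from the text (not the complete results).
excerpt = {
    "Lentisphaerae (HB)": ["NDUFV3"],
    "Bacteroidales (HB)": ["NDUFV3"],
    "Prevotella (HB)": ["NDUFV3"],
    "Rhodospirillaceae (HB)": ["SFTPD"],
    "Alphaproteobacteria (HB)": ["SFTPD"],
}
shared = rank_shared_genes(excerpt)
```

Applied to the full TWAS output, the same tally would give the per-gene trait counts that a table of most-shared genes summarizes.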

Fig. 1 Top 14 overlapped candidate genes with the most repetitions in all microbial traits. The Circos plot shows the top 14 candidate genes with the most repeats across all gut microbiota traits in transverse colon and sigmoid colon. The associations of each OTU with multiple genes are also shown. The labels on the left of the figure represent gene names, and the labels on the right, sorted alphabetically, represent different OTUs

Table 2 Top 6 overlapped candidate genes for different microbial traits

Fine mapping results

We performed fine mapping with FOCUS for 157 microbial traits with two reference panels, and found 11 genes included in 90%-credible sets, indicating that these genes may be causally associated with microbial traits (Table 3). Among them, 3 genes had also been identified in the TWAS analyses: HELLS for Streptococcus (RNT) (PIP = 0.685) and for Streptococcaceae (RNT) (PIP = 0.665) in sigmoid colon, ANO7 for Erysipelotrichaceae (RNT) (PIP = 0.449) in sigmoid colon, and STAG3L4 for Lachnospiraceae (RNT) (PIP = 0.171) in transverse colon.

Table 3 Potentially causal genes for microbial traits detected by FOCUS

Functional analyses results

The significant genes identified by TWAS for each microbial trait in the two tissues were subjected to functional analysis (Additional file 7: Table S7). In total, we detected 94 GO terms across the two phenotypes. For instance, GO_NUCLEOSIDE_DIPHOSPHATASE_ACTIVITY was significant for Butyrivibrio (RNT) (FDR P = 1.30 × 10–4), GO_CONDENSED_CHROMOSOME_CENTROMERIC_REGION was significantly associated with Acidaminococcus (HB) (FDR P = 1.17 × 10–3), GO_SPECTRIN_BINDING was associated with Burkholderiales (RNT) (FDR P = 1.69 × 10–3), and GO_VACUOLE was associated with Enterobacteriaceae (RNT) (FDR P = 2.84 × 10–3).

FUMA also identified 11 pathways related to microbial traits, such as KEGG_RENIN_ANGIOTENSIN_SYSTEM for Anaerostipes (RNT) (FDR P = 3.16 × 10–2), KEGG_PURINE_METABOLISM for Veillonellaceae (HB) (FDR P = 7.35 × 10–3), KEGG_JAK_STAT_SIGNALING_PATHWAY for Enterococcaceae (RNT) (FDR P = 2.60 × 10–2). Table 4 shows the top 10 gene ontology terms and KEGG pathways of the significant genes.

Table 4 Top 10 significant GO and KEGG pathways for microbial traits

Association between candidate genes and diseases

The top genes in Tables 1 and 2 were searched on PubMed to explore possible relationships with diseases, and 12 genes were found to be associated with 12 diseases (Table 5). For instance, HELLS for Streptococcus in sigmoid colon was related to colorectal cancer [20], and SFTPD for an unclassified genus of Proteobacteria in transverse colon was related to atherosclerosis [21]. Notably, although not included among the top genes, FUT2 for Bifidobacterium was suggested to be a causal gene for Crohn's disease (CD) in a previous study [22].

Table 5 The list of candidate genes associated with diseases


Host genes have been shown to be closely related to the ecosystem of the gut microbiota. Previous studies have detected multiple candidate genes associated with specific taxa [23,24,25]. Recent studies indicate that noncoding regulatory regions play an important role in influencing human complex traits. The gut microbiota has been suggested to be a complex host trait affected by mbQTLs [8], so we speculate that the host can influence the composition of the gut microbiota and the abundance of specific groups by regulating gene expression. In this study, TWAS was performed to prioritize candidate genes affecting the gut microbiota at the gene expression level by integrating GWAS summary data with pre-computed tissue-specific expression profiles. We identified a number of genes and pathways related to microbial traits, and some of these genes have been reported by previous studies to be associated with specific diseases.

TWAS and fine mapping both prioritized several candidate genes for gut microbiota, such as HELLS for Streptococcus in sigmoid colon and ANO7 for Erysipelotrichaceae in sigmoid colon. We attempted to explore the relationship between gut microbiota candidate genes and diseases. HELLS encodes the lymphoid-specific helicase, which participates in the establishment and maintenance of DNA methylation and in chromatin remodeling through its ATPase activity [20]. HELLS expression has been shown to be significantly associated with colorectal cancer progression and a higher pathological grade [20]. Aberrant bands of HELLS were observed in seven colorectal cancers by a polymerase chain reaction-based single-strand conformation polymorphism assay [26]. Streptococcus has been identified as a candidate colorectal cancer pathogen in previous studies [27, 28]. ANO7 has been found to play a central role in prostate cancer progression, and its elevated expression correlates with disease severity and outcome [29]. Notably, the abundance of Erysipelotrichaceae was observed to be increased in prostate cancer patients [30]. Men receiving androgen axis-targeted therapy for prostate cancer showed a significant decrease in the abundance of sequencing reads assigned to Erysipelotrichaceae [31]. In mice, the abundance of Erysipelotrichaceae in the gut microbiota also differed between cancer-bearing and healthy animals [32].

FUT2 was associated with Bifidobacterium in transverse colon in the TWAS. The FUT2 gene encodes α-1,2-fucosyltransferase, which is required for the expression of ABH blood group antigens on mucosal surfaces and determines the ability to secrete blood group antigens into gastrointestinal secretions. Individuals who are homozygous for loss-of-function variants in FUT2 are non-secretors: ABH antigens are not expressed in their mucosal secretions or on mucosal surfaces, and their genotype is generally denoted sese [33, 34]. Accordingly, the secretor genotypes are denoted SeSe and Sese [34].

Alterations of FUT2 genotype result in a significant shift in microbial composition, that is, FUT2 polymorphism has a gardening effect on the phylogenetic composition of the gut microbiota [34]. Studies consistently show a genome-wide significant association between the FUT2 non-secretor allele and CD in various populations [22, 35]. It has been suggested that homozygosity for FUT2 loss-of-function alleles changes the gut microbiota of CD patients [36,37,38,39]. FUT2 polymorphism may also partly contribute to CD susceptibility by shaping the community composition and structure of the microbiota [36, 37]. Previous studies showed that the genus Bifidobacterium had higher diversity, richness and abundance in secretors than in non-secretors [40, 41]. Moreover, an increase in the genus Bifidobacterium is related to successful clinical outcome or remission after therapy in CD [42]. Further studies are warranted to identify the interactions between FUT2, Bifidobacterium and CD.

TWAS also identified SFTPD as a candidate gene for an unclassified genus of Proteobacteria in transverse colon. SFTPD encodes surfactant protein D, an important host defense lectin that aggregates microbes and dying host cells and enhances their phagocytosis [43]. SFTPD is mainly expressed in the lung but is also found in the gallbladder and gut, and could shape the intestinal microbial ecosystem [43]. Some evidence suggests a link between SFTPD and the phylum Proteobacteria. Nexoe et al. found a strong positive correlation between inflammatory activity and expression of SFTPD in the intestinal epithelium of inflammatory bowel disease (IBD) patients [44], while an increase in Proteobacteria is one of the most consistent observations in individuals with IBD [45].

SFTPD has been reported to exacerbate the development of atherosclerosis [21, 46,47,48]. In recent decades, bacterial infections and chronic inflammation have been proposed as possible causes of cardiovascular disease. Atherosclerosis is a chronic inflammatory process driven by lipids in the walls of the great arteries [49]. SFTPD has been reported to play a prominent pro-inflammatory role [50, 51]. According to previous studies, genera of Proteobacteria are involved in the formation of atherosclerosis. For instance, Proteus vulgaris was found in both the plaques and the intestines of the same individual [52], and Proteus mirabilis can interact with atherosclerotic plaques in human coronary arteries via specific molecules to exacerbate disease progression [53]. In addition, the abundance of Proteus in the blood of cardiovascular disease patients was observed to be higher than in healthy individuals [52]. In mouse disease models, a reduction in the abundance of the phylum Proteobacteria can exert a therapeutic effect on atherosclerosis [54]. Since SFTPD is related to the abundance of bacteria from the phylum Proteobacteria according to our findings, we hypothesize that the microbiota could affect susceptibility to atherosclerosis through genetic regulation.

KEGG_RENIN_ANGIOTENSIN_SYSTEM was associated with Anaerostipes in the functional analysis. In a recent study, a lower abundance of Anaerostipes was observed in primary aldosteronism patients than in healthy individuals [55]. Bier et al. showed that a high-salt diet could decrease the abundance of taxa from the genus Anaerostipes [56]. Moreover, Anaerostipes was found to be correlated with a higher estimated glomerular filtration rate in the overall population [57].

To the best of our knowledge, we conducted the first large-scale, comprehensive, tissue-specific TWAS of gut microbiota in sigmoid colon and transverse colon, and performed fine mapping based on TWAS for further confirmation. The candidate genes for gut microbiota were further explored for links between various taxa and diseases. Our study also has three potential limitations. First, only individuals of European ancestry from Germany and Belgium were included in the analysis, so the results cannot be generalized to other ethnic groups. Second, information on the diet and drug use of individuals was lacking, so we cannot rule out effects of diet and medication on the composition of the gut microbiota. Third, the purpose of this study was to screen and prioritize candidate genes for gut microbiota, so the results should be interpreted with caution. Research on the interaction between genes and gut microbiota still needs more extensive exploration, and further functional studies should be performed to confirm our findings and elucidate the mechanisms by which genes act on the gut microbiota.


In conclusion, we performed TWAS analyses and identified multiple candidate genes and pathways for gut microbiota. We found that some candidate genes may also be involved in disease susceptibility, offering clues to how genetic factors acting on the gut microbiota influence the occurrence and development of diseases. Our findings may provide new insight into the influence of genetic factors on the composition of the gut microbiota, and suggest a potential role of the gut microbiota in the mechanisms by which genetic factors contribute to disease susceptibility. Further studies are needed to demonstrate the specific biological mechanisms.

Availability of data and materials

Raw 16S data used in the original GWAS study are available at the European Genome-phenome Archive under accession no. EGAS00001004420. The microbiome GWAS summary data supporting the findings of this study are available online at the University of Bristol data repository. The precomputed expression weights of tissues derived from the Genotype-Tissue Expression (GTEx) project were downloaded from the FUSION website. The TWAS results of this study have been deposited in the Figshare repository.


  1. 1.

    Schoeler M, Caesar R. Dietary lipids, gut microbiota and lipid metabolism. Rev Endocr Metab Disord. 2019;20(4):461–72.

    CAS  PubMed  PubMed Central  Google Scholar 

  2. 2.

    Hagan T, Cortese M, Rouphael N, Boudreau C, Linde C, Maddur MS, et al. Antibiotics-driven gut microbiome perturbation alters immunity to vaccines in humans. Cell. 2019;178(6):1313-28e13.

    CAS  PubMed  PubMed Central  Google Scholar 

  3. 3.

    Plaza-Díaz J, Álvarez-Mercado AI, Ruiz-Marín CM, Reina-Pérez I, Pérez-Alonso AJ, Sánchez-Andujar MB, et al. Association of breast and gut microbiota dysbiosis and the risk of breast cancer: a case–control clinical study. BMC Cancer. 2019;19(1):495.

    PubMed  PubMed Central  Google Scholar 

  4. 4.

    Pulikkan J, Mazumder A, Grace T. Role of the gut microbiome in autism spectrum disorders. Adv Exp Med Biol. 2019;1118:253–69.

    CAS  PubMed  Google Scholar 

  5. 5.

    Fu J, Bonder MJ, Cenit MC, Tigchelaar EF, Maatman A, Dekens JA, et al. The gut microbiome contributes to a substantial proportion of the variation in blood lipids. Circ Res. 2015;117(9):817–24.

    CAS  PubMed  PubMed Central  Google Scholar 

  6. 6.

    Zhernakova A, Kurilshikov A, Bonder MJ, Tigchelaar EF, Schirmer M, Vatanen T, et al. Population-based metagenomics analysis reveals markers for gut microbiome composition and diversity. Science. 2016;352(6285):565–9.

    CAS  PubMed  PubMed Central  Google Scholar 

  7. 7.

    Lim MY, You HJ, Yoon HS, Kwon B, Lee JY, Lee S, et al. The effect of heritability and host genetics on the gut microbiota and metabolic syndrome. Gut. 2017;66(6):1031–8.

    CAS  PubMed  Google Scholar 

  8. 8.

    Benson AK, Kelly SA, Legge R, Ma F, Low SJ, Kim J, et al. Individuality in gut microbiota composition is a complex polygenic trait shaped by multiple environmental and host genetic factors. Proc Natl Acad Sci USA. 2010;107(44):18933–8.

    CAS  PubMed  PubMed Central  Google Scholar 

  9. 9.

    Bonder MJ, Kurilshikov A, Tigchelaar EF, Mujagic Z, Imhann F, Vila AV, et al. The effect of host genetics on the gut microbiome. Nat Genet. 2016;48(11):1407–12.

    CAS  PubMed  Google Scholar 

  10. 10.

    Wainberg M, Sinnott-Armstrong N, Mancuso N, Barbeira AN, Knowles DA, Golan D, et al. Opportunities and challenges for transcriptome-wide association studies. Nat Genet. 2019;51(4):592–9.

    CAS  PubMed  PubMed Central  Google Scholar 

  11. 11.

    Westra HJ, Franke L. From genome to function by studying eQTLs. Biochim Biophys Acta. 2014;1842(10):1896–902.

    CAS  PubMed  Google Scholar 

  12. 12.

    Bosse Y. Genome-wide expression quantitative trait loci analysis in asthma. Curr Opin Allergy Clin Immunol. 2013;13(5):487–94.

    CAS  PubMed  Google Scholar 

  13. 13.

    Gusev A, Ko A, Shi H, Bhatia G, Chung W, Penninx BW, et al. Integrative approaches for large-scale transcriptome-wide association studies. Nat Genet. 2016;48(3):245–52.

    CAS  PubMed  PubMed Central  Google Scholar 

  14. 14.

    Liao C, Laporte AD, Spiegelman D, Akcimen F, Joober R, Dion PA, et al. Transcriptome-wide association study of attention deficit hyperactivity disorder identifies associated genes and phenotypes. Nat Commun. 2019;10(1):4450.

    PubMed  PubMed Central  Google Scholar 

  15. 15.

    Zhong J, Jermusyk A, Wu L, Hoskins JW, Collins I, Mocci E, et al. A transcriptome-wide association study identifies novel candidate susceptibility genes for pancreatic cancer. J Natl Cancer Inst. 2020;112(10):1003–12.

    PubMed  PubMed Central  Google Scholar 

  16. 16.

    Hughes DA, Bacigalupe R, Wang J, Rühlemann MC, Tito RY, Falony G, et al. Genome-wide associations of human gut microbiome variation and implications for causal inference analyses. Nat Microbiol. 2020;5(9):1079–87.

    CAS  PubMed  PubMed Central  Google Scholar 

  17. 17.

    Zhou X, Carbonetto P, Stephens M. Polygenic modeling with bayesian sparse linear mixed models. PLoS Genet. 2013;9(2):e1003264.

    CAS  PubMed  PubMed Central  Google Scholar 

  18. 18.

    Mancuso N, Freund MK, Johnson R, Shi H, Kichaev G, Gusev A, et al. Probabilistic fine-mapping of transcriptome-wide association studies. Nat Genet. 2019;51(4):675–82.

    CAS  PubMed  PubMed Central  Google Scholar 

  19. 19.

    Watanabe K, Taskesen E, van Bochoven A, Posthuma D. Functional mapping and annotation of genetic associations with FUMA. Nat Commun. 2017;8(1):1826.

    PubMed  PubMed Central  Google Scholar 

  20. 20.

    Liu X, Hou X, Zhou Y, Li Q, Kong F, Yan S, et al. Downregulation of the helicase lymphoid-specific (HELLS) gene impairs cell proliferation and induces cell cycle arrest in colorectal cancer cells. Onco Targets Ther. 2019;12:10153–63.

    CAS  PubMed  PubMed Central  Google Scholar 

  21. 21.

    Sorensen GL, Bladbjerg EM, Steffensen R, Tan Q, Madsen J, Drivsholm T, et al. Association between the surfactant protein D (SFTPD) gene and subclinical carotid artery atherosclerosis. Atherosclerosis. 2016;246:7–12.

    CAS  PubMed  Google Scholar 

  22. 22.

    Ye BD, Kim BM, Jung S, Lee HS, Hong M, Kim K, et al. Association of FUT2 and ABO with Croh’s disease in Koreans. J Gastroenterol Hepatol. 2020;35(1):104–9.

    CAS  PubMed  Google Scholar 

  23. 23.

    Lamas B, Richard ML, Leducq V, Pham HP, Michel ML, Da Costa G, et al. CARD9 impacts colitis by altering gut microbiota metabolism of tryptophan into aryl hydrocarbon receptor ligands. Nat Med. 2016;22(6):598–605.

    CAS  PubMed  PubMed Central  Google Scholar 

  24. 24.

    Ruan JW, Statt S, Huang CT, Tsai YT, Kuo CC, Chan HL, et al. Dual-specificity phosphatase 6 deficiency regulates gut microbiome and transcriptome response against diet-induced obesity in mice. Nat Microbiol. 2016;2:16220.

    CAS  PubMed  Google Scholar 

  25. 25.

    Vijay-Kumar M, Aitken JD, Carvalho FA, Cullender TC, Mwangi S, Srinivasan S, et al. Metabolic syndrome and altered gut microbiota in mice lacking Toll-like receptor 5. Science. 2010;328(5975):228–31.

  26. 26.

    Choi YJ, Yoo NJ, Lee SH. Mutation of HELLS, a chromatin remodeling gene, gastric and colorectal cancers. Pathol Oncol Res. 2015;21(3):851–2.

    PubMed  Google Scholar 

  27. 27.

    Gagnière J, Raisch J, Veziant J, Barnich N, Bonnet R, Buc E, et al. Gut microbiota imbalance and colorectal cancer. World J Gastroenterol. 2016;22(2):501–18.

    PubMed  PubMed Central  Google Scholar 

  28. 28.

    Cheng Y, Ling Z, Li L. The intestinal microbiota and colorectal cancer. Front Immunol. 2020;11:615056.

    CAS  PubMed  PubMed Central  Google Scholar 

  29. 29.

    Kaikkonen E, Rantapero T, Zhang Q, Taimen P, Laitinen V, Kallajoki M, et al. ANO7 is associated with aggressive prostate cancer. Int J Cancer. 2018;143(10):2479–87.

    CAS  PubMed  PubMed Central  Google Scholar 

  30. 30.

    Kalinen S, Kallonen T, Gunell M, Ettala O, Jambor I, Knaapila J, et al. Gut microbiota signatures associate with prostate cancer risk. medRxiv. 2021;80:134.

    Google Scholar 

  31. 31.

    Sfanos KS, Markowski MC, Peiffer LB, Ernst SE, White JR, Pienta KJ, et al. Compositional differences in gastrointestinal microbiota in prostate cancer patients treated with androgen axis-targeted therapies. Prostate Cancer Prostatic Dis. 2018;21(4):539–48.

    CAS  PubMed  PubMed Central  Google Scholar 

  32. 32.

    Marco A. De Velasco KS, Yurie Kura, Eri Banno, Naomi Ando, Noriko Sako, Kazuhiro Yoshikawa, Kazuto Nishio, Hirotsugu Uemura, editor Abstract 3340: Prostate cancer alters gut microbiota in mice. AACR Annual Meeting; 2020; Philadelphia.

  33. 33.

    Ferrer-Admetlla A, Sikora M, Laayouni H, Esteve A, Roubinet F, Blancher A, et al. A natural history of FUT2 polymorphism in humans. Mol Biol Evol. 2009;26(9):1993–2003.

    CAS  Google Scholar 

  34. 34.

    Tong M, McHardy I, Ruegger P, Goudarzi M, Kashyap PC, Haritunians T, et al. Reprograming of gut microbiome energy metabolism by the FUT2 Crohn’s disease risk polymorphism. ISME J. 2014;8(11):2193–206.

    CAS  PubMed  PubMed Central  Google Scholar 

  35. 35.

    McGovern DP, Jones MR, Taylor KD, Marciante K, Yan X, Dubinsky M, et al. Fucosyltransferase 2 (FUT2) non-secretor status is associated with Crohn’s disease. Hum Mol Genet. 2010;19(17):3468–76.

    CAS  PubMed  PubMed Central  Google Scholar 

  36. 36.

    Rausch P, Rehman A, Künzel S, Häsler R, Ott SJ, Schreiber S, et al. Colonic mucosa-associated microbiota is influenced by an interaction of Crohn disease and FUT2 (Secretor) genotype. Proc Natl Acad Sci USA. 2011;108(47):19030–5.

  37.

    Wacklin P, Tuimala J, Nikkilä J, Sebastian T, Mäkivuokko H, Alakulppi N, et al. Faecal microbiota composition in adults is associated with the FUT2 gene determining the secretor status. PLoS ONE. 2014;9(4):e94863.

  38.

    Rao S, Shi M, Han X, Lam MHB, Chien WT, Zhou K, et al. Genome-wide copy number variation-, validation- and screening study implicates a new copy number polymorphism associated with suicide attempts in major depressive disorder. Gene. 2020;755:144901.

  39.

    Rausch P, Künzel S, Suwandi A, Grassl GA, Rosenstiel P, Baines JF. Multigenerational influences of the Fut2 gene on the dynamics of the gut microbiota in mice. Front Microbiol. 2017;8:991.

  40.

    Wacklin P, Mäkivuokko H, Alakulppi N, Nikkilä J, Tenkanen H, Räbinä J, et al. Secretor genotype (FUT2 gene) is strongly associated with the composition of Bifidobacteria in the human intestine. PLoS ONE. 2011;6(5):e20113.

  41.

    Kumbhare SV, Kumar H, Chowdhury SP, Dhotre DP, Endo A, Mättö J, et al. A cross-sectional comparative study of gut bacterial community of Indian and Finnish children. Sci Rep. 2017;7(1):10555.

  42.

    Neurath MF. Host-microbiota interactions in inflammatory bowel disease. Nat Rev Gastroenterol Hepatol. 2020;17(2):76–7.

  43.

    Holmskov U, Thiel S, Jensenius JC. Collectins and ficolins: humoral lectins of the innate immune defense. Annu Rev Immunol. 2003;21:547–78.

  44.

    Nexoe AB, Pilecki B, Von Huth S, Husby S, Pedersen AA, Detlefsen S, et al. Colonic epithelial surfactant protein D expression correlates with inflammation in clinical colonic inflammatory bowel disease. Inflamm Bowel Dis. 2019;25(8):1349–56.

  45.

    Matsuoka K, Kanai T. The gut microbiota and inflammatory bowel disease. Semin Immunopathol. 2015;37(1):47–55.

  46.

    Hirano Y, Choi A, Tsuruta M, Jaw JE, Oh Y, Ngan D, et al. Surfactant protein-D deficiency suppresses systemic inflammation and reduces atherosclerosis in ApoE knockout mice. Cardiovasc Res. 2017;113(10):1208–18.

  47.

    Barrow AD, Palarasah Y, Bugatti M, Holehouse AS, Byers DE, Holtzman MJ, et al. OSCAR is a receptor for surfactant protein D that activates TNF-alpha release from human CCR2+ inflammatory monocytes. J Immunol. 2015;194(7):3317–26.

  48.

    Hu F, Zhong Q, Gong J, Qin Y, Cui L, Yuan H. Serum surfactant protein D is associated with atherosclerosis of the carotid artery in patients on maintenance hemodialysis. Clin Lab. 2016;62(1–2):97–104.

  49.

    Chistiakov DA, Melnichenko AA, Myasoedova VA, Grechko AV, Orekhov AN. Role of lipids and intraplaque hypoxia in the formation of neovascularization in atherosclerosis. Ann Med. 2017;49(8):661–77.

  50.

    Sorensen GL. Surfactant protein D in respiratory and non-respiratory diseases. Front Med (Lausanne). 2018;5:18.

  51.

    Colmorten KB, Nexoe AB, Sorensen GL. The dual role of surfactant protein-D in vascular inflammation and development of cardiovascular disease. Front Immunol. 2019;10:2264.

  52.

    Klionsky DJ, Abdel-Aziz AK, Abdelfatah S, Abdellatif M, Abdoli A, Abel S, et al. Guidelines for the use and interpretation of assays for monitoring autophagy (4th edition). Autophagy. 2021;17(1):1–382.

  53.

    Xue Y, Li Q, Park CG, Klena JD, Anisimov AP, Sun Z, et al. Proteus mirabilis targets atherosclerosis plaques in human coronary arteries via DC-SIGN (CD209). Front Immunol. 2020;11:579010.

  54.

    Li F, Zhang T, He Y, Gu W, Yang X, Zhao R, et al. Inflammation inhibition and gut microbiota regulation by TSG to combat atherosclerosis in ApoE(−/−) mice. J Ethnopharmacol. 2020;247:112232.

  55.

    Liu Y, Jiang Q, Liu Z, Shen S, Ai J, Zhu Y, et al. Alteration of gut microbiota relates to metabolic disorders in primary aldosteronism patients. Front Endocrinol (Lausanne). 2021;12:667951.

  56.

    Bier A, Braun T, Khasbab R, Di Segni A, Grossman E, Haberman Y, et al. A high salt diet modulates the gut microbiota and short chain fatty acids production in a salt-sensitive hypertension rat model. Nutrients. 2018;10(9).

  57.

    Mazidi M, Shekoohi N, Covic A, Mikhailidis DP, Banach M. Adverse impact of Desulfovibrio spp. and beneficial role of Anaerostipes spp. on renal function: insights from a mendelian randomization analysis. Nutrients. 2020;12(8).



Acknowledgements

Not applicable.


Funding

This study was supported by the National Natural Science Foundation of China (81673112, 81703177), the Key projects of international cooperation among governments in scientific and technological innovation (2016YFE0119100), the Natural Science Basic Research Plan in Shaanxi Province of China (2017JZ024), and the Fundamental Research Funds for the Central Universities.

Author information

Contributions

CP and FZ conceived and designed the study; CP and YN wrote the manuscript; SC collected the data and carried out the statistical analyses; YJ, YW, CL, HZ, JC, JZ, ZZ, XY and PM carried out the initial preparation of the manuscript. All authors reviewed and approved the final manuscript.

Corresponding author

Correspondence to Feng Zhang.

Ethics declarations

Ethics approval and consent to participate

Not applicable.

Consent for publication

All authors gave their consent for publication.

Competing interests

The authors state that they have no competing interests.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

Additional file 1: Table S1. The number of candidate genes for each microbial trait identified by TWAS.

Additional file 2: Table S2. TWAS results for gut microbiota in sigmoid colon.

Additional file 3: Table S3. TWAS results for gut microbiota in transverse colon.

Additional file 4: Table S4. Top 14 overlapping candidate genes for different microbial traits.

Additional file 5: Table S5. Fine mapping results for gut microbiota in sigmoid colon.

Additional file 6: Table S6. Fine mapping results for gut microbiota in transverse colon.

Additional file 7: Table S7. Functional analysis results for microbial traits.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. The Creative Commons Public Domain Dedication waiver applies to the data made available in this article, unless otherwise stated in a credit line to the data.

About this article

Cite this article

Pan, C., Ning, Y., Jia, Y. et al. Transcriptome-wide association study identified candidate genes associated with gut microbiota. Gut Pathog 13, 74 (2021).


Keywords

  • Gut microbiota
  • Transcriptome-wide association study (TWAS)
  • Genome-wide association study (GWAS)
  • Pathway