Table 3 List of all the data used in microbe–disease prediction

From: In-silico computational approaches to study microbiota impacts on diseases and pharmacotherapy

Data

Source

Original state

Similarity process

URL

HMDAD

HMDAD (Human Microbe-Disease Association Database) documents disease-related changes in microbial populations reported in PubMed-indexed literature

HMDAD integrates 483 disease–microbe entries covering 39 diseases and 292 microbes

These entries are reduced to 450 known microbe–disease associations (MDAs), which are then used to calculate Gaussian interaction profile (GIP) kernel, cosine, and Spearman correlation similarities

https://www.cuilab.cn/hmdad

PERYTON

Peryton's content is derived entirely from manual curation of biomedical articles. Diseases and microbes are supplied in a well-structured format, with database dictionaries built from reference resources

The database currently holds over 7,900 entries linking 43 diseases and 1,396 microorganisms

Peryton also provides interactive visualizations, and the data can be downloaded for local storage and analysis

https://dianalab.e-ce.uth.gr/peryton/

GENE-BASED

DisGeNET collects gene–disease associations (GDAs) from UNIPROT, CGI, ClinGen, Genomics England, CTD (human subset), PsyGeNET, and Orphanet, as well as GDAs text-mined from MEDLINE abstracts

DisGeNET covers 628 685 GDAs between 17 549 genes and 24 166 diseases. Size/coverage in HMDAD: 37 mapped diseases, 1850 genes, and 2715 GDAs

A neighbor-based similarity approach calculates GDA scores, which are used to find further commonalities among the selected diseases

https://www.disgenet.org

SYMPTOM-BASED disease data

HSDN (Human Symptoms–Disease Network) derives disease–symptom associations from large-scale medical bibliographic records in PubMed

Co-occurrence counts and TF-IDF weights for 322 symptoms and 4442 diseases, with 147 97 connections. Size/coverage in HMDAD: 22 mapped diseases, 269 symptoms, and 1858 disease–symptom associations

Symptom-based disease similarity is calculated from the co-occurrence TF-IDF weights between each disease and the symptoms

https://www.nature.com/articles/ncomms5212

Semantics-based disease data

MeSH trees, maintained by the National Library of Medicine, provide a hierarchical definition of diseases

Hierarchical trees systematically describe a variety of diseases. Size/coverage in HMDAD: 33 mapped diseases

DAG-based semantic similarity is calculated between two diseases, each represented as a directed acyclic graph (DAG) of hierarchical descriptors

https://meshb.nlm.nih.gov/search

PROTEIN

STRING is a database that collects protein–protein interactions and data on proteins from several sources

At the species level, 1391 microbes were mapped, with gene neighbor scores of 932 370 pairs of COGs

The neighborhood score is used to determine whether there is an edge between two COGs. STRING also provides interactive visualizations

https://string-db.org

Comprehensive Antibiotic Resistance Database (CARD)

A carefully curated resource offering high-quality reference material on the molecular basis of antimicrobial resistance (AMR), with a focus on the genes, proteins, and mutations implicated in AMR

CARD contains 2441 model reference sequences and 853 single-nucleotide variants, as well as a growing number of indels, frameshift, and nonsense mutations linked to antimicrobial resistance

The ontology also includes additional search criteria: mutations conferring AMR (where relevant) and curated BLAST(P/N) bit-score cut-offs

https://card.mcmaster.ca/

Disbiome

Created in 2018, Disbiome is a more comprehensive database that is updated every three months

As of December 2019, the Disbiome database includes 322 diseases, 1,470 microbiome organisms, and 9,102 experiments published in 1,018 scholarly articles

Manual annotation ensures that the material is presented in a clear, organized, and accessible form

https://disbiome.ugent.be/home/

MicroPhenoDB

MicroPhenoDB contains 5677 non-redundant associations between 1781 microorganisms and 542 human disease phenotypes across more than 22 human body sites

In addition, MicroPhenoDB has 696,934 connections between 27,277 clade-specific core genes and 685 microorganisms

The software lets researchers search DNA and RNA sequences for potential pathogens without running the usual metagenomic data processing and assembly steps

http://www.liwzlab.cn/microphenodb
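
The GIP kernel and cosine similarities named in the HMDAD row can be illustrated with a small example. The following is a minimal sketch, assuming the commonly used Gaussian interaction profile formulation, in which the bandwidth is normalized by the mean squared norm of the interaction profiles. The function names, the `gamma_prime` parameter, and the toy association matrix are illustrative assumptions and do not come from the table or from any of the listed databases.

```python
import math


def gip_kernel(profiles, gamma_prime=1.0):
    """Gaussian interaction profile (GIP) kernel between the rows of a matrix.

    Each row is the binary interaction profile of one entity (for example,
    one disease against all microbes). gamma_prime is an assumed bandwidth
    setting; it is divided by the mean squared norm of the profiles.
    """
    sq_norms = [sum(x * x for x in row) for row in profiles]
    mean_sq_norm = sum(sq_norms) / len(sq_norms)
    if mean_sq_norm == 0:
        raise ValueError("all interaction profiles are empty")
    gamma = gamma_prime / mean_sq_norm
    return [
        [math.exp(-gamma * sum((x - y) ** 2 for x, y in zip(a, b)))
         for b in profiles]
        for a in profiles
    ]


def cosine_similarity(profiles):
    """Cosine similarity between rows; an all-zero row gets similarity 0."""
    norms = [math.sqrt(sum(x * x for x in row)) for row in profiles]
    return [
        [sum(x * y for x, y in zip(a, b)) / (na * nb) if na and nb else 0.0
         for b, nb in zip(profiles, norms)]
        for a, na in zip(profiles, norms)
    ]


# Hypothetical toy matrix: rows are 2 diseases, columns are 3 microbes,
# and a 1 marks a known microbe-disease association.
associations = [
    [1, 0, 1],
    [0, 1, 1],
]

disease_gip = gip_kernel(associations)
disease_cos = cosine_similarity(associations)

# Transposing the matrix gives one profile per microbe.
microbe_profiles = [list(column) for column in zip(*associations)]
microbe_gip = gip_kernel(microbe_profiles)
```

Computing the kernel on the rows gives disease–disease similarity, and computing it on the transposed matrix gives microbe–microbe similarity, so the same association matrix yields both.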