
Table 1 Summary of machine learning applications in gut microbiome studies.

From: Machine learning for data integration in human gut microbiome

| Category | Predictive task | Algorithm | Performance | Sample size | Data type | Data source | Reference |
|---|---|---|---|---|---|---|---|
| Phenotypic prediction | T2D risk | SVM; mRMR | AUC = 0.81 | 345 | MG | SRA045646 | [4] |
| | | RF | AUC = 0.83 | 96 | MG | ERP002469 | [5] |
| | | LightGBM | AUC = 0.73 | 1832 | 16S rRNA | CNP0000829 | [48] |
| | CRC risk | Lasso | AUC > 0.8 | 141 | MG | ERP005534 | [129] |
| | | mRMR | AUC = 0.77 | 96 | MG | ERP008729 | [20] |
| | CVD risk | RF | AUC = 0.7 | 951 | 16S rRNA | American Gut Project [130] | [19] |
| | IBD risk | RF | AUC > 0.86 | 155 | MG; metabolomics | PRJNA400072; PR000677 | [12] |
| | | MetaNN | AUC = 0.89 | 425 | 16S rRNA | PRJNA237362 | [131] |
| | Cholera | SVM | AUC = 0.8 | 76 | 16S rRNA | PRJEB17860 | [132] |
| | Obesity | RF | AUC = 0.66 | 253 | MG | ERP003612 | [8, 133] |
| | | MVIB | AUC = 0.66 | | | | [134] |
| | Hypertension | RF | AUC ≈ 0.9 | 196 | MG; metabolomics | PRJEB13870 | [119] |
| | Liver cirrhosis | DeepMicro + SVM | AUC = 0.9 | 237 | MG | ERP005860 | [54, 135] |
| | | EPCNN | AUC = 0.95 | | | | [136] |
| | Alcoholic hepatitis | Logistic regression | AUC = 0.89 | 43 | MG; metabolomics | ERP106878 | [121] |
| Recommended therapeutics | Infliximab treatment | RF | AUC > 0.86 | 16 | 16S rRNA | PRJEB22028 | [96] |
| | Immunotherapy | RF | AUC = 0.6 | 103 | MG | PRJEB22893; PRJNA399742 | [137] |
| Personalized nutrition | Glucose response | Gradient boosting | PCC ≈ 0.7 | 800 | 16S rRNA | PRJEB11532 | [118] |
| Stratification | Enterotypes | PAM clustering | 3 clusters | 154 | 16S rRNA | NCBI SRA | [94] |
| | | | 2 clusters | 25 | MG | – | [17] |
| | | | 2 clusters | 98 | 16S rRNA | SRX020773 | [74] |
| | Identification of CAGs | Canopy-based clustering | 7,381 CAGs | 396 | MG | ERP002061 | [95] |

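  1. These applications are grouped mainly into phenotypic prediction, precision medicine and population stratification.
  2. Accessions beginning with SRA, SRX, ERP and PRJ are from the NCBI Sequence Read Archive (SRA) or the EMBL European Nucleotide Archive (ENA); accessions beginning with CNP are from the Sequence Archive of the China National GeneBank (CNGB); metabolomics data PR000677 are from the National Institutes of Health Common Fund's Metabolomics Data Repository and Coordinating Center.
  3. AUC area under the receiver operating characteristic curve; PCC Pearson correlation coefficient between predicted and measured values; PAM partitioning around medoids; CAGs co-abundance gene groups; MG metagenomics; T2D type 2 diabetes; CRC colorectal cancer; CVD cardiovascular disease; IBD inflammatory bowel disease; RF random forest; SVM support vector machine; mRMR minimum redundancy maximum relevance.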
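The Performance column mixes three kinds of quantity: AUC for the classifiers, PCC for the glucose-response regressor, and cluster counts for the PAM-based enterotype analyses. As an illustration only, the Python sketch below computes each one from scratch using the standard library. It is not the code used in any cited study. The function names (`roc_auc`, `pearson`, `bray_curtis`, `pam`), the Bray-Curtis distance and the toy profiles are assumptions chosen for the example; enterotype studies may use other distances. The SWAP step is simplified to accept the first improving swap rather than the best one.

```python
import math
from itertools import product


def roc_auc(labels, scores):
    """ROC AUC as the probability that a random positive outscores a random negative (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def pearson(pred, meas):
    """Pearson correlation coefficient between predicted and measured values."""
    n = len(pred)
    if n != len(meas) or n < 2:
        raise ValueError("need two equal-length sequences of length >= 2")
    mp, mm = sum(pred) / n, sum(meas) / n
    cov = sum((p - mp) * (m - mm) for p, m in zip(pred, meas))
    sp = math.sqrt(sum((p - mp) ** 2 for p in pred))
    sm = math.sqrt(sum((m - mm) ** 2 for m in meas))
    return cov / (sp * sm)


def bray_curtis(x, y):
    """Bray-Curtis dissimilarity between two abundance profiles (an assumed choice of distance)."""
    return sum(abs(a - b) for a, b in zip(x, y)) / sum(a + b for a, b in zip(x, y))


def total_cost(dist, medoids):
    """Sum of each sample's distance to its nearest medoid."""
    return sum(min(dist[i][m] for m in medoids) for i in range(len(dist)))


def pam(dist, k):
    """Partitioning around medoids (BUILD + simplified SWAP) on a precomputed distance matrix."""
    n = len(dist)
    # BUILD: greedily add the sample that most reduces the total cost.
    medoids = []
    for _ in range(k):
        best = min((c for c in range(n) if c not in medoids),
                   key=lambda c: total_cost(dist, medoids + [c]))
        medoids.append(best)
    # SWAP: replace a medoid with a non-medoid whenever that lowers the cost.
    cost = total_cost(dist, medoids)
    improved = True
    while improved:
        improved = False
        for i, o in product(range(k), range(n)):
            if o in medoids:
                continue
            trial = medoids[:i] + [o] + medoids[i + 1:]
            trial_cost = total_cost(dist, trial)
            if trial_cost < cost:
                medoids, cost, improved = trial, trial_cost, True
    # Assign every sample to its nearest medoid (cluster index 0..k-1).
    labels = [min(range(k), key=lambda j: dist[p][medoids[j]]) for p in range(n)]
    return medoids, labels


if __name__ == "__main__":
    # Hypothetical toy data, not taken from any study in Table 1.
    print(roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # 0.75
    print(round(pearson([1, 2, 3], [2, 4, 6]), 6))      # 1.0
    profiles = [[0.7, 0.2, 0.1], [0.6, 0.3, 0.1],
                [0.1, 0.2, 0.7], [0.2, 0.1, 0.7]]
    dist = [[bray_curtis(a, b) for b in profiles] for a in profiles]
    print(pam(dist, k=2))
```

Because `pam` takes a precomputed distance matrix, the same function works with any dissimilarity; only `bray_curtis` would need to be swapped out.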