Fig. 2 | Microbial Cell Factories

From: Machine learning for data integration in human gut microbiome

Fig. 2

The main categories of machine learning algorithms for analysis of the gut microbiome. a Characteristics of supervised and unsupervised learning. Supervised learning learns a function that maps the independent variables (features) to a known dependent variable (the label) from a training dataset, whereas unsupervised learning discovers hidden patterns in a dataset that has no dependent variable, i.e., unlabeled data, as shown in the box. Rows S1-S4 and columns Feat1-Feat4 represent different samples and features, respectively. For supervised learning, labels in various colors indicate different continuous values or classes; b clustering analysis. As an unsupervised learning method, clustering discovers patterns in a dataset based on similarities or dissimilarities between training samples. For example, the samples here are stratified into four clusters by k-means clustering, which minimizes the within-cluster sum of squares. Each color denotes one cluster; c relationships between decision trees, random forest (RF) and gradient boosting; d comparison of XGBoost and LightGBM. XGBoost grows the tree level-wise (also called depth-wise), while LightGBM grows the tree leaf-wise. Decision nodes in red are the nodes that can be split into child nodes at each layer; e deep learning. In a deep neural network architecture, multiple (here two) hidden layers (blue) are connected in cascade between the input (green) and output (red) layers. Each layer takes the output of the previous layer as its input and transforms the data into a more abstract form, which it passes on to the next layer
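
The k-means objective mentioned in panel b can be illustrated with a minimal sketch. It assumes Lloyd's algorithm, a common heuristic for k-means and not necessarily the procedure used to draw the figure. The function names `kmeans` and `wcss`, the seed, and the toy two-feature samples are hypothetical and chosen only for illustration.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def wcss(points, centers, labels):
    """Within-cluster sum of squares: squared distance of each sample to its center."""
    return sum(squared_distance(p, centers[lab]) for p, lab in zip(points, labels))


def kmeans(points, k, n_iter=100, seed=0):
    """Lloyd's algorithm (hypothetical helper): alternate assignment and update steps."""
    rng = random.Random(seed)
    # Initialise the centers at k distinct samples drawn at random.
    centers = [list(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(n_iter):
        # Assignment step: each sample joins its nearest center.
        new_labels = [
            min(range(k), key=lambda j: squared_distance(point, centers[j]))
            for point in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing, so the centers are stable
        labels = new_labels
        # Update step: move each center to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # an empty cluster keeps its previous center
                centers[j] = [sum(col) / len(members) for col in zip(*members)]
    return centers, labels


# Made-up samples with two features each, arranged in four well-separated groups.
points = [
    (0.0, 0.1), (0.2, 0.0), (0.1, 0.2),
    (10.0, 0.1), (10.2, 0.0), (10.1, 0.2),
    (0.0, 10.1), (0.2, 10.0), (0.1, 10.2),
    (10.0, 10.1), (10.2, 10.0), (10.1, 10.2),
]
centers, labels = kmeans(points, k=4)
print(labels)
print(wcss(points, centers, labels))
```

Because Lloyd's algorithm only finds a local minimum of the within-cluster sum of squares, the result depends on the random initialisation; in practice it is usually run from several seeds and the run with the lowest objective is kept.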
