Random Forest
Random Forest is a powerful and widely used machine learning algorithm that belongs to the ensemble learning family.
The “forest” is made up of a collection of decision trees, where each tree is built on a random subset of the training data, drawn with replacement (the bagging method) or, less commonly, without replacement (pasting).
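To make the bagging-versus-pasting distinction concrete, here is a minimal sketch assuming scikit-learn (recent versions name the base-model parameter `estimator`); the only difference between the two ensembles is the `bootstrap` flag of `BaggingClassifier`:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, random_state=42)

# Bagging: each tree is trained on a bootstrap sample (drawn WITH replacement).
bagging = BaggingClassifier(
    estimator=DecisionTreeClassifier(),
    n_estimators=100,
    bootstrap=True,   # with replacement -> bagging
    random_state=42,
)

# Pasting: same idea, but samples are drawn WITHOUT replacement.
pasting = BaggingClassifier(
    estimator=DecisionTreeClassifier(),
    n_estimators=100,
    max_samples=0.8,  # each tree sees 80% of the training set
    bootstrap=False,  # without replacement -> pasting
    random_state=42,
)

bagging.fit(X, y)
pasting.fit(X, y)
```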
When splitting a node, instead of searching for the very best feature among all features, a Random Forest searches for the best feature within a random subset of features. This extra randomness results in greater tree diversity, trading a slightly higher bias for a lower variance.
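A short sketch of this, assuming scikit-learn: the `max_features` parameter of `RandomForestClassifier` controls the size of the random feature subset considered at each split.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=500, n_features=20, random_state=42)

# max_features sets how many randomly chosen features are examined at each
# node; "sqrt" (the default for classification) considers sqrt(n_features)
# candidates per split instead of all 20.
rf = RandomForestClassifier(n_estimators=100, max_features="sqrt", random_state=42)
rf.fit(X, y)
```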
Classification is done either by hard voting (each tree votes for a class and the majority class wins) or by soft voting (the class with the highest average predicted probability across the trees wins). When sampling with either bagging or pasting, some training instances are never drawn for a given tree; these out-of-bag (OOB) samples can be used to evaluate the forest without a separate validation set.
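As an illustration, again assuming scikit-learn: setting `oob_score=True` evaluates the forest on each tree's out-of-bag samples, and `predict_proba` exposes the averaged (soft-voting) class probabilities.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=500, random_state=42)

# oob_score=True scores each tree on the samples it never saw during
# training, giving a validation estimate without a held-out set.
rf = RandomForestClassifier(
    n_estimators=100, bootstrap=True, oob_score=True, random_state=42
)
rf.fit(X, y)
print("OOB accuracy estimate:", rf.oob_score_)

# The forest averages the per-tree class probabilities (soft voting):
print(rf.predict_proba(X[:3]))
```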
Extremely Randomized Trees (Extra-Trees) go one step further and use random thresholds for each candidate feature when splitting a node, rather than searching for the best possible threshold. This reduces training time, since finding the best threshold is one of the most time-consuming parts of growing a tree.
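A rough sketch comparing the two, assuming scikit-learn's `ExtraTreesClassifier` and `RandomForestClassifier` (the actual timings depend on hardware and data, but Extra-Trees typically trains faster):

```python
import time

from sklearn.datasets import make_classification
from sklearn.ensemble import ExtraTreesClassifier, RandomForestClassifier

X, y = make_classification(n_samples=2000, n_features=30, random_state=42)

# Same ensemble size; only the split strategy differs (random thresholds
# for Extra-Trees vs. the best threshold for Random Forest).
for Model in (RandomForestClassifier, ExtraTreesClassifier):
    start = time.perf_counter()
    Model(n_estimators=200, random_state=42).fit(X, y)
    print(Model.__name__, f"{time.perf_counter() - start:.2f}s")
```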
Feature importance with Random Forest
Gini Importance: Gini importance measures how much each feature contributes to the reduction in impurity (equivalently, the increase in homogeneity) across the splits in which it is used, averaged over all trees in the forest.
Features that result in significant impurity reduction are considered more important.
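As a sketch, assuming scikit-learn: a fitted forest exposes these impurity-based importances through its `feature_importances_` attribute, normalized so they sum to 1.

```python
import pandas as pd
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier

iris = load_iris()
rf = RandomForestClassifier(n_estimators=200, random_state=42)
rf.fit(iris.data, iris.target)

# feature_importances_ holds the normalized mean impurity decrease
# attributed to each feature across all trees in the forest.
importances = pd.Series(rf.feature_importances_, index=iris.feature_names)
print(importances.sort_values(ascending=False))
```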