Random Forests

What are random forests?

Random forests (RF) are a popular machine learning technique for both classification and regression. A random forest is an ensemble of decision trees, each trained on a random subset of the input data.
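As a minimal sketch of this idea, here is how training such an ensemble might look with scikit-learn's RandomForestClassifier (the library and the synthetic dataset are assumptions for illustration; the article doesn't name a specific implementation):

```python
# Sketch: training a random forest classifier with scikit-learn.
# The dataset is synthetic; any feature matrix X and label vector y would work.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=200, n_features=10, random_state=0)

# n_estimators is the number of decision trees in the ensemble.
model = RandomForestClassifier(n_estimators=100, random_state=0)
model.fit(X, y)

preds = model.predict(X[:5])
print(preds)
```

Each tree votes on the class, and the forest returns the majority vote.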

How does the random forest algorithm work?

For example, if we have 100 samples in the training data, we train each decision tree on a subset of 100 examples drawn at random with replacement. Because the draw is with replacement, some examples appear more than once in a tree's subset while others are left out entirely.

Each decision tree therefore specializes in its own training subset. This sampling scheme is known as bagging (bootstrap aggregating).
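The bootstrap sampling described above can be demonstrated in a few lines of NumPy (the seed and sample size are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(seed=0)
n_samples = 100

# A bootstrap sample draws n_samples indices *with replacement*,
# so some rows appear more than once and others not at all.
indices = rng.integers(0, n_samples, size=n_samples)

unique = np.unique(indices)
print(len(indices), len(unique))
```

On average, a bootstrap sample contains only about 63% of the original rows; the remaining "out-of-bag" rows are what each tree never sees during training.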

During training, each decision tree in the forest learns to make predictions through a sequence of if-else conditions on the input features.
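To make those if-else conditions concrete, a single tree's learned rules can be printed with scikit-learn's export_text (the Iris dataset and the depth limit are assumptions chosen to keep the output short):

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

X, y = load_iris(return_X_y=True)
tree = DecisionTreeClassifier(max_depth=2, random_state=0).fit(X, y)

# export_text renders the learned if-else conditions on the features.
rules = export_text(tree)
print(rules)
```

Each branch in the printed output is one threshold test on a feature; a prediction follows one path from the root to a leaf.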

Pros and cons

One of the key advantages of random forests is their ability to handle high-dimensional data and large datasets. They are also relatively easy to use and interpret, which makes them a popular choice in many applications.

RF are also robust to noise and outliers, and they can handle missing values in the input data. Another great advantage is that they provide a measure of feature importance, which can be useful for understanding the underlying structure of the data.
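Feature importances are exposed directly on a fitted forest; here is a hedged sketch using scikit-learn's feature_importances_ attribute on a synthetic dataset where only a few features are informative (the dataset parameters are assumptions):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Only 3 of the 10 features carry signal in this synthetic dataset.
X, y = make_classification(n_samples=300, n_features=10, n_informative=3,
                           n_redundant=0, random_state=0)

forest = RandomForestClassifier(n_estimators=200, random_state=0).fit(X, y)

# One importance score per feature; the scores sum to 1.
importances = forest.feature_importances_
print(np.argsort(importances)[::-1][:3])  # indices of the 3 highest-scoring features
```

In practice these scores are a quick first look at which features drive the model's decisions.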

Individual decision trees tend to overfit, but this problem is largely resolved by aggregating the outputs of all the trees in the forest. This is also why random forests are usually preferred over single decision trees.
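The overfitting contrast can be sketched by comparing a single unpruned tree with a forest on held-out data (the dataset and split are illustrative assumptions; exact scores will vary):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, n_features=20, n_informative=5,
                           random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# An unpruned tree memorizes the training set; the forest averages many
# such trees, which usually generalizes better to unseen data.
tree = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)
forest = RandomForestClassifier(n_estimators=100,
                                random_state=0).fit(X_train, y_train)

print(tree.score(X_test, y_test), forest.score(X_test, y_test))
```

The single tree typically scores perfectly on the training set yet lags the forest on the test set.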


To conclude, random forests are a powerful and versatile technique for machine learning tasks. They perform particularly well in applications with high-dimensional data and large datasets.

I hope this article helped you gain a better understanding of random forests in machine learning, and perhaps even inspired you to learn more.
