Bagging

Ensemble methods combine multiple classifiers into a single output. Some ensemble methods combine different types of classifiers, but the ones we will focus on here combine many instances of the same type of classifier. These methods belong to a family of ensemble methods called “Perturb and Combine”.

Perturb and Combine

Some classification methods are extremely sensitive to noise in the data: small changes in the data can have a huge impact on the fitted model. We call these unstable models. However, this instability on its own does not greatly harm the accuracy of the model.

Perturb and Combine methods attempt to take advantage of this instability by repeatedly perturbing the data and training a new classifier on each perturbed version. At the end, we combine the models by voting. Some perturbation methods include:

  • Resampling – Drawing additional samples from the population
  • Subsampling – Drawing samples from the original dataset
  • Adding Noise – Adding random noise to the original dataset
  • Adaptive Reweighting – Reweighting the observations between iterations, as in the boosting algorithm

Perturb and Combine methods attempt to fit the model to the signal in the data and drown out the noise. To that end, we perturb the data, train a model on each perturbed version, and then combine the trained classifiers. The idea is that while every model will be wrong to some extent, by using models with high variance we hope they will be wrong in different ways, so that by averaging we get a model that is reliably better than any individual model. This guide will focus on Bagging, which falls under the subsampling method.
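
To see why averaging helps, consider B identically distributed classifiers whose predictions each have variance \sigma^2 and pairwise correlation \rho (these symbols are introduced here only for this aside). The variance of their average is

\rho\sigma^2 + \frac{1-\rho}{B}\sigma^2

The second term vanishes as B grows, so averaging many models that are wrong in different ways (low \rho) yields a combined prediction with lower variance than any single model.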

Bagging

Bagging stands for Bootstrap Aggregation. Bootstrapping is a method of repeatedly subsampling the data: for a given dataset of size n, we draw a sample of size t \leq n with replacement from the original dataset. The idea is that by repeatedly drawing from the existing sample, we can simulate new draws from the underlying population. Aggregation refers to aggregating the class votes (or predicted probabilities) from the models trained on each of the bootstrap samples.

  • For i in 1 to B (the number of bootstrap samples)
    • Take a bootstrap sample of size t \leq n: X^* = (X_1^*, \ldots, X_t^*)
    • Train a model on X*
  • Now you have a set of trained classifiers to make predictions with. The final aggregate classifier can be obtained by averaging the output probabilities (or by taking a majority vote of the predicted classes), as in the sketch below.
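
As a concrete illustration, here is a minimal sketch of this procedure in R, bagging classification trees from the rpart package on the built-in iris data. The package, dataset, number of bootstrap samples B, and object names are illustrative choices, not part of the algorithm itself.

    # Bagging classification trees: a minimal, illustrative sketch
    library(rpart)

    set.seed(1)
    n <- nrow(iris)
    B <- 25                     # number of bootstrap samples (illustrative)
    models <- vector("list", B)

    for (b in 1:B) {
      # Bootstrap sample of size t = n, drawn with replacement
      boot_idx <- sample(n, size = n, replace = TRUE)
      models[[b]] <- rpart(Species ~ ., data = iris[boot_idx, ], method = "class")
    }

    # Aggregate: average the predicted class probabilities across the B trees
    probs <- lapply(models, predict, newdata = iris, type = "prob")
    avg_prob <- Reduce(`+`, probs) / B
    bagged_pred <- colnames(avg_prob)[max.col(avg_prob)]
    mean(bagged_pred == iris$Species)  # in-sample accuracy, for illustration only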

Notes for Use

Bagging reduces variance. Each individual classifier is unstable, meaning it has high variance. The aggregated classifier reduces that variance without increasing the bias (assuming we are using a low-bias model). Bagging works well for unstable learning algorithms; however, it can actually slightly degrade the performance of stable learning algorithms, so it is only appropriate for models with low bias and high variance. Tree-based models are a good example that meets these criteria, as we'll see.

Random Forest

The Random Forest method is an extension of bagging applied to tree models. Random Forest attempts to increase the variance of the individual tree models, and to reduce the correlation between them, with the hope of capturing more information about the data. In Random Forest, we increase the variance of the individual models in two ways:

  • We introduce the m_{try} parameter. At each split, rather than trying every possible split on every variable, we randomly select m_{try} of the variables and choose the best split among only those variables.
  • We do not prune the individual trees; we leave them in their fully grown state.

Random Forests can be implemented in R using the randomForest package.
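
For example, a fit might look like the following sketch; the iris data and the specific ntree and mtry values are illustrative choices only.

    # Random Forest via the randomForest package: an illustrative sketch
    library(randomForest)

    set.seed(1)
    rf <- randomForest(Species ~ ., data = iris,
                       ntree = 500,  # number of trees in the forest
                       mtry = 2)     # m_try: variables considered at each split
    print(rf)  # out-of-bag (OOB) error estimate and confusion matrix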