Overfitting of a decision tree and tree pruning: how to avoid overfitting in data mining

Overfitting of a tree:

Before discussing overfitting of a tree, let's revise training data and test data.

Training Data:

Training data is the data used to build (train) the model, for example to grow the decision tree.

Test Data:

Test data is data held back from training and used to assess how well the trained model predicts unseen examples.
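
As an illustrative sketch only (not from the article), a dataset can be split into training data and test data with scikit-learn; the iris dataset and the 30% test share are example choices:

    from sklearn.datasets import load_iris
    from sklearn.model_selection import train_test_split

    # Load an example dataset (iris is only an illustrative choice).
    X, y = load_iris(return_X_y=True)

    # Hold out 30% of the records as test data; the rest is the training data.
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)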


Overfitting means the tree has grown too many unnecessary branches. These branches fit anomalies in the training data, such as outliers and noise, instead of general patterns, so the tree predicts the training data very well but performs worse on unseen data.
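
To make this concrete, here is a small sketch (using scikit-learn; the dataset and settings are example choices, not taken from the article): a tree allowed to grow without any limit usually scores far better on the training data than on the test data, which is the signature of overfitting.

    from sklearn.datasets import load_iris
    from sklearn.model_selection import train_test_split
    from sklearn.tree import DecisionTreeClassifier

    X, y = load_iris(return_X_y=True)
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

    # A tree grown with no limits keeps splitting until every leaf is pure,
    # memorizing outliers and noise in the training data.
    full_tree = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)

    print("Training accuracy:", full_tree.score(X_train, y_train))  # typically 1.0
    print("Test accuracy:", full_tree.score(X_test, y_test))        # typically lower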


How to avoid overfitting?

There are two techniques to avoid overfitting:

  1. Pre-pruning
  2. Post-pruning


1. Pre-Pruning:

Pre-pruning means stopping the growth of the tree before it is fully grown, for example by limiting its depth or the minimum number of records allowed in a node. A sketch is given below.
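
As an illustrative sketch (scikit-learn; max_depth=3 and min_samples_leaf=5 are example values that would normally be tuned, they are not prescribed by the article), pre-pruning corresponds to setting growth limits before training:

    from sklearn.datasets import load_iris
    from sklearn.model_selection import train_test_split
    from sklearn.tree import DecisionTreeClassifier

    X, y = load_iris(return_X_y=True)
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

    # Pre-pruning: stop growth early by limiting depth and leaf size.
    pre_pruned = DecisionTreeClassifier(max_depth=3, min_samples_leaf=5, random_state=0)
    pre_pruned.fit(X_train, y_train)

    print("Test accuracy:", pre_pruned.score(X_test, y_test))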

2. Post-Pruning:

Post-pruning means allowing the tree to grow with no size limit, and then pruning its branches after the tree is complete. A sketch is given below.
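
As an illustrative sketch, scikit-learn's cost-complexity pruning is one concrete post-pruning method (the article does not name a specific algorithm, and the ccp_alpha value below is an example choice that would normally be tuned):

    from sklearn.datasets import load_iris
    from sklearn.model_selection import train_test_split
    from sklearn.tree import DecisionTreeClassifier

    X, y = load_iris(return_X_y=True)
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

    # First grow a full tree with no size limit.
    full_tree = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)

    # Then compute the cost-complexity pruning path and refit with a chosen
    # pruning strength (ccp_alpha); larger alpha removes more branches.
    path = full_tree.cost_complexity_pruning_path(X_train, y_train)
    alpha = path.ccp_alphas[len(path.ccp_alphas) // 2]  # example choice, normally tuned
    pruned_tree = DecisionTreeClassifier(ccp_alpha=alpha, random_state=0).fit(X_train, y_train)

    print("Leaves before pruning:", full_tree.get_n_leaves())
    print("Leaves after pruning:", pruned_tree.get_n_leaves())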

Advantages of pre-pruning and post-pruning:

  • Pruning prevents the tree from growing unnecessarily large.
  • Pruning reduces the complexity of the tree.