Random forests or random decision forests are an ensemble learning method for classification, regression and other tasks that operates by constructing a multitude of decision trees at training time and outputting the class that is the mode of the classes (classification) or mean prediction (regression) of the individual trees.
Random forest is one of the most widely used techniques in machine learning. In online machine learning competitions it is frequently used by winning teams. It can be applied as both a classification and a regression technique.
Random decision forests correct for decision trees' habit of overfitting to their training set.
In a decision tree, a greedy algorithm is used to build a single tree from the data, and the algorithm is trained with cross-validation techniques to avoid overfitting. The model is given by a single tree. Prediction is done by traversing the tree from top to bottom; decisions are taken at the leaf nodes.
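As a minimal sketch of this (assuming scikit-learn is available, with the iris dataset as an illustrative example), we can train a single decision tree and use cross-validation to estimate how well it generalizes:

```python
# Train one decision tree (greedy, top-down splitting) and score it
# with 5-fold cross-validation to guard against overfitting.
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

tree = DecisionTreeClassifier(random_state=0)
scores = cross_val_score(tree, X, y, cv=5)  # accuracy on 5 held-out folds

print(scores.mean())
```

The cross-validated score, rather than the training accuracy, is what tells us whether the single tree is overfitting.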
A random forest is, literally, a forest of trees. It contains a large number of decision trees, each of which helps in taking a decision, and each built with the same strategy used for making a single decision tree. When taking a decision, all the small decision trees vote, and we decide the class by majority vote.
For example, in a binary classification problem we can create hundreds of decision trees with different settings. For prediction, if 80% of the trees say class 1 (traversing each tree from root to leaf) and 20% of the trees say class 2, the decision will be class 1.
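This majority vote can be sketched in a few lines of plain Python; the vote counts below (80 trees for class 1, 20 for class 2) are the hypothetical figures from the example above:

```python
# Toy sketch of majority voting: each tree casts one vote for a class,
# and the forest predicts the class with the most votes.
from collections import Counter

tree_votes = [1] * 80 + [2] * 20                      # predictions from 100 trees
majority_class = Counter(tree_votes).most_common(1)[0][0]

print(majority_class)  # class 1 wins with 80 of 100 votes
```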
Using a random forest instead of a single decision tree has the following advantages:
- Random forests can handle missing data.
- They can be used for both classification and regression tasks.
- They can handle high dimensionality in the data.
- Using multiple trees instead of a single one reduces the possibility of over-fitting.
Some disadvantages of random forests are given below:
- We have very little control over the model.
- As the model is a combination of trees, it is much more complex than a single decision tree.
- It is difficult to explain because of the hundreds or thousands of trees.
Random Forest Algorithm
A random forest is an ensemble technique. Such techniques take a divide-and-conquer approach: a number of weak learners are combined to generate a strong learner. In a random forest, the weak learners are small trees; together, with the power of majority voting, they make a strong learner.
Steps of the random forest algorithm:
- Consider a training data set with N records. Create samples by drawing N records from this data set with replacement.
- That is, we first select one of the N records at random; after choosing it, we again select from all N records, so a record may repeat in the sample.
- Select a number m that is less than the number of attributes K (m < K). These m attributes are chosen at random from the K attributes when building each small tree.
- Build p trees, each with a different sample and randomly chosen m attributes.
- Grow each tree without pruning, until all attributes are used or no further division is possible (only one class is left).
- Make predictions by majority voting over all trees.
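The steps above can be sketched from scratch (assuming scikit-learn for the base decision trees and the iris dataset as sample data; the values p = 25 and m = 2 are illustrative assumptions, not prescribed by the algorithm):

```python
# From-scratch sketch of the random forest steps: bootstrap-sample N
# records, pick m of the K attributes at random per tree, grow p unpruned
# trees, and predict by majority vote.
import numpy as np
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
X, y = load_iris(return_X_y=True)
N, K = X.shape
p, m = 25, 2                                      # 25 trees, 2 of K=4 attributes

trees = []
for _ in range(p):
    rows = rng.integers(0, N, size=N)             # bootstrap sample, with replacement
    cols = rng.choice(K, size=m, replace=False)   # random attribute subset (m < K)
    tree = DecisionTreeClassifier(random_state=0) # unpruned by default
    tree.fit(X[rows][:, cols], y[rows])
    trees.append((tree, cols))

def predict(x):
    """Majority vote over all p trees for one sample x."""
    votes = [t.predict(x[cols].reshape(1, -1))[0] for t, cols in trees]
    return np.bincount(votes).argmax()

preds = np.array([predict(x) for x in X])
acc = (preds == y).mean()
print(acc)
```

Each tree only ever sees its own bootstrap sample and its own attribute subset, which is what keeps the individual trees diverse enough for the majority vote to help.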