Out of Bag Estimation Error:
In Breiman's original implementation of the random forest algorithm, each tree is trained on about 2/3 of the dataset, and the remaining 1/3 is left out of the bag. As the forest is built, each tree can therefore be tested on the samples that were not used in building that tree. This is the out-of-bag error estimate: an internal error estimate of a random forest as it is being constructed.
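The 2/3 versus 1/3 split comes from sampling with replacement: a bootstrap sample of size N misses any given case with probability (1 - 1/N)^N, which is roughly 0.37 for large N. The following is a minimal sketch of that effect using only Python's standard library; the function name `bootstrap_split` and the sample size are made up for illustration.

```python
import random

def bootstrap_split(n_samples, rng):
    """Draw one bootstrap sample of indices; return (in_bag, out_of_bag)."""
    # Sample n_samples indices with replacement: this is the "bag" for one tree.
    in_bag = [rng.randrange(n_samples) for _ in range(n_samples)]
    # Every index that was never drawn is out of bag for that tree.
    drawn = set(in_bag)
    out_of_bag = [i for i in range(n_samples) if i not in drawn]
    return in_bag, out_of_bag

rng = random.Random(0)  # fixed seed so the run is repeatable
n_samples = 10_000
in_bag, out_of_bag = bootstrap_split(n_samples, rng)
oob_fraction = len(out_of_bag) / n_samples
print(f"out-of-bag fraction: {oob_fraction:.3f}")
```

The printed fraction should land a little above one-third, which is why the text speaks of "around" 2/3 and 1/3.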
In random forests, there is no requirement for cross-validation or a different test set to get an unbiased estimate of the test set error. It is evaluated internally, during the run, as follows:
Each tree is built using a different bootstrap sample from the original data. About one-third of the cases are left out of the bootstrap sample and not used in the construction of the kth tree.
Put each case left out in the construction of the kth tree down the kth tree to get a classification. In this way, a test set classification is obtained for each case in about one-third of the trees. At the end of the run, take j to be the class that got most of the votes every time case n was oob.
The proportion of times that j is not equal to the true class of n, averaged over all cases, is the oob error estimate. It has proven to be unbiased in many tests.
The steps above describe how out-of-bag estimation is performed.
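The out-of-bag procedure can be sketched in a few lines of Python. This is an illustration under simplifying assumptions, not Breiman's implementation: the "trees" are one-split threshold classifiers on a single feature, there is no random feature selection, and the names `fit_stump` and `oob_error` as well as the toy data are invented for this example.

```python
import random
from collections import Counter

def fit_stump(xs, ys):
    """Fit a one-split classifier: predict `right` if x > t, else `left`."""
    best = None
    for t in sorted(set(xs)):
        for left, right in ((0, 1), (1, 0)):
            errors = sum((right if x > t else left) != y for x, y in zip(xs, ys))
            if best is None or errors < best[0]:
                best = (errors, t, left, right)
    _, t, left, right = best
    return lambda x: right if x > t else left

def oob_error(xs, ys, n_trees, rng):
    n = len(xs)
    votes = [Counter() for _ in range(n)]  # out-of-bag votes per case
    for _ in range(n_trees):
        # Bootstrap sample for the kth tree.
        bag = [rng.randrange(n) for _ in range(n)]
        in_bag = set(bag)
        tree = fit_stump([xs[i] for i in bag], [ys[i] for i in bag])
        # Put every case left out of the bag down the kth tree.
        for i in range(n):
            if i not in in_bag:
                votes[i][tree(xs[i])] += 1
    # j = class with the most oob votes; compare it with the true class,
    # averaging over the cases that were out of bag at least once.
    scored = [i for i in range(n) if votes[i]]
    wrong = sum(votes[i].most_common(1)[0][0] != ys[i] for i in scored)
    return wrong / len(scored)

rng = random.Random(0)
xs = [rng.random() for _ in range(200)]
ys = [1 if x > 0.5 else 0 for x in xs]  # toy labels: class 1 above 0.5
error = oob_error(xs, ys, n_trees=50, rng=rng)
print(f"OOB error estimate: {error:.3f}")
```

No separate test set is touched here: every prediction that counts toward the estimate comes from a tree that never saw that case during training.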
Uses of Random Forest:
- Image Classification
- DNA studies
- Disease prediction
- Regression problems
- Stock market
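Advantages of Random Forest:
- It is an easy method to use.
- It is helpful for solving both classification and regression problems.
- No pruning is needed.
- It is much less prone to over-fitting than a single decision tree; adding more trees generally does not make the forest over-fit.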
A limitation of random forest is that a large number of trees can make the algorithm too slow and ineffective for real-time prediction.
Boosting is a two-step approach, where one first uses subsets of the original data to produce a series of averagely performing models and then boosts their performance by combining them. Unlike bagging, in classical boosting the creation of the subsets is not random; it depends on the performance and the error rate of the previous models.
- Sequential ensemble: tries to add new models that do well where previous models fall short.
- Aims to decrease bias.
- Suitable for low variance and high bias models.
Standard boosting techniques involve:
- Gradient Boosting
- Adaptive Boosting
Adaptive Boosting (AdaBoost):
AdaBoost works on improving the areas where the base learner fails. The base learners in boosting are called weak learners. When a boosting technique is applied, these weak learners combine to form a strong learner. Any machine learning algorithm that accepts weights on the training data can be used as a base learner.
Suppose we take the training data, randomly sample points from it, and apply the ML algorithm to classify the points. After sampling from the training data and applying the ML algorithm, the model fits as shown below.
You can see that we have assigned equal weights to every data point and applied a decision stump to classify them as + (plus) or - (minus). The decision stump (D1) has generated a vertical line on the left side to classify the data points. We see that this vertical line has incorrectly predicted three + (plus) as - (minus). In such a case, we assign higher weights to these three + (plus) and apply another decision stump.
You can see that the previously misclassified points are now drawn larger, because higher weights have been assigned to them. Again a line has been drawn, and this time three misclassifications occur.
The newly misclassified points are given higher weights, and the weights of the points that were classified correctly are decreased. Now again, a line is drawn to separate the two classes.
Now we combine all three stages, and we get a strong learner derived from the weak learners.
AdaBoost works well when the data points are similar to each other. It combines all the weak learners, fitting them on different training data with different weights. It starts with the original dataset and gives equal weight to each observation. If the algorithm makes a wrong prediction for an observation, it gives more weight to that observation, so that the probability of selecting it when building the next learner becomes higher.
We can use any ML algorithm as the base classifier in AdaBoost, provided it accepts weights.
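The three-stump walk-through can be reproduced with a small sketch. It follows the classic discrete AdaBoost formulation, in which each stump's vote weight (called `alpha` here) is computed from its weighted error. The 1-D toy data and the names `fit_weighted_stump`, `adaboost` and `predict` are assumptions made for this illustration only.

```python
import math

def stump_predict(x, t, polarity):
    """Decision stump on one feature: `polarity` if x > t, else the opposite."""
    return polarity if x > t else -polarity

def fit_weighted_stump(xs, ys, weights):
    """Return (error, t, polarity) with the lowest weighted error."""
    best = None
    for t in sorted(set(xs)):
        for polarity in (1, -1):
            err = sum(w for x, y, w in zip(xs, ys, weights)
                      if stump_predict(x, t, polarity) != y)
            if best is None or err < best[0]:
                best = (err, t, polarity)
    return best

def adaboost(xs, ys, n_rounds):
    n = len(xs)
    weights = [1.0 / n] * n  # equal weights at the start
    ensemble = []
    for _ in range(n_rounds):
        err, t, polarity = fit_weighted_stump(xs, ys, weights)
        err = min(max(err, 1e-10), 1 - 1e-10)  # guard against log(0)
        alpha = 0.5 * math.log((1 - err) / err)  # vote weight of this stump
        ensemble.append((alpha, t, polarity))
        # Misclassified points are scaled by exp(+alpha), correct ones by exp(-alpha).
        weights = [w * math.exp(-alpha * y * stump_predict(x, t, polarity))
                   for x, y, w in zip(xs, ys, weights)]
        total = sum(weights)
        weights = [w / total for w in weights]  # renormalize to sum to 1
    return ensemble

def predict(ensemble, x):
    """Weighted vote of all stumps."""
    score = sum(alpha * stump_predict(x, t, p) for alpha, t, p in ensemble)
    return 1 if score >= 0 else -1

# Toy data with labels +1 / -1 that no single stump can separate.
xs = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
ys = [1, 1, 1, -1, -1, -1, 1, 1, 1, -1]

for rounds in (1, 3):
    model = adaboost(xs, ys, n_rounds=rounds)
    mistakes = sum(predict(model, x) != y for x, y in zip(xs, ys))
    print(f"{rounds} stump(s): {mistakes} training mistakes")
```

On this toy set a single stump cannot avoid three mistakes, while the weighted vote of three stumps classifies every training point correctly, which mirrors the D1, D2, D3 narrative above.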
Initially, all the data points are given equal weights, so that no point is favored at the start and all points are considered equally. The initial weight is 1/N, where N is the number of samples.
If a point is classified correctly, then its weight is reduced by the formula:
correct_classification = old_weight * exp( -1 * learning_rate ) / base_1
And if a point is wrongly classified, then its weight is increased by the formula:
wrong_classification = old_weight * exp( +1 * learning_rate ) / base_1
Here base_1 is computed over all the data points; it acts as a normalizing term, typically the sum of the updated weights, so that the weights again sum to 1.
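The two update formulas can be tried out on a handful of weights. In this sketch `base_1` is assumed to be the sum of the updated weights over all data points, so that the result sums to 1; the function name `update_weights` and the example values are hypothetical.

```python
import math

def update_weights(weights, correct, learning_rate):
    """Shrink the weights of correctly classified points, grow the others."""
    raw = [w * math.exp(-learning_rate if ok else learning_rate)
           for w, ok in zip(weights, correct)]
    base_1 = sum(raw)  # assumption: normalizer taken over all data points
    return [w / base_1 for w in raw]

n = 10
weights = [1.0 / n] * n  # initial weight 1/N for every point
correct = [True] * 7 + [False] * 3  # the last three points were misclassified
new_weights = update_weights(weights, correct, learning_rate=0.5)

print("correctly classified:", round(new_weights[0], 4))
print("misclassified:       ", round(new_weights[-1], 4))
```

After the update, the three misclassified points carry more weight than 1/N and the seven correct ones carry less, so the next weak learner is pushed toward the hard cases.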
The main parameters of AdaBoost are:
- Number of estimators: controls the number of weak learners.
- Learning rate: controls the contribution of each weak learner to the final combination.
- Base estimator: specifies which ML algorithm is used as the weak learner.
AdaBoost is very sensitive to noisy data and outliers, which can easily degrade its performance.
Gradient boosting is another boosting technique. It also tries to create a strong learner from weak learners, in this case by minimizing a loss function. The gradient boosting method builds trees one at a time, where each new tree helps to correct the errors made by the previously trained trees.
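A minimal sketch of this idea for regression with a squared-error loss is given below. For that loss the negative gradient is simply the residual, so each new tree is fitted to what the current model still gets wrong. The one-split "trees", the toy data and the names `fit_regression_stump`, `gradient_boost` and `mse` are assumptions made for this example.

```python
def fit_regression_stump(xs, residuals):
    """One-split regression tree: predict the mean residual on each side."""
    best = None
    # Skip the largest x as a threshold, since its right side would be empty.
    for t in sorted(set(xs))[:-1]:
        left = [r for x, r in zip(xs, residuals) if x <= t]
        right = [r for x, r in zip(xs, residuals) if x > t]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        sse = (sum((r - left_mean) ** 2 for r in left)
               + sum((r - right_mean) ** 2 for r in right))
        if best is None or sse < best[0]:
            best = (sse, t, left_mean, right_mean)
    _, t, left_mean, right_mean = best
    return lambda x: left_mean if x <= t else right_mean

def gradient_boost(xs, ys, n_trees, learning_rate):
    base = sum(ys) / len(ys)  # start from the mean prediction
    preds = [base] * len(xs)
    trees = []
    for _ in range(n_trees):
        # Residuals = negative gradient of the squared-error loss.
        residuals = [y - p for y, p in zip(ys, preds)]
        tree = fit_regression_stump(xs, residuals)
        trees.append(tree)
        # Each new tree corrects part of the remaining error.
        preds = [p + learning_rate * tree(x) for p, x in zip(preds, xs)]
    return lambda x: base + learning_rate * sum(tree(x) for tree in trees)

def mse(model, xs, ys):
    return sum((model(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

xs = [i / 10 for i in range(20)]
ys = [x * x for x in xs]  # toy target: y = x^2

mse_1 = mse(gradient_boost(xs, ys, n_trees=1, learning_rate=0.5), xs, ys)
mse_20 = mse(gradient_boost(xs, ys, n_trees=20, learning_rate=0.5), xs, ys)
print(f"training MSE with 1 tree:   {mse_1:.4f}")
print(f"training MSE with 20 trees: {mse_20:.4f}")
```

The training error shrinks as trees are added one at a time, which is the sequential correction described above; a real implementation would also watch a validation set to decide when to stop.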
In this lecture, we took a deep dive into the random forest, which is a variant of the decision tree, and into other variants as well. Each of these variants has its own importance.