Drawbacks of Boosting:
- Boosting can overfit the model by combining too many weak learners.
- Very high training time.
- Training a face detector, for example, can take around two weeks on a modern computer.
A Variant of AdaBoost:
Real AdaBoost is an improved variant of AdaBoost in which, instead of using the discrete class labels returned by the weak learners, we compute the probability of correct classification.
The Bias-Variance Trade-off:
Bagging aggregates the predictions of its base estimators to produce a final prediction. It is designed to reduce overfitting (variance); this comes at the cost of a small increase in bias, which is usually more than compensated for by the reduction in variance.
Boosting combines several weak learners into a single strong learner by training the weak learners iteratively, with each new learner focusing on the errors and misclassifications of its predecessors and improving on them. This can increase variance but reduces bias significantly.
Where to sit on the bias-variance trade-off is up to the user.
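As a rough sketch of the two strategies, scikit-learn's `BaggingClassifier` and `AdaBoostClassifier` can be compared side by side. The synthetic dataset and the parameters here are illustrative, not taken from any experiment in this article:

```python
# Sketch: bagging (variance reduction) vs. boosting (bias reduction)
# on a synthetic classification problem. All parameters are illustrative.
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier, BaggingClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=500, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# Bagging trains base estimators independently on bootstrap samples.
bag = BaggingClassifier(n_estimators=50, random_state=0).fit(X_tr, y_tr)
# Boosting trains them sequentially, reweighting misclassified samples.
boost = AdaBoostClassifier(n_estimators=50, random_state=0).fit(X_tr, y_tr)

print(bag.score(X_te, y_te), boost.score(X_te, y_te))
```

Both ensembles typically beat a single decision stump or shallow tree on held-out data, but for different reasons, as described above.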
Other Boosting Examples:
- BrownBoost: BrownBoost is a boosting algorithm that is robust to noisy data.
- CoBoost: CoBoost can be used for semi-supervised learning in cases where there is redundancy in the features.
- LPBoost: LPBoost belongs to the boosting family. It maximizes the margin between training samples and hence also belongs to the class of margin-maximizing supervised learning algorithms.
Real Life Applications of Ensemble Boosting:
- Image Recognition
- Credit Card Fraud
- Gene Classification
- Medical Diagnosis
Ensemble methods can help you win machine learning competitions by combining sophisticated algorithms and producing highly accurate results. The effectiveness of these methods is undeniable, and their benefits in appropriate applications can be tremendous. In fields such as healthcare, even the smallest improvement in the accuracy of a machine learning algorithm can be precious.
Now we will develop a small project in which we use two models from the decision tree family, applying regression techniques to forecast Bitcoin price data.
This small project is available on my GitHub. The link is as follows:
So let's dig into the code.
First, we import all the libraries that we will be using in this code.
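A minimal set of imports matching the steps described below might look like this. The exact names in the original notebook may differ; in particular, the article uses XGBoost (`xgboost.XGBRegressor`), for which scikit-learn's `GradientBoostingRegressor` is listed here as a stand-in so the sketch runs without the extra dependency:

```python
# Imports assumed for this walkthrough; the original notebook may differ.
import pandas as pd                     # scraping and data wrangling
import matplotlib.pyplot as plt         # plotting the price series
from sklearn.ensemble import RandomForestRegressor
# Stand-in for xgboost.XGBRegressor (the booster used in the article):
from sklearn.ensemble import GradientBoostingRegressor
```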
After that, we scrape the tabular data from the website using pandas' read_html. The data looks like this.
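`pd.read_html` parses every `<table>` on a page into a list of DataFrames; the real code would call it on the website's URL. As a self-contained illustration, here it is applied to a small inline HTML fragment shaped like the scraped price table (the row values are made up):

```python
from io import StringIO
import pandas as pd

# Made-up fragment shaped like the scraped price table; the real code
# would call pd.read_html(url) on the website instead.
html = """<table>
<tr><th>Date</th><th>Open</th><th>High</th><th>Low</th><th>Close</th>
<th>Volume</th><th>Market Cap</th></tr>
<tr><td>Jul 30, 2019</td><td>9,519</td><td>9,717</td><td>9,386</td>
<td>9,595</td><td>13,778,066,660</td><td>171,155,715,956</td></tr>
</table>"""

df = pd.read_html(StringIO(html))[0]  # first (and only) table on the page
print(df.columns.tolist())
```

By default `read_html` treats commas as thousands separators, so the numeric columns come back as numbers rather than strings.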
It contains the date, open price, highest price, lowest price, and closing price. It also contains the volume and market cap.
After that, we clean the data. Without cleaning, our model will not be able to forecast correctly, because there are many empty rows; we have to remove such rows and make the data acceptable before feeding it to the algorithm.
If we look at the data, we see that a few rows have a “-” value in the volume column, so we have to remove these rows and convert the volume and market cap columns to integers so the model can work with numeric values.
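The cleaning step can be sketched as follows, using a toy frame that mimics the scraped data (the values are made up; the real code operates on the scraped DataFrame):

```python
import pandas as pd

# Toy frame mimicking the scraped data: the first row has "-" for Volume,
# and the numeric columns arrive as strings.
df = pd.DataFrame({
    "Volume":     ["-",          "13778066660",  "17489094082"],
    "Market Cap": ["1491160000", "171155715956", "180000000000"],
})

df = df[df["Volume"] != "-"]                     # drop rows with missing volume
df["Volume"] = df["Volume"].astype("int64")      # strings -> integers
df["Market Cap"] = df["Market Cap"].astype("int64")
print(df.dtypes)
```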
The date is given as a string, and the plotting library will not recognize strings when we try to plot them on the x-axis. So we first have to convert the strings into a proper datetime format that Python can understand.
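`pd.to_datetime` handles this conversion; it infers the format from strings like "Jul 28, 2019" (the sample dates below are illustrative):

```python
import pandas as pd

df = pd.DataFrame({"Date": ["Jul 28, 2019", "Jul 29, 2019"]})
# Parse the human-readable strings into real datetime objects so that
# plotting and time-based indexing work correctly.
df["Date"] = pd.to_datetime(df["Date"])
print(df["Date"].dt.year.tolist())
```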
After that, we set the date as the index; the plotting library will automatically put the index on the x-axis. We then drop the old index and plot Bitcoin's closing price. The graph looks like this.
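With a datetime index in place, a single `.plot()` call is enough; pandas labels the x-axis from the index automatically (the three-row frame below is only a placeholder for the real data):

```python
import pandas as pd
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so this runs headless
import matplotlib.pyplot as plt

# Placeholder frame: dates as the index, closing price as the column.
df = pd.DataFrame(
    {"Close": [9500.0, 9550.0, 9600.0]},
    index=pd.to_datetime(["2019-07-28", "2019-07-29", "2019-07-30"]),
)
# pandas puts the datetime index on the x-axis automatically.
ax = df["Close"].plot(title="Bitcoin closing price")
```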
We can see that the data starts in 2014 and ends in 2019, which is approximately five years of data.
Now we split the data and train two models from the decision tree family. The first is a random forest, used as a regression technique because the price data is continuous, not categorical. The second is XGBoost, which is a boosting technique. Both the random forest and XGBoost reach an accuracy of about 92%, which makes them good regressors.
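The training step can be sketched like this. The features and targets below are synthetic placeholders (the article trains on the scraped Bitcoin data), and scikit-learn's `GradientBoostingRegressor` stands in for xgboost's `XGBRegressor` so the sketch runs without the extra dependency:

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the price features/targets used in the article.
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 5))
y = X @ rng.normal(size=5) + rng.normal(scale=0.1, size=300)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# Bagging-style ensemble of trees:
rf = RandomForestRegressor(n_estimators=100, random_state=0).fit(X_tr, y_tr)
# Boosting-style ensemble (stand-in for xgboost.XGBRegressor):
gb = GradientBoostingRegressor(random_state=0).fit(X_tr, y_tr)

print(rf.score(X_te, y_te), gb.score(X_te, y_te))  # R^2 on held-out data
```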
We now have to shift the data by 30 days, so that the model learns to predict the price 30 days ahead. Once the data is shifted, we create an index for the next 30 days and predict on the shifted data.
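The shift-and-forecast setup can be sketched as follows, using a toy price series in place of the real data. Shifting the closing price back by 30 rows makes each row's target the price 30 days in the future; the last 30 rows, whose future price is unknown, become the forecast window:

```python
import pandas as pd

horizon = 30  # forecast 30 days ahead, as in the article

# Toy price series standing in for the real Bitcoin data.
dates = pd.date_range("2019-01-01", periods=100, freq="D")
df = pd.DataFrame({"Close": range(100)}, index=dates, dtype=float)

# Target = closing price 30 days in the future.
df["Target"] = df["Close"].shift(-horizon)

train = df.dropna()  # rows where the future price is known
# Index for the next 30 calendar days, used to plot the forecast.
future_index = pd.date_range(df.index[-1] + pd.Timedelta(days=1),
                             periods=horizon, freq="D")
print(len(train), future_index[0])
```

A model trained on `train[["Close"]]` against `train["Target"]` can then be asked to predict from the last 30 rows of `Close`, and the predictions plotted against `future_index`.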
The green line shows the forecasted prices.