Using TQDM
TQDM is an awesome package. Its progress bar rendering is really buggy for cells within notebooks, but it works great in Python scripts that y..{
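A minimal sketch of the script use case, assuming the `tqdm` package is installed. The loop and the `total` variable are made up for illustration.

```python
from tqdm import tqdm

total = 0
# Wrapping any iterable in tqdm() prints a progress bar to stderr as the loop advances.
for i in tqdm(range(1000), desc="summing"):
    total += i
```

For notebooks, the package also ships a `tqdm.notebook` variant, which may render more reliably there, although it depends on the notebook's widget setup.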
Decision Tree Classifier
A decision tree classifier is a straightforward tree-like model. The classifier is just a decision tree that splits the classes at each layer via a ..{
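A minimal sketch with scikit-learn's `DecisionTreeClassifier`. The iris dataset, the `max_depth=3` limit, and the 75/25 split are assumptions chosen for illustration, not settings taken from the notebook.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# Each level of the tree splits the samples on one feature threshold.
tree = DecisionTreeClassifier(max_depth=3, random_state=0)
tree.fit(X_train, y_train)

accuracy = tree.score(X_test, y_test)  # mean accuracy on the held-out split
print(f"test accuracy: {accuracy:.2f}")
```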
Confusion Matrices
Confusion matrices are commonly used in classification problems. I used them constantly in a recent fraud detection challenge to see the potent..{
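A minimal sketch using `sklearn.metrics.confusion_matrix` on hand-made binary labels. The labels are invented for illustration, with 1 standing in for a positive (for example, fraudulent) case.

```python
from sklearn.metrics import confusion_matrix

y_true = [0, 0, 0, 0, 1, 1, 0, 1]
y_pred = [0, 0, 1, 0, 1, 0, 0, 1]

# Rows are the true classes, columns are the predicted classes.
cm = confusion_matrix(y_true, y_pred)

# For a binary problem the flattened order is TN, FP, FN, TP.
tn, fp, fn, tp = cm.ravel()
print(cm)
print(f"TN={tn} FP={fp} FN={fn} TP={tp}")
```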
Crosstab Table
The pandas crosstab function is really useful when you want to create pivot tables for dimensional features. With one line of code..{
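A minimal sketch of `pd.crosstab` on a toy frame. The `gender` and `churned` columns are hypothetical names used only for illustration.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "gender": ["F", "M", "F", "M", "F"],
        "churned": ["yes", "no", "no", "no", "yes"],
    }
)

# One line: count how often each (gender, churned) combination occurs.
table = pd.crosstab(df["gender"], df["churned"])
print(table)
```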
Downsampling
In supervised learning, many datasets are class imbalanced. One common remedy is to downsample the majority class to match ..{
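A minimal sketch of downsampling with pandas, assuming the goal is to shrink the majority class to the size of the minority class. The `amount` and `label` columns and the 90/10 imbalance are invented for illustration.

```python
import pandas as pd

# Toy imbalanced data: 10 positive rows, 90 negative rows.
df = pd.DataFrame({"amount": range(100), "label": [1] * 10 + [0] * 90})

minority = df[df["label"] == 1]
majority = df[df["label"] == 0]

# Randomly keep only as many majority rows as there are minority rows.
majority_down = majority.sample(n=len(minority), random_state=42)

# Recombine and shuffle so the classes are not grouped together.
balanced = (
    pd.concat([minority, majority_down])
    .sample(frac=1, random_state=42)
    .reset_index(drop=True)
)
print(balanced["label"].value_counts())
```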
Import Matlab Data
Some data provided by universities comes as a .mat file. This file type is native to MATLAB and can be imported via the io functi..{
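A minimal sketch of reading a .mat file with `scipy.io.loadmat`. So that the example runs on its own, it first writes a small file with `scipy.io.savemat`; the file name `example.mat` and the variable name `X` are hypothetical.

```python
import os
import tempfile

import numpy as np
from scipy import io

# Write a tiny .mat file so there is something to load.
path = os.path.join(tempfile.mkdtemp(), "example.mat")
io.savemat(path, {"X": np.arange(6).reshape(2, 3)})

# loadmat returns a dict mapping MATLAB variable names to NumPy arrays,
# plus a few metadata keys such as "__header__".
mat = io.loadmat(path)
X = mat["X"]
print(X.shape)
```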
Cross Validation and K-Fold
Cross-validation is a general technique for scoring a model using only the training data set. K-Fold cross-validation is a better heuri..{
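A minimal sketch of K-Fold cross-validation with scikit-learn. The iris dataset, the logistic regression model, and `n_splits=5` are assumptions picked for illustration.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import KFold, cross_val_score

X, y = load_iris(return_X_y=True)

# Split the data into 5 folds; each fold is used once as the validation set.
cv = KFold(n_splits=5, shuffle=True, random_state=0)
scores = cross_val_score(LogisticRegression(max_iter=1000), X, y, cv=cv)

print(scores)         # one accuracy score per fold
print(scores.mean())  # average score across the folds
```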
DBSCAN
DBSCAN stands for density-based spatial clustering of applications with noise. The algorithm selects random points in the feature space and if the ..{
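A minimal sketch with scikit-learn's `DBSCAN` on synthetic data: two tight blobs plus one far-away point. The `eps=0.5` and `min_samples=5` values are assumptions tuned to this toy data, not recommendations.

```python
import numpy as np
from sklearn.cluster import DBSCAN

rng = np.random.default_rng(0)
blob_a = rng.normal(loc=(0, 0), scale=0.1, size=(30, 2))
blob_b = rng.normal(loc=(5, 5), scale=0.1, size=(30, 2))
outlier = np.array([[2.5, 10.0]])
X = np.vstack([blob_a, blob_b, outlier])

# Points with at least min_samples neighbours within eps form dense clusters;
# points that belong to no dense region get the label -1 (noise).
labels = DBSCAN(eps=0.5, min_samples=5).fit_predict(X)
print(sorted(set(labels.tolist())))
```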
Heatmaps
Heatmaps are crazy useful. I use them as diagnostic plots to take a look at the feature correlation in my data frame and to understand the cro..{
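A minimal sketch of a feature-correlation heatmap. It uses plain matplotlib (`imshow`) rather than any specific heatmap helper, since the notebook's plotting library is not shown here; the columns `a`, `b`, and `c` are synthetic.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
a = rng.normal(size=200)
df = pd.DataFrame(
    {
        "a": a,
        "b": 2 * a + rng.normal(scale=0.1, size=200),  # strongly tied to "a"
        "c": rng.normal(size=200),                     # independent noise
    }
)

corr = df.corr()  # pairwise Pearson correlation between columns

fig, ax = plt.subplots()
image = ax.imshow(corr.to_numpy(), cmap="coolwarm", vmin=-1, vmax=1)
ax.set_xticks(range(len(corr.columns)))
ax.set_xticklabels(corr.columns)
ax.set_yticks(range(len(corr.index)))
ax.set_yticklabels(corr.index)
fig.colorbar(image, ax=ax, label="Pearson correlation")
```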
Histograms
Plotting histograms for distributions is a common task for almost every dataset. The basic blue histogram in matplotlib can be boring and dull. Therefo..{
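A minimal sketch of restyling a matplotlib histogram. The colour, edge colour, and bin count are arbitrary choices for illustration, and the data is synthetic.

```python
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)
data = rng.normal(loc=50, scale=10, size=500)

fig, ax = plt.subplots()
# color, edgecolor and alpha replace the default flat blue bars.
counts, bin_edges, _ = ax.hist(
    data, bins=20, color="teal", edgecolor="white", alpha=0.8
)
ax.set_xlabel("Value")
ax.set_ylabel("Count")
ax.set_title("Distribution of a synthetic feature")
```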
K-Nearest Neighbours Classifier
KNN is a very simple machine learning algorithm. Given a distance parameter and a nearest neighbours parameter, the algorithm uses the premise that ..{
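A minimal sketch with scikit-learn's `KNeighborsClassifier` on one-dimensional toy data. The choice of 3 neighbours and the Euclidean metric are assumptions for illustration.

```python
from sklearn.neighbors import KNeighborsClassifier

# Two well-separated groups on a line.
X_train = [[0.0], [1.0], [2.0], [8.0], [9.0], [10.0]]
y_train = [0, 0, 0, 1, 1, 1]

# Each new point gets the majority class of its 3 closest training points.
knn = KNeighborsClassifier(n_neighbors=3, metric="euclidean")
knn.fit(X_train, y_train)

predictions = knn.predict([[1.5], [8.5]])
print(predictions)
```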
Null Values
There are multiple ways to handle missing data. Some people come up with some very creative solutions. This notebook contains some basic method..{
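A minimal sketch of three basic pandas approaches: counting, dropping, and filling missing values. The `age` and `city` columns and the fill choices (column mean, the string "unknown") are assumptions for illustration.

```python
import pandas as pd

df = pd.DataFrame({"age": [25, None, 40], "city": ["Oslo", "Lima", None]})

# Count missing values per column.
missing_counts = df.isna().sum()

# Option 1: drop every row that has at least one missing value.
dropped = df.dropna()

# Option 2: fill numeric gaps with the column mean and text gaps with a placeholder.
filled = df.fillna({"age": df["age"].mean(), "city": "unknown"})

print(missing_counts)
print(filled)
```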
Plotting Residuals
Plotting your residuals for regression problems is crazy useful. This is another diagnostic plot that you can use to figure out if you did someth..{
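A minimal sketch of a residual plot, assuming a simple straight-line fit with `np.polyfit` on synthetic data. The model and the data are stand-ins, not the notebook's regression.

```python
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(0, 10, 50)
y = 2.0 * x + 1.0 + rng.normal(scale=1.0, size=x.size)

# Least-squares line; polyfit returns the highest-degree coefficient first.
slope, intercept = np.polyfit(x, y, deg=1)
y_pred = slope * x + intercept
residuals = y - y_pred

fig, ax = plt.subplots()
ax.scatter(y_pred, residuals)
ax.axhline(0.0, color="black", linestyle="--")  # reference line at zero error
ax.set_xlabel("Predicted value")
ax.set_ylabel("Residual")
```

A healthy fit tends to show residuals scattered evenly around the zero line with no visible curve or funnel shape.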
Randomized Grid Search
Manual hyperparameter searching? No way. Scikit-learn has an amazing random grid search function that can give us a hint into the best par..{
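A minimal sketch with scikit-learn's `RandomizedSearchCV`. The random forest model, the parameter ranges, and `n_iter=5` are assumptions kept small so the example runs quickly.

```python
from scipy.stats import randint
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import RandomizedSearchCV

X, y = load_iris(return_X_y=True)

# Distributions to sample from; randint(low, high) excludes the upper bound.
param_distributions = {
    "n_estimators": randint(10, 50),
    "max_depth": randint(2, 6),
}

# Try 5 random parameter combinations, each scored with 3-fold cross-validation.
search = RandomizedSearchCV(
    RandomForestClassifier(random_state=0),
    param_distributions=param_distributions,
    n_iter=5,
    cv=3,
    random_state=0,
)
search.fit(X, y)

best_params = search.best_params_
print(best_params, search.best_score_)
```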
Standardization
As I learned from a recent model, standardization is recommended before training many machine learning models. If you wanted to scal..{
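A minimal sketch of standardization with scikit-learn's `StandardScaler`, which rescales each column to zero mean and unit variance. The two-column array is invented for illustration.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Two features on very different scales.
X = np.array([[1.0, 100.0], [2.0, 200.0], [3.0, 300.0]])

scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)  # (x - column mean) / column std

print(X_scaled.mean(axis=0))  # approximately 0 for each column
print(X_scaled.std(axis=0))   # approximately 1 for each column
```

In practice the scaler is usually fitted on the training split only and then applied to the test split with `scaler.transform`, so that test statistics do not leak into training.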